*** infobot has joined #maemo | 00:16 | |
*** ChanServ sets mode: +v infobot | 00:16 | |
*** Oksana_ has joined #maemo | 00:31 | |
*** Pali has quit IRC | 00:52 | |
brolin_empey | Maxdamantus: OK, thank you for the informative answer. I will try to get around to opening your two links. I only know how to write in languages that use an alphabet so was curious how text written in a language such as Chinese or Japanese that does not use an alphabet is sorted, assuming that it can be sorted. My friend from Beijing showed me how he uses an IME to write in Chinese on Android but I do not know enough about a language that does not use an alphabet to write in the language. | 01:04 |
---|---|---|
brolin_empey | He said he thinks the stroke order or number of strokes, do not remember which one he said, is used for sorting text but at least one other person I asked said they do not think text written in Chinese can be sorted. I kept meaning to try using software that supports Chinese text to sort Chinese text to see what it does but I ran out of time then forgot about it or had other, higher priority things to do. | 01:08 |
*** Kilroo has joined #maemo | 01:11 | |
Maxdamantus | It will likely be ad-hoc to the writing system. I'm not sure how Chinese logograms work exactly, but in general I would expect a writing system to be made of a relatively small number of primitive concepts. | 01:25 |
Maxdamantus | eg, if you look at Hangul, you might have thousands of "characters", but each one is really just a combination of up to three primitive symbols denoting any start/middle/end sounds for a syllable. | 01:26 |
Maxdamantus | (Japanese kana are similar, but with the exception of "-n", their syllables all consist of one vowel, possibly preceded by a consonant, so only two primitive concepts in each glyph) | 01:28 |
L29Ah | no? | 01:28 |
Maxdamantus | and since that combination in Japanese kana only leads to around 50 symbols (5*10), it doesn't need to be as regular as Hangul. | 01:29 |
Maxdamantus | No what? | 01:29 |
L29Ah | ah nvm, for the ordering reason it's ok | 01:30 |
L29Ah | there's ゃ, ゅ and ょ to have a little fun with | 01:30 |
L29Ah | anyway though i don't see why you don't just grab unicode code points and be done with it | 01:31 |
Maxdamantus | Because Unicode code point ordering might not follow a well-understood pattern. It just depends on who designed the layout for that script in Unicode. | 01:43 |
Maxdamantus | Even in Latin-based scripts, you don't have that. An obvious example would be 'ı' in Turkish. | 01:43 |
Maxdamantus | or simply 'ü' in German. | 01:44 |
L29Ah | i think it can even change between languages using the same character set | 01:44 |
Maxdamantus | I imagine there are languages using Latin-based scripts that have orders that are inconsistent with English. | 01:45 |
Maxdamantus | also, I know that in Arabic there are at least two well-known orderings of letters (one starts with "alef, ba, gim, dal" like in Greek, the other starts with "alef, ba, ta, tha") | 01:46 |
Maxdamantus | and people use those different Arabic orders in different contexts. | 01:46 |
* enyc meows | 01:47 | |
brolin_empey | $ cat /dev/urandom >enyc | 01:47 |
*** inz has quit IRC | 02:04 | |
*** geaaru has quit IRC | 02:22 | |
*** Oksana_ is now known as Oksana | 02:23 | |
*** chainsawbike has quit IRC | 02:23 | |
*** chainsawbike has joined #maemo | 02:23 | |
*** FalconSpy has quit IRC | 02:25 | |
*** inz has joined #maemo | 02:27 | |
*** FalconSpy has joined #maemo | 02:28 | |
*** geaaru has joined #maemo | 02:34 | |
*** florian has quit IRC | 02:42 | |
*** xkr47 has quit IRC | 02:43 | |
*** xkr47 has joined #maemo | 02:55 | |
*** jskarvad has quit IRC | 03:00 | |
*** drathir_tor has quit IRC | 04:32 | |
*** tm has quit IRC | 04:33 | |
*** tm has joined #maemo | 04:36 | |
*** minicom has quit IRC | 05:17 | |
*** minicom7 has joined #maemo | 05:18 | |
*** pagurus has quit IRC | 06:58 | |
*** Kilroo has quit IRC | 06:58 | |
*** infobot has quit IRC | 07:25 | |
*** DocScrutinizer05 has quit IRC | 07:27 | |
*** DocScrutinizer05 has joined #maemo | 07:27 | |
*** infobot has joined #maemo | 07:38 | |
*** ChanServ sets mode: +v infobot | 07:38 | |
*** peetah has quit IRC | 08:05 | |
*** drathir_tor has joined #maemo | 08:50 | |
*** drathir_tor has quit IRC | 08:55 | |
*** drathir_tor has joined #maemo | 09:16 | |
*** Pali has joined #maemo | 09:40 | |
*** Pali has quit IRC | 10:05 | |
*** jskarvad has joined #maemo | 11:26 | |
*** xmn has quit IRC | 11:36 | |
*** florian_kc has joined #maemo | 11:42 | |
*** esaym153 has quit IRC | 11:50 | |
*** florian_kc is now known as florian | 12:19 | |
*** minicom7 is now known as minicom | 12:57 | |
*** esaym153 has joined #maemo | 14:08 | |
*** ab has quit IRC | 14:40 | |
*** peetah has joined #maemo | 14:41 | |
*** chainsawbike has quit IRC | 14:41 | |
*** chainsawbike has joined #maemo | 14:41 | |
*** ab has joined #maemo | 14:42 | |
*** ab has joined #maemo | 14:42 | |
*** Linkandzelda has quit IRC | 15:14 | |
*** Linkandzelda has joined #maemo | 15:17 | |
*** drathir_tor has quit IRC | 15:48 | |
*** drathir_tor has joined #maemo | 16:00 | |
*** drathir_tor has quit IRC | 16:00 | |
*** drathir_tor has joined #maemo | 16:05 | |
*** drathir_tor has quit IRC | 16:14 | |
*** florian_kc has joined #maemo | 16:23 | |
*** drathir_tor has joined #maemo | 16:35 | |
CcxWrk | You don't sort by codepoints, there's a whole Unicode Collation Algorithm: https://www.unicode.org/reports/tr10/ | 16:55 |
L29Ah | > Siniform ideographs — most notably modern CJK (Han) ideographs — and Hangul syllables are not explicitly mentioned in the default table. Ideographs are mapped to collation elements that are derived from their Unicode code point value as described in Section 10.1.3, Implicit Weights. | 16:56 |
CcxWrk | Hm, even libicu pages on this seem to be full of TODOs http://site.icu-project.org/design/collation/script-reordering | 17:02 |
CcxWrk | Heh and the official document on Collation points to … PowerPoint file? :] | 17:05 |
CcxWrk | But no, we better focus on adding more emoji combinations /s | 17:05 |
KotCzarny | sticking to those funny chars is like keeping ebcdic around | 17:07 |
KotCzarny | sure, some legacy code uses it, but whole thing should be deprecated | 17:07 |
L29Ah | indeed, latin should be deprecated in favour of han | 17:08 |
KotCzarny | i think you've meant emojis | 17:10 |
L29Ah | nah emojis are ideographs like han, they're fine | 17:11 |
*** florian has quit IRC | 18:46 | |
*** drathir_tor has quit IRC | 18:47 | |
*** florian_kc has quit IRC | 18:49 | |
*** sedate has joined #maemo | 18:55 | |
*** sedate has quit IRC | 19:01 | |
*** drathir_tor has joined #maemo | 19:04 | |
*** Pali has joined #maemo | 19:45 | |
*** drathir_tor has quit IRC | 20:33 | |
*** drathir_tor has joined #maemo | 20:37 | |
*** xmn has joined #maemo | 21:08 | |
*** drathir_tor has quit IRC | 21:12 | |
*** drathir_tor has joined #maemo | 21:30 | |
*** drathir_tor has quit IRC | 21:37 | |
*** drathir_tor has joined #maemo | 21:54 | |
*** florian has joined #maemo | 22:11 | |
*** luke-jr has quit IRC | 22:22 | |
*** luke-jr has joined #maemo | 22:22 | |
*** akossh has joined #maemo | 22:50 | |
*** luke-jr has quit IRC | 23:10 | |
*** luke-jr has joined #maemo | 23:12 | |
*** luke-jr has quit IRC | 23:23 | |
*** luke-jr has joined #maemo | 23:24 | |
*** florian_kc has joined #maemo | 23:27 | |
*** florian has quit IRC | 23:28 | |
*** luke-jr has quit IRC | 23:45 | |
*** drathir_tor has quit IRC | 23:47 | |
*** florian_kc has quit IRC | 23:51 | |
*** luke-jr has joined #maemo | 23:55 | |
*** drathir_tor has joined #maemo | 23:56 |
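
Three points from the discussion above can be illustrated with a rough Python sketch: the way a Hangul syllable breaks down into up to three primitive symbols, the way plain code point order disagrees with the Turkish alphabet around 'ı', and the implicit weights that the quoted UTS #10 passage assigns to Han ideographs. None of this code comes from the log. The names `decompose_hangul`, `turkish_key` and `implicit_weights` are hypothetical helpers, the Turkish key is a toy that only handles lowercase words, and the implicit weight formula is one reading of UTS #10 section 10.1.3 with the base `0xFB40` assumed to apply to the core CJK Unified Ideographs block only. Real software would normally rely on a full Unicode Collation Algorithm implementation such as ICU.

```python
import unicodedata

# Hangul syllable arithmetic as described in the Unicode Standard:
# 19 leading consonants, 21 vowels, 28 tails (tail 0 = no final consonant).
S_BASE, L_COUNT, V_COUNT, T_COUNT = 0xAC00, 19, 21, 28


def decompose_hangul(syllable: str) -> tuple[int, int, int]:
    """Return (lead, vowel, tail) indices of a precomposed Hangul syllable."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    return lead, vowel, tail


# Toy collation key: rank each letter by its position in the Turkish
# alphabet (assumption: lowercase input; unknown characters sort last).
TURKISH = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {ch: i for i, ch in enumerate(TURKISH)}


def turkish_key(word: str) -> list[int]:
    return [RANK.get(ch, len(TURKISH)) for ch in word]


def implicit_weights(ch: str, base: int = 0xFB40) -> tuple[int, int]:
    """Implicit primary weight pair derived from the code point.

    Assumption: base 0xFB40 is the value for core CJK Unified Ideographs;
    other ranges use different bases and are not handled here.
    """
    cp = ord(ch)
    return base + (cp >> 15), (cp & 0x7FFF) | 0x8000


words = ["izmir", "zonguldak", "ısparta"]
by_code_point = sorted(words)                 # 'ı' (U+0131) lands after 'z'
by_alphabet = sorted(words, key=turkish_key)  # 'ı' comes before 'i'

print(decompose_hangul("한"))
print(len(unicodedata.normalize("NFD", "한")))  # same split via normalization
print(by_code_point, by_alphabet)
print([hex(w) for w in implicit_weights("一")])
```

The implicit weights are still just the code point in disguise, which is why Han ideographs end up in code point order under the default table unless a language-specific tailoring is applied.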