The results from yesterday's poll

Thanks to the 97 people who have so far answered yesterday's scripts poll. The results are rather neatly bunched into four groups: 14 could be seen by at least 83 of the 97; there were a further 8 in the 47-62 range; there is a cluster of 3 which could be seen by 15-21 people; and one outlier which only I could see when logged in using Firefox.

95 Δεκέμβριος Greek (10-20 mn)
94 Декабрь Russian (100-200 mn)
92 ธันวาคม Thai (50-100 mn)
92 ديسمبر Arabic (200-500 mn)
92 דצמבר Hebrew (5-10 mn)
90 दिसंबर Hindi (200-500 mn)
89 டிசம்பர் Tamil (50-100 mn)
89 Դեկտեմբեր Armenian (5-10 mn)
88 ਦਸੰਬਰ Punjabi (50-100 mn)
88 ડિસેમ્બર Gujarati (50-100 mn)
85 クリスマス Japanese (100-200 mn)
84 聖誕節 Chinese (over 1 bn)
83 크리스마스 Korean (50-100 mn)
83 დეკემბერი Georgian (2-5 mn)

Taking it for granted that everyone could see the Latin alphabet clearly, this list includes the correct scripts for nine of the world's ten languages with the most speakers, and 23 of the top 25. I'll address the missing languages when I get to them; the odd inclusions here are Greek and Hebrew (understandable for cultural reasons), Armenian and Georgian (which despite their small number of native speakers are geographically convenient to the massive information technology hub of Russia, and also relatively easy to code) and Thai, which quite probably says something about the relative openness of Thailand compared to some of its neighbours.

62 ডিসেম্বর Bangla (100-200 mn)
61 డిసెంబర్ Telugu (50-100 mn)
61 ಡಿಸೆಂಬರ್ Kannada (20-50 mn)
61 ഡിസംബര്‍ Malayalam (20-50 mn)
60 ޑިސެމްބަރު Divehi (200,000-500,000)
58 ܟܢܘܢ ܐ Aramaic (2-5 mn)
54 ᑎᓯᒻᐳᕆ Inuit (20,000-50,000)
47 ᏓᏂᏍᏓᏲᎯᎲ Cherokee (20,000-50,000)

Actually these eight subdivide pretty clearly into three groups. Bangla, Telugu, Kannada and Malayalam are South Asian scripts which somehow have not achieved the penetration that their number of speakers would suggest. This is particularly striking for Bangla, which unlike the other three is the sole official language of a sovereign state. Divehi (which is the official language of the Maldives) and Aramaic may not be obvious partners, but in fact both scripts are related to Arabic, so if you have coded for one you may as well code for the other. Inuit and Cherokee are the two least-spoken languages on the entire list, and I suspect that their alphabets may not be all that widely used even by native speakers (Latin transcription of both languages is fairly common), but like Georgian and Armenian they have the advantage of being relatively easy to code and on a convenient continent for coders.

21 ዲሴምበር Amharic (10-20 mn)
18 දෙසැම්බර් Sinhalese (10-20 mn)
15 បុណ្យណូអែល Khmer (5-10 mn)

The Ge'ez script is used for Tigrinya as well as Amharic, so may need to be bumped up a population category; notably it is the only indigenous African script in the list. All three of these score rather lower than one would expect for the official language of a sovereign state (two sovereign states if one counts Eritrea as well as Ethiopia).

1 ဒီဇင်ဘာ Burmese (10-20 mn)

Isn't that shocking? Burmese script is not easy for us alphabet-users, but really is no more difficult than the other South Asian and South-East Asian scripts. I would be interested to know more about the politics and policies which have put Thai so far ahead and Burmese so far behind compared with their neighbours. You may remember that Cory Doctorow's book Little Brother is to be translated into Burmese, Karen (which also uses Burmese script), Shin and Kachin; I think this survey rather illustrates why that is a good idea.



Dec. 19th, 2009 12:06 pm (UTC)
I know very little of these lands apart from their scripts: back in a previous life, I worked on Linux support for them.

I would suggest a good test case for the Indic scripts is to have a consonant cluster that forms a conjunct followed by an 'i'. For example "ksi": क्षि

The 'i' is formed by the ि part, which should appear to the left of the other glyph, even though it is encoded (and typed) afterwards. The other glyph is a special one for "ks". It shouldn't look like a k (क) and an s (ष) just after each other.

The other wrinkle is that consonants have an implicit "a" after them, unless they are followed by a vowel. If you want to have a consonant cluster or a consonant at the end of the word, you need a virama ( ् ) to suppress the vowel. This resembles a little grave accent attached to the bottom of the letter. Rendering algorithms are supposed to pick up the sequence of ka-virama-sa and use the special sign for "ksa". It gets even more confusing when you start wondering what should happen to the cursor.
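To make the encoding order concrete, here is a minimal Python sketch that lists the code points stored for the "ksi" test case. It assumes the Devanagari sequence is KA, VIRAMA, SSA, VOWEL SIGN I, as described above; the helper name `describe` is made up for illustration, and the script only inspects the stored text, it does not test whether a given font or renderer draws the conjunct correctly.

```python
import unicodedata

# Assumed encoding of the "ksi" test case: KA + VIRAMA + SSA + VOWEL SIGN I.
# Escapes are used so the example does not depend on the editor's font support.
ksi = "\u0915\u094D\u0937\u093F"

def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The vowel sign comes last in storage, even though a correct renderer
# draws it to the left of the conjunct.
for code_point, name in describe(ksi):
    print(code_point, name)
```

If the string displays as a separate क and ष with a visible virama mark, or with the ि on the right, the rendering stack is not applying the shaping rules, even though the stored text is the same.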
