The results from yesterday's poll

Thanks to the 97 people who have so far answered yesterday's scripts poll. The results are rather neatly bunched into four groups: 14 could be seen by at least 83 of the 97; there were a further 8 in the 47-62 range; there is a cluster of 3 which could be seen by 15-21 people; and one outlier which only I could see when logged in using Firefox.

95 Δεκέμβριος Greek (10-20 mn)
94 Декабрь Russian (100-200 mn)
92 ธันวาคม Thai (50-100 mn)
92 ديسمبر Arabic (200-500 mn)
92 דצמבר Hebrew (5-10 mn)
90 दिसंबर Hindi (200-500 mn)
89 டிசம்பர் Tamil (50-100 mn)
89 Դեկտեմբեր Armenian (5-10 mn)
88 ਦਸੰਬਰ Punjabi (50-100 mn)
88 ડિસેમ્બર Gujarati (50-100 mn)
85 クリスマス Japanese (100-200 mn)
84 聖誕節 Chinese (over 1 bn)
83 크리스마스 Korean (50-100 mn)
83 დეკემბერი Georgian (2-5 mn)

Taking it for granted that everyone could see the Latin alphabet clearly, this list includes the correct scripts for nine of the world's ten most widely spoken languages, and 23 of the top 25. I'll address the missing languages when I get to them; the odd inclusions here are Greek and Hebrew (understandable for cultural reasons), Armenian and Georgian (which despite their small number of native speakers are geographically convenient to the massive information technology hub of Russia, and also relatively easy to code) and Thai, which quite probably says something about the relative openness of Thailand compared to some of its neighbours.

62 ডিসেম্বর Bangla (100-200 mn)
61 డిసెంబర్ Telugu (50-100 mn)
61 ಡಿಸೆಂಬರ್ Kannada (20-50 mn)
61 ഡിസംബര്‍ Malayalam (20-50 mn)
60 ޑިސެމްބަރު Divehi (200,000-500,000)
58 ܟܢܘܢ ܐ Aramaic (2-5 mn)
54 ᑎᓯᒻᐳᕆ Inuit (20,000-50,000)
47 ᏓᏂᏍᏓᏲᎯᎲ Cherokee (20,000-50,000)

Actually these eight subdivide pretty clearly into three groups. Bangla, Telugu, Kannada and Malayalam are South Asian scripts which somehow have not achieved the penetration that their number of speakers would have suggested. This is particularly striking for Bangla which unlike the other three is the sole official language of a sovereign state. Divehi (which is the official language of the Maldives) and Aramaic may not be obvious partners, but in fact both scripts are related to Arabic, so if you have coded for one you may as well code for the other. Inuit and Cherokee are the two least-spoken languages on the entire list, and I suspect that their alphabets may not be all that widely used even by native speakers (Latin transcription of both languages is fairly common), but like Georgian and Armenian they have the advantage of being relatively easy to code and on a convenient continent for coders.

21 ዲሴምበር Amharic (10-20 mn)
18 දෙසැම්බර් Sinhalese (10-20 mn)
15 បុណ្យណូអែល Khmer (5-10 mn)

The Ge'ez script is used for Tigrinya as well as Amharic, so may need to be bumped up a population category; notably it is the only indigenous African script in the list. All three of these score rather lower than one would expect for the official language of a sovereign state (two sovereign states if one counts Eritrea as well as Ethiopia).

1 ဒီဇင်ဘာ Burmese (10-20 mn)

Isn't that shocking? Burmese script is not easy for us alphabet-users, but really is no more difficult than the other South Asian and South-East Asian scripts. I would be interested to know more about the politics and policies which have put Thai so far ahead and Burmese so far behind compared with their neighbours. You may remember that Cory Doctorow's book Little Brother is to be translated into Burmese, Karen (which also uses Burmese script), Shin and Kachin; I think this survey rather illustrates why that is a good idea.
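
For anyone who wants to check the four-way bunching described at the top of this entry, here is a minimal sketch in Python. The counts are copied from the lists above; the band boundaries are simply the ranges quoted in the opening paragraph, and the names `SEEN_BY`, `BANDS` and `band_of` are made up for this illustration.

```python
from collections import Counter

# Number of respondents (out of 97) who could see each of the 26 scripts,
# copied from the lists in this entry.
SEEN_BY = [95, 94, 92, 92, 92, 90, 89, 89, 88, 88, 85, 84, 83, 83,
           62, 61, 61, 61, 60, 58, 54, 47,
           21, 18, 15,
           1]

# Band boundaries taken from the ranges quoted in the opening paragraph.
BANDS = [("83 or more", 83, 97), ("47-62", 47, 62), ("15-21", 15, 21), ("1", 1, 1)]


def band_of(count):
    """Return the label of the band a count falls into."""
    for label, low, high in BANDS:
        if low <= count <= high:
            return label
    return "unbanded"


tally = Counter(band_of(count) for count in SEEN_BY)
for label, _, _ in BANDS:
    print(f"{label:>10}: {tally[label]} scripts")
```

No count falls between the bands, which is what makes the grouping look so neat.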

Comments

( 11 comments — Leave a comment )
bohemiancoast
Dec. 18th, 2009 07:16 am (UTC)
The operating system I'm using was produced in North America; while I understand its reach is global I don't think it particularly strange that native North American languages used by very few people have better support than Asian languages used by many. Consider the amount of support we provide for Welsh in the UK, despite the fact that there are many, many languages that have more UK native speakers than Welsh.

Now, what would be really interesting is if more of us could read Klingon than Burmese. I wouldn't be surprised.
redfiona99
Dec. 18th, 2009 10:19 am (UTC)
Because I'm in Leicester Uni, I'm really quite peeved that we don't support Bangla given that we have quite a few students either from Bangladesh or of Bangladeshi descent. I may well go on the hunt to see if it's just this computer or all of the ones on campus.
nickbarnes
Dec. 18th, 2009 10:36 am (UTC)
For most of us this just reflects the fonts which came with our computers, or with the random bits and pieces we have installed on them. Which in turn will just be determined by a mix of inertia, industrial politics, personal interests, and cock-up.
I don't see any tremendous advantage to being able to view Burmese (or, say, Hindi) on my computer. I can't read it at all, and it's unlikely that I ever will. It would be nice if I had complete Unicode fonts, but more from a sense of aesthetic completion - and to allow testing my code, especially outside the BMP - than any need to read or write Phoenician (or Aramaic, soon).
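A rough sketch of the kind of test meant by "outside the BMP", assuming Python 3 and its standard `unicodedata` module. The helper names `is_outside_bmp` and `utf16_units` are invented for this illustration; the Phoenician letter is just one convenient character above U+FFFF.

```python
import unicodedata

BMP_LIMIT = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_outside_bmp(ch):
    """True if the character lies above the Basic Multilingual Plane."""
    return ord(ch) > BMP_LIMIT


def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for this character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


phoenician_alf = "\U00010900"
print(unicodedata.name(phoenician_alf), is_outside_bmp(phoenician_alf))
print([hex(unit) for unit in utf16_units(phoenician_alf)])
```

Code that assumes one 16-bit unit per character tends to break on exactly this sort of input, since anything outside the BMP needs a surrogate pair in UTF-16.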
gareth_rees
Dec. 18th, 2009 11:40 am (UTC)
I was puzzled initially by "Thai (10-20 mn)". Surely there are more Thai speakers than that? But now I see that your data comes from the Ethnologue, which is counting first-language speakers, and makes fine language distinctions. The 20 million figure is for first-language speakers of Central Thai. Speakers of several mutually intelligible languages (Northern Thai, Southern Thai, Northeastern Thai, etc) are not counted.

The Ethnologue is counting speakers, though, and you're writing about scripts, so you might want to include speakers of some of these other languages, which also use the Thai script.
xipuloxx
Dec. 18th, 2009 01:44 pm (UTC)
I think there's something odd about my results. For the most part, they're pretty predictable, as I can see all those scripts except the last 6...

...except that I can't see Japanese, Chinese or Korean either. I'd be inclined to say that my software just doesn't recognise Far-Eastern scripts, but I can see Thai.

I'm not using anything weird, btw; just whatever came with my Windows XP computer / Firefox (version 3.5.6).
abigailb
Dec. 18th, 2009 01:54 pm (UTC)
Thai is encoded in Unicode in a way which is really easy to render: the glyphs appear in the same order as the characters are encoded. This is itself for political reasons (Unicode adopted the model used by a Thai standard, which came about early). This is not true of most of the other South and South-East Asian scripts. (Only Lao adopts the Thai model rather than the ISCII model, I believe.)
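To make the ordering difference concrete, here is a small sketch, assuming Python 3 and its standard `unicodedata` module. The two sample strings are illustrative choices, not taken from the poll, and `describe` is a made-up helper name. In both samples the vowel is drawn to the left of the consonant; only in Thai is it also stored first.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Thai: SARA E is drawn to the left of the consonant and is also stored first.
thai = "\u0E40\u0E01"
# Devanagari: VOWEL SIGN I is drawn to the left but stored after the consonant.
devanagari = "\u0915\u093F"

for label, sample in (("Thai", thai), ("Devanagari", devanagari)):
    print(label, sample)
    for code_point, name in describe(sample):
        print("   ", code_point, name)
```

A renderer can draw the Thai sample in stored order, whereas it has to move the Devanagari vowel sign in front of its consonant before drawing.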
nwhyte
Dec. 19th, 2009 10:54 am (UTC)
That is really helpful, thanks. Is there any easily accessible written account of how this all happened?
abigailb
Dec. 19th, 2009 11:43 am (UTC)
Not that I'm aware of, unfortunately. I think the fact that Thailand was never colonised by Europeans may have been related: it meant that Thailand didn't have a colonial administrative language it could use for computers, so they needed Thai support there and then; whereas India always has the option of just using English, and ISCII wasn't satisfying an immediate need so much as mapping out a future. There are other reasons why Thai was well-suited to be treated like it was: it can look reasonable in a monospaced font (can you imagine Burmese in one?), and it doesn't do conjuncts in the same way. So there were a whole host of circumstances all favourable for use of Thai script on computers...

Also: just because people are reporting seeing glyphs in the Indic scripts - or the right-to-left-scripts - doesn't mean that they're seeing the right glyphs in the right order.
nwhyte
Dec. 19th, 2009 11:50 am (UTC)
Good points. I wish I knew more about that part of the world; I read up on Burma when we started covering it at work, and have absorbed a bit about Bangladesh from relatives and India from cursory reading, but know very little about places further to the east (even though my father was born in Penang).

Your second point sent a chill down my spine; entirely possible that I have been measuring the wrong thing! Next time, I shall provide image files as well for people to compare with the glyphs they can see. Certainly I know that my Blackberry tends to put Arabic the wrong way round, if it sees the letters at all.
abigailb
Dec. 19th, 2009 12:06 pm (UTC)
I know very little of these lands apart from their scripts: back in a previous life, I worked on linux support for them.

I would suggest a good test case for the Indic scripts is to have a consonant cluster that forms a conjunct followed by an 'i'. For example "ksi": क्षि

The 'i' is formed by the ि part, which should appear to the left of the other glyph, even though it is encoded (and typed) afterwards. The other glyph is a special one for "ks". It shouldn't look like a k (क) and an s (ष) placed one after the other.

The other wrinkle is that consonants have an implicit "a" after them, unless they are followed by a vowel. If you want to have a consonant cluster or a consonant at the end of the word, you need a virama ( ् ) to suppress the vowel. This resembles a little grave accent attached to the bottom of the letter. Rendering algorithms are supposed to pick up the sequence of ka-virama-sa and use the special sign for "ksa". It gets even more confusing when you start wondering what should happen to the cursor.
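
A minimal sketch of how the stored sequence for "ksi" can be inspected, assuming Python 3 and its standard `unicodedata` module. The string is written with escapes on the assumption that it matches the four code points of the test case above; `conjunct_candidates` is a made-up helper, and it only flags letter-virama-letter sequences rather than doing any real shaping.

```python
import unicodedata

VIRAMA_CLASS = 9  # canonical combining class that Unicode assigns to viramas


def conjunct_candidates(text):
    """Return (start, end) index pairs for letter + virama + letter sequences."""
    found = []
    for i in range(len(text) - 2):
        first, mark, second = text[i], text[i + 1], text[i + 2]
        if (unicodedata.category(first) == "Lo"
                and unicodedata.combining(mark) == VIRAMA_CLASS
                and unicodedata.category(second) == "Lo"):
            found.append((i, i + 2))
    return found


# Assumed to be the same code points as the "ksi" test string: ka, virama, ssa, vowel sign i.
KSI = "\u0915\u094D\u0937\u093F"

for ch in KSI:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
print(conjunct_candidates(KSI))
```

The vowel sign comes last in the stored order even though it should be drawn first, and the virama between the two letters is what a rendering engine is supposed to consume when it substitutes the conjunct glyph.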
mscongeniality
Dec. 18th, 2009 05:40 pm (UTC)
I have to admit that I've never really thought of Aramaic in relation to Arabic, I mainly think of it in relation to Hebrew. Probably because of its use in Jewish liturgical texts and prevalence in yeshivot.