Tuesday, June 26, 2012

Life and Death of Written Languages

First, a quiz: how many "primary written languages" are there in the world? By "primary written languages" I don't mean languages that merely have an orthography (writing system) developed in academia - I mean languages which are truly written in the real world - in other words, languages which their native speakers (or other nationals of their respective countries) use as their language of choice for reading and writing. Examples: English, Amharic, Bahasa Indonesia, Japanese. Examples of languages that are widely spoken, but that their native speakers do not use as their primary language for writing: Swiss German, Hausa, Balinese, Zulu.

So how many written languages are there in the world? Ok, some stats to help you guess: there are about 6,000-7,000 living languages in the world today (depending on how you define "language") and there are 200 or so countries in the world (depending on how you define "country"...)

Your guess as to how many of these are written languages?

Wrong! The correct number is about 80. Yes, that's right, only 80 of the 6,000 or so living languages are used for writing. This includes all the written languages that originated in Europe (and are now the primary written languages in all of South, Central and North America, all of Oceania, almost all of Africa, and much of South Asia), only five non-European languages that are primary written languages in Africa (Amharic, Tigrinya, Somali, Kiswahili and Afrikaans), four in the Middle East (Hebrew, Arabic, Farsi and Turkish), the national languages of the countries of South East Asia, Central Asia and East Asia, the regional languages of South Asia, and a very few other ones. If you count them (as I did), you reach about 80.

Before you jump up in protest: yes, I have seen a few children's books in Yoruba and Lingala, a couple of websites in Shona, one newspaper in each of Luganda, Zulu and Kikuyu, street signs in Dzongkha, et cetera. I am close with the passionate regional language promoters in Indonesia, Philippines, DR Congo, Bhutan, Laos, Nepal and others, with whom I frequently interact and whom I deeply admire. However, for the purpose of this discussion, I don't count these as "primary written languages" because their native speakers most of the time use another language for writing purposes (English, English, French, English, English, English, Bahasa Indonesia, Filipino/English, French, English and English respectively). I will get back to that in a sec.

The disappearance of verbal (aka vernacular) languages is a hot topic which gets a lot of media and academic attention under the title "endangered languages" - so let me address this first, before talking about written languages. Basically, the way I see it, there are two linked underlying causes for the accelerating mass disappearance of languages we have seen in the past decade - and both are brand new. The first is telecommunications and the Internet. The main reason language diversity survived in the first place is the regional isolation of ethnic communities in remote locations. This is the reason DR Congo has 215 languages, Indonesia has 719 languages, and Papua New Guinea, population 6.8M, has 830 languages - more than any other country. The good news or bad news, depending on your point of view, is that regional isolation is so 80s. It's gone. With mobile phones, Internet, social networks and satellite TV all proliferating even in the most remote corners of the planet, there are hardly any people remaining today who are not exposed on a daily basis to communication coming out of the cities, or who do not communicate with the cities themselves - and that communication is most often in the dominant national or regional language. As people migrate to the cities, inter-marry, and get educated in schools using the national language, a growing number of them abandon their ethnic language for day-to-day speaking and don't pass it on to their children.

The second reason for the disappearance of verbal languages is that in the past decade, again due to the influence of the Internet and the resulting global flow of ideas, most countries in the world have become significantly more socially liberal, which of course is a good thing in general (in my opinion anyway). One aspect of this trend is a greater willingness of mainstream society and its institutions to include and accept ethnic minorities, reducing explicit and implicit discrimination. In reaction to these new opportunities to belong to the mainstream, many young people of a minority background are happily dumping their minority identity and adopting the national identity as their primary identity instead. For example, since I am married to a Cambodian, I have recently noticed how very rapidly many of the 1.5M people of the Khmer minority in the Isan region of Thailand, who maintained their distinct ethnicity while living as a minority in the midst of Thai and Lao people for the past 600 years, are now rushing to become all-Thai, and their children can no longer speak Khmer. The same is also true for the 1.5M Khmer minority in Vietnam and also a much smaller group of Khmer speakers in southern Laos.

Now to written languages - while verbal languages evolve very fast, with some becoming extinct and new ones being born (see for example the recent emergence of the Sheng language in Nairobi), verbal languages transform into written ones very, very rarely. In fact, the only cases I can think of in the past 100 years in which a language gained significant new populations that adopted it as a primary written language, other than through colonization, are Kiswahili in Tanzania, Bahasa Indonesia, Filipino, and Hebrew in Israel. Unlike the spontaneous grassroots emergence of verbal languages, in all these cases the language gained new populations because of a highly centralized, ideologically-motivated campaign to unify a nation under a single language for writing purposes. In addition, all these languages had already been primary written languages for centuries (except for Hebrew, which was used only for religious purposes), but for much smaller populations.

Now, after this long background, I would like to introduce the concept of the "strength" of individual written languages, and point out the risk that some specific weaker languages, which I will discuss, will fall off the list of written languages very, very soon (within a decade or two). I would also like to point out the direct relationship between the strength of a language and the availability of Internet content in that language.

My hypothesis is actually very simple - people read a hundred times more text than they write (or maybe it is 2,865 times more, I don't know, let's just say a hundred because it is a round number). Or, turning it on its head - people write 1/100 the volume of text that they read in a particular language. In particular, if they don't read anything in the language, they also don't write in the language. If you look at the 80 languages that are primary written languages, every single one of them has a large volume of highly-read written content - newspapers, websites etc. Dictionaries and children's textbooks don't count as highly-read content. Let me ignore the rules of statistics for a minute and assume cause-and-effect here, and make the bold claim that unless there is a large volume of content to be read in the language, a language will not become a written language, or will cease to be one if it is currently a written language. This is a chicken-and-egg problem of course. You will say, hey, that's not fair, how can we create a large volume of content before a lot of people read and write in the language? Well, exactly. Here you go. This is precisely why no language in the past century has succeeded in becoming a written language except with massive government intervention.

Moreover, I will submit to you that in the future offline content will hardly count - the strength and survivability of a language as a written language will depend entirely on the volume of online content in that language. When people become addicted to their smartphones, tablets etc, they are actually addicted to consuming a stream of fresh content 24 hours a day - news from multiple sites, blogs, social networking feeds, random information etc. Right now, only a few of us are as addicted as I am, but as devices and data plans plummet in price, it is inevitably going to become the norm rather than the exception. In this new world, the volume of information we will need to keep our addiction going will have to be produced by many thousands, if not millions, of people in parallel. Of that, the volume of information produced by the handful of people who work for traditional newspapers will be negligible. In other words, for a language to survive as a written language, thousands of people must be posting interesting content online every day in my language, so that my content consumption bandwidth is saturated with content in my language. If I have too much bandwidth left, I will start looking at content in secondary languages I understand, and pretty fast they will become my primary language for content creation. By the way, did you notice that this blog is in English even though my native language is Hebrew? It's because gradually over time my primary written language switched from Hebrew to English, because there is just not enough fresh content in Hebrew published every day on the Internet to satisfy my content consumption addiction.

Finally, let's look at some weaker languages that are at an inflection point right now, and see where they are going:

(1) Kinyarwanda (national language of Rwanda) and Malagasy (national language of Madagascar) - the various governments of these countries over the years have had inconsistent policies regarding emphasizing the national languages vs. English or French. The situation is that most of the literate population can read these languages, but there is almost no content out there. As a result, the primary written language, and hence almost all written content, is English or French. There's still an opportunity for the governments to make a U-turn and go in full force with these national languages, but it doesn't seem to be happening. I therefore predict that these languages will never turn into primary written languages, and that literacy rates in these languages will go down dramatically over time.

(2) Chichewa (national language of Malawi) and Setswana (national language of Botswana) - there seems to be a growing sentiment among the young generation to adopt these languages for writing. Clearly, today the primary written language in both these countries is English. However, if this sentiment becomes stronger, we might see a reversal of fortune. This will need massive government support, as well as a large number of skilled content creators, so I think that the chances are somewhat low, but I will keep my fingers crossed.

(3) Dzongkha (national language of Bhutan) - While this is certainly the vernacular lingua franca, the sentiment among the young generation is to move away from it and use English for writing purposes. No reversal of fortune is in sight.

(4) Lao (national language of Laos) - Lao people have highly nationalistic feelings when it comes to their language, in particular in light of the historical animosity with Thailand (Thai and Lao languages are fairly similar, and Thailand's population is 11 times bigger). However, in practice, a large majority of all television programming, music, books and magazines are imported from Thailand. When you sit in a coffeeshop in Vientiane, you notice the young and educated Lao elite, exactly the ones that are supposed to be the content creators, reading Thai websites, and creating Facebook content - in Thai. Lao is still a strong language, and all locally created offline content in Laos is in Lao - but the writing is on the wall. There is an urgent need to encourage much more content creation in the Lao language by the average Lao Internet user (not only professional journalists) - or else, as all written content moves online and offline content becomes negligible - a process bound to happen within the coming five years - Lao risks losing its status as a written language and becoming a verbal language only.

(5) Hebrew (my own native language) - considering Israel is a country of only 8M people (of which only half are native Hebrew speakers), Hebrew has a disproportionately large amount of content - both online and offline. Still, it is a relatively small language compared to the ocean of English out there. Indeed, I am often shocked when I realize that so many people, like me, have moved to writing mostly in English - even when writing to fellow Hebrew speakers. The reason we write in English is that we read mostly in English. Is the decline of Hebrew as a written language inevitable? I certainly hope not, but I am very worried.


  1. Divon,

    Great post! I have always been wondering why I prefer writing in English while I speak in Hebrew.
    Makes much more sense now.

  2. "for example, since I am married to a Cambodian".
    I know you started the blog for this line alone :)

    Seriously speaking, some thoughts:
    1. Uzbeks changed their alphabet thrice over the last century (first it was Roman, then Cyrillic, now Arabic), rendering the entire adult population illiterate every time. But since the written language did not play a central role in people's communication, nobody seemed to care.
    2. There is always a gap between the spoken and the written language. Do you think the bigger the gap, the *weaker* the language?

  3. As far as I am concerned there are only a few unique writings (around 14, or at least under 20, according to some sources) in the world. I'll try to remember most: Latin, Urdu, Punjabi, Hebrew, Persian, Arabic, Georgian, Armenian, Greek, Chinese, Cyrillic.