The Plausibility of Estimating Language Diversity on the Internet through Wikipedia Projects

Liao, Han-Teng (2009) The Plausibility of Estimating Language Diversity on the Internet through Wikipedia Projects. In: Proceedings of the WebSci'09: Society On-Line, 18-20 March 2009, Athens, Greece. (In Press)

Estimating language diversity on the Internet through Wikipedia projects provides benefits that go beyond making research easy and up to date. This article argues that Wikipedia projects generally alleviate some of the language bias embedded in computer systems, which Friedman and Nissenbaum (1995, 1997) classify as pre-existing, technical, and emergent bias. First, since Wikipedia projects are maintained by volunteers and charitable organizations, they are often less influenced by established major corporations and states than other websites are. Under-represented ethno-linguistic groups do not need governmental approval or industry attention in order to start a language version, thus avoiding unnecessary language/dialect politics and market influence. Second, because Wikipedia projects adopt Unicode, the international encoding standard that aims to accommodate all languages and scripts on the same webpage, they contain less technical bias than websites that favor certain languages. Unicode adoption also opens up further research opportunities to understand the “relatedness” or distances between languages by studying “mixed” pages and cross-language links. Third, since each language version of Wikipedia is self-governed by its respective community, the emergent bias that would arise from reapplying the English Wikipedia’s model to other language versions, such as German, French, or Chinese, is limited. Given these arguments, it is speculated that estimates of language diversity drawn from Wikipedia projects, though they may not be representative, demonstrate what is possible and thus what could be done to make it possible. In conclusion, researchers should take Wikipedia projects seriously as a major force in multilingualizing the Internet and exploit their open data to contribute to the global profile of language statistics.

Keywords: linguistic diversity, Internet statistics, webometrics

References

Friedman, B., & Nissenbaum, H. (1997). Bias in computer systems. In Batya Friedman (Ed.), Human Values and the Design of Computer Technology. Stanford, California; Cambridge; New York: Cambridge University Press.

Gerrand, P. (2007). Estimating Linguistic Diversity on the Internet: A Taxonomy to Avoid Pitfalls and Paradoxes. Journal of Computer-Mediated Communication, 12(4), 1298-1321. doi: 10.1111/j.1083-6101.2007.00374.x.

MAAYA. (2008). Dynamic Coalition for Linguistic Diversity. The World Network for Linguistic Diversity. Retrieved October 31, 2008, from

Paolillo, J. C., & Das, A. (2006). Evaluating Language Statistics: The Ethnologue and Beyond. UNESCO.

