Language In Language Out: Natural Language Processing in the Context of Indigenous South African Languages
As a digital native who cannot imagine life without the conveniences of technology, Kgothatso Lephoko observed that of the 11 official languages spoken in her home country of South Africa, 9 indigenous languages are underrepresented in the technology used. This effectively limits access to 46 million people, representing 79% of the population. Common applications of natural language processing (NLP), broadly defined as how computers understand and communicate with human language, including spell checking, machine translation, search engines, chatbots, and voice interfaces such as Siri and Alexa.
Kgothatso’s thesis, Language In Language Out: Natural Language Processing in the Context of Indigenous South African Languages, explores the extent to which indigenous language speakers in South Africa are disadvantaged by technologies that exclude their languages and how design can be used to contribute to the development of more equitable tools to address this problem.
“Google’s responses to questions posed in indigenous languages ranged from hilarious to irrelevant.”
To address the problem, Kgothatso’s design interventions achieve four goals:
Challenge the status quo: highlight the marginalization of indigenous languages in NLP.
Enable participation: involve indigenous language speakers to create and develop NLP models.
Document representative datasets: leverage the value of indigenous languages by rewarding speakers who contribute to datasets.
Elevate indigenous languages: extend indigenous language use to the broader society and industry.
BrokenMic: An awareness campaign
To challenge the status quo and highlight the problem Kgothatso designed the BrokenMic campaign that invites South Africans to document and share the responses they get when they verbally ask Google questions in their home languages.
Google’s responses to questions posed in indigenous languages ranged from hilarious, for a question on cooking instructions, to irrelevant, for a question seeking advice on burns from boiling water—a potentially lifesaving situation.
This is in contrast to the responses provided to speakers of technology dominant languages, like English, who can ask any question—for example the location of their nearest hospital—and instantly find the relevant information.
The feedback from the indigenous language speaker participants, confirmed the problem. Kgothatso found it notable that some of them didn’t even expect their languages to be understood or represented at all.
Alpha Yaka: A suite of indigenous language learning products
Alpha Yaka is a suite of indigenous language learning products that enable parents to participate and contribute to the development of NLP datasets for applications that preserve and make these languages accessible. This suite of products—consisting of an app, an interactive alphabet pillow and an alphabet board—supports parents in teaching children their indigenous home languages.
The Alpha Yaka app contains a set of professionally recorded words in the official South African languages that parents can use as a reference. For each word, they can add a unique image and pronunciation. In addition, they can add their own words to the app, in particular, words which are unique to their vocabulary. They can also leverage on and add the words used by other parents in the Alpha Yaka community.
The Alpha Yaka interactive pillow, which is battery-operated, enables parents to tailor their child’s education and enhance their learning of the home language by using the app to transfer words that are associated with each letter. As the child presses a letter, they hear its associated word, its pronunciation and how it can be used in a sentence. In this way, the interactive pillow also adds a tactile component to the learning experience; as one parent Kgothatso interviewed said, “It could be a great complementary teaching aid for her son who has Down’s syndrome.”
The Alpha Yaka alphabet board is a culturally representative learning tool that reinforces and celebrates the child’s identity as they learn their home language. It’s visual design references patterns from South Africa’s 9 indigenous cultures and their unique clothing. Below is an alphabet board that is inspired by the Venda culture. The intention is that each language will have its unique board.
Chomi: An indigenous language, NLP dataset creation and translation platform
Chomi is a platform where indigenous language speakers can translate, ask and answer questions towards the development of representative NLP datasets.
In recognising the value of being able to speak and understand one’s home language, Chomi rewards indigenous language speakers for their contributions. As South Africans speak on average 2.8 languages, there is a significant potential user base of community members to respond to businesses, language researchers or anyone who may ask questions or require translation.
The impact of the platform extends beyond creating datasets to providing indigenous languages speakers an opportunity earn an income. This differentiates Chomi from Google and Facebook who don’t pay participants for the effort put into improving their NLP technologies.
Pronounc: A language learning platform for indigenous languages
Pronounc is a language learning platform that elevates indigenous languages of South Africa through teaching their pronunciation using speech recognition and computer vision. Upon searching for or selecting a word, the language learner can both see and hear an indigenous speaker’s pronunciation. This app is unique in that it references words and/or sounds that are similar to the learner’s home language.
In addition, the platform partners with retailers and organizations to provide language learners with opportunities to use their new language skills.