Awal, the popular project that wants to make the internet speak Amazigh

A snapshot of the Awal Linguistic Marathon, at the CIEMEN headquarters in Barcelona.
Catalonia is home to several minoritised languages. As is well known, two non-hegemonic languages are traditionally spoken in the country: Catalan and Occitan. Recent migrations have also brought in speakers of other minoritised languages. The most widely spoken of these is Amazigh, a language of North Africa, where it has at least 20 million speakers, although it is in decline. In Catalonia, there are more than 100,000 Amazigh speakers. Their perseverance has produced materials for learning and using the language. The latest example is the Awal project.

The issue dates back a long way. Fifteen years ago, the Amazighs of Catalonia managed to get the Catalan Government to provide a handful of extracurricular classes in some schools so that the Amazigh language and culture could be studied. Those classes are run by Casa Amaziga de Catalunya, an organisation which, with CIEMEN’s collaboration, developed its own Amazigh teaching units, designed for use in Catalan classrooms. The result was the “Tc wawjḍm!” method, freely available online. As part of the same collaboration, the Gramàtica amaziga was published in print. Its author, Carles Múrcia, is one of the world’s leading experts on Amazigh and was also the coordinator of “Tc wawjḍm!”.

But that was not enough. What about speakers’ needs in the digital world? To answer this question, three years ago CIEMEN and Casa Amaziga de Catalunya started working together with a Catalan cooperative specializing in language technologies, Col·lectivaT. And, with the support of several Amazigh activists, they launched the Awal Digital project.

Awal is yet another example of how the internet has become a refuge for many minoritised languages, in a seemingly paradoxical phenomenon: at the same time that the worldwide network is a vehicle for the strength and power of hegemonic languages on a global scale, it is also providing avenues for the digital development of a good number of subordinate languages.

This includes several aspects. First, speakers of minoritised languages who live dispersed over a relatively large territory have the opportunity to “meet” each other virtually and use the language on social networks. Occitan and Aragonese are clear examples in Western Europe. Second, the internet offers possibilities for generating content in minoritised languages, and for accessing that content more quickly and on a larger scale than in the analogue setting. Finally, a whole series of tools (language technologies) can be developed on the internet to help preserve minoritised languages in ways that differ from traditional approaches.

Automatic translator and voice recognition

The Awal Digital project works in this last area, language technologies. It focuses on two aspects. The first is automatic translation. The second is voice recognition. Col·lectivaT explains in a note on the Awal project:

“To make the internet understand and speak the Amazigh language, the project focuses on collecting written and oral data from Amazigh speakers, i.e. fragments of text and voice recordings, which are then systematized and used to create applications and digital tools such as automatic translators and virtual assistants. Through Awal’s website, contributors can make two types of contributions: translating sentences to or from Amazigh in their own words, or reading aloud the sentences that appear on screen and recording them from the Common Voice platform.”

Awal also incorporates gamification logic to encourage popular participation:

“To make contributions, participants will have to create an account and earn points with their contributions and validations. Contributions will then be shared with open licences to promote open source language technology in Amazigh. The participation ranking on the website will update the number of points accumulated by the most active contributors in the community.”
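The points-and-ranking logic described in the quote can be illustrated with a minimal Python sketch. It is not Awal's actual implementation: the point values, the action names and the contributor names are all hypothetical, since the project note does not say how many points each contribution or validation earns.

```python
from collections import Counter

# Hypothetical point values: the project note does not state how many
# points a contribution or a validation is worth.
POINTS = {"translation": 2, "recording": 2, "validation": 1}


def build_ranking(events, top_n=3):
    """Return the top_n (user, points) pairs, highest score first.

    events is an iterable of (user, action) tuples, where action is
    one of the keys of POINTS.
    """
    totals = Counter()
    for user, action in events:
        totals[user] += POINTS[action]
    # Sort by points (descending), then by name so ties have a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Made-up activity log, for illustration only.
events = [
    ("tilelli", "translation"),
    ("yuba", "validation"),
    ("tilelli", "recording"),
    ("yuba", "translation"),
    ("massin", "validation"),
]
ranking = build_ranking(events)
for position, (user, points) in enumerate(ranking, start=1):
    print(position, user, points)
```

Validations earn fewer points than contributions here only as an example of how a site might weight the two kinds of activity. The real weighting is not given in the note.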

The mass mobilization of Amazigh speakers and their contribution of linguistic knowledge are key to the initiative’s success. Awal is a pioneering project, because to date no similar project has addressed the linguistic needs of the Amazigh. However, it is more than that: it is a popular and participatory project that can only succeed to the extent that the Amazigh-speaking world, and especially the Catalan Amazighs, become involved.

Contributions to the Awal website can be made in different languages. In the image, the Amazigh version, written in the Tifinagh alphabet.

A linguistic marathon

For this reason, CIEMEN, the Casa Amaziga de Catalunya and Col·lectivaT organized the first linguistic marathon ever recorded for the Amazigh language on 17 February, a significant date halfway between Amazigh New Year and International Mother Language Day. The aim was to collect linguistic data, yes, but also to weave community ties and strengthen them. Language is fundamental to the Amazigh people’s identity. Connection and community are linked realities. Ghizlan Baryala, one of the Catalan-Amazigh activists working on the Awal project, explains:

“For the Amazigh community, it is vital to talk about the importance of language. Language is identity, and even more so in our case, since almost everything that is known, and we know about us, the Amazighs, has been transmitted orally. This project, Awal, materializes from a real need to connect us with a language that was considered lost a long time ago.”

Linguist Farida Boudichat, another Catalan-Amazigh activist working on the Awal project, shares a similar perspective:

“Language is part of identity and has substantial weight in culture maintenance, it is a fundamental pillar. Moreover, digitizing it will make it last and prevent it from being blown away by the wind. This is because it is primarily an oral language. It is also pertinent to highlight that digitization provides tools that will allow members of the Amazigh community to communicate both among themselves and with the world in their own language, which gives them an enhanced degree of linguistic and cultural autonomy.”

Participation in a language project like this has both collective and personal dimensions. It touches on family roots and on each person’s links to their community and environment. Farida Boudichat says:

“For me, as a linguist and Amazigh, it is an honour and a joy to be able to merge the languages I speak and the cultures that are part of me in the same project: I feel that it really represents me. I also feel that I am contributing to fulfilling the desire and long-term struggle of a whole community: to preserve, maintain and expand the language. I am very proud.”

Ghizlan Baryala—who as a child, although her family is Amazigh, was not taught the language at home—stresses the importance of her personal journey of reconnecting with her ancestral heritage:

“For me, this work means reconciliation and recognition. For many years, I refused to know anything about Amazigh. I didn’t use it or understand it very well. Giving it the recognition it deserves reconciles me with my roots. Also, with Awal, I am helping to give it a role in the digital environment, and this is very satisfying.”

Apart from working with Awal, Ghizlan Baryala channels much of her effort to raise the visibility of Amazigh and her people through her Instagram account @amazightalks, which has more than 27,000 followers. If you want to know her story and understand Catalan, you can listen to the episode of CIEMEN’s podcast Nexes dedicated to her.

A thousand sentences translated

The Awal Linguistic Marathon closed with a dozen people participating and, by the time it finished, a thousand sentences had been translated. It is only the beginning of a project that will last for a long time: for the automatic translator to work, at least tens of thousands of translated sentence pairs will be needed to train the machine translation model. This task is in the hands of Alp Öktem, a computational linguist and partner of Col·lectivaT who has experience with other minoritised languages. Nationalia reported on his Judeo-Spanish work two years ago.

Two participants in the Awal Linguistic Marathon prepare their translations.

The challenge is enormous. On the one hand, standard Amazigh, a model that could be used for this job, is not very developed in practice. On the other hand, Amazigh is a language made up of linguistic varieties quite distant from each other, not only in lexicon but also in grammar. This makes it difficult for speakers to understand each other. Even so, Awal’s promoters and volunteers are confident that the project will at least help strengthen community links, make more linguistic data available to the Amazighsphere, and bring us a little closer to the realization of digital tools that will make the internet speak Amazigh.