Empowering inclusive AI through community-driven language data collection.
Government of the Democratic Republic of the Congo, Government of France, University of Mbandaka, Protestant University in the Congo, Centre d'Innovation de Lubumbashi (CINOLU)
Today's artificial intelligence (AI) systems are built predominantly on a handful of high-resource languages such as English. For the estimated 1.2 billion people around the world who speak under-represented and indigenous languages, this limits access to reliable digital tools, online services, and AI-enabled innovations. These systems often deliver poor and unreliable outputs for many of these languages, dialects and cultural contexts, placing vulnerable communities at greater risk.
In the Democratic Republic of the Congo (DRC), more than 200 languages are spoken, and gaps in linguistic inclusion across digital systems carry direct consequences for human security. Lingala, spoken by more than 20 million people across the DRC and by more than 40 million people in Central Africa, remains inadequately supported in AI technologies and unavailable across critical digital platforms and public services. Without structured, community-validated, high-quality text and voice datasets, the integration of Lingala into speech recognition systems, translation tools, or voice-enabled public services remains limited, restricting access to information, protection services, and economic participation.
Voice-a-Thon is a community-driven language data collection platform developed as part of UNDP’s Local Language Accelerator Programme. It enables communities to collect, validate, and share high-quality Lingala voice data. These datasets can be used to build better speech recognition systems, improve machine translation and voice assistants, and localize important digital public services such as Sauti ya wa nyonge, a chatbot supporting people experiencing gender-based violence. The platform provides a simple, accessible digital interface, designed as an open and reusable tool, through which community members contribute voice recordings by reading short sentences in their native language and validate recordings made by others. The result is a verified and standardized linguistic corpus tailored to the Congolese context that can advance AI development, particularly for public service delivery. By combining community-based validation with oversight from local linguists and researchers, Voice-a-Thon helps ensure that AI systems trained on this data produce outputs that are both technically reliable and culturally relevant.
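The contribute-and-validate workflow described above can be illustrated with a minimal Python sketch. It is not the Voice-a-Thon implementation: the `Recording` class, the `review_status` function, the status labels, and the thresholds (at least three community votes, a two-thirds majority) are all hypothetical, chosen only to show how community votes and linguist oversight might be combined.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    """One contributed clip of a speaker reading a short prompt sentence (hypothetical model)."""
    sentence: str
    # Each vote is True if a community reviewer judged that the clip matches the sentence.
    votes: list[bool] = field(default_factory=list)


def review_status(rec: Recording, min_votes: int = 3, approve_ratio: float = 2 / 3) -> str:
    """Classify a recording from its community votes (assumed rules, not the platform's)."""
    if len(rec.votes) < min_votes:
        return "pending"  # not enough community reviews yet
    share = sum(rec.votes) / len(rec.votes)
    if share >= approve_ratio:
        return "community_approved"
    if share <= 1 - approve_ratio:
        return "rejected"
    # Reviewers disagree: escalate to a local linguist for a final decision.
    return "linguist_review"


clip = Recording(sentence="Mbote", votes=[True, True, False, False])
print(review_status(clip))
```

In this sketch, a clear majority settles the outcome at community level, and only split votes are escalated, which keeps linguist time for the contested cases.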
Voice-a-Thon enables speakers of under-represented languages to actively shape the AI technologies that affect their daily lives. Communities contribute voice data as active builders of digital tools for their own context, with particular benefits for women and youth. By strengthening voice-enabled systems, Voice-a-Thon improves inclusive access to public information, health services, and protection tools, helping to ensure that no one is left behind.
Voice-a-Thon represents the first large-scale, structured effort to build a Lingala voice corpus in the DRC. It mobilizes universities, innovation hubs, and local associations to preserve and digitize linguistic heritage. This effort helps to strengthen the country’s AI readiness as part of its national digital transformation strategy, laying the groundwork for the development and adoption of inclusive voice-enabled products and services. To date, Voice-a-Thon has produced a linguist-validated corpus of 1 million parallel sentences in Lingala and French. This dataset supports high-quality recordings and expands opportunities for integration into existing digital services.
Under Digital X 3.0, Voice-a-Thon plans to scale through enhanced user testing, improved data validation and cleaning pipelines, and expanded domain-specific data collection, particularly for health and human security. Technical improvements will further strengthen interoperability with other digital public goods, including a multilingual chatbot solution for the Sauti ya wa nyonge platform.
Voice-a-Thon also plans to expand beyond Lingala to additional Congolese languages, and potentially to other linguistically diverse regions in Central and West Africa. By building long-term partnerships with universities, government institutions, and technology partners, Voice-a-Thon aims to foster inclusive, sustainable and locally governed AI ecosystems.