Google Unveils WAXAL Speech Dataset, Expands AI Support for Hausa, Yoruba and Igbo
Google has launched WAXAL, a large-scale open speech dataset designed to improve artificial intelligence tools for African languages, with a particular focus on widely spoken Nigerian languages such as Hausa, Yoruba and Igbo.
Developed in partnership with African research institutions, the dataset is expected to significantly boost voice-based technologies for more than 100 million speakers who have long been excluded from digital tools due to limited availability of quality language data.
WAXAL contains speech data covering 21 Sub-Saharan African languages, but Google said the inclusion of Hausa, Yoruba and Igbo marks a major step toward addressing language gaps in Africa’s most populous country, Nigeria.
Despite their extensive use across West Africa and the diaspora, the three languages have remained underrepresented in global artificial intelligence systems.
The technology company noted that while voice assistants, speech-to-text applications and other AI-powered tools have become commonplace in many parts of the world, Africa’s more than 2,000 languages have largely been absent from such technologies, reinforcing the continent’s digital divide.
According to Google, the lack of reliable speech datasets has limited the ability of millions of Hausa, Yoruba and Igbo speakers to benefit from voice-enabled tools in education, healthcare delivery, commerce and everyday communication.
The WAXAL project was developed over a three-year period with funding from Google.
It includes approximately 1,250 hours of transcribed natural speech, alongside more than 20 hours of high-quality studio recordings designed to support realistic synthetic voices and advanced speech technologies.
Commenting on the initiative, the Head of Google Research Africa, Aisha Walcott-Bryant, said the dataset would provide a critical foundation for African-led innovation, particularly for developers working in indigenous languages.
“The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset enables students, researchers and entrepreneurs to build technology in their own languages, including Hausa, Yoruba and Igbo, and to reach communities that have historically been left out of digital innovation,” she said.
A key aspect of the project is its community-driven approach.
African universities and organisations played a central role in collecting and validating the speech data, working closely with Google researchers throughout the process.
Institutions involved in the project include Makerere University in Uganda, the University of Ghana and Digital Umuganda in Rwanda.
Google said the collaboration model ensures that the data reflects authentic accents, expressions and usage patterns across different communities.
















