publications | Claytone Sikasote

2025

SACAIR
Investigating the Impact of Multilingual Pre-trained Speech Models on Gender Bias in ASR for Low Resource African Languages

Claytone Sikasote, Hussein Suleman, and Jan Buys

In Southern African Conference for Artificial Intelligence Research, 2025

While fine-tuning transformer-based pre-trained speech models improves speech recognition for low resource languages, the approach increases the risk of speaker attribute bias in the resulting target language automatic speech recognition (ASR) systems. This work investigates gender bias in two state-of-the-art pre-trained speech models, MMS and Whisper, fine-tuned for ASR on three African languages: Bemba, Nyanja, and Swahili. We fine-tune models on gender-specific as well as gender-balanced datasets, and estimate and compare gender bias across different settings. Our results show varying degrees of gender bias in the fine-tuned models, even with gender-balanced fine-tuning, suggesting influence from pre-trained models. Inconsistencies in gender-specific fine-tuning further confirm the transfer of bias from pre-trained models. Additionally, an ablation study shows no relationship between training data size and gender bias.
@inproceedings{sikasote-etal-2025-asr-gender-bias,
  author = {Sikasote, Claytone and Suleman, Hussein and Buys, Jan},
  editor = {Gerber, Aurona and Pillay, Anban W.},
  title = {Investigating the Impact of Multilingual Pre-trained Speech Models on Gender Bias in ASR for Low Resource African Languages},
  booktitle = {Southern African Conference for Artificial Intelligence Research},
  year = {2025},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {144--158},
  isbn = {978-3-032-11733-5},
  doi = {10.1007/978-3-032-11733-5_9},
  url = {https://doi.org/10.1007/978-3-032-11733-5_9},
  dimensions = {true},
  google_scholar_id = {nroGzMJTTpEC},
}

2023

ACL
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

Claytone Sikasote, Eunice Mukonde, Md Mahfuz Ibn Alam, and 1 more author

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. Although Bemba is the most widely spoken language in Zambia, it suffers from a dearth of resources, which renders the development of language technologies or language processing research almost impossible. The dataset comprises multi-turn dialogues between Bemba speakers based on images, transcribed and translated into English. There are more than 92,000 utterances/sentences, amounting to more than 180 hours of audio data with corresponding transcriptions and English translations. We also provide baselines on speech recognition (ASR), machine translation (MT) and speech translation (ST) tasks, and sketch out other potential future multimodal uses of our dataset. We hope that by making the dataset available to the research community, this work will foster research and encourage collaboration across the language, speech, and vision communities, especially for languages outside the “traditionally” used high-resourced ones. All data and code are publicly available: [https://github.com/csikasote/bigc](https://github.com/csikasote/bigc).
@inproceedings{sikasote-etal-2023-big,
  title = {BIG-C: a Multimodal Multi-Purpose Dataset for Bemba},
  author = {Sikasote, Claytone and Mukonde, Eunice and Alam, Md Mahfuz Ibn and Anastasopoulos, Antonios},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  volume = {1},
  pages = {2062--2078},
  year = {2023},
  publisher = {Association for Computational Linguistics},
  doi = {10.18653/v1/2023.acl-long.115},
  url = {https://aclanthology.org/2023.acl-long.115},
  dimensions = {true},
  google_scholar_id = {Y0pCki6q_DkC},
}

Interspeech
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages

Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, and 6 more authors

In Proceedings of INTERSPEECH, 2023

This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. It contains two collections of datasets: unlabelled audio recordings of radio news and talk show programs (160 hours) and labelled data (over 80 hours) consisting of read speech recorded from text sourced from publicly available literature. The dataset was created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches. To our knowledge, this is the first multilingual speech dataset created for Zambian languages. We exploit pretraining and cross-lingual transfer learning by fine-tuning the large-scale multilingual pretrained Wav2Vec2.0 model to build end-to-end (E2E) speech recognition models as our baselines. The dataset is released publicly under a Creative Commons BY-NC-ND 4.0 license and can be accessed through the project repository.
@inproceedings{sikasote23_interspeech,
  title = {Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages},
  author = {Sikasote, Claytone and Siaminwe, Kalinda and Mwape, Stanly and Zulu, Bangiwe and Phiri, Mofya and Phiri, Martin and Zulu, David and Nyirenda, Mayumbo and Anastasopoulos, Antonios},
  booktitle = {Proceedings of INTERSPEECH},
  pages = {3984--3988},
  year = {2023},
  publisher = {INTERSPEECH},
  doi = {10.21437/Interspeech.2023-1979},
  url = {https://www.isca-speech.org/archive/interspeech_2023/sikasote23_interspeech.html},
  dimensions = {true},
  google_scholar_id = {Oj-SLI0tZTcC},
}

EMNLP

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, and 41 more authors

In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems, which retrieve answer content from other languages while serving people in their native language, offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

@inproceedings{ogundepo-etal-2023-cross,
  title = {AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages},
  author = {Ogundepo, Odunayo and Gwadabe, Tajuddeen and Rivera, Clara and Clark, Jonathan H and Ruder, Sebastian and Adelani, David and Dossou, Bonaventure and Diop, Abdou and Sikasote, Claytone and Hacheme, Gilles and Buzaaba, Happy and Ezeani, Ignatius and Mabuya, Rooweither and Osei, Salomey and Emezue, Chris and Kahira, Albert and Muhammad, Shamsuddeen and Oladipo, Akintunde and Owodunni, Abraham and Tonja, Atnafu and Shode, Iyanuoluwa and Asai, Akari and Aremu, Anuoluwapo and Awokoya, Ayodele and Opoku, Bernard and Chukwuneke, Chiamaka and Mwase, Christine and Siro, Clemencia and Arthur, Stephen and Ajayi, Tunde and Otiende, Verrah and Rubungo, Andre and Sinkala, Boyd and Ajisafe, Daniel and Onwuegbuzia, Emeka and Lawan, Falalu and Ahmad, Ibrahim and Alabi, Jesujoba and Mbonu, Chinedu and Adeyemi, Mofetoluwa and Phiri, Mofya and Ahia, Orevaoghene and Iro, Ruqayya and Adhiambo, Sonia},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages = {14957--14972},
  year = {2023},
  publisher = {Association for Computational Linguistics},
  doi = {10.18653/v1/2023.findings-emnlp.997},
  url = {https://aclanthology.org/2023.findings-emnlp.997/},
  dimensions = {true},
  google_scholar_id = {Tyk-4Ss8FVUC},
}

2022

LREC
BembaSpeech: A Speech Recognition Corpus for the Bemba Language

Claytone Sikasote and Antonios Anastasopoulos

In Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia. To assess its usefulness for training and testing ASR systems for Bemba, we explored different approaches: supervised pre-training (training from scratch), cross-lingual transfer learning from a monolingual English pre-trained model using DeepSpeech on a portion of the dataset, and fine-tuning large-scale self-supervised Wav2Vec2.0-based multilingual pre-trained models on the complete BembaSpeech corpus. In our experiments, the 1-billion-parameter XLS-R model gives the best results, achieving a word error rate (WER) of 32.91%. These results demonstrate that model capacity significantly improves performance and that multilingual pre-trained models transfer cross-lingual acoustic representations better than the monolingual pre-trained English model on BembaSpeech for Bemba ASR. Lastly, the results also show that the corpus can be used for building ASR systems for the Bemba language.
@inproceedings{sikasote-anastasopoulos-2022-bembaspeech,
  title = {BembaSpeech: A Speech Recognition Corpus for the Bemba Language},
  author = {Sikasote, Claytone and Anastasopoulos, Antonios},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  pages = {7277--7283},
  year = {2022},
  publisher = {European Language Resources Association},
  url = {https://aclanthology.org/2022.lrec-1.790},
  dimensions = {true},
  google_scholar_id = {u-x6o8ySG0sC},
}

TACL

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Julia Kreutzer, Isaac Caswell, Lisa Wang, and 49 more authors

Transactions of the Association for Computational Linguistics, 2022

With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.

@article{kreutzer-etal-2022-quality,
  title = {Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets},
  author = {Kreutzer, Julia and Caswell, Isaac and Wang, Lisa and Wahab, Ahsan and Esch, Daan van and Ulzii-Orshikh, Nasanbayar and Tapo, Allahsera and Subramani, Nishant and Sokolov, Artem and Sikasote, Claytone and Setyawan, Monang and Sarin, Supheakmungkol and Samb, Sokhar and Sagot, Benoît and Rivera, Clara and Rios, Annette and Papadimitriou, Isabel and Osei, Salomey and Suarez, Pedro Ortiz and Orife, Iroro and Ogueji, Kelechi and Rubungo, Andre Niyongabo and Nguyen, Toan Q and Müller, Mathias and Müller, André and Muhammad, Shamsuddeen Hassan and Muhammad, Nanda and Mnyakeni, Ayanda and Mirzakhalov, Jamshidbek and Matangira, Tapiwanashe and Leong, Colin and Lawson, Nze and Kudugunta, Sneha and Jernite, Yacine and Jenny, Mathias and Firat, Orhan and Dossou, Bonaventure FP and Dlamini, Sakhile and Silva, Nisansa de and Ballı, Sakine Çabuk and Biderman, Stella and Battisti, Alessia and Baruwa, Ahmed and Bapna, Ankur and Baljekar, Pallavi and Azime, Israel Abebe and Awokoya, Ayodele and Ataman, Duygu and Ahia, Orevaoghene and Ahia, Oghenefego and Agrawal, Sweta and Adeyemi, Mofetoluwa},
  journal = {Transactions of the Association for Computational Linguistics},
  volume = {10},
  pages = {50--72},
  year = {2022},
  publisher = {MIT Press},
  doi = {10.1162/tacl_a_00447},
  url = {https://aclanthology.org/2022.tacl-1.4},
  dimensions = {true},
  google_scholar_id = {qjMakFHDy7sC},
}

2019

JCS
Automated Fall Armyworm (Spodoptera frugiperda, J.E. Smith) Pheromone Trap Based on Machine Learning

Simon H. Chiwamba, Jackson Phiri, Phillip O.Y Nkunika, and 3 more authors

Journal of Computer Science, 2019

Maize is the main food crop that meets the nutritional needs of both humans and livestock in the sub-Saharan African region. The maize crop has in the recent past been threatened by the fall armyworm (Spodoptera frugiperda, J.E. Smith), which has caused considerable maize yield losses in the region. Controlling this pest requires knowledge of the time, location and extent of infestation. In addition, the insect pest’s abundance and environmental conditions should be predicted as early as possible for integrated pest management to be effective. Consequently, a fall armyworm pheromone trap was deployed as a monitoring tool in the present study. The trap inspection is currently carried out manually every week. The purpose of this paper is to bring automation to the trap. We modify the trap and integrate Internet of Things technologies, which include a Raspberry Pi 3 Model B+ micro-computer, an Atmel 8-bit AVR microcontroller, a 3G cellular modem and various sensors powered by an off-grid solar photovoltaic system, to capture real-time fall armyworm moth images and environmental conditions and to provide real-time indications of pest occurrences. The environmental conditions include Global Positioning System coordinates, temperature, humidity, wind speed and direction. The captured images, together with the environmental conditions, are uploaded to a cloud server where each image is classified instantly using Google’s pre-trained InceptionV3 machine learning model. Intended users view captured data, including prediction accuracy, via a web application. Once this smart technology is adopted, the labour-intensive task of monitoring will be reduced, while stakeholders will gain near real-time insight into the fall armyworm (FAW) situation in the field, enabling proactive management of such a devastating pest.
@article{simon-chiwamba-2018,
  title = {Automated Fall Armyworm (Spodoptera frugiperda, J.E. Smith) Pheromone Trap Based on Machine Learning},
  author = {Chiwamba, Simon H. and Phiri, Jackson and Nkunika, Phillip O.Y and Sikasote, Claytone and Kabemba, Monde M. and Moonga, Miyanda N},
  pages = {1759--1779},
  volume = {12},
  journal = {Journal of Computer Science},
  year = {2019},
  publisher = {Journal of Computer Science},
  doi = {10.3844/jcssp.2019.1759.1779},
  url = {https://thescipub.com/abstract/10.3844/jcssp.2019.1759.1779},
  dimensions = {true},
  google_scholar_id = {d1gkVwhDpl0C},
}

2018

ZAPUC
A Survey on Face Detection and Recognition Techniques for Application in Educational Institutions

Leena Kumar, Claytone Sikasote, Jackson Phiri, and 1 more author

In Proceedings of the Zambia Association of Public Universities and Colleges, 2018

Video surveillance systems continue to grow in importance and use. They monitor people's behavior and activities using electronic equipment. Consequently, video surveillance has emerged as a main component in ensuring public security at airports, hospitals, banks, government agencies, casinos and educational institutions. Such systems therefore have great potential for enhancing security in educational institutions. However, real-time detection and recognition of a human face from video sequences is a difficult task due to background variations and changes in facial expression and illumination intensity. The ability to automatically recognize faces in surveillance video is highly important in detecting an intruder or suspicious person. Face detection and recognition are the two main stages of the surveillance process. Facial recognition has gained considerable significance in commercial, finance and security applications. Various face recognition techniques have been developed to improve the accuracy of face recognition in images. However, the existing techniques suffer from variations in illumination intensity and facial angle, low resolution, and improper focus. This paper provides a survey of face detection and recognition techniques. The survey presents a comparative analysis of recent face detection and recognition techniques along with their merits, and discusses their applicability in the education sector. This information is important in choosing which techniques would best be applied in educational institutions, taking into consideration the financial and technological constraints they operate under.
@inproceedings{leena-kumar-2018,
  title = {A Survey on Face Detection and Recognition Techniques for Application in Educational Institutions},
  author = {Kumar, Leena and Sikasote, Claytone and Phiri, Jackson and Nyirenda, Mayumbo},
  booktitle = {Proceedings of the Zambia Association of Public Universities and Colleges},
  year = {2018},
  publisher = {ZAPUC},
  url = {https://www.researchgate.net/publication/329417179_A_Survey_on_Face_Detection_and_Recognition_Techniques_for_Application_in_Educational_Institutions},
  dimensions = {true},
  google_scholar_id = {u5HHmVD_uO8C},
}