publications
Publications by category, in reverse chronological order.
2023
- [ACL] BIG-C: a Multimodal Multi-Purpose Dataset for Bemba. Claytone Sikasote, Eunice Mukonde, Md Mahfuz Ibn Alam, and 1 more author. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023.
We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. While Bemba is the most populous language of Zambia, it exhibits a dearth of resources, which renders the development of language technologies or language processing research almost impossible. The dataset comprises multi-turn dialogues between Bemba speakers based on images, transcribed and translated into English. There are more than 92,000 utterances/sentences, amounting to more than 180 hours of audio data with corresponding transcriptions and English translations. We also provide baselines on speech recognition (ASR), machine translation (MT) and speech translation (ST) tasks, and sketch out other potential future multimodal uses of our dataset. We hope that by making the dataset available to the research community, this work will foster research and encourage collaboration across the language, speech, and vision communities, especially for languages outside the “traditionally” used high-resourced ones. All data and code are publicly available: [https://github.com/csikasote/bigc](https://github.com/csikasote/bigc).
- [Interspeech] Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages. Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, and 6 more authors. In Proceedings of INTERSPEECH, 2023.
This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. It contains two collections of datasets: unlabelled audio recordings of radio news and talk show programs (160 hours) and labelled data (over 80 hours) consisting of read speech recorded from text sourced from publicly available literature books. The dataset is created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches. To our knowledge, this is the first multilingual speech dataset created for Zambian languages. We exploit pretraining and cross-lingual transfer learning by finetuning the Wav2Vec2.0 large-scale multilingual pretrained model to build end-to-end (E2E) speech recognition models as our baselines. The dataset is released publicly under a Creative Commons BY-NC-ND 4.0 license and can be accessed through the project repository.
- [EMNLP] AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages. Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, and 41 more authors. In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems, which retrieve answer content from other languages while serving people in their native language, offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
2022
- [LREC] BembaSpeech: A Speech Recognition Corpus for the Bemba Language. Claytone Sikasote and Antonios Anastasopoulos. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022.
We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia. To assess its usefulness for training and testing ASR systems for Bemba, we explored different approaches: supervised pre-training (training from scratch), cross-lingual transfer learning from a monolingual English pre-trained model using DeepSpeech on a portion of the dataset, and fine-tuning large-scale self-supervised Wav2Vec2.0-based multilingual pre-trained models on the complete BembaSpeech corpus. In our experiments, the 1-billion-parameter XLS-R model gives the best results, achieving a word error rate (WER) of 32.91%. These results demonstrate that model capacity significantly improves performance and that multilingual pre-trained models transfer cross-lingual acoustic representations better than a monolingual pre-trained English model on BembaSpeech for Bemba ASR. Lastly, the results also show that the corpus can be used for building ASR systems for the Bemba language.
- [TACL] Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. Julia Kreutzer, Isaac Caswell, Lisa Wang, and 49 more authors. Transactions of the Association for Computational Linguistics, 2022.
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.
2019
- [JCS] Automated Fall Armyworm (Spodoptera frugiperda, J.E. Smith) Pheromone Trap Based on Machine Learning. Simon H. Chiwamba, Jackson Phiri, Phillip O.Y Nkunika, and 3 more authors. Journal of Computer Science, 2019.
Maize is the main food crop that meets the nutritional needs of both humans and livestock in the sub-Saharan African region. The maize crop has in the recent past been threatened by the fall armyworm (Spodoptera frugiperda, J.E. Smith), which has caused considerable maize yield losses in the region. Controlling this pest requires knowledge of the time, location and extent of infestation. In addition, the insect pest’s abundance and environmental conditions should be predicted as early as possible for integrated pest management to be effective. Consequently, a fall armyworm pheromone trap was deployed as a monitoring tool in the present study. Trap inspection is currently carried out manually every week. The purpose of this paper is to bring automation to the trap. We modify the trap and integrate Internet of Things technologies, which include a Raspberry Pi 3 Model B+ micro-computer, an Atmel 8-bit AVR microcontroller, a 3G cellular modem and various sensors powered by an off-grid solar photovoltaic system, to capture real-time fall armyworm moth images and environmental conditions and to provide real-time indications of pest occurrences. The environmental conditions include Global Positioning System coordinates, temperature, humidity, wind speed and direction. The captured images, together with the environmental conditions, are uploaded to a cloud server where each image is classified instantly using Google’s pre-trained InceptionV3 machine learning model. Intended users view the captured data, including prediction accuracy, via a web application. Once this smart technology is adopted, the labour-intensive task of monitoring will be reduced, while stakeholders will be provided with near real-time insight into the fall armyworm situation in the field, enabling proactive management of such a devastating pest.
2018
- [ZAPUC] A Survey on Face Detection and Recognition Techniques for Application in Educational Institutions. Leena Kumar, Claytone Sikasote, Jackson Phiri, and 1 more author. In Proceedings of the Zambia Association of Public Universities and Colleges, 2018.
Video surveillance systems continue to grow in importance and use. They use electronic equipment to monitor the behavior and activities of people. Consequently, video surveillance has emerged as a main component in ensuring public security at airports, hospitals, banks, government agencies, casinos and educational institutions, and it has great potential for enhancing security in educational institutions in particular. However, real-time detection and recognition of a human face from video sequences is a difficult task due to background variations and changes in facial expression and illumination intensity. The ability to automatically recognize faces in surveillance video is highly important in detecting an intruder or suspicious person. Face detection and recognition are the two main stages of the surveillance process. Facial recognition has gained a lot of significance in commercial, finance and security applications. Various face recognition techniques have been developed to improve the accuracy of face recognition in images. However, the existing techniques suffer from variations in illumination intensity and facial angle, low resolution and improper focus. This paper provides a survey of face detection and recognition techniques. The survey presents a comparative analysis of recent face detection and recognition techniques along with their merits, and discusses their applicability in the education sector. This information is very important in choosing which techniques would best be applied in educational institutions, taking into account the financial and technological constraints they operate under.