Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa

Source: arXiv:2605.15376 · Published 2026-05-14 · By George Boateng, Evans Atompoya, Philemon Badu, Samuel John, Samuel Ansah, Patrick Agyeman-Budu et al.

TL;DR

The paper addresses the critical challenge of limited access to personalized science education support in sub-Saharan Africa caused by high student-teacher ratios and teacher shortages. It presents Adesua, a WhatsApp-based AI teaching assistant designed for Junior High School (JHS) and Senior High School (SHS) students in West Africa. Building on the previous Kwame for Science platform, Adesua enhances accessibility by leveraging WhatsApp's ubiquity and low data usage, adding a retrieval-augmented generation (RAG) question-answering system combined with automated, curriculum-aligned assessments and detailed feedback. The knowledge base integrates 33 years of national exam questions and curated textbook content, with answers verified by experts through a human-in-the-loop process.

A 6-month feasibility study in Ghana with 56 active users showed promising engagement and a high perceived helpfulness (93.75% helpfulness score from 16 raters) of AI-generated answers, albeit with a small sample size limiting generalizability. The system supports natural language questions, interactive multiple-choice tests (timed and untimed), and comprehensive performance reports delivered via WhatsApp, demonstrating a potentially scalable, low-cost solution for personalized formative assessment in resource-constrained educational settings. The paper points towards further evaluation with larger cohorts and improved features, such as multimodal input and AI tutoring components.

Key findings

Deployment over 6 months in Ghana included 56 active users (38 students, 18 parents) out of 107 signups.
44 assessment attempts recorded from 19 unique users; 45.5% completed, with higher completion for shorter custom quizzes vs 40-60 question past-paper exams.
16 users submitted 48 queries with a 93.75% AI answer helpfulness score, indicating high perceived utility despite the small sample (n=16).
Median end-to-end latency for question answering was approximately 1.8 ms, supporting near real-time interaction.
All AI-generated answers to 33 years of national exam questions were reviewed and corrected by an expert, creating a fully verified ground-truth dataset.
Custom NLP-based topic matching enabled retrieval of relevant exam and textbook passages with a relevance threshold of 0.6 using cosine similarity.
The conversational system supports multi-turn dialogues with contextual awareness for follow-up questions and clarifications.
Assessment results delivered as detailed PDFs through WhatsApp, including question-by-question feedback, enabling retrospective review.

Threat model

The adversary model is minimal, assuming legitimate student users seeking educational support; there is no consideration of malicious actors manipulating the system or generating adversarial inputs. The system focuses on truthful, curriculum-aligned responses and does not address targeted attacks or misinformation hazards.

Methodology — deep read

The authors developed Adesua as a WhatsApp chatbot for science education targeting JHS and SHS students in West Africa. The threat model assumes a typical student user seeking personalized learning support; adversarial threats like misinformation are not the focus.

The data powering Adesua consists of a curated knowledge base including 33 years of national exam questions spanning BECE (JHS) and WASSCE (SHS), combined with textbook content. All exam questions were answered by GPT-4, and each answer was verified and corrected by an expert reviewer, resulting in a fully human-validated dataset serving as ground truth.

The core architecture is a retrieval-augmented generation (RAG) pipeline: first, the user question is embedded using all-mpnet-base-v2 sentence embeddings; then two ElasticSearch indices (one for curated textbooks, one for exam questions by education level) are queried in parallel with cosine similarity, retrieving the top 5 relevant textbook passages and 5 related exam Q&A pairs exceeding a relevance threshold (0.6). These retrieved text snippets are assembled into a structured prompt context.

The context and original question are passed to GPT-4 via Azure OpenAI service with a curriculum-specific system prompt, instructing concise, curriculum-aligned answers formatted for WhatsApp, with output capped at 1,024 tokens. GPT-4 is configured with temperature 1.0 and top-p 1.0. The prompt mandates grounding in the retrieved context to minimize hallucinations.

The system maintains conversation state for multi-turn queries, enabling follow-up clarifications and feedback collection (user helpfulness ratings).

For assessments, the platform supports premade quizzes (past national exams by year and subject) and custom quizzes (topic-based dynamically generated question sets). Students choose timed or untimed modes. The system uses NLP to match natural language quiz requests to stored quizzes. Quizzes deliver one multiple-choice question at a time, recording answers and enforcing timers for timed quizzes. On completion or exit, assessments are auto-graded and detailed feedback reports (PDFs with questions, answers, explanations) are delivered via WhatsApp.

User onboarding collects consent and profile info (student/parent/teacher, school, education level) with error handling and global exit commands. Interaction flows guide users through question asking, tests, and results retrieval.

The feasibility study ran for 6 months in Ghana with recruitment via social media and teacher organizations. Of 107 signups, 56 users continued past onboarding. Usage logs tracked 44 assessment attempts and 48 questions from 16 users. Helpfulness ratings and query volumes were analyzed quantitatively.

Limitations include a small user sample, low engagement beyond initial curiosity users, absence of a randomized controlled trial or learning outcomes measurement, and no adversarial robustness testing. Code and dataset details on public availability were not specified, though reliance on Azure OpenAI indicates some external dependencies.

One concrete example: a JHS student submits a science question through WhatsApp; the system embeds the query, retrieves relevant textbook passages and exam Q&As surpassing the similarity threshold; constructs a prompt including those contexts; queries GPT-4 to generate a concise, curriculum-aligned explanation; returns the answer on WhatsApp; the student can then rate the answer's helpfulness or ask a follow-up question, maintaining contextual awareness for a multi-turn dialogue. The process completes with logs stored for future analysis.

Overall, the methodology thoroughly integrates curated curriculum-aligned content with generative AI synthesis to enable accessible, personalized Q&A and assessment via a low-bandwidth messaging platform.

Technical innovations

Integration of retrieval-augmented generation (RAG) using elasticsearch-based passage retrieval combined with GPT-4 for curriculum-aligned science question answering in West African contexts.
Human-in-the-loop expert verification of GPT-4 generated answers for 33 years of exam questions, creating a high-quality, locally relevant ground-truth dataset.
Deployment of a comprehensive assessment system embedded in WhatsApp supporting premade and dynamic custom quizzes with automated grading and detailed feedback delivery as PDFs.
Stateful conversational interface with multi-turn dialogue support and contextual error handling tailored for low-bandwidth WhatsApp interactions.
Curriculum-specific prompting strategies that enforce grounded, concise AI responses formatted for WhatsApp's limited text formatting capabilities.

Datasets

West African Junior High School (BECE) national exam questions — 33 years (1990-2023) — curated and expert-verified answers
West African Senior High School (WASSCE) national exam questions — 33 years — curated and expert-verified answers
Open source science textbook passages — unspecified size — curated for relevance to West African curricula

Baselines vs proposed

Kwame for Science (prior system): Helpfulness score = 87.2% vs Adesua: 93.75% (n=16 ratings) for AI-generated answers
Kwame for Science passage retrieval latency not specified vs Adesua: median end-to-end QA latency ~1.8 ms

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.15376.

Fig 1: Screenshots of Adesua

Fig 2: Adesua Landing Page

Limitations

Small user sample size (56 active users, even fewer providing ratings and assessments) limits statistical power and generalizability.
No rigorous experimental evaluation of learning outcomes or student performance improvement was conducted.
Engagement was limited, with many users abandoning assessments early; caused partly by unstructured deployment lacking formal school partnerships.
No adversarial robustness or misinformation evaluation of AI model responses was performed.
The system currently supports only multiple-choice assessments with no open-ended question grading.
Voice input/output, local language translation, and multimodal question answering are planned future features but currently absent.

Open questions / follow-ons

How does Adesua’s AI-driven personalized support impact measurable student learning outcomes compared to traditional methods?
How can multimodal inputs (images, voice) and local language support be effectively integrated to improve accessibility and engagement?
What strategies could improve sustained user engagement and completion rates in low-resource settings?
How robust are the generative responses to ambiguous or misleading queries, and can safeguards against hallucinated or harmful answers be enhanced?

Why it matters for bot defense

While Adesua is primarily an AI educational tool, its design and deployment provide insights relevant to bot-defense and CAPTCHA engineering in messaging platforms. The study highlights how conversational AI bots can be integrated into popular messaging apps like WhatsApp with low-latency, stateful multi-turn dialogues that require careful input validation and error handling to maintain usability under resource constraints.

From a security and bot-defense perspective, understanding how such bots authenticate users, maintain session state, and manage input ambiguity could inform mechanisms to differentiate between genuine students and automated or malicious actors. Additionally, the focus on grounding AI outputs in verified sources to minimize hallucinations parallels verification challenges in bot-driven systems, emphasizing the importance of retrieval-augmented generation and human validation in high-stakes conversational AI deployments.

Cite

bibtex

@article{arxiv2605_15376,
  title={ Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa },
  author={ George Boateng and Evans Atompoya and Philemon Badu and Samuel John and Samuel Ansah and Patrick Agyeman-Budu and Victor Wumbor-Apin Kumbol },
  journal={arXiv preprint arXiv:2605.15376},
  year={ 2026 },
  url={https://arxiv.org/abs/2605.15376}
}

Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa ​

TL;DR ​

Key findings ​

Threat model ​

Methodology — deep read ​

Technical innovations ​

Datasets ​

Baselines vs proposed ​

Figures from the paper ​

Limitations ​

Open questions / follow-ons ​

Why it matters for bot defense ​

Cite ​

Read the full paper ​