
World Health Organization Knowledge Assistant (RAG enabled)

In a world full of medical guidelines, health policies and clinical protocols, finding the right information at the right time can be overwhelming. Healthcare professionals don’t need more noise — they need clarity, precision and trusted answers.

Imagine a healthcare environment where:

  • The system responds with clear, sourced guidance from official WHO documents.

  • You no longer scroll through dozens of PDFs or web pages to find the answer — it comes to you.

That’s exactly what the WHO Knowledge Assistant achieves.

Skills: LLM, RAG-retrieval, NLP, Prompt engineering, AI Deployment, API-free AI system, Python, Streamlit

📚 The Problem

Healthcare guidance is dispersed across hundreds of reports, manuals and policy briefs from the World Health Organization. Manually searching these documents is time-consuming and error-prone — especially when timely, accurate decisions matter most. Although this information is reliable and essential, it presents several challenges:

1. Information Overload: WHO reports are often hundreds of pages long, making it difficult for healthcare professionals or researchers to quickly locate relevant sections.

2. Time-Consuming Search: Manual searching through PDFs for specific answers slows down decision-making and research workflows.

3. Context Fragmentation: Relevant guidance is often distributed across multiple documents, requiring cross-referencing and interpretation.

4. Risk of Misinterpretation: Without structured assistance, users may overlook critical details or misunderstand complex policy language.

There is a clear need for a system that bridges the gap between authoritative documentation and practical, question-driven access.

🔍 The Solution

 

The WHO Knowledge Assistant addresses these challenges by turning static WHO documents into an intelligent, searchable knowledge base.

The system allows users to:

  • Ask healthcare-related questions

  • Retrieve answers grounded in WHO publications

  • See which documents the information came from

 

Instead of reading entire manuals, users interact with a conversational AI that extracts and summarizes relevant content directly from official WHO sources.

This ensures that:

  • Answers are evidence-based

  • Information remains traceable

  • The system avoids relying on uncontrolled internet data

⚙️ How It Works

 

The system uses a Retrieval-Augmented Generation (RAG) pipeline, combining document retrieval with language model reasoning.

1. Document Ingestion

WHO PDF documents are collected and processed. Each document is:

  • Converted into text

  • Split into smaller, meaningful chunks

  • Stored with metadata indicating the original source

This step transforms static reports into machine-readable knowledge units.
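The chunking step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Chunk` class, the `chunk_text` function, the word-based window and the sizes (200 words with a 40-word overlap) are all assumptions chosen for the example, and the PDF-to-text conversion is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # file name of the originating document (hypothetical field)
    index: int   # position of the chunk within that document


def chunk_text(text, source, max_words=200, overlap=40):
    """Split text into overlapping word windows, tagging each with its source."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for i, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(" ".join(window), source, i))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 500-word text standing in for extracted PDF content.
sample = " ".join(f"word{n}" for n in range(500))
chunks = chunk_text(sample, source="example_guideline.pdf")
print(len(chunks), chunks[0].source, chunks[-1].index)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval later on.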

2. Semantic Embedding & Indexing

Each text chunk is converted into a numerical embedding using a sentence-transformer model. These embeddings are stored in a vector database (ChromaDB), allowing the system to search by meaning rather than keywords.

This enables the assistant to find relevant passages even if the user’s question is phrased differently from the original document.
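To make the idea of an embedding index concrete, here is a small self-contained sketch. It deliberately swaps in a toy word-count vector (`toy_embed`) for the sentence-transformer model and a plain Python list for ChromaDB, so it only captures word overlap, not meaning. The function names, the id format and the placeholder passages are assumptions for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Toy stand-in for a sentence-transformer: counts of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks, embed=toy_embed):
    """Pair every chunk with its vector and keep the source metadata."""
    return [
        {"id": f"{c['source']}#{i}", "vector": embed(c["text"]),
         "text": c["text"], "source": c["source"]}
        for i, c in enumerate(chunks)
    ]


# Placeholder passages, not real WHO text.
chunks = [
    {"text": "Placeholder passage about washing hands with soap and water.", "source": "doc_a.pdf"},
    {"text": "Placeholder passage about storing vaccines in a cold chain.", "source": "doc_b.pdf"},
]
index = build_index(chunks)
question = toy_embed("When should hands be washed with soap?")
scores = {row["source"]: cosine(question, row["vector"]) for row in index}
print(scores)
```

With a real embedding model, `embed` would return a dense vector, and a question phrased with entirely different words could still score highly against the right passage.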

3. Query Understanding & Retrieval

When a user asks a question, the system:

  • Converts the question into an embedding

  • Searches the vector database for the most relevant document chunks

  • Collects the top matches as contextual evidence

This ensures the assistant focuses only on WHO-based material.
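The retrieval step can be illustrated with a short top-k search over stored vectors. This is a sketch under stated assumptions: the three-dimensional vectors are made up, the `retrieve` function and the row layout are hypothetical, and in the real system the vector database performs this search.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k stored chunks most similar to query_vec, best first."""
    scored = ((cosine(query_vec, row["vector"]), row) for row in index)
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [{"score": s, "text": r["text"], "source": r["source"]} for s, r in top]


# Made-up vectors and placeholder chunks for illustration only.
index = [
    {"vector": [0.9, 0.1, 0.0], "text": "placeholder chunk A", "source": "doc_a.pdf"},
    {"vector": [0.1, 0.9, 0.0], "text": "placeholder chunk B", "source": "doc_b.pdf"},
    {"vector": [0.0, 0.2, 0.9], "text": "placeholder chunk C", "source": "doc_c.pdf"},
]
hits = retrieve([1.0, 0.2, 0.0], index, k=2)
for hit in hits:
    print(hit["source"], hit["text"])
```

Carrying the `source` field alongside each hit is what later lets the assistant show which documents an answer came from.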

4. Answer Generation

A locally hosted language model (via Ollama) receives:

  • The user’s question

  • The retrieved WHO document excerpts

The model generates a clear, structured answer using only this provided context. If the answer cannot be found in the documents, the assistant states that the information is not available in the WHO knowledge base.

This reduces the risk of hallucinations and keeps answers grounded in the retrieved text.
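A grounded prompt of this kind might look like the sketch below. The prompt wording, the `NOT_FOUND` message, the function names and the model name `llama3` are assumptions, not the project's actual values. The `ask_ollama` helper assumes an Ollama server on its default local port and is defined but not called here.

```python
import json
import urllib.request

# Hypothetical refusal message; the project's exact wording is not known.
NOT_FOUND = "This information is not available in the WHO knowledge base."


def build_prompt(question, excerpts):
    """Combine the question and the retrieved excerpts into one restricted prompt."""
    context = "\n\n".join(
        f"[{i}] ({e['source']}) {e['text']}" for i, e in enumerate(excerpts, 1)
    )
    return (
        "Answer the question using only the WHO excerpts below.\n"
        f"If the excerpts do not contain the answer, reply exactly: {NOT_FOUND}\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def ask_ollama(prompt, model="llama3", url="http://localhost:11434/api/generate"):
    """Send the prompt to a local Ollama server (model name is an assumption)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


prompt = build_prompt(
    "placeholder question",
    [{"source": "doc_a.pdf", "text": "placeholder excerpt"}],
)
print(prompt)
```

Numbering the excerpts and naming their source inside the prompt makes it easier for the model to stay within the provided context and for the interface to list sources afterwards.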

🌟 Why This Stands Out

 

Unlike general-purpose AI chatbots that may draw on unverified information from across the internet, the WHO Knowledge Assistant is designed with domain restriction and reliability in mind.

🌍 Trusted Source Limitation: The assistant answers only from WHO documents, ensuring credibility and relevance in global health contexts.

📚 Source Transparency: Every answer can be traced back to the specific WHO document it was derived from, increasing user trust.

🔎 Meaning-Based Retrieval: Using embeddings allows the system to understand the intent behind a question rather than relying solely on keyword matches.

⚙️ Fully Local & API-Free: The system runs using local models and open-source tools, demonstrating a privacy-friendly and cost-efficient AI architecture.

💡 What I Learned

 

Building the WHO Knowledge Assistant helped me understand how AI systems can be made more reliable and domain-aware by combining language models with document retrieval.

 

I learned how Retrieval-Augmented Generation (RAG) improves accuracy by grounding responses in trusted external documents rather than relying only on a model’s internal knowledge. Working with WHO PDFs gave me hands-on experience processing unstructured data — extracting text, chunking information and preparing it for semantic search. I also explored how embeddings enable meaning-based retrieval, allowing the system to find relevant information even when questions are phrased differently from the source material. Designing prompts to restrict the model to WHO data taught me how to reduce hallucinations and build more trustworthy AI systems.

 

Most importantly, I gained experience building a complete end-to-end AI application — from data ingestion and vector databases to a working chatbot interface — and saw how AI can transform static documents into interactive knowledge tools.

⏭ Future Improvements

 

While the current system demonstrates a strong foundation for domain-specific AI assistance, several enhancements could make it even more powerful and user-friendly.

🔍 Highlighting Source Passages

Instead of only listing document names, the assistant could display the exact text snippets used to generate each answer. This would further improve transparency and trust.

📂 Dynamic Document Updates & Live Database

Adding an upload feature would allow new WHO documents to be ingested automatically, keeping the knowledge base current without manual reprocessing. A live database could go further: whenever a new document is added, it would be chunked, embedded and indexed automatically. No model retraining is involved, because a RAG system only needs its vector index updated.
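One possible way to keep the index current without reprocessing everything is to track a content hash per document and re-index only files that are new or changed. The sketch below assumes this approach; `find_documents_to_index` and the in-memory dictionaries are hypothetical stand-ins for an upload folder and a persisted hash store.

```python
import hashlib


def find_documents_to_index(documents, indexed_hashes):
    """Return names of new or changed documents and record their hashes.

    documents: mapping of file name to raw bytes.
    indexed_hashes: mapping of file name to the SHA-256 of the last indexed version.
    """
    to_index = []
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if indexed_hashes.get(name) != digest:
            to_index.append(name)
            indexed_hashes[name] = digest
    return to_index


seen = {}
docs = {"doc_a.pdf": b"version 1", "doc_b.pdf": b"version 1"}
first_run = find_documents_to_index(docs, seen)   # both files are new
second_run = find_documents_to_index(docs, seen)  # nothing changed
docs["doc_a.pdf"] = b"version 2"
third_run = find_documents_to_index(docs, seen)   # only the edited file
print(first_run, second_run, third_run)
```

Only the documents returned by this check would then go through chunking and embedding.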

💬 Conversational Memory

Incorporating memory across multiple questions would allow follow-up queries without repeating the full context. The assistant would then behave like a customized, private and confidential version of ChatGPT.
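A simple form of conversational memory is a rolling window of recent turns that gets prepended to the next prompt. The sketch below assumes that design; the `ConversationMemory` class and the three-turn limit are illustrative choices, not part of the current system.

```python
from collections import deque


class ConversationMemory:
    """Keep only the most recent question/answer pairs."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)  # older turns drop off automatically

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        """Render the remembered turns as text for the next prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


memory = ConversationMemory(max_turns=3)
for n in range(1, 5):
    memory.add(f"question {n}", f"answer {n}")
print(memory.as_context())
```

Bounding the window keeps the prompt short enough for a local model while still letting a follow-up such as "and for children?" resolve against the previous answer.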

🌐 Multi-Language Support

WHO publishes materials in multiple languages. Expanding the system to retrieve and answer across languages would make it more globally accessible.

📊 Confidence or Relevance Scores

Displaying a confidence indicator based on retrieval similarity could help users judge how strongly the answer is supported by the source documents.
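Such an indicator could be as simple as mapping the best retrieval similarity to a label. The thresholds below (0.75 and 0.5) are arbitrary assumptions that would need tuning against real queries, and `relevance_label` is a hypothetical helper.

```python
def relevance_label(similarities, high=0.75, low=0.5):
    """Map the best cosine similarity among retrieved chunks to a coarse label."""
    if not similarities:
        return "no support"
    best = max(similarities)
    if best >= high:
        return "high"
    if best >= low:
        return "medium"
    return "low"


print(relevance_label([0.82, 0.61, 0.40]))
print(relevance_label([0.55, 0.31]))
print(relevance_label([]))
```

A label like this reflects how closely the retrieved passages match the question, not whether the generated answer is correct, so it would be best presented as a relevance hint.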

🧠 Stronger Models

Future versions could integrate domain-tuned medical language models, which may offer higher reliability and better handling of medical terminology, to improve response quality.

🤝 Final Thoughts

 

The WHO Knowledge Assistant can serve as a prototype for intelligent document systems in many high-stakes domains.

Potential use cases include:

  • Supporting healthcare professionals who need quick access to global guidelines

  • Assisting researchers navigating complex public health documentation

  • Helping policy analysts interpret international health recommendations

  • Serving as an educational tool for students in public health and medicine

 

More broadly, this project illustrates how AI can transform large, complex document collections into interactive knowledge systems that support informed decision-making.

© 2024 by Riddhi Yogesh Kumavat.
