Problem: The project aimed to develop a medical bot called DocBot that provides users with accurate and concise information from a medical encyclopedia.
Context: The data for DocBot was extracted from a collection of medical encyclopedia PDFs. The challenge was to create a system that could understand user queries and retrieve relevant information from the medical documents. This project is part of a series of learning projects on language models and conversational AI and on applying them to real-world problems.
As part of this project, I developed an ingestion script to process PDF documents from the Encyclopedia of Medicine. The script, stored in book.py, utilized the langchain library for document loading, text splitting, and creating vector embeddings. The resulting vector database, powered by the FAISS library, was then saved locally.
In the model.py file, I implemented a conversational QA system using the Llama 2 model. Llama 2, a state-of-the-art language model, was employed to understand and respond to user queries. The retrieval QA chain, exposed through a chainlit chat interface, relied on prompt engineering techniques for effective question answering.
Used the PyPDFLoader and DirectoryLoader from langchain for loading the PDF documents. Used Hugging Face embeddings (sentence-transformers/all-MiniLM-L6-v2) to create meaningful document embeddings, which were stored in a FAISS vector database for efficient retrieval.
A snippet of code showing the document loading, embedding, and storage:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

DATA_PATH = 'data/'
DB_FAISS_PATH = 'vectorstore/db_faiss'

# Create vector database
def create_vector_db():
    # Load every PDF in the data directory
    loader = DirectoryLoader(DATA_PATH,
                             glob='*.pdf',
                             loader_cls=PyPDFLoader)
    documents = loader.load()

    # Split the documents into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,
                                                   chunk_overlap=50)
    texts = text_splitter.split_documents(documents)

    # Embed the chunks and persist them in a local FAISS index
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
                                       model_kwargs={'device': 'cpu'})
    db = FAISS.from_documents(texts, embeddings)
    db.save_local(DB_FAISS_PATH)

if __name__ == "__main__":
    create_vector_db()
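Once the index is saved, it can be loaded back and queried without re-processing the PDFs. Below is a minimal sketch of such a lookup (the query string is only illustrative), assuming the same embedding model is used at load time:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

DB_FAISS_PATH = 'vectorstore/db_faiss'

# Reload the same embedding model that was used during ingestion
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
                                   model_kwargs={'device': 'cpu'})

# Load the persisted FAISS index and run a similarity search
db = FAISS.load_local(DB_FAISS_PATH, embeddings)
docs = db.similarity_search("What are the symptoms of anemia?", k=2)
for doc in docs:
    print(doc.page_content[:200])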
Integrated the Llama 2 model using CTransformers to handle user queries.
Prompt Engineering
Prompt engineering proved to be a vital aspect of enhancing the performance of the conversational QA system. The PromptTemplate class facilitated the formulation of effective prompts that guided the model in understanding the context and generating helpful answers. The code snippet below shows the custom_prompt_template used:
custom_prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: {context}
Question: {question}
Only return the helpful answer below and nothing else.
Helpful answer:
"""
from langchain.prompts import PromptTemplate

def set_custom_prompt():
    """
    Prompt template for QA retrieval for each vector store
    """
    prompt = PromptTemplate(template=custom_prompt_template,
                            input_variables=['context', 'question'])
    return prompt
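For context, here is a minimal sketch of how such a prompt can be attached to a retrieval QA chain, assuming llm is the CTransformers-loaded Llama 2 model and db is the FAISS store from the ingestion step (the function name, chain type, and k value are illustrative, not the exact code from model.py):
from langchain.chains import RetrievalQA

def build_qa_chain(llm, prompt, db):
    # 'stuff' concatenates the retrieved chunks into the {context} slot of the prompt
    return RetrievalQA.from_chain_type(llm=llm,
                                       chain_type='stuff',
                                       retriever=db.as_retriever(search_kwargs={'k': 2}),
                                       return_source_documents=True,
                                       chain_type_kwargs={'prompt': prompt})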
The Llama 2 Model
The Llama 2 model, specifically the llama-2-7b-chat.ggmlv3.q4_0.bin variant (a quantized version of Llama 2 that can run on a CPU), was crucial in powering the conversational QA system. The model was integrated into the retrieval QA chain, allowing it to draw on relevant information retrieved from the vector database created during the ingestion process. With a focus on medical knowledge, the model demonstrated its proficiency in handling diverse user queries.
The code snippet below shows how the quantized model was loaded. The temperature was set to 0.7 to allow somewhat more varied, creative responses: