
    Mastering Psychology: Building an Expert RAG with Mistral-7B and LangChain

    March 26, 2024

    Welcome to the era of Large Language Models (LLMs), which have become indispensable digital companions, assisting us with tasks from coding to culinary adventures. However, let’s acknowledge that even the most capable beings have limitations. LLMs, despite their impressive capabilities, struggle with two significant challenges:

    1. Information Access: LLMs cannot readily access data outside their training set. Just imagine your favorite AI assistant unable to give you a helpful answer, and instead rambling on about how you could go and find the answer yourself.
    2. Hallucination Errors: It may sound surprising, but LLMs can sometimes produce responses that seem like they’ve wandered into the realm of imagination. These inaccuracies can lead to confusion and frustration.

    We’ve all experienced that moment when an LLM seems stumped, unable to generate a coherent response due to its restricted training data. It’s akin to having a friend who excels at general advice but struggles when asked for genuine help.

    For example, when you hit top LLMs with those deep questions, an assistant like ChatGPT (GPT-3.5) tends to spit out answers that sound like they’re from a self-help pamphlet. These models are great at regurgitating data patterns, but they lack the real human understanding needed to give personalized advice. So, if you are looking for genuine support, vanilla LLMs will be of little to no help.

    In the whirlwind world of AI, we’re pretty much at the mercy of whenever those new models drop, packed with all that fancy expanded training data. But hey, what if we’re in the mood for an assistant who dishes out advice like a real-deal psychologist? And let’s not forget about diving into those deep questions. Plus, wouldn’t it be neat if our trusty LLM had a bit more intel to nail those responses with pinpoint accuracy?

    Now, fine-tuning seems like a possible solution, but it is not all rainbows and unicorns; it has its own set of risks and complexities:

    1. Model Drift: As we continuously fine-tune the model with fresh data, there’s a risk of it veering off course from its original performance, potentially yielding unexpected outcomes.
    2. Cost and Complexity: Beyond the technical hurdles, the iterative fine-tuning process demands significant computational resources and expertise, translating into a hefty investment.

    This is where RAG comes to the rescue. In this series, we’ll embark on an exploration of RAGs, covering:

    1. Understanding RAGs: What exactly are Retrieval Augmented Generation models, and how do they operate?
    2. Implementation with Mistral 7B: Step-by-step guidance on leveraging Mistral 7B through platforms like HuggingFace and LangChain to construct your RAG.
    3. Real-world Applications: Witnessing a small-scale RAG in action through practical examples, including creating a Psychology assistant capable of delving into deep questions akin to a human psychologist.

    Before we delve into the technical intricacies, let’s ensure we’re all on the same page with some key concepts:

    Text embeddings

    1. Text embeddings are like word magic in a computer’s brain. They turn words, phrases, or even whole sentences into numbers, helping computers understand what they mean. Think of it like turning words into secret codes that only computers can read. We’ve got old-school methods like Word2Vec and GloVe, and fancy new ones like BERT, all working their magic to make sense of text.
    2. Imagine this: you’ve got ‘king’ - ‘man’ + ‘woman’. That’s like trying to find the ‘queen’ in the word world. Surprisingly, this math trick often lands us right on target, showing how good text embeddings are at understanding word relationships.
    3. Text embeddings have come a long way. We started with basic methods, but now we’ve got transformers like BERT shaking things up. They’re like the cool kids on the block, using attention to understand words in context. This upgrade means they can handle bigger chunks of text and understand them better.
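
    To make the ‘king’ - ‘man’ + ‘woman’ trick concrete, here is a minimal sketch. The three-dimensional vectors are hand-made for illustration (real embeddings from Word2Vec, GloVe, or BERT have hundreds of dimensions and are learned, not typed in), and the `analogy` helper is a hypothetical name, not part of any library.

```python
import math

# Hand-made toy vectors (assumption: dimensions loosely mean royalty, male, female)
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

    With these toy numbers the closest remaining word is ‘queen’; with real embeddings the trick works often, but not always.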

    OpenAI and Cohere have both published great resources for understanding text embeddings better.

    Vector databases

    So, you know how dealing with loads of complicated data can be a real headache? Traditional databases sometimes just can’t cut it with all that complexity. But then, along come vector databases to save the day! They’re like the superheroes of the data world, handling all those tricky high-dimensional data points with ease.

    Picture this: your data points are all laid out on a grid, each one represented by its unique attributes. It’s kind of like organizing a bunch of different fruits based on their tastes rather than their colors or shapes. The only difference is that these attributes of text are represented using text embeddings (vectors).

    Image credits: KDnuggets

    Vector DBs use cool tricks like cosine similarity to figure out how similar different data points are, just like when Google shows you search results based on how closely they match your query. It’s all about making sense of the data chaos and helping you find what you need, fast!
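
    Sticking with the fruit picture, here is a rough sketch of what a cosine-similarity lookup does under the hood. It is a brute-force search over a tiny in-memory dictionary; real vector databases use approximate indexes to stay fast at scale. The taste vectors and the `top_k` helper are made up for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query, index, k=2):
    """Return the k labels whose stored unit vectors are most similar to the query."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), label) for label, vec in index.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]


# Made-up taste vectors: (sweet, sour, salty)
raw = {
    "lemon":   [0.1, 0.9, 0.0],
    "lime":    [0.2, 0.8, 0.0],
    "mango":   [0.9, 0.2, 0.0],
    "pretzel": [0.1, 0.0, 0.9],
}

# Normalize once at insert time, as many vector stores do
index = {label: normalize(vec) for label, vec in raw.items()}

nearest = top_k([0.0, 1.0, 0.0], index, k=2)  # query: "something purely sour"
print(nearest)
```

    Normalizing the vectors once when they are stored means each query only needs dot products, which is one reason cosine similarity is such a common choice.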

    For curious souls, there are some great sources out there to learn more about vector databases.

    RAGs Unveiled

    But what exactly is a RAG? Retrieval Augmented Generation models serve as a bridge, equipping Large Language Models (LLMs) with external data access to generate responses enriched with contextual insights. Whether drawing from recent news, lecture transcripts, or, in our case, top-tier psychology literature, RAGs empower LLMs to respond with newfound depth and relevance.

    How do they function, you ask? Picture RAG as an LLM outfitted with a vector search mechanism. Here’s a simplified breakdown of the process:

    1. Database Setup: Populate a vector database with encoded documents.
    2. Query Encoding: Transform the input query into a vector using a sentence transformer.
    3. Relevant Context Retrieval: Retrieve pertinent context from the vector database based on the query.
    4. Prompting the LLM: Armed with the retrieved context and original query, guide the LLM to generate a response infused with contextual depth.

    Figure: High-level RAG architecture
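
    The four steps can be sketched end to end in plain Python. This is only a toy: the `embed` function is a bag-of-words stand-in for a real sentence transformer, the three documents are invented one-liners, and the last step just builds the prompt string rather than calling an LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence transformer: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented snippets standing in for the psychology literature
documents = [
    "Active listening helps people feel valued in a conversation.",
    "Loss aversion makes losses feel larger than equal gains.",
    "Defaults strongly influence which option people choose.",
]

# 1. Database setup: store each document next to its vector
index = [(embed(doc), doc) for doc in documents]


def retrieve(query, k=1):
    """Steps 2 and 3: encode the query, then return the k most similar documents."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(query, k=1):
    """Step 4: combine retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query, k))
    return f"[INST] Use this context:\n{context}\n\nQuestion: {query} [/INST]"


prompt = build_prompt("Why do losses feel worse than gains?")
print(prompt)
```

    Swapping `embed` for a real embedding model and the list for a vector database gives the shape of the pipeline built in the rest of this post.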

    Psycho Expert RAG

    Before we crack open our code editor, let’s take a moment to understand why RAGs could hold immense potential in the field of psychology. Imagine having a digital companion capable of not only understanding your queries but also fetching relevant insights from a vast repository of psychology literature. With RAG, we’re stepping into a world where personalized mental health support and empathetic responses are just a few keystrokes away. Now let’s get to the code part!

    Getting Started: Library Installations and Setup

    Alright, let’s get those dependencies sorted out. We’ll need a bunch of libraries to kickstart our RAG journey. Don’t worry, we’ve got you covered with a few simple pip install commands:

    !pip install -q torch datasets
    !pip install -q accelerate==0.21.0 \
                    peft==0.4.0 \
                    bitsandbytes==0.40.2 \
                    trl==0.4.7
    !pip install -q langchain \
                    sentence_transformers \
                    faiss-cpu \
                    pypdf
    !pip install -U transformers

    Importing the Tools of the Trade

    Now that we’ve got our libraries in place, let’s bring in the heavy artillery. We’ll be working with PyTorch, Transformers, and a dash of LangChain magic to weave our RAG masterpiece.

    import torch
    import nest_asyncio
    from tqdm.notebook import tqdm
    from transformers import (
        AutoTokenizer,
        AutoModelForCausalLM,
        BitsAndBytesConfig,
        pipeline,
    )

    from langchain.embeddings.huggingface import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.prompts import PromptTemplate
    from langchain.schema.runnable import RunnablePassthrough
    from langchain.llms import HuggingFacePipeline
    from langchain.chains import LLMChain
    from langchain_community.document_loaders import PyPDFLoader

    # Allow nested event loops inside the notebook
    nest_asyncio.apply()

    Loading the Model and Tokenizer

    Ah, the heart and soul of our RAG application — Mistral-7B. Let’s fire up the model and tokenizer, but wait, let’s do it efficiently to fit it all on a single GPU. We’re all about optimization here!

    # Load Tokenizer
    model_name='mistralai/Mistral-7B-Instruct-v0.2'
    
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
    print("Tokenizer loaded !!")
    ## bitsandbytes parameters
    # Activate 4-bit precision base model loading
    use_4bit = True
    
    # Compute dtype for 4-bit base models
    bnb_4bit_compute_dtype = "float16"
    
    # Quantization type (fp4 or nf4)
    bnb_4bit_quant_type = "nf4"
    
    # Activate nested quantization for 4-bit base models (double quantization)
    use_nested_quant = False
    
    compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
    
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=use_4bit,
        bnb_4bit_quant_type=bnb_4bit_quant_type,
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=use_nested_quant,
    )
    

    # Check whether the GPU also supports bfloat16 (compute capability 8.0 or higher)
    if compute_dtype == torch.float16 and use_4bit:
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:
            print("=" * 80)
            print("Your GPU supports bfloat16: consider bnb_4bit_compute_dtype = 'bfloat16'")
            print("=" * 80)

    # Load the model with 4-bit quantization
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
    )
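
    Why bother with 4-bit loading? A quick back-of-the-envelope calculation shows the difference. The sketch below assumes a round figure of 7 billion parameters and counts weights only; it ignores quantization constants, layers kept in higher precision, activations, and the KV cache, so real usage will be somewhat higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


N_PARAMS = 7e9  # assumption: round figure for a "7B" model

fp16 = weight_memory_gib(N_PARAMS, 16)  # half precision
nf4 = weight_memory_gib(N_PARAMS, 4)    # 4-bit quantized

print(f"fp16 weights:  {fp16:.1f} GiB")
print(f"4-bit weights: {nf4:.1f} GiB")
```

    The 4-bit weights take roughly a quarter of the space, which is what lets the model fit on a single consumer-grade or free-tier GPU.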

    Let’s test our vanilla model on a basic question:

    def get_vanilla_response(question, tokenizer, model):
        # Wrap the question in Mistral's instruction tags and move the token ids to the GPU
        inputs = tokenizer.encode_plus(f"[INST] {question} [/INST]", return_tensors="pt")['input_ids'].to('cuda')

        # Sample up to 1000 new tokens; sampling makes the output vary between runs
        generated_ids = model.generate(inputs,
                                       max_new_tokens=1000,
                                       do_sample=True)

        # Decode the full sequence (prompt plus generated answer)
        output = tokenizer.batch_decode(generated_ids)
        return output[0]

    question = "What changes do I need in my personality to make friends easily?"
    print(get_vanilla_response(question=question, tokenizer=tokenizer, model=model))
    

    The following is the response we get. Since sampling is enabled, the response can change a little from run to run.

    Question: What changes do I need in my personality to make friends easily?

    Response:

    Making friends can be a challenge for some people, and it often requires some adjustments to our personality traits and social behaviors. Here are some suggestions for changes that may help you make friends more easily:

    1. Be open and approachable: People are more likely to be drawn to those who are friendly and welcoming. Smile often, make eye contact, and show genuine interest in others.

    2. Be a good listener: Most people love to talk about themselves and share their stories. Be an attentive listener and show empathy when others speak. This will make them feel valued and appreciated.

    3. Be patient and persistent: Making friends takes time and effort. Don’t get discouraged if it takes a while to build a strong connection. Keep putting yourself out there and be persistent.

    4. Be optimistic: Having a positive attitude and a good sense of humor can help attract people to you. People are more likely to want to be around those who make them feel good.

    5. Be reliable and consistent: People value those who are dependable. Keep your commitments and be consistent in your actions and behaviors.

    6. Be authentic to yourself: Don’t try to be someone you’re not just to fit in. People are drawn to authenticity and genuine personality. Be yourself and let your unique qualities shine through.

    7. Be willing to try new things: Being open to new experiences and trying new things can help broaden your horizons and expand your social circle. This can lead to meeting new people and making new friends. keep in mind that making friends is a two-way street, so be open to forming genuine connections and being a good friend in return.

    This answer is pretty good, but it is still generic and lacks a human touch. Now let’s try and make this better using RAG.

    Crafting the RAG Pipeline

    Now comes the fun part: putting it all together to create our RAG pipeline. We’ll be using LangChain to orchestrate the retrieval and generation process seamlessly.

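    # Create a text generation pipeline
    text_generation_pipeline = pipeline(
        model=model,
        tokenizer=tokenizer,
        task="text-generation",
        do_sample=True,          # temperature only takes effect when sampling is enabled
        temperature=0.2,
        repetition_penalty=1.1,
        return_full_text=True,   # the returned text includes the prompt
        max_new_tokens=10000,    # upper bound; generation normally stops earlier at the end-of-sequence token
    )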
    
    mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
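
    A quick aside on the `temperature=0.2` setting: temperature divides the model’s raw scores (logits) before they are turned into probabilities, so a low value makes the most likely token even more dominant. The sketch below uses three made-up logits to show the effect; `softmax_with_temperature` is a hypothetical helper, not a Transformers function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

sharp = softmax_with_temperature(logits, 0.2)  # the setting used in this post
flat = softmax_with_temperature(logits, 1.0)   # unscaled, for comparison

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

    At 0.2 nearly all the probability mass sits on the top token, so the assistant stays close to its most confident answer while still allowing a little variety.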

    Now let us get to the data we shall use to create our psychology assistant.

    Psychology Book List

    1. “Thinking, Fast and Slow” by Daniel Kahneman: This book delves into the two systems of thinking, providing insights into cognitive biases and decision-making processes.
    2. “Influence: The Psychology of Persuasion” by Robert B. Cialdini: Cialdini explores the principles of influence, shedding light on psychological triggers that impact human behavior.
    3. “Predictably Irrational” by Dan Ariely: Ariely’s book explores the irrational behaviors that influence decision-making, providing valuable perspectives on human psychology.
    4. “Emotional Intelligence” by Daniel Goleman: This book emphasizes the importance of emotional intelligence in understanding oneself and others, a crucial aspect of human psychology.
    5. “The Social Animal” by Elliot Aronson: Aronson’s book covers a wide range of topics in social psychology, offering insights into human behavior in social contexts.
    6. “Nudge” by Richard H. Thaler and Cass R. Sunstein: Thaler and Sunstein discuss how subtle nudges can influence decision-making, combining insights from psychology and economics.

    # Add book paths from Google Drive
    pdf_paths = ['/content/drive/MyDrive/Blogs/psychology-gpt/Dan Ariely - Predictably Irrational_ The Hidden Forces That Shape Our Decisions-HarperCollins (2008).pdf',
                 '/content/drive/MyDrive/Blogs/psychology-gpt/Daniel Goleman - Emotional Intelligence_ Why it Can Matter More Than IQ-Bloomsbury (2009).pdf',
                 '/content/drive/MyDrive/Blogs/psychology-gpt/Daniel Kahneman - Thinking, Fast and Slow  .pdf',
                 '/content/drive/MyDrive/Blogs/psychology-gpt/Elliot Aronson - The Social Animal. Tenth Edition-Worth Publishers (2007).pdf',
                 '/content/drive/MyDrive/Blogs/psychology-gpt/Richard H. Thaler, Prof. Cass R. Sunstein - Nudge_ Improving Decisions About Health, Wealth, and Happiness-Yale University Press (2008).pdf',
                 '/content/drive/MyDrive/Blogs/psychology-gpt/Robert B. Cialdini - Influence_ The Psychology of Persuasion (Collins Business Essentials) (2007).pdf']
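    docs = []
    for pdf_path in tqdm(pdf_paths):
        loader = PyPDFLoader(pdf_path)
        # load_and_split() returns the PDF as a list of chunked Document objects
        pages = loader.load_and_split()
        # Drop the first 8 and last 10 chunks, presumably to skip front and back matter
        docs.extend(pages[8:-10])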
    
    # Load chunked documents into the FAISS index
    db = FAISS.from_documents(docs,HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5'))
    retriever = db.as_retriever()

    # Create prompt template
    prompt_template = """
    ### [INST] Instruction: Answer the question based on your human psychology knowledge, you can also use this auxiliary knowledge to help:
    
    {context}
    
    ### QUESTION:
    {question} [/INST]
     """
    
    # Create prompt from prompt template
    prompt = PromptTemplate(
        input_variables=["context", "question"],
        template=prompt_template,
    )
    
    # Create llm chain
    llm_chain = LLMChain(llm=mistral_llm, prompt=prompt)
    
    rag_chain = (
     {"context": retriever, "question": RunnablePassthrough()}
        | llm_chain
    )
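
    One detail worth knowing: the chain above hands the retriever’s output straight to `{context}`, so, as far as we can tell, the prompt receives the string form of a list of Document objects, metadata and all. A small formatting helper can tidy that up. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for LangChain’s Document so it runs on its own; `format_docs` and `max_chars` are illustrative names, not part of the original pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, max_chars=2000):
    """Join retrieved passages into one plain-text context block, capped in length."""
    joined = "\n\n".join(d.page_content.strip() for d in docs)
    return joined[:max_chars]


retrieved = [
    Doc("  Social chameleons win popularity. ", {"page": 12}),
    Doc("Empathy builds rapport.", {"page": 40}),
]

context = format_docs(retrieved)
print(context)
```

    In the chain itself, something like `retriever | format_docs` in place of `retriever` would be the likely hook, though we have not tested that variant here.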

    Let’s test-drive our RAG

    Enough with the setup; let’s put our RAG to the test! We’ll throw some thought-provoking questions its way and see how it fares.

    # testing on the same question for a comparison
    question = "What changes do I need in my personality to make friends easily?"
    op = rag_chain.invoke(question)
    print(op['text'])

    Question: What changes do I need in my personality to make friends easily?

    Response:

    Based on the information provided in the documents, here are some suggestions for developing social skills to make friends more easily:

    1. Be attuned to others’ needs and feelings: Strive for a balance between being true to yourself and being sensitive to others’ needs. This will help you avoid becoming a social chameleon, who may win popularity but sacrifices true satisfaction.

    2. Honesty and authenticity: Be genuine in your interactions with others. People appreciate authenticity and are more likely to form lasting relationships with those who are true to themselves.

    3. Active listening: Pay attention to what others are saying and respond appropriately. This shows respect and interest in their thoughts and experiences.

    4. Empathy: Try to understand others’ perspectives and emotions. This will help you connect with them on a deeper level and build stronger relationships.

    5. Adaptability: Be flexible in your interactions with others. This means being able to adjust your behavior to fit different social situations and adapt to the needs and preferences of the people around you.

    6. Positive body language: Use open and positive body language to convey warmth and approachability. This includes maintaining eye contact, smiling, and using appropriate facial expressions.

    7. Approachability: Make yourself available and approachable to others. This means being open to new experiences and meeting new people, and creating opportunities for social interaction.

    8. Practice active engagement: Engage fully in conversations and activities with others. This means being present in the moment and actively participating in the interaction, rather than just going through the motions.

    By focusing on these areas, you can develop the social skills necessary to make friends more easily and build strong, meaningful relationships.

    The Upside

    So, when we compare the two answers, we notice some similarities and differences in how they dish out advice on boosting those social skills for making friends.

    Now, the RAG-based answer? It’s like the friend who gives you the whole breakdown — detailed, with specific examples for each skill. It’s all about practical tips and strategies, giving you clear steps to level up your social game. On the flip side, the vanilla answer goes a bit broader, taking a philosophical turn.

    In essence, RAG enhances the quality and depth of responses by augmenting the capabilities of the base language model with external data sources, making it particularly valuable for tasks requiring domain-specific expertise, such as providing advice on social skills and psychology.
