Part 3/3: Creating a Markdown Q&A ChatBot: Streamlit App to Chat with Temporal Docs

In part one of this series, we downloaded the TemporalIO Java documentation and transformed it into markdown files. If you haven't checked that out, go see this post. In part two, we chunked the documentation using LangChain, created embeddings with OpenAI, and stored them in our Pinecone index. You can see that post here. If you're only interested in the full code, check out my GitHub repo and give it a star!

In this final part, we will cover how to create a Streamlit app to host the chatbot and ask questions over the TemporalIO documentation.

Crafting the Streamlit App - The Q&A Bot

Step A: Import the necessary libraries

from common import *
import streamlit as st
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
  • common provides shared helper functions from the earlier parts of this series, like initialize_session() (a rough sketch of what that helper might do follows after this list).

  • streamlit is the main module for creating web applications.

  • Various langchain components are imported: conversational memory, the retrieval-based Q&A chain, the Pinecone vector store wrapper, OpenAI embeddings, and the PromptTemplate and ChatOpenAI classes used in Step D.
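
The common module comes from the repo built up in parts one and two, so its contents aren't repeated here. Purely as an illustration of what a helper like initialize_session() might do (this is an assumption, not the repo's actual code), it could connect to Pinecone and record the index name in Streamlit's session state:

import os
import pinecone
import streamlit as st

def initialize_session():
    # Hypothetical sketch only; the real implementation lives in the repo's common module.
    # Assumes PINECONE_API_KEY and PINECONE_ENVIRONMENT are set as environment variables;
    # OpenAIEmbeddings reads OPENAI_API_KEY from the environment on its own.
    pinecone.init(
        api_key=os.environ["PINECONE_API_KEY"],
        environment=os.environ["PINECONE_ENVIRONMENT"]
    )
    if "index_name" not in st.session_state:
        st.session_state["index_name"] = "temporal-java-docs"  # placeholder name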

Step B: Initialize the Session, Embeddings, and Vector Store

initialize_session()
embeddings = OpenAIEmbeddings()
temporal_docs_vector_store = Pinecone.from_existing_index(st.session_state["index_name"], embeddings)
  • initialize_session() runs the shared setup from the earlier parts of this series.

  • OpenAI embeddings are initialized.

  • A vector store containing the document embeddings is loaded from the existing Pinecone index.
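
As a quick, optional sanity check (not part of the app itself), you can run a single search against the loaded store to confirm the index is reachable; the query text below is just an example:

# Optional check: fetch one nearby chunk to confirm the index responds.
docs = temporal_docs_vector_store.similarity_search("How do I register a Workflow implementation?", k=1)
if docs:
    print(docs[0].page_content[:200])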

Step C: Configure Conversational Memory

conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)

This sets up a windowed memory buffer for the chat that keeps the last five exchanges (k=5), which helps maintain context in an ongoing conversation. Note that the buffer is only configured here; to influence the model's answers it would need to be passed to a chain that accepts a memory argument.
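
To make the window behavior concrete, here is a small illustration (not part of the app) that writes one exchange into the buffer and reads it back; save_context and load_memory_variables are LangChain's standard memory methods, and the question/answer text is made up:

# Illustration only: store one exchange and inspect what the buffer holds.
conversational_memory.save_context(
    {"input": "What is a Temporal Workflow?"},
    {"output": "A durable, resumable function execution managed by Temporal."}
)
print(conversational_memory.load_memory_variables({})["chat_history"])
# With return_messages=True this returns message objects; only the last 5 exchanges are kept.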

Step D: Setting up the Prompt Template for OpenAI

template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use five sentences maximum and include java code examples where applicable. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

The template instructs the model to answer the question using only the provided context, to say it doesn't know rather than make up an answer when the context is insufficient, and to keep answers to five sentences with Java code examples where applicable. The OpenAI chat model (gpt-3.5-turbo) is also initialized here, with temperature=0 so the answers are as deterministic as possible.
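
To see roughly what the model receives, you can render the template yourself; the chain does this internally at query time, and the context and question strings below are just placeholders:

# Illustration: fill the template the same way the chain will.
example_prompt = QA_CHAIN_PROMPT.format(
    context="WorkflowClient is used to start and interact with Workflow Executions ...",
    question="How do I start a Workflow from Java?"
)
print(example_prompt)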

Step E: Streamlit Interface - Chat UI

st.header("`Chat with Temporal Java Docs`")

if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
  • A header is displayed for the chat UI.

  • If no messages exist in the session, an empty message list is initialized.

  • Previous chat messages are displayed by iterating over the messages in session_state.

Step F: User Interaction & Assistant Response

if prompt := st.chat_input("What is up?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(prompt)

    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        result_docs = temporal_docs_vector_store.similarity_search(
            prompt,  # our search query
            k=3  # return 3 most relevant docs
        )
        print("result docs length:", len(result_docs))
        result_vector_store = Pinecone.from_documents(
            result_docs,
            embeddings,
            index_name=st.session_state["index_name"]
        )
        qa_chain = RetrievalQA.from_chain_type(
            llm,
            retriever=result_vector_store.as_retriever(),
            chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
        )
        result = qa_chain({"query": prompt})
        full_response = result["result"]
        message_placeholder.markdown(full_response)
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": full_response})

This section manages the interaction:

  • The user provides input through the chat interface.

  • The user's message is displayed.

  • The user's message is processed to generate a relevant response:

    • The vector store is searched for documents similar to the user's question.

    • The most relevant documents are then passed through a retrieval-based Q&A chain (a simplified variant of this retrieval step is sketched after this list).

    • The response from this chain is the final message displayed as the assistant's answer.
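
One thing worth noting: the similarity_search plus Pinecone.from_documents step above re-embeds the retrieved chunks and writes them back into the same index on every question. A simpler variant, sketched here under the assumption that the default retriever settings are acceptable, builds the retriever directly on the existing vector store and asks it for the top 3 matches:

# Sketch of a simplified retrieval step: query the existing index directly,
# without re-upserting the retrieved documents.
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=temporal_docs_vector_store.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
result = qa_chain({"query": prompt})
full_response = result["result"]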

Conclusion

This Streamlit app provides a chatbot interface where users can converse with the TemporalIO Java documentation. When a user submits a question, the system looks up the most relevant documents using embeddings and Pinecone vector search, then uses a retrieval-based Q&A chain, powered by OpenAI and LangChain, to generate a concise answer from that document context. The goal is to give users precise answers drawn directly from the Temporal Java documentation. I hope you have enjoyed reading along as I learn the different aspects of AI. To check out the full repo, take a look here.