
Guide to developing a transparent QA bot that displays the sources it used

A Question Answering (QA) system can be a great help in analyzing large amounts of your data or documents. However, the sources (i.e., the parts of your documents) that the model used to generate the answer are often not shown in the final response.
Understanding the context and origin of responses is beneficial not only for users seeking accurate information, but also for developers who want to continuously improve their QA bots. With the sources included in the answer, developers gain valuable insights into the model's decision-making process, facilitating iterative improvements and fine-tuning.
This article shows how you can use LangChain and GPT-3 (text-davinci-003) to create a transparent Question-Answering bot that displays the sources used to generate the answer, walking through two examples.
In the first example, you'll learn how to create a transparent QA bot that leverages your website's content to answer questions. In the second example, we'll explore using transcripts from different YouTube videos, both with and without timestamps.
Before we can leverage the capabilities of an LLM like GPT-3, we need to process our documents (e.g., website content or YouTube transcripts) into the right format (first chunks, then embeddings) and store them in a vector store. Figure 1 below shows the process flow from left to right.
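To make the chunking step concrete, here is a minimal sketch of a fixed-size splitter with overlap. It is an illustration only: LangChain ships text splitters (e.g., `CharacterTextSplitter`) that handle this for you, and the chunk size and overlap values below are arbitrary assumptions.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so context is preserved
    across chunk boundaries before the chunks are embedded."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Illustrative document: 2500 characters split into overlapping chunks
doc = "a" * 2500
pieces = chunk_text(doc, chunk_size=1000, overlap=100)
```

The overlap means the tail of one chunk is repeated at the head of the next, which reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.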
Website content example
In this example, we'll process the content of the web portal It's FOSS, which focuses on open-source technologies, with a particular focus on Linux.
First, we need to obtain a list of all the articles we want to process and store in our vector store. The code below reads the sitemap-posts.xml file, which contains a list of links to all the articles.
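As a rough sketch of what parsing such a sitemap involves, the snippet below extracts article URLs from sitemap XML using only Python's standard library. The namespace and `<url>/<loc>` structure follow the sitemaps.org protocol; the inline XML is a made-up example, not the real It's FOSS sitemap.

```python
import xml.etree.ElementTree as ET

def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> entry (article URL) found in a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

# Illustrative sitemap snippet (placeholder URLs)
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/article-1/</loc></url>
  <url><loc>https://example.com/article-2/</loc></url>
</urlset>"""

urls = extract_urls(sample)
```

In practice you would fetch the sitemap over HTTP first (e.g., with `requests`) and pass the response body to a function like this before downloading each article's content.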