Why Your RAG Is Not Reliable in a Production Environment
RAG in a nutshell ⚙️

With the rise of LLMs, the Retrieval Augmented Generation (RAG) framework also gained popularity by making it possible to construct question-answering systems over data.

We’ve all seen those demos of chatbots conversing with PDFs or emails.

While these systems are definitely impressive, they may not be reliable in production without tweaking and experimentation.

In this post, I explore the issues behind the RAG framework and go over some suggestions to improve its performance, ranging from leveraging document metadata to fine-tuning hyperparameters.

These findings are based on my experience as an ML engineer who is still learning about this technology and building RAG systems in the pharmaceutical industry.

Without much further ado, let’s take a look 🔍

Let’s get the fundamentals right first.

Here’s how RAG works.

It first takes an input query and retrieves documents relevant to it from an external database. It then passes those chunks as context in a prompt to help an LLM generate an augmented answer.

That’s essentially saying:

“Hey LLM, here’s my query, and here are some pieces of text to help you understand the question. Give me an answer.”

Image by the author
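To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop. The `embed_fn`, `vector_db`, and `llm` objects are hypothetical stand-ins for whatever embedding model, vector store, and LLM client you use, not a specific library’s API:

```python
# Minimal sketch of the retrieve-then-generate flow behind a RAG system.
# `embed_fn`, `vector_db`, and `llm` are hypothetical stand-ins for your
# embedding model, vector store client, and LLM client.

def rag_answer(query: str, embed_fn, vector_db, llm, k: int = 4) -> str:
    # 1. Embed the query and fetch the k most similar chunks
    query_vector = embed_fn(query)
    chunks = vector_db.search(query_vector, k=k)

    # 2. Stuff the retrieved chunks into the prompt as context
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

    # 3. Let the LLM generate the augmented answer
    return llm.generate(prompt)
```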

Don’t be fooled by the simplicity of this diagram.

In reality, RAG hides some complexity and involves the following components behind the scenes:

  • Loaders to parse external data in different formats: PDFs, websites, Word documents, etc.
  • Splitters to chunk the raw data into smaller pieces of text
  • An embedding model to convert the chunks into vectors
  • A vector database to store the vectors and query them
  • A prompt to combine the query and the retrieved documents
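
The post doesn’t tie itself to a particular framework, but as an illustration, here is one way those five components might be wired together with LangChain. The loader, model choices, chunking parameters, and the `report.pdf` file are assumptions, and import paths vary across LangChain versions:

```python
# One possible wiring of the five components, assuming a LangChain-style stack.
# Import paths and class names vary across LangChain versions.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# 1. Loader: parse the external data (a hypothetical PDF here)
documents = PyPDFLoader("report.pdf").load()

# 2. Splitter: chunk the raw text into smaller pieces
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# 3 & 4. Embedding model + vector database: embed and index the chunks
embeddings = OpenAIEmbeddings()
vector_db = FAISS.from_documents(chunks, embeddings)

# 5. Prompt: retrieve relevant chunks and combine them with the query
query = "What were the key findings of the study?"
retrieved = vector_db.similarity_search(query, k=4)
context = "\n\n".join(doc.page_content for doc in retrieved)
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Each of these components has knobs of its own (chunk size, overlap, embedding model, number of retrieved chunks), which is exactly where the reliability problems discussed in this post tend to creep in.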
