A practical guide to implementing guardrails, covering both Guardrails AI and NVIDIA’s NeMo Guardrails
This article is co-authored by Hakan Tekgul
As the use of large language model (LLM) applications enters the mainstream and expands into larger enterprises, there is a clear need to establish effective governance of productionized applications. Given that the open-ended nature of LLM-driven applications can produce responses that may not align with an organization’s guidelines or policies, a set of safety measures and actions is becoming table stakes for maintaining trust in generative AI.
This guide is designed to walk you through several available frameworks and how to think through implementation.
Guardrails are the set of safety controls that monitor and dictate a user’s interaction with an LLM application. They are a set of programmable, rule-based systems that sit between users and foundation models in order to make sure the AI model operates within an organization’s defined principles.
The goal of guardrails is to enforce that the output of an LLM is in a specific format or context while validating each response. By implementing guardrails, users can define the structure, type, and quality of LLM responses.
Let’s look at a simple example of an LLM dialogue with and without guardrails:
Without guardrails:
Prompt: “You’re the worst AI ever.”
Response: “I’m sorry to hear that. How can I improve?”
With guardrails:
Prompt: “You’re the worst AI ever.”
Response: “Sorry, but I can’t assist with that.”
In this scenario, the guardrail prevents the AI from engaging with the insulting content by refusing to respond in a way that acknowledges or encourages such behavior. Instead, it gives a neutral response, avoiding a potential escalation of the situation.
Guardrails AI
Guardrails AI is an open-source Python package that provides guardrail frameworks for LLM applications. Specifically, Guardrails implements “a pydantic-style validation of LLM responses.” This includes “semantic validation, such as checking for bias in generated text,” or checking for bugs in an LLM-written piece of code. Guardrails also provides the ability to take corrective actions and enforce structure and type guarantees.
Guardrails is built on the RAIL (.rail) specification in order to enforce specific rules on LLM outputs and, in turn, provides a lightweight wrapper around LLM API calls. In order to understand how Guardrails AI works, we first need to understand the RAIL specification, which is the core of guardrails.
RAIL (Reliable AI Markup Language)
RAIL is a language-agnostic and human-readable format for specifying specific rules and corrective actions for LLM outputs. It is a dialect of XML, and each RAIL specification contains three main components:
- Output: This component contains information about the expected response of the AI application. It should contain the spec for the structure of the expected outcome (such as JSON), the type of each field in the response, the quality criteria of the expected response, and the corrective action to take in case the quality criteria are not met.
- Prompt: This component is simply the prompt template for the LLM and contains the high-level pre-prompt instructions that are sent to the LLM application.
- Script: This optional component can be used to implement any custom code for the schema. This is especially useful for implementing custom validators and custom corrective actions (see the sketch below).
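For instance, a custom validator registered from Python might look roughly like the sketch below. The validator name and the SELECT * check are purely illustrative, and the interface shown follows the Validator API from Guardrails 0.2-era releases, which has changed over time, so treat this as an outline rather than copy-paste code.

from typing import Any, Dict

from guardrails.validators import (
    FailResult,
    PassResult,
    ValidationResult,
    Validator,
    register_validator,
)

# Hypothetical validator: reject generated SQL that uses SELECT *.
@register_validator(name="no-select-star", data_type="string")
class NoSelectStar(Validator):
    def validate(self, value: Any, metadata: Dict) -> ValidationResult:
        if "select *" in value.lower():
            return FailResult(error_message="Generated SQL uses SELECT *; list columns explicitly.")
        return PassResult()

Once registered, a validator like this can be referenced by name from the format attribute of a field in the output component of a RAIL spec.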
Let’s look at an example RAIL specification from the Guardrails docs that tries to generate bug-free SQL code given a natural language description of the problem.
rail_str = """
Generate a valid SQL query for the following natural language instruction:

{{nl_instruction}}

@complete_json_suffix
"""
The code example above shows the prompt component of a RAIL spec whose output should be a bug-free generated SQL query. Whenever the output fails the quality criteria (for example, the generated SQL contains a bug), the LLM simply re-asks the prompt and generates an improved answer.
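For context, the output component of this spec, which declares the quality criterion and the reask corrective action, would look roughly like the sketch below; the bug-free-sql format name and its on-fail attribute follow the older Guardrails validator naming and may differ in newer releases.

<rail version="0.1">

<output>
    <string
        name="generated_sql"
        description="Generate SQL for the given natural language instruction."
        format="bug-free-sql"
        on-fail-bug-free-sql="reask"
    />
</output>

<prompt>
... (the prompt shown above) ...
</prompt>

</rail>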
In order to create a guardrail with this RAIL spec, the Guardrails AI docs then suggest creating a guard object that will be used in the LLM API call.
import guardrails as gd
from rich import print
guard = gd.Guard.from_rail_string(rail_str)
After the guard object is created, what happens under the hood is that the object creates a base prompt that will be sent to the LLM. This base prompt starts with the prompt definition in the RAIL spec and then appends the XML output definition, instructing the LLM to only return a valid JSON object as the output.
Here is the exact instruction that the package uses in order to incorporate the RAIL spec into an LLM prompt:
ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name`
attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON
MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and
specific types. Be correct and concise. If you are unsure anywhere, enter `None`.
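If you want to see the fully compiled prompt that will be sent to the model, you can print it from the guard object, using the same base_prompt attribute that the LangChain integration later in this post relies on:

print(guard.base_prompt)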
After finalizing the guard object, all you have to do is wrap your LLM API call with the guard wrapper. The guard wrapper will then return the raw_llm_response as well as the validated and corrected output as a dictionary.
import openai
raw_llm_response, validated_response = guard(
    openai.Completion.create,
    prompt_params={
        "nl_instruction": "Select the name of the worker who has the highest salary."
    },
    engine="text-davinci-003",
    max_tokens=2048,
    temperature=0,
)
{'generated_sql': 'SELECT name FROM worker ORDER BY salary DESC LIMIT 1'}
If you want to use Guardrails AI with LangChain, you can use the existing integration by creating a GuardrailsOutputParser.
from rich import print
from langchain.output_parsers import GuardrailsOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

output_parser = GuardrailsOutputParser.from_rail_string(rail_str, api=openai.ChatCompletion.create)
Then, you can simply create a LangChain PromptTemplate from this output parser.
prompt = PromptTemplate(
    template=output_parser.guard.base_prompt,
    input_variables=output_parser.guard.prompt.variable_names,
)
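From here, a minimal sketch of running things end to end might look like the following, using the legacy langchain 0.0.x call style that the rest of this post assumes; the natural-language instruction is just the earlier example reused:

from langchain.llms import OpenAI

model = OpenAI(temperature=0)

# Fill in the prompt template and call the model directly.
output = model(
    prompt.format_prompt(
        nl_instruction="Select the name of the worker who has the highest salary."
    ).to_string()
)

# Validate (and, if needed, correct) the raw output against the RAIL spec.
print(output_parser.parse(output))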
Overall, Guardrails AI provides a lot of flexibility in terms of correcting the output of an LLM application. If you are familiar with XML and want to try out LLM guardrails, it is worth checking out!
NVIDIA NeMo-Guardrails
NeMo Guardrails is another open-source toolkit, developed by NVIDIA, that provides programmatic guardrails to LLM systems. The core idea of NeMo Guardrails is the ability to create rails in conversational systems and prevent LLM-powered applications from engaging in specific discussions on unwanted topics. Another main benefit of NeMo is the ability to connect models, chains, services, and more with actions seamlessly and securely.
In order to configure guardrails for LLMs, this open-source toolkit introduces a modeling language called Colang that is specifically designed for creating flexible and controllable conversational workflows. Per the docs, “Colang has a ‘pythonic’ syntax in the sense that most constructs resemble their python equivalent and indentation is used as a syntactic element.”
Before we dive into the NeMo Guardrails implementation, it is important to understand the syntax of this new modeling language for LLM guardrails.
Core Syntax Elements
The examples below from the NeMo docs break out the core syntax elements of Colang (blocks, statements, expressions, keywords, and variables), along with the three main types of blocks: user message blocks, flow blocks, and bot message blocks.
User message definition blocks set up the standard messages linked to different things users might say.
define user express greeting
  "hello there"
  "hi"

define user request help
  "I want help with something."
  "I want your help."
Bot message definition blocks determine the phrases that should be linked to different standard bot messages.
define bot express greeting
  "Hello there!"
  "Hi!"

define bot ask welfare
  "How are you feeling today?"
Flows show the way you want the chat to progress. They include a series of user and bot messages, and potentially other events.
define flow hello
  user express greeting
  bot express greeting
  bot ask welfare
Per the docs, “references to context variables always start with a $ sign e.g. $name. All variables are global and accessible in all flows.”
define flow
  ...
  $name = "John"
  $allowed = execute check_if_allowed
Also worth noting: “expressions can be used to set values for context variables” and “actions are custom functions available to be invoked from flows.”
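To make this concrete, here is a small hypothetical flow that ties these pieces together. It reuses the user request help form and the check_if_allowed action from the snippets above; the two bot messages are illustrative names that are not defined elsewhere in this post.

define flow handle help request
  user request help
  $allowed = execute check_if_allowed

  if not $allowed
    bot inform cannot respond
  else
    bot offer to help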
Now that we have a better handle on Colang syntax, let’s briefly go over how the NeMo architecture works. The guardrails package is built with an event-driven design architecture. Based on specific events, there is a sequential procedure that needs to be completed before the final output is provided to the user. This process has three main stages:
- Generate canonical user messages
- Decide on the next step(s) and execute them
- Generate bot utterances
Each of the above stages can involve one or more calls to the LLM. In the first stage, a canonical form is created for the user’s intent, which allows the system to trigger any specific next steps. The user intent action does a vector search over all the canonical form examples in the existing configuration, retrieves the top five examples, and creates a prompt that asks the LLM to produce the canonical user intent.
Once the intent event is created, depending on the canonical form, the LLM either goes through a pre-defined flow for the next step or another LLM call is used to decide the next step. When the LLM is used, another vector search is performed for the most relevant flows and again the top five flows are retrieved so that the LLM can predict the next step. Once the next step is determined, a bot_intent event is created so that the bot says something, and the action is then executed with the start_action event.
The bot_intent event then invokes the final step to generate bot utterances. Similar to the previous stages, the generate_bot_message action is triggered and a vector search is performed to find the most relevant bot utterance examples. In the end, a bot_said event is triggered and the final response is returned to the user.
Example Guardrails Configuration
Now, let’s look at an example of a simple NeMo Guardrails bot adapted from the NeMo docs.
Let’s assume that we want to build a bot that does not respond to political or stock market questions. The first step is to install the NeMo Guardrails toolkit and specify the configurations defined in the documentation.
After that, we define the canonical forms for the user and bot messages.
define user express greeting
  "Hello"
  "Hi"
  "What's up?"

define bot express greeting
  "Hi there!"

define bot ask how are you
  "How are you doing?"
  "How's it going?"
  "How are you feeling today?"
Then, we define the dialog flows in order to guide the bot in the right direction throughout the conversation. Depending on the user’s response, you can even extend the flow to respond appropriately.
define flow greeting
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion

  else when user express feeling bad
    bot express empathy
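For completeness, the canonical forms referenced by this flow could be defined along the following lines (the exact phrasings here are illustrative rather than taken from the NeMo docs):

define user express feeling good
  "I'm feeling great."
  "Doing well, thanks!"

define user express feeling bad
  "I'm not doing so well."
  "I've had a rough day."

define bot express positive emotion
  "That's great to hear!"

define bot express empathy
  "I'm sorry to hear that."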
Finally, we define the rails to prevent the bot from responding to certain topics. We first define the canonical forms:
define user ask about politics
  "What do you think about the government?"
  "Which party should I vote for?"

define user ask about stock market
  "Which stock should I invest in?"
  "Would this stock 10x over the next year?"
Then, we define the dialog flows so that the bot simply informs the user that it cannot respond to those topics.
define flow politics
  user ask about politics
  bot inform cannot respond

define flow stock market
  user ask about stock market
  bot inform cannot respond
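With the Colang definitions above saved in a config folder (together with a config file that names the LLM to use), a minimal sketch of loading the rails and testing the politics rail might look like this; the folder path and the exact model configuration are assumptions to adapt to your own setup.

from nemoguardrails import LLMRails, RailsConfig

# Load the Colang files and model configuration from the config folder.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Which party should I vote for?"}
])
print(response["content"])  # Expected: the bot's "cannot respond" message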
LangChain Support
Finally, if you would like to use LangChain, you can easily add your guardrails on top of existing chains. For example, you can integrate a RetrievalQA chain for question answering next to a basic guardrail against insults, as shown below (example code adapted from source).
define user express insult
  "You're silly"

# Basic guardrail against insults.
define flow
  user express insult
  bot express calmly willingness to assist

# Here we use the QA chain for the rest.
define flow
  user ...
  $answer = execute qa_chain(query=$last_user_message)
  bot $answer
from nemoguardrails import LLMRails, RailsConfig
from langchain.chains import RetrievalQA

config = RailsConfig.from_path("path/to/config")
app = LLMRails(config)

# docsearch is assumed to be an existing vector store built from your documents.
qa_chain = RetrievalQA.from_chain_type(
    llm=app.llm, chain_type="stuff", retriever=docsearch.as_retriever()
)
app.register_action(qa_chain, name="qa_chain")

history = [
    {"role": "user", "content": "What is the current unemployment rate?"}
]
result = app.generate(messages=history)
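Assuming the flows above route the question to the registered qa_chain action, the generate call returns the assistant message as a dictionary, so the chain’s answer can be read from its content key:

print(result["content"])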
Comparing Guardrails AI and NeMo Guardrails
When the Guardrails AI and NeMo packages are compared, each has its own unique benefits and limitations. Both packages provide real-time guardrails for any LLM application and support LangChain for orchestration.
If you are comfortable with XML syntax and want to try out the concept of guardrails within a notebook for simple output moderation and formatting, Guardrails AI can be a great choice. Guardrails AI also has extensive documentation with a wide range of examples that can lead you in the right direction.
However, if you would like to productionize your LLM application and define advanced conversational guidelines and policies for your flows, NeMo Guardrails might be a good package to check out. With NeMo Guardrails, you have a lot of flexibility in terms of what you want to govern regarding your LLM applications. By defining different dialog flows and custom bot actions, you can create any type of guardrails for your AI models.
One Perspective
Based on our experience implementing guardrails for an internal product docs chatbot in our organization, we would suggest using NeMo Guardrails for moving to production. Although the lack of extensive documentation can make it a challenge to onboard the tool into your LLM infrastructure stack, the flexibility of the package in terms of defining restricted user flows really helped our user experience.
By defining specific flows for different capabilities of our platform, the question-answering service we created started to be actively used by our customer success engineers. By using NeMo Guardrails, we were also able to spot gaps in the documentation for certain features much more easily and improve our documentation in a way that helps the whole conversation flow.
As enterprises and startups alike embrace the power of large language models to revolutionize everything from information retrieval to summarization, having effective guardrails in place is likely to be mission-critical, particularly in highly regulated industries like finance or healthcare where real-world harm is possible.
Luckily, open-source Python packages like Guardrails AI and NeMo Guardrails provide a great starting point. By setting up programmable, rule-based systems to guide user interactions with LLMs, developers can help ensure compliance with defined principles.