Generative AI, and particularly its language flavor, ChatGPT, is everywhere. Large Language Model (LLM) technology will play a big role in the development of future applications. LLMs are excellent at understanding language thanks to the extensive pre-training of foundation models on trillions of lines of public-domain text, including code. Methods like supervised fine-tuning and reinforcement learning from human feedback (RLHF) make these LLMs much better at answering specific questions and conversing with users. As we move into the next phase of AI applications powered by LLMs, the following key components will likely be crucial for these next-gen applications. The figure below shows this progression: as you move up the chain, you build more intelligence and autonomy into your applications. Let's take a look at each of these levels.
LLM calls:
These are direct calls to completion or chat models from an LLM provider like Azure OpenAI, Google PaLM, or Amazon Bedrock. These calls use a very basic prompt and rely mostly on the internal memory of the LLM to produce the output.
Example: asking a basic model like "text-davinci" to "tell a joke". You give little or no context, and the model relies on its internal pre-trained memory to come up with an answer (highlighted in green in the figure below, using Azure OpenAI).
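For illustration, here is a minimal sketch of such a call using the legacy openai Python SDK (pre-1.0) against an Azure OpenAI deployment; the endpoint, API key, and deployment name are placeholders you would replace with your own.

```python
import openai

# Placeholder Azure OpenAI configuration (replace with your own resource values).
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-api-key>"

# A bare prompt with no added context: the model answers from its pre-trained memory.
response = openai.Completion.create(
    engine="text-davinci-003",  # name of your Azure deployment
    prompt="Tell me a joke",
    max_tokens=60,
)
print(response["choices"][0]["text"])
```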
Prompts:
The next level of intelligence is adding more and more context to prompts. Prompt-engineering techniques can be applied to LLMs to make them give customized responses. For instance, when generating an email to a user, some context about the user, their past purchases, and their behavior patterns can serve as the prompt to better customize the email. Users familiar with ChatGPT will know different prompting methods, such as giving examples that the LLM uses to build its response. Prompts augment the internal memory of the LLM with additional context. An example is below.
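As a sketch of the idea (reusing the client configuration from the previous snippet), the prompt below packs customer details into the request so the model can personalize its output; the customer fields are invented for this example.

```python
# Illustrative customer context; in a real application this would come from your data store.
customer = {
    "name": "Jane",
    "last_purchase": "noise-cancelling headphones",
    "behavior": "frequently buys travel accessories",
}

# The prompt augments the model's internal memory with this additional context.
prompt = (
    "Write a short marketing email for the customer below.\n"
    f"Customer name: {customer['name']}\n"
    f"Last purchase: {customer['last_purchase']}\n"
    f"Observed behavior: {customer['behavior']}\n\n"
    "Keep the tone friendly and suggest one related product."
)

response = openai.Completion.create(
    engine="text-davinci-003",  # placeholder Azure deployment name
    prompt=prompt,
    max_tokens=200,
)
print(response["choices"][0]["text"])
```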
Embeddings:
Embeddings take prompts to the next level by searching a knowledge store for context, retrieving that context, and appending it to the prompt. The first step is to make a large store of unstructured text searchable by indexing the text and populating a vector database. For this, an embedding model like 'ada' from OpenAI is used, which takes a piece of text and converts it into an n-dimensional vector. These embeddings capture the context of the text, so similar sentences have embeddings that are close to each other in vector space. When the user enters a query, that query is converted into an embedding, and the resulting vector is matched against the vectors in the database. This returns the top 5 or 10 matching text chunks for the query, which form the context. The query and context are then passed to the LLM to answer the query in a human-like manner.
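The sketch below shows the pattern with a small in-memory index instead of a real vector database, using the 'ada' embedding model and the same placeholder Azure configuration as above; the document chunks are invented for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Convert a piece of text into an n-dimensional vector with the 'ada' embedding model."""
    result = openai.Embedding.create(
        engine="text-embedding-ada-002",  # placeholder Azure deployment name
        input=text,
    )
    return np.array(result["data"][0]["embedding"])

# Index a few illustrative chunks; a real system would store these in a vector database.
chunks = [
    "Our standard loan APR is reviewed quarterly.",
    "Savings accounts currently earn 4.1% annual interest.",
    "Credit card late fees are capped at $25.",
]
index = np.vstack([embed(c) for c in chunks])

def top_k(query: str, k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are closest to the query (cosine similarity)."""
    q = embed(query)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

# Retrieve context for the user's question and pass both question and context to the LLM.
query = "What is the current loan APR policy?"
context = "\n".join(top_k(query))
answer = openai.Completion.create(
    engine="text-davinci-003",  # placeholder Azure deployment name
    prompt=f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}",
    max_tokens=150,
)
print(answer["choices"][0]["text"])
```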
Chains:
Today, chains are the most advanced and mature technology widely used to build LLM applications. Chains are deterministic: a sequence of LLM calls is joined together, with the output of one flowing into one or more downstream LLMs. For instance, one LLM call could query a SQL database to get a list of customer emails and send that list to another LLM that generates personalized emails to those customers. These LLM chains can be integrated into existing application flows to generate more valuable outcomes. Using chains, we can also augment LLM calls with external inputs like API calls and integrations with knowledge graphs to provide context. Furthermore, with multiple LLM providers available today, such as OpenAI, AWS Bedrock, Google PaLM, and MosaicML, we can mix and match LLM calls within chains. For chain elements that need only limited intelligence, a lighter model like 'gpt-3.5-turbo' can be used, while 'gpt-4' can handle the more advanced tasks. Chains provide an abstraction over data, applications, and LLM calls.
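The hand-rolled sketch below illustrates the email example as a two-step chain, without committing to any particular framework (libraries like LangChain offer similar abstractions). The SQLite database, table schema, and deployment names are assumptions made for illustration.

```python
import sqlite3

def fetch_customers(db_path: str) -> list[tuple[str, str]]:
    """Step 1: query an existing database for customer names and emails (schema assumed)."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute("SELECT name, email FROM customers").fetchall()

def draft_email(name: str) -> str:
    """Step 2: a cheaper chat model drafts the personalized email body."""
    result = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # placeholder Azure deployment name
        messages=[{
            "role": "user",
            "content": f"Write a short, friendly email to {name} announcing our new loan rates.",
        }],
    )
    return result["choices"][0]["message"]["content"]

# The chain: output of the database step flows into the LLM step.
for name, email in fetch_customers("customers.db"):
    print(f"To: {email}\n{draft_email(name)}\n")
```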
Agents:
Agents are the subject of many online debates, particularly with respect to artificial general intelligence (AGI). Agents use an advanced LLM like 'gpt-4' or 'PaLM 2' to plan tasks rather than relying on pre-defined chains. When a user request comes in, the agent decides, based on the query, which set of tasks to call and dynamically builds the sequence. For instance, suppose we configure an agent with a command like "notify customers when the loan APR changes due to a government regulation update". The agent framework makes an LLM call to decide on the steps to take, or the chains to build. Here that might involve invoking an app that scrapes regulatory websites and extracts the latest APR rate, then an LLM call that searches a database and extracts the emails of affected customers, and finally generating an email to notify everyone.
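A heavily simplified sketch of that loop is shown below: a more capable model is asked to produce a plan as JSON, and the agent executes it over a set of stubbed-out tools. The tool functions, their outputs, and the single-argument hand-off between steps are all assumptions made for illustration; a real agent framework would validate the plan and handle richer tool inputs.

```python
import json

# Stubbed-out tools the agent can call; each would be a real integration in practice.
def scrape_regulations() -> str:
    return "New regulation: maximum loan APR updated to 7.2%."

def find_affected_customers(note: str) -> list[str]:
    return ["jane@example.com", "raj@example.com"]

def notify_customers(emails: list[str]) -> None:
    for addr in emails:
        print(f"Emailing {addr}: the loan APR has changed, details attached.")

TOOLS = {
    "scrape_regulations": scrape_regulations,
    "find_affected_customers": find_affected_customers,
    "notify_customers": notify_customers,
}

def plan(goal: str) -> list[dict]:
    """Ask a more capable model to plan which tools to call, as a JSON list of steps."""
    result = openai.ChatCompletion.create(
        engine="gpt-4",  # placeholder Azure deployment name
        messages=[{
            "role": "user",
            "content": (
                f"Goal: {goal}\n"
                f"Available tools: {list(TOOLS)}\n"
                'Respond with only a JSON list of steps, e.g. [{"tool": "scrape_regulations"}].'
            ),
        }],
    )
    # Assumes the model returns valid JSON; a real agent would validate and retry.
    return json.loads(result["choices"][0]["message"]["content"])

# Execute the plan, feeding each step's output into the next (deliberately simplified).
output = None
for step in plan("Notify customers when the loan APR changes due to a government regulation update"):
    tool = TOOLS[step["tool"]]
    output = tool(output) if output is not None else tool()
```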
Final Thoughts
LLMs are a rapidly evolving technology, and better models and applications are being launched every week. LLM calls to agents is the intelligence ladder, and as we move up it, we build more complex, autonomous applications. Better models will mean more effective agents, and the next-gen applications will be powered by them. Time will tell how advanced these next-gen applications will be and which patterns they will be built on.