More sophisticated approaches for solving even more complex tasks are currently under active development. While they significantly outperform simpler methods in some scenarios, their practical usage remains somewhat limited. I'll mention two such techniques: self-consistency and the Tree of Thoughts.
The authors of the self-consistency paper proposed the following approach: instead of relying on a single model output, sample the model multiple times and aggregate the results through majority voting. Drawing on both intuition and the success of ensembles in classical machine learning, this method improves the model's robustness.
It's also possible to apply self-consistency without implementing the aggregation step: for tasks with short outputs, ask the model to suggest several options and select the best one. The sketch below shows the idea.
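Here is a minimal sketch of self-consistency, assuming the legacy (pre-1.0) `openai` Python SDK with the API key read from the environment. In a real pipeline you would parse out just the final answer (e.g. a number or a label) from each completion before voting:

```python
from collections import Counter

import openai  # legacy (pre-1.0) SDK; reads OPENAI_API_KEY from the environment


def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    # Sample several completions with non-zero temperature so the
    # reasoning paths differ, then majority-vote over the answers.
    answers = []
    for _ in range(n_samples):
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # diversity is what makes voting useful
        )
        # Simplification: we vote on the full completion text here;
        # in practice, extract only the final answer before voting.
        answers.append(response["choices"][0]["message"]["content"].strip())
    return Counter(answers).most_common(1)[0][0]
```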
Tree of Thoughts (ToT) takes this idea a step further. It proposes applying tree-search algorithms to the model's "reasoning thoughts", essentially backtracking when the model runs into poor assumptions.
If you are interested, check out Yannic Kilcher's video reviewing the ToT paper.
For our particular scenario, Chain-of-Thought reasoning is not necessary, but we can prompt the model to tackle the summarization task in two stages: first, condense the entire job description, and then summarize that intermediate summary with a focus on job responsibilities.
In this particular example, the results didn't change significantly, but this approach works very well for many tasks.
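A minimal sketch of this two-stage approach, assuming a hypothetical `ask_chatgpt()` helper that sends a prompt and returns the completion (a concrete version appears later in this post):

```python
def summarize_in_two_stages(job_description: str) -> str:
    # Stage 1: condense the entire job description.
    stage1_prompt = (
        "Summarize the job description delimited by <> in a few sentences.\n"
        f"<{job_description}>"
    )
    draft = ask_chatgpt(stage1_prompt)

    # Stage 2: summarize the draft, focusing on responsibilities.
    stage2_prompt = (
        "Summarize the text delimited by <>, keeping only the job "
        f"responsibilities, as a bullet list.\n<{draft}>"
    )
    return ask_chatgpt(stage2_prompt)
```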
Few-shot Learning
The last technique we will cover is called few-shot learning, also known as in-context learning. It is as simple as incorporating several examples into your prompt to give the model a clearer picture of your task.
These examples should not only be relevant to your task but also diverse enough to capture the variability in your data. "Labeling" data for few-shot learning can be a bit more difficult when you're using CoT, particularly if your pipeline has many steps or your inputs are long. However, the results generally make it worth the effort. Also, keep in mind that labeling a few examples is far cheaper than labeling an entire training/testing set as in traditional ML model development.
If we add an example to our prompt, the model will understand the requirements even better. For instance, if we show that we prefer the final summary in bullet-point format, the model will mirror our template.
This prompt might look overwhelming, but don't be afraid: it is just the previous prompt (v5) plus one labeled example with another job description in the `For example: 'input description' -> 'output JSON'` format.
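To make this concrete, here is a hypothetical reconstruction of such a few-shot prompt builder (not the exact v6 prompt from this post; the instructions and the example are stand-ins):

```python
FEW_SHOT_EXAMPLE = (
    "For example:\n"
    "<We are looking for a Data Scientist to build churn models ...> ->\n"
    '{"responsibilities": ["Build churn prediction models", '
    '"Present findings to stakeholders"]}\n'
)


def generate_prompt(job_description: str) -> str:
    # v5-style instructions + one labeled example + the new input to summarize.
    return (
        "Summarize the job description delimited by <> as JSON with a "
        '"responsibilities" field in bullet-point style.\n\n'
        + FEW_SHOT_EXAMPLE
        + f"\n<{job_description}> ->"
    )
```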
Summarizing Best Practices
To summarize the best practices for prompt engineering, consider the following:
- Don't be afraid to experiment. Try different approaches and iterate gradually, correcting the model and taking small steps at a time;
- Use separators in the input (e.g. <>) and ask for structured output (e.g. JSON); see the prompt sketch after this list;
- Provide a list of actions to complete the task. Whenever feasible, offer the model a set of actions and let it output its "internal thoughts";
- For short outputs, ask for multiple suggestions;
- Provide examples. If possible, show the model several diverse examples that represent your data, together with the desired output.
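As referenced above, here is a hypothetical prompt template (not the exact one used in this post) that combines several of these practices at once: separators, an explicit list of actions, and structured JSON output:

```python
# Hypothetical template; the input text and field names are illustrative.
PROMPT_TEMPLATE = """You will be given a job description delimited by <>.
Perform the following actions:
1. Summarize the description in one sentence.
2. Extract the list of job responsibilities.
3. Return a JSON object with "summary" and "responsibilities" keys.

<{job_description}>"""

prompt = PROMPT_TEMPLATE.format(job_description="We are hiring an ML engineer ...")
```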
I'd say that this framework offers a sufficient basis for automating a wide range of day-to-day tasks, such as information extraction, summarization, and text generation (e.g. emails). However, in a production environment, it is still possible to fine-tune models on task-specific datasets to further enhance performance. Additionally, plugins and agents are developing rapidly, but that's a whole different story altogether.
Prompt Engineering Course by DeepLearning.AI and OpenAI
Along with the earlier-mentioned talk by Andrej Karpathy, this blog post draws inspiration from the ChatGPT Prompt Engineering for Developers course by DeepLearning.AI and OpenAI. It is absolutely free, takes just a couple of hours to complete, and, my personal favorite, it lets you experiment with the OpenAI API without even signing up!
It's a great playground for experimenting, so definitely check it out.
Wow, we covered quite a lot of information! Now, let's move forward and start building the application using the knowledge we have gained.
Generating an OpenAI Key
To get started, you need to register an OpenAI account and create your API key. OpenAI currently offers $5 of free credit for three months to every individual. Follow the introduction to the OpenAI API page to register your account and generate your API key.
Once you have a key, create an `OPENAI_API_KEY` environment variable so you can access it in code with `os.getenv('OPENAI_API_KEY')`.
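For example:

```python
import os

# Assumes you have set the variable beforehand, e.g. in your shell:
#   export OPENAI_API_KEY="sk-..."
api_key = os.getenv("OPENAI_API_KEY")
```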
Estimating the Costs with the Tokenizer Playground
At this stage, you might be curious about how much you can do with just the free trial and what options are available after the initial three months. It is a fair question to ask, especially when you consider that LLMs cost millions of dollars!
Of course, those millions are about training. Inference requests, it turns out, are quite affordable. While GPT-4 may be perceived as expensive (although prices are likely to decrease), `gpt-3.5-turbo` (the model behind the default ChatGPT) is still sufficient for the vast majority of tasks. In fact, OpenAI has done an incredible engineering job, given how cheap and fast these models are now, considering their size of billions of parameters.
The `gpt-3.5-turbo` model comes at a cost of $0.002 per 1,000 tokens.
But how much is that? Let's see. First, we need to understand what a token is. In simple terms, a token is a part of a word. In English, you can expect around 14 tokens for every 10 words.
To get a more accurate estimate of the number of tokens for your specific task and prompt, the best approach is to just try it! Luckily, OpenAI provides a tokenizer playground that can help you with this.
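You can also count tokens programmatically with OpenAI's `tiktoken` library; a minimal sketch (the prompt text here is just a placeholder):

```python
import tiktoken  # pip install tiktoken

# Get the tokenizer that gpt-3.5-turbo uses.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Summarize the job description below, focusing on responsibilities."
print(len(encoding.encode(prompt)), "tokens")
```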
Side note: Tokenization for Different Languages
Due to the widespread use of English on the Internet, English benefits from the most efficient tokenization. As highlighted in the "All languages are not tokenized equal" blog post, tokenization is not a uniform process across languages, and some languages may require a greater number of tokens for representation. Keep this in mind if you want to build an application that involves prompts in multiple languages, e.g. for translation.
To illustrate this point, let's look at the tokenization of pangrams in different languages. In this toy example, English required 9 tokens, French 12, Bulgarian 59, Japanese 72, and Russian 73.
Cost vs Performance
As you may have noticed, prompts can become quite lengthy, especially when they incorporate examples. Increasing the length of the prompt may improve quality, but the cost grows at the same time since we use more tokens.
Our latest prompt (v6) consists of roughly 1.5k tokens.
Considering that the output length is typically in the same range as the input length, we can estimate an average of around 3k tokens per request (input tokens + output tokens). Multiplying this number by the per-token price, we find that each request costs about $0.006, or 0.6 cents, which is quite affordable.
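The arithmetic as a quick sketch (the token counts are the rough estimates from above):

```python
PRICE_PER_1K_TOKENS = 0.002  # gpt-3.5-turbo, $ per 1,000 tokens

prompt_tokens = 1500   # our v6 prompt
output_tokens = 1500   # assume the output is roughly the same length

cost = (prompt_tokens + output_tokens) / 1000 * PRICE_PER_1K_TOKENS
print(f"~${cost:.3f} per request")  # ~$0.006, i.e. 0.6 cents
```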
Even if we assume a slightly higher cost of 1 cent per request (corresponding to roughly 5k tokens), you will still be able to make 100 requests for just $1. Additionally, OpenAI offers the flexibility to set both soft and hard limits: with soft limits, you receive notifications when you approach your defined limit, while hard limits prevent you from exceeding the specified threshold.
For local use of your LLM application, you can comfortably configure a hard limit of $1 per month, ensuring that you stay within budget while enjoying the benefits of the model.
Streamlit App Template
Now, let's build a web interface to interact with the model programmatically, eliminating the need to manually copy prompts every time. We will do this with Streamlit.
Streamlit is a Python library that allows you to create simple web interfaces without HTML, CSS, or JavaScript. It is beginner-friendly and enables the creation of browser-based applications with minimal Python knowledge. Let's now create a simple template for our LLM-based application.
First, we need the logic that handles the communication with the OpenAI API. In the example below, I assume the `generate_prompt()` function is defined elsewhere and returns the prompt for a given input text (e.g. similar to what you saw before).
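A minimal sketch of that logic, assuming the legacy (pre-1.0) `openai` SDK and the `generate_prompt()` function mentioned above:

```python
import openai  # legacy (pre-1.0) SDK; reads OPENAI_API_KEY from the environment


def ask_chatgpt(text: str) -> tuple[str, str]:
    prompt = generate_prompt(text)  # assumed to be defined, as noted above
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for reproducibility
    )
    answer = response["choices"][0]["message"]["content"]
    # Return both the prompt and the answer so the app can display them.
    return prompt, answer
```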
And that's it! You can learn more about the different parameters in OpenAI's documentation, but things work well right out of the box.
With this code in place, we can design a simple web app. We need a field to enter some text, a button to process it, and a couple of output widgets. I prefer to have access to both the full model prompt and the output for debugging and exploration purposes.
The code for the entire application will look something like the sketch below and can be found in this GitHub repository. I have added a placeholder function called `toy_ask_chatgpt()`, since sharing the OpenAI key is not a good idea. Currently, this application simply copies the prompt into the output.
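A condensed sketch of the app (the prompt builder here is a stand-in; the repository holds the full version):

```python
import streamlit as st


def generate_prompt(text: str) -> str:
    # Hypothetical prompt builder; the real one holds the v6 prompt from earlier.
    return f"Summarize the job description delimited by <>.\n<{text}>"


def toy_ask_chatgpt(prompt: str) -> str:
    # Placeholder: echoes the prompt instead of calling the OpenAI API.
    return prompt


st.title("Job Description Summarizer")
job_description = st.text_area("Paste a job description here:")

if st.button("Summarize"):
    prompt = generate_prompt(job_description)
    st.subheader("Full prompt (for debugging)")
    st.text(prompt)
    st.subheader("Model output")
    st.write(toy_ask_chatgpt(prompt))
```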
Excluding function definitions and placeholders, it is only about 50 lines of code!
And thanks to a recent Streamlit update, the app can now be embedded right in this article! You should be able to see it right below.
Now you see how simple it is. If you wish, you can deploy your app with Streamlit Cloud. But be careful, since every request costs you money if you put your API key there!
In this blog post, I listed several best practices for prompt engineering. We discussed iterative prompt development, using separators, requesting structured output, Chain-of-Thought reasoning, and few-shot learning. I also provided you with a template to build a simple web app using Streamlit in under 100 lines of code. Now, it's your turn to come up with an exciting project idea and turn it into reality!
It is truly amazing how modern tools allow us to create complex applications in just a few hours. Even without extensive programming knowledge, proficiency in Python, or a deep understanding of machine learning, you can quickly build something useful and automate some tasks.
Don't hesitate to ask me questions if you're a beginner and want to create a similar project. I will be more than happy to assist you and respond as soon as possible. Best of luck with your projects!