
With the growing popularity of Large Language Models (LLMs), new research and advancements are being introduced almost every day. Powered by deep learning, LLMs are constantly evolving and spreading into every domain. LLMs are first trained on massive amounts of raw text and are then fine-tuned to improve their performance. Through fine-tuning, LLMs are trained on particular tasks, such as classification, question answering, and document summarization, using direct training signals that measure their performance on those tasks.
Recently, a new fine-tuning paradigm called LETI (Learning to Generate from Textual Interactions) has been introduced, which explores the potential of Large Language Models to learn from textual interactions and feedback. LETI enables language models to understand not only whether they were wrong but also why they were wrong. This approach enables LLMs to move beyond the limitations of learning solely from labels and scalar rewards.
The team of researchers behind the development of LETI has described how this approach provides textual feedback to the language model. It checks the correctness of the model’s outputs using binary labels, and it also identifies and explains errors in the generated code. The LETI paradigm is similar to the iterative process of software development, in which a developer writes a program, tests it, and improves it based on feedback. Similarly, LETI fine-tunes the LLM by providing textual feedback that pinpoints bugs and errors.
During fine-tuning, the model is prompted with a natural language problem description, after which it generates a set of solutions. A Solution Evaluator then evaluates these solutions against a set of test cases. The researchers used a Python interpreter as the Solution Evaluator, taking the error messages and stack traces produced by running the generated code as the source of textual feedback.
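The evaluation step described above can be illustrated with a minimal sketch. It is not the researchers' implementation: the function name `evaluate_solution`, the use of assert-style strings as test cases, and running everything in-process with `exec` are assumptions made for illustration. A real setup would presumably run untrusted generated code in a sandboxed process with a timeout.

```python
import traceback
from typing import List, Tuple


def evaluate_solution(program: str, test_cases: List[str]) -> Tuple[bool, str]:
    """Run a candidate program against assert-style test cases.

    Returns (passed, feedback). On failure, feedback holds the
    interpreter's error message and stack trace; on success it is empty.
    NOTE: hypothetical helper; exec() on untrusted code is unsafe and
    is used here only to keep the sketch short.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)       # define the candidate function(s)
        for test in test_cases:
            exec(test, namespace)      # each test case is one assert statement
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all land here.
        return False, traceback.format_exc()
    return True, ""


if __name__ == "__main__":
    buggy = "def add(a, b):\n    return a - b\n"
    passed, feedback = evaluate_solution(buggy, ["assert add(2, 3) == 5"])
    print(passed)
    print(feedback)
```

The string returned on failure is the kind of textual feedback the article refers to: it names the exception type and shows where in the generated code the failure occurred.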
The training data used for fine-tuning the model consists of three components: natural language instructions, LM-generated programs, and textual feedback. When a generated program fails to solve the problem, textual feedback is provided to the LLM. Otherwise, a reward token is provided to the model as binary feedback to encourage it to generate a correct solution. The generated textual feedback is then used in the fine-tuning process of the LM, known as Feedback-Conditioned Fine-Tuning.
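As a rough illustration of how the three components might be combined into a single training sequence, consider the sketch below. The token strings `<|good|>` and `<|bad|>`, the `Attempt` class, the ordering of the parts, and the choice to attach a binary token to failing attempts as well are all assumptions made for this example; the article does not specify the exact format.

```python
from dataclasses import dataclass

# Hypothetical reward tokens; the exact strings are an assumption.
GOOD_TOKEN = "<|good|>"
BAD_TOKEN = "<|bad|>"


@dataclass
class Attempt:
    instruction: str    # natural language problem description
    program: str        # LM-generated solution
    passed: bool        # binary outcome from the Solution Evaluator
    feedback: str = ""  # error message and stack trace, if any


def build_training_text(attempt: Attempt) -> str:
    """Concatenate binary feedback, instruction, textual feedback, and program.

    Textual feedback is included only for failing attempts, so a passing
    attempt carries just the reward token, the instruction, and the program.
    """
    parts = [GOOD_TOKEN if attempt.passed else BAD_TOKEN, attempt.instruction]
    if not attempt.passed and attempt.feedback:
        parts.append(attempt.feedback)
    parts.append(attempt.program)
    return "\n".join(parts)


if __name__ == "__main__":
    failing = Attempt(
        instruction="Write a function add(a, b) that returns the sum.",
        program="def add(a, b):\n    return a - b",
        passed=False,
        feedback="AssertionError",
    )
    print(build_training_text(failing))
```

Sequences built this way could then be fed to an ordinary language-modeling objective, so the model learns to associate the feedback it is conditioned on with the quality of the program that follows.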
For evaluation, the researchers used a code generation dataset called MBPP (Mostly Basic Programming Problems). The results show that LETI significantly improves the performance of two base LMs of different scales on the MBPP dataset without requiring ground-truth outputs for training. On the HumanEval dataset, LETI achieves similar or better performance than the base LMs on unseen problems. Furthermore, the researchers found that, compared with binary feedback alone, using textual feedback allows the model to reach the same performance with fewer gradient steps.
In conclusion, LETI is a promising fine-tuning approach that enhances language models through the use of detailed textual feedback. It enables them to learn from their mistakes and improve performance on tasks such as code generation.
Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.