This AI Paper from UCLA Introduces ‘SPIN’ (Self-Play fIne-tuNing): A Machine Learning Method to Convert a Weak LLM to a Strong LLM by Unleashing the Full Power of Human-Annotated Data

Large Language Models (LLMs) have ushered in a new era in the field of Artificial Intelligence (AI) through their exceptional natural language processing capabilities. From mathematical reasoning to code generation and even drafting legal opinions, LLMs find applications in almost every field. To align these models with desirable behavior, they are fine-tuned using techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). The difficulty, however, is that these methods require a large volume of human-annotated data, making the process resource-intensive and time-consuming.

In this research paper, researchers from UCLA set out to empower a weak LLM to improve its performance without requiring additional human-annotated data. They introduce a novel fine-tuning method called SPIN (Self-Play fIne-tuNing), which allows the model to engage in self-play, i.e., ‘playing’ against itself without requiring any direct supervision.

Previous works have addressed this problem, for example by self-training on synthetic data with binary feedback, or by employing a weak model to guide a stronger one. SPIN, however, is a more efficient approach that eliminates the need for human binary feedback and operates effectively with only one LLM.

The entire process can be seen as a two-player game in which the first player generates responses as close as possible to those in the human-annotated dataset, and the second player tries to distinguish the first player's responses from human-generated ones. The second player is obtained by fine-tuning the first to prefer responses from the target dataset over responses generated by the previous version of the model. In the next iteration, the models switch roles (generating responses and discerning them), and the process repeats until the LLM can no longer differentiate between responses generated by its previous version and those written by humans.
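The alternation described above can be sketched in a few lines of Python. This is a toy illustration under loose assumptions, not the authors' implementation: here a "model" is simply a function from prompt to response, and "fine-tuning" is mocked by memorising the preferred human response for each prompt, where a real run would take gradient steps on a preference loss.

```python
def spin_iterations(prompts, human_responses, model, num_iters=3):
    """Toy SPIN-style self-play loop.

    Each iteration: (1) the current model plays the generator role and
    produces synthetic responses; (2) the model is updated to prefer the
    human-annotated response over its own generation. The loop stops
    once the generations are indistinguishable from the targets.
    """
    for _ in range(num_iters):
        # Generator role: the current model produces synthetic responses.
        synthetic = {p: model(p) for p in prompts}

        # Convergence check: if synthetic responses already match the
        # human ones, the discriminator can no longer tell them apart.
        if all(synthetic[p] == human_responses[p] for p in prompts):
            break

        # Discriminator-driven update, mocked by a lookup table; a real
        # run would minimise a pairwise preference loss over parameters.
        table = dict(human_responses)
        model = lambda p, t=table: t.get(p, "")
    return model
```

In the real method both roles are played by the same LLM at successive iterations, and the update is an optimisation step on a preference objective rather than memorisation.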

The authors demonstrated the effectiveness of SPIN with an example. When an LLM was prompted to list the popular modes of transportation in Southampton, at iteration zero the model hallucinated and produced an incorrect distribution of transport modes. At the next iteration, however, it gave an answer that aligned much more closely with the ground truth.

The researchers evaluated the framework on a model derived from the pre-trained Mistral-7B and further fine-tuned on an SFT dataset. This base model was used to generate synthetic responses for 50K prompts randomly sampled from the dataset. The results show that SPIN improved the model's average score by 2.66% at iteration 0. In the next iteration, the LLM from the previous iteration was used to generate new responses for SPIN, which further improved the average score by 1.32%.
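The per-pair objective driving these iterative gains has a logistic, preference-style shape: the new model is rewarded for raising the log-likelihood of the human response, relative to the old model, more than it raises that of its own synthetic response. The sketch below is an assumption-laden stand-in, not the paper's exact formula; the names and the scale `beta` are mine, standing in for the paper's regularisation parameter.

```python
import math

def spin_pair_loss(logp_new_human, logp_old_human,
                   logp_new_synth, logp_old_synth, beta=0.1):
    """Logistic loss on one (human, synthetic) response pair.

    `margin` grows when the new model increases the likelihood of the
    human response (relative to the old model) more than it increases
    the likelihood of the synthetic one; the logistic loss
    log(1 + exp(-margin)) then shrinks toward zero.
    """
    margin = beta * ((logp_new_human - logp_old_human)
                     - (logp_new_synth - logp_old_synth))
    return math.log1p(math.exp(-margin))
```

With no change between models the margin is zero and the loss sits at log 2; shifting probability mass toward the human response lowers it.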

In conclusion, SPIN is a novel framework that converts a weak LLM into a strong one without the need for an expert human annotator. Using a self-play mechanism, it significantly improved the performance of a model fine-tuned on an SFT dataset. Their approach has a few limitations, though, which place a ceiling on the performance of the fine-tuned LLM. This issue could be resolved by dynamically changing the target data distribution, a topic the researchers leave for future work.

Check out the Paper. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily comprehensible to a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.


