
Researchers from AI2 and the University of Washington Uncover the Superficial Nature of Alignment in LLMs and Introduce URIAL: A Novel Tuning-Free Method


Large Language Models (LLMs) are recent innovations in the field of Artificial Intelligence (AI) and Deep Learning. Many well-known LLMs, such as GPT, PaLM, and LLaMA, have demonstrated incredible potential in generating content. From question answering and text summarization to language translation and code completion, these models can do a great deal. These models, including ChatGPT, have undergone extensive pre-training on vast unsupervised text corpora. However, recent studies have suggested that the commonly adopted practice of fine-tuning may not be as essential as previously thought.

Alignment tuning, the process of improving base LLMs for use as open-domain AI assistants, has been accepted as the industry standard. It includes Reinforcement Learning from Human Feedback (RLHF) and Supervised Fine-Tuning (SFT). This standard was questioned by a study called LIMA, which showed that as few as 1,000 samples for SFT may be sufficient to achieve meaningful alignment performance.

The Superficial Alignment Hypothesis, put forth by LIMA, proposed that alignment tuning, rather than radically changing base LLMs' behavior, may instead train them to choose particular data formats for user engagement. This showed that just a few examples can produce high-quality, aligned models under supervised fine-tuning.

Since not enough research has been done to find solid support for the superficial alignment theory, a team of researchers from the Allen Institute for Artificial Intelligence and the University of Washington has examined the widely used strategy of alignment tuning in a recent paper, a strategy that turns base LLMs into useful AI assistants for the open domain. In this strategy, preference tuning is achieved through reinforcement learning from human feedback, and instruction learning through supervised fine-tuning.

The team has examined the shift in token distribution between base LLMs and their aligned counterparts, such as Llama-2 and Llama-2-chat, in order to study the impact of alignment tuning. They have discovered that base LLMs and their aligned versions share the top-ranked tokens and perform nearly identically in decoding at most token positions. Discourse markers and safety disclaimers are examples of the style tokens that show the most distribution fluctuation. This study has provided compelling evidence for the hypothesis that alignment tuning mostly concentrates on assimilating the linguistic style of AI assistants, with the base LLMs supplying the knowledge required to answer user queries.

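To make the analysis concrete, here is a minimal sketch (not the authors' released code) of this kind of token-distribution comparison: decode a response with the aligned model, then check how the base model ranks each token the aligned model chose. Positions where the chosen token falls outside the base model's top ranks are the "shifted" positions. The model names assume the Hugging Face Llama-2 checkpoints, and the top-3 cutoff is an illustrative choice.

```python
# Sketch: compare base vs. aligned token rankings (assumes HF Llama-2 weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id, aligned_id = "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-7b-chat-hf"
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
aligned = AutoModelForCausalLM.from_pretrained(aligned_id, torch_dtype=torch.float16)

prompt = "How do I improve my sleep schedule?"
inputs = tok(prompt, return_tensors="pt")

# Greedy-decode an answer with the aligned (chat) model.
with torch.no_grad():
    out = aligned.generate(**inputs, max_new_tokens=64, do_sample=False)

# Score the full sequence once with the base model.
full = out[0]
with torch.no_grad():
    base_logits = base(full.unsqueeze(0)).logits[0]  # [seq_len, vocab]

shifted = 0
start = inputs["input_ids"].shape[1]
for pos in range(start, full.shape[0]):
    chosen = full[pos].item()
    # Logits at pos-1 predict the token at pos; rank = number of tokens the
    # base model scores strictly higher than the aligned model's choice.
    rank = (base_logits[pos - 1] > base_logits[pos - 1][chosen]).sum().item()
    if rank >= 3:  # chosen token falls outside the base model's top 3
        shifted += 1

print(f"{shifted}/{full.shape[0] - start} tokens shifted outside the base top-3")
```

Under the paper's finding, the printed fraction should be small, with the shifted positions concentrated on stylistic tokens such as discourse markers and safety disclaimers.
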
The team has also posed a research question in response to these findings: to what extent can base LLMs be aligned without SFT or RLHF? They have proposed URIAL (Untuned LLMs with Restyled In-context Alignment), an alignment technique that requires no tuning. With just three constant stylistic examples and a system prompt, URIAL accomplishes effective alignment solely through in-context learning (ICL) with base LLMs.

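The structure of such a prompt is easy to picture. Below is a minimal sketch of a URIAL-style prompt builder: a fixed system prompt followed by a few stylistic (query, answer) pairs, fed to an untuned base LLM as plain in-context input. The system prompt, example texts, and "# Query:"/"# Answer:" markers here are illustrative placeholders, not the exact prompts from the paper.

```python
# Sketch: assemble a URIAL-style in-context prompt (illustrative contents only).
SYSTEM = (
    "Below is a conversation between a curious user and a helpful, "
    "detailed, and polite AI assistant."
)

# A few constant stylistic examples (the paper uses as few as three).
EXAMPLES = [
    ("What is the capital of France?",
     "The capital of France is Paris. It is also the country's largest city..."),
    ("Can you explain what photosynthesis is?",
     "Certainly! Photosynthesis is the process by which plants convert..."),
    ("How do I make a simple pasta dish?",
     "Here is an easy recipe. 1. Boil salted water. 2. Cook the pasta..."),
]

def build_urial_prompt(query: str) -> str:
    """Build the in-context prompt that stands in for SFT/RLHF alignment."""
    parts = [SYSTEM, ""]
    for q, a in EXAMPLES:
        parts += [f"# Query:\n{q}", f"# Answer:\n{a}", ""]
    parts += [f"# Query:\n{query}", "# Answer:\n"]
    return "\n".join(parts)

print(build_urial_prompt("How can I improve my time management skills?"))
```

Because the examples and system prompt never change, the base model's weights are untouched; the alignment effect comes entirely from the restyled context.
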
On a set of examples dubbed just-eval-instruct, the team has provided a detailed and interpretable evaluation showing that base LLMs with URIAL can perform on par with or better than LLMs aligned with SFT (Mistral-7b-Instruct) or SFT+RLHF (Llama-2-70b-chat). The results demonstrate that deliberate prompting and in-context learning can dramatically close the gap between tuning-free and tuning-based alignment strategies.

In conclusion, the evaluation results highlight the shallow nature of alignment tuning, showing that it mostly entails adopting linguistic styles and depends on the preexisting knowledge of the base LLMs.


Check out the Paper and Project. All credit for this research goes to the researchers of this project. Also, don't forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.


Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.


