Home Community This AI Research from DeepMind Goals at Reducing Sycophancy in Large Language Models (LLMs) Using Easy Synthetic Data

This AI Research from DeepMind Goals at Reducing Sycophancy in Large Language Models (LLMs) Using Easy Synthetic Data

0
This AI Research from DeepMind Goals at Reducing Sycophancy in Large Language Models (LLMs) Using Easy Synthetic Data

Large Language Models (LLMs) have developed significantly in recent times and at the moment are able to handling difficult tasks that decision for reasoning. Numerous researches, including those by OpenAI and Google, have emphasized quite a bit on these developments. LLMs have revolutionized the best way humans interact with machines and is considered one of the best advancements in the sector of Artificial Intelligence (AI). Researchers have been putting in efforts to research the phenomena of sycophancy, which is the term for an unfavorable behavior shown by language models through which these models modify their responses to coincide with the point of view of a human user, even when that viewpoint shouldn’t be objectively right.

The behavior can involve a model adopting liberal beliefs simply because a user self-identifies as liberal. Research has been done on emphasizing and examining the frequency of sycophancy inside language models and suggesting a fairly easy synthetic-data-based technique to curtail this behavior. To deal with that, a team of researchers from Google DeepMind has examined three different sycophancy tasks to look at the sycophancy phenomenon. These assignments entail asking models for his or her thoughts on topics for which there isn’t a single, undeniable right or unsuitable response, including those pertaining to politics.

The evaluation has revealed an interesting pattern: in PaLM models, which might have as much as 540 billion parameters, each the model’s size and the practice of instruction adjusting significantly boost sycophantic behavior. By analyzing the identical behavior within the setting of straightforward addition statements, the research has gone beyond the fundamental scope of sycophancy tasks and has added a brand new dimension. Despite the proven fact that these added claims are intentionally inaccurate, language models have shown a propensity to agree with them when users signal their agreement. This finding highlights how persistent sycophancy could also be, even when models are aware of their very own shortcomings.

The research has presented a comparatively straightforward but successful technique centered on synthetic data intervention to handle the issue of sycophancy. This intervention makes use of Natural Language Processing (NLP) activities in these tasks to strengthen the model’s resistance to user opinions which might be freely accessible to the general public. A notable decrease in sycophantic behavior has been achieved by incorporating this synthetic data through a fast fine-tuning procedure, especially when tested on novel cues.

The findings have been summarized as follows –

Construct your personal brand with Taplio! 🚀 The first AI-powered tool to grow on LinkedIn (Sponsored)
  1. Model size and instruction tuning increase sycophancy – Models that were instruction-tuned or had more parameters were more more likely to replicate a simulated user’s perspective when asked for opinions on topics without definitive answers, including politics.
  1. Models could also be complacent about incorrect responses – When there isn’t any user opinion, models accurately disagree with wildly incorrect claims, similar to 1 + 1 = 956446. Models also switch their previously accurate responses to follow the user in the event that they agree with the user incorrectly.
  1. Sycophancy may be decreased with an easy synthetic-data intervention, which might improve models on prompts where a claim’s truthfulness is unrelated to the user’s perception of it.

In conclusion, this approach addressed the problem of a language model repeating a user’s opinion, even when that opinion is unsuitable. High-quality-tuning using easy synthetic data has been shown to cut back this trait.


Take a look at the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to hitch our 28k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the newest AI research news, cool AI projects, and more.


Tanya Malhotra is a final 12 months undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and significant considering, together with an ardent interest in acquiring latest skills, leading groups, and managing work in an organized manner.


🔥 Use SQL to predict the long run (Sponsored)

LEAVE A REPLY

Please enter your comment!
Please enter your name here