
Researchers from MIT and Microsoft Introduce DoLa: A Novel AI Decoding Strategy Aimed toward Reducing Hallucinations in LLMs


Numerous natural language processing (NLP) applications have benefited greatly from large language models (LLMs). While LLMs have improved in performance and gained additional capabilities as they have been scaled up, they still have a problem with "hallucinating," or producing content inconsistent with the real-world facts seen during pre-training. This represents a major barrier to adoption in high-stakes applications (such as clinical and legal settings), where the generation of trustworthy text is essential.

The maximum likelihood language modeling objective, which minimizes the forward KL divergence between the data distribution and the model distribution, may be partly responsible for LMs' hallucinations, although this is far from certain. Because the forward KL is mass-covering, a model trained with this objective may assign non-zero probability to sentences that are not fully consistent with the knowledge encoded in its training data.
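Concretely, maximizing the expected log-likelihood of the training data is equivalent, up to a constant that does not depend on the model, to minimizing the forward KL divergence:

```latex
\max_{\theta}\; \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\theta}(x)\right]
\;\Longleftrightarrow\;
\min_{\theta}\; \mathrm{KL}\!\left(p_{\text{data}} \,\|\, p_{\theta}\right)
= \min_{\theta}\; \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\theta}(x)}\right].
```

The mass-covering behavior follows from this form: wherever the data assigns probability, driving the model's probability toward zero makes the KL term blow up, so the model is pushed to spread probability mass over every pattern in the data, plausible or not.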

From the perspective of model interpretability, studies have shown that the earlier layers of transformer LMs encode "lower-level" information (such as part-of-speech tags), while the later layers encode more "semantic" information.

A group of researchers at MIT and Microsoft propose exploiting this modular encoding of knowledge to surface more of the LM's factual knowledge via a contrastive decoding strategy, where the output probability of the next token is computed from the difference in logits between a higher (mature) layer and a lower (premature) layer. By prioritizing information from the deeper layers and downplaying that from intermediate or shallower ones, it is possible to make LMs more factually grounded and cut down on hallucinations.
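Below is a minimal PyTorch sketch of the layer-contrasting idea, assuming a Hugging Face-style causal LM that can return per-layer hidden states. The helper name `dola_next_token_logits`, the fixed `premature_layer=16`, and the plausibility threshold `alpha=0.1` are illustrative assumptions; the actual DoLa method selects the premature layer dynamically at each decoding step (via a divergence measure between the layers' output distributions) rather than fixing it.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def dola_next_token_logits(model, input_ids, premature_layer=16, alpha=0.1):
    """Contrast next-token predictions from a mature (final) layer and a
    premature (earlier) layer of a Hugging Face-style causal LM."""
    outputs = model(input_ids, output_hidden_states=True)
    hidden = outputs.hidden_states  # tuple: embeddings + one state per layer

    # Project both hidden states through the shared LM head to get logits.
    # (For faithfulness to early exit, the model's final layer norm should
    # also be applied to the premature state; omitted here for brevity.)
    lm_head = model.get_output_embeddings()
    mature = F.log_softmax(lm_head(hidden[-1][:, -1, :]), dim=-1)
    premature = F.log_softmax(lm_head(hidden[premature_layer][:, -1, :]), dim=-1)

    # Contrast: reward tokens whose probability grows as depth increases.
    contrast = mature - premature

    # Adaptive plausibility constraint: keep only tokens the mature layer
    # already deems reasonably likely (within a factor alpha of its top token).
    threshold = mature.max(dim=-1, keepdim=True).values + torch.log(torch.tensor(alpha))
    return contrast.masked_fill(mature < threshold, float("-inf"))
```

Greedy decoding would then simply take the argmax of the returned contrasted logits at each step, e.g. `next_token = dola_next_token_logits(model, ids).argmax(dim=-1)`.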

Their new work introduces Decoding by Contrasting Layers (DoLa), a novel decoding approach designed to better expose the factual knowledge already encoded in an LLM, without retrieving external knowledge or performing additional fine-tuning.

DoLa has been shown experimentally to improve the truthfulness of LLaMA-family models on both TruthfulQA and FACTOR. Additional experiments on chain-of-thought reasoning over StrategyQA and GSM8K demonstrate its potential to improve factual reasoning. Finally, experimental results on open-ended text generation (evaluated with GPT-4) show that DoLa can generate informative and significantly more factual responses that earn higher ratings than the original decoding approach. As a pure decoding method, DoLa can increase the truthfulness of LLMs while adding only a small amount of latency to the decoding process.

The researchers did not investigate the model's performance in other domains, such as following instructions or learning from human feedback. In addition, rather than leveraging human labels or external factual knowledge sources for fine-tuning, the method relies on the model's pre-existing architecture and parameters, which limits the scope of possible improvements. Unlike retrieval-augmented LMs, this approach depends entirely on the model's internal knowledge rather than adding new information through external retrieval modules. The team hopes future work will combine these components with their decoding technique to help overcome these limitations.


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project. Also, don't forget to join our 30k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.



Dhanshree Shenwai is a Computer Science Engineer with solid experience in FinTech companies spanning the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier in today's evolving world.


