This AI Paper from the University of Washington, CMU, and Allen Institute for AI Unveils FAVA: The Next Leap in Detecting and Editing Hallucinations in Language Models

Large Language Models (LLMs), among the most notable recent developments in Artificial Intelligence (AI), have gained massive popularity. Thanks to their human-like ability to answer questions, complete code, summarize long passages of text, and more, these models have pushed Natural Language Processing (NLP) and Natural Language Generation (NLG) to an impressive extent.

Though these models have shown impressive capabilities, challenges remain around producing content that is factually correct as well as fluent. LLMs can generate highly realistic and cohesive text, but they also tend at times to produce factually false information, i.e., hallucinations. These hallucinations can hamper the practical use of these models in real-world applications.

Previous studies of hallucination in Natural Language Generation have frequently focused on settings in which a reference text is available, examining how closely the generated text adheres to that reference. However, concerns have also been raised about hallucinations that arise when the model relies on its own stored facts and general knowledge rather than on a specific source text.

To address this, a team of researchers has recently released a study on a novel task: automatic fine-grained hallucination detection. The team proposes a comprehensive taxonomy consisting of six hierarchically defined types of hallucination, along with automated systems for detecting and editing them.

Existing systems often focus on particular domains or kinds of errors, oversimplifying factual errors into binary categories such as factual or not factual. This oversimplification fails to capture the variety of hallucination types, such as entity-level contradictions or the invention of entities that have no real-world existence. To overcome these drawbacks, the team suggests a more detailed approach to hallucination identification, introducing a new task, benchmark, and model.

The objectives are precise detection of hallucinated spans, differentiation of error types, and suggestions for possible corrections. The team focuses on hallucinations in information-seeking contexts, where grounding in world knowledge is essential. They also provide a novel taxonomy that divides factual errors into six types.
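To make the span-level framing concrete, here is a minimal sketch of how such fine-grained annotations could be represented in Python. The six category names, the example sentence, and the data classes are illustrative assumptions for this article, not the paper's exact taxonomy labels or tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HallucinationType(Enum):
    # Illustrative six-way taxonomy; the names are assumptions, not the paper's exact labels.
    ENTITY = "entity"                # a wrong entity contradicted by world knowledge
    RELATION = "relation"            # a wrong relation between otherwise correct entities
    CONTRADICTORY = "contradictory"  # a whole statement contradicted by evidence
    INVENTED = "invented"            # a concept or entity with no real-world existence
    SUBJECTIVE = "subjective"        # an opinion presented as a verifiable fact
    UNVERIFIABLE = "unverifiable"    # a claim that cannot be checked against any source


@dataclass
class HallucinationSpan:
    start: int                     # character offset where the erroneous span begins
    end: int                       # character offset where it ends (exclusive)
    error_type: HallucinationType
    suggested_edit: Optional[str]  # replacement text, or None if the span should be removed


# A toy annotated response (the sentence and its errors are made up for illustration).
response = "The Eiffel Tower, completed in 1899, stands in Berlin."
annotations = [
    HallucinationSpan(response.index("1899"), response.index("1899") + 4,
                      HallucinationType.ENTITY, "1889"),
    HallucinationSpan(response.index("Berlin"), response.index("Berlin") + 6,
                      HallucinationType.ENTITY, "Paris"),
]
```

Span-level annotations of this kind, rather than a single binary factual/not-factual label per response, are what distinguish the fine-grained task from earlier detection setups.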

The team presents a new benchmark that includes human judgments on outputs from two language models (LMs), ChatGPT and Llama2-Chat 70B, across multiple domains to support the evaluation of fine-grained hallucination identification. The benchmark study shows that a substantial percentage of ChatGPT and Llama2-Chat outputs, 60% and 75% respectively, contain hallucinations.

On this benchmark, ChatGPT and Llama2-Chat averaged 1.9 and 3.4 hallucinations per response, respectively. A large proportion of these hallucinations also fall into categories that have not been properly examined before: errors other than entity-level mistakes, such as fabricated concepts or unverifiable statements, were present in more than 60% of LM-generated hallucinations.

The team also trained FAVA, a retrieval-augmented LM, as a possible solution. The training procedure involved carefully constructing synthetic data to teach the model to identify and correct fine-grained hallucinations. Both automated and human evaluations on the benchmark show that FAVA outperforms ChatGPT at fine-grained hallucination identification. FAVA’s proposed edits detect hallucinations and improve the factuality of LM-generated text at the same time, yielding 5–10% FActScore improvements.
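The retrieval-augmented detect-and-edit loop can be pictured roughly as follows. This is a conceptual sketch under stated assumptions, not FAVA’s actual implementation: the `retrieve` and `editor_lm` callables are hypothetical placeholders for a passage retriever and a fine-tuned editing model, and the prompt wording is invented for illustration.

```python
from typing import Callable, List


def detect_and_edit(
    response: str,
    retrieve: Callable[[str], List[str]],  # hypothetical: query -> ranked evidence passages
    editor_lm: Callable[[str], str],       # hypothetical: prompt -> tagged and edited text
    top_k: int = 5,
) -> str:
    """Retrieval-augmented hallucination detection and editing (conceptual sketch only).

    1. Retrieve passages relevant to the model response.
    2. Ask an editing LM to mark unsupported spans with fine-grained error tags and
       propose corrections, conditioned on the retrieved evidence.
    """
    passages = retrieve(response)[:top_k]
    evidence = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Using the evidence passages below, mark every unsupported span in the "
        "response with an error-type tag (e.g. <entity>...</entity>) and insert a "
        "corrected span where one can be recovered from the evidence.\n\n"
        f"Evidence:\n{evidence}\n\nResponse:\n{response}\n\nEdited response:"
    )
    return editor_lm(prompt)
```

In this framing, metrics such as FActScore can then be computed on the response before and after editing to measure how much the factuality has improved.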

In conclusion, this study proposes the novel task of automatic fine-grained hallucination identification to address the widespread problem of hallucination in text generated by language models. The paper’s thorough taxonomy and benchmark provide insight into the degree of hallucination in popular LMs. FAVA, the proposed retrieval-augmented LM, shows promising results in detecting and correcting fine-grained hallucinations, highlighting the need for further work in this area.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t forget to join our Telegram Channel.


Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

