
Can We Align LLMs to Honesty via Instruction Fine-Tuning? Addressing Hallucination in Large Language Models with Refusal-Aware Instruction Tuning


Researchers from the Hong Kong University of Science and Technology and the University of Illinois Urbana-Champaign have collaborated to address a challenge faced by large language models (LLMs) known as hallucination, where these models generate non-existent facts, by introducing a novel approach called Refusal-Aware Instruction Tuning (R-Tuning). Their observation of existing instruction tuning methods is that models are often compelled to complete an answer even when there is a knowledge gap, which results in the generation of inaccurate information.

The core idea of R-Tuning is to recognize the gap between the parametric knowledge of LLMs and the instruction tuning data, and then to construct a refusal-aware dataset by identifying uncertain questions and training the model to explicitly refuse to answer questions beyond its parametric knowledge. This two-step process involves measuring the knowledge gap by comparing model predictions with ground-truth labels, and constructing refusal-aware data by appending uncertainty expressions to uncertain questions.
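As a rough illustration (not the authors' released code), the construction might look like the following minimal Python sketch, where `build_refusal_aware_data` and `generate_answer` are hypothetical names and the exact prompt templates and uncertainty expressions in the paper may differ:

```python
def build_refusal_aware_data(dataset, generate_answer):
    """Step 1: compare the model's own prediction with the ground truth.
    Step 2: append an uncertainty expression reflecting that comparison.

    `dataset` is an iterable of (question, gold_answer) pairs;
    `generate_answer` is a hypothetical callable running one greedy
    decoding pass of the pre-trained model."""
    refusal_aware = []
    for question, gold_answer in dataset:
        prediction = generate_answer(question)
        if prediction.strip().lower() == gold_answer.strip().lower():
            suffix = "I am sure."    # within the model's parametric knowledge
        else:
            suffix = "I am unsure."  # knowledge gap: teach the model to hedge
        refusal_aware.append({
            "prompt": question,
            "completion": f"{gold_answer} {suffix}",
        })
    return refusal_aware
```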

The researchers conducted both single-task and multi-task experiments on nine datasets: ParaRel, HotpotQA, SelfAware, HaluEval, FalseQA, NEC, MMLU, WiCE, and FEVER. In single-task experiments, R-Tuning demonstrated a remarkable ability to refuse uncertain questions, resulting in improved accuracy on questions within the model's knowledge. In multi-task experiments, R-Tuning showcased its refusal ability as a meta-skill, providing benefits on both in-domain and out-of-domain datasets.

Comparisons with baseline models, including Pretrain-T, Pretrain-W, and Vanilla fine-tuning, revealed that R-Tuning consistently achieved higher Average Precision (AP) scores. The results indicated that R-Tuning effectively reduced hallucination by filtering out questions beyond the model's knowledge. Moreover, the study explored the impact of model size on refusal ability, showing that larger models demonstrated better scalability and performance.
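For intuition on the metric, AP treats evaluation as a ranking problem: a well-calibrated model should place the questions it answers correctly above those it gets wrong when ranked by its own confidence. A minimal sketch with scikit-learn, using illustrative dummy values rather than the paper's results:

```python
from sklearn.metrics import average_precision_score

# Illustrative values only: each entry of `correct` marks whether a
# question was answered correctly, and each confidence is the model's
# certainty in its own answer to that question.
correct     = [1, 1, 0, 1, 0, 0]
confidences = [0.92, 0.85, 0.80, 0.60, 0.35, 0.10]

# AP rewards rankings that place correct answers above incorrect ones.
print(average_precision_score(correct, confidences))
```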

Surprisingly, the researchers found that learning uncertainty during training yielded better results than directly applying uncertainty filtering on test data. This finding suggests that uncertainty learning improves the model's ability both to estimate its own uncertainty and to answer questions, highlighting the benefits of incorporating uncertainty learning into LLM training. They also explored unsupervised identification strategies and label replacement methods within R-Tuning, showing that uncertainty-based identification and direct label replacement were effective approaches.
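One common way to realize uncertainty-based identification, offered here as an assumption-laden sketch rather than the paper's exact procedure, is to sample several answers to the same question and use the entropy of the answer distribution as the certainty signal; `sample_answer` below is a hypothetical callable that draws one sampled answer from the model:

```python
import math
from collections import Counter

def answer_entropy(question, sample_answer, k=8):
    """Entropy of k sampled answers: low entropy (consistent answers)
    suggests the question lies within the model's knowledge; high
    entropy suggests it should be labeled as uncertain."""
    answers = [sample_answer(question).strip().lower() for _ in range(k)]
    counts = Counter(answers)
    return -sum((c / k) * math.log(c / k) for c in counts.values())
```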

Moreover, R-Tuning successfully addressed unanswerable questions, refusing to provide answers to queries that contradicted common sense or lay beyond the model's knowledge. The in-depth analysis included examining the perplexity of refused questions and the entropy of answers, providing insight into how R-Tuning improves the model's handling of questions at different levels of randomness and difficulty.
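For reference, the perplexity of a question can be computed in the standard way as the exponential of the mean token-level negative log-likelihood; below is a sketch using Hugging Face transformers, with GPT-2 as an illustrative stand-in for the models studied:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def question_perplexity(text: str) -> float:
    """exp of the mean negative log-likelihood the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL over tokens
    return torch.exp(loss).item()
```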

In conclusion, the researchers introduced R-Tuning as a robust method for teaching LLMs to refuse unknown questions, addressing the challenge of hallucination and improving model accuracy. The refusal ability demonstrated by R-Tuning was identified as a meta-skill that can be generalized across tasks, showcasing its potential impact on the reliability and performance of large language models.


Check out the Paper. All credit for this research goes to the researchers of this project.


Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in software and data science applications, and is always reading about the latest developments in AI and ML.


