
Tencent AI Lab researchers address challenges in the reliability of retrieval-augmented language models (RALMs), which can retrieve irrelevant information and produce misguided responses. Their proposed approach, CHAIN-OF-NOTING (CON), aims to improve RALM robustness. CON-equipped RALMs exhibit substantial performance improvements across open-domain QA benchmarks, achieving notable gains in Exact Match (EM) scores and in rejection rates for out-of-scope questions.
The research addresses limitations in RALMs, emphasizing noise robustness and reduced dependence on retrieved documents. The CON approach generates sequential reading notes for retrieved documents, enabling a comprehensive relevance evaluation. The case studies highlight that CON enhances the model’s understanding of document relevance, leading to more accurate, contextually relevant responses by filtering out irrelevant or less trustworthy content.
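To make the "sequential reading notes" idea concrete, here is a minimal sketch of how a CON-style prompt might be assembled, asking the model to write one note per retrieved document before answering. The template wording and the function name `build_con_prompt` are assumptions for illustration, not the paper's exact prompt.

```python
def build_con_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt that requests per-document reading notes
    followed by a final answer (hypothetical template)."""
    parts = [f"Question: {question}", ""]
    # Number each retrieved document so notes can reference them.
    for i, doc in enumerate(documents, start=1):
        parts.append(f"Document [{i}]: {doc}")
    parts.append("")
    parts.append(
        "Task: Write a reading note for each document, assessing its "
        "relevance to the question, then give the final answer "
        "(or say 'unknown' if no document is relevant)."
    )
    return "\n".join(parts)
```

The key design point is that note-taking happens before answer generation, so the model must commit to a relevance judgment for each document rather than blending all retrieved text into the answer directly.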
Outperforming standard RALMs, CON achieves higher Exact Match scores and rejection rates for out-of-scope questions. It balances direct retrieval, inferential reasoning, and acknowledging knowledge gaps, resembling human information processing. CON's implementation involves designing reading notes, collecting training data, and training the model, addressing current RALM limitations and enhancing reliability.
CON, a framework that generates sequential reading notes for retrieved documents, enhances the performance of RALMs. Trained on a LLaMa-2 7B model with ChatGPT-created training data, CON outperforms standard RALMs, especially in high-noise scenarios. It classifies reading notes into three cases, direct answers, useful context, and unknown scenarios, demonstrating a robust mechanism for assessing document relevance. Comparisons with the LLaMa-2 w/o IR baseline showcase CON's ability to filter out irrelevant content, improving response accuracy and contextual relevance.
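The three-way classification of reading notes can be sketched as a simple decision flow. The data shapes, relevance labels, and helper names below are assumptions made for illustration; in the actual system the notes are free text produced by the language model itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReadingNote:
    doc_id: int
    note: str                      # free-text note about the document
    relevance: str                 # "direct_answer", "useful_context", or "irrelevant"
    answer: Optional[str] = None   # filled in when the note yields an answer

def synthesize_answer(notes: list[ReadingNote]) -> str:
    """Combine per-document reading notes into a final response."""
    # Case 1: some document directly answers the question.
    for n in notes:
        if n.relevance == "direct_answer" and n.answer:
            return n.answer
    # Case 2: no direct answer, but useful context supports an inference.
    context = [n.note for n in notes if n.relevance == "useful_context"]
    if context:
        return "Inferred from context: " + " ".join(context)
    # Case 3: every document is irrelevant -> acknowledge the knowledge gap.
    return "unknown"
```

This mirrors the balance the article describes: answer directly when possible, infer from partial evidence when necessary, and admit "unknown" rather than hallucinate when retrieval fails entirely.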
RALMs equipped with CON show substantial improvements, achieving a remarkable +7.9 average increase in EM score when the retrieved documents are entirely noisy. CON also exhibits a notable +10.5 improvement in rejection rates for real-time questions beyond its pre-training knowledge. Evaluation metrics include EM score, F1 score, and rejection rate for open-domain QA. Case studies highlight CON's efficacy in deepening RALMs' understanding, addressing the challenge of noisy, irrelevant documents, and improving overall robustness.
The CON framework significantly enhances RALMs. By generating sequential reading notes for retrieved documents and integrating that information into the final answer, RALMs equipped with CON outperform standard RALMs, showing a notable average improvement. CON addresses the constraints of standard RALMs, fostering a deeper understanding of relevant information and improving overall performance on various open-domain QA benchmarks.
Future research may extend the CON framework to diverse domains and tasks, evaluating its generalizability and efficacy in strengthening RALMs. Investigating varied retrieval strategies and document ranking methods could optimize the retrieval process and improve the relevance of retrieved documents. User studies should assess the usability of, and satisfaction with, RALMs equipped with CON in real-world scenarios, considering response quality and trustworthiness. Exploring additional external knowledge sources and combining CON with techniques such as pre-training or fine-tuning could further enhance RALM performance and flexibility.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
If you like our work, you will love our newsletter.
Hello, my name is Adnan Hassan. I'm a consulting intern at Marktechpost and soon to be a management trainee at American Express. I'm currently pursuing a dual degree at the Indian Institute of Technology, Kharagpur. I'm passionate about technology and want to create new products that make a difference.