Will LLMs Replace Knowledge Graphs? Meta Researchers Propose ‘Head-to-Tail’: A New Benchmark to Measure the Factual Knowledge of Large Language Models

Large Language Models have garnered widespread appreciation for their remarkable capabilities: they can imitate humans and generate content much as a human would. Pre-trained large language models (LLMs), such as ChatGPT and LLaMA, have demonstrated an astounding aptitude for understanding context and responding to common queries, and several studies have shown that they internalize knowledge and can answer questions about it. Yet despite this progress, LLMs often lack a nuanced understanding of domain-specific details and are prone to producing misinformation, known as hallucinations. This highlights the significant obstacles to improving LLM accuracy and reducing the incidence of hallucinated responses.

Discussion around LLMs has largely focused on three main areas: reducing hallucinations in LLM-generated responses, improving the factual accuracy of LLMs, and speculating on whether LLMs might eventually replace knowledge graphs (KGs) as a way of storing world knowledge in symbolic form. Recently, a team of researchers from Meta Reality Labs took a fresh approach to these questions by attempting to determine how much factual knowledge LLMs actually possess.

In asking how well-versed LLMs are in factual knowledge, the team discusses two difficulties. First, the knowledge contained inside an LLM is hard to query directly: even when a fact is encoded in the model's parameters, hallucinations can arise from a lack of understanding or a malfunctioning generative process. The study therefore uses answer correctness as a metric to roughly gauge the knowledge inside an LLM, assessing the model's ability to answer clear, unambiguous questions such as "Where was basketball player Michael Jordan born?" The LLM is also instructed to give succinct responses and to admit uncertainty by answering "unsure" when its confidence is low.
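This three-way outcome (correct, "unsure", or a confident wrong answer) can be turned into simple metrics. The sketch below is illustrative, not the paper's actual evaluation code: it uses exact-match comparison, whereas the paper relies on a more forgiving automated grading method, and the function and metric names are assumptions.

```python
def evaluate_qa(predictions, gold_answers):
    """Score LLM answers under a three-way scheme (sketch):
    correct, missing (the model answered "unsure"), or hallucinated
    (a confident wrong answer). Exact string match is a simplification."""
    correct = missing = hallucinated = 0
    for pred, gold in zip(predictions, gold_answers):
        answer = pred.strip().lower()
        if answer == "unsure":
            missing += 1
        elif answer == gold.strip().lower():
            correct += 1
        else:
            hallucinated += 1
    n = len(gold_answers)
    return {
        "accuracy": correct / n,
        "missing_rate": missing / n,
        "hallucination_rate": hallucinated / n,
    }

# Hypothetical example: one correct answer, one admission of
# uncertainty, and one hallucination.
metrics = evaluate_qa(
    ["Brooklyn, New York", "unsure", "Paris"],
    ["brooklyn, new york", "1963", "london"],
)
```

The point of separating "missing" from "hallucinated" is that an LLM that says "unsure" is behaving far better than one that confidently invents an answer, and a benchmark should reward that distinction.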

Second, there is no readily accessible benchmark that accurately reflects the diversity of user interests or the breadth of knowledge in the world. Even the most comprehensive knowledge graphs have gaps, particularly when it comes to less well-known facts, and the query logs of major LLMs and search engines are not publicly available.

To address these limitations, the team introduced a benchmark called "Head-to-Tail." It consists of 18,000 question-answer (QA) pairs divided into head, torso, and tail facts based on the popularity of their respective subjects, with the categories reflecting different levels of public familiarity. To assess the knowledge held by LLMs, the team also devised an automated evaluation method and a set of metrics that closely reflect the breadth of knowledge an LLM has competently assimilated.
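One way to realize such a popularity-based split is to rank subjects by a popularity signal (for example, page views) and cut the ranking so that each bucket covers roughly a third of total popularity. This is a simplified sketch under that assumption, not the paper's exact procedure; the input mapping of entity to popularity score is hypothetical.

```python
def split_head_torso_tail(entities):
    """Partition entities into head/torso/tail buckets so that each bucket
    accounts for roughly one third of the total popularity mass.
    `entities` maps entity name -> popularity score (assumed input)."""
    ranked = sorted(entities.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(score for _, score in ranked)
    buckets = {"head": [], "torso": [], "tail": []}
    cumulative = 0.0
    for name, score in ranked:
        # Bucket is decided by where this entity's popularity mass begins.
        if cumulative < total / 3:
            buckets["head"].append(name)
        elif cumulative < 2 * total / 3:
            buckets["torso"].append(name)
        else:
            buckets["tail"].append(name)
        cumulative += score
    return buckets
```

With a skewed popularity distribution, a few very popular subjects end up in the head bucket while the long tail of obscure subjects fills the tail bucket, mirroring the intuition behind the benchmark's three categories.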

At the core of the research is an evaluation of 14 publicly available LLMs. The results show that existing LLMs still have significant room for improvement in their grasp of factual knowledge, especially for facts in the torso-to-tail region concerning less well-known subjects.

In conclusion, this research examines the factual knowledge of LLMs using the newly proposed benchmark and an automated evaluation method. By addressing significant research questions and reporting concrete findings, the work makes a substantial contribution to the ongoing discussion about the reliability of large language models and their prospects for incorporating factual knowledge.


Check out the Paper. All credit for this research goes to the researchers on this project.



Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading teams, and managing work in an organized manner.


