Socrates once said: “It is not the size of a thing, but the quality that truly matters. For it is in the nature of substance, not its volume, that true value is found.”
Does size always matter for large language models (LLMs)? In a technological landscape where LLMs take center stage, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers thinks smaller models shouldn’t be overlooked, especially for natural language understanding products widely deployed in industry.
To that end, the researchers developed an approach to long-standing problems of inefficiency and privacy associated with big, text-based AI models: a logic-aware model that outperforms 500-times-larger counterparts on some language understanding tasks without human-generated annotations, while preserving privacy and robustness.
LLMs, which have shown promising skills in generating language, art, and code, are computationally expensive, and their data requirements can risk privacy leaks when application programming interfaces are used for data upload. Smaller models have historically been less capable, particularly in multitasking and weakly supervised tasks, compared with their larger counterparts.
So what’s helping these smaller models act so mighty, then? Something called “textual entailment,” a way to help these models understand a wide range of language tasks: if one sentence (the premise) is true, then the other sentence (the hypothesis) is likely to be true as well. For instance, if the premise is “all cats have tails,” then the hypothesis “a tabby cat has a tail” is entailed by the premise. This concept is used to train an “entailment model” that, in the team’s previous research, proved to be less biased than other language models. The team then created “prompts” that the models can use to figure out whether certain information is entailed by a given sentence or phrase for each task. This method improved the model’s ability to adapt to different tasks without any additional training, known as zero-shot adaptation.
In the realm of “natural language understanding,” many applications hinge on determining the relationship between two pieces of text. For instance, in sentiment classification, a statement like “I think the movie is good” can be inferred, or entailed, from a movie review that says, “I like the story and the acting is great,” indicating a positive sentiment. Another is news classification, where the topic of a news article can be inferred from its content. For instance, a statement like “the news article is about sports” can be entailed if the main content of the article reports on an NBA game. The key insight is that many existing natural language understanding tasks can be recast as an entailment (i.e., logical inference in natural language) task.
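The recasting described above can be sketched in a few lines. This is an illustrative toy, not the authors’ code: the input text serves as the premise, each candidate label is turned into a hypothesis via a prompt template, and the label whose hypothesis scores highest wins. The function `score_entailment` stands in for a real entailment model; here it is just a keyword-overlap heuristic for demonstration.

```python
# Illustrative sketch (not the paper's implementation) of recasting
# classification as textual entailment.

def score_entailment(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an entailment model: fraction of the hypothesis's
    content words that also appear in the premise."""
    premise_words = set(premise.lower().split())
    stop = {"the", "a", "is", "this", "movie"}
    content = [w for w in hypothesis.lower().split() if w not in stop]
    return sum(w in premise_words for w in content) / max(len(content), 1)

def classify(premise: str, label_hypotheses: dict) -> str:
    """Pick the label whose hypothesis is most strongly entailed."""
    scores = {label: score_entailment(premise, hyp)
              for label, hyp in label_hypotheses.items()}
    return max(scores, key=scores.get)

review = "i like the story and the acting is great"
label = classify(review, {
    "positive": "the movie is great",   # hypothesis for positive sentiment
    "negative": "the movie is bad",     # hypothesis for negative sentiment
})
print(label)  # positive: "great" appears in the review
```

Swapping the toy scorer for a trained entailment model gives zero-shot classification: no task-specific training is needed, only a new set of hypothesis templates per task.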
“Our research is about improving the ability of computer programs to understand and process natural language, the way humans speak and write. Our self-trained, 350-million-parameter entailment models, without human-generated labels, outperform supervised language models with 137 to 175 billion parameters,” says MIT CSAIL postdoc Hongyin Luo, lead author on a new paper about the study. “This has potential to reshape the landscape of AI and machine learning, providing a more scalable, trustworthy, and cost-effective solution to language modeling,” says Luo. “By proving that smaller models can perform at the same level as larger ones for language understanding, this work paves the way for more sustainable and privacy-preserving AI technologies.”
The team found that they could improve the model’s performance even further by using a technique called “self-training,” where the model uses its own predictions to teach itself, effectively learning without human supervision and additional annotated training data. The self-training method significantly improved performance on a host of downstream tasks, including sentiment analysis, question-answering, and news classification. It outperformed both Google’s LaMDA and FLAN in zero-shot capabilities, GPT models, and other supervised algorithms.
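The self-training loop can be illustrated with a minimal sketch, again a hedged toy rather than the paper’s implementation: the model pseudo-labels unlabeled text with its own confident predictions, then absorbs those examples as if they were labeled data. The model here is a trivial keyword-overlap classifier, purely for illustration.

```python
# Hedged sketch of self-training (not the paper's code): confident
# self-predictions on unlabeled text become new training signal.

def predict_with_confidence(keywords, text):
    """Predict a label from keyword overlap; confidence is the winning
    label's share of all keyword hits."""
    counts = {label: sum(w in text.split() for w in words)
              for label, words in keywords.items()}
    label = max(counts, key=counts.get)
    total = sum(counts.values())
    return label, (counts[label] / total if total else 0.0)

def self_train(keywords, unlabeled, threshold=0.8):
    """One self-training round: confidently pseudo-labeled examples
    contribute their words to that label's keyword set."""
    for text in unlabeled:
        label, conf = predict_with_confidence(keywords, text)
        if conf >= threshold:  # keep only confident pseudo-labels
            keywords[label].update(text.split())
    return keywords

seeds = {"sports": {"nba"}, "politics": {"senate"}}
seeds = self_train(seeds, ["nba finals tonight"])
# "finals" is now a sports keyword, so unseen text classifies correctly:
print(predict_with_confidence(seeds, "finals rematch")[0])  # sports
```

The key design point is the confidence threshold: without it, every noisy guess would feed back into the model, which is exactly the failure mode the next section addresses.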
However, one challenge with self-training is that the model can sometimes generate incorrect or noisy labels that harm performance. To overcome this, they developed a new algorithm called “SimPLE” (Simple Pseudo-Label Editing), a process to review and modify the pseudo-labels made in initial rounds of learning. By correcting mislabeled instances, it improved the overall quality of the self-generated labels. This not only made the models more effective at understanding language, but also more robust when faced with adversarial data.
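One way such pseudo-label editing can work, sketched here in the spirit of SimPLE but with details that differ from the paper, is to collect several sampled predictions per unlabeled example (e.g., from multiple stochastic forward passes) and keep the majority label only when the vote is decisive, dropping split votes as noise.

```python
# Illustrative confidence-based pseudo-label filtering (a sketch in the
# spirit of SimPLE; the paper's exact procedure differs).
from collections import Counter

def edit_pseudo_labels(sampled_predictions, min_agreement=0.75):
    """sampled_predictions maps example_id -> list of sampled labels.
    Returns example_id -> label for examples whose majority label wins
    at least `min_agreement` of the votes; the rest are discarded."""
    kept = {}
    for ex_id, votes in sampled_predictions.items():
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_agreement:
            kept[ex_id] = label
    return kept

preds = {
    "s1": ["pos", "pos", "pos", "pos"],  # unanimous -> kept
    "s2": ["pos", "neg", "neg", "pos"],  # split vote -> dropped as noisy
}
print(edit_pseudo_labels(preds))  # {'s1': 'pos'}
```

Filtering before retraining means the self-training loop only ever consumes labels the model agrees with itself about, which is what buys the robustness gains described above.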
As with most research, there are some limitations. Self-training on multi-class classification tasks didn’t perform as well as on binary natural language understanding tasks, indicating the challenge of applying entailment models to multi-choice tasks.
“This research presents an efficient and effective way to train large language models (LLMs) by formulating natural language understanding tasks as contextual entailment problems and employing a pseudo-labeling self-training mechanism to incorporate large quantities of unlabeled text data in the training process,” adds CSAIL Senior Research Scientist James Glass, who is also an author on the paper. “While the field of LLMs is undergoing rapid and dramatic changes, this research shows that it is possible to produce relatively compact language models that perform very well on benchmark understanding tasks compared with their peers of roughly the same size, or even much larger language models.”
“The entailment task is a popular proxy to evaluate ‘understanding’ of a given context by an AI model,” says Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab. “It is used in many areas analyzing models with unimodal, like LLMs, and multi-modal, like VLMs [visual language models], inputs, simplifying the task of question-answering about a given input context to a binary classification problem: does this context entail a certain (e.g., text) conclusion or not? This paper makes two contributions in this space. First, it proposes a way to improve the zero-shot (without additional tuning) NLU performance and robustness to adversarial attacks via tuning with synthesized (specialized) entailment tasks generated for the primal NLU task. Second, it offers a self-supervised SimPLE method including pseudo-labeling and confidence-based filtering to further improve large LLMs’ NLU performance.”
Luo and Glass wrote the paper with Yoon Kim, a CSAIL member and assistant professor in MIT’s Department of Electrical Engineering and Computer Science, and Jiaxin Ge of Peking University. Their work will be presented at the meeting of the Association for Computational Linguistics in Toronto, Ontario this July. The research was supported by a grant from the Hong Kong Innovation AI program.