Autoregressive models are a category of statistical models built on the intuition that a variable's current value depends largely on its past values. In other words, the model predicts the future value of a variable by regressing it on its previous values. Well-known examples of autoregressive models include the GPT family, especially GPT-3 and its variants, which are largely based on the principle of predicting the next word in a sequence given the previous words. By training GPT in this autoregressive manner on a large text corpus, it learns to capture the statistical patterns, dependencies, and semantic relationships in language, enabling it to generate contextually relevant text from an input prompt. Nevertheless, previous research has shown that smaller models, or models tuned to have less randomness or variability (i.e., lower generation temperatures), tend to generate repetitive or erroneous outputs. Furthermore, because these models use their own outputs as inputs, errors often compound and quickly take the model out of its intended distribution.
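To make the autoregressive setup concrete, here is a minimal sketch of sampling from GPT-2 with the Hugging Face transformers library; the prompt, temperature, and step count are illustrative choices, not details from the paper. Each sampled token is appended to the context and fed back into the model, which is exactly the feedback loop through which a single bad token can push generation off-distribution.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a pretrained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Encode an illustrative prompt
input_ids = tokenizer.encode("The researchers found that", return_tensors="pt")

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits[:, -1, :]               # next-token logits
        probs = torch.softmax(logits / 0.7, dim=-1)              # temperature-scaled distribution
        next_token = torch.multinomial(probs, num_samples=1)     # sample one token
        input_ids = torch.cat([input_ids, next_token], dim=-1)   # the output becomes the next input

print(tokenizer.decode(input_ids[0]))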
To overcome these challenges, a team of researchers from Stanford conducted preliminary studies and identified two main obstacles that prevent autoregressive models trained with maximum likelihood estimation (MLE) from generating coherent sequences during evaluation. The first issue lies in the divergence measure used to evaluate the disparity between the model and the data distribution. Because MLE does not consider out-of-distribution (OOD) sequences, the model's behavior on such sequences cannot be controlled. To tackle this, the researchers devised the idea of minimizing the χ2-divergence between a mixture of actual data and the autoregressively generated sequences, which shows superior performance compared with MLE. The second challenge arises when the model produces an OOD token for which there is no suitable continuation aligned with the data distribution. To handle this, the researchers introduce an additional backspace action that lets the model erase the previously generated token and correct its mistake.
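For reference, the standard definition of the χ2-divergence between a target distribution P and a model distribution Q_θ, and of the KL-divergence that MLE implicitly minimizes, are as follows; note that the paper applies the χ2-style objective to a mixture of real and model-generated sequences, so the exact training objective differs in its details:

D_{\chi^2}(P \,\|\, Q_\theta) = \sum_{x} \frac{\left(P(x) - Q_\theta(x)\right)^2}{Q_\theta(x)}, \qquad D_{\mathrm{KL}}(P \,\|\, Q_\theta) = \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q_\theta(x)}\right].

The KL objective only evaluates the model on sequences drawn from the data distribution, so it says nothing about how the model behaves once it wanders onto sequences it generated itself; an objective involving a mixture that includes model-generated sequences constrains exactly that behavior.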
Drawing on these learnings from their preliminary studies, the Stanford researchers have come up with a novel method called SequenceMatch, which enables autoregressive models to be trained against different divergence measures while incorporating the backspace action, allowing the model to correct erroneous tokens during generation rather than compounding them.
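The following is an illustrative sketch, not the authors' implementation, of how a learned backspace action could be used at generation time. It assumes a model whose vocabulary has been extended with a dedicated backspace token; the token id, function name, and sampling loop below are all hypothetical.

import torch

BACKSPACE_ID = 50257  # hypothetical id of an added <backspace> token

def generate_with_backspace(model, tokenizer, prompt, max_steps=50):
    ids = tokenizer.encode(prompt, return_tensors="pt")
    prompt_len = ids.shape[1]
    for _ in range(max_steps):
        with torch.no_grad():
            logits = model(ids).logits[:, -1, :]
        next_token = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
        if next_token.item() == BACKSPACE_ID and ids.shape[1] > prompt_len:
            ids = ids[:, :-1]                        # undo the previously generated token
        else:
            ids = torch.cat([ids, next_token], dim=-1)
    return tokenizer.decode(ids[0])

The key design point is that an erroneous token no longer forces the model to keep conditioning on it; instead of compounding the mistake, the model can back out of it and continue from a better prefix.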
The researchers conducted several experimental evaluations to compare GPT-2-based models fine-tuned with SequenceMatch against MLE-trained models. Using the MAUVE score as the metric, they found that models fine-tuned with SequenceMatch generated text closer to the dataset and appeared more fluent and error-free compared with MLE-trained models. The team also highlighted a limitation of their approach: it requires more computational resources and time when generating lengthy texts. As for future work, the researchers are focusing on studying how different divergence methods affect the quality of the generated sequences.
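As a rough illustration of the evaluation setup, MAUVE compares a collection of model generations against a collection of human-written reference texts. A minimal sketch using the open-source mauve-text package might look like the following; the texts here are placeholders, and the paper's evaluation protocol is more involved.

import mauve

human_texts = ["a human-written reference passage ...", "another reference passage ..."]
model_texts = ["a passage generated by the fine-tuned model ...", "another generated passage ..."]

# A higher MAUVE score (closer to 1.0) indicates generations are distributionally closer to the reference text
out = mauve.compute_mauve(p_text=human_texts, q_text=model_texts, max_text_length=256)
print(out.mauve)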
Check Out The Paper. Don't forget to join our 25k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com
🚀 Check Out 100s of AI Tools in AI Tools Club
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development. She enjoys learning more about the technical field by participating in various challenges.