Researchers from the University of Washington and Google have Developed Distilling Step-by-Step Technology to Train a Dedicated Small Machine Learning Model with Less Data

In recent years, large language models (LLMs) have revolutionized the field of natural language processing, enabling unprecedented zero-shot and few-shot learning capabilities. However, their deployment in real-world applications has been hindered by their immense computational demands. A single 175-billion-parameter LLM requires a staggering 350GB of GPU memory and specialized infrastructure. With today's state-of-the-art models exceeding 500 billion parameters, these requirements put LLMs out of reach for many research teams, particularly those with low-latency requirements.
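The 350GB figure follows directly from storing 175 billion weights in 16-bit precision (our assumption; fp32 would double it), before counting activations or key-value caches. A quick back-of-the-envelope check:

```python
# Rough memory footprint of a 175B-parameter model's weights alone,
# assuming 16-bit (fp16/bf16) storage at 2 bytes per parameter.
params = 175e9
bytes_per_param = 2  # fp32 would be 4 bytes, i.e. ~700 GB
weight_memory_gb = params * bytes_per_param / 1e9
print(f"{weight_memory_gb:.0f} GB")  # -> 350 GB, excluding activations and caches
```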

To address this deployment challenge, researchers have turned to smaller, specialized models trained through either fine-tuning or distillation. Fine-tuning, while effective, relies on costly and time-consuming human-generated labels. Distillation, on the other hand, demands copious amounts of unlabeled data, which can be difficult to acquire.

In a groundbreaking study by a research team from Google and the University of Washington presented at ACL 2023, the authors introduced "Distilling Step-by-Step," a novel mechanism designed to mitigate the trade-off between model size and the cost of data collection. The approach hinges on extracting informative natural language rationales, or intermediate reasoning steps, from LLMs. These rationales serve as additional, richer supervision when training smaller task-specific models alongside standard task labels.
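Concretely, each training example pairs the usual input and label with an LLM-generated rationale. A hypothetical e-SNLI-style triple might look like the following (field names and wording are illustrative, not taken from the paper):

```python
# One rationale-augmented training example (illustrative only).
example = {
    "input": ("premise: A man plays guitar on stage. "
              "hypothesis: A man is performing music."),
    "label": "entailment",                      # standard task label
    "rationale": ("Playing guitar on stage is a form of performing music, "
                  "so the hypothesis follows from the premise."),  # LLM-extracted
}
```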

The researchers outline a two-stage process for implementing Distilling Step-by-Step. First, they employ chain-of-thought (CoT) prompting to extract rationales from an LLM, enabling the model to generate rationales for unseen inputs. These rationales are then integrated into the training of small models within a multi-task learning framework, with task prefixes guiding the model's differentiation between label prediction and rationale generation.
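A minimal sketch of that multi-task setup, using Hugging Face transformers as a stand-in (the paper's exact training code, prefix strings, and loss weight are assumptions here): the same T5 model is trained on two objectives, with a prefix on the input telling it whether to produce the label or the rationale.

```python
# Multi-task training sketch: one T5 model, two task prefixes.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def seq2seq_loss(prefix: str, source: str, target: str) -> torch.Tensor:
    """Cross-entropy loss for generating `target` from `prefix + source`."""
    enc = tokenizer(f"{prefix} {source}", return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
    return model(**enc, labels=labels).loss

source = ("premise: A man plays guitar on stage. "
          "hypothesis: A man is performing music.")

# Task 1: predict the label; Task 2: generate the rationale.
label_loss = seq2seq_loss("[label]", source, "entailment")
rationale_loss = seq2seq_loss("[rationale]", source,
                              "Playing guitar on stage is a form of performing music.")

# Weighted sum of the two objectives; the weight is a hyperparameter (value assumed).
lam = 1.0
loss = label_loss + lam * rationale_loss
loss.backward()
```

At inference time only the "[label]" prefix is needed, so the small model pays no extra cost for having learned to generate rationales during training.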

In a series of experiments, a 540B-parameter LLM was used together with T5 models as the task-specific downstream models. Distilling Step-by-Step exhibited remarkable performance gains with significantly reduced data requirements. For example, on the e-SNLI dataset, the method outperformed standard fine-tuning with just 12.5% of the full dataset. Similar reductions in dataset size were observed across various NLP tasks, including ANLI, CQA, and SVAMP.

Moreover, Distilling Step-by-Step achieved superior performance using considerably smaller model sizes compared with few-shot CoT-prompted LLMs. For example, on the e-SNLI dataset, a 220M T5 model surpassed the performance of a 540B PaLM. On ANLI, a 770M T5 model, over 700 times smaller, outperformed the 540B PaLM, demonstrating the immense potential for efficiency gains.

Notably, Distilling Step-by-Step showcased its ability to outperform few-shot LLMs using both significantly smaller models and less data. For example, on ANLI, a 770M T5 model surpassed the performance of a 540B PaLM using only 80% of the full dataset, a feat unattainable through standard fine-tuning.

In conclusion, Distilling Step-by-Step presents a groundbreaking paradigm for training small, task-specific models. By extracting rationales from LLMs, this approach not only reduces the data required for model training but also enables the use of significantly smaller models. The technique stands to revolutionize the field of natural language processing, making advanced language models more accessible and practical for a broader range of applications.


Check out the Paper and Google AI Article. All credit for this research goes to the researchers on this project. Also, don't forget to join our 30k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.



Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate, currently pursuing her B.Tech from the Indian Institute of Technology (IIT) Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.


