A New AI Research Introduces AttrPrompt: An LLM-as-Training-Data-Generator for a New Paradigm in Zero-Shot Learning

The performance of large language models (LLMs) has been impressive across many different natural language processing (NLP) applications. In recent studies, LLMs have been proposed as task-specific training data generators to reduce the need for task-specific data and annotations, especially for text classification. Though these efforts have demonstrated the usefulness of LLMs as data generators, they have largely focused on improving the training step, where the generated data are used to train task-specific models, leaving the upstream data creation process untouched. To query LLMs, the prevalent method uses a single class-conditional prompt, which can reduce the variability of the generated data and perpetuate the inherent systematic biases of LLMs.

A new study by Georgia Tech, the University of Washington, UIUC, and Google Research analyzes four challenging topic classification tasks with large cardinality from different domains. It anchors the LLM to ChatGPT for its ability to write high-quality, human-like text. The team primarily uses data attributes to gauge the level of bias and diversity within the generated training set. Specifically, data attributes consist of several attribute dimensions and various attribute values, each value representing a possible realization of its dimension.
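To make the notion of attribute dimensions and values concrete, here is a minimal sketch in Python. The dimension and value names are purely illustrative assumptions, not taken from the paper:

```python
# Illustrative attribute configuration for one class of a topic
# classification task. Dimension and value names are hypothetical.
attributes = {
    "length": ["short", "medium", "long"],
    "style": ["formal report", "casual blog post", "interview transcript"],
    "subtopic": ["league news", "player transfers", "match analysis"],
}

# Each generated training example realizes one value per dimension, so the
# number of distinct attribute combinations is the product of the pool sizes.
num_combinations = 1
for values in attributes.values():
    num_combinations *= len(values)

print(num_combinations)  # 3 * 3 * 3 = 27
```

Even this tiny pool yields 27 distinct prompt configurations, which is the source of the diversity gain over a single fixed prompt.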

The researchers used a trained attribute classifier to investigate the attribute bias in the SimPrompt-generated dataset, examining how different attributes affect a model's final results. To generate attributed data, they query ChatGPT with constraints that fix certain values of the desired attributes. They find that models trained on datasets generated with random attributes perform significantly better than those trained on datasets with fixed attributes, highlighting the importance of attribute variation in the generated dataset.


The team suggests generating data with diversely attributed prompts to reduce attribute biases and increase the attribute diversity of the generated data. Using the LLM, an interactive, semi-automated process first determines the appropriate attribute dimensions and values for a given classification task. The standard class-conditional prompt for LLM data queries is then replaced by more complex queries generated from randomly combined attribute values. They coin the term "AttrPrompt" for these diversely attributed prompts.
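The contrast between the two querying strategies can be sketched as follows. This is a simplified illustration under assumed attribute names and prompt templates, not the paper's actual prompts:

```python
import random

# Hypothetical attribute pool for one class ("sports"); names are illustrative.
attributes = {
    "length": ["short", "long"],
    "style": ["news report", "opinion piece"],
    "subtopic": ["player transfers", "match analysis"],
}

def sim_prompt(class_name):
    # SimPrompt: a single class-conditional query, identical for every sample,
    # which limits diversity and can bake in the LLM's default biases.
    return f"Write a news article about {class_name}."

def attr_prompt(class_name, rng=random):
    # AttrPrompt: randomly pick one value per attribute dimension and
    # combine them into a more specific, diversified query.
    chosen = {dim: rng.choice(vals) for dim, vals in attributes.items()}
    return (
        f"Write a {chosen['length']} {chosen['style']} about {class_name}, "
        f"focusing on {chosen['subtopic']}."
    )

print(sim_prompt("sports"))
print(attr_prompt("sports"))
```

Repeated calls to `attr_prompt` produce varied queries for the same class, whereas `sim_prompt` always returns the same string; sending the varied queries to the LLM is what yields the more diverse training set.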

The researchers empirically evaluate the generated datasets on the four classification tasks by comparing the results of models trained under two scenarios: 1) only on the generated dataset and 2) on a merged dataset that includes the real training set and the generated set. The dataset created with AttrPrompt performs much better than the dataset created with SimPrompt in both cases. Their results further show that AttrPrompt is superior to SimPrompt in data/budget efficiency and in flexibility across a wide range of model sizes and LLM-as-training-data-generator strategies.

AttrPrompt is notable because it matches the performance of SimPrompt while requiring only 5% of SimPrompt's ChatGPT querying cost. Finally, by extending the LLM-as-training-data-generator paradigm to the harder multi-label classification problems, they show for the first time that AttrPrompt beats SimPrompt across all evaluation criteria.


Check out the Paper and GitHub link. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com





Dhanshree Shenwai is a Computer Science Engineer with good experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today's evolving world, making everyone's lives easier.


