Recent advances in large language models (LLMs) have propelled the field forward in interpreting and executing instructions. Despite these strides, LLMs still make errors when recalling and composing world knowledge, which leads to inaccurate responses. To address this, researchers have proposed integrating auxiliary tools, such as search engines or calculators, at inference time to boost reasoning. Nonetheless, existing tool-augmented LLMs struggle to leverage tools efficiently for multi-step reasoning, particularly when handling interleaved tool calls and minimizing inference waiting times.
In response to these challenges, this research from EPFL and Meta introduces the Chain-of-Abstraction (CoA) reasoning method, a robust and efficient approach for LLMs to perform multi-step reasoning with tools. The core idea is illustrated in Figure 1: LLMs are fine-tuned to generate reasoning chains with abstract placeholders (e.g., y1, y2, y3). These placeholders are then replaced with specific knowledge obtained from external tools, such as calculators or web search engines, which grounds the final answer generation.
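The placeholder-filling step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the bracketed `[expression = yN]` trace format, the names `calculate` and `fill_chain`, and the toy arithmetic tool are all assumptions made for the example.

```python
import ast
import operator
import re
from typing import Dict, Tuple

# Assumed trace format: each tool call is written as "[expression = yN]".
STEP = re.compile(r"\[([^\[\]=]+)=\s*(y\d+)\s*\]")
PLACEHOLDER = re.compile(r"y\d+")

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculate(expression: str) -> float:
    """Toy calculator tool (hypothetical): evaluates +, -, *, / over numbers."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return evaluate(ast.parse(expression.strip(), mode="eval").body)


def fill_chain(chain: str) -> Tuple[str, Dict[str, float]]:
    """Replace every "[expression = yN]" step with the tool's result."""
    values: Dict[str, float] = {}

    def resolve(match):
        expression, name = match.group(1), match.group(2)
        # Substitute earlier results (e.g. y1) before calling the tool.
        grounded = PLACEHOLDER.sub(lambda m: repr(values[m.group(0)]), expression)
        values[name] = calculate(grounded)
        return str(values[name])

    return STEP.sub(resolve, chain), values


chain = ("The two boxes hold [20 + 35 = y1] pens. "
         "Out of 90 pens, [90 - y1 = y2] are left over.")
filled, values = fill_chain(chain)
print(filled)  # The two boxes hold 55 pens. Out of 90 pens, 35 are left over.
```

The model only has to produce the structure of the chain; every concrete number comes from the tool, so later steps can depend on earlier placeholders without the model ever computing them.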
Furthermore, unlike prior methods in which LLM decoding and API calls are interleaved, CoA reasoning promotes effective planning by encouraging LLMs to interconnect multiple tool calls and adopt more feasible reasoning strategies. The abstract chain of reasoning lets LLMs focus on general and holistic reasoning strategies without having to generate instance-specific knowledge from the model’s parameters. Notably, decoupling general reasoning from domain-specific knowledge enables parallel processing: LLMs can generate the next abstract chain while tools fill the current one, which speeds up the overall inference process.
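One way to picture this overlap is a small pipeline in which the model decodes the next chain on the main thread while a worker fills the previous one. The sketch below is an assumption about how such scheduling could look, not a description of the paper's system; `decode_chain` and `fill_with_tools` are hypothetical callables supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_pipelined(questions: List[str],
                  decode_chain: Callable[[str], str],
                  fill_with_tools: Callable[[str], str]) -> List[str]:
    """Decode chain i+1 while a worker thread fills chain i with tool results."""
    futures = []
    with ThreadPoolExecutor(max_workers=1) as tool_pool:
        for question in questions:
            # The model produces an abstract chain on the main thread ...
            chain = decode_chain(question)
            # ... and hands it off, so tool calls run during the next decode.
            futures.append(tool_pool.submit(fill_with_tools, chain))
        # Results come back in the same order as the questions.
        return [future.result() for future in futures]
```

With interleaved decoding, the model would instead stall at every tool call until the response arrives. In this sketch the gain is largest when the tool calls are I/O-bound (for example, a search request), since that is when Python threads can genuinely overlap with the decoding work.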
To train LLMs for CoA reasoning, the authors construct fine-tuning data by repurposing existing open-source question-answering datasets (Cobbe et al., 2021; Miao et al., 2020; Yang et al., 2018). LLaMa-70B is prompted to rewrite answers as abstract chains, replacing specific operations with abstract placeholders. The resulting CoA traces are validated using domain-specialized tools to ensure accuracy.
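The validation step might look roughly like the following filter, which keeps a rewritten trace only if a tool can execute it and reproduce the gold answer. The article does not spell out the acceptance criterion, so the gold-answer match, the bracketed `[a op b = yN]` step format, and the names `solve_chain` and `keep_valid_traces` are all assumptions for illustration.

```python
import math
import operator
import re
from typing import Dict, List, Tuple

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
# Assumed step format: "[left op right = yN]" with a single binary operation.
STEP = re.compile(r"\[\s*(\S+)\s*([-+*/])\s*(\S+)\s*=\s*(y\d+)\s*\]")


def solve_chain(chain: str) -> float:
    """Toy calculator tool (hypothetical): run each step, return the last value."""
    values: Dict[str, float] = {}

    def operand(token: str) -> float:
        # A token is either an earlier placeholder or a numeric literal.
        return values[token] if token in values else float(token)

    last = None
    for left, op, right, name in STEP.findall(chain):
        values[name] = OPS[op](operand(left), operand(right))
        last = name
    if last is None:
        raise ValueError("no tool calls found in chain")
    return values[last]


def keep_valid_traces(candidates: List[Tuple[str, str, float]]) -> List[Tuple[str, str]]:
    """Keep (question, chain) pairs whose tool-computed answer matches gold."""
    kept = []
    for question, chain, gold in candidates:
        try:
            predicted = solve_chain(chain)
        except (ValueError, ZeroDivisionError):
            continue  # malformed or unexecutable chain: discard
        if math.isclose(predicted, gold, rel_tol=1e-6):
            kept.append((question, chain))
    return kept
```

A rewritten chain that references an undefined placeholder, or that computes a value different from the dataset's answer, is simply dropped rather than repaired.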
The CoA method is evaluated in two domains: mathematical reasoning and Wikipedia question answering (Wiki QA). For mathematical reasoning, LLMs are trained on CoA data constructed by rewriting the GSM8K (Cobbe et al., 2021) training set. CoA outperforms few-shot and regular fine-tuning baselines on both in-distribution and out-of-distribution datasets, showcasing its effectiveness in multi-step reasoning tasks. CoA also outperforms the Toolformer baseline.
In the Wiki QA domain, HotpotQA (Yang et al., 2018) is used to construct the CoA fine-tuning data. CoA surpasses the baselines, including Toolformer, and generalizes well to diverse question-answering datasets (WebQuestions, NaturalQuestions, TriviaQA). Domain tools, such as a Wikipedia search engine and a named-entity recognition toolkit, further enhance the performance of CoA.
The evaluation results across both domains indicate significant improvements with the CoA method, yielding an average accuracy increase of ∼7.5% and 4.5% for mathematical reasoning and Wiki QA, respectively. These improvements hold across in-distribution and out-of-distribution test sets, and they particularly benefit questions requiring complex chain-of-thought reasoning. CoA also exhibits faster inference, outpacing previous augmentation methods on mathematical reasoning and Wiki QA tasks.
In conclusion, the proposed CoA reasoning method separates general reasoning from domain-specific knowledge, fostering more robust multi-step reasoning in LLMs. Its efficient tool usage contributes to faster inference, making it a promising approach for diverse reasoning scenarios. The experiments on mathematical reasoning and Wiki QA underscore the flexibility and efficacy of the CoA method, suggesting its potential for broader applications in enhancing LLM performance across domains.
Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS at the Indian Institute of Technology (IIT), Kanpur. He is a machine learning enthusiast, passionate about research and the latest advancements in deep learning, computer vision, and related fields.