LLMs Outperform Reinforcement Learning - Meet SPRING: A Progressive Prompting Framework for LLMs Designed to Enable In-Context Chain-of-Thought Planning and Reasoning

SPRING is an LLM-based policy that outperforms Reinforcement Learning algorithms in an interactive environment requiring multi-task planning and reasoning.

A group of researchers has investigated the use of Large Language Models (LLMs) for understanding and reasoning with human knowledge in the context of games. They propose a two-stage approach called SPRING, which involves studying an academic paper and then using a Question-Answer (QA) framework to reason about the knowledge obtained.

More details about SPRING

In the first stage, the authors read the LaTeX source code of the original paper by Hafner (2021) to extract prior knowledge. They employed an LLM to extract relevant information, including game mechanics and desirable behaviors documented in the paper. They then used a QA summarization framework similar to Wu et al. (2023) to generate QA dialogue based on the extracted knowledge, enabling SPRING to handle diverse contextual information.
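To make this first stage more concrete, here is a minimal sketch of how the LaTeX source could be condensed into QA-style game knowledge. The file path, the prompts, and the query_llm helper are hypothetical placeholders rather than the authors' actual implementation; any chat-completion LLM client could be plugged in behind the helper.

```python
# Illustrative sketch of SPRING's first stage: extract game knowledge from the
# LaTeX source of the environment paper and turn it into QA-style summaries.
from pathlib import Path


def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around any LLM chat-completion client."""
    raise NotImplementedError("plug in your preferred LLM client here")


# 1) Read the paper's LaTeX source as raw context (placeholder file name).
latex_source = Path("crafter_paper.tex").read_text()

# 2) Ask the LLM to keep only gameplay-relevant passages.
relevant_text = query_llm(
    "Below is the LaTeX source of a paper describing a game.\n"
    "Extract every passage that explains game mechanics, achievements, "
    "or desirable agent behaviors. Ignore unrelated sections.\n\n"
    f"{latex_source}"
)

# 3) Summarize the extracted knowledge as question-answer pairs that can later
#    be reused as context when reasoning about the game.
qa_summary = query_llm(
    "Rewrite the following game knowledge as a list of question-answer pairs "
    "covering mechanics, resources, and achievement requirements:\n\n"
    f"{relevant_text}"
)
print(qa_summary)
```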


The second stage focuses on in-context chain-of-thought reasoning with LLMs to solve complex games. They constructed a directed acyclic graph (DAG) as a reasoning module, where questions are nodes and dependencies between questions are edges. For instance, the question "For each action, are the requirements met?" is linked to the question "What are the top 5 actions?" within the DAG, establishing a dependency from the latter question to the former.

LLM answers are computed for each node/question by traversing the DAG in topological order. The final node in the DAG represents the question about the best action to take, and the LLM's answer to it is directly translated into an environment action.
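The sketch below illustrates this traversal under stated assumptions: the node names, question wording, action-name list, and query_llm helper are invented for illustration, and Python's standard-library graphlib handles the topological ordering; the paper's actual DAG contains more questions than shown here.

```python
# Illustrative sketch of SPRING's second stage: a DAG of questions answered in
# topological order, where parent answers become context for child questions.
from graphlib import TopologicalSorter  # Python 3.9+ standard library


def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around any LLM chat-completion client."""
    raise NotImplementedError("plug in your preferred LLM client here")


# Each node maps to the questions it depends on (its predecessors in the DAG).
dag = {
    "top_actions": [],
    "requirements": ["top_actions"],
    "best_action": ["top_actions", "requirements"],  # final node
}

questions = {
    "top_actions": "Given the current observation and the game knowledge, what are the top 5 actions to consider?",
    "requirements": "For each candidate action, are its requirements met by the current inventory?",
    "best_action": "Which single action should the agent take next? Answer with an action name.",
}

# Traverse the DAG in topological order, feeding parent answers into children.
answers = {}
for node in TopologicalSorter(dag).static_order():
    context = "\n".join(f"{questions[dep]} {answers[dep]}" for dep in dag[node])
    answers[node] = query_llm(f"{context}\n{questions[node]}")

# Map the final answer onto one of Crafter's 17 discrete actions
# (truncated placeholder list for illustration only).
ACTION_NAMES = ["noop", "move_left", "move_right", "do"]
best = answers["best_action"]
action_index = ACTION_NAMES.index(best) if best in ACTION_NAMES else 0
```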

Experiments and Results

The Crafter environment, introduced by Hafner (2021), is an open-world survival game with 22 achievements organized in a tech tree of depth 7. The game is represented as a grid world with top-down observations and a discrete action space consisting of 17 actions. The observations also provide information about the player's current inventory state, including health points, food, water, rest levels, and inventory items.
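For readers who want to experiment, here is a rough sketch of a random-action loop in Crafter, assuming the open-source crafter package and its gym-style interface; the random policy is only a stand-in for an actual agent, and the exact keys reported in info may differ between package versions.

```python
# Minimal interaction loop with the Crafter environment (random policy).
import crafter

env = crafter.Env()  # top-down grid world with 22 achievements
obs = env.reset()

done = False
while not done:
    action = env.action_space.sample()  # one of the 17 discrete actions
    obs, reward, done, info = env.step(action)

# If provided by this version of the environment, `info` reports which
# achievements were unlocked during the episode.
print(info.get("achievements", {}))
```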

The authors compared SPRING against popular RL methods on the Crafter benchmark. They also ran experiments and analyses on different components of the architecture to examine the impact of each part on the in-context "reasoning" abilities of the LLM.

Source: https://arxiv.org/pdf/2305.15486.pdf

The authors compared the performance of various RL baselines to SPRING with GPT-4, conditioned on the environment paper by Hafner (2021). SPRING surpasses previous state-of-the-art (SOTA) methods by a significant margin, achieving an 88% relative improvement in game score and a 5% improvement in reward compared to the best-performing RL method by Hafner et al. (2023).

Notably, SPRING leverages prior knowledge from reading the paper and requires zero training steps, while RL methods typically require millions of training steps.

Source: https://arxiv.org/pdf/2305.15486.pdf

The figure above plots unlock rates for individual achievements, comparing SPRING to popular RL baselines. SPRING, empowered by prior knowledge, outperforms RL methods by more than ten times on achievements such as "Make Stone Pickaxe," "Make Stone Sword," and "Collect Iron," which lie deeper in the tech tree (up to depth 5) and are hard to reach through random exploration.

Furthermore, SPRING performs perfectly on achievements like "Eat Cow" and "Collect Drink." At the same time, model-based RL frameworks like Dreamer-V3 have significantly lower unlock rates (over five times lower) for "Eat Cow" due to the challenge of reaching moving cows through random exploration. Importantly, SPRING never takes the action "Place Stone," since it was not described as useful for the agent in the paper by Hafner (2021), even though it could easily be achieved through random exploration.

Limitations

One limitation of using an LLM to interact with the environment is the need for object recognition and grounding. However, this limitation does not apply to environments that provide accurate object information, such as contemporary games and virtual-reality worlds. While pre-trained visual backbones struggle with games, they perform reasonably well in real-world-like environments. Recent advances in visual-language models also point toward reliable solutions for visual-language understanding in the future.

Conclusion

In summary, the SPRING framework showcases the potential of Large Language Models (LLMs) for game understanding and reasoning. By leveraging prior knowledge from academic papers and employing in-context chain-of-thought reasoning, SPRING outperforms previous state-of-the-art methods on the Crafter benchmark, achieving substantial improvements in game score and reward. The results highlight the power of LLMs on complex game tasks and suggest that future advances in visual-language models could address the existing limitations, paving the way for reliable and generalizable solutions.


Check out the Paper.



I'm a Civil Engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.


