
Revolutionizing Interactive Text Games with Knowledge-Enhanced AI Agents

Towards Data Science

Introduction:

Communication through natural language is crucial to machine intelligence [9]. The recent progress in computational language models (LMs) has enabled strong performance on tasks with limited interaction, like question answering and procedural text understanding [10]. Recognizing that interactivity is an essential aspect of communication, the community has turned its attention towards training and evaluating agents in interactive fiction (IF) environments, such as text-based games, which offer a unique testing ground for investigating the reasoning abilities of LMs and the potential of Artificial Intelligence (AI) agents to perform multi-step real-world tasks in a constrained environment. As an illustration, in Figure 1, an agent must pick a fruit in the living room and place it in a blue box in the kitchen. In these games, agents navigate complex environments using text-based inputs, which demands a sophisticated understanding of natural language and strategic decision-making from AI agents. To succeed in these games, agents must manage their knowledge, reason, and generate language-based actions that produce desired and predictable changes in the game world.

Figure 1. Illustration of an Interactive Fiction (IF) game, where an agent must perform the task of picking a fruit (e.g., an apple) and then placing it in a blue box in the kitchen.

Background and Motivation:

Prior work has shown that Reinforcement Learning- and Language Model-based agents struggle to reason about or explain science concepts in IF environments [1], which raises questions about these models’ ability to generalize to unseen situations beyond what has been observed during training [2]. For instance, while tasks such as ‘retrieving a known substance’s melting (or boiling) point’ may be relatively easy, ‘determining an unknown substance’s melting (or boiling) point in a specific environment’ can be difficult for these models. To improve generalization, it may be effective to incorporate world knowledge, e.g., about object affordances, yet no prior work has investigated this direction. In addition, existing models struggle to learn effectively from environmental feedback. For instance, when examining the conductivity of a specific substance, the agent must understand that it has already obtained the necessary wires and the actual substance so that it can then proceed to locate a power source. Therefore, there is a need for a framework that can analyze and evaluate the effectiveness of different types of knowledge and knowledge-injection methods for text-based game agents.

Our paper, “Knowledge-Enhanced Agents for Interactive Text Games,” introduces a novel framework to enhance AI agents’ performance in these IF environments.

Published Version: https://dl.acm.org/doi/10.1145/3587259.3627561

We are proud to announce that our paper was awarded the Best Student Paper at the K-CAP 2023 Conference, a testament to our team’s innovative research and dedication. 🏆🏆🏆

The Core Innovation — Knowledge Injection Framework:

Our work introduces a unique framework to enhance AI agents with specific knowledge. The framework comprises two key components:

  1. Memory of Correct Actions (MCA): This feature enables AI agents to remember and leverage past correct actions. By maintaining a memory of what has worked before, the agent can formulate more effective strategies and avoid repetitive mistakes. The MCA is determined by the environment feedback: if an action yields a reward, it is considered correct. Therefore, correct actions cannot be fed to the agent up front, but are instead stored in memory as the agent progresses through the (train/test time) episode.
  2. Affordance Knowledge (Aff): Understanding the potential interactions with objects in the game world is crucial. We expect that affordances will help models learn better by listing the possible interactions with the objects around them. Unlike historical knowledge, the environment does not provide the affordances, and they have to be retrieved from external sources. For this purpose, we use ConceptNet and obtain its capableOf and usedFor relations for the objects in a given IF game episode. (A minimal sketch of both components is shown right after this list.)
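To make the two components concrete, here is a minimal sketch (not the code from the paper) of how an agent loop could maintain a memory of correct actions and how affordances could be pulled from ConceptNet's public query API. The helper names, the reward-based correctness check, and the string format of the returned facts are illustrative assumptions.

```python
import requests

CONCEPTNET_URL = "https://api.conceptnet.io/query"

def get_affordances(obj: str, limit: int = 5) -> list[str]:
    """Fetch capableOf / usedFor facts for an object from ConceptNet (illustrative)."""
    facts = []
    for rel in ("/r/CapableOf", "/r/UsedFor"):
        resp = requests.get(
            CONCEPTNET_URL,
            params={"start": f"/c/en/{obj.replace(' ', '_')}", "rel": rel, "limit": limit},
        ).json()
        for edge in resp.get("edges", []):
            facts.append(f"{obj} {rel.split('/')[-1]} {edge['end']['label']}")
    return facts

# Memory of Correct Actions (MCA): an action is stored only if it yielded a reward,
# so the memory is built up online as the episode unfolds.
memory_of_correct_actions = []

def update_mca(action: str, reward: float) -> None:
    if reward > 0:  # environment feedback decides "correctness"
        memory_of_correct_actions.append(action)
```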

We implemented this framework in two AI agent architectures:

  1. Online Policy Optimization through Rewards (RL Methods)
  2. Single-step Offline Prediction (LM Methods)

1. Online Policy Optimization through Rewards (RL Methods)

Pure RL-based Model — DRRN [3] (Fig. 2)

The baseline DRRN model uses only the inputs of observation, inventory, and task description to compute Q-values for each action. To enhance the DRRN baseline, we have injected external knowledge into the model and created three new variations of DRRN:

  1. aff: Using a distinct GRU encoding layer, we introduce the affordances of the objects present in the inputs to the baseline model.
  2. mca: A separate GRU encoding layer is used in this model to pass all previously correct actions to the baseline model.
  3. aff ⊕ mca: The encoding of this architecture comprises both the agent’s previous correct actions and the affordances as distinct components. (A simplified sketch of this variant follows Figure 2.)
Figure 2: DRRN architecture, enhanced with the memory of previous correct actions and object affordances.
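As a rough illustration of the aff ⊕ mca variant, the PyTorch sketch below adds separate GRU encoders for the affordance and MCA strings and concatenates their outputs with the standard DRRN encodings before scoring each action. The field names, dimensions, and the assumption of pre-tokenized integer inputs are simplifications, not the actual architecture.

```python
import torch
import torch.nn as nn

class KnowledgeDRRN(nn.Module):
    """Simplified DRRN-style scorer with extra GRU encoders for affordances and MCA."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One GRU per text field: observation, inventory, description, candidate action,
        # plus the two knowledge channels (affordances, memory of correct actions).
        self.encoders = nn.ModuleDict({
            name: nn.GRU(embed_dim, hidden, batch_first=True)
            for name in ["obs", "inv", "desc", "act", "aff", "mca"]
        })
        self.q_head = nn.Sequential(nn.Linear(6 * hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def encode(self, name: str, token_ids: torch.Tensor) -> torch.Tensor:
        _, h = self.encoders[name](self.embed(token_ids))  # h: (1, batch, hidden)
        return h.squeeze(0)

    def forward(self, fields: dict[str, torch.Tensor]) -> torch.Tensor:
        parts = [self.encode(name, fields[name])
                 for name in ["obs", "inv", "desc", "act", "aff", "mca"]]
        # One Q-value per (state, candidate action) pair in the batch.
        return self.q_head(torch.cat(parts, dim=-1)).squeeze(-1)
```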

RL-enhanced KG Model — KG-A2C [4] (Fig. 3)

As the baseline, we use a modified version of KG-A2C, where we use a single golden action sequence provided by the environment as the target, even though there may exist multiple possible golden sequences. We found this objective to perform better than the original objective of predicting a valid action. We devise the following knowledge-injection strategies to incorporate the memory of correct actions and affordance knowledge into KG-A2C:

  1. mca: On top of the baseline, we incorporate all previously correct actions by using a separate GRU encoding layer and concatenate the output vector with the other output representations.
  2. aff: The KG component in the KG-A2C model provides us with a convenient way to add more knowledge. Specifically, we directly add the affordance knowledge into the KG as additional triples on top of the baseline model. For instance, given the existing relation in the KG (living room, hasA, apple), we can add the affordance relation (apple, usedFor, eating). In this way, the KG encoding network can produce a more meaningful representation of the game state and potentially guide the model to produce better actions. In our experiments, we compare this approach to adding affordance knowledge using a separate GRU encoding layer, similar to the DRRN case.
  3. aff ⊕ mca: We include both the affordances in the KG and the memory of all previous correct actions with a separate GRU encoding layer. (A sketch of the KG augmentation follows Figure 3.)
Figure 3: KG-A2C model architecture with integrated affordances and previous correct actions.
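The KG-based injection can be pictured as a simple graph update. The sketch below uses networkx purely for illustration; the triple format and the filtering rule (only attach affordances for objects already in the agent's graph) are assumptions that mirror the description above, not the model's internal KG implementation.

```python
import networkx as nx

def augment_kg_with_affordances(kg: nx.DiGraph, affordance_triples) -> nx.DiGraph:
    """Add affordance triples (head, relation, tail) for objects already in the KG."""
    objects_in_kg = set(kg.nodes)
    for head, rel, tail in affordance_triples:
        if head in objects_in_kg:  # only attach knowledge about objects the agent has seen
            kg.add_edge(head, tail, relation=rel)
    return kg

# Example: the game KG already contains (living room, hasA, apple);
# we attach (apple, usedFor, eating) retrieved from ConceptNet.
kg = nx.DiGraph()
kg.add_edge("living room", "apple", relation="hasA")
kg = augment_kg_with_affordances(kg, [("apple", "usedFor", "eating")])
```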

2. Single-step Offline Prediction (LM Methods)

Pre-trained LM — RoBERTa [5] (Fig. 4)

Here we view the task as multiple-choice QA. At each step, the current game state is treated as the question, and the model must predict the next action from a set of candidates. Similar to the RL agents, the model is given the environment observation (𝑜𝑏𝑣), inventory (𝑖𝑛𝑣), and task description (𝑑𝑒𝑠𝑐) at every step. We then concatenate this state with each candidate action and let the LM select the action with the highest score. Given the large set of possible actions, we randomly select only 𝑛=4 distractor actions during training to reduce the computational burden; the LM is trained with cross-entropy loss to select the correct action. At inference time, the model assigns scores to all valid actions, and we use top-p sampling for action selection to prevent it from getting stuck in an action loop. We formalize three knowledge-injection strategies for the baseline RoBERTa model.

  1. mca: Here, we enable the LM to be aware of its past correct actions by incorporating an MCA that lists them as a string, appended to the original input. Due to the token limitations of RoBERTa, we use a sliding window of size 𝐴=5, i.e., at each step, the model sees at most the past 𝐴 correct actions.
  2. aff: We inject affordance knowledge into the LM by first adapting it on a subset of the Commonsense Knowledge Graph containing object utilities. We adapt the model via an auxiliary QA task, following prior knowledge-injection work [6]. We use pretraining instead of simple input concatenation because of the substantial volume of affordance knowledge triples, which cannot simply be concatenated to the input of RoBERTa due to its limited input length. Pretraining on affordances through an auxiliary QA task alleviates this challenge while still enabling the model to learn the relevant knowledge. We then finetune our task model on top of the utility-enhanced model, as described in the baseline.
  3. aff ⊕ mca: This variation simply combines mca and aff. (A sketch of the multiple-choice scoring setup follows Figure 4.)
Figure 4: RoBERTa architecture trained using distractors.
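A minimal sketch of the multiple-choice formulation, using the Hugging Face RobertaForMultipleChoice head. The prompt layout, the concatenation order of the state fields, and the example inputs are simplifications of the setup described above; in training, the gold action would be paired with the 𝑛=4 sampled distractors and scored with cross-entropy.

```python
import torch
from transformers import RobertaTokenizer, RobertaForMultipleChoice

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMultipleChoice.from_pretrained("roberta-base")

def score_actions(obs: str, inv: str, desc: str, actions: list[str]) -> torch.Tensor:
    """Score each candidate action against the current game state (the 'question')."""
    state = f"{desc} {obs} {inv}"
    enc = tokenizer([state] * len(actions), actions,
                    return_tensors="pt", padding=True, truncation=True)
    # RobertaForMultipleChoice expects tensors shaped (batch, num_choices, seq_len).
    enc = {k: v.unsqueeze(0) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits  # shape: (1, num_choices)
    return logits.squeeze(0)

# At inference, these scores over all valid actions would feed top-p sampling
# to avoid action loops.
scores = score_actions("You are in the kitchen.", "Your inventory is empty.",
                       "Find a non-living thing.", ["open fridge", "go outside"])
```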

Instruction-tuned LM — Flan T5 [7][8] (Fig. 5)

The Swift model inherently integrates the historical context of the preceding ten actions. Notably, in contrast to the three previously examined models, which exclusively consider the history of the last ten correct actions, the Swift model adheres to its original design by encompassing the complete history of the ten previous actions. To establish a baseline comparable to the methodology applied in the preceding three architectures, we omit the action history from the Swift model. The unaltered variation of Swift is therefore denoted as the mca version. Moreover, incorporating affordances into the baseline model yields the aff model. Similarly, integrating affordances into the mca version leads to the aff ⊕ mca model. These affordances are introduced into the primary input sequence immediately after the inventory data and before the details about visited rooms. (A sketch of this input assembly follows Figure 5.)

Figure 5: Swift architecture trained in a Seq2Seq manner.
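The Swift input is a single flattened sequence, so the affordance injection amounts to splicing extra text into that sequence. The sketch below shows one plausible (hypothetical) assembly; the exact segment labels and separators used by the real Swift prompt may differ.

```python
def build_swift_input(task_desc: str, obs: str, inventory: str,
                      affordances: list[str], visited_rooms: list[str],
                      action_history: list[str]) -> str:
    """Assemble the flattened Seq2Seq input; affordances go right after the inventory."""
    parts = [
        f"Task: {task_desc}",
        f"Observation: {obs}",
        f"Inventory: {inventory}",
        f"Affordances: {'; '.join(affordances)}",                 # injected knowledge
        f"Visited rooms: {', '.join(visited_rooms)}",
        f"Previous actions: {'; '.join(action_history[-10:])}",   # last ten actions
    ]
    return " | ".join(parts)
```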

Experiment Setup

Environment: We used ScienceWorld [1], a complex text-based virtual world presented in English. It features 10 interconnected locations and houses 218 unique objects, including various items from instruments and electrical components to plants, animals, and everyday objects like furniture and books. The game offers a rich array of interactions, with 25 high-level actions and up to 200,000 possible combinations per step, though only a few are practically valid. ScienceWorld has 10 tasks with a total of 30 sub-tasks. Due to the diversity within ScienceWorld, each task functions as an individual benchmark with distinct reasoning abilities, knowledge requirements, and varying numbers of actions needed to reach the goal state. Furthermore, each sub-task has a set of mandatory objectives that must be met by any agent (such as focusing on a non-living object and putting it in a red box in the kitchen). For experimentation purposes, we selected a single representative sub-task from each of the 10 tasks. Task details are given in the Appendix (at the end of this article).

Rewards and Scoring System: The reward system in ScienceWorld is designed to guide the agent towards preferred solutions. The environment provides a numeric score and a boolean indicator of task completion for every action performed. An agent can take up to 100 steps (actions) in each episode. The final score, ranging between 0 and 100, reflects how well the agent achieves the episode goal and its sub-goals. An episode concludes, and the cumulative score is calculated, when the agent completes the task or reaches the 100-step limit. (A schematic episode loop is sketched below.)
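To make the scoring protocol concrete, here is a schematic episode loop. The environment interface shown (a gym-style step() returning observation, reward, done, and an info dictionary holding the cumulative 0-100 score) is an assumption for illustration; the actual ScienceWorld Python API may expose different method names and return values.

```python
# Schematic agent-environment loop for ScienceWorld-style scoring.
MAX_STEPS = 100  # hard step limit per episode

def run_episode(env, agent) -> float:
    obs, info = env.reset()
    memory_of_correct_actions = []
    for step in range(MAX_STEPS):
        action = agent.choose_action(obs, info, memory_of_correct_actions)
        obs, reward, done, info = env.step(action)
        if reward > 0:                      # feedback marks the action as "correct"
            memory_of_correct_actions.append(action)
        if done:                            # task completed before the step limit
            break
    return info.get("score", 0.0)           # final score in [0, 100]
```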

Experimental Insights:

  • Knowledge injection helps agents in text-based games — In 34 out of 40 cases, our knowledge-injection strategies improve over the baseline models.
  • Affordance knowledge is more useful than the memory of correct actions — Affordance models obtain the best results in 15 cases, followed by including MCA (8 cases). Including both knowledge types together led to the best results in 11 cases.
  • In terms of the overall impact across tasks, the LM variants, RoBERTa and Swift, benefit the most on average from including affordances, leading to relative increases of 48% and 8%, respectively, over the baselines. An example is illustrated in Fig. 6, where the LM models benefit greatly from the addition of affordances.
Figure 6: Actions taken by affordance models on Task 4. Blue = step index, green = cumulative score, and yellow = correct action.
  • Variable effect across tasks depends on the task relevance of the injected knowledge — The variable effect across tasks was frequently due to the relevance of the injected knowledge to the task at hand, with certain tasks (e.g., electricity) benefiting more from the injection.
  • Injecting affordances is most effective via KGs; incorporating them as raw inputs increased the learning complexity for the models — We explore multiple variations of injecting affordance knowledge into KG-A2C (Fig. 7): adding it as input into the observation, inventory, and description; creating a separate GRU encoding layer for affordances; and adding affordances to the KG itself. We evaluate the performance of each method on three sub-tasks: easy, medium, and hard.
Figure 7: Effect of five ways to add affordances in KG-A2C.

Concluding Thoughts:

Our research represents a significant stride toward more sophisticated AI agents. By equipping them with the ability to learn from past actions and to understand their environment deeply, we pave the way for AI that plays games and interacts intelligently and intuitively in various aspects of our lives. The framework can be extended to other AI applications, such as virtual assistants or educational tools, where understanding and interacting with the environment is crucial.

Few-shot prompting of large LMs has recently shown promise on reasoning tasks, as well as clear benefits from interactive communication and input clarification. Exploring their role in interactive tasks, either as solutions that require less training data or as components that can generate synthetic data for knowledge distillation to smaller models, is a promising future direction.

If you like our work, please cite it 😁

@inproceedings{chhikara,
  author    = {Chhikara, Prateek and Zhang, Jiarui and Ilievski, Filip and Francis, Jonathan and Ma, Kaixin},
  title     = {Knowledge-Enhanced Agents for Interactive Text Games},
  year      = {2023},
  doi       = {10.1145/3587259.3627561},
  booktitle = {Proceedings of the 12th Knowledge Capture Conference 2023},
  pages     = {157--165},
  numpages  = {9},
  series    = {K-CAP '23}
}

References

[1] Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, and Prithviraj Ammanabrolu. 2022. ScienceWorld: Is your Agent Smarter than a 5th Grader? In EMNLP.

[2] Peter Jansen, Kelly J. Smith, Dan Moreno, and Huitzilin Ortiz. 2021. On the Challenges of Evaluating Compositional Explanations in Multi-Hop Inference: Relevance, Completeness, and Expert Ratings. In Proceedings of EMNLP.

[3] Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, and Mari Ostendorf. 2016. Deep Reinforcement Learning with a Natural Language Action Space. In Proceedings of ACL.

[4] Prithviraj Ammanabrolu and Matthew Hausknecht. 2020. Graph Constrained Reinforcement Learning for Natural Language Action Spaces. In ICLR.

[5] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach.

[6] Filip Ilievski, Alessandro Oltramari, Kaixin Ma, Bin Zhang, Deborah L McGuinness, and Pedro Szekely. 2021. Dimensions of commonsense knowledge. Knowledge-Based Systems 229 (2021), 107347.

[7] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. 2022. Scaling Instruction-Finetuned Language Models.

[8] Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, and Xiang Ren. 2023. SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks.

[9] Noam Chomsky. 2014. Aspects of the Theory of Syntax. Vol. 11. MIT Press.

[10] Yifan Jiang, Filip Ilievski, and Kaixin Ma. 2023. Transferring Procedural Knowledge across Commonsense Tasks. In ECAI.

APPENDIX

Task Descriptions

  1. Task 1 — Matter: Your task is to freeze water. First, focus on the substance. Then, take actions that will cause it to change its state of matter.
  2. Task 2 — Measurement: Your task is to measure the melting point of chocolate, which is located around the kitchen. First, focus on the thermometer. Next, focus on the chocolate. If the melting point of chocolate is above -10.0 degrees, focus on the blue box. If the melting point of chocolate is below -10.0 degrees, focus on the orange box. The boxes are located around the kitchen.
  3. Task 3 — Electricity: Your task is to activate the red light bulb by powering it using a renewable power source. First, focus on the red light bulb. Then, create an electrical circuit that powers it on.
  4. Task 4 — Classification: Your task is to find a(n) non-living thing. First, focus on the thing. Then, move it to the red box in the kitchen.
  5. Task 5 — Biology I: Your task is to grow an apple plant from seed. Seeds can be found in the kitchen. First, focus on a seed. Then, make changes to the environment that grow the plant until it reaches the reproduction life stage.
  6. Task 6 — Chemistry: Your task is to use chemistry to create the substance ‘salt water’. A recipe and some of the ingredients can be found near the kitchen. When you are done, focus on the salt water.
  7. Task 7 — Biology II: Your task is to find the animal with the longest life span, then the shortest life span. First, focus on the animal with the longest life span. Then, focus on the animal with the shortest life span. The animals are in the ‘outside’ location.
  8. Task 8 — Biology III: Your task is to focus on the 4 life stages of the turtle, starting from earliest to latest.
  9. Task 9 — Forces: Your task is to determine which of the two inclined planes (unknown material C, unknown material H) has the most friction. After completing your experiment, focus on the inclined plane with the most friction.
  10. Task 10 — Biology IV: Your task is to determine whether blue seed color is a dominant or recessive trait in the unknown E plant. If the trait is dominant, focus on the red box. If the trait is recessive, focus on the green box.

ScienceWorld Gameplay Example

Task: 4 (find a non-living thing)
Variation: 239 (DRRN baseline)
Description: Your task is to find a(n) non-living thing. First, focus on the thing. Then, move it to the purple box in the workshop.
