Efficiency. To examine the efficiency of agents across all task types, Figure 3 visualizes the average trajectories over the first three testing variations of each task for SwiftSage, ReAct, and the oracle agent. Tasks are ordered by the average length of their oracle trajectories (*Len in Table 1). Oracle trajectories consistently achieve perfect scores, yet SwiftSage reaches comparable scores more efficiently. This is especially evident in the longer tasks (the bottom two rows), although SwiftSage falls short of a perfect score on a few of them (e.g., 9-2 and 1-3). Interestingly, ReAct performs competitively on shorter tasks (e.g., 4-2 and 3-4), but most of its trajectories plateau at an intermediate score and fail to reach 100.
Cost-effectiveness. Although SAGE invokes LLM APIs twice per inference, its overall cost remains lower because each call returns a sequence of actions, typically about five. In comparison, SayCan and ReAct require 1,855.84 and 1,971.03 tokens per action (tpa), respectively, while Reflexion requires 2,983.46 tpa; SwiftSage uses only 757.07 tpa. Given its superior performance, SwiftSage is therefore more cost-effective than the other LLM-based methods. This efficiency is primarily attributable to invoking LLMs only when necessary (courtesy of our strong SWIFT module) and to the action buffer mechanism.
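To make the amortization concrete: if each SAGE call consumes roughly 1,900 tokens (an assumed round number for illustration), two calls spread over a five-action plan give (2 × 1,900) / 5 = 760 tpa, close to the 757.07 reported above. The sketch below illustrates the action-buffer idea under stated assumptions; the function names (`plan_with_sage`, `swift_next_action`) and the environment interface are hypothetical and do not reflect the paper's actual code.

```python
from collections import deque

def run_episode(env, swift_next_action, plan_with_sage, max_steps=100):
    """Minimal sketch of an action buffer, assuming a hypothetical
    environment with reset()/step() and two hypothetical policies:
    a cheap SWIFT proposer and an expensive SAGE planner."""
    buffer = deque()  # actions queued from the most recent SAGE plan
    obs = env.reset()
    score = 0
    for _ in range(max_steps):
        if buffer:
            action = buffer.popleft()        # free: reuse the buffered plan
        else:
            action = swift_next_action(obs)  # cheap: small LM proposes
            if action is None:               # SWIFT is unsure -> call SAGE
                # One LLM call yields ~5 actions, amortizing its token cost
                buffer.extend(plan_with_sage(obs))
                action = buffer.popleft()
        obs, score, done = env.step(action)
        if done:
            break
    return score
```

Because every SAGE invocation fills the buffer with several actions, most steps are served without any LLM call, which is what drives the per-action token cost down relative to methods that query an LLM at every step.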