Code • Paper • Data (CodeActInstruct) • Model (CodeActAgent-Mistral-7b-v0.1) • Chat with CodeActAgent!
In the world of Large Language Models (LLMs), expressing an agent's actions as executable code can significantly enhance its capabilities. This article discusses our approach, CodeAct, which integrates a Python interpreter so that LLM agents can execute code actions and revise them based on newly observed results across multi-turn interactions.
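To make the interaction loop concrete, here is a minimal sketch of one CodeAct-style turn: the model emits a code action, the environment executes it, and the captured output is fed back as an observation. This is an illustrative assumption of how such a loop could look, not the project's actual executor; `execute_action` and the message layout are hypothetical.

```python
import contextlib
import io

def execute_action(code: str) -> str:
    """Run a model-emitted code action and capture stdout as the observation."""
    buffer = io.StringIO()
    namespace: dict = {}
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue()

# One turn of the loop: the model emits code, the environment executes it,
# and the observation is appended so the model can adjust its next action.
history = [{"role": "user", "content": "What is 37 * 43?"}]
action = "print(37 * 43)"  # in practice, generated by the LLM
observation = execute_action(action)
history.append({"role": "assistant", "content": action})
history.append({"role": "user", "content": f"Observation: {observation}"})
print(observation)  # -> 1591
```

Because the observation is just text appended to the conversation, error messages flow back to the model the same way as successful output, which is what enables self-correction over multiple turns.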
Why CodeAct?
Our research, evaluating 17 different LLMs on API-Bank and our newly devised benchmark, M³ToolEval, shows that CodeAct achieves up to a 20% higher success rate than traditional action formats such as Text and JSON. This result demonstrates the effectiveness of executable code as a unified action space for LLM agents.
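The gap is easiest to see by comparing the two action formats directly. In the sketch below, `search` and `parse_number` are hypothetical tools used only for illustration; the point is that a code action can compose several tool calls with ordinary control flow in a single turn, while a JSON action typically encodes one call per turn.

```python
# JSON-style action: one tool invocation per turn, no control flow.
json_action = {"tool": "search", "args": {"query": "population of France"}}

# CodeAct-style action: plain Python, so multiple tool calls, intermediate
# variables, and loops fit into a single action. (search/parse_number are
# hypothetical tool functions exposed to the interpreter.)
code_action = """
results = [search(f"population of {c}") for c in ("France", "Germany")]
total = sum(parse_number(r) for r in results)
print(total)
"""
```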
Introducing CodeActInstruct
The backbone of our methodology is CodeActInstruct, a dataset of 7,000 multi-turn interactions using CodeAct, available on Hugging Face. We encourage you to explore our paper for an in-depth look at how the dataset was constructed.
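If you want to inspect the data, something like the following should work with the `datasets` library. The repository ID is an assumption based on the project's Hugging Face naming; verify it against the dataset page before use.

```python
from datasets import load_dataset

# Repo ID assumed from the project's Hugging Face page; verify before use.
dataset = load_dataset("xingyaoww/code-act")
print(dataset)  # shows the available splits and their sizes

# Peek at the first example of the first split, whatever it is named.
first_split = next(iter(dataset.values()))
print(first_split[0])
```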
The Power of CodeActAgent
CodeActAgent, fine-tuned on a mixture of CodeActInstruct and general conversation data, performs exceptionally well, particularly on out-of-domain tasks, when compared with similarly sized open-source models. We're excited to present two variants of CodeActAgent (a loading sketch follows the list):
- **CodeActAgent-Mistral-7b-v0.1**: This is the recommended version, utilizing Mistral-7b-v0.1 as the base model featuring a 32k context window. Access it here.
- **CodeActAgent-Llama-7b**: This version uses Llama-2-7b as its base model, with a 4k context window. Find it here.
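For local experimentation, the recommended variant can be loaded with the `transformers` library roughly as follows. The hub path is assumed from the model name above, and the snippet presumes the model ships a chat template; adjust both if the model card says otherwise.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub path assumed from the naming above; check the model card if it differs.
model_id = "xingyaoww/CodeActAgent-Mistral-7b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write Python code to compute the 10th Fibonacci number."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```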
Understanding the Code
Let’s liken our CodeAct approach to a chef preparing a meal. Traditional methods, such as using Text and JSON, are like a chef who only references recipes without adjusting the ingredients based on the taste during the cooking process. In contrast, CodeAct acts like a chef equipped with tasting spoons, allowing them to sample the dish as it cooks. This enables them to modify seasonings or ingredients in real-time, producing a far superior outcome—much like our LLMs can modify their actions based on executing code and receiving dynamic feedback.
Troubleshooting
When implementing CodeAct, you might encounter a few issues. Here are some common troubleshooting ideas:
- Execution Errors: Verify that the code you are trying to execute is well-formed and compatible with your Python interpreter version (see the sandboxed-execution sketch after this list).
- Integration Failures: Check your integration steps with CodeAct and ensure that all modules are correctly linked.
- Performance Issues: If you’re experiencing slow responses, consider optimizing the logic of your code actions or examining your computational resources.
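One pattern that helps with the execution errors mentioned above is running each code action in a separate process with a timeout, so syntax errors, crashes, or hangs come back as observations instead of taking down the agent. This is a minimal sketch; `run_action_safely` is a hypothetical helper, not part of the CodeAct codebase.

```python
import subprocess
import sys

def run_action_safely(code: str, timeout: int = 10) -> str:
    """Execute a code action in a child process so failures surface as text."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    except subprocess.TimeoutExpired:
        return f"Error: action timed out after {timeout}s"

# A failing action returns its traceback as the observation.
print(run_action_safely("print(1 / 0)"))
```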
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.