How to Integrate llama.cpp into ROS 2 with llama_ros

Oct 17, 2020 | Data Science

Welcome to the intriguing world of ROS 2 and large language models (LLMs)! Today, we’ll explore how to seamlessly integrate llama.cpp into your ROS 2 projects using the llama_ros packages. Whether you’re working on sophisticated robotics applications or eager to explore AI capabilities, this guide is tailored for you.

Related Projects

  • chatbot_ros – A chatbot integrated into ROS 2 that utilizes whisper_ros for speech recognition and llama_ros for response generation.
  • explainable_ros – A tool to explain robot behaviors, integrating LangChain and using logs stored in a database to provide relevant answers with llama_ros.

Installation

Before running llama_ros, ensure you have the appropriate tools in place. Here’s how you can install everything you need:

shell
$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/llama_ros.git
$ pip3 install -r llama_ros/requirements.txt
$ cd ~/ros2_ws
$ rosdep install --from-paths src --ignore-src -r -y
$ colcon build --cmake-args -DGGML_CUDA=ON # Use this for CUDA support
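
After the build finishes, source the workspace overlay so the newly built packages are visible to your ROS 2 environment (standard colcon workflow; adjust the path if your workspace lives elsewhere):

shell
$ source ~/ros2_ws/install/setup.bash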

Usage

With everything installed, let’s dive into how to use the llama_ros package effectively.

llama_cli

The llama_cli commands provide a quick way to test GGUF-based LLMs inside ROS 2. Here’s how to use them:

shell
# Launch the LLM from a YAML file
$ ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/params/StableLM-Zephyr.yaml

# Send a prompt to a launched LLM
$ ros2 llama prompt "Do you know ROS 2?" -t 0.0

Launch Files

To use llama_ros or llava_ros, you need to create a launch file that contains key parameters. Let’s understand this process through an analogy:

Imagine you’re a chef assembling your ingredients (parameters) for a recipe. Each component plays a unique role, just like how each parameter in the launch file contributes to launching the model.

Here’s a brief example of how this setup looks in both Python and YAML:

python
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch

def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            n_ctx=2048,
            n_batch=8,
            n_gpu_layers=0,
            n_threads=1,
            n_predict=2048,
            model_repo="TheBloke/Marcoroni-7B-v3-GGUF",
            model_filename="marcoroni-7b-v3.Q4_K_M.gguf",
            system_prompt_type="alpaca"
        )
    ])

And an equivalent setup expressed in YAML:

yaml
n_ctx: 2048
n_batch: 8
n_gpu_layers: 0
n_threads: 1
n_predict: 2048
model_repo: cstr/Spaetzle-v60-7B-GGUF
model_filename: Spaetzle-v60-7b-q4-k-m.gguf
system_prompt_type: Alpaca
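
A YAML file like the one above can also be loaded from a Python launch file instead of being passed to llama_cli. The sketch below assumes llama_bringup exposes a create_llama_launch_from_yaml helper and that the parameter file is installed with the package; treat the exact path as illustrative:

python
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml  # assumed helper

def generate_launch_description():
    # Resolve a YAML parameter file shipped with llama_bringup (path is illustrative)
    params_file = os.path.join(
        get_package_share_directory("llama_bringup"),
        "params", "StableLM-Zephyr.yaml"
    )
    return LaunchDescription([
        create_llama_launch_from_yaml(params_file)
    ])

Whichever variant you pick, save it as a regular launch file in your package and start it with ros2 launch.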

LoRA Adapters

LoRA adapters can be loaded when launching LLMs. Continuing the cooking analogy, adapters are extra flavors you mix into the dish (the base model), and each one has a scale that controls how strongly it influences the result. Below is an example:

yaml
n_ctx: 2048
n_batch: 8
n_gpu_layers: 0
n_threads: 1
n_predict: 2048
model_repo: bartowski/Phi-3.5-mini-instruct-GGUF
model_filename: Phi-3.5-mini-instruct-Q4_K_M.gguf
lora_adapters:
  - repo: zhhan/adapter-Phi-3-mini-4k-instruct_code_writing
    filename: Phi-3-mini-4k-instruct-adaptor-f16-code_writer.gguf
    scale: 0.5
  - repo: zhhan/adapter-Phi-3-mini-4k-instruct_summarization
    filename: Phi-3-mini-4k-instruct-adaptor-f16-summarization.gguf
    scale: 0.5
system_prompt_type: Phi-3

ROS 2 Clients

The llama_ros and llava_ros packages provide ROS 2 service and action interfaces for interacting with the models. Below are some examples:

Tokenize

python
import rclpy
from rclpy.node import Node
from llama_msgs.srv import Tokenize

class ExampleNode(Node):
    def __init__(self):
        super().__init__("example_node")
        # Client for the tokenize service exposed by the llama_ros node
        self.srv_client = self.create_client(Tokenize, "llama/tokenize")
        req = Tokenize.Request()
        req.prompt = "Example text"
        self.srv_client.wait_for_service()
        # Call asynchronously and spin on the future so the example
        # works without a separate executor thread
        future = self.srv_client.call_async(req)
        rclpy.spin_until_future_complete(self, future)
        tokens = future.result().tokens

Generate Response

python
import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse

class ExampleNode(Node):
    def __init__(self):
        super().__init__("example_node")
        # Action client for the response-generation action of llama_ros
        self.action_client = ActionClient(self, GenerateResponse, "llama/generate_response")
        goal = GenerateResponse.Goal()
        goal.prompt = "Your prompt"
        goal.sampling_config.temp = 0.2
        self.action_client.wait_for_server()
        # Send the goal, wait for it to be accepted, then wait for the result
        send_goal_future = self.action_client.send_goal_async(goal)
        rclpy.spin_until_future_complete(self, send_goal_future)
        goal_handle = send_goal_future.result()
        get_result_future = goal_handle.get_result_async()
        rclpy.spin_until_future_complete(self, get_result_future)
        result = get_result_future.result().result  # GenerateResponse.Result
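
Both snippets define a node but never start it; a minimal entry point to run either one is plain rclpy boilerplate:

python
import rclpy

def main():
    rclpy.init()
    node = ExampleNode()  # the class from either example above does its work in __init__
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()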

LangChain

An integration with LangChain is also available, so llama_ros models can be dropped straight into LangChain chains and pipelines. An example is outlined below:

python
import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

rclpy.init()
llm = LlamaROS()
prompt_template = "Tell me a joke about {topic}"
prompt = PromptTemplate(input_variables=["topic"], template=prompt_template)
chain = prompt | llm | StrOutputParser()
text = chain.invoke({"topic": "bears"})
print(text)
rclpy.shutdown()
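
Beyond text generation, the LangChain integration can also serve embeddings for retrieval pipelines. The sketch below assumes llama_ros.langchain exports a LlamaROSEmbeddings wrapper (naming follows the LlamaROS class above) and that an embedding-capable GGUF model has already been launched:

python
import rclpy
from llama_ros.langchain import LlamaROSEmbeddings  # assumed wrapper class

rclpy.init()
# Wraps the running llama_ros node as a LangChain embeddings backend
embeddings = LlamaROSEmbeddings()
vector = embeddings.embed_query("What is ROS 2?")
print(len(vector))
rclpy.shutdown()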

Demos

To see everything in action, you can launch the demos:

shell
# For llama_ros
$ ros2 launch llama_bringup spaetzle.launch.py

# For llava_ros
$ ros2 launch llama_bringup minicpm-2.6.launch.py

Troubleshooting

If you encounter any issues during installation or usage, here are some troubleshooting tips:

  • Ensure your environment is properly set up with the ROS 2 installation and compatible versions of Python and dependencies.
  • If you face issues related to CUDA, double-check your CUDA Toolkit installation and compatibility with your hardware.
  • Check the configuration files for errors in YAML syntax or incorrect parameter values.
  • If your model is not launching, double-check the model repository and filename in your launch files; the introspection commands below can help confirm whether the llama node and its interfaces came up.
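
A quick way to verify that the model node is running and exposing its interfaces is to use the standard ROS 2 introspection tools:

shell
# Check that the llama node is running
$ ros2 node list

# Check that its services and actions are advertised
$ ros2 service list | grep llama
$ ros2 action list | grep llama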

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
