Unlocking the Power of Meta-Llama 3: A Guide to Function Calling and JSON Mode

May 15, 2024 | Educational

Have you ever wondered how to interact with complex machine learning models while keeping things simple? In this blog, we’ll explore the Meta-Llama-3-8B-Instruct model and its remarkable capabilities, specifically focusing on the JSON mode and function calling. Get ready to dive into user-friendly instructions that can boost your AI projects!

Model Description

The Meta-Llama-3-8B-Instruct model has been fine-tuned for function calling and JSON mode. This fine-tune lets the model generate structured, machine-readable responses, making it an excellent tool for building applications that pair a conversational interface with programmatic logic.

Usage of JSON Mode

To get started with the JSON mode of Meta-Llama-3-8B-Instruct, you’ll first need to set up your environment and load the fine-tuned checkpoint. The snippet below loads the model and tokenizer, sends a system prompt that asks for JSON output, and generates a response.

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "hiieuMeta-Llama-3-8B-Instruct-function-calling-json-mode"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant, answer in JSON with key 'message'"},
    {"role": "user", "content": "Who are you?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at the model's EOS token or Llama 3's end-of-turn token.
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Keep only the newly generated tokens, dropping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

Imagine you are a chef preparing a gourmet dish. Each ingredient you add contributes to the final flavor, just as each line of code contributes to the model’s understanding. The tokenizer serves as your trusted sous chef, prepping the ingredients (or in our case, the input data) in a format the model can digest. The model is the oven that combines everything and brings your dish to life, generating meaningful output based on what you’ve set up!
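
Because the system prompt asks the model to answer in JSON with a 'message' key, the decoded text can usually be parsed directly. Here is a minimal sketch of that final step, assuming the model returns well-formed JSON (the variables follow the snippet above):

python
import json

# `response` holds the newly generated token IDs from the snippet above.
decoded = tokenizer.decode(response, skip_special_tokens=True)

try:
    # Expecting something like {"message": "..."} per the system prompt.
    reply = json.loads(decoded)
    print(reply["message"])
except json.JSONDecodeError:
    # Fall back to the raw text if the model did not emit valid JSON.
    print(decoded)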

Function Calling

Function calling is an exciting feature that allows the model to perform specific actions based on user commands. It requires a two-step inference: the model first emits a structured function call, your application runs the function, and the result is fed back so the model can phrase the final answer. Here’s how to set it up:

Step 1: Define the Function

python
functions_metadata = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "get temperature of a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

messages = [
    {"role": "system", "content": f"You are a helpful assistant with access to the following functions: {str(functions_metadata)}\n\nTo use these functions respond with:\nfunctioncall  name: function_name, arguments: arg_1: value_1, ...   functioncall\n\nEdge cases you must handle:\n - If there are no functions that match the user request, respond politely that you cannot help."},
    {"role": "user", "content": "What is the temperature in Tokyo right now?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
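
Between the two steps, your application has to parse the <functioncall> tag out of the model’s reply, run the real function, and feed the result back as a <function_response> message. The model card leaves this glue code to you; the sketch below shows one possible approach, where get_temperature is a hypothetical stand-in for your own implementation and `response` comes from the Step 1 snippet above:

python
import json
import re

def get_temperature(city: str) -> str:
    # Hypothetical stand-in for a real weather lookup (API call, database, etc.).
    return "30 C"

decoded = tokenizer.decode(response, skip_special_tokens=True)

# Pull the JSON payload out of the <functioncall> ... </functioncall> tag.
match = re.search(r"<functioncall>\s*(\{.*\})\s*</functioncall>", decoded, re.DOTALL)
if match:
    call = json.loads(match.group(1))
    if call["name"] == "get_temperature":
        result = get_temperature(**call["arguments"])
        # This string becomes the <function_response> user message in Step 2.
        print(f"<function_response> {{\"temperature\": \"{result}\"}} </function_response>")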

Step 2: Execute the Function

python
messages = [
    {"role": "system", "content": f"You are a helpful assistant with access to the following functions: {str(functions_metadata)}\n\nTo use these functions respond with:\nfunctioncall  name: function_name, arguments: arg_1: value_1, ...   functioncall\n\nEdge cases you must handle:\n - If there are no functions that match the user request, respond politely that you cannot help."},
    {"role": "user", "content": "What is the temperature in Tokyo right now?"},
    # Extract the tag functioncall from the previous prediction and append it:
    {"role": "assistant", "content": "functioncall name: get_temperature, arguments: city: Tokyo functioncall"},
    {"role": "user", "content": "function_response temperature:30 C function_response"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

In this two-step dance, the model first signals which function it wants to call and with which arguments; your application performs the lookup and hands the result back, and the model then delivers the final answer like a performer ready for a grand finale. This back-and-forth between model and application is what keeps user interactions smooth and responses dynamic!
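
Because the same tokenize-generate-decode boilerplate appears in every step above, it can be worth factoring it into a small helper. The sketch below is simply a convenience wrapper built from the exact same calls used earlier (the name generate_reply is ours, not part of the model card):

python
def generate_reply(messages, max_new_tokens=256):
    """Apply the chat template, generate, and return the decoded reply."""
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]

    outputs = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

With this in place, each of the steps above reduces to a single generate_reply(messages) call.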

Troubleshooting Ideas

If you encounter any issues while implementing the above code, here are some troubleshooting ideas:

  • Ensure that you have all required libraries installed, especially transformers, torch, and accelerate (the latter is needed for device_map="auto").
  • Check if your model_id is correctly set and accessible.
  • Make sure the end-of-turn token is passed to convert_tokens_to_ids as the string "<|eot_id|>"; referencing an undefined eot_id variable will raise a NameError.
  • If the model returns errors, examine the structure of your messages and functions; a mismatch could lead to confusion.
  • Consider experimenting with temperature and top_p parameters for different output variability.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

With the capabilities of the Meta-Llama-3 model at your fingertips, you can create sophisticated applications effortlessly. By leveraging JSON mode and function calling, you can ensure that your interactions are not only efficient but also meaningful.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
