How to Use the Swallow Model for Text Generation

Jul 1, 2024 | Educational

The Swallow model was built by continuing the pre-training of Llama 2 with additional Japanese language data, which makes it well suited to Japanese text generation. This guide walks through loading both the instruction-tuned and the base variants and generating text with each.

Getting Started

The examples in this guide load Swallow through the Hugging Face Transformers library, so the first step is to make sure it and its companion packages are installed.

Install the required dependencies:

sh
pip install -r requirements.txt
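After installing, it can help to confirm that the packages are actually importable before downloading several gigabytes of weights. The following is a minimal sketch using only the standard library. The `missing_packages` helper and the `REQUIRED` list are illustrative assumptions, not part of the Swallow repository; adjust the list to match the requirements.txt you installed from.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list (hypothetical); edit it to mirror your requirements.txt.
REQUIRED = ["torch", "transformers", "sentencepiece", "accelerate"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, rerun the install command in the same environment you use to run the examples.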

Using the Swallow Instruct Model

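To generate text with the Swallow Instruct model, run the following code. It loads the model, builds a prompt from an instruction and an optional input, and samples a response: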

python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tokyotech-llm/Swallow-7b-instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, device_map="auto")

PROMPT_DICT = {
    "prompt_input": (
        "以下に、あるタスクを説明する指示があり、それに付随する入力が更なる文脈を提供しています。\n"
        "リクエストを適切に完了するための回答を記述してください。\n\n"
        "### 指示:\n{instruction}\n\n"
        "### 入力:\n{input}\n\n"
        "### 応答:\n"
    ),
    "prompt_no_input": (
        "以下に、あるタスクを説明する指示があります。\n"
        "リクエストを適切に完了するための回答を記述してください。\n\n"
        "### 指示:\n{instruction}\n\n"
        "### 応答:\n"
    ),
}

def create_prompt(instruction, input=None):
    """Generates a prompt based on the given instruction and an optional input."""
    if input:
        return PROMPT_DICT["prompt_input"].format(instruction=instruction, input=input)
    else:
        return PROMPT_DICT["prompt_no_input"].format(instruction=instruction)

# Example usage
instruction_example = "以下のトピックに関する詳細な情報を提供してください。"
input_example = "東京工業大学の主なキャンパスについて教えてください。"
prompt = create_prompt(instruction_example, input_example)

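# Encode the prompt as a tensor of token IDs (no special tokens are added here)
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
# Sample up to 128 new tokens using temperature and top-p (nucleus) sampling
tokens = model.generate(input_ids.to(device=model.device), max_new_tokens=128, temperature=0.99, top_p=0.95, do_sample=True)
# Decode the full sequence (prompt plus generated text) back into a string
out = tokenizer.decode(tokens[0], skip_special_tokens=True)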

print(out)
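Because a causal language model returns the prompt followed by its continuation, the printed `out` includes the instruction template as well as the answer. A small helper can keep only the part after the response marker. This is a sketch, not part of the Swallow code: `extract_response` is a hypothetical helper, and `sample_out` is a made-up stand-in for real model output so the snippet runs without loading the model.

```python
RESPONSE_MARKER = "### 応答:"  # the marker that ends both prompt templates


def extract_response(decoded_text, marker=RESPONSE_MARKER):
    """Return the text after the first response marker, or the whole text if absent."""
    _, found, tail = decoded_text.partition(marker)
    return tail.strip() if found else decoded_text.strip()


# Made-up stand-in for the `out` string produced by tokenizer.decode(...)
sample_out = (
    "以下に、あるタスクを説明する指示があります。\n"
    "リクエストを適切に完了するための回答を記述してください。\n\n"
    "### 指示:\n挨拶してください。\n\n"
    "### 応答:\nこんにちは。"
)

print(extract_response(sample_out))  # prints: こんにちは。
```

With the real model you would call `extract_response(out)` instead of passing the sample string.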

Using the Base Model

The base Swallow model has no instruction tuning, so instead of an instruction template you give it the beginning of a text and let it continue. Run the following code:

python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tokyotech-llm/Swallow-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "東京工業大学の主なキャンパスは、"
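# Encode the text to be continued as a tensor of token IDs
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Sample up to 128 new tokens that continue the prompt
tokens = model.generate(input_ids.to(device=model.device), max_new_tokens=128, temperature=0.99, top_p=0.95, do_sample=True)
# Decode the prompt plus its continuation back into a string
out = tokenizer.decode(tokens[0], skip_special_tokens=True)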

print(out)
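Both loading calls pass `torch_dtype=torch.bfloat16`, which stores each weight in two bytes instead of four. A rough back-of-the-envelope estimate of what that means for memory is sketched below. The parameter count is an assumption taken from the "7b" in the model name, `weight_memory_gb` is a hypothetical helper, and the figure covers the weights only (activations and the generation cache need additional memory).

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: about 7 billion parameters, going by the "7b" in the model name.
approx_params = 7_000_000_000

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gb(approx_params, dtype):.1f} GB")
```

Under these assumptions, bfloat16 halves the footprint relative to float32, which is often the difference between fitting on a single GPU and not.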

Understanding the Code Through an Analogy

Imagine a restaurant kitchen. The chef is the model, and the order slip is the prompt template: every order (instruction) is written up the same way, with an optional note for special requests (input). The tokenizer plays the waiter, translating the customer's words into the kitchen shorthand (token IDs) that the chef works from, and translating the finished result back into something the customer can read.

When a customer (user) places an order, the waiter writes it on the slip, the chef prepares the dish one step at a time (token by token), and the result is served as the output. In the same way, Swallow generates text guided by the instruction and any additional input supplied in the prompt.

Troubleshooting

If you encounter any issues while using the Swallow model, consider the following troubleshooting ideas:

Ensure all dependencies are properly installed as specified in the requirements.txt.
Check the model name and make sure it’s spelled correctly in your code.
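Verify that your hardware can hold the model. Loaded in bfloat16, a 7B-parameter model needs roughly 14 GB of memory for the weights alone, and device_map="auto" relies on the accelerate package to place them.
If the output is too random or too repetitive, adjust the sampling parameters: a lower temperature or top_p makes generation more conservative, while higher values make it more varied.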
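To build intuition for the last tip, here is a toy illustration of what `temperature` and `top_p` do to a probability distribution over candidate tokens. It is a simplified sketch in pure Python, not the implementation used by Transformers, and the logits are made-up numbers for four hypothetical tokens.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.95):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for t in (0.99, 0.5):
    probs = softmax_with_temperature(toy_logits, temperature=t)
    print(t, [round(p, 3) for p in top_p_filter(probs, top_p=0.95)])
```

Lowering the temperature concentrates probability on the top-scoring token, and lowering `top_p` removes more of the unlikely tail before sampling.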


Conclusion

The Swallow model is a capable tool for generating Japanese and English text. With the instruct and base examples above as a starting point, you can begin building your own text generation applications.
