Welcome to the world of Taiwanese Hokkien translation! If you are looking to bridge the language gap between Traditional Chinese, English, and Taiwanese Hokkien, you’re in for a treat with the Taigi-Llama-2-Translator. This model, built on the Taigi-Llama-2 series, was fine-tuned on an impressive 263k pairs of parallel translation data. Here’s a friendly guide to get you started.
Understanding the Model
The Taigi-Llama-2-Translator is designed to cater to translation needs across various scripts. Here are the details:
- Base Model: Bohanlu/Taigi-Llama-2-13B
- Usage: Translate between Traditional Chinese, English, and Taiwanese Hokkien (Hanzi, POJ, Hanlo).
- Model Size: 13 billion parameters
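Before loading the model, it helps to know whether it will fit on your hardware. As a rough back-of-the-envelope check (an estimate, not an official requirement): at float16 precision each parameter takes 2 bytes, so the weights alone occupy about 26 GB before activations and KV cache.

```python
# Rough VRAM estimate for loading a 13B model in float16 (weights only).
params = 13_000_000_000
bytes_per_param = 2  # float16 = 16 bits = 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"Approximate weight footprint: {weights_gb:.0f} GB")  # ~26 GB
```

If that exceeds your GPU memory, `device_map='auto'` (used below) lets Accelerate spill layers to other devices, at a speed cost.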
Using the Model
Now, let’s dive into how you can make the most of this translation model. The process requires a few key components, similar to baking a cake where precision is key:
Ingredients You’ll Need:
- Python – The programming language we’ll use.
- Transformers Library – For accessing the model and tokenizer.
- PyTorch – A deep learning framework.
- Accelerate – For managing resources effectively.
The Recipe
Let’s walk through the code to see how we can translate sentences. Think of it as layering your cake:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
import torch
import accelerate

def get_pipeline(path: str, tokenizer: AutoTokenizer, accelerator: accelerate.Accelerator) -> TextGenerationPipeline:
    model = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype=torch.float16, device_map='auto', trust_remote_code=True)
    terminators = [tokenizer.eos_token_id, tokenizer.pad_token_id]
    pipeline = TextGenerationPipeline(
        model=model, tokenizer=tokenizer,
        num_workers=accelerator.state.num_processes * 4,
        pad_token_id=tokenizer.pad_token_id, eos_token_id=terminators)
    return pipeline

model_dir = "Bohanlu/Taigi-Llama-2-Translator-13B"
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
accelerator = accelerate.Accelerator()
pipe = get_pipeline(model_dir, tokenizer, accelerator)

PROMPT_TEMPLATE = "[TRANS]\n{source_sentence}\n[/TRANS]\n[{target_language}]\n"

def translate(source_sentence: str, target_language: str) -> str:
    prompt = PROMPT_TEMPLATE.format(source_sentence=source_sentence, target_language=target_language)
    out = pipe(prompt, return_full_text=False, repetition_penalty=1.1, do_sample=False)[0]['generated_text']
    # Trim the trailing [/...] tag so only the translation itself is returned.
    return out[:out.find("[/")].strip()

source_sentence = "How are you today?"
print("To Hanzi: " + translate(source_sentence, "HAN"))
print("To POJ: " + translate(source_sentence, "POJ"))
print("To Traditional Chinese: " + translate(source_sentence, "ZH"))
print("To Hanlo: " + translate(source_sentence, "HL"))
```
Breaking Down the Code
Imagine you’re making a sandwich. Each step adds a layer to your creation:
- Setting the Table: We start by importing the necessary libraries. This is akin to gathering your utensils.
- Baking the Foundation: The `get_pipeline` function initializes the model and tokenizer. It prepares the infrastructure for translation.
- Spreading the Ingredients: The `PROMPT_TEMPLATE` sets up the structure of your input sentence, guiding the model on what to do.
- Final Touch: The `translate` function is where the magic happens, generating the translated text according to the specified target language.
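To make the prompt structure concrete, here is what the formatted input looks like for a single call. Note the `{}` braces around `source_sentence` and `target_language`: standard `str.format` placeholders, which the template needs in order for substitution to happen at all.

```python
# The template wraps the source text in [TRANS]...[/TRANS] and names the target script.
PROMPT_TEMPLATE = "[TRANS]\n{source_sentence}\n[/TRANS]\n[{target_language}]\n"

prompt = PROMPT_TEMPLATE.format(source_sentence="How are you today?", target_language="POJ")
print(prompt)
# [TRANS]
# How are you today?
# [/TRANS]
# [POJ]
```

The model then continues generating after the `[POJ]` tag, which is why `return_full_text=False` gives you just the translation.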
Troubleshooting
If you find yourself stuck, here are some common troubleshooting tips:
- Ensure all libraries (transformers, torch, accelerate) are properly installed and updated.
- Double-check the model directory and paths to ensure they lead to the correct resources.
- Make sure the input sentence is correctly formatted. Each target language code (ZH, EN, POJ, HL, HAN) must be accurate.
- If the output is not as expected, consider tweaking parameters like `repetition_penalty`, or check the model output for stray tokens that may affect display.
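One quick guard against mistyped language codes is to validate the target code before calling `translate`. The helper below is a hypothetical convenience for illustration, not part of the model’s API; the set of codes comes from the usage shown above.

```python
# Target-language codes accepted by the translator (per the examples above).
VALID_TARGETS = {"ZH", "EN", "POJ", "HL", "HAN"}

def check_target(code: str) -> str:
    """Normalize a target-language code and fail fast if it is unknown."""
    normalized = code.strip().upper()
    if normalized not in VALID_TARGETS:
        raise ValueError(f"Unknown target language {code!r}; expected one of {sorted(VALID_TARGETS)}")
    return normalized

print(check_target("poj"))  # POJ
```

Failing fast here is friendlier than letting the model silently produce odd output for an unrecognized tag.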
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
By applying this powerful model to your linguistic projects, you can enhance communication within Taiwanese Hokkien and its associated languages. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

