Welcome to the world of IndicTrans2, a family of open-source transformer models from AI4Bharat that translate between English and 22 Indian languages. In this guide, we’ll walk you through setting up IndicTrans2 and using it to translate Hindi sentences into English.
Getting Started with IndicTrans2
To start using IndicTrans2, you’ll need three things: torch, the model and tokenizer classes from the transformers library, and the IndicProcessor helper from the IndicTransTokenizer package.
python
import torch
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)
from IndicTransTokenizer import IndicProcessor
How to Set Up Your Translation Environment
Follow these steps to prepare your environment:
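- Download the IndicTrans2 model: from_pretrained fetches the checkpoint from the Hugging Face Hub the first time it is called and caches it locally.
- Initialize the tokenizer and model: use the model name ai4bharat/indictrans2-indic-en-dist-200M, and pass trust_remote_code=True because the checkpoint ships its own tokenizer and model code.
- Create an IndicProcessor: it handles the preprocessing and postprocessing around the model.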
python
model_name = "ai4bharat/indictrans2-indic-en-dist-200M"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)
ip = IndicProcessor(inference=True)
You can input sentences in your source language (in this case, Hindi) for translation. Here are some examples:
python
input_sentences = [
    "जब मैं छोटा था, मैं हर रोज़ पार्क जाता था।",
    "हमने पिछले सप्ताह एक नई फिल्म देखी जो कि बहुत प्रेरणादायक थी।",
]
Running the Translation
Now for the fun part: running the translation!
- Preprocess your input sentences:
python
src_lang, tgt_lang = "hin_Deva", "eng_Latn"
batch = ip.preprocess_batch(input_sentences, src_lang=src_lang, tgt_lang=tgt_lang)
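The src_lang and tgt_lang values follow the FLORES-style pattern of a three-letter language code, an underscore, and a four-letter script name, as in hin_Deva and eng_Latn. A mistyped tag is an easy mistake to make, so here is a minimal sketch of a format check. The helper split_lang_tag is hypothetical and not part of IndicTransTokenizer, and it checks only the shape of the tag, not whether the model supports the language.

```python
import re
from typing import Tuple

# Three lowercase letters for the language, an underscore, then a
# four-letter script name with a leading capital (e.g. "hin_Deva").
_TAG_PATTERN = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")


def split_lang_tag(tag: str) -> Tuple[str, str]:
    """Split a tag such as "hin_Deva" into ("hin", "Deva").

    Hypothetical helper: raises ValueError when the tag has an unexpected
    shape. It does not know which languages the model actually supports.
    """
    if not _TAG_PATTERN.match(tag):
        raise ValueError(f"unexpected language tag format: {tag!r}")
    language, script = tag.split("_")
    return language, script


print(split_lang_tag("hin_Deva"))  # ('hin', 'Deva')
print(split_lang_tag("eng_Latn"))  # ('eng', 'Latn')
```

Running the check before preprocess_batch turns a typo such as "hindi" into an immediate, readable error.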
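- Select the device and move the model to it:
python
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(DEVICE)  # the inputs are sent to DEVICE later, so the model must be there too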
- Tokenize the preprocessed batch and move the tensors to the device:
python
inputs = tokenizer(batch, truncation=True, padding="longest", return_tensors="pt", return_attention_mask=True).to(DEVICE)
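To see what padding="longest" and return_attention_mask=True do in the tokenizer call, here is a toy version in plain Python. It is only an illustration: the token ids are made up, the pad id of 0 and right-side padding are assumptions, and the real tokenizer returns tensors rather than lists.

```python
from typing import List, Tuple


def pad_to_longest(
    sequences: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


# Made-up token ids for two sentences of different lengths.
ids, mask = pad_to_longest([[11, 12, 13], [21, 22, 23, 24, 25]])
print(ids)   # [[11, 12, 13, 0, 0], [21, 22, 23, 24, 25]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Padding to the longest sentence in the batch, rather than to a fixed maximum, keeps the tensors as small as the batch allows.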
- Generate the translations:
python
# Inference only, so skip gradient tracking.
with torch.no_grad():
    generated_tokens = model.generate(
        **inputs,
        use_cache=True,
        min_length=0,
        max_length=256,
        num_beams=5,
        num_return_sequences=1,
    )
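The num_beams=5 setting asks generate to keep the five best partial translations at each step instead of committing to the single most likely next token. The sketch below shows the idea on a toy problem. The toy_model function and its probabilities are invented for illustration, and the real implementation in transformers adds details such as length penalties that are left out here.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[str]]  # (cumulative log-probability, tokens)


def beam_search(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    num_beams: int,
    max_length: int,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Keep the `num_beams` best partial outputs at every step."""
    beams: List[Hypothesis] = [(0.0, [])]
    for _ in range(max_length):
        candidates: List[Hypothesis] = []
        for score, tokens in beams:
            if tokens and tokens[-1] == eos:
                candidates.append((score, tokens))  # finished, carry over as-is
                continue
            for token, prob in next_token_probs(tokens).items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Highest cumulative log-probability first, then prune to the beam width.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = candidates[:num_beams]
        if all(tokens[-1] == eos for _, tokens in beams):
            break
    return beams


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical next-token distribution standing in for the real decoder."""
    table = {
        (): {"I": 0.6, "We": 0.4},
        ("I",): {"went": 0.55, "go": 0.45},
        ("We",): {"went": 0.9, "go": 0.1},
    }
    return table.get(tuple(prefix), {"</s>": 1.0})


greedy = beam_search(toy_model, num_beams=1, max_length=4)[0]
wide = beam_search(toy_model, num_beams=5, max_length=4)[0]
print(greedy[1])  # ['I', 'went', '</s>']   (probability 0.33)
print(wide[1])    # ['We', 'went', '</s>']  (probability 0.36)
```

With one beam the search locks in "I" because it is the likelier first token, and misses the better overall sequence. A wider beam finds it at the cost of more computation per step.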
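- Decode the generated tokens into text:
python
# Decode with the tokenizer switched to its target-language mode.
with tokenizer.as_target_tokenizer():
    generated_tokens = tokenizer.batch_decode(
        generated_tokens.detach().cpu().tolist(),
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True,
    )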
- Postprocess the translations and print each source sentence with its translation:
python
translations = ip.postprocess_batch(generated_tokens, lang=tgt_lang)
for input_sentence, translation in zip(input_sentences, translations):
    print(f"{src_lang}: {input_sentence}")
    print(f"{tgt_lang}: {translation}")
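The example above translates two sentences in a single call. For longer inputs it is usually safer to work in fixed-size chunks so that one batch never exhausts memory. Here is a minimal sketch of that pattern. The names iter_batches, translate_in_batches, and fake_translate are hypothetical, and fake_translate stands in for the preprocess, tokenize, generate, decode, and postprocess steps shown earlier.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_in_batches(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Apply `translate_batch` chunk by chunk, preserving input order."""
    translations: List[str] = []
    for batch in iter_batches(sentences, batch_size):
        translations.extend(translate_batch(batch))
    return translations


# Stand-in for the real translation steps; it only upper-cases its input.
def fake_translate(batch: List[str]) -> List[str]:
    return [sentence.upper() for sentence in batch]


print(translate_in_batches(["a", "b", "c"], fake_translate, batch_size=2))
# ['A', 'B', 'C']
```

To use it with the real model, wrap the steps from this section in a function that takes a list of sentences and returns a list of translations, then pass that function as translate_batch.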
Understanding the Code with an Analogy
Imagine you are a chef in a bustling restaurant. Your job is to turn raw ingredients (input sentences) into delightful dishes (translated sentences). Here’s how the process works:
- Ingredients Gathering: You collect various ingredients (import necessary libraries).
- Prepping the Ingredients: You chop and marinate them (tokenizing and preparing your sentences).
- Cooking: You cook them in a special pot (the model processes the inputs to create translations).
- Plating: Finally, you arrange the cooked meal beautifully on a plate (postprocessing your translations to clean and format them).
Troubleshooting Tips
If you run into issues, here are some ideas to help you out:
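- Library not found: Ensure that torch, transformers, and IndicTransTokenizer are all installed in the active Python environment, with mutually compatible versions.
- CUDA Errors: If the code does not detect your GPU, check that your PyTorch build includes CUDA support and that a compatible NVIDIA driver is installed. Without a detected GPU, the code falls back to the CPU.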
- Tokenization Issues: Verify that you are using the latest version of the IndicTransTokenizer.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
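The first two tips can be checked with a short script. This is a sketch rather than a full diagnostic: the helper check_packages is hypothetical, and it only reports whether each package can be found and whether PyTorch sees a CUDA device, not whether the installed versions are compatible.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(packages: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = check_packages(["torch", "transformers", "IndicTransTokenizer"])
for name, available in status.items():
    print(f"{name}: {'ok' if available else 'MISSING'}")

# Only query CUDA when torch itself can be imported.
if status["torch"]:
    import torch

    print("CUDA available:", torch.cuda.is_available())
```

If a package shows as MISSING, install it into the same environment that runs your script, since a package installed under a different interpreter will not be found.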
Conclusion
By following these steps, you can use IndicTrans2 to translate Hindi text into English, and the same workflow applies to the other Indic languages this model supports once you change src_lang. This powerful model promises to make multilingual communication a breeze!

