This guide walks you through using a pretrained GPT-2 model adapted for the Bulgarian language. By the end, you'll be able to generate text, auto-complete sentences, or even correct spelling!
Understanding the Model
The model we're working with is a small GPT-2 for Bulgarian, compressed using progressive module replacing (the "Theseus" compression technique). It was trained on a mix of Bulgarian datasets, including OSCAR, Chitanka, and Wikipedia. Whether you're interested in text generation or fine-tuning for specific tasks, this model can help!
Intended Uses
- Text generation
- Auto-complete functionality
- Spelling correction
Additionally, it can be fine-tuned for downstream tasks, enabling tailored applications in your projects.
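For causal-LM fine-tuning, the usual preprocessing step is to tokenize your corpus and chunk the token stream into fixed-length blocks whose labels equal the inputs (the model shifts them internally to predict the next token). Here is a minimal pure-Python sketch of that step; the helper name and block size are illustrative, not part of any library API:

```python
def make_lm_blocks(token_ids, block_size=8):
    """Chunk a token stream into fixed-size blocks for causal-LM fine-tuning.

    Labels are copies of the inputs (the model shifts them internally);
    the trailing partial block is dropped.
    """
    n = (len(token_ids) // block_size) * block_size
    blocks = [token_ids[i:i + block_size] for i in range(0, n, block_size)]
    return [{"input_ids": b, "labels": list(b)} for b in blocks]

# 20 tokens with block_size=8 yield two full blocks; the last 4 are dropped
examples = make_lm_blocks(list(range(20)), block_size=8)
```

Records shaped like this can be fed to a standard training loop (or a `Trainer`) for the downstream task.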
How to Use the Model in PyTorch
Let's dive into how to use this model. Below is a step-by-step walkthrough of the code.
```python
from transformers import AutoModel, AutoTokenizer

model_id = "rmihaylov/gpt2-small-theseus-bg"

# Load the tokenizer and model; trust_remote_code is needed because
# the model ships custom code on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Encode a Bulgarian prompt ("Здравей," = "Hello,") as a tensor of token ids
input_ids = tokenizer.encode("Здравей,", add_special_tokens=False, return_tensors='pt')

# Sample up to 50 tokens with nucleus sampling (top_p=0.92, top_k disabled)
output_ids = model.generate(input_ids, do_sample=True, max_length=50, top_p=0.92, pad_token_id=2, top_k=0)

# Convert the generated ids back to text
output = tokenizer.decode(output_ids[0])
output = output.replace("<|endoftext|>", "")  # strip the end-of-text marker (token name assumed)
```
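The `top_p=0.92, top_k=0` arguments above enable nucleus (top-p) sampling: at each step the model samples only from the smallest set of tokens whose cumulative probability exceeds `top_p`. A minimal pure-Python sketch of that filtering step, assuming illustrative toy logits (not the model's real vocabulary):

```python
import math

def top_p_filter(logits, top_p=0.92):
    """Keep the smallest set of tokens whose cumulative probability
    exceeds top_p, then renormalise over that nucleus."""
    # Softmax over the logits
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk token indices in descending probability until top_p is covered
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalised distribution restricted to the nucleus
    return {i: probs[i] / cum for i in kept}

# A peaked distribution: only the two most likely tokens survive filtering
nucleus = top_p_filter([5.0, 4.0, 1.0, 0.5], top_p=0.92)
```

In `generate`, sampling then draws from this truncated distribution, which avoids picking from the long tail of unlikely tokens while keeping the output varied.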