In today’s tutorial, we will explore how to build a BERT-based token classifier for temporal tagging using the German GELECTRA model. This model tags individual tokens in plain text with temporal classes, making it useful for a wide range of natural language processing tasks.
Understanding the Model
The German GELECTRA model is a transformer model pretrained on a vast corpus of German data. It is pretrained in a self-supervised manner, which means it learns patterns in the data without relying on manually labeled input. This allows for a nuanced understanding of language, similar to how humans learn by observing rather than being explicitly taught.
Tagging Classification System
When using this model, the tokens in your text can be classified into several tags based on time-related contexts. Here’s a breakdown of the tagging system (a short labeled example follows the list):
- O — Outside of any temporal expression
- I-TIME — Inside (continuation) of a time expression
- B-TIME — Beginning of a time expression
- I-DATE — Inside of a date expression
- B-DATE — Beginning of a date expression
- I-DURATION — Inside of a duration expression
- B-DURATION — Beginning of a duration expression
- I-SET — Inside of a set (recurring) expression
- B-SET — Beginning of a set (recurring) expression
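For illustration only (the sentence and its labels below are invented, not drawn from the model’s training data), a German sentence containing a date and a time would be tagged roughly like this:
# Hypothetical example of BIO labels for temporal tagging
tokens = ['Wir', 'treffen', 'uns', 'am', '5.', 'Mai', 'um', '10', 'Uhr', '.']
labels = ['O', 'O', 'O', 'O', 'B-DATE', 'I-DATE', 'O', 'B-TIME', 'I-TIME', 'O']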
How to Use the Model
Now that we understand the tagging system, let’s walk through how to use the GELECTRA model for token classification.
Step 1: Load the Model
You can load the model using the following code:
from transformers import AutoTokenizer, BertForTokenClassification

tokenizer = AutoTokenizer.from_pretrained('satyaalmasian/temporal_tagger_German_GELECTRA', use_fast=False)
model = BertForTokenClassification.from_pretrained('satyaalmasian/temporal_tagger_German_GELECTRA')
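Optionally, as standard PyTorch practice rather than anything specific to this repository, you can move the model to a GPU and switch it to evaluation mode before running inference:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use a GPU if available
model.to(device)
model.eval()  # disable dropout for inference
If you do this, remember to move the tokenized inputs from the next step onto the same device, for example with processed_text = processed_text.to(device).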
Step 2: Process Your Text
For inference, prepare your input text as follows:
input_text = 'Der Termin ist am 5. Mai um 10 Uhr.'  # example input; replace with your own German text
processed_text = tokenizer(input_text, return_tensors='pt')  # tokenize and return PyTorch tensors
result = model(**processed_text)
classification = result[0]  # per-token logits
Step 3: Post-Processing
To turn the raw output into readable tags, use the token-merging function provided in the repository; a rough sketch of this post-processing step is shown below. For detailed examples, refer to the repository.
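As a rough illustration (this is a sketch, not the repository’s own helper function), the raw logits can be mapped to a BIO label per word piece like this, relying on the model’s built-in id2label mapping:
import torch

predictions = torch.argmax(classification, dim=2)  # highest-scoring label per token
tokens = tokenizer.convert_ids_to_tokens(processed_text['input_ids'][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]
# Merging word pieces back into full words is what the repository's
# post-processing function handles.
for token, label in zip(tokens, labels):
    print(token, label)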
Step 4: Fine-Tuning the Model
To fine-tune the model further, use the Trainer API from Hugging Face; a similar fine-tuning example is linked in the original repository, and a minimal sketch follows.
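The snippet below is a minimal, illustrative sketch of such a fine-tuning run; train_dataset and eval_dataset are placeholders for a pre-tokenized token-classification dataset with BIO labels (they are not provided by the repository), and the hyperparameters mirror the fine-tuning settings listed under Training Procedure below:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='gelectra-temporal-finetuned',  # assumed output path
    learning_rate=5e-5,                        # fine-tuning learning rate from the model card
    per_device_train_batch_size=16,            # fine-tuning batch size from the model card
    num_train_epochs=3,                        # illustrative value
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your tokenized training split
    eval_dataset=eval_dataset,    # placeholder: your tokenized validation split
)
trainer.train()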
Training Data
For pre-training, the model uses a large corpus of news articles automatically annotated with HeidelTime. Two distinct data sources are used for fine-tuning:
- Tempeval-3 – Automatically translated into German.
- KRAUTS dataset.
Training Procedure
The model is trained starting from the publicly available deepset/gelectra-large checkpoint on Hugging Face, with the following notable specifications:
- Batch size for pre-training: 192
- Learning rate: 1e-07 with Adam optimizer and linear weight decay
- Batch size for fine-tuning: 16
- Learning rate for fine-tuning: 5e-05
Training utilizes 2 NVIDIA A100 GPUs with 40GB of memory.
Troubleshooting
If you encounter issues, here are some troubleshooting tips:
- Make sure all dependencies and library versions are up to date.
- Double-check your input data format to ensure it matches the expected requirements of the model.
- Monitor the GPU memory usage to prevent out-of-memory errors (a quick check is sketched below).
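As a quick, generic check (plain PyTorch, nothing specific to this model), you can print the current GPU memory usage between inference or training steps:
import torch

if torch.cuda.is_available():
    allocated = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    print(f'GPU memory allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB')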
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

