The National Library of Sweden (KBLab) has released three pretrained language models based on BERT and ALBERT, aimed at improving Natural Language Processing (NLP) for Swedish. In this guide, we will walk through how to set up and use these models effectively, along with troubleshooting tips to get you started smoothly.
What are the Available Models?
You have three options at your disposal:
- bert-base-swedish-cased (v1) – A standard BERT model trained with the same hyperparameters as initially published by Google.
- bert-base-swedish-cased-ner (experimental) – A fine-tuned BERT specifically for Named Entity Recognition (NER) using SUC 3.0.
- albert-base-swedish-cased-alpha (alpha) – An early attempt at creating an ALBERT model for Swedish.
Setting Up Your Environment
To efficiently experiment with these models, ensure that you have the correct dependencies in your environment. Here are the step-by-step instructions:
- Clone the repository:
git clone https://github.com/Kungbib/swedish-bert-models
- Change into the cloned directory:
cd swedish-bert-models
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Upgrade pip and install the requirements:
pip install --upgrade pip
pip install -r requirements.txt
Loading the BERT Models
Once your environment is ready, you can load the BERT models using Hugging Face Transformers. Think of BERT as a very well-read librarian: it has read thousands of books and can help you find the right information. Here’s how to bring it into your code:
from transformers import AutoModel, AutoTokenizer
# Load BERT Base Swedish
tok = AutoTokenizer.from_pretrained("KBLab/bert-base-swedish-cased")
model = AutoModel.from_pretrained("KBLab/bert-base-swedish-cased")
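As a quick sanity check, you can run a Swedish sentence through the loaded model and inspect the shape of the resulting embeddings. This is a minimal sketch; the example sentence is our own, and the first call downloads the model weights from Hugging Face:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("KBLab/bert-base-swedish-cased")
model = AutoModel.from_pretrained("KBLab/bert-base-swedish-cased")

# Encode an example Swedish sentence (illustrative, not from the original guide)
inputs = tok("Kungliga biblioteket ligger i Stockholm.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding vector per token; hidden size is 768 for a BERT-base model
print(outputs.last_hidden_state.shape)
```

Each token in the sentence gets a contextual embedding vector, which you can feed into downstream tasks such as classification or similarity search.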
Using the Fine-tuned BERT for Swedish NER
If you want to detect entities like names and organizations, utilize the fine-tuned NER model. Imagine this model as a specialized assistant identifying important keywords in a large text:
from transformers import pipeline
nlp = pipeline("ner", model="KBLab/bert-base-swedish-cased-ner", tokenizer="KBLab/bert-base-swedish-cased-ner")
result = nlp("Idag släpper KB tre språkmodeller.")
print(result) # Outputs detected entities
Troubleshooting Common Issues
Should you run into any hiccups along your coding journey, try these solutions:
- Model Not Found: Ensure that the model names are spelled correctly and that your internet connection is stable, as the models are downloaded from Hugging Face’s servers.
- Installation Errors: Double-check your Python version and package installations. If you are using an older version of Transformers, follow the specific guidelines mentioned in the README.
- Token Splitting Issues: Remember that some words might be split into tokens prefixed by ‘##’. Use a loop to concatenate the tokens if necessary.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
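The token-splitting issue above can be handled with a small merging loop. The sketch below assumes the pipeline returns a list of dicts with a "word" key, where continuation pieces are prefixed by "##"; the sample tokens are illustrative, not actual model output:

```python
def merge_wordpieces(tokens):
    """Merge '##'-prefixed continuation tokens back into whole words."""
    merged = []
    for token in tokens:
        if token["word"].startswith("##") and merged:
            # Continuation piece: append its text to the previous token
            merged[-1]["word"] += token["word"][2:]
        else:
            merged.append(dict(token))
    return merged

# Hypothetical NER output for a name split into two wordpieces
tokens = [
    {"word": "Engel", "entity": "ORG"},
    {"word": "##bert", "entity": "ORG"},
    {"word": "tar", "entity": "O"},
]
print(merge_wordpieces(tokens))
# The two "Engel"/"##bert" pieces are merged into a single "Engelbert" token
```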
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

