In this blog, we will explore how to fine-tune the XLM-RoBERTa model for extracting answers from text. This process is particularly useful in natural language processing tasks like question answering.
Understanding the Team Behind the Project
The project is carried out by a dedicated team of fourth-year students from the University of Technology – Vietnam National University, Hanoi. The members of the team are:
- Nguyễn Quang Chiều
- Nguyễn Quang Huy
- Nguyễn Trần Anh Đức
This project represents the ‘Reader’ phase in their final year assignment for the course on Modern Issues in IT.
Performance Metrics
After fine-tuning the XLM-RoBERTa model on the UIT-vquad dataset, we achieved impressive results:
- Exact Match (EM): 60.63
- F1 Score: 79.63
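For readers who want to reproduce these numbers on their own predictions, here is a simplified sketch of how SQuAD-style EM and F1 are typically computed (the official evaluation script also strips English articles, which is irrelevant for Vietnamese). The function names here are illustrative, not part of any library:

```python
import string
from collections import Counter

# Normalize a string before comparison: lowercase, drop ASCII
# punctuation, and collapse whitespace.
def normalize(text):
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())

# Exact Match: 1 if the normalized prediction equals the normalized gold answer.
def exact_match(pred, gold):
    return int(normalize(pred) == normalize(gold))

# F1: token-level overlap between prediction and gold answer.
def f1(pred, gold):
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Hà Nội", "hà nội"))           # 1
print(round(f1("thủ đô Hà Nội", "Hà Nội"), 2))   # 0.67
```

Averaging these two scores over a whole test set gives corpus-level EM and F1 like the numbers above.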
How to Run the Model
Here’s how you can run the fine-tuned model using Python and the `transformers` library. Think of the coding steps as preparing a dish: you gather your ingredients, then follow the recipe:
```python
from transformers import pipeline

# Replace this with your own checkpoint
model_checkpoint = "chieunq/XLM-R-base-finetuned-uit-vquad"
question_answerer = pipeline("question-answering", model=model_checkpoint)

# Vietnamese context: "Our team is made up of fourth-year students at the
# University of Technology - VNU Hanoi. The team has 3 members: ...
# This is the Reader phase of our final project for the course
# Modern Issues in IT."
context = "Nhóm của chúng tôi là sinh viên năm 4 trường ĐH Công Nghệ - ĐHQG Hà Nội. Nhóm gồm 3 thành viên: Nguyễn Quang Chiều, Nguyễn Quang Huy và Nguyễn Trần Anh Đức. Đây là pha Reader trong dự án cuối kì môn Các vấn đề hiện đại trong CNTT của nhóm."
# Question: "Who are the 3 members of the team?"
question = "3 thành viên trong nhóm gồm những ai?"

# Generate the answer and print the result dictionary
result = question_answerer(question=question, context=context)
print(result)
```
In this analogy, the `pipeline` is similar to your cooking station, the `model_checkpoint` is like the main ingredient you’ve chosen, and the `context` provides the necessary background information for your dish. The final answer is the delightful product of your culinary efforts.
Output Interpretation
Upon running the code, the output produced will look something like this:

```python
{'score': 0.9928902387619019,
 'start': 98,
 'end': 158,
 'answer': 'Nguyễn Quang Chiều, Nguyễn Quang Huy và Nguyễn Trần Anh Đức.'}
```
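The `start` and `end` fields are character offsets into the `context` string, so slicing the context at those offsets always recovers the `answer` field. A minimal illustration with a toy context (the offsets are found with `str.find` here, rather than taken from the model output above):

```python
# Toy illustration: `start`/`end` are character offsets into `context`,
# so slicing the context recovers exactly the answer span.
context = "Nhóm gồm 3 thành viên: Nguyễn Quang Chiều, Nguyễn Quang Huy và Nguyễn Trần Anh Đức."
answer = "Nguyễn Quang Chiều, Nguyễn Quang Huy và Nguyễn Trần Anh Đức"

start = context.find(answer)         # offset where the answer span begins
end = start + len(answer)            # offset just past the answer span
assert context[start:end] == answer  # the slice is exactly the answer
print(start, end)
```

This is a handy sanity check when post-processing pipeline results, for example when highlighting the answer span inside the original passage.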
Troubleshooting Common Issues
If you encounter issues while implementing the above process, consider the following troubleshooting ideas:
- Ensure that all library versions are compatible with each other; this guide was tested with:
  - Transformers: 4.24.0
  - PyTorch: 1.12.1+cu113
  - Datasets: 2.7.0
  - Tokenizers: 0.13.2
- If you receive errors related to the pipeline, confirm that the model checkpoint is accessible and correctly specified.
- Verify that your Python environment has the necessary libraries installed and updated.
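To check the versions listed above programmatically, a small stdlib-only helper like the following can be dropped into your environment (`report_versions` is a hypothetical name, not a library function; the strings are PyPI package names, so PyTorch is queried as `torch`):

```python
import importlib.metadata

# Hypothetical helper: report the installed versions of the packages
# this guide pins, or None for any package that is not installed.
def report_versions(packages):
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print(report_versions(["transformers", "torch", "datasets", "tokenizers"]))
```

Comparing the printed versions against the pinned ones above is usually the fastest way to rule out an environment mismatch.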
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

