This guide shows how to use the PersianQA dataset in Persian Question Answering (QA) projects. With over 9,000 entries sourced from Persian Wikipedia, the dataset is a resource for Persian reading comprehension and contains both answerable and unanswerable ("impossible") questions, so a model can also learn to recognize when a passage does not contain the answer.
Getting Started with PersianQA
PersianQA is a reading-comprehension dataset for Persian Natural Language Processing (NLP). To begin exploring it, follow these steps:
Accessing the Dataset
- Download the dataset files from the dataset directory.
- Load the train and test files in Python with the read_qa helper:
python
from read_ds import read_qa  # defined in src/read_ds.py; src/ must be on the import path
train_ds = read_qa("pqa_train.json")
test_ds = read_qa("pqa_test.json")
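If the read_qa helper is not at hand, the files can also be parsed directly. The sketch below assumes the JSON files follow a SQuAD-v2-style layout (a top-level "data" list of articles, each with "paragraphs" holding a "context" and a list of "qas"); that layout is an assumption, not something this guide confirms, and `read_squad_style` is a hypothetical name rather than the project's own function.

```python
import json


def read_squad_style(path):
    """Flatten a SQuAD-v2-style JSON file into a list of QA records.

    Assumed layout (not confirmed by this guide):
    {"data": [{"title": ..., "paragraphs": [{"context": ..., "qas": [...]}]}]}
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)

    records = []
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                records.append({
                    "title": article.get("title", ""),
                    "context": paragraph["context"],
                    "question": qa["question"],
                    # Keep only the answer strings; an empty list means "no answer".
                    "answers": [a["text"] for a in answers],
                    # Fall back to "no answers listed" if the flag is absent.
                    "is_impossible": qa.get("is_impossible", not answers),
                })
    return records
```

Calling `read_squad_style("pqa_train.json")` would then return one dictionary per question, which is convenient for quick inspection before any model is involved.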
Using HuggingFace Datasets
If you prefer using HuggingFace datasets, first install the library:
sh
pip install -q datasets
Then, load the PersianQA dataset:
python
from datasets import load_dataset
dataset = load_dataset("SajjadAyoubi/persian_qa")
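Once the dataset is loaded, it can be useful to check how many questions are answerable. The helper below is a minimal sketch that assumes each example carries an "answers" dictionary with a "text" list, as SQuAD-style datasets on the Hugging Face Hub usually do; both the field layout and the "train" split name in the usage comment are assumptions, and `count_answerable` is a hypothetical name.

```python
from collections import Counter


def count_answerable(examples):
    """Count answerable vs. unanswerable examples.

    Assumes each example has an "answers" dict with a "text" list;
    an empty list is treated as "no answer".
    """
    counts = Counter()
    for example in examples:
        has_answer = len(example["answers"]["text"]) > 0
        counts["answerable" if has_answer else "unanswerable"] += 1
    return dict(counts)


# Usage with the dataset loaded above (requires a network download):
#   print(dataset)                            # lists the available splits and columns
#   print(count_answerable(dataset["train"]))  # assumes a split named "train"
```

Printing the dataset object first is a quick way to confirm the actual split and column names before relying on them.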
Examples of Questions and Answers
Here are a few sample entries from the dataset, including one question that the context does not answer:
Title | Context | Question | Answer
----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-----------------
خوب، بد، زشت | خوب، بد، زشت یک فیلم درژانر وسترن اسپاگتی حماسی است... | در فیلم خوب بد زشت شخصیت ها کجایی صحبت می کنند؟| مخلوطی از ایتالیایی و انگلیسی
قرارداد کرسنت | قرارداد کرسنت قراردادی برای فروش روزانه معادل ۵۰۰ میلیون فوت مکعب... | طرفین قرار داد کرسنت کیا بودن؟ | کرسنت پترولیوم و شرکت ملی نفت ایران
چهارشنبه‌سوری | چهارشنبه‌سوری یکی از جشن‌های ایرانی است که از غروب آخرین سه‌شنبه ی ماه اسفند... | نام جشن اخرین شنبه ی سال چیست؟ | No Answer
Building Models with PersianQA
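Models fine-tuned on this dataset, such as xlm-roberta-large-fa-qa and bert-base-fa-qa, are evaluated with two standard QA metrics: the F1 score, which measures token overlap between the predicted and reference answers, and exact match (EM), the share of predictions that equal the reference answer exactly.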
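To make the two metrics concrete, here is a simplified sketch of how exact match and token-level F1 are commonly computed for extractive QA. It uses whitespace tokenization only and is not the evaluation script used for the models above; the function names are illustrative.

```python
from collections import Counter


def normalize(text):
    """Collapse runs of whitespace; a deliberately minimal normalization."""
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # For unanswerable questions both sides are empty: count that as a match.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it occurs.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For a question with several reference answers, the usual convention is to take the maximum score over the references and then average over all questions.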
Before training or testing a model, install the necessary libraries:
sh
pip install transformers sentencepiece
Sample Usage with Transformers
If you are new to Transformers, the pipeline API is the quickest way to run a question-answering model:
python
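from transformers import pipeline

# Load a Persian QA model together with its tokenizer
model_name = "SajjadAyoubi/bert-base-fa-qa"
qa_pipeline = pipeline("question-answering", model=model_name, tokenizer=model_name)

# Context passage and the questions to ask about it
text = "سلام من سجاد ایوبی هستم و به پردازش زبان طبیعی علاقه دارم"
questions = ["اسمم چیه؟", "علاقه مندیم چیه؟"]

for question in questions:
    # Each result is a dict holding the answer text, its score, and its span
    print(qa_pipeline(context=text, question=question))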
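Because the dataset contains unanswerable questions, it can help to treat weak predictions as "no answer". The sketch below post-processes one pipeline result, assumed to be a dict with "answer" and "score" keys; `answer_or_none` and the 0.5 threshold are hypothetical choices for illustration, not part of the library.

```python
def answer_or_none(result, min_score=0.5):
    """Return the answer text, or None when the prediction looks unreliable.

    `result` is assumed to be one dict from the question-answering
    pipeline, with "answer" and "score" keys. `min_score` is an
    illustrative threshold that should be tuned on held-out data.
    """
    answer = result.get("answer", "").strip()
    if not answer or result.get("score", 0.0) < min_score:
        return None
    return answer


# Usage with the pipeline defined above (requires the model download):
#   result = qa_pipeline(context=text, question=question,
#                        handle_impossible_answer=True)
#   print(answer_or_none(result))
```

The pipeline's `handle_impossible_answer=True` argument allows it to return an empty answer when "no answer" scores highest, which this helper then maps to `None`.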
Troubleshooting Ideas
If you run into problems while working with the dataset or the models, these checks are a good starting point:
- Installation issues: Confirm that transformers, sentencepiece, and datasets were installed without errors, for example by running pip list.
- Model performance: If the model gives unsatisfactory answers, consider fine-tuning it further on more data or for more training steps.
- Data handling: Verify that each input provides a context string and a question string in the format the model expects.
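The data-handling check can be automated. The function below is a minimal sketch that validates one example, assuming the SQuAD-style fields "context", "question", and an "answers" dict with parallel "text" and "answer_start" lists; the field names are assumptions and `validate_example` is a hypothetical helper.

```python
def validate_example(example):
    """Return a list of problems found in one SQuAD-style example.

    An empty list means the example passed every check.
    """
    problems = []

    # Both text fields must be non-empty strings.
    for key in ("context", "question"):
        value = example.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty '{key}'")

    context = example.get("context")
    if not isinstance(context, str):
        context = ""

    # Each answer must appear in the context at its stated character offset.
    answers = example.get("answers") or {}
    texts = answers.get("text", [])
    starts = answers.get("answer_start", [])
    if len(texts) != len(starts):
        problems.append("'text' and 'answer_start' differ in length")
    for text, start in zip(texts, starts):
        if context[start:start + len(text)] != text:
            problems.append(f"answer {text!r} not found at offset {start}")

    return problems
```

Running this over a whole split and printing only the non-empty results is a cheap way to catch offset or encoding problems before training.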

