How to Use the PersianQA Dataset for Question Answering

Mar 18, 2023 | Data Science

This guide covers how to use the PersianQA dataset in Persian question answering (QA) projects. PersianQA is a reading comprehension dataset of over 9,000 entries sourced from Persian Wikipedia. It mixes answerable questions with unanswerable (impossible) ones, so models trained on it learn both to extract answers and to recognize when a passage contains none.

Getting Started with PersianQA

You can load PersianQA either with the helper script from the project repository or through the HuggingFace datasets library.

Accessing the Dataset

  • Download the dataset files (pqa_train.json and pqa_test.json) from the dataset directory.
  • Load them in Python with the read_qa helper:
python
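# read_qa is defined in src/read_ds.py of the PersianQA repository;
# run this from src/ (or add src/ to sys.path) so the import resolves
from read_ds import read_qa

train_ds = read_qa("pqa_train.json")
test_ds = read_qa("pqa_test.json")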
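
The internals of read_qa aren't shown here. As a rough sketch, assuming pqa_train.json and pqa_test.json follow the SQuAD v2.0 JSON layout (data, then paragraphs, then qas, with an is_impossible flag), a reader that flattens the file into one record per question might look like this. The names flatten_squad and load_squad_file are hypothetical and not part of the repository.

```python
import json
from typing import Any, Dict, List


def flatten_squad(obj: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a SQuAD v2.0-style dict into one record per question.

    The data -> paragraphs -> qas layout is an assumption about
    PersianQA's JSON files, not a documented guarantee.
    """
    records = []
    for article in obj["data"]:
        title = article.get("title", "")
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                records.append({
                    "id": qa["id"],
                    "title": title,
                    "context": context,
                    "question": qa["question"],
                    "answers": [a["text"] for a in answers],
                    "answer_starts": [a["answer_start"] for a in answers],
                    # Fall back to "no answers means impossible" if the flag is absent
                    "is_impossible": qa.get("is_impossible", len(answers) == 0),
                })
    return records


def load_squad_file(path: str) -> List[Dict[str, Any]]:
    """Read a SQuAD-style JSON file as UTF-8, which Persian text requires."""
    with open(path, encoding="utf-8") as f:
        return flatten_squad(json.load(f))


# Usage with a tiny inline sample instead of the real file
sample = {
    "data": [{
        "title": "demo",
        "paragraphs": [{
            "context": "PersianQA is built from Persian Wikipedia.",
            "qas": [
                {"id": 1, "question": "Where is it built from?",
                 "answers": [{"text": "Persian Wikipedia", "answer_start": 24}],
                 "is_impossible": False},
                {"id": 2, "question": "Who wrote it in 1900?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }]
}
records = flatten_squad(sample)
print(len(records), records[0]["answers"], records[1]["is_impossible"])
```

If the real files turn out to use a different layout, only the loop structure in flatten_squad needs to change.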

Using HuggingFace Datasets

If you prefer the HuggingFace datasets library, install it first:

sh
pip install -q datasets

Then, load the PersianQA dataset:

python
from datasets import load_dataset
dataset = load_dataset("SajjadAyoubi/persian_qa")
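
Each HuggingFace example is expected to carry question and context fields plus an answers field holding parallel text and answer_start lists, the usual SQuAD-style schema in datasets; it is worth confirming this with dataset["train"].features before relying on it. Under that assumption, an unanswerable question simply has empty lists, so separating the two kinds of question is a one-line check. The helpers is_answerable and count_answerable are hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple


def is_answerable(example: Dict[str, Any]) -> bool:
    # Assumed schema: example["answers"] == {"text": [...], "answer_start": [...]}
    return len(example["answers"]["text"]) > 0


def count_answerable(examples: Iterable[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (answerable, unanswerable) counts for an iterable of examples."""
    answerable = unanswerable = 0
    for ex in examples:
        if is_answerable(ex):
            answerable += 1
        else:
            unanswerable += 1
    return answerable, unanswerable


# Inline stand-ins; with the real data you would pass a split such as dataset["train"]
examples = [
    {"question": "q1", "context": "c", "answers": {"text": ["a"], "answer_start": [0]}},
    {"question": "q2", "context": "c", "answers": {"text": [], "answer_start": []}},
    {"question": "q3", "context": "c", "answers": {"text": [], "answer_start": []}},
]
print(count_answerable(examples))  # (1, 2)
```

With the real data, dataset["train"].filter(is_answerable) would keep only the answerable rows, since Dataset.filter accepts a function that takes one example and returns a bool.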

Examples of Questions and Answers

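The table below shows sample entries from the dataset, with English translations in parentheses. The third question is unanswerable because its passage describes the last Tuesday of the year, not the last Saturday:

Title | Context | Question | Answer
------|---------|----------|-------
خوب، بد، زشت (The Good, the Bad and the Ugly) | خوب، بد، زشت یک فیلم درژانر وسترن اسپاگتی حماسی است... (The Good, the Bad and the Ugly is an epic spaghetti western film...) | در فیلم خوب بد زشت شخصیت ها کجایی صحبت می کنند؟ (In The Good, the Bad and the Ugly, what language do the characters speak?) | مخلوطی از ایتالیایی و انگلیسی (A mix of Italian and English)
قرارداد کرسنت (Crescent contract) | قرارداد کرسنت قراردادی برای فروش روزانه معادل ۵۰۰ میلیون فوت مکعب... (The Crescent contract is a contract for the daily sale of the equivalent of 500 million cubic feet...) | طرفین قرار داد کرسنت کیا بودن؟ (Who were the parties to the Crescent contract?) | کرسنت پترولیوم و شرکت ملی نفت ایران (Crescent Petroleum and the National Iranian Oil Company)
چهارشنبه‌سوری (Chaharshanbe Suri) | چهارشنبه‌سوری یکی از جشن‌های ایرانی است که از غروب آخرین سه‌شنبه ی ماه اسفند... (Chaharshanbe Suri is an Iranian festival that, from sunset on the last Tuesday of the month of Esfand...) | نام جشن اخرین شنبه ی سال چیست؟ (What is the name of the festival on the last Saturday of the year?) | No Answer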

Building Models with PersianQA

Models fine-tuned on this dataset, such as xlm-roberta-large-fa-qa and bert-base-fa-qa, are evaluated with two standard QA metrics, F1 score and exact match (EM), on which they report strong results.
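
Exact match checks whether the predicted answer string equals a reference answer, while F1 measures token overlap between the two. The sketch below follows the SQuAD evaluation approach in spirit but is simplified: it assumes whitespace tokenization plus punctuation stripping is adequate, whereas the official SQuAD script also removes English articles, and a careful Persian evaluation might add its own normalization (for example, unifying Arabic and Persian forms of the same letter). For unanswerable questions, an empty prediction scored against an empty reference counts as a match.

```python
import string
from collections import Counter
from typing import Callable, List

# ASCII punctuation plus common Persian marks (question mark, comma, semicolon, quotes)
_PUNCT = set(string.punctuation) | {"؟", "،", "؛", "«", "»"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))


def f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Unanswerable case: score 1 only if both sides are empty
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric: Callable[[str, str], float], prediction: str, references: List[str]) -> float:
    """Take the max over reference answers; an empty list means 'no answer'."""
    if not references:
        references = [""]
    return max(metric(prediction, ref) for ref in references)


print(best_score(exact_match, "مخلوطی از ایتالیایی و انگلیسی", ["مخلوطی از ایتالیایی و انگلیسی"]))  # 1.0
print(round(f1("ایتالیایی و انگلیسی", "مخلوطی از ایتالیایی و انگلیسی"), 3))  # 0.75
```

In the second call, all 3 predicted tokens appear in the 5-token reference, so precision is 1.0, recall is 0.6, and F1 is 0.75: partial answers earn partial credit under F1 but none under EM.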

To train or test these models, install Transformers and SentencePiece (a deep learning backend such as PyTorch is also needed to run them):
sh
pip install transformers sentencepiece

Sample Usage with Transformers

If you're new to Transformers, the pipeline API is the quickest way to run a model:

python
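from transformers import pipeline

model_name = "SajjadAyoubi/bert-base-fa-qa"
qa_pipeline = pipeline("question-answering", model=model_name, tokenizer=model_name)

# "Hi, I'm Sajjad Ayoubi and I'm interested in natural language processing"
text = "سلام من سجاد ایوبی هستم و به پردازش زبان طبیعی علاقه دارم"
# "What's my name?", "What am I interested in?"
questions = ["اسمم چیه؟", "علاقه مندیم چیه؟"]
for question in questions:
    # Each result is a dict with "score", "start", "end", and "answer" keys
    print(qa_pipeline(context=text, question=question))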
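
Because PersianQA includes unanswerable questions, it can help to let the pipeline abstain. The question-answering pipeline accepts handle_impossible_answer=True, in which case the top result may be an empty answer string. The sketch below post-processes one result dict of the shape the pipeline returns (score, start, end, answer). The helper name extract_answer and the min_score threshold of 0.1 are illustrative assumptions, to be tuned on validation data rather than taken from the project.

```python
from typing import Any, Dict, Optional


def extract_answer(result: Dict[str, Any], min_score: float = 0.1) -> Optional[str]:
    """Return the answer text, or None if the model abstains or is unsure.

    `result` is one dict as returned by a question-answering pipeline, e.g.
    qa_pipeline(question=..., context=..., handle_impossible_answer=True).
    The min_score default is a hypothetical value, not a recommended setting.
    """
    answer = result.get("answer", "").strip()
    if not answer or result.get("score", 0.0) < min_score:
        return None
    return answer


# Hand-written stand-ins for pipeline outputs (no model download needed)
confident = {"score": 0.92, "start": 8, "end": 18, "answer": "سجاد ایوبی"}
abstained = {"score": 0.55, "start": 0, "end": 0, "answer": ""}
unsure = {"score": 0.03, "start": 0, "end": 4, "answer": "سلام"}

for r in (confident, abstained, unsure):
    print(extract_answer(r))
```

Treating both an empty answer and a low-confidence span as "no answer" keeps the downstream code simple: callers only need to check for None.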

Troubleshooting Ideas

If you run into problems while working with the project, check the following:

  • Installation issues: Confirm that the dependencies installed correctly with pip, for example that datasets, transformers, and sentencepiece all import without errors.
  • Model performance: If the model gives unsatisfactory results, consider fine-tuning it further, for example with more training data or for more epochs.
  • Data handling: Verify that your inputs use the question and context format the model expects.
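
For the data-handling check, one useful test on SQuAD-style data is that each answer_start offset actually points at its answer text in the context; mismatched offsets (for example, after text normalization shifted characters) silently corrupt training labels. This sketch assumes the HuggingFace-style answers dict with parallel text and answer_start lists; find_bad_offsets is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def find_bad_offsets(examples: Iterable[Dict[str, Any]]) -> List[Any]:
    """Return the ids of examples whose answer offsets don't match their context."""
    bad = []
    for ex in examples:
        context = ex["context"]
        texts = ex["answers"]["text"]
        starts = ex["answers"]["answer_start"]
        # The two lists must be parallel
        if len(texts) != len(starts):
            bad.append(ex["id"])
            continue
        for text, start in zip(texts, starts):
            if context[start:start + len(text)] != text:
                bad.append(ex["id"])
                break
    return bad


context = "سلام من سجاد ایوبی هستم"
examples = [
    {"id": "ok", "context": context,
     "answers": {"text": ["سجاد ایوبی"], "answer_start": [8]}},
    {"id": "shifted", "context": context,
     "answers": {"text": ["سجاد ایوبی"], "answer_start": [7]}},
    {"id": "no_answer", "context": context,
     "answers": {"text": [], "answer_start": []}},
]
print(find_bad_offsets(examples))  # ['shifted']
```

Unanswerable examples pass automatically because they have no offsets to check.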