In the digital age, making literature accessible to readers of different ages is more important than ever. This blog post walks you through building a book classification system that identifies age-appropriate books, based on the project undertaken by the Abra Muhara team for the TEKNOFEST Natural Language Processing competition.
Understanding the Project Goals
The main objective of the project is to assign a given book to one of five age ranges: **0-8**, **8-12**, **12-15**, **15-18**, and **18+**. It also reports various statistics computed from the book's text.
Project Phases
Here’s a step-by-step breakdown of the project’s phases:
- Users upload a book in PDF format, which is then converted to text.
- A fine-tuned BERTurk model classifies each sentence by whether it contains inappropriate content.
- A custom word list is used to further classify individual words by suitability.
- A comprehensive analysis is performed that includes metrics like sentence count, syllable count, and average words per sentence, as well as various readability scores.
- Finally, the results are reported back to the user, including the book's predicted age range.
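The text statistics from the analysis step can be sketched in a few lines of Python. This is a minimal sketch and not the team's implementation: the function name `text_statistics`, the naive sentence splitter, and the vowel-counting syllable rule are all assumptions made for illustration. The syllable rule relies on the fact that every Turkish syllable contains exactly one vowel.

```python
import re

# Every Turkish syllable contains exactly one vowel, so counting vowels
# gives the syllable count. Circumflexed vowels appear in some loanwords.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def text_statistics(text: str) -> dict:
    """Return basic counts for a Turkish text (illustrative helper)."""
    # Naive sentence split on ., ! and ?; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Runs of letters only; note that an apostrophe splits a word in two.
    words = re.findall(r"[^\W\d_]+", text)
    syllables = sum(1 for ch in text if ch in TURKISH_VOWELS)
    avg_words = len(words) / len(sentences) if sentences else 0.0
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "syllable_count": syllables,
        "avg_words_per_sentence": avg_words,
    }


if __name__ == "__main__":
    print(text_statistics("Ali okula gitti. Kitap okudu!"))
```

A production pipeline would likely need a more careful sentence tokenizer, since PDF extraction tends to leave stray line breaks and hyphenation in the text.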
Diving into the Age Classification Model
Several machine learning algorithms were evaluated for the age classification model. After rigorous testing, the **CatBoost algorithm**, optimized through **Optuna**, emerged as the leading performer with an accuracy of **95.65%**. An analogy helps here:
Imagine you’re a chef trying to perfect a recipe. You have different ingredients (the algorithms) which you mix and match to get the best flavor (accuracy). Some recipes (algorithms) will add a spice that overwhelms the dish, while others will bring out the essence of the main ingredient. Just like a chef discovers the perfect combination through experimentation, we used various machine learning methods to find the one that works best for age classification.
Readability Scores
Readability scores estimate how easily a piece of text can be read, a critical aspect of age-appropriate content. The formulas used for calculating these scores include:
- COE (Çetinkaya Readability Index)
- Ateşman Score
- FRES (Flesch Reading Ease Score)
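Two of these formulas can be written directly from syllable, word, and sentence counts. The sketch below uses the published constants for the Ateşman score and the Flesch Reading Ease Score; whether the project applies exactly these forms is an assumption, and the function names are hypothetical. Higher values mean easier text in both cases.

```python
def atesman_score(syllables: int, words: int, sentences: int) -> float:
    """Ateşman readability score, an adaptation of Flesch for Turkish."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences)


def flesch_reading_ease(syllables: int, words: int, sentences: int) -> float:
    """Flesch Reading Ease Score with its original English-calibrated constants."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


if __name__ == "__main__":
    # Counts for a short two-sentence sample: 12 syllables, 5 words.
    print(round(atesman_score(12, 5, 2), 3))
    print(round(flesch_reading_ease(12, 5, 2), 3))
```

Because Turkish words average more syllables than English ones, the Flesch constants tend to give Turkish text much lower scores than the Ateşman constants do for the same counts, so the two scores are best read on their own scales.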
Utilizing FastAPI for Accessibility
To make the system easy to use, the final model is deployed with FastAPI on Hugging Face Spaces, so users can interact with it without downloading any models. They can check the suitability of individual sentences or classify whole books by age range.
Troubleshooting
If you face issues while accessing the model or obtaining results, consider the following troubleshooting steps:
- Ensure that the PDF is valid and that its text can be extracted.
- Check internet connectivity when using FastAPI endpoints.
- If the response isn’t as expected, verify that your input data matches the required structure.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
The Abra Muhara team's project demonstrates a practical method for matching literary works to the right audience based on age appropriateness. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.