How to Detect AI-Generated Essays with EssAI

Aug 17, 2024 | Educational


In today's digitally driven world, the authenticity of written content is a critical concern, especially for educators and researchers. The EssAI project takes on this challenge with a tool for detecting AI-generated essays. This article walks through installing and using the EssAI model and covers its main features.

Overview
Features
Files
Installation
Usage
Model Details
Dataset
Fine-tuning
Results
Additional Resources
License
Contact

Overview

The EssAI project uses a Large Language Model (LLM) to detect AI-generated essays. Fine-tuned on essays from a large Kaggle dataset, the model helps users assess the authenticity of written content.

Features

Detects AI-generated essays with very high accuracy (over 95%).
Fine-tuned on essays drawn from a dataset of approximately 500K human-written and AI-generated essays.

Files

The EssAI repository provides the following files:

requirements.txt: Lists all the required Python packages.
essai_user_input.py: Handles user inputs to check essays.
training.py: Manages the model training process.
testing.py: Evaluates model performance and metrics.
data_insights.py: Generates insights and visualizations from the dataset.

Installation

To set up the EssAI project, clone the repository and install its dependencies:

git clone https://github.com/diegovelilla/EssAI
cd EssAI
pip install -r requirements.txt

Usage

Using the model is straightforward. Run the essai_user_input.py file and input your essays as follows:

input_list = ["WRITE HERE YOUR FIRST ESSAY", "WRITE HERE YOUR SECOND ESSAY"]

Keep in mind that the model has been trained with essays around 350-400 words long. For additional insights into the dataset, refer to the data_insights notebook.
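Because the model was trained on essays of roughly 350-400 words, it can help to check the length of each input before running it. The helper below is a minimal sketch and is not part of the EssAI repository: the names `word_count` and `check_essay_lengths` and the exact bounds are assumptions made for illustration.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words in an essay."""
    return len(text.split())


def check_essay_lengths(essays, low=350, high=400):
    """Return (index, word_count, in_range) for each essay.

    low/high mirror the 350-400 word range mentioned above; they are
    illustrative defaults (an assumption), not limits enforced by the model.
    """
    report = []
    for i, essay in enumerate(essays):
        n = word_count(essay)
        report.append((i, n, low <= n <= high))
    return report


if __name__ == "__main__":
    # Placeholder essays; replace with the contents of your own input_list.
    input_list = ["word " * 360, "too short"]
    for index, n, ok in check_essay_lengths(input_list):
        status = "within" if ok else "outside"
        print(f"Essay {index}: {n} words ({status} the training range)")
```

An essay outside the range can still be classified; the check only flags inputs that differ from what the model saw during training.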

Model Details

The backbone of the EssAI project is the BERT base model. Think of BERT as a deeply knowledgeable librarian who understands the nuances of language. This model was pre-trained on vast amounts of written text to comprehend context, making it an excellent choice for identifying generated content.

Dataset

The dataset contains approximately 500K essays sourced from Kaggle, about 60% human-written and 40% AI-generated. The dataset can be explored on Kaggle. For more details, check out the data_insights, training, and testing notebooks.

Fine-tuning

To limit resource usage, only 1% of the dataset (about 5,000 essays) was used for fine-tuning: 4,000 essays for training and 1,000 for testing. Users are encouraged to train the model further on a larger share of the data using the training notebook.
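The arithmetic behind that split can be sketched as follows. This is an illustration using only the standard library, assuming a simple random 1% sample followed by an 80/20 train/test split; the actual sampling code lives in the training notebook and may differ. `sample_and_split` is a hypothetical helper name.

```python
import random


def sample_and_split(items, sample_fraction=0.01, train_fraction=0.8, seed=42):
    """Draw a random sample of the items, then split it into train and test."""
    rng = random.Random(seed)
    sample_size = round(len(items) * sample_fraction)
    sample = rng.sample(items, sample_size)  # sampling without replacement
    n_train = round(sample_size * train_fraction)
    return sample[:n_train], sample[n_train:]


if __name__ == "__main__":
    # Stand-in for ~500K essays: integer IDs instead of real essay text.
    essay_ids = list(range(500_000))
    train, test = sample_and_split(essay_ids)
    print(len(train), len(test))
```

Raising `sample_fraction` is the knob to turn when training on a larger share of the data, at the cost of more compute.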

Results

Initial results were promising: the model reached 98% accuracy on 1,000 test essays. A later test on a larger sample of 20,000 essays showed accuracy holding at 97%. Further evaluations can be run with the testing notebook.
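To make those figures concrete, accuracy is the share of essays whose predicted label matches the true label. The snippet below is a sketch with made-up labels, assuming an encoding of 1 for AI-generated and 0 for human-written; it is not the code from the testing notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy data: 1,000 labels, with the first 20 predictions flipped.
    y_true = [1, 0] * 500
    y_pred = list(y_true)
    for i in range(20):
        y_pred[i] = 1 - y_pred[i]
    print(f"{accuracy(y_true, y_pred):.0%}")  # 980 of 1,000 correct
```

With a roughly 60/40 class balance, always guessing "human-written" would already score about 60%, so that is the baseline any reported accuracy should be compared against.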

Additional Resources

Here are some additional resources that might be beneficial:

Tutorials and Documentation: Hugging Face NLP Course
Articles and Papers: BERT: Pre-training of Deep Bidirectional Transformers
Tools and Libraries: Kaggle Datasets
YouTube channels: Andrej Karpathy

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for more details.

Contact

For any queries or feedback, feel free to reach out via:

Troubleshooting

If you experience any issues during installation or usage, here are some troubleshooting tips:

Ensure you have Python installed and the correct version of pip.
Double-check your essay input format in the essai_user_input.py file.
For version compatibility, ensure that all libraries listed in requirements.txt are installed properly.
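One way to confirm that the packages from requirements.txt are present is to query the installed distributions from Python. The sketch below uses only the standard library (`importlib.metadata`); the helper names are hypothetical, it handles simple `name==version` style lines only, and it checks presence rather than version constraints.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_name(line: str):
    """Extract the package name from one requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not line or line.startswith("-"):  # skip blanks and pip options
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(requirement_lines):
    """Return the names of listed packages that are not installed."""
    missing = []
    for line in requirement_lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        print(missing_packages(lines) or "All listed packages are installed.")
    else:
        print("requirements.txt not found; run this from the EssAI folder.")
```

If the script reports missing packages, rerunning `pip install -r requirements.txt` in the same environment is the usual fix.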

