Graph Convolutional Network for Bible Book Classification

Jul 20, 2021 | Data Science

The text-based Graph Convolutional Network (Text GCN) is a semi-supervised model for text classification. Given a corpus in which only some documents are labelled, it predicts labels for the rest. Its key idea is to represent the whole corpus as a single graph whose nodes are both documents and words. Label information can then propagate from labelled documents, through the words they share, to unlabelled ones. In this article, we apply the method to the Holy Bible, using it as a corpus for classifying chapters by the book they belong to.

Overview

The goal is to predict which book an unlabelled chapter belongs to, using the relationships it shares with labelled chapters. The Bible contains 66 books and 1,189 chapters, so this is a 66-class problem. Because 10-20% of the chapter labels are masked for testing, the GCN must learn the distinctive context of each book from the remaining labelled chapters. For example, Genesis refers heavily to Adam and Eve, whereas Ecclesiastes centers on the life of King Solomon.
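The masking step can be sketched as a random train/test split over chapter indices. This is a minimal sketch: the function name `split_labels`, the fixed seed and the use of a plain (non-stratified) random split are illustrative assumptions, not the repository's code. In Text GCN's transductive setting, masked chapters remain as nodes in the graph. Only their labels are withheld from training.

```python
import random


def split_labels(chapter_ids, test_fraction=0.1, seed=0):
    """Hypothetical helper: hide the labels of a random fraction of chapters.

    Returns (train_ids, test_ids). Test chapters stay in the graph;
    only their labels are excluded from the training loss.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed for a reproducible split
    ids = list(chapter_ids)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return sorted(ids[n_test:]), sorted(ids[:n_test])


# Example: mask 10% of the 1,189 chapters.
train_ids, test_ids = split_labels(range(1189), test_fraction=0.1)
print(len(train_ids), len(test_ids))  # 1070 119
```

A stratified split (masking a fraction of each book separately) is a reasonable alternative. Note, though, that several books have only one chapter.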

Chapters and words are embedded as nodes in one graph. Following the Text GCN paper, chapter-word edges are weighted by term frequency-inverse document frequency (tf-idf), and word-word edges by pointwise mutual information (PMI) computed over sliding windows of text. A GCN trained on this graph can then infer the missing labels of unlabelled chapters from the knowledge acquired through labelled ones.
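To make the edge weighting concrete, here is a minimal sketch of both edge types used in the Text GCN paper: tf-idf for document-word edges and positive PMI over sliding windows for word-word edges. The function name `build_edges` and the node-naming scheme are hypothetical. The default `window_size=20` and the plain `tf * log(N / df)` tf-idf variant are assumptions that may differ from what `generate_train_test_datasets.py` actually does.

```python
import math
from collections import Counter
from itertools import combinations


def build_edges(docs, window_size=20):
    """Hypothetical helper: weighted edges for a Text GCN-style graph.

    docs: list of token lists, one per chapter.
    Returns {(node_a, node_b): weight}; chapter nodes are ("doc", i),
    word nodes are ("word", w).
    """
    edges = {}
    n_docs = len(docs)

    # Chapter-word edges weighted by tf-idf.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    for i, tokens in enumerate(docs):
        for word, count in Counter(tokens).items():
            weight = (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            if weight > 0:  # words found in every chapter get idf = 0
                edges[(("doc", i), ("word", word))] = weight

    # Word-word edges weighted by PMI over sliding windows.
    windows = []
    for tokens in docs:
        if len(tokens) <= window_size:
            windows.append(tokens)
        else:
            windows.extend(tokens[s:s + window_size]
                           for s in range(len(tokens) - window_size + 1))
    word_count = Counter()  # number of windows containing each word
    pair_count = Counter()  # number of windows containing each word pair
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    n = len(windows)
    for (w1, w2), c in pair_count.items():
        pmi = math.log((c / n) / ((word_count[w1] / n) * (word_count[w2] / n)))
        if pmi > 0:  # keep only positively associated word pairs
            edges[(("word", w1), ("word", w2))] = pmi
    return edges


docs = [["adam", "eve", "garden"],
        ["solomon", "wisdom", "vanity"],
        ["adam", "eve"]]
edges = build_edges(docs)
```

To feed these weights into a GCN, the edge dictionary would be turned into a symmetric adjacency matrix over all chapter and word nodes, with self-loops added on the diagonal.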

Dataset

The Bible text used in this project is the Bible in Basic English (BBE) translation, which is available on GitHub.
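Because the classifier works on chapters, verse-level rows must first be grouped into one document per chapter. The sketch below assumes the CSV has a header with columns `b` (book number), `c` (chapter), `v` (verse) and `t` (verse text). That layout is an assumption, so check the header of your copy of `t_bbe.csv`. The function `load_chapters` is hypothetical, and the sample rows are made up rather than real BBE text.

```python
import csv
import io
from collections import defaultdict


def load_chapters(csv_file):
    """Hypothetical helper: group verse rows into one document per (book, chapter).

    Assumes columns b (book), c (chapter), v (verse) and t (text).
    Returns (documents, labels), ordered by book then chapter.
    """
    verses = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (int(row["b"]), int(row["c"]))
        verses[key].append((int(row["v"]), row["t"]))
    keys = sorted(verses)
    # Sort verses by number so chapter text is in reading order.
    documents = [" ".join(text for _, text in sorted(verses[k])) for k in keys]
    labels = [book for book, _ in keys]  # the book number is the class label
    return documents, labels


# Made-up rows in the assumed layout (not real BBE text).
sample = io.StringIO(
    "id,b,c,v,t\n"
    "1001002,1,1,2,second verse\n"
    "1001001,1,1,1,first verse\n"
    '1002001,1,2,1,"next chapter, first verse"\n'
    "21001001,21,1,1,a verse from book 21\n"
)
documents, labels = load_chapters(sample)
print(labels)  # [1, 1, 21]
```

With the real file, pass an open handle such as `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV in the data folder.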

Implementation

The implementation closely follows the paper that introduced Text GCN, "Graph Convolutional Networks for Text Classification" (Yao et al., AAAI 2019), which is available on arXiv. For a more detailed walkthrough of the scripts, see the accompanying article on Towards Data Science.
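The Text GCN paper uses a two-layer GCN. Node features X are one-hot (the identity matrix), and the adjacency matrix A includes self-loops and is symmetrically normalized as Â = D^-1/2 A D^-1/2. The output is Z = softmax(Â ReLU(Â X W0) W1). Training uses cross-entropy computed only over labelled document nodes. The following NumPy sketch illustrates that forward pass and masked loss on a toy graph. It is not the repository's `models.py`, which the requirements suggest is written in PyTorch. The loss here is averaged rather than summed as in the paper, which only rescales it.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2: A with self-loops, symmetrically normalized."""
    adj = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gcn_forward(a_hat, x, w0, w1):
    """Two-layer GCN: softmax(A_hat ReLU(A_hat X W0) W1)."""
    hidden = np.maximum(a_hat @ x @ w0, 0.0)
    return softmax(a_hat @ hidden @ w1)


def masked_cross_entropy(probs, labels, train_mask):
    """Mean cross-entropy over labelled chapter nodes only."""
    idx = np.flatnonzero(train_mask)
    return -np.mean(np.log(probs[idx, labels[idx]] + 1e-12))


# Toy graph: nodes 0-1 are chapters, nodes 2-4 are words.
n_nodes, hidden_dim, n_classes = 5, 4, 2
adj = np.zeros((n_nodes, n_nodes))
for i, j, w in [(0, 2, 0.5), (0, 3, 0.3), (1, 3, 0.4), (1, 4, 0.6), (2, 3, 0.2)]:
    adj[i, j] = adj[j, i] = w
a_hat = normalize_adjacency(adj)
x = np.eye(n_nodes)  # one-hot node features
rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.1, size=(n_nodes, hidden_dim))
w1 = rng.normal(scale=0.1, size=(hidden_dim, n_classes))
probs = gcn_forward(a_hat, x, w0, w1)
labels = np.array([0, 1, -1, -1, -1])  # -1: word nodes carry no label
train_mask = np.array([True, True, False, False, False])
loss = masked_cross_entropy(probs, labels, train_mask)
print(probs.shape, loss)
```

In a real run, W0 and W1 would be learned by gradient descent on this masked loss. Predictions for masked chapters are then read off the rows of Z that belong to their nodes.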

Requirements

  • Python (3.6+)
  • networkx (2.1)
  • torch (1.0.0)
  • torchvision (0.2.1)
  • Other standard Python libraries

Contents of the Implementation

The following scripts and data files are included in the implementation:

  1. generate_train_test_datasets.py – Computes edge weights, builds and saves the graph.
  2. models.py – Contains the GCN model.
  3. text_GCN.py – The main program that builds the dataset and graph, constructs the GCN, and trains the model.
  4. evaluate_results.py – Evaluates predictions and misclassifications.
  5. Data Folder – Contains the Bible data file (t_bbe.csv).
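The description of evaluate_results.py above says only that it evaluates predictions and misclassifications. One plausible form of that evaluation is sketched below. It computes accuracy on the masked chapters and tallies which books are mistaken for which. The function `evaluate`, its arguments and the toy labels are hypothetical, not the repository's API.

```python
from collections import Counter


def evaluate(predicted, actual, test_ids, book_names):
    """Hypothetical helper: accuracy on masked chapters plus (true, predicted) error counts."""
    if not test_ids:
        raise ValueError("test_ids must not be empty")
    correct = 0
    confusions = Counter()
    for i in test_ids:
        if predicted[i] == actual[i]:
            correct += 1
        else:
            confusions[(book_names[actual[i]], book_names[predicted[i]])] += 1
    return correct / len(test_ids), confusions


book_names = {0: "Genesis", 1: "Exodus", 2: "Ecclesiastes"}
actual = [0, 0, 1, 2, 2]
predicted = [0, 1, 1, 2, 0]
accuracy, confusions = evaluate(predicted, actual, test_ids=[1, 2, 3, 4],
                                book_names=book_names)
print(accuracy)  # 0.5
print(confusions.most_common())
```

Listing the most common confusions can show whether errors cluster between books with similar content.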

How to Use

To utilize the GCN model for classifying Bible chapters, follow these steps:

  1. Clone the repository to your local machine.
  2. Run the text_GCN.py script (python text_GCN.py). To list the available command-line arguments, run it with the -h flag.

Troubleshooting

If you encounter any issues during implementation, consider these troubleshooting tips:

  • Environment Issues: Ensure all dependencies are properly installed and compatible with your version of Python.
  • Data Loading Problems: Verify that the dataset is correctly formatted and accessible in the specified directory.
  • Model Training Failures: Check your hyperparameter values and data preprocessing steps for errors.


Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions.
Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
