How to Use the Quality Classifier DeBERTa

Aug 6, 2024 | Educational

Are you looking to automate the process of document quality assessment? The Quality Classifier DeBERTa model might just be the tool you need! This powerful text classification model categorizes documents into three quality classes: “High”, “Medium”, and “Low”. In this article, we’ll guide you through how to use this model, including setup, configuration, and troubleshooting tips.

Understanding the Model

The Quality Classifier model utilizes the DeBERTa V3 Base architecture. It was trained on a dataset consisting of 22,828 text samples sourced from Common Crawl, labeled by human annotators based on multiple quality factors. The model not only helps in qualitative data annotation but also enables the creation of quality-specific blends and the addition of metadata tags.

How to Implement the Model

Implementing the Quality Classifier DeBERTa is straightforward, whether you use it through NVIDIA NeMo Curator or the Transformers library. Below, we’ll walk you through both methods.

1. Using in NeMo Curator

To get started with NeMo Curator, follow these steps:

Download the model from the Hugging Face Model Hub.
Access the inference code in the NeMo Curator GitHub repository.
Check out this example notebook for guidance.

2. Using in Transformers

If you’re familiar with PyTorch and Transformers, you can load and run the model with the following code:

import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer, AutoConfig
from huggingface_hub import PyTorchModelHubMixin

class QualityModel(nn.Module, PyTorchModelHubMixin):
    """DeBERTa encoder with a linear classification head over the quality labels."""

    def __init__(self, config):
        super().__init__()
        # Backbone encoder selected by the config's "base_model" entry.
        self.model = AutoModel.from_pretrained(config["base_model"])
        self.dropout = nn.Dropout(config["fc_dropout"])
        # One output logit per quality class.
        self.fc = nn.Linear(self.model.config.hidden_size, len(config["id2label"]))

    def forward(self, input_ids, attention_mask):
        features = self.model(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        dropped = self.dropout(features)
        outputs = self.fc(dropped)
        # Classify from the first token position and return class probabilities.
        return torch.softmax(outputs[:, 0, :], dim=1)

device = "cuda" if torch.cuda.is_available() else "cpu"

config = AutoConfig.from_pretrained("nvidia/quality-classifier-deberta")
tokenizer = AutoTokenizer.from_pretrained("nvidia/quality-classifier-deberta")
model = QualityModel.from_pretrained("nvidia/quality-classifier-deberta").to(device)
model.eval()

text_samples = [".?@fdsa Low quality text.", "This sentence is ok."]
inputs = tokenizer(
    text_samples, return_tensors="pt", padding="longest", truncation=True
).to(device)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(inputs["input_ids"], inputs["attention_mask"])
predicted_classes = torch.argmax(outputs, dim=1)
# Map each class index back to its label via the model config.
predicted_labels = [
    config.id2label[class_idx] for class_idx in predicted_classes.tolist()
]
print(predicted_labels)  # ['Low', 'Medium']
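
The argmax step above keeps only the winning class and discards the probabilities. If a downstream filter needs a confidence value as well, the softmax rows can be kept next to the label. Here is a minimal sketch, assuming the rows come from something like outputs.tolist() and that id2label is the mapping from the model config. The helper name label_with_confidence, the example probabilities, and the label order shown are illustrative assumptions, not part of the model's API.

```python
def label_with_confidence(prob_rows, id2label):
    """Return (label, probability) for the most likely class in each row."""
    results = []
    for row in prob_rows:
        # Index of the largest probability in this row.
        best = max(range(len(row)), key=row.__getitem__)
        results.append((id2label[best], row[best]))
    return results


# Hypothetical probabilities and label order, for illustration only.
example_id2label = {0: "High", 1: "Medium", 2: "Low"}
example_rows = [[0.05, 0.15, 0.80], [0.20, 0.70, 0.10]]
print(label_with_confidence(example_rows, example_id2label))
# [('Low', 0.8), ('Medium', 0.7)]
```

Keeping the probability makes it possible to treat low-confidence predictions differently, for example by sending them to manual review instead of filtering on the label alone.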

Analogy Time!

Think of the Quality Classifier model as a restaurant critic. Just as a critic weighs several aspects of a dish (taste, presentation, and originality), the model weighs several quality factors of a text: accuracy, clarity, coherence, grammar, depth, and overall usefulness. After tasting a dish (reading the text), the critic assigns a rating of “High”, “Medium”, or “Low”. Diners use those ratings to decide what to order, much as a data pipeline can use the model’s labels to decide which documents to keep.

Troubleshooting Tips

If you run into trouble while setting up or using the Quality Classifier model, here are some troubleshooting tips:

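Data Formatting: Pass the input as plain-text strings, one document or paragraph per list entry. The tokenizer expects strings, so entries of any other type typically fail before the model runs.
CUDA Issues: If you encounter CUDA errors, confirm that a CUDA-capable GPU, its driver, and a CUDA-enabled PyTorch build are installed. The example code falls back to the CPU when torch.cuda.is_available() returns False.
Model Not Loading: Check that the config, tokenizer, and model all load from the same identifier, nvidia/quality-classifier-deberta, and that your machine can reach the Hugging Face Model Hub.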
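
The Data Formatting tip can be enforced with a small check before tokenizing. This is a sketch under stated assumptions: prepare_texts is a hypothetical helper, not part of Transformers or NeMo Curator, and dropping empty entries is one possible policy among several.

```python
def prepare_texts(samples):
    """Normalize input into a list of non-empty strings for the tokenizer."""
    if isinstance(samples, str):
        samples = [samples]  # a single document becomes a batch of one
    cleaned = []
    for i, sample in enumerate(samples):
        if not isinstance(sample, str):
            raise TypeError(f"sample {i} is {type(sample).__name__}, expected str")
        text = sample.strip()  # remove leading/trailing whitespace only
        if text:  # skip entries that are empty after stripping
            cleaned.append(text)
    return cleaned


print(prepare_texts(["  First paragraph. ", "", "Second paragraph.\n"]))
# ['First paragraph.', 'Second paragraph.']
```

Because empty entries are dropped, the output can be shorter than the input. If predictions must line up with the original rows, keep the original indices alongside the cleaned text.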


Evaluation and Limitations

The model has been evaluated on a sample set of 7,128 documents, achieving an accuracy score of 0.8252. However, it’s essential to note the inherent subjectivity in quality assessments, which may vary among different annotators.
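
To check how the model performs on your own labeled data, accuracy is simply the fraction of predictions that match the reference labels. The sketch below uses made-up toy labels, not the 7,128-document evaluation set, and the helper names accuracy and confusion_counts are illustrative.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def confusion_counts(predicted, reference):
    """Count (reference, predicted) label pairs to see where errors fall."""
    return Counter(zip(reference, predicted))


# Toy labels for illustration only.
reference = ["High", "Medium", "Low", "Medium"]
predicted = ["High", "Medium", "Medium", "Medium"]
print(accuracy(predicted, reference))  # 0.75
print(confusion_counts(predicted, reference)[("Low", "Medium")])  # 1
```

The confusion counts show which classes get mixed up, which is useful here because annotators themselves may disagree most on neighboring classes.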

Conclusion

With the Quality Classifier DeBERTa, quality assessment of documents is now at your fingertips. By implementing this model, you can refine your text analysis processes and enhance the quality of your data pipelines.

