Text classification in dialogue systems is hard to get right, especially when the task is to separate appropriate from inappropriate content. This article walks through using a model built to detect pornographic text in human-machine dialogues.
Overview
The model, referred to as NSFW Detector, leverages a dialogue monitoring dataset called CensorChat that focuses on identifying pornographic text in conversational settings. The detailed workings can be found in this paper.
Embedding this model into your project can greatly enhance the safety and appropriateness of interactive AI systems. Let’s walk through how to implement this model step-by-step.
Usage
NOTICE: You can directly use the trained checkpoint hosted on Hugging Face.
To perform context-level detection, ensure your input format resembles the following: [user] user utterance [SEP] [chatbot] chatbot response.
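The context-level format described above can be assembled with a small helper. This is only a minimal sketch: the function name `build_context_input` is hypothetical and is not part of the NSFW-detector repository. Only the `[user]`, `[SEP]`, and `[chatbot]` markers come from the format described here.

```python
def build_context_input(user_utterance: str, chatbot_response: str) -> str:
    """Join one dialogue turn into the '[user] ... [SEP] [chatbot] ...' format."""
    # Strip stray whitespace so the markers are separated by single spaces.
    user = user_utterance.strip()
    bot = chatbot_response.strip()
    return f"[user] {user} [SEP] [chatbot] {bot}"


example = build_context_input("What's the weather like?", "It is sunny today.")
print(example)
# [user] What's the weather like? [SEP] [chatbot] It is sunny today.
```

Building the string in one place keeps the marker spelling and spacing consistent across every call site.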
Step 1: Download the Checkpoint
Bash
git lfs install
git clone https://huggingface.co/qiuhuachuan/NSFW-detector
Step 2: Modify the Script for Local Use
You need to adjust the text parameter in the local_use.py file and then execute it. Below is a code snippet that showcases the architecture of the neural network you’ll be working with:
Python
from typing import Optional
import torch
from transformers import BertConfig, BertTokenizer, BertModel, BertPreTrainedModel
from torch import nn
label_mapping = {0: 'porn', 1: 'normal'}
config = BertConfig.from_pretrained('NSFW-detector', num_labels=2, finetuning_task='text classification')
# Treat the speaker markers as single tokens and map them onto two unused vocabulary slots.
tokenizer = BertTokenizer.from_pretrained('NSFW-detector', use_fast=False, never_split=['[user]', '[chatbot]'])
tokenizer.vocab['[user]'] = tokenizer.vocab.pop('[unused1]')
tokenizer.vocab['[chatbot]'] = tokenizer.vocab.pop('[unused2]')
class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config
        self.bert = BertModel.from_pretrained('NSFW-detector')
        # Fall back to the hidden-layer dropout rate when no classifier dropout is configured.
        classifier_dropout = (config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob)
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.post_init()

    def forward(self, input_ids: Optional[torch.Tensor] = None, attention_mask: Optional[torch.Tensor] = None,
                token_type_ids: Optional[torch.Tensor] = None, position_ids: Optional[torch.Tensor] = None,
                head_mask: Optional[torch.Tensor] = None, inputs_embeds: Optional[torch.Tensor] = None,
                labels: Optional[torch.Tensor] = None, output_attentions: Optional[bool] = None,
                output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None):
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        outputs = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids,
                            position_ids=position_ids, head_mask=head_mask, inputs_embeds=inputs_embeds,
                            output_attentions=output_attentions, output_hidden_states=output_hidden_states,
                            return_dict=return_dict)
        # Hidden state of the first token ([CLS]) for every sequence in the batch.
        cls = outputs[0][:, 0, :]
        cls = self.dropout(cls)
        logits = self.classifier(cls)
        return logits
model = BertForSequenceClassification(config=config)
model.load_state_dict(torch.load('NSFW-detector/pytorch_model.bin'))
model.cuda()
model.eval()
The snippet above sets up a BERT-based model for sequence classification: the hidden state of the first ([CLS]) token passes through dropout and a linear layer to produce two logits, one per label. Think of the classifier as an expert librarian who reads each book (text) and shelves it in the correct aisle, either ‘pornographic’ or ‘normal’.
Step 3: Making Predictions
To utilize this model for predictions, you need to format your input correctly. Here’s how to structure your data:
Python
to_predict_items = [
    {"text": "Give some movie recommendations to get women in the mood for sex"},
    {"text": "I break through walls to find more trials"},
    {"history": [
        {"user": "Give some movie recommendations to get women in the mood for sex",
         "chatbot": "I apologize, but I cannot assist in creating or providing information related to NSFW content. Please ask something else."}
    ]}
]

for item in to_predict_items:
    if 'history' in item:
        text = '[user] ' + item['history'][0]['user'] + ' [SEP] [chatbot] ' + item['history'][0]['chatbot']
    else:
        text = item['text']
    result = tokenizer.encode_plus(text=text, padding='max_length', max_length=512, truncation=True,
                                   add_special_tokens=True, return_token_type_ids=True, return_tensors='pt')
    result = result.to('cuda')
    with torch.no_grad():
        logits = model(**result)
        predictions = logits.argmax(dim=-1)
        pred_label_idx = predictions.item()
        pred_label = label_mapping[pred_label_idx]
    print('text:', text)
    print('predicted label is:', pred_label)
Each input is processed in turn, much as the librarian checks each book’s title and synopsis to decide where it belongs. The printed label tells you which aisle the text landed in: ‘porn’ or ‘normal’.
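The prediction loop above reports only the argmax label. If a confidence score would also be useful, the two logits can be turned into probabilities with a softmax. The following is a minimal pure-Python sketch, assuming the logits have already been extracted as a plain list of floats (for example with `logits[0].tolist()`). The function name `logits_to_prediction` is hypothetical; `label_mapping` is the same mapping defined earlier.

```python
import math

# Same label mapping as in the model setup code.
label_mapping = {0: 'porn', 1: 'normal'}


def logits_to_prediction(logits):
    """Return (label, probability) for a list of raw logits."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the largest probability, i.e. the argmax.
    idx = max(range(len(probs)), key=probs.__getitem__)
    return label_mapping[idx], probs[idx]


label, prob = logits_to_prediction([2.0, 0.0])
print(label, round(prob, 3))
```

Note that a softmax score is not necessarily a calibrated probability, so any threshold built on it should be checked against labelled examples.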
Troubleshooting
- If you encounter issues with model loading, double-check that the correct model path is provided when invoking load_state_dict().
- For problems related to input formatting, ensure you follow the required structure closely, particularly the use of special tokens like [SEP] and the placeholders [user] and [chatbot].
- If predictions do not seem accurate, review the preprocessing steps to verify if inputs are being tokenized and encoded correctly.
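For the input-formatting item above, a quick sanity check can catch malformed context-level strings before they reach the tokenizer. This is a rough sketch, assuming the single-turn `[user] ... [SEP] [chatbot] ...` layout described in the Usage section. The helper name `is_context_formatted` is hypothetical and not part of the model's repository, and plain single-utterance inputs do not need to pass this check.

```python
import re

# Expected layout: [user] <utterance> [SEP] [chatbot] <response>
_CONTEXT_PATTERN = re.compile(r"^\[user\] .+ \[SEP\] \[chatbot\] .+$", re.DOTALL)


def is_context_formatted(text: str) -> bool:
    """Return True if text follows the single-turn context-level layout."""
    return _CONTEXT_PATTERN.match(text) is not None


print(is_context_formatted("[user] hi [SEP] [chatbot] hello"))  # True
print(is_context_formatted("[user] hi [chatbot] hello"))        # False
```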

