Welcome to the world of Doccano Transformer, a tool that converts datasets exported from Doccano into formats your machine learning libraries can read. Whether you’re diving into Named Entity Recognition or simply need to change a dataset’s format, this guide walks you through installation, basic usage, and troubleshooting.
What is Doccano Transformer?
Doccano Transformer is a Python package that converts datasets exported from Doccano into other formats, such as CoNLL 2003 and spaCy. It’s like a universal adapter for your machine learning datasets, letting you switch between formats seamlessly.
Supported Formats
Doccano Transformer supports the following formats:
- CoNLL 2003
- spaCy
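To make the CoNLL 2003 target more concrete, here is a minimal sketch of how character-level entity spans can be turned into per-token BIO tags, the tagging style commonly used with CoNLL 2003-style files. This is an illustration only, not Doccano Transformer’s own implementation: the `bio_tag` function, the sample sentence, and the span tuples are all hypothetical. Real CoNLL 2003 files also carry part-of-speech and chunk columns, which this sketch leaves out.

```python
import re


def bio_tag(text, spans):
    """Return (token, tag) pairs for whitespace-separated tokens.

    spans is a list of (start, end, label) character offsets, end exclusive.
    This is a hypothetical helper for illustration, not part of the library.
    """
    pairs = []
    for match in re.finditer(r"\S+", text):
        tag = "O"  # default: token is outside any entity
        for start, end, label in spans:
            # A token belongs to a span if it lies entirely inside it.
            if match.start() >= start and match.end() <= end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        pairs.append((match.group(), tag))
    return pairs


text = "Barack Obama visited Paris"
spans = [(0, 12, "PER"), (21, 26, "LOC")]
for token, tag in bio_tag(text, spans):
    print(token, tag)
# Barack B-PER
# Obama I-PER
# visited O
# Paris B-LOC
```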
Installing Doccano Transformer
Getting started with Doccano Transformer is incredibly straightforward. You can install it using pip by running the following command:
pip install doccano-transformer
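If you want to confirm that the installation worked, one option is to ask Python for the installed version of the package. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the `installed_version` helper is a hypothetical name introduced here, and the distribution name is taken from the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("doccano-transformer"))
```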
Examples of Use
Let’s dive into an example to understand how to use Doccano Transformer for Named Entity Recognition. Think of it like a chef preparing ingredients for a dish. You’ll need to gather your dataset, then transform it to suit your recipe (model). Here’s how you can do that:
from doccano_transformer.datasets import NERDataset
from doccano_transformer.utils import read_jsonl
# Load the dataset
dataset = read_jsonl(filepath='example.jsonl', dataset=NERDataset, encoding='utf-8')
# Transform to CoNLL 2003 format
dataset.to_conll2003(tokenizer=str.split)
# Transform to spaCy format
dataset.to_spacy(tokenizer=str.split)
In the code above, you first import the NERDataset class and the read_jsonl function, then load your dataset from a JSONL file. It’s similar to gathering your ingredients in cooking. Once everything is ready, you convert the dataset into the desired format, as if you were preparing it for the final dish. Each conversion call takes a tokenizer; here it is str.split, which splits the text on whitespace.
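Because the example passes `str.split` as the tokenizer, it seems reasonable to assume that the tokenizer is simply a callable that takes a string and returns a list of token strings. Under that assumption, a custom tokenizer might look like the sketch below, which also separates punctuation from words. The `simple_tokenizer` name and the regular expression are hypothetical, and whether the library accepts tokens that are not whitespace-delimited has not been verified here.

```python
import re

# Words (letters, digits, underscore) or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenizer(text):
    """Split text into word tokens and single punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(str.split("Obama visited Paris."))
# ['Obama', 'visited', 'Paris.']
print(simple_tokenizer("Obama visited Paris."))
# ['Obama', 'visited', 'Paris', '.']
```

With `str.split`, the trailing period stays attached to "Paris.", which can matter when an entity span ends just before the punctuation.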
Troubleshooting Ideas
If you encounter issues while using Doccano Transformer, here are some common troubleshooting ideas:
- Ensure that your input JSONL file is properly formatted.
- Check for any spelling mistakes in the format names or dataset class names.
- Make sure your installed Python version is compatible with Doccano Transformer.
- If you experience performance issues, consider splitting your dataset into smaller files and converting them one at a time.
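The first and last checks above can be scripted with the standard library alone. The sketch below reports lines that are not valid JSON objects and splits a large file into smaller ones. Both helper names (`find_bad_lines`, `split_jsonl`) are hypothetical, and the `"text"` key used in the usage lines is an assumption about your export, so adjust `required_keys` to match the fields your file actually contains.

```python
import json
from pathlib import Path


def find_bad_lines(path, required_keys=()):
    """Return (line_number, problem) pairs for lines that fail basic checks."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append((number, f"invalid JSON: {error.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "not a JSON object"))
                continue
            missing = [key for key in required_keys if key not in record]
            if missing:
                problems.append((number, f"missing keys: {missing}"))
    return problems


def split_jsonl(path, lines_per_chunk, out_dir):
    """Copy non-blank lines into numbered chunk files; return their paths."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    chunk_paths = []
    chunk = []

    def flush():
        # Write the buffered lines to the next numbered file.
        target = out_dir / f"{stem}_{len(chunk_paths):04d}.jsonl"
        target.write_text("".join(chunk), encoding="utf-8")
        chunk_paths.append(target)
        chunk.clear()

    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            chunk.append(line.rstrip("\n") + "\n")
            if len(chunk) == lines_per_chunk:
                flush()
    if chunk:
        flush()
    return chunk_paths


# Usage: runs only if example.jsonl exists in the working directory.
source = Path("example.jsonl")
if source.exists():
    for number, problem in find_bad_lines(source, required_keys=("text",)):
        print(f"line {number}: {problem}")
    print(split_jsonl(source, lines_per_chunk=1000, out_dir="chunks"))
```

Each chunk file can then be loaded and converted separately, which keeps memory use bounded by the chunk size rather than the full dataset.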
Contributing to Doccano Transformer
Your contributions are always welcome! If you have ideas or improvements for Doccano Transformer, please read the Contributing to Doccano Transformer guide before you start.
License
Doccano Transformer is licensed under the MIT license.

