Tails Tell Tales: Chapter-Wide Manga Transcriptions With Character Names

Aug 18, 2024 | Educational

Welcome! This tutorial shows how to produce a chapter-wide manga transcription in which every line of dialogue is labelled with the name of the character who speaks it. The approach combines Object Detection, Optical Character Recognition (OCR), Clustering, and Diarisation, all handled by the pretrained ragavsachdeva/magiv2 model.

Getting Started

To begin your adventure in manga transcriptions, ensure you have the necessary Python libraries installed. Here’s a list of what you’ll need:

Pillow (imported as PIL) – for image manipulation
NumPy – for working with arrays
Transformers – for loading the pretrained model
PyTorch (imported as torch) – for running the model

Setting Up Your Environment

First, install the required libraries by using the following pip command:

pip install Pillow numpy transformers torch
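Before loading the model, it can help to confirm that the four packages are importable. The following is a minimal sketch that uses only the standard library; the helper name missing_modules is hypothetical and not part of any of the libraries above. Note that Pillow is imported under the name PIL.

```python
from importlib.util import find_spec

# Import names for the packages installed above (Pillow is imported as PIL).
REQUIRED_MODULES = ["PIL", "numpy", "transformers", "torch"]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported as missing, rerunning the pip command in the same Python environment is usually the first thing to try.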

Understanding the Code: An Analogy

Imagine you are a director preparing a movie. You have a script (the input images), you need to identify the characters and match each line of dialogue to its speaker, and then you assemble those lines into one continuous story (the transcript).

The code from the model's README is like a well-organized crew working behind the scenes:

read_image: Your script department, which reads and prepares the script (images) for shooting.
model: The director, who interprets each page by detecting characters and text, reading the text, and matching lines to speakers.
per_page_results: The footage from each scene, one result per page, holding the OCR text, the character names, and the text-to-character associations.
transcript: The editing team that assembles the final cut of the movie (the final dialogue list).

Code Implementation

Below is how you can implement the solution:

python
from PIL import Image
import numpy as np
from transformers import AutoModel
import torch

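# Load the pretrained model. trust_remote_code=True runs code from the model
# repository, and .cuda() needs a CUDA-capable GPU.
model = AutoModel.from_pretrained('ragavsachdeva/magiv2', trust_remote_code=True).cuda().eval()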

def read_image(path_to_image):
    # Load as grayscale, expand back to three channels, and return a NumPy array.
    with open(path_to_image, 'rb') as file:
        image = Image.open(file).convert('L').convert('RGB')
        image = np.array(image)
    return image

chapter_pages = ['page1.png', 'page2.png', 'page3.png']
character_bank = {
    'images': ['char1.png', 'char2.png', 'char3.png'],
    'names': ['Luffy', 'Sanji', 'Zoro'],
}

chapter_pages = [read_image(x) for x in chapter_pages]
character_bank['images'] = [read_image(x) for x in character_bank['images']]

with torch.no_grad():
    per_page_results = model.do_chapter_wide_prediction(chapter_pages, character_bank, use_tqdm=True, do_ocr=True)

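# Build the transcript: one "Name: text" line per essential text box.
transcript = []
for i, (image, page_result) in enumerate(zip(chapter_pages, per_page_results)):
    # Save an annotated copy of the page for visual inspection.
    model.visualise_single_image_prediction(image, page_result, f'page_{i}.png')
    # Map each text index to the name of the character it is associated with.
    speaker_name = {text_idx: page_result['character_names'][char_idx] for text_idx, char_idx in page_result['text_character_associations']}
    for j in range(len(page_result['ocr'])):
        # Skip text the model marks as non-essential.
        if not page_result['is_essential_text'][j]:
            continue
        # Fall back to 'unsure' when no speaker was matched to this text.
        name = speaker_name.get(j, 'unsure')
        transcript.append(f"{name}: {page_result['ocr'][j]}")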

with open('transcript.txt', 'w') as fh:
    for line in transcript:
        fh.write(line + '\n')
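To see how the transcript loop treats each page result without downloading the model, the same logic can be run on a hand-written stand-in. This is a sketch under stated assumptions: the dictionary keys are the ones used in the code above, but the mock values are invented for illustration and are not real model output, and the function name build_transcript is hypothetical.

```python
def build_transcript(per_page_results):
    """Turn per-page result dicts into 'Name: text' lines."""
    lines = []
    for page_result in per_page_results:
        # Map each text index to the name of its associated character.
        speaker_name = {
            text_idx: page_result["character_names"][char_idx]
            for text_idx, char_idx in page_result["text_character_associations"]
        }
        for j, text in enumerate(page_result["ocr"]):
            if not page_result["is_essential_text"][j]:
                continue  # skip text flagged as non-essential
            # Texts with no matched speaker are labelled 'unsure'.
            lines.append(f"{speaker_name.get(j, 'unsure')}: {text}")
    return lines


# Hand-written stand-in for one page of output (assumed shape, not real predictions).
mock_results = [{
    "ocr": ["I'm hungry!", "BOOM", "Dinner is ready."],
    "is_essential_text": [True, False, True],
    "character_names": ["Luffy", "Sanji"],
    "text_character_associations": [[0, 0]],
}]

transcript = build_transcript(mock_results)
print("\n".join(transcript))
```

In this mock page, the first text is attributed to Luffy, the second is skipped because it is flagged as non-essential, and the third has no association, so it is labelled unsure.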

Troubleshooting Common Issues

If you encounter any issues while implementing the code, here are some troubleshooting tips:

If the model fails to load, check that the model name is spelled exactly as ragavsachdeva/magiv2 and that you have a working internet connection for the first download.
Check that your image paths are correct and that the images exist in the specified directory.
Make sure your system has a CUDA-capable GPU: the code calls .cuda() when loading the model, so it fails at that line without one.
Refer to the console output for any error logs that might guide you towards solving the problem.
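The path and GPU checks above can be automated with a short preflight script. The following is a rough sketch, not part of the model's README; the helper names find_missing_files and cuda_is_available are hypothetical, and the file names are the placeholders used in the example code.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


def cuda_is_available():
    """Report whether PyTorch can see a CUDA device (False if torch is absent)."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    pages = ["page1.png", "page2.png", "page3.png"]
    characters = ["char1.png", "char2.png", "char3.png"]
    missing = find_missing_files(pages + characters)
    if missing:
        print("Missing image files:", ", ".join(missing))
    if not cuda_is_available():
        print("No CUDA device detected; the .cuda() call will fail.")
```

Running this before the main script surfaces the two most common setup problems without waiting for the model to download.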

Conclusion

With the model and code provided, you can turn a full chapter of manga pages into a speaker-labelled transcript: each essential line of text is paired with a character name from your character bank, or marked as unsure when no speaker is matched, and the result is saved to transcript.txt.
