How to Use Genji-JP 6B: A Causal Language Model for Japanese Storytelling

Aug 9, 2022 | Educational

Are you fascinated by the world of Japanese storytelling? Do you wish to generate engaging narratives using state-of-the-art AI? Look no further! This guide will walk you through the steps to utilize the Genji-JP 6B model—a fine-tuned version of EleutherAI’s GPT-J 6B—designed for generating Japanese web novels.

Model Description

Genji-JP 6B boasts impressive specifications:

  • Parameters: 6,053,381,344
  • Layers: 28
  • Model Dimension: 4,096
  • Feedforward Dimension: 16,384
  • Heads: 16
  • Context Length: 2,048
  • Vocabulary: 50,400
  • Position Encoding: Rotary position encodings (RoPE)

To give you a clearer picture, think of this model as a grand library with 28 floors (layers), housing over 6 billion individual stories (parameters). Each floor is divided into different sections (heads), making it uniquely versatile for various genres of storytelling.
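The headline parameter count can be roughly reproduced from the other specifications using the standard transformer estimate of about 12·L·d² weights in the blocks plus the two embedding matrices. A back-of-the-envelope sketch (the small remainder comes from biases and layer norms, which this estimate ignores):

```python
# Rough parameter estimate for a GPT-J-style transformer,
# using the specs listed above (layers, model dim, vocab size).
layers = 28
d_model = 4096
vocab = 50400

# ~12 * L * d^2 covers attention (4 * d^2) and the 4x-wide
# feed-forward network (2 * d * 4d = 8 * d^2) in every layer.
block_params = 12 * layers * d_model ** 2

# Input embedding plus (untied) output projection.
embedding_params = 2 * vocab * d_model

estimate = block_params + embedding_params
print(f"{estimate:,}")  # lands within ~0.1% of the official 6,053,381,344
```

The feedforward dimension of 16,384 is exactly 4 × the model dimension of 4,096, which is why the 12·L·d² rule of thumb applies here.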

Training Data

Genji-JP 6B builds on GPT-J 6B, which EleutherAI pre-trained on the Pile, a large curated dataset of mostly English text. The model was then fine-tuned on a dedicated Japanese storytelling dataset, which is what tailors it to generating Japanese web novels.

How to Use the Genji-JP 6B Model

Follow these steps to effectively utilize the Genji-JP 6B model for generating Japanese text:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# The tokenizer is shared with the base GPT-J 6B model;
# the weights come from the fine-tuned NovelAI/genji-jp checkpoint.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True
).eval().cuda()

text = "Your prompt here"
tokens = tokenizer(text, return_tensors="pt").input_ids

# Sample up to 400 new tokens beyond the prompt.
generated_tokens = model.generate(tokens.long().cuda(), use_cache=True,
                                  do_sample=True, temperature=1.0, top_p=0.9,
                                  repetition_penalty=1.125, min_length=1,
                                  max_length=len(tokens[0]) + 400,
                                  pad_token_id=tokenizer.eos_token_id)

# Decode and strip the end-of-sequence marker.
last_tokens = generated_tokens[0]
generated_text = tokenizer.decode(last_tokens).replace("</s>", "")
print("Generation:\n" + generated_text)
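The generation call above relies on nucleus sampling via top_p=0.9: at each step, only the smallest set of candidate tokens whose cumulative probability reaches 0.9 is kept, and the next token is drawn from that set after renormalization. A minimal, model-free sketch of the filtering step (the toy probabilities are illustrative, not model output):

```python
# Minimal illustration of nucleus (top-p) filtering, the idea behind
# top_p=0.9 in model.generate. Toy probabilities, not model output.

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of indices whose cumulative probability
    reaches p (scanning in descending order), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# With p=0.9 the lowest-probability token (0.05) is dropped and the
# remaining three are renormalized to sum to 1.
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
print(filtered)
```

Lower values of top_p make the output more conservative; combined with repetition_penalty=1.125, this keeps long story generations coherent without looping.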
