How to Use Genji-JP 6B: A Causal Language Model for Japanese Storytelling

Aug 9, 2022 | Educational

Are you fascinated by the world of Japanese storytelling? Do you wish to generate engaging narratives using state-of-the-art AI? Look no further! This guide will walk you through the steps to utilize the Genji-JP 6B model—a fine-tuned version of EleutherAI’s GPT-J 6B—designed for generating Japanese web novels.

Model Description

Genji-JP 6B boasts impressive specifications:

  • Parameters: 6,053,381,344
  • Layers: 28
  • Model Dimension: 4,096
  • Feedforward Dimension: 16,384
  • Heads: 16
  • Context Length: 2,048
  • Vocabulary: 50,400
  • Position Encoding: Rotary position encodings (RoPE)

To give you a clearer picture, think of this model as a grand library with 28 floors (layers), housing over 6 billion individual stories (parameters). Each floor is divided into different sections (heads), making it uniquely versatile for various genres of storytelling.
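The headline parameter count can be roughly reproduced from the other specifications using the standard transformer estimate of about 12·L·d² weights in the blocks plus the two embedding matrices. A back-of-the-envelope sketch (the small remainder comes from biases and layer norms, which this estimate ignores):

```python
# Rough parameter estimate for a GPT-J-style transformer,
# using the specs listed above (layers, model dim, vocab size).
layers = 28
d_model = 4096
vocab = 50400

# ~12 * L * d^2 covers attention (4 * d^2) and the 4x-wide
# feed-forward network (2 * d * 4d = 8 * d^2) in every layer.
block_params = 12 * layers * d_model ** 2

# Input embedding plus (untied) output projection.
embedding_params = 2 * vocab * d_model

estimate = block_params + embedding_params
print(f"{estimate:,}")  # lands within ~0.1% of the official 6,053,381,344
```

The feedforward dimension of 16,384 is exactly 4 × the model dimension of 4,096, which is why the 12·L·d² rule of thumb applies here.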

Training Data

Genji-JP 6B builds on GPT-J 6B, which EleutherAI pre-trained on the Pile, a large curated dataset of mostly English text. The model was then fine-tuned on a dedicated Japanese storytelling dataset, which is what tailors it to generating Japanese web novels.

How to Use the Genji-JP 6B Model

Follow these steps to effectively utilize the Genji-JP 6B model for generating Japanese text:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# The tokenizer is shared with the base GPT-J 6B model;
# the weights come from the fine-tuned NovelAI/genji-jp checkpoint.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True
).eval().cuda()

text = "Your prompt here"
tokens = tokenizer(text, return_tensors="pt").input_ids

# Sample up to 400 new tokens beyond the prompt.
generated_tokens = model.generate(tokens.long().cuda(), use_cache=True,
                                  do_sample=True, temperature=1.0, top_p=0.9,
                                  repetition_penalty=1.125, min_length=1,
                                  max_length=len(tokens[0]) + 400,
                                  pad_token_id=tokenizer.eos_token_id)

# Decode and strip the end-of-sequence marker.
last_tokens = generated_tokens[0]
generated_text = tokenizer.decode(last_tokens).replace("</s>", "")
print("Generation:\n" + generated_text)
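The generation call above relies on nucleus sampling via top_p=0.9: at each step, only the smallest set of candidate tokens whose cumulative probability reaches 0.9 is kept, and the next token is drawn from that set after renormalization. A minimal, model-free sketch of the filtering step (the toy probabilities are illustrative, not model output):

```python
# Minimal illustration of nucleus (top-p) filtering, the idea behind
# top_p=0.9 in model.generate. Toy probabilities, not model output.

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of indices whose cumulative probability
    reaches p (scanning in descending order), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# With p=0.9 the lowest-probability token (0.05) is dropped and the
# remaining three are renormalized to sum to 1.
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
print(filtered)
```

Lower values of top_p make the output more conservative; combined with repetition_penalty=1.125, this keeps long story generations coherent without looping.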
