How to Generate Videos Using Pyramid Flow

Oct 29, 2024 | Educational


The world of video generation has seen numerous innovations, and one standout is Pyramid Flow, an autoregressive video generation model trained only on open-source datasets. In this guide, we walk you through using Pyramid Flow to generate high-quality videos from either text prompts or images.

What is Pyramid Flow?

Pyramid Flow is an efficient method for autoregressive video generation based on flow matching. It can generate videos up to 10 seconds long at 768p resolution and 24 frames per second (FPS). Think of Pyramid Flow as a talented director in the world of AI, transforming scripts (text prompts) and visual aids (images) into captivating films.
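Flow matching, the technique Pyramid Flow builds on, trains a network to predict a velocity field that moves a noise sample toward the data; generation then integrates that field step by step as an ordinary differential equation. The sketch below is a generic one-dimensional illustration under stated assumptions: a hand-written velocity field for a single target point stands in for the learned network, and the function names are hypothetical. It is not Pyramid Flow's model or its pyramidal scheduler.

```python
def velocity(x: float, t: float, target: float) -> float:
    """Toy velocity field for a single target point (stand-in for a network).

    Along the linear path x_t = (1 - t) * x0 + t * target, the velocity that
    carries x_t to the target by t = 1 is (target - x_t) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0: float, target: float, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps          # current time; stays below 1, so no division by zero
        x = x + dt * velocity(x, t, target)
    return x


if __name__ == "__main__":
    noise = -1.7                   # stand-in for a Gaussian noise sample
    print(euler_sample(noise, target=3.0, num_steps=20))
```

With this idealized field the sampler reaches the target for any step count; a learned field is only approximate, which is why the number of inference steps matters in practice.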

Getting Started with Pyramid Flow

Step 1: Download the Model

To kick off your video generation journey, first download the Pyramid Flow checkpoint from the Hugging Face Hub:

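from huggingface_hub import snapshot_download

model_path = "PATH"   # The local directory to save the downloaded checkpoint
# local_dir_use_symlinks is deprecated (and ignored) in recent huggingface_hub releases;
# remove it if your version warns about it.
snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_use_symlinks=False, repo_type="model")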

Think of this step like downloading a script for a movie. You need that script to begin filming!
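Before loading, it can help to confirm that the download actually finished. The sketch below checks for the two transformer subfolders; their names are an assumption based on the model_variant values used in Step 2, and the helper missing_subdirs is hypothetical. Compare against the file listing on the Hugging Face repository page for the full set of folders.

```python
from pathlib import Path

# Subfolders we ASSUME a complete download contains; the names mirror the
# model_variant values accepted when loading the model.
EXPECTED_SUBDIRS = ("diffusion_transformer_768p", "diffusion_transformer_384p")


def missing_subdirs(model_path: str, expected=EXPECTED_SUBDIRS) -> list:
    """Return the expected subfolders that are absent under model_path."""
    root = Path(model_path)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs("PATH")   # same directory passed to snapshot_download
    if missing:
        print("Incomplete download, missing:", ", ".join(missing))
    else:
        print("Checkpoint folders look complete.")
```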

Step 2: Load the Model

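Next, load the downloaded model. The pyramid_dit module ships with the Pyramid Flow GitHub repository, so run this code from a clone of that repository: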

import torch
from PIL import Image
from pyramid_dit import PyramidDiTForVideoGeneration   # from the Pyramid Flow GitHub repository
from diffusers.utils import load_image, export_to_video

torch.cuda.set_device(0)
model_dtype, torch_dtype = "bf16", torch.bfloat16   # model_dtype is a string; bf16 is the dtype used in the project README

model = PyramidDiTForVideoGeneration(
    "PATH",                                         # The downloaded checkpoint dir (model_path from Step 1)
    model_dtype,
    model_variant="diffusion_transformer_768p",    # alternative: diffusion_transformer_384p
)
model.vae.to("cuda")
model.dit.to("cuda")
model.text_encoder.to("cuda")
model.vae.enable_tiling()   # decode in tiles to lower peak GPU memory

This step is comparable to a filmmaker assembling their cast and crew to bring the script to life.
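The model_variant chosen above also determines the resolution you should request when generating, so it can help to keep the two in sync with a small lookup table. The table values are assumptions taken from the project README's examples (1280 x 768 for 768p; 640 x 384 and a first-frame guidance of 7 for 384p), and generation_kwargs is a hypothetical helper, not part of pyramid_dit.

```python
# ASSUMED per-variant settings for model.generate(); confirm them against the
# README of your Pyramid Flow checkout.
VARIANT_SETTINGS = {
    "diffusion_transformer_768p": {"height": 768, "width": 1280, "guidance_scale": 9.0},
    "diffusion_transformer_384p": {"height": 384, "width": 640, "guidance_scale": 7.0},
}


def generation_kwargs(model_variant: str, **overrides) -> dict:
    """Return keyword arguments for model.generate() that match model_variant."""
    if model_variant not in VARIANT_SETTINGS:
        raise ValueError(f"unknown model_variant: {model_variant!r}")
    kwargs = dict(VARIANT_SETTINGS[model_variant])  # copy so the table is never mutated
    kwargs.update(overrides)                        # caller-supplied values win
    return kwargs


if __name__ == "__main__":
    print(generation_kwargs("diffusion_transformer_384p", temp=16))
```

To use it, spread the result into the call, for example model.generate(prompt=prompt, output_type="pil", **generation_kwargs("diffusion_transformer_768p", temp=16)), and leave height, width and guidance_scale out of the explicit arguments so they are not passed twice.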

Step 3: Generate Videos

Now that your model is set up, it’s time to create videos! You can generate videos from text prompts:

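prompt = "A movie trailer featuring the adventures of the 30-year-old spaceman wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors"

# torch.autocast("cuda", ...) replaces the deprecated torch.cuda.amp.autocast
with torch.no_grad(), torch.autocast("cuda", dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt,
        num_inference_steps=[20, 20, 20],        # steps for the first frame
        video_num_inference_steps=[10, 10, 10],  # steps for each later frame
        height=768,                              # use 384 x 640 with the 384p variant
        width=1280,
        temp=16,                                 # latent frames: 16 gives about 5 s, 31 about 10 s
        guidance_scale=9.0,                      # guidance for the first frame
        video_guidance_scale=5.0,                # guidance for the remaining frames
        output_type="pil",
    )

export_to_video(frames, "text_to_video_sample.mp4", fps=24)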

In this step, you’re directing your film. Just provide the premise, and watch as the magic unfolds!
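The temp argument counts latent frames, not video frames. Assuming the causal video VAE compresses time by a factor of 8 and encodes the first frame on its own (an assumption about the VAE, not something the code in this guide shows), the decoded frame count and clip length can be estimated as below. The step estimate further assumes that each entry in the step lists is one pyramid stage, that the first latent frame uses num_inference_steps, and that every later one uses video_num_inference_steps. All helper names are hypothetical.

```python
TEMPORAL_COMPRESSION = 8  # ASSUMED temporal compression of the causal video VAE


def pixel_frames(temp: int, compression: int = TEMPORAL_COMPRESSION) -> int:
    """Video frames decoded from `temp` latent frames.

    Assumes the first latent frame decodes to one frame and every later
    latent frame decodes to `compression` frames.
    """
    if temp < 1:
        raise ValueError("temp must be at least 1")
    return 1 + (temp - 1) * compression


def duration_seconds(temp: int, fps: int = 24) -> float:
    """Clip length in seconds at the given frame rate."""
    return pixel_frames(temp) / fps


def denoising_steps(temp: int, num_inference_steps: list, video_num_inference_steps: list) -> int:
    """Rough total of denoising steps: first frame, then each later latent frame."""
    return sum(num_inference_steps) + (temp - 1) * sum(video_num_inference_steps)


if __name__ == "__main__":
    for temp in (16, 31):
        print(temp, pixel_frames(temp), round(duration_seconds(temp), 2))
    print(denoising_steps(16, [20, 20, 20], [10, 10, 10]))
```

Under these assumptions, temp=16 yields 121 frames (a little over 5 seconds at 24 FPS) and temp=31 yields 241 frames (about 10 seconds), while the text-to-video settings above add up to 510 denoising steps.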

Image to Video Generation

Pyramid Flow can also generate videos from images:

# Sample image from the repository's assets folder, resized to the 768p variant's 1280 x 768
image = Image.open("assets/the_great_wall.jpg").convert("RGB").resize((1280, 768))
prompt = "FPV flying over the Great Wall"

# torch.autocast("cuda", ...) replaces the deprecated torch.cuda.amp.autocast
with torch.no_grad(), torch.autocast("cuda", dtype=torch_dtype):
    frames = model.generate_i2v(
        prompt=prompt,
        input_image=image,                 # conditioning image for the opening frame
        num_inference_steps=[10, 10, 10],
        temp=16,
        video_guidance_scale=4.0,
        output_type="pil",
    )

export_to_video(frames, "image_to_video_sample.mp4", fps=24)

This is like working from a storyboard: the image sets the opening scene, and the prompt describes how it should move.
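The image-to-video example resizes the input straight to 1280 x 768, which stretches any image whose aspect ratio is not 5:3. One option is to center-crop to 5:3 first. The hypothetical helper below only computes the crop box, so it needs no imaging library; the result is a (left, top, right, bottom) tuple of the kind Pillow's Image.crop accepts.

```python
def center_crop_box(width: int, height: int, target_w: int, target_h: int) -> tuple:
    """Largest centered (left, top, right, bottom) box with the target aspect ratio."""
    # Compare aspect ratios by integer cross-multiplication to avoid float error.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep the full height, trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is taller than (or equal to) the target: keep the full width, trim top and bottom.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


if __name__ == "__main__":
    print(center_crop_box(1920, 1080, 1280, 768))  # (60, 0, 1860, 1080)
```

With Pillow, the crop slots in before the resize: image = Image.open("assets/the_great_wall.jpg").convert("RGB"), then image = image.crop(center_crop_box(*image.size, 1280, 768)).resize((1280, 768)).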

Troubleshooting Tips

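Model Loading Issues: Make sure the path passed to PyramidDiTForVideoGeneration is the directory the checkpoint was downloaded to, and that the download completed. Double-check your directory structure.
CUDA Errors: Make sure your GPU drivers are installed correctly and that your PyTorch build has CUDA support. If you run out of GPU memory, try the smaller diffusion_transformer_384p variant.
Video Quality Problems: Tune the guidance_scale and video_guidance_scale parameters. Higher values make the video follow the prompt more closely, but values set too high can introduce artifacts, so change them in small increments.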
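Guidance parameters such as guidance_scale and video_guidance_scale are conventionally applied as classifier-free guidance: the model predicts once with the prompt and once without, and the two predictions are extrapolated. We assume Pyramid Flow follows this standard form; the sketch below uses plain lists in place of tensors, and the function name is hypothetical.

```python
def classifier_free_guidance(cond: list, uncond: list, scale: float) -> list:
    """Standard classifier-free guidance, applied elementwise.

    scale = 0 returns the unconditional prediction, scale = 1 the
    prompt-conditioned one, and larger values extrapolate past it.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond, uncond = [1.0, 2.0], [0.0, 1.0]
    for scale in (1.0, 5.0, 9.0):
        print(scale, classifier_free_guidance(cond, uncond, scale))
```

Each increase in scale pushes the result further along the difference between the two predictions. That linear extrapolation is one reason higher values strengthen prompt adherence, and also one reason very large values can overshoot into artifacts.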

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Notes

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
