Welcome to our guide on how to set up and use the symbolic music generator powered by non-differentiable rule-guided diffusion models. This project draws inspiration from stochastic control and opens new avenues in music generation. If you’re eager to explore symbolic music creation, follow the steps below!
1. Setting Up the Environment
To get started, you will need to configure your environment properly. This is akin to preparing a canvas before painting a masterpiece.
- First, place the pretrained VAE checkpoint in the directory taming-transformers/checkpoints.
- Next, create and activate a conda virtual environment using the following commands:
conda env create -f environment.yml
conda activate guided
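With the environment active, it is worth confirming that the core dependencies resolved correctly. Here is a minimal check, assuming environment.yml installs PyTorch (the diffusion code is PyTorch-based):
import torch  # assumption: installed by environment.yml

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())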
2. Downloading Pretrained Checkpoints
Now, you need to download the necessary pretrained checkpoints. Think of this as gathering your materials before you start crafting something beautiful.
- Place the pretrained VAE checkpoint under trained_models/VAE (taming-transformers/checkpoints/all_onset/epoch_14.ckpt).
- Put the pretrained diffusion model checkpoint in trained_models/diffusion (loggings/checkpoints/ema_0.9999_1200000.pt).
- Finally, add the pretrained classifiers for each rule to trained_models/classifier (loggings/classifier/).
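Before moving on, it can save time to verify that every file landed where later steps expect it. The sketch below checks the paths listed above; the exact classifier filenames under loggings/classifier/ depend on which rules you downloaded:
from pathlib import Path

# Checkpoint locations as listed in this guide
expected = [
    Path("taming-transformers/checkpoints/all_onset/epoch_14.ckpt"),  # pretrained VAE
    Path("loggings/checkpoints/ema_0.9999_1200000.pt"),  # pretrained diffusion model
    Path("loggings/classifier"),  # directory of pretrained rule classifiers
]

for path in expected:
    print("ok" if path.exists() else "MISSING", "-", path)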
3. Rule Guided Generation
Our next stop is the rule-guided generation process. This can be visualized as composing music by following a specific set of rules – just like a conductor leads an orchestra.
All configurations related to rule guidance are stored in the scripts/configs/ directory. Here’s how you can proceed:
- To guide diffusion models on multiple rules simultaneously, use the config file scripts/configs/cond_table/all/scg_classifier_all.yml.
- The results of this operation will be saved in loggings/cond_table/all/scg_classifier_all.
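Because the guidance settings live in YAML, you can inspect a config before launching a run. A small sketch, assuming PyYAML is available in the conda environment (the keys inside the file are project-specific):
import yaml  # PyYAML; assumption: available in the environment

with open("scripts/configs/cond_table/all/scg_classifier_all.yml") as f:
    cfg = yaml.safe_load(f)

# List the top-level keys to confirm which rules will be guided
for key, value in cfg.items():
    print(key, "->", type(value).__name__)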
4. Running the Rule-Guided Sampling Code
Now we run the sampling script. Imagine this as telling the orchestra to play your composed piece (here we pass the multi-rule config from above; supply your own --data_dir):
python sample_rule.py \
    --config_path scripts/configs/cond_table/all/scg_classifier_all.yml \
    --batch_size 4 \
    --num_samples 20 \
    --data_dir \
    --model DiTRotary_XL_8 \
    --model_path loggings/checkpoints/ema_0.9999_1200000.pt \
    --image_size 128 16 \
    --in_channels 4 \
    --scale_factor 1.2465 \
    --class_cond True \
    --num_classes 3 \
    --class_label 1
Understanding the Hyper-parameters:
- config_path: Path to your configuration file.
- batch_size: Number of samples produced per batch.
- num_samples: Total samples you wish to generate.
- data_dir: Directory for data storage.
- model: Specifies the backbone model.
- model_path: Path to the pretrained model.
- image_size: Dimensions of the generated piano roll.
- in_channels: Channels used in the latent space.
- scale_factor: Rescaling factor applied to the VAE latents, typically derived from their standard deviation (here 1.2465); see the sketch after this list.
- class_cond: Whether to condition generation on class labels (music genre).
- num_classes: Number of genre classes.
- class_label: Specifies the class label for desired music genre.
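To make scale_factor concrete: in latent diffusion pipelines, it is commonly chosen so that the encoded latents have roughly unit standard deviation. The sketch below illustrates that estimate on a stand-in tensor; encoding real piano rolls with the pretrained VAE is project-specific and not shown here:
import torch

# Stand-in tensor with the latent shape implied by the flags above:
# (batch, in_channels, *image_size) = (4, 4, 128, 16).
latents = torch.randn(4, 4, 128, 16)

# Common convention (an assumption here): rescale latents toward unit std.
scale_factor = 1.0 / latents.flatten().std().item()
print(f"estimated scale_factor: {scale_factor:.4f}")  # this guide passes 1.2465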
5. Troubleshooting
If you encounter any issues during setup or operation, consider the following troubleshooting tips:
- Ensure all checkpoints are in their specified directories.
- Double-check the correctness of your config paths.
- If you receive an error regarding batch size, adjust your batch_size and num_samples parameters accordingly (see the quick check below).
- Make sure your virtual environment is activated correctly.
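As a quick sanity check on those two parameters: the sampler typically runs about ceil(num_samples / batch_size) batches, so lowering batch_size to fit in memory simply increases the batch count:
import math

batch_size, num_samples = 4, 20  # values from the sampling command above
print("batches to run:", math.ceil(num_samples / batch_size))  # 5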
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
6. Training the Diffusion Model
Lastly, to train a diffusion model for symbolic music generation, follow this script:
mpiexec -n 8 python scripts/train_dit.py \
    --dir \
    --data_dir \
    --model DiTRotary_XL_8 \
    --image_size 128 16 \
    --in_channels 4 \
    --batch_size 32 \
    --encode_rep 4 \
    --shift_size 4 \
    --pr_image_size 2560 \
    --microbatch_encode -1 \
    --class_cond True \
    --num_classes 3 \
    --scale_factor \
    --fs 100 \
    --save_interval 10000 \
    --resume
This command involves several project-specific hyper-parameters (such as encode_rep, shift_size, and pr_image_size), so review the configs and source code before adjusting them.
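With save_interval set to 10000, checkpoints accumulate as training progresses. Assuming the run logs to loggings/checkpoints/ and follows the ema_0.9999_<step>.pt naming seen earlier, here is a small sketch for listing them by step:
from pathlib import Path

ckpt_dir = Path("loggings/checkpoints")  # assumption: matches the logging directory of your run
for ckpt in sorted(ckpt_dir.glob("ema_0.9999_*.pt")):
    step = int(ckpt.stem.rsplit("_", 1)[-1])  # ema_0.9999_1200000 -> 1200000
    print(f"step {step}: {ckpt.name}")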
Conclusion
By following these steps, you can successfully implement and generate symbolic music using non-differentiable rule-guided diffusion. This innovative approach promises a fantastic creative journey ahead!
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

