How to Use MusicGen for Text-to-Music Generation

Mar 8, 2024 | Educational

Are you excited to create your own unique music using text-based prompts? In this guide, we will walk you through how to use MusicGen—a state-of-the-art model designed for generating music from text descriptions. You’ll learn how to install the necessary libraries, run the model locally, and troubleshoot any issues along the way. Let’s compose some tunes!

What is MusicGen?

MusicGen is a powerful text-to-music model developed by the FAIR team at Meta AI. It generates high-quality music samples conditioned on text descriptions or audio prompts. Think of it as a talented musician who can create fresh tracks just by listening to your ideas. The stereo checkpoints used in this guide even produce stereophonic sound, truly bringing the music to life!

Getting Started with MusicGen

To start generating music using MusicGen, you need to set up the necessary libraries and run the model. Here’s how you can do that:

1. Install Required Libraries

  • First, you need to install the Transformers library and SciPy. The code examples below also import PyTorch and soundfile, so install PyTorch for your platform if you haven’t already; soundfile is included in the command below. Open your terminal/command prompt and run (you can verify the installation with the short check after this step):

    pip install --upgrade pip
    pip install --upgrade git+https://github.com/huggingface/transformers.git scipy soundfile
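
To confirm that everything installed and imports cleanly, you can run this short check (a minimal sketch; it simply imports the libraries used in the rest of this guide and prints their versions):

python
# Quick check that the freshly installed libraries import correctly.
import scipy
import soundfile
import torch
import transformers

print(transformers.__version__)
print(torch.__version__)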

2. Run the Model for Inference

Once you’ve installed the necessary libraries, it’s time to run MusicGen and generate some music. You can process your text descriptions in either of two ways:

  • Use the Text-to-Audio (TTA) pipeline:

    python
    import torch
    import soundfile as sf
    from transformers import pipeline

    # Load the stereo MusicGen checkpoint on the GPU in half precision
    synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-medium", device="cuda:0", torch_dtype=torch.float16)
    music = synthesiser("lo-fi music with a soothing melody", forward_params={"max_new_tokens": 256})
    # soundfile expects (samples, channels), hence the transpose
    sf.write("musicgen_out.wav", music["audio"][0].T, music["sampling_rate"])
    
  • Or load the processor and model directly for more control (a variant with extra generation parameters follows this list):

    python
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    # Load the processor (text tokenizer) and the model, and move the model to the GPU
    processor = AutoProcessor.from_pretrained("facebook/musicgen-stereo-medium")
    model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-stereo-medium").to("cuda")

    # Each text prompt in the batch produces its own clip
    inputs = processor(
        text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
        padding=True,
        return_tensors="pt",
    ).to("cuda")

    # max_new_tokens controls the clip length (256 tokens is roughly 5 seconds of audio)
    audio_values = model.generate(**inputs, max_new_tokens=256)
    
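
For more control over generation itself, the Transformers implementation of MusicGen also accepts standard generation arguments such as do_sample and guidance_scale (classifier-free guidance) in generate(). The sketch below builds on the snippet above; guidance_scale=3 is the value used in the library's examples, and max_new_tokens=512 is just an illustration (roughly 50 audio tokens correspond to one second of audio).

python
# Builds on the processor/model/inputs defined in the previous snippet.
audio_values = model.generate(
    **inputs,
    do_sample=True,       # sample instead of greedy decoding for more varied output
    guidance_scale=3,     # classifier-free guidance: higher values follow the prompt more closely
    max_new_tokens=512,   # ~50 tokens per second of audio, so about 10 seconds here
)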

3. Save or Listen to Your Music

After generating the audio, you can either save it as a .wav file or listen to it directly in your notebook:

python
from IPython.display import Audio

# The audio codec's sampling rate tells the player how fast to play back the samples
sampling_rate = model.config.audio_encoder.sampling_rate
# Listen to the first clip in the batch directly in the notebook
Audio(audio_values[0].cpu().numpy(), rate=sampling_rate)

Alternatively, save it like so:

python
import soundfile as sf

# Move the tensor to the CPU and convert it to a NumPy array
audio_values = audio_values.cpu().numpy()
# soundfile expects (samples, channels), hence the transpose of the first clip
sf.write("musicgen_out.wav", audio_values[0].T, sampling_rate)

Understanding MusicGen Internals

Imagine MusicGen as a chef in a culinary kitchen. Instead of traditional ingredients, it uses your text prompt to whip up a musical recipe: a T5 text encoder turns the prompt into embeddings, a single autoregressive transformer decoder predicts compressed audio tokens across interleaved EnCodec codebooks, and the EnCodec decoder turns those tokens back into a waveform. Rather than magic, it works like a synchronized orchestra in which every predicted token contributes, step by step, to the final piece. Just like a chef needs to know which ingredients work best together, you might need to experiment with different prompts to achieve the sound you want.
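
If the model from step 2 is still loaded, you can see these components for yourself through its configuration. This is just a quick illustration; the attribute names below follow the Transformers MusicGen configuration objects.

python
# Assumes `model` from step 2 is still in memory.
print(model.config.text_encoder.model_type)      # "t5": encodes the text prompt
print(model.config.audio_encoder.model_type)     # "encodec": compresses audio to tokens and back
print(model.config.audio_encoder.sampling_rate)  # sampling rate of the generated waveform
print(model.config.decoder.num_codebooks)        # parallel codebook streams the decoder predicts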

Troubleshooting Ideas

If you encounter any issues while running MusicGen, here are some troubleshooting tips:

  • Ensure that you have the necessary libraries installed correctly.
  • Check that your CUDA driver and the CUDA build of PyTorch are properly configured, especially if you’re using a GPU for inference (see the quick check after this list).
  • If your generated audio is silent or cuts off too soon, review the text prompt for clarity and try increasing max_new_tokens (roughly 50 tokens correspond to one second of audio).
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
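
As a quick sanity check for the GPU points above, the snippet below (a minimal, MusicGen-agnostic check) confirms whether PyTorch can see a CUDA device:

python
import torch

# If this prints False, remove device="cuda:0" and the .to("cuda") calls from the
# earlier examples and run them on the CPU instead; it is slower but still works.
print(torch.cuda.is_available())
print(torch.version.cuda)  # the CUDA version PyTorch was built with (None on CPU-only builds)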

Conclusion

With MusicGen, the possibilities of music creation are limitless! Whether you are an aspiring musician or a tech enthusiast dabbling in AI, this model provides an exciting avenue to explore. Remember, experimenting with different text prompts is key to finding your unique sound. Enjoy your musical journey!

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
