Do you ever find yourself longing to hear your favorite pop songs played in a beautiful piano cover? With the advent of machine learning, you can now generate piano covers directly from pop music audio using the innovative Pop2Piano model! In this guide, we will walk you through the usage of Pop2Piano and troubleshoot common issues.
What is Pop2Piano?
Pop2Piano is a Transformer-based model designed to create piano covers from audio waveforms of pop music. Introduced in the research paper “Pop2Piano: Pop Audio-based Piano Cover Generation” by Jongho Choi and Kyogu Lee, the model eliminates the need for separate melody and chord extraction modules, enabling direct generation of piano covers from raw audio!
Model Details
Working similarly to an interpreter, Pop2Piano listens to the “language” of pop music in its audio waveform format. The model architecture is based on the T5 framework, processing audio into latent representations, and then generating corresponding MIDI token IDs for piano covers. Each token represents different musical properties, such as time, velocity, and note.
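To make the token idea concrete, here is a loose, hypothetical illustration (not Pop2Piano's actual vocabulary — its real mapping lives inside its tokenizer) of how a tokenizer of this kind can reserve ranges of integer IDs for each event type:

```python
# Hypothetical token layout sketching how time, velocity, and note events
# could share one integer vocabulary. Pop2Piano's real scheme differs.
TIME_OFFSET = 1        # IDs 1..100: time-shift steps
VELOCITY_OFFSET = 101  # IDs 101..200: velocity buckets
NOTE_OFFSET = 201      # IDs 201..328: note-on events, indexed by MIDI pitch

def note_token(pitch: int) -> int:
    """Map a MIDI pitch (0-127) to its note-event token ID."""
    if not 0 <= pitch <= 127:
        raise ValueError("MIDI pitch must be in 0..127")
    return NOTE_OFFSET + pitch

print(note_token(60))  # middle C -> 261
```

The model's decoder emits a sequence of such IDs, which the processor then converts back into timed MIDI notes.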
How to Use Pop2Piano
Follow these steps to start generating piano covers:
Installation
First, ensure that you’ve set up your environment with the necessary libraries. You can install them using the following commands:
- Install the 🤗 Transformers library and the audio dependencies:

```bash
pip install git+https://github.com/huggingface/transformers.git
pip install pretty-midi==0.2.9 essentia==2.1b6.dev1034 librosa scipy
```
Note that you may need to restart your runtime after installation.
Generating Piano Covers with Your Own Audio
Here’s a code example that demonstrates how to generate piano covers with your audio files:
```python
import librosa
from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor

# Load your audio file (replace the path with your own file)
audio, sr = librosa.load("your_audio_file.wav", sr=44100)

# Initialize and load the model and processor
model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")

# Process the audio and generate MIDI token IDs
inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")
model_output = model.generate(input_features=inputs["input_features"], composer="composer1")

# Decode the tokens into a pretty_midi object and write it to disk
tokenizer_output = processor.batch_decode(
    token_ids=model_output, feature_extractor_output=inputs
)["pretty_midi_objects"][0]
tokenizer_output.write("outputs/midi_output.mid")
```
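One practical note: writing to `outputs/midi_output.mid` raises a `FileNotFoundError` if the `outputs/` directory does not exist, so create it first (a plain stdlib step, not part of the Pop2Piano API):

```python
import os

# Create the output directory before writing the MIDI file.
os.makedirs("outputs", exist_ok=True)
print(os.path.isdir("outputs"))  # True
```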
Generating Piano Covers with Audio from Hugging Face Hub
If you want to generate piano covers from a pre-existing audio dataset, you can do that as well (this also requires the 🤗 Datasets library, installable with `pip install datasets`):
```python
from datasets import load_dataset
from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor

# Load the model and processor
model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")

# Load a test dataset from the Hugging Face Hub
ds = load_dataset("sweetcocoa/pop2piano_ci", split="test")

# Process the audio and generate MIDI token IDs
inputs = processor(
    audio=ds["audio"][0]["array"],
    sampling_rate=ds["audio"][0]["sampling_rate"],
    return_tensors="pt",
)
model_output = model.generate(input_features=inputs["input_features"], composer="composer1")

# Decode the tokens into a pretty_midi object and write it to disk
tokenizer_output = processor.batch_decode(
    token_ids=model_output, feature_extractor_output=inputs
)["pretty_midi_objects"][0]
tokenizer_output.write("outputs/midi_output.mid")
```
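The `composer` argument selects one of the arranger styles baked into the checkpoint; the authoritative list of names is stored in `model.generation_config.composer_to_feature_token`. As a sketch, assuming names of the form `composer1`, `composer2`, …, you could render several variations of the same input:

```python
# Hypothetical composer names; read the real keys from
# model.generation_config.composer_to_feature_token before relying on them.
composers = [f"composer{i}" for i in range(1, 4)]
for name in composers:
    print(name)
    # model_output = model.generate(
    #     input_features=inputs["input_features"], composer=name
    # )
    # ...then decode and write one MIDI file per composer...
```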
Examples of Generated MIDI
Check out the example MIDI generated from actual pop music and compare it with the original audio:
- Actual Pop Music: (audio embed in the original post)
- Generated MIDI: (MIDI playback embed in the original post)
Troubleshooting
Here are a few common issues you might face while using Pop2Piano and how to resolve them:
- Library Installation Issues: Ensure you have the latest version of each library and that you run the installation command with the correct permissions. If you encounter issues, try restarting your runtime.
- Audio Format Errors: Make sure the audio file you are using is compatible and not corrupted. Try converting it to a standard format like WAV or MP3 if you encounter this error.
- Model Output Errors: If your generated covers do not match expectations, make sure the sampling rate passed to the processor matches the rate the audio was actually loaded at, and explore different composer options for stylistic variation.
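For the audio-format issue above, librosa plus scipy (both installed earlier) can convert most inputs to a clean 44.1 kHz WAV. A minimal sketch, where `to_int16_wav` is a helper name introduced here for illustration:

```python
import numpy as np
from scipy.io import wavfile

def to_int16_wav(audio: np.ndarray, sr: int, path: str) -> None:
    """Write float audio in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    wavfile.write(path, sr, pcm)

# Demo input: one second of a 440 Hz sine tone at 44.1 kHz.
# For a real file, decode it first, e.g.: audio, sr = librosa.load("song.mp3", sr=44100)
sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)
to_int16_wav(0.5 * np.sin(2 * np.pi * 440 * t), sr, "converted.wav")
```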

