Welcome to the world of sound compression! Today, we’ll explore how to use SNAC (Multi-Scale Neural Audio Codec), an innovative tool that compresses audio into discrete codes at remarkably low bitrates. Whether you’re into music or sound effects (SFX) generation, SNAC has got you covered. So let’s dive in!
Overview of SNAC
SNAC encodes audio into hierarchical tokens, much like other neural audio codecs such as SoundStream and EnCodec. The twist is that SNAC samples its coarse tokens less frequently, so each coarse token covers a longer span of time.
To put it simply, think of SNAC as a sophisticated “translator” that converts your musical compositions into a compact language of codes, making them easier to transmit and store. This model compresses 44 kHz audio at a bitrate of just 2.6 kbps, using four Residual Vector Quantization (RVQ) levels that run at different token rates (14, 29, 57, and 115 Hz).
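The 2.6 kbps figure can be checked from the token rates above. The sketch below assumes each codebook holds 4096 entries (12 bits per token); that codebook size is not stated in this article, so treat it as an assumption rather than a documented value.

```python
import math

# Token rates (Hz) of the four RVQ levels, taken from the text above.
token_rates_hz = [14, 29, 57, 115]

# Assumption: each codebook has 4096 entries, so one token costs log2(4096) = 12 bits.
codebook_size = 4096
bits_per_token = int(math.log2(codebook_size))

# Total tokens emitted per second across all levels, then bits per second.
tokens_per_second = sum(token_rates_hz)
bitrate_bps = tokens_per_second * bits_per_token

print(f"{tokens_per_second} tokens/s x {bits_per_token} bits = {bitrate_bps / 1000:.2f} kbps")
```

Under that assumption the total comes to 2.58 kbps, which rounds to the quoted 2.6 kbps.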
Pretrained Models
Currently, SNAC only supports a single audio channel (mono). Below is a list of the pretrained models available:
- hubertsiuzdak/snac_24khz – 0.98 kbps, 24 kHz, 19.8M Parameters, 🗣️ Speech
- hubertsiuzdak/snac_32khz – 1.9 kbps, 32 kHz, 54.5M Parameters, 🎸 Music, Sound Effects
- hubertsiuzdak/snac_44khz (this model) – 2.6 kbps, 44 kHz, 54.5M Parameters, 🎸 Music, Sound Effects
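Since only mono input is supported, stereo material needs to be reduced to one channel before encoding. A minimal sketch, assuming a plain channel average is an acceptable downmix for your material (this article does not prescribe one) and using a random tensor as a stand-in for real audio:

```python
import torch

# Placeholder stereo batch with shape (B, 2, T); real audio would be loaded from a file.
stereo = torch.randn(1, 2, 44100)

# Average the two channels and keep the channel dimension, giving shape (B, 1, T).
mono = stereo.mean(dim=1, keepdim=True)

print(tuple(mono.shape))  # (1, 1, 44100)
```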
How to Install and Use SNAC
Follow these steps to get started with SNAC:
Installation
First, you need to install the SNAC library. Open your terminal and execute the following command:
pip install snac
Encoding and Decoding Audio
Once installed, you can begin encoding and decoding audio with SNAC. Below is sample code to help you get started:
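import torch
from snac import SNAC
# Load the pretrained 44 kHz model in eval mode and move it to the GPU
model = SNAC.from_pretrained("hubertsiuzdak/snac_44khz").eval().cuda()
# Placeholder for real audio; expected shape is (B, 1, T)
audio = torch.randn(1, 1, 44100).cuda()
with torch.inference_mode():
    codes = model.encode(audio)      # list of token tensors, one per RVQ level
    audio_hat = model.decode(codes)  # reconstructed waveform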
Alternatively, you can encode and reconstruct audio in a single call:
with torch.inference_mode():
    audio_hat, codes = model(audio)
Note that codes is a list of token sequences of different lengths, each corresponding to a different temporal resolution. For example, you might see shapes like:
[code.shape[1] for code in codes] # [16, 32, 64, 128]
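One way to read these shapes: each level holds twice as many tokens as the one before it, so a single coarse token spans several fine tokens. The sketch below works only from the example lengths shown above and assumes the levels are aligned in time; the helper `finest_span` is a hypothetical name used for illustration and is not part of the SNAC library.

```python
# Example code lengths per level, taken from the shapes shown above (coarse to fine).
lengths = [16, 32, 64, 128]


def finest_span(level: int, index: int) -> range:
    """Return the finest-level token positions covered by one token at `level`.

    Hypothetical helper for illustration; assumes the levels are time-aligned
    and that each length divides the finest length evenly.
    """
    stride = lengths[-1] // lengths[level]
    return range(index * stride, (index + 1) * stride)


for level, n in enumerate(lengths):
    stride = lengths[-1] // n
    print(f"level {level}: {n} tokens, each spanning {stride} finest-level token(s)")

# The first coarse token (level 0) lines up with the first 8 finest-level tokens.
print(list(finest_span(0, 0)))
```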
Troubleshooting Tips
If you run into issues while using SNAC, here are some common troubleshooting steps:
- Installation Issues: Make sure you have the latest version of pip. You can upgrade it using pip install --upgrade pip.
- CUDA Errors: If your model isn’t recognizing CUDA, check if your GPU is properly configured and that you have the necessary NVIDIA drivers installed.
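- Audio Quality Problems: Each pretrained model has fixed RVQ levels and token rates, so check that the model matches your audio. The 24 kHz model targets speech, while the 32 kHz and 44 kHz models target music and sound effects. Try a different model to see if it yields better results for your specific audio type.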
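For the CUDA case, one common workaround is to fall back to the CPU when no GPU is visible, rather than calling .cuda() unconditionally. A minimal sketch using only standard PyTorch calls; whether CPU inference is fast enough for your use case is not something this article covers:

```python
import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder audio with shape (B, 1, T), created directly on the chosen device.
audio = torch.randn(1, 1, 44100, device=device)

print(device.type, tuple(audio.shape))
```

The model from the earlier example can then be moved with .to(device) in place of .cuda(), so the same script runs on machines with or without a GPU.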
Conclusion
With SNAC, compressing music and sound effects takes only a few lines of code. Because its coarse tokens are sampled less often than its fine ones, it reaches bitrates between 0.98 and 2.6 kbps across the pretrained models, which keeps storage and transmission needs small.

