Separate instrument tracks from a song
How to separate instrument tracks from a song into vocals, drums, bass, and other.
NOTE: For research purposes only. Use at your own risk.
I study/read/work with soundtracks. That’s the best way I can focus. I cannot concentrate when listening to music that has vocals. I’ve been wondering what it is like to listen to other music I like without the vocals.
I wonder what Daft Punk sounds like without vocals. I know. Most songs by Daft Punk are already instrumental.
I found this repo with a list of ML projects. See here
One of them is a project called demucs, used to separate tracks from songs. More about the project here
Install using this:
pip install demucs
The output shows that a lot of dependencies need to be installed:
- torch>=1.8.1 (this was the largest with approx 900MB)
- nvidia-cudnn-cu11==8.5.0.96 (also a big one, 500MB)
- nvidia-cublas-cu11==11.10.3.66 (300MB)
Successfully built demucs julius dora-search antlr4-python3-runtime treetable
Installing collected packages: lameenc, antlr4-python3-runtime, treetable, tqdm, retrying, omegaconf, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cublas-cu11, numpy, einops, Cython, cloudpickle, submitit, nvidia-cudnn-cu11, torch, torchaudio, julius, dora-search, diffq, openunmix, demucs
Successfully installed Cython-0.29.33 antlr4-python3-runtime-4.9.3 cloudpickle-2.2.1 demucs-4.0.0 diffq-0.2.3 dora-search-0.1.11 einops-0.6.0 julius-0.2.7 lameenc-1.4.2 numpy-1.24.2 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 omegaconf-2.3.0 openunmix-1.2.1 retrying-1.3.4 submitit-1.4.5 torch-1.13.1 torchaudio-0.13.1 tqdm-4.64.1 treetable-0.2.5
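With that many packages pulled in, a quick sanity check after the install can be handy. This is a minimal sketch using only the Python standard library. The helper name installed_versions is made up for illustration, and the package names are taken from the install log above.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names come from the pip output; adjust the list as needed.
    for name, version in installed_versions(["demucs", "torch", "torchaudio"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If everything went well, each package prints with a version number instead of "not installed".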
Run with this:
Output from running the program:
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /home/tom/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
80.2M/80.2M
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /home/tom/Music/separated/htdemucs
It took 25 minutes to process an 8MB mp3 song.
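If you would rather drive demucs from a script, here is a rough sketch of how the command line could be assembled in Python. It is an illustration, not the exact command used above. The file name song.mp3 and the helper names build_demucs_command and separate are hypothetical. The `-n` flag comes from the program output above, and `--two-stems` is the option described in the demucs README for splitting a song into one chosen stem plus everything else, which would be a direct route to a version without vocals.

```python
import subprocess


def build_demucs_command(track, model=None, two_stems=None):
    """Build the argument list for a demucs run. Nothing is executed here."""
    command = ["demucs"]
    if model is not None:
        # e.g. "mdx_extra_q", the old default model named in the output above
        command += ["-n", model]
    if two_stems is not None:
        # e.g. "vocals": split into that stem and the rest of the song
        command.append(f"--two-stems={two_stems}")
    command.append(track)
    return command


def separate(track, **options):
    """Run demucs on one track; assumes the demucs command is on PATH."""
    subprocess.run(build_demucs_command(track, **options), check=True)


# Example (not executed here; "song.mp3" is a placeholder):
# separate("song.mp3", two_stems="vocals")
```

Keeping the command builder separate from the function that runs it makes it easy to print the command and check it before committing to a long run.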
Reviewing the output
For research purposes only, I used the song “Touch” by Daft Punk.
The result was four .wav tracks named bass, drums, other, and vocals.
I loaded them into Audacity.
The result is wow. Amazing.
Listened to individually, the tracks are almost like different songs.
Only bass and drums. Or bass and vocals. Or bass and other. You could make a whole album mixing this song in different ways.
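The same kind of mixing can be done without Audacity. Below is a minimal sketch in plain Python that adds stems together sample by sample. It assumes the stems are 16-bit PCM .wav files with the same sample rate and channel count, which I have not verified for every demucs setting. The function names and the file names in the example are made up for illustration.

```python
import array
import sys
import wave


def mix_samples(tracks):
    """Sum sequences of 16-bit samples position by position, clipping to int16."""
    mixed = array.array("h")
    for frame in zip(*tracks):  # stops at the shortest track
        mixed.append(max(-32768, min(32767, sum(frame))))
    return mixed


def read_wav(path):
    """Return (params, samples) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit samples")
        params = wav.getparams()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return params, samples


def mix_wav_files(stem_paths, out_path):
    """Mix the given stems into one WAV file, reusing the last stem's format."""
    params = None
    tracks = []
    for path in stem_paths:
        params, samples = read_wav(path)
        tracks.append(samples)
    mixed = mix_samples(tracks)
    if sys.byteorder == "big":
        mixed.byteswap()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(mixed.tobytes())


# Example (placeholder file names):
# mix_wav_files(["bass.wav", "drums.wav"], "bass_and_drums.wav")
```

Leaving vocals out of the list and mixing the other three stems gives an instrumental version of the song.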
The vocals were removed from the beginning, the end, and wherever you can clearly hear the singer. At the end, you only hear the piano. The vocals couldn’t be removed in the middle, where there is a mix of human/robot singing. The angelic vocals were almost removed. You can hear them very quietly in the background. The vocals track also has the sound of wind blowing at the beginning.
The demucs program is a product of Facebook Research. See the repo here. It has a research paper with all the proper complicated math that requires superhuman understanding. It did an amazing job with the song I tried.