Source Separation – GSoC 2021 Week 2

Hi all! Here are this week’s updates: 

I finished writing the processing code! Right now, the Source Separation effect in Audacity is capable of loading deep learning models from disk and using them to perform source separation on a user’s selected audio. When the Source Separation effect is applied, it creates a new track for each separated source. Because source separation models tend to operate at lower sample rates (16 kHz is common for deep learning models), each output track is resampled and converted to the same sample rate and sample format as the original mix track. To name each of the new source tracks, each source label is appended to the mix’s track name. For example, if my original track is called “mix”, and my sources are [“drums”, “bass”], the new tracks will be named “mix – drums” and “mix – bass”. Here’s a quick demo of my progress so far:
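
The naming scheme described above can be sketched in a few lines of Python (a minimal illustration, not the actual Audacity C++ code; the function name is mine):

```python
def name_source_tracks(mix_name, source_labels):
    # Each output track takes the mix's name plus the source label,
    # e.g. "mix" with ["drums", "bass"] gives "mix - drums", "mix - bass".
    return [f"{mix_name} - {label}" for label in source_labels]
```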

Goals for next week:

  • Each new separated source track should be placed below the original mix. Right now, this is not the case, as all the new tracks are written at the bottom of the tracklist. I’d like to amend this behavior so that each separated source is grouped along with its parent mix.
  • Pass multi-channel audio to deep learning models as a multi-channel tensor. Let the model author decide what to do with multiple channels of audio (i.e. downmix). Write a downmix wrapper in PyTorch. 
  • Refactor processing code. A lot of the preprocessing and postprocessing steps could be abstracted away to make a base DeepLearningEffect class that contains useful methods that make use of deep learning models for different applications (e.g. automatic labeling, sound generation). 
  • Brainstorm ideas for a more useful and attractive UI for the Source Separation effect. 
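
The downmix wrapper mentioned above could look something like this in PyTorch (a sketch under my own assumptions about tensor shapes; `DownmixWrapper` is a hypothetical name, not an existing class):

```python
import torch

class DownmixWrapper(torch.nn.Module):
    """Hypothetical wrapper that downmixes multi-channel audio to mono
    before handing it to a single-channel separation model, so the model
    author controls how multiple channels are handled."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, audio):
        # audio: (channels, samples); average the channels to mono,
        # keeping a channel dimension for the wrapped model.
        mono = audio.mean(dim=0, keepdim=True)
        return self.model(mono)
```

Because the wrapper is itself an `nn.Module`, it can be scripted or traced together with the model it wraps.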

One idea for a Source Separation UI that’s been in the back of my mind is to take advantage of Audacity’s spectral editing tools (see the fellow GSoC project by Edward Hui) to make a Source Separation mask editor. This means that we would first have a deep learning model estimate a separation mask for a given spectrogram, and then let the user edit and fine-tune that spectral mask using the spectral editing tools. This would let users improve the source separation output through Photoshop-like post-processing, or even potentially give the model hints about what to separate!
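
In NumPy terms, the mask-editing idea reduces to an element-wise product between the mixture’s STFT and a soft mask, where the user can repaint mask values before resynthesis (a conceptual sketch with made-up shapes, not the planned implementation):

```python
import numpy as np

def separate_with_mask(stft_mix, mask):
    # mask values lie in [0, 1]; applied element-wise to the complex STFT,
    # whether the mask came from a model or from the user's edits
    return stft_mix * mask

rng = np.random.default_rng(0)
stft_mix = rng.standard_normal((513, 100)) + 1j * rng.standard_normal((513, 100))
mask = np.ones((513, 100))          # stand-in for a model-estimated mask
mask[200:300, 40:60] = 0.0          # user "paints out" a time-frequency region
stft_source = separate_with_mask(stft_mix, mask)
```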

GSOC 2021 with Audacity – Week 1

This week, I met with my mentor and learned more about the rendering logic and how the different methods of the inherited UIHandler work together. We also set expectations for the next few weeks, and I completed a prototype of the brush tool.

Work done this week:

  1. Created BrushHandle, inheriting some basic functions and logic.
  2. Tried different approaches for displaying the brush trails in real time, including rendering from BrushHandle and from SpectrumView.
  3. Set up a data structure to store mouse events and convert them into frequency-time bins (adapting automatically to different user scaling).
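
The pixel-to-bin conversion in point 3 might look roughly like this (my own sketch; it assumes a linear frequency axis, whereas Audacity’s spectrogram view also supports log scaling, which would need a log mapping):

```python
def mouse_to_bin(x_px, y_px, view_w, view_h, t_start, t_end,
                 f_min, f_max, hop_secs, n_freq_bins):
    """Map a mouse position inside the spectrogram view to a
    (time-frame, frequency-bin) pair."""
    # horizontal position -> time, then time -> STFT frame index
    t = t_start + (x_px / view_w) * (t_end - t_start)
    frame = int(t / hop_secs)
    # vertical position -> frequency (y grows downward on screen)
    f = f_max - (y_px / view_h) * (f_max - f_min)
    fbin = int((f - f_min) / (f_max - f_min) * (n_freq_bins - 1))
    return frame, fbin
```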

Next week’s goal:

  1. Change the color of the selected area, to adapt to the existing color gradient scheme
  2. Refactor the data structure of the selected area
  3. Implement new UI components, to erase or apply the editing effect
  4. Append the selection to the state history

Source Separation – GSoC 2021 Week 1

Hi all! My first week of GSoC went great. Here are some project updates:

I started prototyping some wrappers to export pretrained PyTorch source separation models for use in Audacity. The pretrained models will most likely be grabbed from Asteroid, an open source library with lots of recipes for training state-of-the-art source separation models. Most Asteroid models are TorchScript-compatible via tracing (see this pull request), and I’ve already successfully traced a couple of ConvTasNet models trained for speech separation that should be ready for Audacity. You can look at these wrapper prototypes in this Colab notebook. The idea is to ship the model with JSON-encoded metadata that we can display to the user. This way, we can inform a user of a model’s domain (e.g. speech / music), sample rate, size (larger models require more compute), and output sources (e.g. Bass, Drums, Voice, Other).
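
One way to bundle the model and its JSON metadata into a single file is TorchScript’s `_extra_files` mechanism; here is a sketch with a toy stand-in model (`TinySeparator` and the metadata fields are illustrative, not the actual wrapper from the notebook):

```python
import io
import json
import torch

class TinySeparator(torch.nn.Module):
    """Toy stand-in for a real separator such as ConvTasNet: maps a
    mixture to a stack of estimated sources."""
    def forward(self, mix):
        # fake "separation": split the mixture evenly into two sources
        return torch.stack([mix * 0.5, mix * 0.5])

traced = torch.jit.trace(TinySeparator().eval(), torch.zeros(16000))

# Ship JSON metadata inside the TorchScript archive so the host app can
# show domain, sample rate, and source labels to the user.
metadata = {"domain": "speech", "sample_rate": 16000,
            "sources": ["speaker1", "speaker2"]}
buffer = io.BytesIO()
torch.jit.save(traced, buffer,
               _extra_files={"metadata.json": json.dumps(metadata)})

# At load time, the same map is filled in with the stored contents.
buffer.seek(0)
extra = {"metadata.json": ""}
model = torch.jit.load(buffer, _extra_files=extra)
info = json.loads(extra["metadata.json"])
```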

Wrapping a model for Audacity should be straightforward for people familiar with source separation in PyTorch, and I’m planning on adding a small set of wrappers to nussl that facilitate the process, accompanied by a short tutorial. Ideally, this should encourage the research community to share their groundbreaking source separation models with the Audacity community, giving us access to the latest and greatest source separation! 🙂 

On the Audacity side, I added a CMake script that adds libtorch to Audacity. However, because CPU-only libtorch is a whopping 200 MB, source separation will likely be an optional feature, which means that libtorch needs to be an optional download that can be linked at runtime. I will be in touch with my mentor Dmitry Vedenko about figuring out a way forward from there.

I started writing code for the SourceSep effect in Audacity. So far, I’m able to load TorchScript models into the effect and display each model’s metadata. By next week, I’d like to finish writing the processing code, where the audio data is converted from an Audacity WaveTrack to a torch Tensor, processed by the TorchScript model, and converted back to a WaveTrack.

Audacity’s Effect interface lacks the capability to write the output of an effect to new WaveTracks. This behavior is desirable for source separation, since a model that separates into 4 sources (Drums, Bass, Voice, and Other) would ideally create 4 new WaveTracks bound to the input track, one for each source. Analysis effects (like FindClipping) already create a new label track bound to the input track. I’ll dig deeper into how this is done and see if I can extend this behavior so that a variable number of WaveTracks can be created to hold the separation output.

Goals for Next Week:

  • Finish writing the Effect Processing code so each output source is appended to the input WaveTrack. 
  • Start thinking about an approach to writing the separation output to new, multiple WaveTracks.

Audacity – Spectral editing tools introduction

Hello all, this is Edward Hui from Hong Kong. I have a strong interest in audio/signal processing and Neuroscience, and I have been selected for the project “spectral editing tool” this summer, mentored by Paul Licameli. Here are links to my GitHub and LinkedIn profiles, please feel free to connect with me.

Background of spectral editing

Consider one of the most popular songs in history, Hey Jude by The Beatles, as an example. I fetched the song from YouTube as a WAV file and imported it into Audacity (no CD or vinyl magic involved); the snippet is attached here.

This is the original spectrogram, using logarithmic scaling, a window size of 4096, and a band limit of [100 – 5000] Hz.
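
The same spectrogram settings can be reproduced with a short NumPy sketch (a rough approximation using a Hann window and dB magnitudes on a placeholder signal; Audacity’s exact windowing and display differ):

```python
import numpy as np

sr = 44100
n_fft = 4096                # window size matching the screenshot
hop = n_fft // 2
audio = np.random.default_rng(0).standard_normal(sr)  # 1 s placeholder signal

# frame the signal, window each frame, and take magnitude spectra
window = np.hanning(n_fft)
frames = [audio[i:i + n_fft] * window
          for i in range(0, len(audio) - n_fft + 1, hop)]
spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)

# band-limit to 100-5000 Hz and convert to dB (log scaling)
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
band = (freqs >= 100) & (freqs <= 5000)
spec_db = 20 * np.log10(spec[band] + 1e-10)
```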

Two “ding” sounds at around 2900 Hz were then added as unwanted noise; the modified snippet is attached here.

In spectrogram view, the noises above are visualized and easily spotted by users: they do not blend much into the original mix, and their spectral energy is usually high. Common spectral editing tasks include removing an unwanted doorbell from a voice recording or eliminating coughing from a concert recording, typical noise-removal use cases for ordinary users.

In fact, there is a built-in function for simple spectral editing, but it is strictly limited to straight lines, making it not flexible enough to accommodate slightly more complicated noises with pitch variation, say a cat’s meow during a voice recording, as in the following graph.

The basic deliverable of the project

A brush tool will be introduced as the basic deliverable of this project, making spectral editing more user-friendly and effective. Users can simply drag through the desired area, and regions with high spectral energy will be approximated and selected, as in the following graph. 

There are a few challenges involved in this project:

  1. The UI design of the tool and how should it be positioned in the existing toolbar, for better editing experience
  2. The data structure representing the brush and the selected area, and which algorithm we should use to estimate the bounded points from continuous mouse positions in real time (most likely Bresenham’s line algorithm or the midpoint circle algorithm, combined with a flood fill algorithm)
  3. The method of transforming the calculated area into the corresponding frequency components
  4. The combination of parameters for performing the Short-time Fourier transform and the inverse of it after the editing, i.e. window type, FFT size, and overlapping ratio etc.
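
For challenge 2, Bresenham’s line algorithm fills in the integer cells between two successive mouse positions, so a fast stroke leaves no gaps; here is the classic version for reference:

```python
def bresenham(x0, y0, x1, y1):
    """Classic Bresenham line: returns every integer cell between two
    successive mouse positions, inclusive of both endpoints."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # step horizontally
            err += dy
            x0 += sx
        if e2 <= dx:            # step vertically
            err += dx
            y0 += sy
    return points
```

Running it between consecutive sampled mouse positions, then flood-filling the enclosed region, yields the brushed area in time-frequency cells.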

Optional features

The brush tool is expected to be completed and delivered before the first evaluation, dated 12 July; one of the following features will then be selected and developed according to the schedule.

1. Overtone selection

The aforementioned real-life noises are like other audio, consisting of both a fundamental frequency (F0) and overtone resonances; to eliminate the unwanted noise effectively, all of these should be selected and removed.

It would be nice to approximate the overtones automatically from the F0, without manual selection by the user; choosing the threshold for such an approximation is important.
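
As a first approximation (ignoring inharmonicity and any energy threshold), the overtone series is just integer multiples of F0 up to some ceiling; a sketch:

```python
def harmonic_frequencies(f0, f_max):
    """List the overtone series of a fundamental F0 up to f_max, a naive
    first guess at what an automatic overtone selector would highlight."""
    harmonics = []
    k = 1
    while k * f0 <= f_max:
        harmonics.append(k * f0)
        k += 1
    return harmonics
```

A real implementation would then keep only the harmonics whose spectral energy exceeds the chosen threshold.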

2. Area re-selection

The area selected by the new tools could be adjusted using UI components such as sliders that set the spectral energy threshold, improving the editing experience.


This project aims to make spectral editing widely accessible to all users regardless of their editing experience. The features above will hopefully complete Audacity’s spectral editing functionality and empower more creative editing ideas. 

Thanks to the Audacity team once again for accepting my proposal and I am looking forward to the coding stage! I will be writing weekly blogs during development and the links will also be updated here.

Source Separation and Extensible MIR Tools for Audacity

Hello! My name is Hugo Flores Garcia. I have been selected to build source separation and other music information retrieval (MIR) tools for Audacity, as a project for Google Summer of Code. My mentor is Dmitry Vedenko from Audacity’s new development team, with Roger from the old team providing assistance.

What does source separation do, anyway?

Source separation would bring many exciting opportunities for Audacity users. The goal of audio source separation is to isolate the sound sources in a given mixture of sounds. For example, a saxophone player may wish to learn the melody of a particular jazz tune, and can use source separation to isolate the saxophone from the rest of the band. Alternatively, a user may separate the vocal track from their favorite song to generate a karaoke track. In recent years, source separation has enabled a wide range of applications for people in the audio community, from “upmixing” vintage tracks to cleaning up podcast audio.

Source separation aims to isolate individual sounds from the rest of a mixture. It is the opposite of mixing different sounds, which can be a complex, non-linear process; that complexity is what makes separating sources a difficult problem, and one well suited to deep learning. Image used courtesy of Ethan Manilow, Prem Seetharaman, and Justin Salamon [source].

For an in-depth tutorial on the concepts behind source separation and coding up your own source separation models in Python, I recommend this awesome ISMIR 2020 tutorial.

Project Details

This project proposes the integration of deep learning based computer audition tools into Audacity. Though the focus of this project is audio source separation, the framework can be constructed so that other desirable MIR tools, such as automatic track labeling and tagging, can later be incorporated with relative ease by reusing the same deep learning infrastructure and simply introducing new interfaces for users to interact with. 

State-of-the-art (SOTA) source separation systems are based on deep learning models. One thing to note is that individual source separation models are designed for specific audio domains; that is, users will have to choose different models for different tasks. For example, a user must pick a speech separation model to separate human speakers, and a music separation model to separate musical instruments in a song. 

Moreover, there can be a tradeoff between separation quality and model size, and larger models take considerably longer to separate audio. This is especially true when users perform separation without a GPU, which is our expected use case. We need to find a balance of quality and performance that suits most users. That said, we expect users to have different machines and quality requirements, and we want to support a wide variety of potential use cases. 

Because we want to cater to this wide variety of source separation models, I plan on using a modular approach to incorporating deep models into Audacity, such that different models can be swapped in and used for different purposes, as long as the program is aware of the input and output constraints. PyTorch’s TorchScript API lets us achieve such a design, as Python models can be exported as “black box” models usable in C++ applications. 

With a modular deep learning framework incorporated into Audacity, staying up to date with SOTA source separation models is simple. I plan to work closely with the Northwestern University Source Separation Library (nussl), which is developed and maintained by scientists with a strong presence in the source separation research community.  Creating a bridge between pretrained nussl models and the Audacity source separation interface ensures that users will always have access to the latest models in source separation research. 

Another advantage of this modular design is that it lays all the groundwork necessary for the incorporation of other deep learning-based systems in Audacity, such as speech recognition and automatic audio labeling!

I believe the next generation of audio processing tools will be powered by deep learning. I am excited to introduce the users of the world’s biggest free and open source audio editor to this new class of tools, and look forward to seeing what people around the world will do with audio source separation in Audacity! You can look at my original project proposal here, and keep track of my progress on my fork of Audacity on GitHub. 


My name is Hugo Flores García. Born and raised in Honduras, I’m a doctoral student at Northwestern University and a member of the Interactive Audio Lab. My research interests include sound event detection, audio source separation, and designing accessible music production and creation interfaces. 

Audacity & MuseScore Announcement!

Martin Keary (aka Tantacrul) recently posted a super YouTube video about Audacity containing the following announcement:

Audacity has just joined Muse Group, a collection of brands that includes another popular open source music app called MuseScore, which I’m currently in charge of. And since things are going rather well at MuseScore, I was asked to step up and also manage Audacity in partnership with its open source community. And just like we’re doing at MuseScore, we’re now planning on significantly improving the feature set and ease of use of Audacity – providing dedicated designers and developers to give it the attention it deserves – while keeping it free and open source.


We’re scared and excited.

We hope you are too.

Audacity 3.0.2 Released

We’re pleased to announce the release of Audacity 3.0.2, which replaces all previous versions for Windows, macOS and Linux. This is a significant bug-fixing release.

Better Diagnostics:
As well as bug fixes, we have added more detailed reporting to Audacity to track down some hopefully-not-too-common problems with the new format introduced in 3.0.0. If you see an unexpected error message with a “Show Log…” button on it, please send the log to [email protected], tell us how the problem happened, and say whether it’s repeatable. We think, but do not know for sure yet, that some problems some users of 3.0.0 have had may be caused by networked drives, which are slower than local drives. We’ve increased a ‘timeout’, which should fix that.

Macro Output:
Users of Audacity’s Macro feature for processing multiple files will find a new preference, Macro output, controlling where the results are put. Previously, the results were placed alongside the files being processed.

New preference for macro output directory

Untangling Code:
In parallel with 3.0.2 and 3.0.0 work, we’ve been doing a lot of other work on Audacity on another branch that is for the future and not in 3.0.2. Paul Licameli has been untangling dependencies in Audacity and making many graphs of the structure to guide what to untangle next. Here is a small extract of one of those graphs.

Extract from Paul’s work on untangling the Audacity code

If code is hard to work with, we work more slowly. These changes to untangle the code should make Audacity more flexible, and make it easier to work with the code. We kept these changes out of 3.0.0 and 3.0.2, as the changes were substantial and the important aup3 work took precedence. Hopefully the more flexible cleaner structure will be a big win for future versions of Audacity.

Bug Fixes:
3.0.2 has some simple but important bug fixes. The compressor effect was not working for longer selections. We were also very occasionally getting an error message at startup, requiring a restart of Audacity. You can read more about what we did for 3.0.2 on the New Features page of the manual.

We hope you enjoy Audacity 3.0.2. We’ve made the big move to the aup3 format, and Paul has untangled a lot of our code for easier future work. With these changes done, we hope we can now move forward more quickly with more visible improvements.

Google Summer of Code 2021

Audacity is proud to be taking part in Google Summer of Code 2021, having last participated in 2008 and 2009.

Google’s Summer of Code Logo

We created a web page with four seed project ideas for students to base their project ideas on. We then applied to Google to be a mentoring organisation this year, and they said “Yes”. We hope to get two students working on projects for us this summer. The response so far has been phenomenal. Hopefully you will see cool and useful outcomes from their work in September.

Audacity 3.0.0 Released

We’re pleased to announce the release of Audacity 3.0.0, which replaces all previous versions for Windows, macOS and Linux.

Audacity 3.0.0

.aup3 Project Format

Audacity 3.0.0 is a major update on our previous Audacity 2.4.2. We’ve changed the format in which we save Audacity projects! Previously we saved projects as a sometimes large number of small files, with an ‘.aup’ file to coordinate the lot. This way of doing things is sometimes called ‘pile of files’ storage.

The problem, which happened all too often, was that data files and .aup file parted ways. Users quite reasonably expected the .aup file to contain the entire project. Well, the new .aup3 file does contain the data as well. The technical detail is that we are using an open source database, SQLite3, to store everything in one .aup3 file. That all happens ‘behind the scenes’. SQLite3 is open source, and it is a delight to work with. Nevertheless, this was a huge change, and we decided it was too risky to include many other changes we wanted to make at the same time – so 3.0.0 is almost entirely about this big format change.

Working with .aup3 projects, editing audio should be a little faster than before on most machines, because fewer files are being worked on. Finishing and closing a project at the end of a session can be quite a lot slower, since there is more to do when the project is closed. We think the trade-offs are worth it.

Importantly, note that you can open your older .aup projects in Audacity 3.0.0, where they will be converted to the new .aup3 format.

Label Sounds & Noise Gate

We did have time to improve our ‘Noise Gate’ effect and add a new analyzer, ‘Label Sounds’, which can label sounds and silences. We also made a few small tweaks elsewhere. You can now import and export macros, and there are a couple of new commands, for using the last-used tool or last-used analyzer, that you can assign shortcuts to.

Bugs fixed

We also fixed over 160 bugs that had been accumulating over the years. This is quite a staggering amount of work. The majority of these bugs were minor problems, easily worked around. Some though were really juicy high priority bugs that would have mattered a lot to the people affected by them. We’re really glad to have these bugs fixed now.

We hope you enjoy using Audacity 3.0.0 as much as we enjoyed putting it together.