GSoC 2021 – Work Product – Source Separation and Deep Learning Tools

Hi all! Google Summer of Code has wrapped up, and I have mostly completed my work contributing a Source Separation effect and a deep learning toolkit for Audacity. I still have code review fixes to address, but the code is in a fully functional state, and all the proposed features have been completed.

Code Changes

You can view the commit history and code reviews on the Pull Request I submitted to the main Audacity repo.

More Links

Here are links to more information on this project:

Work Product Summary

  • Deep Learning Effects
    • EffectSourceSep: A built-in effect for performing source separation in Audacity. While this effect is technically able to do more than just source separation (the internal effect functions as a generic deep learning processor that can produce a multi-track output given a single-track input), it is branded as Source Separation, as we expect the majority of model contributions to be focused on source separation. 
    • EffectDeepLearning: A base class for a built-in effect that uses PyTorch models. EffectDeepLearning takes care of data type conversions between torch::Tensor and WaveTrack/WaveClip data types. 
    • (In Progress) EffectLabeler: With the help of Aldo Aguilar, we are hoping to contribute an effect capable of performing automatic track labeling. Such an effect would enable users to perform automatic speech-to-text transcription or annotation of different target sounds within a track.
  • Deep Learning Tools: an internal toolkit for managing and using deep learning models anywhere within Audacity. 
    • DeepModelManager: A class for fetching, downloading, installing, and uninstalling deep learning models from HuggingFace repositories.
    • DeepModel and ModelCard
      • DeepModel: a wrapper class for PyTorch models. Loads an internal resampling module, which is used for resampling input audio to the model’s sample rate, and resampling output audio back to Audacity’s project rate. Handles exceptions raised if loading the model fails, as well as internal errors during the model’s forward pass.
      • ModelCard: class for holding model metadata.
  • Deep Model Manager UI: GUI elements for interacting with deep learning models hosted in HuggingFace. 
    • ManagerToolsPanel: The top panel, as seen in the image above. Contains controls for exploring models in HuggingFace and importing them into the Model Manager UI.
    • ModelCardPanel scroller: a scroller for navigating through the fetched models. Each entry contains a short description of the model’s purpose, as well as a color-coded tag meant to inform the user of the model’s intended data domain (that is, models tagged with “music” are meant to be used with music data, while models tagged with “speech” are meant to be used with speech data).
    • DetailedModelCardPanel: a detailed view for deep models. Contains a longer description, model sample rate, additional tags, and a button that links to the HuggingFace repo’s README file, for even more information on the model.
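
To make the DeepModel flow described in the list above a bit more concrete, here is a minimal Python sketch of the idea: resample a single input track to the model's sample rate, run the forward pass, and resample each output track back to the project rate, turning any failure in the forward pass into one well-defined error. This is not the actual Audacity C++ code. The names DeepModelSketch, ModelError, and resample_linear are hypothetical, and plain linear interpolation stands in for the real internal resampling module.

```python
from typing import Callable, List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation (a simple stand-in, not a production resampler)."""
    if src_rate == dst_rate or len(samples) == 0:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # fractional index into the input
        lo = min(int(pos), len(samples) - 1)     # left neighbour
        hi = min(lo + 1, len(samples) - 1)       # right neighbour, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


class ModelError(Exception):
    """Hypothetical error type raised when the wrapped model's forward pass fails."""


class DeepModelSketch:
    """Hypothetical wrapper: one input track in, a list of output tracks out."""

    def __init__(self, forward: Callable[[List[float]], List[List[float]]], model_rate: int):
        self.forward = forward        # stands in for the loaded PyTorch model
        self.model_rate = model_rate  # sample rate the model expects

    def process(self, track: Sequence[float], project_rate: int) -> List[List[float]]:
        # 1. Bring the input audio to the model's sample rate.
        model_in = resample_linear(track, project_rate, self.model_rate)
        # 2. Run the model, converting any internal failure into ModelError.
        try:
            outputs = self.forward(model_in)
        except Exception as exc:
            raise ModelError(f"forward pass failed: {exc}") from exc
        # 3. Bring every output track back to the project rate.
        return [resample_linear(o, self.model_rate, project_rate) for o in outputs]


# Toy "separation model": splits the input into two half-amplitude tracks.
toy_model = DeepModelSketch(
    lambda x: [[0.5 * s for s in x], [0.5 * s for s in x]], model_rate=22050
)
tracks = toy_model.process([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0], project_rate=44100)
```

In this toy run, the single input track comes back as two output tracks at the project rate, which mirrors the single-track-in, multi-track-out behavior described for EffectSourceSep. Wrapping the forward pass in one error type is only one possible design; it keeps the calling effect from needing to know what kind of failure occurred inside the model.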

Future Work

  • Finish addressing code review this week
  • Extract internal deep learning utilities to lib-deeplearning
  • Open a PR that incorporates EffectLabeler for deep learning-based sound labeling and tagging within Audacity

Special thanks to Bryan Pardo, Ethan Manilow, and Aldo Aguilar from the Interactive Audio Lab, as well as Dmitry Vedenko from the Audacity team, for all the helpful discussions and support I received throughout the project. I hope my contribution to Audacity provides the groundwork for bringing a new wave of effects based on deep learning into the hands of audio users.