GSoC 2021 Success!

This week marked the end of the Google Summer of Code (GSoC) program for 2021, which saw over 1200 students work on over 200 open source projects. This year, 2 students joined us at Audacity, and we are happy to report that both completed their projects successfully! The projects were as follows:

Source Separation

Hugo Flores Garcia, mentored by Dmitry Vedenko, implemented a deep learning tool that, given an appropriately trained model, can take an audio track with multiple sound sources (e.g. a combined “singer + piano” track) and split it into multiple tracks, one for each source (i.e. a “singer” track and a “piano” track). This opens up a whole variety of interesting use cases, including karaoke and background noise removal. You can learn more about the Source Separation project in Hugo’s blog.

Spectral Editing

Edward Hui, mentored by Paul Licameli, implemented the ability to edit audio tracks by drawing on the spectrogram rather than the waveform, as is usually the case in Audacity. He also implemented smart selection tools that automatically select regions of contiguous “colour” on the spectrogram, and that select overtones (harmonics) in addition to the fundamental frequency. Spectral editing is useful for removing unwanted sounds and background noises without distorting the main part of the audio signal. You can learn more about the Spectral Editing project in Edward’s blog.

Next steps

We will continue to work with the students over the coming weeks to make the final touches necessary to get their code merged into the program, at which point it will become available in GitHub Actions builds of the master branch and a subsequent stable release of Audacity.

GSoC 2021 with Audacity – Work product – Spectral editing tool

Hello all! It has been a rewarding summer working with passionate audio lovers, and I am happy to share the finished work product here: a multi-featured spectral editing tool. Here are the links to my previous work:

Weekly dev blog: https://www.audacityteam.org/category/gsoc/gsoc-2021-spectral-selection/

Pull request: https://github.com/audacity/audacity/pull/1193#issue-680302149

The mid-term prototype

Full demo available here

New feature – frequency snapping

Frequency snapping was introduced to help users make spectral selections more precisely, even with unsteady cursor movement. As the user drags, it evaluates the frequency bin under the cursor, searches vertically through the Fast Fourier Transform data, and automatically picks the frequency bins with the highest spectral energy. Not only does this provide a more reliable way of spectral editing, it also yields a better editing result, minimizing the damage to the rest of the spectrogram.
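
To illustrate the idea, here is a minimal C++ sketch of such a vertical search. The names and the exact search strategy are mine, not the actual Audacity implementation: given the magnitudes of one FFT window and the bin under the cursor, it returns the nearby bin with the highest spectral energy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: snap the bin under the cursor to the nearest
// high-energy bin in the same FFT window. `magnitudes` holds the
// spectral magnitude of each frequency bin; `searchRadius` limits how
// far the snap may move. (All names are illustrative.)
size_t SnapToPeakBin(const std::vector<float>& magnitudes,
                     size_t cursorBin, size_t searchRadius)
{
   size_t lo = cursorBin > searchRadius ? cursorBin - searchRadius : 0;
   size_t hi = std::min(cursorBin + searchRadius, magnitudes.size() - 1);

   size_t bestBin = cursorBin;
   for (size_t bin = lo; bin <= hi; ++bin)
      if (magnitudes[bin] > magnitudes[bestBin])
         bestBin = bin;
   return bestBin;
}
```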

Demo video

New feature – overtone selection

Achieving a good removal of unwanted sounds from the original track requires multiple edits, since most of the noises observed consist of the fundamental frequency (f0) plus its overtones. Hand-picking these overtones can be repetitive, especially for wind or brass instruments, which generate more overtones than most instruments. This feature was introduced to pick these overtones automatically: the user simply needs to drag over the fundamental frequency, and the overtones will be approximated and selected.

It works similarly to smart selection (frequency snapping), but takes it a step further and checks the multiples of f0 for similar spectral energy.
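
Continuing the sketch above (again with illustrative names, and reusing the hypothetical SnapToPeakBin helper), overtone selection might walk the integer multiples of the fundamental bin and keep the harmonics whose energy is comparable to the fundamental’s:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: starting from the snapped fundamental bin (f0),
// visit each integer multiple k * f0 and keep the harmonics whose
// energy is comparable to the fundamental's. `threshold` is a relative
// energy cutoff; the small search radius absorbs slight inharmonicity.
std::vector<size_t> SelectOvertoneBins(const std::vector<float>& magnitudes,
                                       size_t f0Bin, float threshold)
{
   std::vector<size_t> selected { f0Bin };
   if (f0Bin == 0)
      return selected; // no harmonics for the DC bin

   const float f0Energy = magnitudes[f0Bin];
   for (size_t k = 2; k * f0Bin < magnitudes.size(); ++k) {
      // Reuses SnapToPeakBin from the previous sketch.
      size_t bin = SnapToPeakBin(magnitudes, k * f0Bin, /*searchRadius=*/2);
      if (magnitudes[bin] >= threshold * f0Energy)
         selected.push_back(bin);
   }
   return selected;
}
```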

The technical summary

BrushHandle (The front-end)

It inherits from the UIHandle class and acts as the “front end” of the tool: it is where we interact with cursor coordinates, convert them to sample-count hops and frequency bins, and then use Bresenham’s line and circle drawing algorithms to add the spectral data to the back end.

Our screens consist of a limited number of pixels, which means it is impossible to draw a pixel-perfect line, let alone a circle! The algorithm mentioned above is critical for simulating a brush stroke; without it the selection would be barely usable, just a one-pixel-thin line. The idea of the algorithm is simple: it checks the difference between x and y and picks the next coordinates based on the accumulated error.
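
For reference, here is the classic all-quadrant form of Bresenham’s line algorithm. The actual BrushHandle code differs in its details, but the accumulated-error idea is exactly this:

```cpp
#include <cstdlib>
#include <functional>

// Classic integer Bresenham line: visits every cell between the two
// endpoints with no gaps, which is what keeps a dragged brush stroke
// continuous. `plot` would mark the visited (hop, bin) cell as selected.
void BresenhamLine(int x0, int y0, int x1, int y1,
                   const std::function<void(int, int)>& plot)
{
   int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
   int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
   int err = dx + dy; // the accumulated error term

   while (true) {
      plot(x0, y0);
      if (x0 == x1 && y0 == y1)
         break;
      int e2 = 2 * err;
      if (e2 >= dy) { err += dy; x0 += sx; } // step horizontally
      if (e2 <= dx) { err += dx; y0 += sy; } // step vertically
   }
}
```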

Apart from the above algorithm, this class is also responsible for adapting the selection to the zoom level. Since users may want to zoom in on the spectrogram and make a finer selection, the brush tool must detect this and adjust the selection when they later zoom out! Initially I stored the spectral data in absolute mouse coordinates, which cannot scale up and down; it was later changed to sample counts and frequencies.

Lastly, it stores extra parameters such as the frequency-snapping and overtone thresholds, the brush radius, etc.

SpectralDataManager (The back-end)

This is the core of the calculation, where the magic happens. It is partially inherited from SpectrumTransformer, a class rewritten by my mentor Paul Licameli to handle common FFT and IFFT transformations and calculations. The entry points (ProcessTracks(), FindFrequencySnappingBin(), FindHighestFrequencyBins()) are static methods, and ultimately the calculation is completed in other static methods with the Processor suffix.

Note that for these processors, the complete set of Fourier transform coefficients can be considered a black box: each is exposed to a single window of data only.
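
As a rough illustration of that contract (with hypothetical names, not the real SpectrumTransformer interface), a processor receives one window of complex coefficients at a time and decides what to keep; zeroing the selected bins removes their energy, with the inverse FFT and overlap-add handled elsewhere:

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// One FFT window of complex coefficients.
using Window = std::vector<std::complex<float>>;

// Hypothetical single-window processor: it never sees the whole track,
// only the coefficients of the current window. Zeroed bins contribute
// no energy after the inverse transform.
void RemoveSelectedBinsProcessor(Window& coefficients,
                                 const std::vector<size_t>& selectedBins)
{
   for (size_t bin : selectedBins)
      if (bin < coefficients.size())
         coefficients[bin] = { 0.0f, 0.0f }; // silence this frequency
}
```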

SpectralDataDialog (The GUI front-end)

This is a rather interesting class for me: it inherits from wxWidgets UI components. Compared to a conventional C++ workflow, this class works more like asynchronous JavaScript; it binds methods to events, which are broadcast and received as global state. On top of this event-trigger system there is a factory used to simplify dependency management: we can statically attach an object, or even a GUI window, to the factory and use it whenever necessary, which helps tackle common problems like cyclic dependencies.
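
A minimal sketch of that event-binding style in plain wxWidgets (the dialog and its controls here are illustrative, not the actual SpectralDataDialog):

```cpp
#include <wx/wx.h>

// Illustrative wxWidgets dialog: a handler method is bound to an event
// at runtime, much like attaching a JavaScript event listener.
class BrushDialog : public wxDialog
{
public:
   BrushDialog()
      : wxDialog(nullptr, wxID_ANY, "Brush Tool")
   {
      auto* apply = new wxButton(this, wxID_ANY, "Apply");
      // Bind the button's click event to a member function of this dialog.
      apply->Bind(wxEVT_BUTTON, &BrushDialog::OnApply, this);
   }

private:
   void OnApply(wxCommandEvent&)
   {
      // React to the event here, e.g. trigger the back-end processing.
   }
};
```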

This is the existing control panel for the brush tool, where we can enable “smart selection” and “overtone selection” and adjust the brush radius using the slider.

What’s next?

First I need to give a big shout-out to my mentor Paul Licameli, who has been an extremely passionate mentor and experienced C++ developer. He has continuously provided assistance, from high-level architectural design down to low-level bug-fix suggestions. I would also like to thank the Audacity team for arranging the program and for all the assistance provided!

I will be finishing the code review with Paul before the official end of the GSoC program, and I hope the frequency snapping and overtone selection can then be optimized. Afterwards, I will rebase the current branch onto master, and hopefully the tool will be merged and become available in the next release of Audacity.

GSoC 2021 – Work Product – Source Separation and Deep Learning Tools

Hi all! Google Summer of Code has wrapped up, and I have mostly completed my work contributing a Source Separation effect and a deep learning toolkit for Audacity. I still have code review fixes to address, but the code is in a fully functional state, and all the proposed features have been completed.

Code Changes

You can view the commit history and code reviews on the Pull Request I submitted to the main Audacity repo.

More Links

Here are links to more information on this project:

Work Product Summary

  • Deep Learning Effects
    • EffectSourceSep: A built-in effect for performing source separation in Audacity. While this effect is technically able to do more than just source separation (the internal effect functions as a generic deep learning processor that can produce a multi-track output given a single-track input), it is branded as Source Separation, as we expect the majority of model contributions to be focused on source separation. 
    • EffectDeepLearning: A base class for a built-in effect that uses PyTorch models. EffectDeepLearning takes care of data type conversions between torch::Tensor and WaveTrack/WaveClip data types. 
    • (In Progress) EffectLabeler: With the help of Aldo Aguilar, we are hoping to contribute an effect capable of performing automatic track labeling. Such an effect would enable users to perform automatic speech-to-text transcription or annotation of different target sounds within a track.
  • Deep Learning Tools: an internal toolkit for managing and using deep learning models anywhere within Audacity. 
    • DeepModelManager: A class for fetching, downloading, installing, and uninstalling deep learning models from HuggingFace repositories.
    • DeepModel and ModelCard
      • DeepModel: a wrapper class for PyTorch models. Loads an internal resampling module, which is used for resampling input audio to the model’s sample rate and resampling output audio back to Audacity’s project rate. Takes care of exception handling if loading the model fails, as well as internal errors during the model’s forward pass (see the sketch after this list).
      • ModelCard: class for holding model metadata.
  • Deep Model Manager UI: GUI elements for interacting with deep learning models hosted in HuggingFace. 
    • ManagerToolsPanel: The top panel, as seen in the image above. Contains controls for exploring models in HuggingFace and importing them into the Model Manager UI.
    • ModelCardPanel scroller: a scroller for navigating through the fetched models. Contains a short description of the model’s purpose, as well as a color-coded tag meant to inform the user of the model’s intended data domain (that is, models tagged with “music” are meant to be used with music data, while models tagged with “speech” are meant to be used with speech data).
    • DetailedModelCardPanel: a detailed view for deep models. Contains a longer description, model sample rate, additional tags, and a button that links to the HuggingFace repo’s README file, for even more information on the model.
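
As referenced in the DeepModel item above, here is a hedged sketch of that load-and-guard pattern using the public libtorch API. It is illustrative only, not the actual DeepModel code:

```cpp
#include <torch/script.h>
#include <stdexcept>
#include <string>

// Illustrative libtorch usage: load a TorchScript module and guard the
// forward pass, since both steps can throw c10::Error.
torch::Tensor RunModel(const std::string& modelPath, torch::Tensor input)
{
   torch::jit::script::Module module;
   try {
      module = torch::jit::load(modelPath);
   }
   catch (const c10::Error& e) {
      throw std::runtime_error(std::string("Failed to load model: ") + e.what());
   }

   try {
      // A source separation model maps one input track to several stems.
      return module.forward({ input }).toTensor();
   }
   catch (const c10::Error& e) {
      throw std::runtime_error(std::string("Forward pass failed: ") + e.what());
   }
}
```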

Future Work

  • Finish addressing code review this week
  • Extract internal deep learning utilities to lib-deeplearning
  • Open a PR that incorporates EffectLabeler for deep learning-based sound labeling and tagging within Audacity

Special thanks to Bryan Pardo, Ethan Manilow, and Aldo Aguilar from the Interactive Audio Lab, as well as Dmitry Vedenko from the Audacity team, for all the helpful discussions and support I received throughout the project. I hope my contribution to Audacity provides the groundwork for bringing a new wave of effects based on deep learning to the hands of audio users.

Source Separation – GSoC 2021 Week 9

Hi all!

GSoC is starting to wrap up, and I’ve created two project boards to track the last of the work needed to complete the Deep Learning Tools project for Audacity. The first project board is concerned with pending bug fixes and enhancements for the internal functionality of the Deep Model Manager (see the github link). The second board is concerned with improving the UI for model selection (see the github link). All of the high-priority tasks in the first project board are done, and I am planning to finish both project boards by the end of the week (with help from Aldo Aguilar in the Interactive Audio Lab).

The manager UI will contain a new detailed view for ModelCards that offers a link for opening the model in HuggingFace, as well as a longer description of the model within Audacity. Additionally, colored domain tags should help users pick the right model more easily.

GSoC 2021 with Audacity – Week 9

This is the second-to-last week of the GSoC program. I have finalized the majority of the new code, and I have been meeting with Paul more frequently for the code review.

The over-computation

Currently, the brush stroke is calculated using Bresenham’s algorithm in the mouse coordinate system. However, the data collected this way requires more calculation than the FFT can make use of; in other words, we collect too much spectral data but are only able to process a limited amount of it. Therefore, the whole brush-stroke calculation needs to be refactored into sampleCount-hop vs. frequency-bin space, so that we do not waste computation on the area between Fourier transform windows.
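
A sketch of that conversion (the helper and its parameters are mine, not the actual code): map a point from time/frequency space to (hop, bin) space, so the stroke produces at most one selection entry per analysis window.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative conversion from time/frequency to the (hop, bin) space
// the FFT actually processes. Drawing the stroke in these units yields
// at most one entry per analysis window instead of one per pixel.
struct HopBin { int64_t hop; int bin; };

HopBin ToHopBin(double timeSeconds, double frequencyHz,
                double sampleRate, int hopSize, int fftSize)
{
   auto sample = static_cast<int64_t>(timeSeconds * sampleRate);
   int64_t hop = sample / hopSize; // index of the analysis window
   int bin = static_cast<int>(
      std::lround(frequencyHz * fftSize / sampleRate)); // frequency bin
   return { hop, bin };
}
```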

The code review

My mentor Paul has been reviewing my code and giving me extremely detailed and helpful comments since last week. Some of them are just code-style issues or unused header imports; however, he has also spotted and pointed out critical bugs. I am currently resolving his comments; the history of the conversation can be viewed in this PR link.

Before the final week

I hope the transformation and refactoring into hop vs. bin space will be completed before next week, so we can optimize the frequency snapping and launch it as soon as possible.

GSoC 2021 with Audacity – Week 8

This week I finished one additional feature, frequency snapping. This optional feature allows users to select spectral data more accurately.

The frequency snapping

It is an optional feature, associated with smart selection in the spectral editing dialog, that allows more precise selection by the user: the brush stroke is calculated and snapped to the nearest frequency bins with the highest spectral energy.

Preparing for PR and final review

Originally I had 50+ commits, which can be overwhelming for code review, considering that some of the intermediate commits were already obsolete (already!), while other changes reverted or refactored previously written code. I rebased the whole branch to pick the important updates, reordering and combining multiple commits, and encountered quite a lot of conflicts that needed to be resolved.

Source Separation – GSoC 2021 Week 8

Hi all! Here are some updates for this week. 

  • I cleaned up the commit history for the deep learning implementation and opened a pull request in the official audacity repo. 
  • Added a dialog for manually specifying a HuggingFace repo to fetch (github). 
  • Fixed a bug where ModelCards weren’t scrollable until the user manually resized the window (github).
  • Amended the download behavior so the downloaded model file is written to file incrementally, lowering memory consumption (github). 
  • Added sorting to ModelCard panels (github).
  • Fixed several other bugs in the Model Manager and its UI (github).

To do

  • Start writing documentation for model contributors. The documentation should provide instructions on how to properly structure a HuggingFace repo for an Audacity model, write a metadata file, and properly export the deep model to TorchScript, ensuring that it meets the input/output constraints in Audacity.
  • Continue to fix open issues with the model manager. 
  • Make ModelCards collapsible. Right now, only 2-3 can be shown on screen at a time. It may be a good idea to offer a collapsed view of the ModelCard. 
  • Provide a hyperlink (or a more info button) that points to the model’s HuggingFace readme somewhere in the ModelCard panel, so users can view more information about the model online (e.g. datasets, benchmarks, performance, examples).

GSoC 2021 with Audacity – Week 7

This week I have been working hard on adding a new feature called frequency snapping, and I have also made other optimizations to the brush tool.

The new cursor

For the old cursor, I had recycled the envelope cursor, which doesn’t look good when the radius of the brush is increased; the new cursor is positioned in the middle of the brush.

Major change to brush stroke

In previous development, I used Bresenham’s algorithm to draw a thick line to mimic the brush stroke, which was not realistic and left visible rough edges. I have modified the algorithm to draw a fully circular brush stroke.
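
A simple way to get a round stroke (an illustrative sketch, not the exact implementation) is to stamp a filled disc at every point the line algorithm visits:

```cpp
#include <functional>

// Illustrative round-brush stamp: mark every cell within `radius` of
// the stroke point. Calling this for each point visited by the line
// algorithm produces a fully circular stroke with smooth edges.
void StampCircle(int cx, int cy, int radius,
                 const std::function<void(int, int)>& plot)
{
   for (int dy = -radius; dy <= radius; ++dy)
      for (int dx = -radius; dx <= radius; ++dx)
         if (dx * dx + dy * dy <= radius * radius)
            plot(cx + dx, cy + dy);
}
```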

Source Separation – GSoC 2021 Week 7

Hi all! Here are some updates for this week:

  • The issue with the download progress gauge appearing in the bottom corner has been fixed, though the size of the gauge itself still needs tweaking.
  • In order to let the user know how large a model is prior to installing, model cards now show the model’s file size.
  • ModelCard (a class for containing model metadata) was refactored last week so that it doesn’t hold on to the JSON document, but rather serializes/deserializes only when downloading from HuggingFace or installing to disk.
  • I’ve started work on a top panel for the model manager UI, which will contain the controls for refreshing repos, searching and filtering, as well as manually adding a repo.

In other news, Aldo Aguilar from the Interactive Audio Lab has been working on a Labeler effect built using EffectDeepLearning that will be capable of creating a label track with annotations for a given audio track. Possible applications of this effect include music tagging and speech-to-text, given that we can find pretrained models for both tasks. 

To do

  • Continue work on the top panel for the model manager UI. 
  • Right now, the response content for deep models is all held in memory at once while installing. This causes an unnecessary amount of memory consumption. Instead we want to incrementally write the response data to disk. 
  • Dmitry pointed out that the deep model’s forward pass is blocking the UI thread, since it can process large selections of audio at a time. Though a straightforward solution is to cut the audio up into smaller chunks (see the sketch after this list), some deep learning models require a longer context window and/or are non-causal. I will spend more time investigating potential solutions to this.
  • Layout work for model manager UI. Right now, most elements look out of place. I haven’t spent as much time on this because I’d like to finish writing the core logic of the DeepModelManager before digging into the details of the UI. 
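
For the chunking idea mentioned in the list above, a naive version (illustrative libtorch usage that ignores the overlap and context-window complications just noted) might look like this:

```cpp
#include <torch/script.h>
#include <algorithm>
#include <vector>

// Naive chunked inference sketch: process a long mono track in
// fixed-size pieces so no single forward pass blocks for the whole
// selection. Real models may need overlapping chunks or a longer
// context window, which is exactly the complication noted above.
torch::Tensor ProcessInChunks(torch::jit::script::Module& model,
                              const torch::Tensor& audio, int64_t chunkSize)
{
   std::vector<torch::Tensor> outputs;
   const int64_t total = audio.size(0);
   for (int64_t start = 0; start < total; start += chunkSize) {
      const int64_t end = std::min(start + chunkSize, total);
      torch::Tensor chunk = audio.slice(/*dim=*/0, start, end);
      outputs.push_back(model.forward({ chunk }).toTensor());
   }
   return torch::cat(outputs, /*dim=*/0);
}
```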

GSoC 2021 with Audacity – Week 6

This week’s focus is on potential bug fixes for the brush tool prototype, and on planning the next rollout, containing more exciting features that I would like to bring to the community!

Control panel for the brush tool

Instead of the temporary red button, I have implemented a non-modal dialog for the control panel. It took longer to develop than I expected, since I wanted to implement the dialog in a way native to the Audacity codebase. I used AttachedWindows and AttachedObjects to decouple the dependencies between SpectralDataManager, SpectralDialog, etc., so that when the user clicks on the brush tool icon, the dialog is created on demand.

The back end for overtones and smart selection is yet to be completed, but I prefer to set up the front end first for prototyping, to gain early feedback from the team regarding the UI design.

More incoming features!

We have come to the second stage of the GSoC program, and there are two or more features that I would like to complete before the second evaluation. Overtone selection and threshold re-selection are in fact similar features, both based on smart selection. I will need to modify the existing SpectrumTransformer to consider more windows in the calculation; in fact, I prefer to set a fixed length for the smart selection to function properly, since it seems rather inappropriate to take the whole track into the calculation.

Variable brush size and preview

A slider front end has been added to adjust the brush radius in real time. It would be user-friendly to show the predicted radius in place of the existing cursor. However, the current cursor implementation takes a bitmap file and a fixed size as input, so we can’t simply scale the bitmap up as the radius increases; a workaround is to set an empty cursor and draw the brush preview manually in real time.

However, here comes another challenge with the rendering routine of UIHandle: it doesn’t necessarily call Draw() on hover, so currently the user needs to drag or click to make the drawing visible.
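
The drawing half of that workaround is simple; here is a sketch using plain wxDC calls (the UIHandle/overlay plumbing that decides when this runs is omitted):

```cpp
#include <wx/wx.h>

// Illustrative brush preview: with a blank cursor set, draw an outline
// circle of the current brush radius around the mouse position whenever
// the handle is given a chance to paint.
void DrawBrushPreview(wxDC& dc, const wxPoint& mousePos, int radiusPixels)
{
   dc.SetBrush(*wxTRANSPARENT_BRUSH);           // outline only, no fill
   dc.SetPen(wxPen(*wxWHITE, /*width=*/1));
   dc.DrawCircle(mousePos, radiusPixels);
}
```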