GSOC 2021 with Audacity – Work product – Spectral editing tool

Hello all, it has been a rewarding summer working with the passionate audio lovers, and I am happy to share about the finalized work product here, which is a multi-featured spectral editing tool. Here’s the link for my previous works:

Weekly dev blog: https://www.audacityteam.org/category/gsoc/gsoc-2021-spectral-editing/

Pull request: https://github.com/audacity/audacity/pull/1193#issue-680302149

The mid-term prototype

Full demo available here

New feature – frequency snapping

The frequency snapping was introduced to help users pick the spectral selection more precisely, even with unsteady cursor movement. It evaluates the real-time dragging frequency bin and perform searching vertically during Fast Fourier Transform, and automatically pick the frequency bins with highest spectral energy. Not only does it provide more reliable way of spectral editing, but also yields a better spectral editing result, which minimizes the damage to the original spectrogram.

Demo video

New features – overtones selection

To achieve a good removal result of some of the unwanted sounds from the original track requires multiple editing, since most of the noises observed consist of the fundamental frequency (f0) and the overtones. Hand picking these overtones can be repetitive, especially for wind or brass instruments, which generates more overtones than general instruments. This feature is introduced to help picking these overtone automatically, user simply need to drag over the fundamental frequency, the overtones will be approximated and chosen.

It works similarly as the smart selection (frequency snapping), instead it takes a step forward and check for the multiples of the f0 for the similar spectral energy.

The technical summary

BrushHandle (The front-end)

It is inherited from the UIHandle class, and this is more like the “front-end” of our tool, the place where we interact with cursor coordinates, convert them to sample count hops and frequency bins, then we will be using Bresenham’s line and circle drawing algorithm to add the spectral data to the backend.

Our screen consists of limited pixels, meaning that it is impossible to draw pixel-perfect line, or even circle! The algorithm mentioned is critical for simulating the stroke of the brush, or the selection will be barely usable since it’s just an one-pixel thin line. The idea of the algorithm is simple, it check for the x vs y differentiation and pick the next coordinates based on the accumulated errors.

Apart from above algorithm, this class is also responsible for adapting the selection according to the zoom level. Since the user may want to zoom in the spectrogram and make a finer selection, the brush tool should be able to detect and adjust the selection later when users zoom out! Initially I have stored the spectral data in absolute mouse coordinates and that will not be able to scale up and down and it was later modified to sample count and frequency.

Lastly, it stores extra parameters like frequency snapping and overtone threshold, brush radius etc.

SpectralDataManager (The back-end)

This is the core of the calculation where the magic happens, it is partially inherited from the SpectrumTransformer, a class rewritten by my mentor Paul Licameli, to handle common transformations and calculations of FFT and IFFT. The entry point of these methods (ProcessTracks(), FindFrequencySnappingBin(), FindHighestFrequencyBins()) are static methods, and ultimately the calculation will be completed in another static methods with the Processor suffix.

Noted for these processor, the completed Fourier Transform coefficients can be considered as black-box for them, whereas they are exposed to single window of data only.

SpectralDataDialog (The GUI front-end)

This class is rather an interesting class for me, it inherits from the wxWidget UI components. Comparing to conventional C++ workflow, this class works more like the asynchronized JavaScript for me, it binds methods with events, which is broadcasted / received as global state. On top of this events-trigger system, there is another factory that is used to optimize the dependency management, we can statically attach object or even GUI window to the factory and use it whenever necessary, it helps to tackle some of the common problems like cycling dependencies.

This is the existing control panel for the brush tool, where we can select “smart selection”, “overtones selection” and adjust the brush radius using the slider.

What’s next?

First I need to give a big shout out to my mentor Paul Licameli, who has been an extremely passionate mentor and experienced C++ developers, he has been continuously providing assistance to me from high level architectural design to the lower level bug fixes suggestions, I would also like to thank you the Audacity team for arranging the program and the assistance provided!

I will be finishing the code review with Paul before the official end of the GSOC program, it is hoped that the frequency snapping and overtones can then be optimized. Afterwards, I will rebase the current branch onto the master and hopefully the tool will be merged and be available in the next release of Audacity.