GSOC 2021 with Audacity – Work product – Spectral editing tool

Hello all, it has been a rewarding summer working with passionate audio lovers, and I am happy to share the finalized work product here: a multi-featured spectral editing tool. Here are the links to my previous work:

Weekly dev blog: https://www.audacityteam.org/category/gsoc/gsoc-2021-spectral-selection/

Pull request: https://github.com/audacity/audacity/pull/1193#issue-680302149

The mid-term prototype

Full demo available here

New feature – frequency snapping

Frequency snapping was introduced to help users make spectral selections more precisely, even with unsteady cursor movement. It takes the frequency bin under the cursor while dragging, searches vertically (across nearby frequency bins) in the Fast Fourier Transform result, and automatically picks the bins with the highest spectral energy. Not only does it make spectral editing more reliable, it also yields a better result, minimizing the damage to the original spectrogram.
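To make the snapping step concrete, here is a minimal sketch, assuming the power spectrum of the current FFT window is available as a plain std::vector<float>. It searches a few bins above and below the bin under the cursor and returns the strongest one. The function name SnapToPeakBin and the searchRadius parameter are hypothetical; this is not the actual FindFrequencySnappingBin() code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Audacity's API): given the power spectrum of one
// FFT window, return the bin with the highest energy within +/- searchRadius
// bins of the bin currently under the cursor.
std::size_t SnapToPeakBin(const std::vector<float>& power,
                          std::size_t cursorBin,
                          std::size_t searchRadius)
{
   assert(!power.empty() && cursorBin < power.size());
   // Clamp the vertical search range to the valid bins
   const std::size_t first = cursorBin > searchRadius ? cursorBin - searchRadius : 0;
   const std::size_t last = std::min(power.size() - 1, cursorBin + searchRadius);

   std::size_t best = cursorBin;
   for (std::size_t bin = first; bin <= last; ++bin)
      if (power[bin] > power[best])
         best = bin;
   return best;
}
```

Calling this for every window the stroke passes through keeps the selection on the energy ridge even if the cursor wobbles by a few bins.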

Demo video

New feature – overtones selection

Removing unwanted sounds cleanly from the original track usually requires multiple edits, since most of the noises observed consist of a fundamental frequency (f0) plus its overtones. Hand-picking these overtones can be repetitive, especially for wind or brass instruments, which generate more overtones than many other instruments. This feature helps pick the overtones automatically: the user simply drags over the fundamental frequency, and the overtones are approximated and selected.

It works similarly to smart selection (frequency snapping), but goes a step further and checks the multiples of f0 for similar spectral energy.
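Checking the multiples of f0 could look roughly like the sketch below, again assuming a per-window power spectrum in a std::vector<float>. The name FindOvertoneBins, the relative threshold (a candidate is kept if its energy is at least threshold times the energy at f0) and the small tolerance window around each multiple are illustrative assumptions, not the parameters used in the actual tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: return the bins of likely overtones of the
// fundamental at f0Bin in one window's power spectrum.
std::vector<std::size_t> FindOvertoneBins(const std::vector<float>& power,
                                          std::size_t f0Bin,
                                          float threshold,       // relative to f0 energy
                                          std::size_t tolerance) // bins searched around k * f0Bin
{
   // Keeping 2 * tolerance < f0Bin means neighbouring search windows never overlap
   assert(f0Bin > 0 && f0Bin < power.size() && 2 * tolerance < f0Bin);
   const float f0Energy = power[f0Bin];
   std::vector<std::size_t> overtones;

   for (std::size_t k = 2; k * f0Bin < power.size(); ++k) {
      const std::size_t center = k * f0Bin;
      // The k-th harmonic may not land exactly on k * f0Bin,
      // so take the strongest bin in a small neighbourhood
      const std::size_t first = center - std::min(center, tolerance);
      const std::size_t last = std::min(power.size() - 1, center + tolerance);
      std::size_t peak = center;
      for (std::size_t bin = first; bin <= last; ++bin)
         if (power[bin] > power[peak])
            peak = bin;
      // Keep it only if its energy is comparable to the fundamental's
      if (power[peak] >= threshold * f0Energy)
         overtones.push_back(peak);
   }
   return overtones;
}
```

The tolerance is kept below half of f0Bin so that neighboring search windows never overlap and the same bin cannot be reported twice.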

The technical summary

BrushHandle (The front-end)

It inherits from the UIHandle class and acts as the “front-end” of our tool: this is where we interact with cursor coordinates and convert them to sample-count hops and frequency bins, then use Bresenham’s line and circle drawing algorithms to add the spectral data to the back-end.

Our screen consists of a limited number of pixels, meaning that it is impossible to draw a pixel-perfect line, or even a circle! The algorithm mentioned is critical for simulating the brush stroke; otherwise the selection would be barely usable, since it would just be a one-pixel-thin line. The idea of the algorithm is simple: it compares the x and y differences and picks the next coordinate based on the accumulated error.
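For reference, this is the textbook integer form of Bresenham’s line algorithm, written as a standalone function that returns every pixel between two mouse positions. BresenhamLine is a hypothetical name, and the brush tool’s own version (with thickness and circles) differs.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical standalone helper: classic all-octant integer Bresenham line.
// Returns every pixel from (x0, y0) to (x1, y1) inclusive, so gaps between
// two mouse events can be filled.
std::vector<std::pair<int, int>> BresenhamLine(int x0, int y0, int x1, int y1)
{
   std::vector<std::pair<int, int>> points;
   const int dx = std::abs(x1 - x0);
   const int dy = -std::abs(y1 - y0);
   const int sx = x0 < x1 ? 1 : -1;
   const int sy = y0 < y1 ? 1 : -1;
   int err = dx + dy; // accumulated error term

   while (true) {
      points.emplace_back(x0, y0);
      if (x0 == x1 && y0 == y1)
         break;
      const int e2 = 2 * err;
      if (e2 >= dy) { // error favours a step in x
         err += dy;
         x0 += sx;
      }
      if (e2 <= dx) { // error favours a step in y
         err += dx;
         y0 += sy;
      }
   }
   return points;
}
```

Feeding consecutive mouse positions into such a function fills the gaps left by fast mouse movement; each returned point can then be widened into a brush stamp.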

Apart from the above algorithm, this class is also responsible for adapting the selection to the zoom level. Since the user may want to zoom in on the spectrogram to make a finer selection, the brush tool should be able to adjust the selection when the user later zooms out! Initially I stored the spectral data in absolute mouse coordinates, which cannot scale up and down, so it was later changed to sample count and frequency.
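The idea of storing zoom-independent positions can be sketched as follows: mouse pixels are converted to an absolute sample position and a frequency in Hz, and converted back to pixels using whatever zoom level is current when drawing. The ViewParams fields and helper names are hypothetical, and a linear frequency scale is assumed; Audacity’s spectrogram also offers other scales such as logarithmic.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical view parameters; real values would come from the zoom and
// spectrogram settings. A linear frequency scale is assumed.
struct ViewParams {
   double leftTime;         // time at the left edge of the track area, seconds
   double pixelsPerSecond;  // horizontal zoom level
   double rate;             // sample rate, Hz
   double minFreq, maxFreq; // visible frequency range, Hz
   int heightPx;            // height of the spectrogram area, pixels
};

// Mouse x (pixels) -> absolute sample position, independent of zoom
std::int64_t PixelToSample(const ViewParams& v, int x)
{
   const double t = v.leftTime + x / v.pixelsPerSecond;
   return static_cast<std::int64_t>(std::llround(t * v.rate));
}

// Stored sample position -> pixel x at the *current* zoom level
int SampleToPixel(const ViewParams& v, std::int64_t sample)
{
   const double t = sample / v.rate;
   return static_cast<int>(std::lround((t - v.leftTime) * v.pixelsPerSecond));
}

// Mouse y (pixels, 0 = top) -> frequency in Hz on a linear scale
double PixelToFrequency(const ViewParams& v, int y)
{
   assert(v.heightPx > 0);
   const double fraction = 1.0 - static_cast<double>(y) / v.heightPx;
   return v.minFreq + fraction * (v.maxFreq - v.minFreq);
}
```

A point picked at 100 pixels per second and redrawn at 50 pixels per second simply lands at half the x offset, which storing raw mouse coordinates could not achieve.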

Lastly, it stores extra parameters such as the frequency snapping and overtone threshold settings, the brush radius, etc.

SpectralDataManager (The back-end)

This is the core of the calculation, where the magic happens. It partially inherits from SpectrumTransformer, a class rewritten by my mentor Paul Licameli to handle common FFT and IFFT transformations and calculations. The entry points (ProcessTracks(), FindFrequencySnappingBin(), FindHighestFrequencyBins()) are static methods, and the calculation is ultimately completed in other static methods with the Processor suffix.

Note that for these processors, the complete Fourier transform can be considered a black box: each processor is only exposed to a single window of data at a time.
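The split between a static entry point and a per-window processor can be illustrated with the simplified sketch below. The FFT and IFFT steps are left out, each window is assumed to already hold complex coefficients, and the names SpectralSelection, ZeroSelectedBinsProcessor and ProcessWindows are hypothetical; they do not reproduce SpectrumTransformer’s real interface.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical types: hop (window start, in samples) -> selected bins
using SpectralSelection = std::map<std::int64_t, std::set<int>>;
using Window = std::vector<std::complex<float>>;

// Per-window processor: sees only one window's coefficients and its hop
static void ZeroSelectedBinsProcessor(const SpectralSelection& selection,
                                      std::int64_t hop, Window& coefficients)
{
   const auto it = selection.find(hop);
   if (it == selection.end())
      return; // nothing selected in this window
   for (int bin : it->second)
      if (bin >= 0 && bin < static_cast<int>(coefficients.size()))
         coefficients[bin] = 0.0f;
}

// Static entry point: walks over all windows and hands each one to the
// processor. In the real tool, the FFT before and the IFFT after this step
// would be handled by the transformer.
static void ProcessWindows(const SpectralSelection& selection,
                           std::int64_t hopSize, std::vector<Window>& windows)
{
   assert(hopSize > 0);
   for (std::size_t i = 0; i < windows.size(); ++i)
      ZeroSelectedBinsProcessor(selection,
                                static_cast<std::int64_t>(i) * hopSize, windows[i]);
}
```

Because the processor only receives one window and its hop position, it does not need to know how the windows were produced, which is what lets the transform be treated as a black box.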

SpectralDataDialog (The GUI front-end)

This class was a particularly interesting one for me; it inherits from wxWidgets UI components. Compared to a conventional C++ workflow, it feels more like asynchronous JavaScript: it binds methods to events, which are broadcast and received as global state. On top of this event-trigger system, there is a factory mechanism that simplifies dependency management: we can statically attach objects, or even GUI windows, through the factory and use them whenever necessary, which helps tackle common problems like circular dependencies.
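The attach-to-host idea behind this factory can be illustrated with a heavily simplified registry: the host only knows an abstract base class and a global list of factories, so a module such as a dialog can register itself without the host including its header. This sketch shows the pattern only; Host, Attachment and AttachmentKey are illustrative names, and Audacity’s actual RegisteredFactory and attached-objects machinery differs in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Host; // e.g. a project; knows nothing about concrete attachments

// Hypothetical base class for anything that can be attached to a Host
struct Attachment {
   virtual ~Attachment() = default;
};

using Factory = std::function<std::unique_ptr<Attachment>(Host&)>;

// Global list of factories; modules register themselves at static-init time
inline std::vector<Factory>& Factories()
{
   static std::vector<Factory> factories;
   return factories;
}

// Registering a factory yields a key (an index) for later lookup
struct AttachmentKey {
   std::size_t index;
   explicit AttachmentKey(Factory f) : index(Factories().size())
   {
      Factories().push_back(std::move(f));
   }
};

struct Host {
   std::vector<std::unique_ptr<Attachment>> attachments;

   // The attachment is created on demand the first time it is requested
   template <typename T>
   T& Get(const AttachmentKey& key)
   {
      if (attachments.size() < Factories().size())
         attachments.resize(Factories().size());
      auto& slot = attachments[key.index];
      if (!slot)
         slot = Factories()[key.index](*this);
      return static_cast<T&>(*slot);
   }
};
```

A module defines its own Attachment subclass and a static AttachmentKey in its own source file; the host header never has to mention it, which is how this style of registry breaks include cycles.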

This is the current control panel for the brush tool, where we can enable “smart selection” and “overtones selection” and adjust the brush radius using the slider.

What’s next?

First, I need to give a big shout-out to my mentor Paul Licameli, an extremely passionate mentor and experienced C++ developer, who has continuously assisted me with everything from high-level architectural design to low-level bug-fix suggestions. I would also like to thank the Audacity team for arranging the program and for all the assistance provided!

I will be finishing the code review with Paul before the official end of the GSOC program, and I hope the frequency snapping and overtone selection can then be optimized. Afterwards, I will rebase the current branch onto master, and hopefully the tool will be merged and made available in the next release of Audacity.

GSOC 2021 with Audacity – Week 9

This is the second-to-last week of the GSOC program. I have finalized the majority of the new code and have been meeting with Paul more frequently for the code review.

The over-computation

Currently, the brush stroke is calculated with Bresenham’s algorithm in the mouse coordinate system. However, the data collected this way requires more computation than the FFT can make use of; in other words, we collect too much spectral data but can only process a limited amount of it. Therefore, the whole brush stroke calculation needs to be refactored into sampleCount hop vs frequency bin space, so that we do not waste computation on the area between Fourier transform windows.
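The saving from working in hop vs bin space can be seen in a small sketch: pixel-level stroke points are mapped to (hop index, frequency bin) cells and stored in a std::set, so all pixels that fall into the same cell collapse into one entry. The GridParams fields, the linear frequency mapping (y measured upward from 0 Hz) and the function name StrokeToHopBins are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct PixelPoint { int x, y; };

// Hypothetical grid parameters, assuming a linear frequency scale
struct GridParams {
   double samplesPerPixel; // horizontal zoom
   std::int64_t hopSize;   // samples between consecutive FFT windows
   double hzPerPixel;      // vertical scale, y measured upward from 0 Hz
   double binWidthHz;      // sampleRate / fftSize
};

// Collapse a pixel stroke into unique (hop index, frequency bin) cells
std::set<std::pair<std::int64_t, std::int64_t>>
StrokeToHopBins(const std::vector<PixelPoint>& stroke, const GridParams& g)
{
   assert(g.hopSize > 0 && g.binWidthHz > 0);
   std::set<std::pair<std::int64_t, std::int64_t>> cells;
   for (const auto& p : stroke) {
      const auto sample = static_cast<std::int64_t>(p.x * g.samplesPerPixel);
      const auto hop = sample / g.hopSize;
      const auto bin = static_cast<std::int64_t>(p.y * g.hzPerPixel / g.binWidthHz);
      cells.emplace(hop, bin); // duplicates are ignored by the set
   }
   return cells;
}
```

For example, with 64 samples per pixel and a hop of 512 samples, eight adjacent pixels along a horizontal stroke fall into the same hop, so they produce a single cell instead of eight.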

The code review

Since last week, my mentor Paul has been reviewing my code and giving me extremely detailed and helpful comments. Some of them concern code style or unused header includes, but he has also spotted and pointed out some critical bugs. I am currently resolving his comments; the history of the conversation can be viewed in this PR link.

Before the final week

I hope that the refactoring into hop vs bin space will be completed before next week, so we can try to optimize the frequency snapping and launch it as soon as possible.

GSOC 2021 with Audacity – Week 8

This week I finished one additional feature, frequency snapping, an optional feature that allows users to select spectral data more accurately.

The frequency snapping

It is tied to the smart selection option in the spectral editing dialog and allows more precise selection: the brush stroke is calculated and snapped to the nearest frequency bins with the highest spectral energy.

Preparing for PR and final review

Originally I had 50+ commits, which could be overwhelming for the code review, considering that some of the intermediate commits were obsolete (already!), while other changes reverted or refactored previously written code. I rebased the whole branch to pick out the important updates, reordering and combining multiple commits, and encountered quite a lot of conflicts that needed to be resolved.

GSOC 2021 with Audacity – Week 7

This week I have been working hard on a new feature called frequency snapping, and I have also made other optimizations to the brush tool.

The new cursor

The old cursor recycled the envelope cursor, which doesn’t look good when the brush radius increases; the new cursor is positioned in the middle of the brush.

Major change to brush stroke

Previously, I used Bresenham’s algorithm to draw a thick line to mimic the brush stroke, which is not realistic, and rough edges could be observed. I have now modified the algorithm to draw a fully circular brush stroke.
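One way to get a round brush is to stamp a filled disc at every point produced by the line algorithm. The sketch below uses the midpoint circle algorithm (the circle counterpart of Bresenham’s line algorithm) to compute the disc’s horizontal spans; StampDisc and the std::set of pixel cells are hypothetical choices, not the code from the branch.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Cell = std::pair<int, int>; // (x, y) pixel

// Add every pixel of the horizontal span [x0, x1] on row y
static void FillSpan(std::set<Cell>& out, int x0, int x1, int y)
{
   for (int x = x0; x <= x1; ++x)
      out.emplace(x, y);
}

// Hypothetical helper: fill a disc of the given radius centred on (cx, cy),
// using the midpoint circle algorithm to find the edge of each row.
void StampDisc(std::set<Cell>& out, int cx, int cy, int radius)
{
   assert(radius >= 0);
   int x = radius;
   int y = 0;
   int err = 1 - radius; // midpoint decision variable
   while (x >= y) {
      // Each octant step gives four symmetric horizontal spans
      FillSpan(out, cx - x, cx + x, cy + y);
      FillSpan(out, cx - x, cx + x, cy - y);
      FillSpan(out, cx - y, cx + y, cy + x);
      FillSpan(out, cx - y, cx + y, cy - x);
      ++y;
      if (err < 0) {
         err += 2 * y + 1;
      } else {
         --x;
         err += 2 * (y - x) + 1;
      }
   }
}
```

Stamping overlapping discs into the same std::set removes duplicates automatically, so a stroke is simply the union of the discs along its path.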

GSOC 2021 with Audacity – Week 6

This week’s focus is on bug fixes for the brush tool prototype and on planning the next rollout, which will contain more exciting features that I would like to bring to the community!

Control panel for the brush tool

Instead of the temporary red button, I have implemented a non-modal dialog for the control panel. It took longer than I expected, since I wanted to implement the dialog in the way that is native to the Audacity codebase. I have used AttachedWindows and AttachedObjects to decouple the dependencies between SpectralDataManager, SpectralDialog, etc., so when users click on the brush tool icon, the dialog is created on demand.

The back-end for overtone and smart selection is yet to be completed, but I preferred to set up the front-end first for prototyping and to gain early feedback from the team on the UI design.

More incoming features!

We have now reached the second stage of the GSOC program, and there are two or more features that I would like to complete before the second evaluation. Overtone selection and threshold re-selection are in fact similar features, both based on smart selection. I would need to modify the existing SpectrumTransformer to consider more windows in the calculation; in fact, I prefer to set a fixed length for smart selection to function properly, since it seems inappropriate to take the whole track into the calculation.

Variable brush size and preview

A slider has been added to adjust the brush radius in real time, and it would be more user-friendly to replace the existing cursor with a preview of the brush radius. However, the current cursor implementation takes a bitmap file and a fixed size as input, so we can’t simply scale up the bitmap as the radius increases. A workaround is to use an empty cursor and draw the brush preview manually in real time.

However, this brings another challenge with the rendering routine of UIHandle: it doesn’t necessarily call Draw() when hovering, so we need to drag or click to make the drawing visible.

GSOC 2021 with Audacity – Week 5

It has been an exciting week for me, with the completion of the first brush tool prototype! Currently, each window of samples is passed via SpectrumTransformer into the newly added SpectralDataManager, which checks it against the previously selected data and zeros out all the selected frequency bins. (Link to the commit)

The brush tool demo

I have chosen “Killing Me Softly” by Roberta Flack (one of my all-time favorites!) and extracted a four-second snippet from the beginning. I have also added a meow sound to it, since we all love cats and, more importantly, it contains pitch variation that cannot be effectively selected with the current tool (a horizontal line).

To use the brush tool, we simply drag through the meow sound and its overtones and then click the apply button; the selected frequency content is then zeroed out.

The full demo video is available here (with a before vs. after audio comparison):

https://drive.google.com/file/d/1bQJGncHWj_GqD19LOPeEp_og3j70akw8/view?usp=sharing

What’s next?

This is still a rather early stage for this new feature, and there are lots of potential improvements. For instance, we can definitely do better than zeroing out all the selected frequency bins, e.g. by averaging from the non-selected windows (horizontally), from neighboring frequencies (vertically), or both!
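As a rough illustration of the vertical variant of this idea, the sketch below replaces each selected bin’s magnitude with the mean magnitude of the nearest unselected bins below and above it, while keeping the bin’s original phase. It is a hypothetical alternative to zeroing with made-up names, not something this post describes as implemented.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: replace each selected bin with the average magnitude
// of its nearest unselected neighbours (below and above), preserving phase.
void FillSelectedFromNeighbours(std::vector<std::complex<float>>& bins,
                                const std::vector<bool>& selected)
{
   assert(bins.size() == selected.size());
   const std::vector<std::complex<float>> original = bins; // read-only copy
   const std::size_t n = bins.size();

   for (std::size_t i = 0; i < n; ++i) {
      if (!selected[i])
         continue;
      float sum = 0.0f;
      int count = 0;
      // Nearest unselected bin below (j runs from i - 1 down to 0)
      for (std::size_t j = i; j-- > 0;)
         if (!selected[j]) { sum += std::abs(original[j]); ++count; break; }
      // Nearest unselected bin above
      for (std::size_t j = i + 1; j < n; ++j)
         if (!selected[j]) { sum += std::abs(original[j]); ++count; break; }
      const float magnitude = count > 0 ? sum / count : 0.0f;
      bins[i] = std::polar(magnitude, std::arg(original[i]));
   }
}
```

Reading neighbours from an unmodified copy keeps the result independent of the order in which bins are processed; a horizontal variant would do the same across windows at a fixed bin.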

Moreover, I would also like to make the selection smarter. In photo editing, if we want to remove or isolate a subject from the background, we typically rely first on tools like the magic wand to pick up most of the desired area intelligently, and then fine-tune with a drawing tool. Similarly, I hope that the tool will be able to guess and pick up the user’s selection (or at least most of it), after which the user can add or remove spectral data at the edges using the brush tool.

A step further would be picking up the overtones automatically for the user during the “magic wand” stage. However, the overtones can be a bit tricky to calculate, since their shapes appear skewed in the linear view, and we need to take the logarithmic scale as a reference when performing the computation (the user can edit in the logarithmic view, but we cannot easily “select a view” for the computation). Without introducing an advanced area approximation algorithm, one possible approach is to slide the fundamental frequency area across the frequency bins close to its multiples, and then estimate and spot the overtones by calculating their spectral energy similarity.

GSOC 2021 with Audacity – Week 4

This week I made several bug fixes and prepared the last missing piece of the puzzle before the first evaluation: performing an FFT on the selected frames, editing the spectral data and reversing it using an IFFT. The required functions have been modularized by Paul on the branch Noise-reduction-refactoring, which introduces SpectrumTransformer to the codebase.

Use EXPERIMENTAL flags on brush tool

The brush tool has now been placed behind the experimental flag EXPERIMENTAL_BRUSH_TOOL, a custom CMake flag evaluated at compile time. The feature can now be safely merged into the existing codebase, and team members can test it simply by toggling the flag. (Link to commit)

List of bug fixes

After applying the effect, the state is now properly appended to the history stack, making undo and redo possible. (Link to commit)

The brush stroke trace used to be distorted after undo and redo; this has now been fixed. (Link to commit)

The apply effect button is now triggered correctly, even in waveform view. (Link to commit)

Rebase current brushtool branch and prepare for first deliverable!

The brush tool branch has now been rebased onto Noise-reduction-refactoring. I am currently setting up a new class called SpectrumDataManager to encapsulate most of the spectral editing behind the scenes, with a worker class inherited from TrackSpectrumTransformer.

GSOC 2021 with Audacity – Week 3

Multiple features were added to the tool this week as scheduled, and I spent most of the development time understanding some of Audacity’s internal architecture.

For instance, how RegisteredFactory provides a way of binding data to a host class to avoid circular dependencies, and how UndoManager uses multiple layers of polymorphism to achieve a complete state backup.

The native color scheme

The original selection tool blends nicely into the spectrogram view while keeping transparency, so the brush tool has been modified accordingly to provide a similar user experience. (Link to commit)

The eraser tool

A basic eraser tool has been added, so users can now erase the selected area. This feature is currently triggered by holding Ctrl while dragging; the detailed design will be further discussed with the team’s designers. (Link to commit)

The apply effect button

A prototype button has been added for applying different effects to the selected area in the future (currently it simply removes the selection). It will most likely be replaced with a non-modal dialog, with UI for optional features like a brush size slider. (Link to commit)

The redo/undo feature

In the project manager, the undo manager triggers the virtual CopyTo(), which is overridden by different derived classes. I have added another method that is called once it reaches WaveTrackView, since the data of the different sub-views should also be copied into the state history. Passing the whole Track as an argument into a single sub-view seems counter-intuitive, since we expect one-to-one sub-view copying. (Link to commit)
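The role of the virtual copy in undo/redo can be reduced to the sketch below: each undo state holds a deep copy of the sub-view data made through a virtual hook, so later brush strokes cannot alter earlier states. SubViewData, SpectralData, Clone() and UndoStack are hypothetical simplifications standing in for the CopyTo() chain; they are not Audacity’s UndoManager or WaveTrackView API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <vector>

// Hypothetical per-sub-view data that must be captured in each undo state
struct SubViewData {
   virtual ~SubViewData() = default;
   virtual std::unique_ptr<SubViewData> Clone() const = 0; // virtual copy hook
};

// Hypothetical spectral selection: hop -> selected frequency bins
struct SpectralData : SubViewData {
   std::map<std::int64_t, std::set<int>> selection;
   std::unique_ptr<SubViewData> Clone() const override
   {
      return std::make_unique<SpectralData>(*this); // deep copy of the map
   }
};

class UndoStack {
public:
   void Push(const SubViewData& current)
   {
      states_.resize(position_); // drop any redo history
      states_.push_back(current.Clone());
      position_ = states_.size();
   }
   // State to restore after undo, or nullptr if there is nothing to undo
   const SubViewData* Undo()
   {
      if (position_ <= 1)
         return nullptr;
      --position_;
      return states_[position_ - 1].get();
   }
   // State to restore after redo, or nullptr if there is nothing to redo
   const SubViewData* Redo()
   {
      if (position_ >= states_.size())
         return nullptr;
      ++position_;
      return states_[position_ - 1].get();
   }
private:
   std::vector<std::unique_ptr<SubViewData>> states_;
   std::size_t position_ = 0; // number of states up to and including current
};
```

Because each state owns an independent copy, a brush stroke made after an undo modifies only the live data, and pushing it discards the now-invalid redo history.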

GSOC 2021 with Audacity – Week 2

This week, I finished the basic prototype of the brush tool and refactored the original data structure designed for storing spectral data. Following an agile workflow, I have set up a Kanban board on GitHub Projects rather than Jira, since most of the team members are already on GitHub, so the real-time progress of the project is now traceable.

The refactoring of the structure

The first design accessed the data in SpectrumView through a static variable; this has now been fixed. The data should be local to each SpectrumView, meaning that the selection is stored separately for each stereo channel, and the same applies to different tracks.

Link to the commit

As suggested by Paul, we are sticking to the workflow of SpectrumView::DrawClipSpectrum, so the structure has been changed from Frequency -> Time points to Time point -> Frequency bins.

Link to the commit

The missing cursor coordinates

The mouse events associated with UIHandle are not captured continuously (or frequently enough), meaning that if the user drags the mouse quickly, some of the coordinates will be missed. Consider the following graph, where the dragging speed increases from top to bottom:

An easy cheat would be to connect the last visited coordinate to the current one using wxDC::DrawLine; we can even customize the thickness with a single parameter. However, this only affects the selected area visually, and the intermediate coordinates are still missing from the structure. Since it is impossible to capture a pixel-perfect line on our screen, we need an algorithm to estimate the line, ideally with customizable thickness. Bresenham’s line drawing algorithm has been chosen; it will be further modified since we expect the brush to be circular, but it is good enough for prototyping.

Link to the commit

To be done: the UX and undo/redo

As ordinary application users, keyboard shortcuts like Ctrl+Z and Ctrl+Y are almost muscle memory for us, and of course we expect similar functionality for the tool! Since the structure is new to the codebase and we cannot simply reuse ModifyState from ProjectHistory, we will need to tell the base class to copy this new structure when adding to the state history.

GSOC 2021 with Audacity – Week 1

This week, I met with my mentor and learned more about the rendering logic and how the different methods inherited from UIHandle work together, and we set expectations for the following few weeks. I have also completed a prototype of the brush tool.

Work done this week:

  1. Created BrushHandle, inheriting some basic functions and logic.
  2. Tried different approaches for displaying the brush trails in real time, including rendering from BrushHandle and from SpectrumView respectively.
  3. Set up a data structure to store and convert the mouse events into frequency-time bins (adapted automatically to different user scaling).

Next week’s goals:

  1. Change the color of the selected area, to adapt to the existing color gradient scheme
  2. Refactor the data structure of the selected area
  3. Implement new UI components, to erase or apply the editing effect
  4. Append the selection to the state history