GSOC 2021 with Audacity – Week 6

This week’s focus is on potential bug fixes for the brush tool prototype, and on planning the next rollout, which contains more exciting features that I would like to bring to the community!

Control panel for the brush tool

Instead of the temporary red button, I have implemented a non-modal dialog for the control panel. It took longer than I expected, since I wanted to implement the dialog in a way that is native to the Audacity codebase. I used AttachedWindows and AttachedObjects to decouple the dependencies between SpectralDataManager, SpectralDialog, etc., so when users click on the brush tool icon, the dialog is created on demand.
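As a rough illustration of the on-demand attachment idea, here is a simplified sketch in plain C++. This is not Audacity's actual AttachedObjects API, and all names here are hypothetical:

```cpp
#include <functional>
#include <memory>
#include <unordered_map>
#include <vector>

struct Attachment {
    virtual ~Attachment() = default;
};

// Host knows nothing about its clients; clients register factories that
// run lazily, the first time the attachment is requested.
struct Host {
    using Factory = std::function<std::unique_ptr<Attachment>(Host&)>;

    // Called at static-initialization time; returns the client's slot index.
    static size_t RegisterFactory(Factory f) {
        Factories().push_back(std::move(f));
        return Factories().size() - 1;
    }

    // Create the attachment on demand, then cache it for later calls.
    Attachment& Get(size_t slot) {
        auto it = attachments.find(slot);
        if (it == attachments.end())
            it = attachments.emplace(slot, Factories()[slot](*this)).first;
        return *it->second;
    }

private:
    static std::vector<Factory>& Factories() {
        static std::vector<Factory> factories;
        return factories;
    }
    std::unordered_map<size_t, std::unique_ptr<Attachment>> attachments;
};

// A stand-in for the brush tool's control panel: only constructed when
// first requested, i.e. when the user clicks the brush tool icon.
struct DemoDialog : Attachment {
    static int constructed; // counts constructions, to demonstrate laziness
    DemoDialog() { ++constructed; }
};
int DemoDialog::constructed = 0;

const size_t demoSlot = Host::RegisterFactory(
    [](Host&) -> std::unique_ptr<Attachment> {
        return std::make_unique<DemoDialog>();
    });
```

The point of the pattern is that the host never names its clients: clients register factories, and the attachment is only constructed on first request, which keeps the dependencies decoupled.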

The back-end for overtones and smart selection is yet to be completed, but I prefer to set up the front-end first for prototyping, to gain early feedback from the team regarding the UI design.

More incoming features!

We have come to the second stage of the GSOC program, and there are two or more features that I would like to complete before the second evaluation. Overtone selection and threshold re-selection are in fact similar features, both based on smart selection. I will need to modify the existing SpectrumTransformer to consider more windows in the calculation; in fact, I prefer to set a fixed length for the smart selection to function properly, since it seems rather inappropriate to take the whole track into the calculation.

Variable brush size and preview

A slider front-end has been added to adjust the brush radius in real time. It would be user-friendly if we could replace the existing cursor with a preview of the selected radius. However, the current cursor implementation takes a bitmap file and a fixed size as input, so we can’t simply scale up the bitmap as the radius increases; a workaround is to set an empty cursor and draw the brush preview manually in real time.

However, here comes another challenge with the rendering routine of UIHandle: it doesn’t necessarily call Draw() when hovering, so we currently need to drag or click to make the drawing visible.

GSOC 2021 with Audacity – Week 5

It has been an exciting week for me, with the completion of the first brush tool prototype! Currently, each windowed sample is passed via SpectrumTransformer into the newly added SpectralDataManager, which checks against the previously selected data and zeros out all the selected frequency bins. (Link to the commit)

The brush tool demo

I have chosen “Killing Me Softly” by Roberta Flack (one of my all-time favorites!); a snippet of four seconds has been extracted from the beginning. I have also added a meow sound to it, since we all love cats and, more importantly, it contains pitch variation which cannot be effectively selected by the current tool (a horizontal line).

To use the brush tool, we simply drag through the meow sound and its overtones, then click the apply button; the selected frequency content will be zeroed out.

The full demo video is available here (with before v.s. after audio comparison):

https://drive.google.com/file/d/1bQJGncHWj_GqD19LOPeEp_og3j70akw8/view?usp=sharing

What’s next?

This is still a rather early stage for this new feature, and there are lots of potential improvements. For instance, we can definitely do better than zeroing out all the selected frequency bins, such as averaging samples from the non-selected windows (horizontally) or frequencies (vertically), or both!

Moreover, I would also like to make the selection smarter. In photo editing, if we were to remove or isolate a subject from the background, we would rely on tools like the magic wand to pick up most of the desired area intelligently, followed by fine-tuning with a drawing tool. That being said, I hope that the tool will be able to guess and pick up the user’s selection (or at least most of it), so that the user can then add or remove spectral data around the edges using the brush tool.

A step even further would be picking up the overtones automatically for the user, during the “magic wand” stage. However, the overtones can be a bit tricky to calculate, since their shapes are skewed in the linear view, and we need to take the logarithmic scale as a reference when performing the computation (the user can edit in the logarithmic view, but we cannot easily “select a view” for the computation). Without introducing an advanced area approximation algorithm, one possible way is to slide the fundamental frequency area across the frequency bins close to its multiples; we can then estimate and spot the overtones by calculating their spectral energy similarity.
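That sliding estimation might be sketched like this. This is hypothetical code, assuming a linear-scale magnitude spectrum and integer bin positions, and using a simple energy threshold relative to F0 rather than a full similarity measure:

```cpp
#include <algorithm>
#include <vector>

// Given a magnitude spectrum and the bin of the fundamental frequency,
// search a small neighbourhood around each integer multiple of F0 (real
// overtones drift away from exact multiples) and keep the peaks whose
// energy clears a threshold relative to the fundamental's energy.
std::vector<size_t> EstimateOvertones(const std::vector<float>& spectrum,
                                      size_t f0Bin, size_t searchRadius,
                                      float relThreshold) {
    std::vector<size_t> overtones;
    const float f0Energy = spectrum[f0Bin] * spectrum[f0Bin];
    for (size_t k = 2; k * f0Bin < spectrum.size(); ++k) {
        size_t center = k * f0Bin;
        size_t lo = center > searchRadius ? center - searchRadius : 0;
        size_t hi = std::min(center + searchRadius, spectrum.size() - 1);
        size_t best = lo; // strongest bin in the neighbourhood
        for (size_t b = lo + 1; b <= hi; ++b)
            if (spectrum[b] > spectrum[best])
                best = b;
        float energy = spectrum[best] * spectrum[best];
        if (energy >= relThreshold * f0Energy)
            overtones.push_back(best);
    }
    return overtones;
}
```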

GSOC 2021 with Audacity – Week 4

This week I have performed several bug fixes and prepared for the last missing piece of the puzzle before the first evaluation: performing FFT on the selected frames, editing the spectral data, and reversing it using IFFT. The required functions have been modularized by Paul on the branch Noise-reduction-refactoring, where the SpectrumTransformer is introduced to the codebase.

Use EXPERIMENTAL flags on brush tool

The brush tool has now been refactored under the experimental flag EXPERIMENTAL_BRUSH_TOOL, a custom CMake flag set at compile time. The feature can now be safely merged into the existing codebase, and team members will be able to test it by simply toggling the flag. (Link to commit)

List of bug fixes

After applying the effect, the state is now properly appended to the history stack, making undo and redo possible. (Link to commit)

The trace of the brush stroke was distorted after undo and redo; this has now been fixed. (Link to commit)

The apply effect button is now correctly triggered, even in waveform view. (Link to commit)

Rebase current brushtool branch and prepare for first deliverable!

The development of the brush tool has now been rebased onto Noise-reduction-refactoring. Currently I am setting up a new class called SpectrumDataManager to encapsulate most of the spectral editing behind the scenes, with a worker class inherited from TrackSpectrumTransformer.

GSOC 2021 with Audacity – Week 3

Multiple features were added to the tool this week as scheduled; I have spent most of the development time understanding some of Audacity’s internal architecture.

For instance, how RegisteredFactory provides ways of binding data to a host class to avoid circular dependencies, and how UndoManager utilizes multiple layers of polymorphism to achieve a complete state backup.

The native color scheme

The original selection tool blends nicely into the spectrogram view without losing transparency; the brush tool has been modified accordingly to provide a similar user experience. (Link to commit)

The eraser tool

The basic eraser tool has been added; users are now able to erase the selected area. This feature is currently triggered by pressing Ctrl while dragging; the detailed design will be further discussed with the team’s designers. (Link to commit)

The apply effect button

A prototype button has been added for applying different effects to the selected area in the future (currently it simply removes the selection). It will most likely be replaced with a non-modal dialog, with optional UI features like a brush size slider. (Link to commit)

The redo/undo feature

In the project manager, the undo manager triggers the virtual CopyTo() that is inherited by different derived classes. I have added another method right after it reaches WaveTrackView, since data from the different sub-views should also be copied into the state history. Taking the whole Track as an argument into a single sub-view seems counter-intuitive, since we expect one-to-one sub-view copying. (Link to commit)

GSOC 2021 with Audacity – Week 2

This week, I finished the basic prototype of the brush tool, and the original data structure designed for storing spectral data has now been refactored. Adopting agile development, I have set up a Kanban board on GitHub Projects rather than Jira, since most of the team members are already on GitHub; the real-time progress of the project will now be traceable.

The refactorization of the structure

The first design accessed SpectrumView via a static variable, and this has now been fixed: the data should be local to each SpectrumView, meaning that for each stereo channel the selection is stored separately, and the same applies to different tracks.

Link to the commit

As suggested by Paul, we are sticking to the workflow of SpectrumView::DrawClipSpectrum, and the structure has been changed from Frequency -> Time points to Time point -> Frequency Bins.

Link to the commit
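With simplified types, the refactored layout looks like this (a sketch; the actual code uses Audacity's own types). Keying by time position first means that while drawing a spectrogram column, all selected bins for that column can be fetched in a single lookup:

```cpp
#include <map>
#include <set>

using FreqBin = int;
using TimePoint = long long;
// Time point -> set of selected frequency bins at that time.
using SpectralSelection = std::map<TimePoint, std::set<FreqBin>>;

void AddSelection(SpectralSelection& sel, TimePoint t, FreqBin bin) {
    sel[t].insert(bin);
}

bool IsSelected(const SpectralSelection& sel, TimePoint t, FreqBin bin) {
    auto it = sel.find(t);
    return it != sel.end() && it->second.count(bin) > 0;
}
```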

The missing cursor coordinates

The mouse events associated with UIHandle are not captured constantly (or frequently enough), meaning that if the user drags the mouse quickly, some of the coordinates will be missed. Consider the following graph, where the dragging speed increases from top to bottom:

An easy cheat would be to connect the last visited coordinate to the current one using wxDC::DrawLine; we can even customize the thickness with a single parameter. However, that only affects the selected area visually; the intermediate coordinates are still missing from the structure. Since it is impossible to capture a pixel-perfect line on our screen, we need an algorithm to estimate the line, ideally with customizable thickness. Bresenham’s line drawing algorithm has been chosen; it will be further modified since we expect the brush to be circular, but it is good enough for prototyping.

Link to the commit
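For reference, the standard integer-only form of Bresenham's algorithm, which emits every grid point between two captured mouse positions:

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Return every integer point on the line from (x0, y0) to (x1, y1),
// endpoints included, using only integer arithmetic.
std::vector<std::pair<int, int>> BresenhamLine(int x0, int y0, int x1, int y1) {
    std::vector<std::pair<int, int>> points;
    int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy; // running error term
    while (true) {
        points.emplace_back(x0, y0);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } // step horizontally
        if (e2 <= dx) { err += dx; y0 += sy; } // step vertically
    }
    return points;
}
```

Each point this produces can then be converted to a frequency-time bin and added to the selection, so the stored structure no longer has gaps during fast drags.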

To be done: the UX and undo/redo

As ordinary application users, keyboard shortcuts like Ctrl+Z and Ctrl+Y are almost muscle memory, and of course we expect similar functionality for this tool! Since the structure is new to the codebase, and we cannot simply reuse ModifyState from ProjectHistory, we need to inform the base class about copying this new structure when adding to the state history.

GSOC 2021 with Audacity – Week 1

This week, I conducted meetings with my mentor and learned more about the rendering logic and how the different methods of an inherited UIHandle work together. We set the expectations for the following few weeks, and I have also completed a prototype of the brush tool.

Work done this week:

  1. Created BrushHandle, inheriting some basic functions and logic.
  2. Tried different approaches for displaying the brush trails in real time, including rendering from BrushHandle and from SpectrumView.
  3. Set up a data structure to store and convert the mouse events into frequency-time bins (adapted automatically to different user scaling).
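The conversion in point 3 can be sketched as follows. The names here are hypothetical and a linear frequency scale is assumed; the real code reads the zoom and scale settings from the track view:

```cpp
// Map a mouse position in pixels to an FFT window index and a frequency
// bin, so the selection survives zooming: it is stored in bins, not pixels.
struct ViewInfo {
    double secondsPerPixel;   // horizontal zoom
    double minFreq, maxFreq;  // visible frequency range (linear scale assumed)
    int viewHeight;           // spectrogram height in pixels
    double sampleRate;
    int windowSize;           // FFT size
    int hopSize;              // samples between successive windows
};

// x: pixels from the clip start -> index of the analysis window.
int XToWindow(const ViewInfo& v, int x) {
    double seconds = x * v.secondsPerPixel;
    return static_cast<int>(seconds * v.sampleRate / v.hopSize);
}

// y: pixels from the top of the view -> frequency bin (top row = maxFreq).
int YToBin(const ViewInfo& v, int y) {
    double frac = 1.0 - static_cast<double>(y) / v.viewHeight;
    double freq = v.minFreq + frac * (v.maxFreq - v.minFreq);
    return static_cast<int>(freq * v.windowSize / v.sampleRate);
}
```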

Next week’s goals:

  1. Change the color of the selected area, to adapt to the existing color gradient scheme
  2. Refactor the data structure of the selected area
  3. Implement new UI components, to erase or apply the editing effect
  4. Append the selection to the state history

Audacity – Spectral editing tools introduction

Hello all, this is Edward Hui from Hong Kong. I have a strong interest in audio/signal processing and neuroscience, and I have been selected for the project “spectral editing tool” this summer, mentored by Paul Licameli. Here are links to my GitHub and LinkedIn profiles; please feel free to connect with me.

Background of spectral editing

Consider one of the most popular songs in history, Hey Jude by The Beatles, as an example. I fetched the song from YouTube as a WAV file and imported it into Audacity (no CD or vinyl magic happening); the snippet is attached here.

This is the original spectrogram, using logarithmic scaling, a window size of 4096, and band-limited to [100 – 5000] Hz.

Two “ding” sounds were then added as unwanted noise at around 2900 Hz; a modified snippet is attached here.

Using the spectrogram view, the above noises are visualized and easily spotted by users: they do not blend much into the original mix, and their spectral energies are usually high. Common spectral editing tasks include removing unwanted doorbells from a voice recording or eliminating coughing from a concert recording; these are typical noise removal use cases for ordinary users.

In fact, there is a built-in function for handling simple spectral editing, but it is strictly limited to a straight line, which makes the editing not flexible enough to accommodate slightly more complicated noises with pitch variation, say a cat’s meow during a voice recording, as in the following graph.

The basic deliverable of the project

The brush tool will be introduced as the basic deliverable of this project, making spectral editing more user-friendly and effective: users can simply drag through the desired area, and regions with high spectral energy will be approximated and selected, as in the following graph.

There are a few challenges involved in this project:

  1. The UI design of the tool and how it should be positioned in the existing toolbar, for a better editing experience
  2. The data structure representing the brush and the selected area, and which algorithm we should use to estimate the bounded points from the continuous mouse positions in real time (most likely Bresenham’s algorithm or the midpoint circle algorithm, combined with a flood fill algorithm)
  3. The method of transforming the calculated area into the corresponding frequency components
  4. The combination of parameters for performing the short-time Fourier transform and its inverse after the editing, i.e. window type, FFT size, overlap ratio, etc.
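For the circular brush shape mentioned in point 2, one simple approach is a filled "stamp" of offsets applied at every point the line algorithm produces. This is only a sketch; the midpoint circle algorithm would instead trace the outline and could be filled afterwards:

```cpp
#include <utility>
#include <vector>

// All integer offsets (dx, dy) within the brush radius; stamping this set
// at each point of the stroke yields a stroke with circular thickness.
std::vector<std::pair<int, int>> CircularStamp(int radius) {
    std::vector<std::pair<int, int>> offsets;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx)
            if (dx * dx + dy * dy <= radius * radius)
                offsets.emplace_back(dx, dy);
    return offsets;
}
```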

Optional features

The brush tool is expected to be completed and delivered before the first evaluation, dated 12 July; one of the following features will then be selected and developed according to the schedule.

1. Overtone selection

The aforementioned real-life noises are similar to other audio in that they consist of both a fundamental frequency (F0) and overtone resonances; to effectively eliminate the unwanted noises, all of them should be selected and removed.

It would be nice to approximate the overtones automatically from the F0, without manual selection by the user; the threshold decision for such an approximation is important.

2. Area re-selection

The area selected by the new tools can be adjusted using UI components like sliders to decide the spectral energy threshold, improving the editing experience.

Conclusion

This project aims to make spectral editing more widely accessible to all users regardless of their editing experience; the features above will hopefully complete the spectral editing functionality of Audacity and empower more creative editing ideas.

Thanks to the Audacity team once again for accepting my proposal; I am looking forward to the coding stage! I will be writing weekly blogs during development, and the links will also be updated here.