GSOC 2021 with Audacity – Week 6

This week’s focus will be potential bug fixes for the brush tool prototype, and planning for the next rollout, containing more exciting features that I would like to bring to the community!

Control panel for the brush tool

Instead of the temporary red button, I have implemented a non-modal dialog for the control panel. It took longer development time than I expected, since I would like to use a native way of implementing dialog in Audacity codebase. And I have used AttachedWindows and AttachedObjects for decoupling the dependencies between SpectralDataManager, SpectralDialog etc, so when users click on the brush tool icon, the dialog will be created on-demand.

The back-end for overtones and smart selection is yet to be completed, but I prefer to firstly setup the front-end for prototyping and gain early feedback from the team regarding the UI design.

More incoming features!

It came to the second stage of the GSOC program, there are two or more features that I would like to complete before the second evaluation. When I think about overtones selection and threshold re-selection, these are indeed similar features which based on smart selection. I would need to modify the existing SpectrumTransformer to consider more windows for calculation, in fact, I prefer to set a fixed length for the smart selection to function properly, since it seems rather inappropriate to take the whole track into calculation.

Variable brush size and preview

A slider front-end has been added to adjust the brush radius in real-time, it would be user-friendly if we can include the predicted radius to replace the existing cursor. However, the current cursor implementation takes a bitmap file and fixed size as input, we can’t simply increase the size and scale up the bitmap as the radius increases, a work-around will be adding empty cursor and draw the brush preview manually in real-time.

However, here comes another challenges with the rendering routine of UIHandle, it doesn’t necessary call Draw() when hovering, but we need to manually drag or click to make the drawing visible.

Source Separation – GSoC 2021 Week 6

Hi all! 

There aren’t many updates for this week. I spent the past week cleaning out bugs in the model manager related to networking and threading. I hit a block around Wednesday, when the deep learning effect stopped showing up on the Plugin Manager entirely. It took a couple of days for me to figure out , but I’m back on track now, and I’m ready to keep the ball rolling. 

To do:

  • Fix a bug where download progress gauge appears in the bottom left corner of the ModelCardPanel, instead of on top of the install button. 
  • Refactor ModelCard, so that we serialize // deserialize the internal JSON object only when necessary. 
  • Add a top panel for the model manager UI, with the following functionality
    • Search through model cards
    • Filter by 
      • domain (music, speech, etc)
      • Task (separation, enhancement)
      • Other metadata keys
    • Manually add a huggingface repo
  • If a model is installed and there’s a newer version available, let the user know.

Source Separation – GSoC 2021 Week 5

Hi all! Here are this week’s updates: 

I’ve made progress on the Model Manager! Right now, all HuggingFace repositories with the tag “audacity” are downloaded and displayed as model cards (as seen below). If a user chooses to install a model, the model manager queries HuggingFace for the actual model file (the heavy stuff) and installs it into a local directory. This interface lets users choose from a variety of Deep Learning models trained by contributors around the world for a wide variety of applications.

To do: 

  • GUI work
  • searching and filtering between model cards
  • Grab a music separation model!
A prettier GUI coming soon!

GSOC 2021 with Audacity – Week 5

It has been an exciting week for me, after the completion of the first brush tool prototype! Currently each windowed sample will be passed via SpectrumTransformer into the newly added SpectralDataManager, it checks against the previously selected data and zeros out all the selected frequency bin. (Link to the commit)

The brush tool demo

I have chosen “Killing me softly” by Roberta Flack (one of my all-time favorites!), snippet of four seconds has been extracted from the beginning. I have also added a meow sound to it since we all love cats and more importantly, it consists of pitch variation which cannot be effectively selected by the current tool (horizontal line)

To use the brush tool, we simply dragged through the meow sound and its overtones, and click the apply button afterwards, then the selected frequency content will be zeroed out.

The full demo video is available here (with before v.s. after audio comparison):

https://drive.google.com/file/d/1bQJGncHWj_GqD19LOPeEp_og3j70akw8/view?usp=sharing

What’s next?

This is still, rather an early stage for this new feature, there are lots of potential improvements. For instance, we can definitely do better than zeroing out all the selected frequency bins, like average sampling from the non-selected windows (horizontally) or frequencies (vertically), or both!

Moreover, I would also like to make the selection smarter. For photo editing, say we were to remove or isolate subject from the background image, we would have prioritized and relied on tools like magic wand for picking up most of the desired area for us intelligently, then followed by the fine tuning using drawing tool. Being said, I hope that the tool will be able to guess and pick up user’s selection (or at least most of them), then the user can add/remove spectral data from the edges using brush tool.

A step even further will be picking up the overtones automatically for the user, during the “magic wand” stage. However, the overtones can be a bit tricky to calculate, since their shapes are kinda skewed in linear view and we need to take logarithmic scale as reference when performing the computation (User can edit in logarithmic view but we cannot easily “select view” for the computation). Without the introduction of advance area approximation algorithm, a possible way can be sliding the fundamental frequency area across the frequency bins that are close to its multiples, then we can estimate and spot the overtones by calculating their spectral energy similarity.

Source Separation – GSoC 2021 Week 4

Hi all! Here are this week’s updates: 

Though the focus of the project is on making a source separation effect, a lot of the code written for this effect has shown to be generic enough that it can be used with any deep-learning based audio processor, given that it meets certain input-output constraints. Thus, we will be providing a way for researchers and deep learning practitioners to share their source separation (and more!) models with the Audacity community. 

The “Deep Learning Effect” infrastructure can be used with any PyTorch-based models that take a single-channel (multichannel optional) waveform, and output an arbitrary number of audio waveforms, which are then written to output tracks.   

This opens up the opportunity to make available an entire suite of different processors, like speech denoisers, speech enhancers, source separation, audio superresolution, etc., with contributions from the community. People will be able to upload the models they want to contribute to HuggingFace, and we will provide an interface for users to see and download these models from within Audacity. I will be working with nussl to provide wrappers and guidelines for making sure that the uploaded models are compatible with Audacity. 

I met with Ethan from the nussl team, as well as Jouni and Dmitry from the Audacity team. We talked about what the UX design would look like for using the Deep Learning effects in Audacity. In order to make these different models available to users, we plan on designing a package manager-style interface for installing and uninstalling deep models in Audacity. 

I made a basic wireframe of what the model manager UI would look like:

Goals for this week:

  • Work on the backend for the deep model manager in audacity. The manager should be able to 
    • Query HuggingFace for model repos that match certain tags (e.g. “Audacity”). 
    • Keep a collection of these repos, along with their metadata.
    • Search and filter through the repos with respect to different metadata fields.
    • Be able to install and uninstall different models upon request.

GSOC 2021 with Audacity – Week 4

This week I have performed several bug fixes and preparing for the last missing puzzle before the first evaluation – to perform FFT on selected frames, edit the spectral data and reverse it using IFFT, which the functions required have been modularized by Paul on the branch Noise-reduction-refactoring, where the SpectrumTransformer is introduced to the codebase.

Use EXPERIMENTAL flags on brush tool

The brush tool has now been refactored under the experimental flag EXPERIMENTAL_BRUSH_TOOL, which is a custom CMAKE flag during compile time, the features can now be safely merged to existing codebase, team members will be able to test the feature by simply reverse the flag. (Link to commit)

List of bug fixes

After applying the effect, the state is now properly appended to history stack, making the undo and redo possible. (Link to commit)

The trace of brush stroke will be distorted after undo and redo, it has now been fixed. (Link to commit)

The apply effect button will now be correctly triggered, even in waveform view. (Link to commit)

Rebase current brushtool branch and prepare for first deliverable!

The development of brush tool has now been rebased to Noise-reduction-refactoring, currently I am setting up new class called SpectrumDataManager, to encapsulate most of the spectral editing behind the scenes, with a worker class inherited from TrackSpectrumTransformer.