GSOC 2021 with Audacity – Week 9

This is the second-to-last week of the GSOC program. I have finalized the majority of the new code, and I have been meeting more frequently with Paul for the code review.

The over-computation

Currently, the brush stroke is calculated with Bresenham’s algorithm in the mouse coordinate system. However, the data collected this way requires more calculation than the FFT can make use of. In other words, we collect too much spectral data but are only able to process a limited amount of it. Therefore, the whole brush stroke calculation needs to be refactored to work in sampleCount hops vs frequency bins, so that we do not waste computation power on the area between each Fourier transform window.
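To illustrate the idea, here is a minimal sketch of mapping stroke points into hop vs bin space. It is written in Python rather than Audacity's C++, and the function names, the window size, and the hop size are assumptions made for the example, not the actual implementation. Many nearby mouse positions collapse into the same (hop, bin) cell, so each cell only needs to be stored once.

```python
def to_hop_and_bin(time_s, freq_hz, sample_rate, window_size, hop_size):
    """Map a (time, frequency) point to a (hop index, bin index) cell.

    All names here are hypothetical; this only illustrates the mapping.
    """
    sample = int(round(time_s * sample_rate))
    hop = sample // hop_size                      # which analysis window
    bin_width = sample_rate / window_size         # Hz covered by one FFT bin
    nyquist_bin = window_size // 2
    bin_index = int(round(freq_hz / bin_width))
    bin_index = min(max(bin_index, 0), nyquist_bin)  # clamp to valid bins
    return hop, bin_index


def stroke_to_cells(points, sample_rate, window_size, hop_size):
    """Convert stroke points to unique cells, keeping first-seen order."""
    seen = set()
    cells = []
    for time_s, freq_hz in points:
        cell = to_hop_and_bin(time_s, freq_hz, sample_rate,
                              window_size, hop_size)
        if cell not in seen:
            seen.add(cell)
            cells.append(cell)
    return cells
```

With a 44100 Hz sample rate, a 2048-sample window, and a 512-sample hop, two points one millisecond apart fall into the same cell, so the second one adds no work.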

The code review

My mentor Paul has been reviewing my code since last week and has given me extremely detailed and helpful comments. Some of them concern code style or unused header imports, but he has also spotted and pointed out critical bugs. I am currently resolving his comments; the history of the conversations can be viewed in this PR link.

Before the final week

I hope that the transformation and refactoring to hop vs bin space will be completed before next week, so we can try to optimize the frequency snapping and launch it as soon as possible.

GSOC 2021 with Audacity – Week 8

This week I have finished one additional feature, frequency snapping. This optional feature allows users to select the spectral data more accurately.

The frequency snapping

It is an optional feature associated with the smart selection in the spectral editing dialog, and it allows more precise selection by the user. The brush stroke is calculated and snapped to the nearest frequency bins with the highest spectral energy.
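As a rough illustration of snapping, the sketch below looks at the bins around the one under the cursor and picks the one with the highest energy. It is a Python sketch under assumptions: the function name, the search radius, and the plain list of magnitudes are hypothetical and do not reflect how Audacity stores its spectral data.

```python
def snap_to_peak(magnitudes, bin_index, search_radius):
    """Return the bin with the highest magnitude near bin_index.

    magnitudes    -- spectral magnitudes for one analysis window
    bin_index     -- bin under the cursor
    search_radius -- how many bins to inspect on each side (hypothetical)
    """
    lo = max(0, bin_index - search_radius)
    hi = min(len(magnitudes) - 1, bin_index + search_radius)
    # On equal magnitudes, prefer the bin closest to the cursor.
    return max(range(lo, hi + 1),
               key=lambda b: (magnitudes[b], -abs(b - bin_index)))
```

A radius of zero leaves the selection where the user painted it, which is one simple way to make the feature optional.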

Preparing for PR and final review

Originally I had more than 50 commits, which can be overwhelming for the code review, considering that some of the commits in between were already obsolete, while others reverted or refactored previously written code. I have tried to rebase the whole branch and pick the important updates, reordering and combining multiple commits, and I encountered quite a lot of conflicts that needed to be resolved.

Source Separation – GSoC 2021 Week 8

Hi all! Here are some updates for this week. 

  • I cleaned up the commit history for the deep learning implementation and opened a pull request in the official Audacity repo. 
  • Added a dialog for manually specifying a HuggingFace repo to fetch (github). 
  • Fixed a bug where ModelCards weren’t scrollable until the user manually resized the window (github).
  • Amended the download behavior so the downloaded model file is written to file incrementally, lowering memory consumption (github). 
  • Added sorting to ModelCard panels (github).
  • Fixed several other bugs in the Model Manager and its UI (github).
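
The incremental download behavior mentioned above can be sketched as follows. This is a Python illustration, not the C++ code in the pull request; the function name, the chunk size, and the progress callback are assumptions. The source and sink are any file-like objects, so in real use the sink would be a file opened in binary write mode.

```python
import io


def copy_in_chunks(source, sink, chunk_size=64 * 1024, on_progress=None):
    """Copy source to sink one chunk at a time and return the byte count.

    Only one chunk is held in memory at once, instead of the whole file.
    on_progress, if given, receives the running total (e.g. for a gauge).
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # empty read means the stream is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
        if on_progress is not None:
            on_progress(total)
    return total
```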

To do

  • Start writing documentation for model contributors. The documentation should provide instructions on how to properly structure a HuggingFace repo for an Audacity model, write a metadata file, and properly export the deep model to TorchScript, ensuring that it meets the input/output constraints in Audacity. 
  • Continue to fix open issues with the model manager. 
  • Make ModelCards collapsible. Right now, only 2-3 can be shown on screen at a time. It may be a good idea to offer a collapsed view of the ModelCard. 
  • Provide a hyperlink (or a more info button) that points to the model’s HuggingFace readme somewhere in the ModelCard panel, so users can view more information about the model online (e.g. datasets, benchmarks, performance, examples).

GSOC 2021 with Audacity – Week 7

This week I have been working hard on adding a new feature called frequency snapping. I have also added other optimizations to the brush tool.

The new cursor

For the old cursor, I recycled the envelope cursor, which doesn’t look good enough once we increase the radius of the brush. The new cursor is positioned in the middle of the brush.

Major change to brush stroke

In previous development, I used Bresenham’s algorithm to draw a thick line to mimic the brush stroke. This was not realistic, and rough edges could be observed, so I have modified the algorithm to draw a fully circular brush stroke.
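One way to get a circular stroke is to walk the line between two mouse positions with Bresenham's algorithm and stamp a filled disc at every point along it. The Python sketch below shows that approach under assumptions: the function names are hypothetical, and it is not necessarily how the brush tool in Audacity is implemented.

```python
def bresenham_line(x0, y0, x1, y1):
    """Integer points on the line from (x0, y0) to (x1, y1), inclusive."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:       # step in x
            err += dy
            x0 += sx
        if e2 <= dx:       # step in y
            err += dx
            y0 += sy
    return points


def disc(cx, cy, radius):
    """All integer points within radius of (cx, cy)."""
    return {(cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if dx * dx + dy * dy <= radius * radius}


def circular_stroke(x0, y0, x1, y1, radius):
    """Union of discs stamped along the line: a round-capped stroke."""
    covered = set()
    for cx, cy in bresenham_line(x0, y0, x1, y1):
        covered |= disc(cx, cy, radius)
    return covered
```

Using a set means overlapping discs do not produce duplicate points, at the cost of recomputing each disc along the way.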

Source Separation – GSoC 2021 Week 7

Hi all! Here are some updates for this week:

  • The issue related to the download progress gauge appearing on the bottom corner has been fixed, though the size of the gauge itself still needs tweaking. 
  • In order to let the user know how large a model is prior to installing, model cards now show the model’s file size.
  • ModelCard (a class for containing model metadata) was refactored last week so that it doesn’t hold on to the JSON document, but rather serializes/deserializes only when downloading from HuggingFace or installing to disk.
  • I’ve started work on a top panel for the model manager UI, which will contain the controls for refreshing repos, searching and filtering, as well as manually adding a repo.
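
The ModelCard refactor described above can be pictured with the following Python sketch. It is only an analogy for the C++ class: the field names and the JSON keys are hypothetical. The point is that the card keeps plain typed fields and touches JSON only at the boundaries, when loading from a download or writing to disk.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ModelCard:
    # Hypothetical metadata fields, chosen for illustration only.
    name: str
    author: str
    domain: str
    task: str
    model_size_bytes: int = 0

    @classmethod
    def from_json(cls, text):
        """Deserialize once, e.g. right after fetching the metadata."""
        doc = json.loads(text)
        return cls(
            name=doc["name"],
            author=doc["author"],
            domain=doc["domain"],
            task=doc["task"],
            model_size_bytes=int(doc.get("model_size_bytes", 0)),
        )

    def to_json(self):
        """Serialize once, e.g. when installing the card to disk."""
        return json.dumps(asdict(self), sort_keys=True)
```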

In other news, Aldo Aguilar from the Interactive Audio Lab has been working on a Labeler effect built using EffectDeepLearning that will be capable of creating a label track with annotations for a given audio track. Possible applications of this effect include music tagging and speech-to-text, given that we can find pretrained models for both tasks. 

To do

  • Continue work on the top panel for the model manager UI. 
  • Right now, the response content for deep models is all held in memory at once while installing. This causes an unnecessary amount of memory consumption. Instead we want to incrementally write the response data to disk. 
  • Dmitry pointed out that the deep model’s forward pass is blocking the UI thread, since it can process large selections of audio at a time. Though a straightforward solution is to cut up the audio into smaller chunks, some deep learning models require a longer context window and/or are non-causal. I will spend more time investigating potential solutions to this. 
  • Layout work for model manager UI. Right now, most elements look out of place. I haven’t spent as much time on this because I’d like to finish writing the core logic of the DeepModelManager before digging into the details of the UI. 
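
For the chunking idea in the list above, one possible approach (not a chosen solution) is to give each chunk some extra context on both sides and keep only the middle part of the output. The Python sketch below assumes the model returns exactly one output sample per input sample; the function names, chunk size, and context length are hypothetical. It does not help models that need the entire selection as context, and it would still have to run off the UI thread.

```python
def chunk_with_context(num_samples, chunk_size, context):
    """Return (read_start, read_end, keep_start, keep_end) per chunk."""
    spans = []
    for keep_start in range(0, num_samples, chunk_size):
        keep_end = min(keep_start + chunk_size, num_samples)
        read_start = max(0, keep_start - context)       # left context
        read_end = min(num_samples, keep_end + context)  # right context
        spans.append((read_start, read_end, keep_start, keep_end))
    return spans


def process_in_chunks(samples, model, chunk_size, context):
    """Run model chunk by chunk, discarding the context from each output."""
    output = []
    spans = chunk_with_context(len(samples), chunk_size, context)
    for read_start, read_end, keep_start, keep_end in spans:
        processed = model(samples[read_start:read_end])
        # Offsets are relative to the start of the padded chunk.
        output.extend(processed[keep_start - read_start:keep_end - read_start])
    return output
```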

Audacity 3.0.3 is out now!

We’re happy to announce our latest release: Audacity 3.0.3.

Key improvements:

  • The Windows version of Audacity is now 64-bit
    • (Note: 32-bit plug-ins will not work on 64-bit Audacity)
  • We have improved the default spectrogram colours
  • We now provide an official binary for Linux in the form of an AppImage
  • Audacity can now check to see if there is a newer version available
  • Users are now able to send us the details of a serious error, if one occurs
  • Multiple bugs fixed

You can download Audacity 3.0.3 for Windows, macOS and GNU/Linux at audacityteam.org/download.

GSOC 2021 with Audacity – Week 6

This week’s focus will be potential bug fixes for the brush tool prototype, and planning for the next rollout, containing more exciting features that I would like to bring to the community!

Control panel for the brush tool

Instead of the temporary red button, I have implemented a non-modal dialog for the control panel. It took more development time than I expected, since I wanted to use a native way of implementing dialogs in the Audacity codebase. I have used AttachedWindows and AttachedObjects to decouple the dependencies between SpectralDataManager, SpectralDialog, etc., so when users click on the brush tool icon, the dialog is created on demand.

The back-end for overtones and smart selection is yet to be completed, but I prefer to set up the front-end first for prototyping and to gain early feedback from the team regarding the UI design.

More incoming features!

We have reached the second stage of the GSOC program, and there are two or more features that I would like to complete before the second evaluation. Overtone selection and threshold re-selection are in fact similar features, both based on smart selection. I would need to modify the existing SpectrumTransformer to consider more windows in the calculation. I prefer to set a fixed length for the smart selection to function properly, since it seems rather inappropriate to take the whole track into the calculation.

Variable brush size and preview

A slider front-end has been added to adjust the brush radius in real time. It would be user-friendly if we could show the predicted radius in place of the existing cursor. However, the current cursor implementation takes a bitmap file and a fixed size as input, so we can’t simply scale up the bitmap as the radius increases. A workaround is to use an empty cursor and draw the brush preview manually in real time.

However, this brings another challenge with the rendering routine of UIHandle: it doesn’t necessarily call Draw() when hovering, so we need to drag or click manually to make the drawing visible.

Source Separation – GSoC 2021 Week 6

Hi all! 

There aren’t many updates for this week. I spent the past week cleaning out bugs in the model manager related to networking and threading. I hit a block around Wednesday, when the deep learning effect stopped showing up in the Plugin Manager entirely. It took a couple of days for me to figure out, but I’m back on track now, and I’m ready to keep the ball rolling. 

To do:

  • Fix a bug where the download progress gauge appears in the bottom left corner of the ModelCardPanel, instead of on top of the install button. 
  • Refactor ModelCard, so that we serialize/deserialize the internal JSON object only when necessary. 
  • Add a top panel for the model manager UI, with the following functionality
    • Search through model cards
    • Filter by
      • Domain (music, speech, etc.)
      • Task (separation, enhancement)
      • Other metadata keys
    • Manually add a HuggingFace repo
  • If a model is installed and there’s a newer version available, let the user know.
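
The planned search and filter functionality could work roughly like the Python sketch below. It is an illustration only: cards are represented as plain dictionaries, and the function name and metadata keys are assumptions rather than the model manager's actual data model.

```python
def filter_cards(cards, query="", **required):
    """Return cards that contain query and match every required key.

    cards    -- list of dicts holding model metadata (hypothetical keys)
    query    -- case-insensitive text searched across all metadata values
    required -- exact-match filters, e.g. domain="music", task="separation"
    """
    query = query.lower()
    matches = []
    for card in cards:
        text = " ".join(str(value) for value in card.values()).lower()
        if query and query not in text:
            continue
        if all(card.get(key) == value for key, value in required.items()):
            matches.append(card)
    return matches
```

Passing the filters as keyword arguments means other metadata keys can be filtered the same way without changing the function.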

Source Separation – GSoC 2021 Week 5

Hi all! Here are this week’s updates: 

I’ve made progress on the Model Manager! Right now, all HuggingFace repositories with the tag “audacity” are downloaded and displayed as model cards (as seen below). If a user chooses to install a model, the model manager queries HuggingFace for the actual model file (the heavy stuff) and installs it into a local directory. This interface lets users choose from a variety of Deep Learning models trained by contributors around the world for a wide variety of applications.

To do: 

  • GUI work
  • searching and filtering between model cards
  • Grab a music separation model!
A prettier GUI coming soon!