This week I have been working hard on adding a new feature called frequency snapping, and I have also made further optimizations to the brush tool.
The new cursor
For the old cursor, I had recycled the envelope cursor, which doesn’t look good once the radius of the brush is increased; the new cursor is positioned at the center of the brush.
Major change to brush stroke
In previous development, I used Bresenham’s algorithm to draw a thick line to mimic the brush stroke, which was unrealistic and left rough edges. I have modified the algorithm to draw a fully circular brush stroke.
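One common way to get a smooth, round-capped stroke (a sketch of the general technique, not the actual Audacity implementation) is to step along the Bresenham line between two sample points and stamp a filled circle at every step:

```python
def filled_circle(cx, cy, radius):
    """Return the set of integer (x, y) points inside a circle."""
    pts = set()
    for x in range(cx - radius, cx + radius + 1):
        for y in range(cy - radius, cy + radius + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                pts.add((x, y))
    return pts

def brush_stroke(x0, y0, x1, y1, radius):
    """Stamp a filled circle at each point of the Bresenham line
    from (x0, y0) to (x1, y1), yielding a fully circular stroke."""
    stroke = set()
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx, sy = (1 if x0 < x1 else -1), (1 if y0 < y1 else -1)
    err = dx + dy
    while True:
        stroke |= filled_circle(x0, y0, radius)
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return stroke
```

Because every stamp is a disc rather than a square pen, the edges of the stroke come out rounded regardless of the drag direction.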
The issue with the download progress gauge appearing in the bottom corner has been fixed, though the size of the gauge itself still needs tweaking.
In order to let the user know how large a model is prior to installing, model cards now show the model’s file size.
ModelCard (a class for containing model metadata) was refactored last week so that it doesn’t hold on to the JSON document, but rather serializes/deserializes only when downloading from HuggingFace or installing to disk.
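As a rough Python analogy for that design (field names are hypothetical; the real ModelCard is C++), the metadata lives in plain members and JSON appears only transiently at the I/O boundaries:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ModelCard:
    """Holds model metadata as plain fields; a JSON document exists
    only momentarily, at the download/install boundaries."""
    name: str
    author: str
    sample_rate: int

    @classmethod
    def deserialize(cls, doc: str) -> "ModelCard":
        # Called once, when metadata arrives from HuggingFace.
        return cls(**json.loads(doc))

    def serialize(self) -> str:
        # Called once, when installing the card to disk.
        return json.dumps(asdict(self))
```

The point of the refactor is that nothing else needs to hold or traverse a JSON tree; the card is just ordinary typed data in between.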
I’ve started work on a top panel for the model manager UI, which will contain the controls for refreshing repos, searching and filtering, as well as manually adding a repo.
In other news, Aldo Aguilar from the Interactive Audio Lab has been working on a Labeler effect built using EffectDeepLearning that will be capable of creating a label track with annotations for a given audio track. Possible applications of this effect include music tagging and speech-to-text, given that we can find pretrained models for both tasks.
Continue work on the top panel for the model manager UI.
Right now, the response content for deep models is all held in memory at once while installing. This causes an unnecessary amount of memory consumption. Instead we want to incrementally write the response data to disk.
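A minimal sketch of the intended streaming behavior (plain Python with hypothetical names; the real code goes through Audacity’s networking layer): write each chunk of the response to disk as it arrives, so peak memory stays at one chunk rather than the whole file:

```python
def install_model(chunks, dest_path, on_progress=None):
    """Write an iterable of byte chunks to dest_path incrementally.
    `chunks` stands in for a streamed HTTP response body."""
    total = 0
    with open(dest_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)          # data leaves memory immediately
            total += len(chunk)
            if on_progress:
                on_progress(total)  # e.g. update the progress gauge
    return total
```

With a real HTTP client, the iterator would yield pieces of the response as they come off the socket, so memory use no longer depends on model size.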
Dmitry pointed out that the deep model’s forward pass is blocking the UI thread, since it can process large selections of audio at a time. Though a straightforward solution is to cut up the audio into smaller chunks, some deep learning models require a longer context window and/or are non-causal. I will spend more time investigating potential solutions to this.
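One possible chunking scheme (a sketch under the assumption that the model tolerates a fixed amount of left/right context; not a decided solution) processes overlapping chunks and keeps only each chunk’s center, discarding the padded context:

```python
def chunked_forward(model, audio, chunk, context):
    """Run `model` over `audio` in pieces of length `chunk`, padding
    each piece with `context` extra samples on both sides and trimming
    them back off the output.  Assumes the model maps N input samples
    to N output samples."""
    out = []
    for start in range(0, len(audio), chunk):
        lo = max(0, start - context)
        hi = min(len(audio), start + chunk + context)
        y = model(audio[lo:hi])
        keep = min(chunk, len(audio) - start)
        # Keep only the center region [start, start + keep).
        out.extend(y[start - lo : start - lo + keep])
    return out
```

This only approximates the full-selection result when `context` covers the model’s receptive field, which is exactly why non-causal or long-context models make this tricky.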
Layout work for model manager UI. Right now, most elements look out of place. I haven’t spent as much time on this because I’d like to finish writing the core logic of the DeepModelManager before digging into the details of the UI.
This week’s focus will be potential bug fixes for the brush tool prototype, and planning for the next rollout, which will contain more exciting features that I would like to bring to the community!
Control panel for the brush tool
Instead of the temporary red button, I have implemented a non-modal dialog for the control panel. It took longer to develop than I expected, since I wanted to implement the dialog in a way native to the Audacity codebase. I used AttachedWindows and AttachedObjects to decouple the dependencies between SpectralDataManager, SpectralDialog, etc., so that when the user clicks on the brush tool icon, the dialog is created on demand.
The back end for overtones and smart selection is yet to be completed, but I prefer to set up the front end first for prototyping, to gain early feedback from the team regarding the UI design.
More incoming features!
Now that the second stage of the GSoC program has begun, there are two or more features that I would like to complete before the second evaluation. Overtone selection and threshold re-selection are in fact similar features, both based on smart selection. I will need to modify the existing SpectrumTransformer to consider more windows in the calculation; in fact, I prefer to set a fixed length for the smart selection to function properly, since it seems rather inappropriate to take the whole track into the calculation.
Variable brush size and preview
A slider front end has been added to adjust the brush radius in real time. It would be user-friendly if we could replace the existing cursor with a preview of the predicted radius. However, the current cursor implementation takes a bitmap file and a fixed size as input, so we cannot simply scale up the bitmap as the radius increases; a workaround is to set an empty cursor and draw the brush preview manually in real time.
However, another challenge comes with the rendering routine of UIHandle: it does not necessarily call Draw() on hover, so for now we have to drag or click manually to make the drawing visible.
There aren’t many updates for this week. I spent the past week cleaning out bugs in the model manager related to networking and threading. I hit a block around Wednesday, when the deep learning effect stopped showing up in the Plugin Manager entirely. It took me a couple of days to figure out, but I’m back on track now, and I’m ready to keep the ball rolling.
Fix a bug where the download progress gauge appears in the bottom left corner of the ModelCardPanel, instead of on top of the install button.
Refactor ModelCard so that we serialize/deserialize the internal JSON object only when necessary.
Add a top panel for the model manager UI, with the following functionality:
- Search through model cards by:
  - Domain (music, speech, etc.)
  - Task (separation, enhancement)
  - Other metadata keys
- Manually add a HuggingFace repo
- If a model is installed and there’s a newer version available, let the user know.
I’ve made progress on the Model Manager! Right now, all HuggingFace repositories with the tag “audacity” are downloaded and displayed as model cards (as seen below). If a user chooses to install a model, the model manager queries HuggingFace for the actual model file (the heavy stuff) and installs it into a local directory. This interface lets users choose from a variety of Deep Learning models trained by contributors around the world for a wide variety of applications.
It has been an exciting week for me, after the completion of the first brush tool prototype! Currently, each windowed sample is passed via SpectrumTransformer into the newly added SpectralDataManager, which checks against the previously selected data and zeros out all the selected frequency bins. (Link to the commit)
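A simplified NumPy sketch of the idea (non-overlapping rectangular windows for brevity; the real code goes through SpectrumTransformer’s windowing): take the FFT of each window, zero the bins selected in that window, and inverse-FFT back to samples:

```python
import numpy as np

def zero_selected_bins(audio, window_size, selected):
    """For each non-overlapping window, zero the frequency bins listed
    in `selected` (a dict: window index -> list of bin indices), then
    reconstruct the samples with an inverse FFT."""
    out = np.array(audio, dtype=float)
    n_windows = len(audio) // window_size
    for w in range(n_windows):
        sl = slice(w * window_size, (w + 1) * window_size)
        spectrum = np.fft.rfft(out[sl])
        for b in selected.get(w, []):
            spectrum[b] = 0.0  # erase the selected frequency content
        out[sl] = np.fft.irfft(spectrum, n=window_size)
    return out
```

Windows with no selection round-trip through FFT/IFFT unchanged, so only the brushed region of the spectrogram is affected.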
The brush tool demo
I have chosen “Killing Me Softly” by Roberta Flack (one of my all-time favorites!); a four-second snippet has been extracted from the beginning. I have also added a meow sound to it, since we all love cats and, more importantly, it contains pitch variation which cannot be effectively selected by the current tool (a horizontal line).
To use the brush tool, we simply drag through the meow sound and its overtones, then click the apply button; the selected frequency content will then be zeroed out.
The full demo video is available here (with before vs. after audio comparison):
This is still a rather early stage for this new feature, and there are lots of potential improvements. For instance, we can definitely do better than zeroing out all the selected frequency bins, e.g. by averaging samples from the non-selected windows (horizontally) or frequencies (vertically), or both!
Moreover, I would also like to make the selection smarter. In photo editing, if we were to remove or isolate a subject from the background, we would rely on tools like the magic wand to intelligently pick up most of the desired area for us, followed by fine-tuning with a drawing tool. That being said, I hope that the tool will be able to guess and pick up the user’s selection (or at least most of it), after which the user can add/remove spectral data around the edges using the brush tool.
A step even further would be picking up the overtones automatically for the user during the “magic wand” stage. However, the overtones can be a bit tricky to calculate, since their shapes are skewed in the linear view and we need to take the logarithmic scale as a reference when performing the computation (the user can edit in the logarithmic view, but we cannot easily “select a view” for the computation). Without introducing an advanced area-approximation algorithm, a possible approach is to slide the fundamental frequency area across the frequency bins close to its multiples; we can then estimate and spot the overtones by calculating their spectral energy similarity.
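A toy version of that idea (assuming a precomputed magnitude spectrum and a known fundamental bin; the tolerance and threshold values are made up for illustration): look at the bins around each integer multiple of the fundamental and keep those whose energy stands out relative to it:

```python
def find_overtones(magnitudes, f0_bin, n_harmonics=8, tolerance=1, ratio=0.1):
    """Return the bins near integer multiples of f0_bin whose magnitude
    is at least `ratio` times the fundamental's, searching within
    +/- `tolerance` bins of each multiple (harmonics drift slightly
    off exact multiples in practice)."""
    base = magnitudes[f0_bin]
    overtones = []
    for k in range(2, n_harmonics + 1):
        center = k * f0_bin
        if center >= len(magnitudes):
            break
        lo = max(0, center - tolerance)
        hi = min(len(magnitudes), center + tolerance + 1)
        best = max(range(lo, hi), key=lambda b: magnitudes[b])
        if magnitudes[best] >= ratio * base:
            overtones.append(best)
    return overtones
```

A real implementation would compare whole spectral neighborhoods (energy similarity) rather than single bins, and would repeat this per window as the fundamental moves.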
Though the focus of the project is on making a source separation effect, a lot of the code written for this effect has shown to be generic enough that it can be used with any deep-learning based audio processor, given that it meets certain input-output constraints. Thus, we will be providing a way for researchers and deep learning practitioners to share their source separation (and more!) models with the Audacity community.
The “Deep Learning Effect” infrastructure can be used with any PyTorch-based models that take a single-channel (multichannel optional) waveform, and output an arbitrary number of audio waveforms, which are then written to output tracks.
This opens up the opportunity to make available an entire suite of different processors, like speech denoisers, speech enhancers, source separation, audio superresolution, etc., with contributions from the community. People will be able to upload the models they want to contribute to HuggingFace, and we will provide an interface for users to see and download these models from within Audacity. I will be working with nussl to provide wrappers and guidelines for making sure that the uploaded models are compatible with Audacity.
I met with Ethan from the nussl team, as well as Jouni and Dmitry from the Audacity team. We talked about what the UX design would look like for using the Deep Learning effects in Audacity. In order to make these different models available to users, we plan on designing a package manager-style interface for installing and uninstalling deep models in Audacity.
I made a basic wireframe of what the model manager UI would look like:
Goals for this week:
Work on the backend for the deep model manager in Audacity. The manager should be able to:
- Query HuggingFace for model repos that match certain tags (e.g. “Audacity”).
- Keep a collection of these repos, along with their metadata.
- Search and filter through the repos with respect to different metadata fields.
- Install and uninstall different models upon request.
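A rough sketch of the first and third items (in Python for brevity; the real manager is C++, and the exact endpoint parameters are my assumption about the public HuggingFace Hub API): build the tag query, then filter the fetched metadata locally:

```python
def build_query_url(tag):
    # HuggingFace exposes a model-listing endpoint that can filter by
    # tag; treat the exact parameter name here as an assumption.
    return "https://huggingface.co/api/models?filter=" + tag

def filter_repos(repos, **criteria):
    """Keep only repos whose metadata matches every given field,
    e.g. filter_repos(repos, task="separation", domain="music")."""
    return [r for r in repos
            if all(r.get(k) == v for k, v in criteria.items())]
```

The manager would fetch the URL once per refresh, cache the repo list, and run `filter_repos`-style matching as the user types in the search box.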
This week I have performed several bug fixes and prepared for the last missing piece before the first evaluation: performing FFT on the selected frames, editing the spectral data, and reversing it using IFFT. The required functions have been modularized by Paul on the branch Noise-reduction-refactoring, where the SpectrumTransformer is introduced to the codebase.
Use EXPERIMENTAL flags on brush tool
The brush tool has now been refactored under the experimental flag EXPERIMENTAL_BRUSH_TOOL, a custom CMake flag set at compile time. The feature can now be safely merged into the existing codebase, and team members will be able to test it by simply toggling the flag. (Link to commit)
List of bug fixes
After applying the effect, the state is now properly appended to the history stack, making undo and redo possible. (Link to commit)
The trace of the brush stroke was distorted after undo and redo; this has now been fixed. (Link to commit)
The apply effect button will now be correctly triggered, even in waveform view. (Link to commit)
Rebase the current brush tool branch and prepare for the first deliverable!
The development of the brush tool has now been rebased onto Noise-reduction-refactoring. Currently I am setting up a new class called SpectrumDataManager to encapsulate most of the spectral editing behind the scenes, with a worker class inherited from TrackSpectrumTransformer.