We want to hear from you: if you need help with Audacity, where do you go? How do you use our manual and other support options? Please answer this survey: https://www.surveymonkey.com/r/MH3TMZX
It will be really helpful for us to know what you think!
This is the second-to-last week of the GSoC program. I have finalized the majority of the new code, and I have been meeting with Paul more frequently for code review.
The over-computation
Currently, the brush stroke is calculated with Bresenham’s algorithm in the mouse coordinate system. However, the data we collect requires more calculation than the FFT transform can handle; in other words, we collect too much spectral data but can only process a limited amount of it. Therefore, the whole brush stroke calculation needs to be refactored into sampleCount-hop vs. frequency-bin space, so we do not waste computation on the area between Fourier transform windows.
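As a rough illustration of that refactor, here is a minimal Python sketch of mapping a brush point from screen coordinates into (hop, frequency-bin) space. The window size, hop size, and sample rate below are assumptions for the example, not Audacity’s actual settings:

```python
def screen_to_spectral(x_time_sec, y_freq_hz, sample_rate=44100,
                       window_size=2048, hop_size=512):
    """Map a brush point (seconds, Hz) to an (FFT hop index, bin index).

    Sketch only: parameter values are illustrative defaults, not
    Audacity's. Working per (hop, bin) cell means one unit of work per
    analysis window, instead of per screen pixel.
    """
    sample = int(x_time_sec * sample_rate)
    hop = sample // hop_size                  # which analysis window
    bin_width = sample_rate / window_size     # Hz covered by one bin
    fbin = int(round(y_freq_hz / bin_width))  # nearest frequency bin
    return hop, fbin
```

With these defaults, a brush point at 1.0 s and 1000 Hz lands in hop 86, bin 46; pixels between two hops map to the same cell and cost nothing extra.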
The code review
My mentor Paul has been reviewing my code since last week and has given me extremely detailed and helpful comments. Some concern only code style or unused header imports, but he has also spotted and pointed out critical bugs. I am currently resolving his comments; the history of the conversation can be viewed in this PR link.
Before the final week
I hope to complete the transformation and refactoring into hop vs. bin space before next week, so we can optimize the frequency snapping and launch it as soon as possible.
This week I finished one additional feature: frequency snapping. This optional feature allows users to select spectral data more accurately.
The frequency snapping
Frequency snapping is an optional feature associated with smart selection in the spectral editing dialog. It allows more precise selection: the brush stroke is calculated and snapped to the nearest frequency bin with the highest spectral energy.
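A minimal sketch of the snapping idea, assuming we already have the magnitude spectrum of one analysis window (the function and parameter names are hypothetical, not Audacity’s):

```python
def snap_to_peak_bin(spectrum, target_bin, search_radius=3):
    """Snap a brushed frequency bin to the nearby bin with the highest
    spectral energy.

    spectrum:      magnitudes for one analysis window, indexed by bin
    target_bin:    the bin the cursor actually touched
    search_radius: how many bins to search on each side (an assumption)
    """
    lo = max(0, target_bin - search_radius)
    hi = min(len(spectrum), target_bin + search_radius + 1)
    return max(range(lo, hi), key=lambda b: spectrum[b])
```

For example, if the cursor lands on bin 2 but a strong partial sits at bin 4 within the search radius, the stroke snaps to bin 4.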
Preparing for PR and final review
Originally I had approximately 50+ commits, which can be overwhelming for code review, considering that some of the commits in between were already obsolete, while others reverted or refactored previously written code. I rebased the whole branch to pick out the important updates, reordering and combining multiple commits, and encountered quite a lot of conflicts that needed to be resolved.
Fixed several other bugs in the Model Manager and its UI (github).
To do
Start writing documentation for model contributors. The documentation should provide instructions on how to properly structure a HuggingFace repo for an Audacity model, write a metadata file, and properly export the deep model to TorchScript, ensuring that it meets the input/output constraints in Audacity.
Continue to fix open issues with the model manager.
Make ModelCards collapsible. Right now, only 2-3 can be shown on screen at a time. It may be a good idea to offer a collapsed view of the ModelCard.
Provide a hyperlink (or a more info button) that points to the model’s HuggingFace readme somewhere in the ModelCard panel, so users can view more information about the model online (e.g. datasets, benchmarks, performance, examples).
This week I have been working hard on adding a new feature called frequency snapping, along with other optimizations of the brush tool.
The new cursor
For the old cursor, I had recycled the envelope cursor, which doesn’t look good enough when the brush radius is increased. The new cursor is positioned in the middle of the brush.
Crosshair cursor
Major change to brush stroke
In previous development, I used Bresenham’s algorithm to draw a thick line to mimic the brush stroke, which is not realistic and shows rough edges. I have modified the algorithm to draw a fully circular brush stroke.
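One simple way to get a fully circular stroke is to stamp a filled disc at each point along the dragged segment, rather than thickening a single Bresenham line. This toy Python version works on integer grid cells and is only a sketch of the idea, not Audacity’s implementation:

```python
def circular_brush_stroke(x0, y0, x1, y1, radius):
    """Return the set of integer cells covered by dragging a filled
    circular brush of the given radius from (x0, y0) to (x1, y1).

    Sketch only: steps linearly along the segment and stamps a disc at
    each step, so the stroke has round caps and smooth edges instead of
    the flat ends of a thickened line.
    """
    cells = set()
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        cx = x0 + (x1 - x0) * i // steps
        cy = y0 + (y1 - y0) * i // steps
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                if dx * dx + dy * dy <= radius * radius:
                    cells.add((cx + dx, cy + dy))
    return cells
```

A zero-length stroke with radius 1 stamps a five-cell disc; a radius-0 stroke degenerates to the bare line, which makes the difference from the old approach easy to see.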
The issue related to the download progress gauge appearing on the bottom corner has been fixed, though the size of the gauge itself still needs tweaking.
To let users know how large a model is before installing it, model cards now show the model’s file size.
ModelCard (a class for containing model metadata) was refactored last week so that it doesn’t hold on to the JSON document, but rather serializes/deserializes only when downloading from HuggingFace or installing to disk.
I’ve started work on a top panel for the model manager UI, which will contain the controls for refreshing repos, searching and filtering, as well as manually adding a repo.
In other news, Aldo Aguilar from the Interactive Audio Lab has been working on a Labeler effect built using EffectDeepLearning that will be capable of creating a label track with annotations for a given audio track. Possible applications of this effect include music tagging and speech-to-text, given that we can find pretrained models for both tasks.
To do
Continue work on the top panel for the model manager UI.
Right now, the response content for deep models is held in memory all at once while installing, which causes unnecessary memory consumption. Instead, we want to incrementally write the response data to disk.
Dmitry pointed out that the deep model’s forward pass is blocking the UI thread, since it can process large selections of audio at a time. Though a straightforward solution is to cut up the audio into smaller chunks, some deep learning models require a longer context window and/or are non-causal. I will spend more time investigating potential solutions to this.
Layout work for model manager UI. Right now, most elements look out of place. I haven’t spent as much time on this because I’d like to finish writing the core logic of the DeepModelManager before digging into the details of the UI.
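For the incremental-write item above, the general pattern is to read the response in fixed-size chunks and append each chunk to disk as it arrives, so memory use stays bounded by the chunk size. A Python sketch of the idea (the real manager is C++, so everything here is illustrative):

```python
import urllib.request

def download_to_disk(url, dest_path, chunk_size=8192):
    """Stream a response body to disk in fixed-size chunks instead of
    buffering the whole file in memory.

    Sketch only: chunk_size and names are assumptions; the point is
    that peak memory is ~chunk_size, not the full model size.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
```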
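For the blocking forward pass, one candidate approach is overlapped chunking: give the model extra context samples on each side of every chunk, then keep only the center of each output, so even non-causal models see enough surrounding audio. A toy sketch, with every parameter name an assumption:

```python
def chunked_forward(audio, model, chunk=4096, context=512):
    """Run `model` over `audio` in chunks, padding each chunk with
    `context` extra samples on both sides and keeping only the center.

    Sketch only: assumes a sample-aligned model (output length equals
    input length). Real non-causal models may need larger context or
    cross-fading between chunks.
    """
    out = []
    n = len(audio)
    for start in range(0, n, chunk):
        lo = max(0, start - context)
        hi = min(n, start + chunk + context)
        processed = model(audio[lo:hi])
        # Keep only the center region; the context was just conditioning.
        keep = min(chunk, n - start)
        out.extend(processed[start - lo : start - lo + keep])
    return out
```

With an identity model the chunked output reconstructs the input exactly, which is a quick sanity check that the trimming indices are right.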
This week’s focus will be potential bug fixes for the brush tool prototype, and planning for the next rollout, containing more exciting features that I would like to bring to the community!
Control panel for the brush tool
Instead of the temporary red button, I have implemented a non-modal dialog for the control panel. It took longer to develop than I expected, since I wanted to implement the dialog the native way in the Audacity codebase. I used AttachedWindows and AttachedObjects to decouple the dependencies between SpectralDataManager, SpectralDialog, etc., so when users click on the brush tool icon, the dialog is created on demand.
The back-end for overtones and smart selection is yet to be completed, but I preferred to set up the front-end first for prototyping, to gain early feedback from the team on the UI design.
More incoming features!
We have come to the second stage of the GSoC program, and there are two or more features that I would like to complete before the second evaluation. Overtone selection and threshold re-selection are indeed similar features, both based on smart selection. I will need to modify the existing SpectrumTransformer to consider more windows in the calculation; in fact, I prefer to set a fixed length for smart selection to function properly, since it seems rather inappropriate to take the whole track into the calculation.
Variable brush size and preview
A slider front-end has been added to adjust the brush radius in real time. It would be user-friendly to show the predicted radius in place of the existing cursor. However, the current cursor implementation takes a bitmap file and a fixed size as input, so we can’t simply scale up the bitmap as the radius increases. A workaround is to use an empty cursor and draw the brush preview manually in real time.
Large brush radius
Small brush radius
However, here comes another challenge with the rendering routine of UIHandle: it doesn’t necessarily call Draw() on hover, so we currently have to drag or click to make the drawing visible.
There aren’t many updates for this week. I spent the past week cleaning out bugs in the model manager related to networking and threading. I hit a block around Wednesday, when the deep learning effect stopped showing up in the Plugin Manager entirely. It took me a couple of days to figure out, but I’m back on track now, and I’m ready to keep the ball rolling.
To do:
Fix a bug where download progress gauge appears in the bottom left corner of the ModelCardPanel, instead of on top of the install button.
Refactor ModelCard, so that we serialize/deserialize the internal JSON object only when necessary.
Add a top panel for the model manager UI, with the following functionality
Search through model cards
Filter by
Domain (music, speech, etc.)
Task (separation, enhancement)
Other metadata keys
Manually add a HuggingFace repo
If a model is installed and there’s a newer version available, let the user know.
I’ve made progress on the Model Manager! Right now, all HuggingFace repositories with the tag “audacity” are downloaded and displayed as model cards (as seen below). If a user chooses to install a model, the model manager queries HuggingFace for the actual model file (the heavy stuff) and installs it into a local directory. This interface lets users choose from a variety of Deep Learning models trained by contributors around the world for a wide variety of applications.