Rethinking the Manual

Hello

In this post, I want to introduce our plans to create a new online manual, which is intended to replace the current one found here.

First, it is worth explaining why we want to replace the existing manual, which has been meticulously kept up to date by a handful of dedicated contributors. The existing system is built on MediaWiki, which unfortunately comes with a number of problems that limit its overall usefulness: for example, due to spam problems, it has not been open to contributions from the public for a long time. As a result, the number of active editors has been very small (around 3 or 4 people), and the manual has not been translated in quite a long time, which we feel is a big disadvantage.

Secondly, the manual was being packaged with the installer, which meant that it needed to be completely up-to-date prior to a release. Apart from being an unnecessary release blocker, this also meant that any corrections or optimisations to the manual could only be published whenever a new version of Audacity was released.

Thirdly, it was written to serve a dual purpose, being both a developer reference and a user manual, which made it quite complicated – a problem exacerbated by not having a search function. As a result, around 90% of Audacity users were using web search engines such as Google to get help with Audacity, where more user-focused content from third parties massively outperformed our manual. The average manual page gets about 180 clicks per month, which is significantly lower than what we would expect for an app as widely used as Audacity.

For the above reasons, we have decided that our resources are overwhelmingly better spent on creating a new manual that attempts to get users up to speed as easily as possible. This new manual will be called Audacity Support. 

With that in mind, we want to accomplish the following goals with Audacity Support: 

  • It should be user-focused. That means it should be easy to read and friendly, giving as much information as necessary to achieve a given task, without being either too vague or too detailed. It also should be search optimized, helping users to find what they’re looking for quickly.
  • It should be continuously editable by anyone and no longer be a release blocker. This means that pages will sometimes be ‘out of date’ at the time of a new release (though we’ll seek to avoid that as much as possible), but we feel it is a price worth paying to achieve the goal of a vibrant community of contributors and translators.
  • In addition to written content, the manual should be prominently accompanied by video tutorials where appropriate. We intend to encourage the already large video tutorial community to create content that fits in with the task-focused structure of the manual. This would be a symbiotic relationship, since the manual will provide them with an additional source of views.
  • Crucially, Audacity Support will also be translatable. MediaWiki can do this with great effort, but we’d like a system that supports translations more easily.
  • Readers should be able to search for content in Audacity Support itself, and be able to download a PDF version of it should they need access to an offline copy.

The current plan is to host Audacity Support on Gitbook. Gitbook has inherent benefits over MediaWiki for our purposes in several areas: 

For contributors, Gitbook has a nice visual editor (as opposed to MediaWiki’s plain-text/wiki-syntax editor). It can also sync to GitHub, so if you prefer working with a local markdown or text editor, you can do that, too. This GitHub integration will also enable translations down the line (several translation tools integrate with it easily), but since we’re starting fresh, we’ll hold off on translations until the English version has reached a reasonable level of maturity. Video tutorial creators will be able to easily embed their content on relevant pages.

There may be individual pages from the old manual that make sense to port over to Audacity Support, but generally we want a fresh start, and the developer reference parts of the old manual will not be ported over. The old manual will stay as-is: it will not be updated or included in the installer past 3.1.x.

As a preview, you can view the Gitbook page at https://audacity.gitbook.io/audacity/. Please bear in mind that we have only just started the process of populating it with content, and it still contains a lot of unfinished material. While we are not yet calling for contributions, you can get editor access at https://audacityteam.org/gitbook-access. We welcome anyone who wants to take part in defining the overall structure and style.

We would be very interested to hear your suggestions on how we can improve this plan further. You can send them on our forum or on our new Discord server.

Audacity 3.1 is out now!

Watch the release video now

We’re happy to announce that Audacity 3.1 has been released. This release focuses on making audio editing easier. The key improvements are:

  • Added clip handle bars, allowing you to move audio clips around more easily
  • Added smart clips, a way to non-destructively trim clips
  • Reworked the looping feature

You can download Audacity 3.1 for Windows, macOS and Linux on audacityteam.org/download.

GSoC 2021 Success!

This week marked the end of the Google Summer of Code (GSoC) program for 2021, which saw over 1200 students work on over 200 open source projects. This year, 2 students joined us at Audacity, and we are happy to report that both completed their projects successfully! The projects were as follows:

Source Separation

Hugo Flores Garcia, mentored by Dmitry Vedenko, implemented a deep learning tool that, given an appropriately trained model, is able to take an audio track with multiple sound sources (e.g. a combined “singer + piano” track) and split it into multiple tracks, with one track for each source (i.e. a “singer” track and a “piano” track). This opens up a whole variety of interesting use cases, including karaoke and background noise removal. You can learn more about the Source Separation project in Hugo’s blog.

Spectral Editing

Edward Hui, mentored by Paul Licameli, implemented the ability to edit audio tracks by drawing on the spectrogram rather than the waveform, as is usually the case in Audacity. He also implemented smart selection tools to automatically select regions of contiguous “colour” on the spectrogram, and to select overtones (harmonics) in addition to the fundamental frequency. Spectral editing is useful for removing unwanted sounds and background noises without distorting the main part of the audio signal. You can learn more about the Spectral Editing project in Edward’s blog.

Next steps

We will continue to work with the students over the coming weeks to make the final touches necessary to get their code merged into the program, at which point it will become available in GitHub Actions builds of the master branch and a subsequent stable release of Audacity.

GSoC 2021 with Audacity – Work product – Spectral editing tool

Hello all, it has been a rewarding summer working with passionate audio lovers, and I am happy to share the finalized work product here: a multi-featured spectral editing tool. Here are the links to my previous work:

Weekly dev blog: https://www.audacityteam.org/category/gsoc/gsoc-2021-spectral-selection/

Pull request: https://github.com/audacity/audacity/pull/1193#issue-680302149

The mid-term prototype

Full demo available here

New feature – frequency snapping

Frequency snapping was introduced to help users make spectral selections more precisely, even with unsteady cursor movement. It evaluates the frequency bin being dragged over in real time, searches vertically within the Fast Fourier Transform window, and automatically picks the frequency bins with the highest spectral energy. Not only does this provide a more reliable way of spectral editing, it also yields a better editing result, minimizing damage to the original spectrogram.
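
Conceptually, the snapping step amounts to a small search over neighbouring FFT bins. The sketch below is a hypothetical illustration of that idea; the function name, search radius parameter, and magnitude array are assumptions, not taken from the actual Audacity code.

```cpp
// Hypothetical sketch of frequency snapping: given the bin under the cursor,
// search a small vertical neighbourhood of the current FFT window and return
// the bin with the highest spectral energy (magnitude).
#include <algorithm>
#include <cstddef>
#include <vector>

size_t SnapToStrongestBin(const std::vector<float>& magnitudes, // one FFT window
                          size_t cursorBin,
                          size_t searchRadius) // bins above/below to inspect
{
   if (magnitudes.empty())
      return cursorBin;

   const size_t lo = cursorBin > searchRadius ? cursorBin - searchRadius : 0;
   const size_t hi = std::min(cursorBin + searchRadius, magnitudes.size() - 1);

   size_t best = std::min(cursorBin, magnitudes.size() - 1);
   for (size_t bin = lo; bin <= hi; ++bin)
      if (magnitudes[bin] > magnitudes[best])
         best = bin;
   return best; // the snapped bin replaces the raw cursor bin
}
```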

Demo video

New feature – overtone selection

Cleanly removing some of the unwanted sounds from the original track requires multiple edits, since most of the observed noises consist of a fundamental frequency (f0) and its overtones. Hand-picking these overtones can be repetitive, especially for wind or brass instruments, which generate more overtones than other instruments. This feature was introduced to help pick the overtones automatically: the user simply drags over the fundamental frequency, and the overtones are approximated and selected.

It works similarly to smart selection (frequency snapping), but takes it a step further and checks the multiples of f0 for similar spectral energy.
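
As a rough illustration of the idea (again a hypothetical sketch rather than the merged code), the overtone search can be thought of as repeating the energy check at integer multiples of the fundamental bin:

```cpp
// Hypothetical sketch of overtone selection: starting from the fundamental
// bin (f0), visit its integer multiples and keep those whose spectral energy
// is comparable to the fundamental's.
#include <cstddef>
#include <vector>

std::vector<size_t> SelectOvertoneBins(const std::vector<float>& magnitudes,
                                       size_t f0Bin,
                                       float energyRatio) // e.g. 0.3f
{
   std::vector<size_t> selected;
   if (f0Bin == 0 || f0Bin >= magnitudes.size())
      return selected;

   selected.push_back(f0Bin);
   const float f0Energy = magnitudes[f0Bin];

   for (size_t k = 2; k * f0Bin < magnitudes.size(); ++k) {
      const size_t bin = k * f0Bin; // approximate k-th harmonic
      if (magnitudes[bin] >= energyRatio * f0Energy)
         selected.push_back(bin);
   }
   return selected;
}
```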

The technical summary

BrushHandle (The front-end)

This class inherits from the UIHandle class and acts as the “front-end” of our tool: it is where we interact with cursor coordinates, convert them to sample-count hops and frequency bins, and then use Bresenham’s line and circle drawing algorithms to add the spectral data to the back-end.

Our screen consists of a limited number of pixels, meaning that it is impossible to draw a pixel-perfect line, let alone a circle! The algorithm mentioned above is critical for simulating the stroke of the brush; without it the selection would be barely usable, since it would just be a one-pixel-thin line. The idea of the algorithm is simple: it compares the differences in x and y and picks the next coordinate based on the accumulated error.
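
For reference, a minimal Bresenham-style line walk over the discrete (hop, bin) grid could look like the sketch below; the function, callback, and parameter names are illustrative, not the actual BrushHandle code.

```cpp
// Minimal sketch of Bresenham's line algorithm walking the discrete
// (sample-hop, frequency-bin) grid between two brush points. The visit
// callback stands in for "add this cell's spectral data to the back-end".
#include <cstdlib>
#include <functional>

void WalkBrushLine(long x0, long y0, long x1, long y1,
                   const std::function<void(long, long)>& visit)
{
   const long dx = std::labs(x1 - x0), sx = x0 < x1 ? 1 : -1;
   const long dy = -std::labs(y1 - y0), sy = y0 < y1 ? 1 : -1;
   long err = dx + dy; // the accumulated error term

   while (true) {
      visit(x0, y0); // mark this (hop, bin) cell
      if (x0 == x1 && y0 == y1)
         break;
      const long e2 = 2 * err;
      if (e2 >= dy) { err += dy; x0 += sx; } // step along x
      if (e2 <= dx) { err += dx; y0 += sy; } // step along y
   }
}
```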

Apart from the above algorithm, this class is also responsible for adapting the selection to the zoom level. Since the user may want to zoom into the spectrogram and make a finer selection, the brush tool should be able to detect and adjust the selection when the user later zooms out! Initially I stored the spectral data in absolute mouse coordinates, which cannot scale up and down; it was later changed to sample counts and frequencies.

Lastly, it stores extra parameters like the frequency snapping and overtone thresholds, the brush radius, etc.

SpectralDataManager (The back-end)

This is the core of the calculation, where the magic happens. It partially inherits from SpectrumTransformer, a class rewritten by my mentor Paul Licameli to handle common FFT and IFFT transformations and calculations. The entry points (ProcessTracks(), FindFrequencySnappingBin(), FindHighestFrequencyBins()) are static methods, and the calculation is ultimately completed in other static methods with the Processor suffix.

Note that for these processors, the complete set of Fourier transform coefficients can be considered a black box; they are exposed to a single window of data only.

SpectralDataDialog (The GUI front-end)

This is a rather interesting class for me; it inherits from wxWidgets UI components. Compared to a conventional C++ workflow, this class works more like asynchronous JavaScript: it binds methods to events, which are broadcast and received as global state. On top of this event-trigger system, there is also a factory used to streamline dependency management; we can statically attach objects or even GUI windows to the factory and use them whenever necessary, which helps tackle common problems like cyclic dependencies.
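
To make the event-binding style concrete, here is a minimal wxWidgets-flavoured sketch; the dialog, control, and handler names are hypothetical and stand in for the real SpectralDataDialog wiring.

```cpp
// Minimal sketch of the wxWidgets event-binding style described above:
// a member function is bound to an event and runs whenever that event is
// dispatched through the event loop, much like a JavaScript callback.
#include <wx/wx.h>

class BrushToolDialog : public wxDialog // hypothetical dialog, for illustration
{
public:
   explicit BrushToolDialog(wxWindow* parent)
      : wxDialog(parent, wxID_ANY, "Brush Tool")
   {
      auto* apply = new wxButton(this, wxID_APPLY, "Apply");
      // Bind the button's click event to a member function of this dialog.
      apply->Bind(wxEVT_BUTTON, &BrushToolDialog::OnApply, this);
   }

private:
   void OnApply(wxCommandEvent& event)
   {
      // React to the broadcast event here.
      event.Skip();
   }
};
```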

This is the existing control panel for the brush tool, where we can toggle “smart selection” and “overtone selection” and adjust the brush radius using the slider.

What’s next?

First, I need to give a big shout-out to my mentor Paul Licameli, an extremely passionate mentor and experienced C++ developer who has continuously provided assistance, from high-level architectural design down to low-level bug-fix suggestions. I would also like to thank the Audacity team for arranging the program and for the assistance provided!

I will be finishing the code review with Paul before the official end of the GSoC program, and I hope the frequency snapping and overtone selection can then be optimized. Afterwards, I will rebase the current branch onto master, and hopefully the tool will be merged and become available in the next release of Audacity.

GSoC 2021 – Work Product – Source Separation and Deep Learning Tools

Hi all! Google Summer of Code has wrapped up, and I have mostly completed my work contributing a Source Separation effect and a deep learning toolkit for Audacity. I still have code review fixes to address, but the code is in a fully functional state, and all the proposed features have been completed.

Code Changes

You can view the commit history and code reviews on the Pull Request I submitted to the main Audacity repo.

More Links

Here are links to more information on this project:

Work Product Summary

  • Deep Learning Effects
    • EffectSourceSep: A built-in effect for performing source separation in Audacity. While this effect is technically able to do more than just source separation (the internal effect functions as a generic deep learning processor that can produce a multi-track output given a single-track input), it is branded as Source Separation, as we expect the majority of model contributions to be focused on source separation. 
    • EffectDeepLearning: A base class for a built-in effect that uses PyTorch models. EffectDeepLearning takes care of data type conversions between torch::Tensor and WaveTrack/WaveClip data types (a rough sketch of this conversion follows the list below). 
    • (In Progress) EffectLabeler: With the help of Aldo Aguilar, we are hoping to contribute an effect capable of performing automatic track labeling. Such an effect would enable users to perform automatic speech-to-text transcription or annotation of different target sounds within a track.
  • Deep Learning Tools: an internal toolkit for managing and using deep learning models anywhere within Audacity. 
    • DeepModelManager: A class for fetching, downloading, installing, and uninstalling deep learning models from HuggingFace repositories.
    • DeepModel and ModelCard
      • DeepModel: a wrapper class for PyTorch models. Loads an internal resampling module, which is used for resampling input audio to the model’s sample rate and resampling output audio back to Audacity’s project rate. Takes care of exception handling if loading the model fails, as well as of internal errors during the model’s forward pass. 
      • ModelCard: class for holding model metadata.
  • Deep Model Manager UI: GUI elements for interacting with deep learning models hosted in HuggingFace. 
    • ManagerToolsPanel: The top panel, as seen in the image above. Contains controls for exploring models in HuggingFace and importing them into the Model Manager UI.
    • ModelCardPanel scroller: a scroller for navigating through the fetched models. Contains a short description of the model’s purpose, as well as a color-coded tag meant to inform the user of the model’s intended data domain (that is, models tagged with “music” are meant to be used with music data, while models tagged with “speech” are meant to be used with speech data). 
    • DetailedModelCardPanel: a detailed view for deep models. Contains a longer description, model sample rate, additional tags, and a button that links to the HuggingFace repo’s README file, for even more information on the model.
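
As a rough sketch of the tensor conversion mentioned in the EffectDeepLearning item above, copying a mono float buffer into a torch::Tensor and back could look like the following. The plain float buffer is a simplifying assumption here; the real effect works directly with WaveTrack/WaveClip data.

```cpp
// Hypothetical sketch of the buffer <-> torch::Tensor conversion that
// EffectDeepLearning performs, assuming the track audio has already been
// copied into a plain float buffer (the real code works on WaveTrack/WaveClip).
#include <torch/torch.h>
#include <cstdint>
#include <vector>

torch::Tensor BufferToTensor(const std::vector<float>& samples)
{
   // from_blob does not take ownership of the data, so clone into a tensor
   // that owns its own storage before handing it to the model.
   return torch::from_blob(const_cast<float*>(samples.data()),
                           { 1, static_cast<int64_t>(samples.size()) },
                           torch::kFloat32)
      .clone();
}

std::vector<float> TensorToBuffer(const torch::Tensor& output)
{
   // Flatten the model output back to mono float samples for the track.
   auto flat = output.contiguous().view({ -1 }).to(torch::kFloat32);
   return std::vector<float>(flat.data_ptr<float>(),
                             flat.data_ptr<float>() + flat.numel());
}
```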

Future Work

  • Finish addressing code review this week
  • Extract internal deep learning utilities to lib-deeplearning
  • Open a PR that incorporates EffectLabeler for deep learning-based sound labeling and tagging within Audacity

Special thanks to Bryan Pardo, Ethan Manilow, and Aldo Aguilar from the Interactive Audio Lab, as well as Dmitry Vedenko from the Audacity team, for all the helpful discussions and support I received throughout the project. I hope my contribution to Audacity provides the groundwork for bringing a new wave of effects based on deep learning to the hands of audio users.

Source Separation – GSoC 2021 Week 9

Hi all!

GSoC is starting to wrap up, and I’ve created two project boards to finalize the last of the work needed to complete the Deep Learning Tools project for Audacity. The first project board is concerned with pending bug fixes and enhancements for the internal functionality of the Deep Model Manager (see the GitHub link). The second board is concerned with improving the UI for model selection (see the GitHub link). All of the high-priority tasks in the first project board are done, and I am planning to finish both project boards by the end of the week (with help from Aldo Aguilar in the Interactive Audio Lab).

The manager UI will contain a new detailed view for ModelCards that offers a link for opening the model in HuggingFace, as well as a longer description of the model within Audacity. Additionally, using colored domain tags should help users pick the right model with more ease. 

GSoC 2021 with Audacity – Week 9

This is the second-to-last week of the GSoC program. I have finalized the majority of the new code and have been having more frequent meetings with Paul regarding the code review.

The over-computation

Currently, the brush stroke is calculated with Bresenham’s algorithm in the mouse coordinate system. However, the data we collect requires more calculation than the FFT can handle; in other words, we collect far more spectral data than we are able to process. Therefore, the whole brush stroke calculation needs to be refactored into sampleCount-hop vs frequency-bin space, so we do not waste computation on the area between consecutive Fourier transform windows.
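
As a rough sketch of the kind of mapping involved (the zoom and scale parameters here are hypothetical, not Audacity’s actual view API), converting a mouse position into hop and bin indices could look like this:

```cpp
// Hypothetical sketch of mapping a mouse position to (sample-hop, frequency-bin)
// coordinates, so the brush records one cell per FFT window instead of one per
// pixel. The zoom and scale parameters are illustrative assumptions.
#include <cmath>

struct HopBin { long long hop; int bin; };

HopBin MouseToHopBin(int mouseX, int mouseY,
                     double secondsPerPixel,  // horizontal zoom level
                     double maxFreqOnScreen,  // top of the spectrogram view (Hz)
                     int viewHeightPixels,
                     double sampleRate, int fftSize, int hopSize)
{
   // Horizontal axis: pixel -> time -> sample -> FFT hop index.
   const double seconds = mouseX * secondsPerPixel;
   const long long sample = static_cast<long long>(seconds * sampleRate);
   const long long hop = sample / hopSize;

   // Vertical axis: pixel -> frequency -> FFT bin (assuming a linear scale).
   const double freq =
      maxFreqOnScreen * (1.0 - static_cast<double>(mouseY) / viewHeightPixels);
   const int bin = static_cast<int>(std::lround(freq * fftSize / sampleRate));

   return { hop, bin };
}
```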

The code review

My mentor Paul has been reviewing my code since last week and has given me extremely detailed and helpful comments. Some of them concern code style or unused header imports, but he has also spotted and pointed out critical bugs. I am currently resolving his comments; the history of the conversation can be viewed in this PR link.

Before the final week

It is hoped that the transformation and refactoring into hop vs bin space will be completed before next week, so we can try to optimize the frequency snapping and launch it as soon as possible.

GSoC 2021 with Audacity – Week 8

This week I finished one additional feature: frequency snapping. This optional feature allows users to select spectral data more accurately.

The frequency snapping

It is an optional feature, associated with the smart selection in the spectral editing dialog, that allows more precise selection by the user: the brush stroke is calculated and snapped to the nearest frequency bins with the highest spectral energy.

Preparing for PR and final review

Originally I had approximately 50+ commits, which can be overwhelming for code review, considering that some of the commits in between were already obsolete, while others reverted or refactored previously written code. I rebased the whole branch to pick out the important updates, reordering and combining multiple commits, and encountered quite a few conflicts that needed to be resolved.

Source Separation – GSoC 2021 Week 8

Hi all! Here are some updates for this week. 

  • I cleaned up the commit history for the deep learning implementation and opened a pull request in the official audacity repo. 
  • Added a dialog for manually specifying a HuggingFace repo to fetch (github). 
  • Fixed a bug where ModelCards weren’t scrollable until the user manually resized the window (github).
  • Amended the download behavior so the downloaded model file is written to file incrementally, lowering memory consumption (github). 
  • Added sorting to ModelCard panels (github).
  • Fixed several other bugs in the Model Manager and its UI (github).

To do

  • Start writing documentation for model contributors. The documentation should provide instructions on how to properly structure a HuggingFace repo for an Audacity model, write a metadata file, and properly export the deep model to TorchScript, ensuring that it meets the input/output constraints in Audacity. 
  • Continue to fix open issues with the model manager. 
  • Make ModelCards collapsible. Right now, only 2-3 can be shown on screen at a time. It may be a good idea to offer a collapsed view of the ModelCard. 
  • Provide a hyperlink (or a more info button) that points to the model’s HuggingFace readme somewhere in the ModelCard panel, so users can view more information about the model online (e.g. datasets, benchmarks, performance, examples).