Meet our GSoC 2022 Contributor

Audacity is taking part in Google Summer of Code 2022 and this year we’re joined by a contributor on one project:

Waveform Rulers

This project will add the ability to display a logarithmic ruler for amplitude (measured in decibels) while still drawing the waveform according to a linear scale. This is useful because linear waveform graphs are more intuitive than logarithmic graphs, but audio engineers are accustomed to using decibel units to cover the wide range of amplitudes perceptible to humans. The GSoC project will also modify the timeline ruler, adding the option to use the musical units of beats and measures instead of the regular units of hours, minutes and seconds.
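As a quick illustration of the decibel half of this (a minimal sketch, not the project's actual ruler code): the tick positions stay linear in amplitude, and only the labels are converted to decibels, with full scale corresponding to 0 dB.

```cpp
#include <cmath>
#include <string>

// Label a linear amplitude in [-1, 1] in decibels, with full scale
// (|amplitude| == 1.0) mapping to 0 dB. The tick *position* stays
// linear; only the label text uses the dB conversion.
std::string AmplitudeToDbLabel(double amplitude)
{
    const double magnitude = std::abs(amplitude);
    if (magnitude <= 0.0)
        return "-inf dB"; // silence has no finite dB value
    const int db = static_cast<int>(std::lround(20.0 * std::log10(magnitude)));
    return std::to_string(db) + " dB";
}
```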

Contributor: Michael Papadopoulos
Mentor: Paul Licameli

You can learn more about Michael’s project by reading his introductory blog post and by following his weekly blog for regular progress updates.

Enhanced Ruler – GSoC 2022 Week 1

Hello everyone! Here are some updates on the current status of my GSoC project. I have spent this week mainly gaining a better understanding of the Ruler code.

Accomplished this week:

  • Investigated the source code for the Ruler. Created a map of the functions and how they relate to each other.
  • Created a small project to better understand labelling. Separated the font of the label’s value from the units, allowing for more dynamic labelling.
  • Discussed the project with mentors and communicated goals and expectations.

To accomplish this upcoming week:

  • Implement a complete suite of options for a nonlinear ruler.
  • Continue talking to mentors and better understanding the code.

Audacity – Introduction to Enhanced Ruler Display Options

Hello, everyone. My name is Michael Papadopoulos, and I’m a computer science major at Rensselaer Polytechnic Institute. I’m grateful to have been selected to create a project for Audacity through Google Summer of Code. My mentor throughout this endeavor will be Paul Licameli.

The current state of rulers in Audacity

In Audacity, a ruler is any segment of the UI which measures a value on a scale using tick marks. Examples of rulers include the amplitude scale to the left of waveforms and the timeline.

In the current version of Audacity, only two types of rulers are supported: linear and logarithmic. Until now, developers have only been able to create new rulers that fall under one of these two categories. However, many long-sought display options in Audacity need more flexibility than this. A prime example is a linear decibel scale for the waveform display.
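One way to picture the missing flexibility (a hypothetical sketch, not Audacity's actual Ruler API): instead of hard-coding linear and logarithmic scales, a ruler could accept an arbitrary value-to-position mapping, so that a linear-dB or any other custom scale becomes just another function.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

// Hypothetical ruler that delegates its scale to a caller-supplied
// mapping from a value to a normalized position in [0, 1].
struct CustomRuler
{
    std::function<double(double)> valueToPosition;

    double PixelFor(double value, double rulerLengthPixels) const
    {
        return valueToPosition(value) * rulerLengthPixels;
    }
};

// Usage: a linear ruler is just the identity mapping...
CustomRuler linear{ [](double v) { return v; } };
// ...while a logarithmic dB ruler over a 60 dB range might be:
CustomRuler logDb{ [](double v) {
    return 1.0 + 20.0 * std::log10(std::max(v, 1e-3)) / 60.0;
} };
```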

[Image: The linear dB scale in Audition. This allows users to view their waveforms in decibels while sidestepping the issues presented by a logarithmic scale described here.]

Another example of a ruler that would require more customizable infrastructure is a timeline that displays beats and bars. More broadly, better infrastructure would allow powerful new features to be built well into the future.
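As a rough sketch of what the beats-and-bars option involves (illustrative code only, assuming a constant tempo rather than whatever tempo model the project settles on), converting a time in seconds to a musical position needs just the tempo and the time signature:

```cpp
#include <cmath>

struct MusicalPosition
{
    int bar;  // zero-based measure index
    int beat; // zero-based beat within the bar
};

// Assumes a constant tempo; a real implementation may need a tempo map.
MusicalPosition SecondsToBarsBeats(double seconds, double bpm, int beatsPerBar)
{
    const double totalBeats = seconds * bpm / 60.0;
    const int wholeBeats = static_cast<int>(std::floor(totalBeats));
    return { wholeBeats / beatsPerBar, wholeBeats % beatsPerBar };
}
```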

The goal of the project

The most fundamental goal of the project will be to overhaul Audacity’s ruler infrastructure, allowing for all sorts of linear, nonlinear, and custom rulers. This will be a major benefit to future developers and users of Audacity. The potential for more user-friendly and useful display options that could be unlocked by this project cannot be overstated.

Some key challenges will have to be overcome for this project to be realized. This infrastructure must be very adaptable to fit into the variety of roles that rulers play throughout the software. All changes must be thoroughly tested so that they work in all cases, not just one. Decisions will have to be made when choosing how much old code to keep and how much to remove.

After implementing this, I will move on to creating some of the display options listed above. I will first produce the linear dB scale and the beats/bars timeline, as well as the waveform display options described here. These will be incredibly helpful to all Audacity users, especially beginners and musicians, who will be more comfortable with these options. Communication with the UI design team will be critical during this phase of the project. Once these options are created, I will continue to discuss and implement new features as time allows.

Once again, thank you to the Audacity team for selecting my project. I will be providing weekly updates throughout the coding period, which begins on June 13th. I will provide links to my project’s progress on GitHub as I go along. I hope to work closely with the community, and would appreciate feedback and suggestions for options as well.

Join Audacity for GSoC 2022!

We are delighted to announce that Audacity is participating as an open source organization in Google Summer of Code (GSoC) 2022. For those who are unaware, GSoC is an annual program where new contributors are paid a stipend by Google to write code for various organizations in open source.

[Image: Google Summer of Code logo]

Last year, two highly talented students successfully completed projects with us here at Audacity. This year, you don’t have to be a student to take part (students are still welcome though!) and there are a few other changes that potential candidates need to be aware of. Please see this page for more information about the program and how to apply to join Audacity as a GSoC contributor.

GSoC 2021 Success!

This week marked the end of the Google Summer of Code (GSoC) program for 2021, which saw over 1200 students work on over 200 open source projects. This year, 2 students joined us at Audacity, and we are happy to report that both completed their projects successfully! The projects were as follows:

Source Separation

Hugo Flores Garcia, mentored by Dmitry Vedenko, implemented a deep learning AI tool that, given an appropriately trained model, can take an audio track containing multiple sound sources (e.g. a combined “singer + piano” track) and split it into multiple tracks, one for each source (i.e. a “singer” track and a “piano” track). This opens up a whole variety of interesting use cases, including karaoke and background noise removal. You can learn more about the Source Separation project in Hugo’s blog.

Spectral Editing

Edward Hui, mentored by Paul Licameli, implemented the ability to edit audio tracks by drawing on the spectrogram rather than the waveform, as is usually the case in Audacity. He also implemented smart selection tools to automatically select regions of contiguous “colour” on the spectrogram, and to select overtones (harmonics) in addition to the fundamental frequency. Spectral editing is useful for removing unwanted sounds and background noises without distorting the main part of the audio signal. You can learn more about the Spectral Editing project in Edward’s blog.

Next steps

We will continue to work with the students over the coming weeks to make the final touches necessary to get their code merged into the program, at which point it will become available in GitHub Actions builds of the master branch and a subsequent stable release of Audacity.

GSoC 2021 with Audacity – Work Product – Spectral Editing Tool

Hello all! It has been a rewarding summer working with passionate audio lovers, and I am happy to share the finalized work product here: a multi-featured spectral editing tool. Here are the links to my previous work:

Weekly dev blog: https://www.audacityteam.org/category/gsoc/gsoc-2021-spectral-editing/

Pull request: https://github.com/audacity/audacity/pull/1193#issue-680302149

The mid-term prototype

A full demo is available here.

New feature – frequency snapping

Frequency snapping was introduced to help users make spectral selections more precisely, even with unsteady cursor movement. As the user drags, it evaluates the frequency bin under the cursor, searches vertically through the Fast Fourier Transform data, and automatically picks the frequency bins with the highest spectral energy. Not only does this provide a more reliable way of making spectral selections, it also yields a better editing result, minimizing the damage to the original spectrogram.
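A minimal sketch of the idea (hypothetical helper code, not the actual implementation): given the magnitude spectrum of one analysis window, search a neighborhood around the cursor's bin and snap to the bin with the greatest energy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For one FFT window's magnitude spectrum, return the bin with the
// highest magnitude within +/- searchRadius bins of the cursor's bin.
size_t SnapToStrongestBin(const std::vector<float>& magnitudes,
                          size_t cursorBin, size_t searchRadius)
{
    const size_t lo = cursorBin > searchRadius ? cursorBin - searchRadius : 0;
    const size_t hi = std::min(cursorBin + searchRadius, magnitudes.size() - 1);
    size_t best = cursorBin;
    for (size_t bin = lo; bin <= hi; ++bin)
        if (magnitudes[bin] > magnitudes[best])
            best = bin;
    return best;
}
```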

Demo video

New feature – overtone selection

Achieving good removal of unwanted sounds from the original track requires multiple edits, since most of the noises observed consist of a fundamental frequency (f0) plus its overtones. Hand-picking these overtones can be repetitive, especially for wind or brass instruments, which generate more overtones than most instruments. This feature was introduced to pick those overtones automatically: the user simply drags over the fundamental frequency, and the overtones are approximated and selected.

It works similarly to smart selection (frequency snapping), but goes a step further and checks the integer multiples of f0 for similar spectral energy.
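Continuing the hypothetical sketch from the frequency snapping section, overtone selection can reuse the same search at each integer multiple of the fundamental's bin:

```cpp
#include <cstddef>
#include <vector>

// Collect one bin per harmonic by snapping each integer multiple of the
// fundamental bin (f0Bin, assumed > 0) to the locally strongest bin.
// Reuses SnapToStrongestBin() from the earlier sketch.
std::vector<size_t> SelectOvertoneBins(const std::vector<float>& magnitudes,
                                       size_t f0Bin, size_t searchRadius)
{
    std::vector<size_t> selected;
    for (size_t harmonic = 1; harmonic * f0Bin < magnitudes.size(); ++harmonic)
        selected.push_back(
            SnapToStrongestBin(magnitudes, harmonic * f0Bin, searchRadius));
    return selected;
}
```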

The technical summary

BrushHandle (The front-end)

This class inherits from the UIHandle class and acts as the “front end” of the tool: the place where we receive cursor coordinates and convert them to sample-count hops and frequency bins. We then use Bresenham’s line and circle drawing algorithms to add the spectral data to the back end.

A screen consists of a limited number of pixels, meaning it is impossible to draw a pixel-perfect line, or even a circle! The algorithm mentioned above is critical for simulating the stroke of a brush; without it, the selection would be barely usable, since it would just be a one-pixel-thin line. The idea of the algorithm is simple: it compares the differences in x and y and picks the next coordinate based on the accumulated error.
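For reference, here is the standard integer-only form of Bresenham's line algorithm (a generic sketch, not the exact BrushHandle code): the error term accumulates the deviation from the ideal line and decides whether the next pixel steps in x, in y, or both.

```cpp
#include <cstdlib>
#include <functional>

// Plot the grid approximation of the line from (x0, y0) to (x1, y1),
// calling plot() for each chosen pixel.
void BresenhamLine(int x0, int y0, int x1, int y1,
                   const std::function<void(int, int)>& plot)
{
    const int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    const int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        plot(x0, y0);
        if (x0 == x1 && y0 == y1)
            break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } // step in x
        if (e2 <= dx) { err += dx; y0 += sy; } // step in y
    }
}
```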

Apart from the above algorithm, this class is also responsible for adapting the selection to the zoom level. Since the user may want to zoom into the spectrogram to make a finer selection, the brush tool should be able to detect and adjust the selection when the user later zooms out! Initially I stored the spectral data in absolute mouse coordinates, which cannot scale up and down, so it was later changed to sample counts and frequencies.

Lastly, it stores extra parameters such as the frequency snapping and overtone thresholds, the brush radius, etc.

SpectralDataManager (The back-end)

This is the core of the calculation, where the magic happens. It partially inherits from SpectrumTransformer, a class rewritten by my mentor Paul Licameli to handle common FFT and IFFT transformations and calculations. The entry points (ProcessTracks(), FindFrequencySnappingBin(), FindHighestFrequencyBins()) are static methods, and ultimately the calculation is completed in other static methods with the Processor suffix.

Note that for these processors, the complete set of Fourier transform coefficients can be treated as a black box: each processor is exposed to a single window of data only.
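To make the single-window constraint concrete, a processor in this spirit might look like the following (hypothetical code; the real Processor methods in SpectralDataManager have different signatures): it sees only the coefficients of one FFT window and suppresses the bins the user selected.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical per-window processor: zero out the selected bins in one
// FFT window's complex coefficients. It never sees neighboring windows,
// which is what makes each window a "black box".
void ZeroSelectedBinsProcessor(std::vector<std::complex<float>>& window,
                               const std::vector<size_t>& selectedBins)
{
    for (size_t bin : selectedBins)
        if (bin < window.size())
            window[bin] = { 0.0f, 0.0f };
}
```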

SpectralDataDialog (The GUI front-end)

This is a rather interesting class for me: it inherits from wxWidgets UI components. Compared to a conventional C++ workflow, this class works more like asynchronous JavaScript: it binds methods to events, which are broadcast and received as global state. On top of this event-trigger system, there is a factory used to simplify dependency management; we can statically attach an object, or even a GUI window, to the factory and use it whenever necessary, which helps tackle common problems like cyclic dependencies.
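For readers unfamiliar with wxWidgets, the event binding the post compares to JavaScript looks roughly like this (a generic illustration, not the actual SpectralDataDialog code):

```cpp
#include <wx/wx.h>

class BrushControlPanel : public wxPanel
{
public:
    explicit BrushControlPanel(wxWindow* parent)
        : wxPanel(parent)
    {
        auto* applyButton = new wxButton(this, wxID_ANY, "Apply");
        // Bind() attaches a handler to an event at runtime, much like
        // addEventListener in JavaScript.
        applyButton->Bind(wxEVT_BUTTON, &BrushControlPanel::OnApply, this);
    }

private:
    void OnApply(wxCommandEvent&)
    {
        wxLogMessage("Apply clicked");
    }
};
```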

This is the existing control panel for the brush tool, where we can toggle “smart selection” and “overtone selection” and adjust the brush radius using the slider.

What’s next?

First, I need to give a big shout-out to my mentor Paul Licameli, who has been an extremely passionate mentor and experienced C++ developer; he has continuously provided assistance, from high-level architectural design down to low-level bug fix suggestions. I would also like to thank the Audacity team for arranging the program and for the assistance provided!

I will finish the code review with Paul before the official end of the GSoC program, and I hope the frequency snapping and overtone selection can then be optimized. Afterwards, I will rebase the current branch onto master, and hopefully the tool will be merged and become available in the next release of Audacity.

GSoC 2021 – Work Product – Source Separation and Deep Learning Tools

Hi all! Google Summer of Code has wrapped up, and I have mostly completed my work contributing a Source Separation effect and a deep learning toolkit for Audacity. I still have code review fixes to address, but the code is in a fully functional state, and all the proposed features have been completed.

Code Changes

You can view the commit history and code reviews on the Pull Request I submitted to the main Audacity repo.

More Links

Here are links to more information on this project:

Work Product Summary

  • Deep Learning Effects
    • EffectSourceSep: A built-in effect for performing source separation in Audacity. While this effect is technically able to do more than just source separation (the internal effect functions as a generic deep learning processor that can produce a multi-track output given a single-track input), it is branded as Source Separation, as we expect the majority of model contributions to be focused on source separation. 
    • EffectDeepLearning: A base class for a built-in effect that uses PyTorch models. EffectDeepLearning takes care of data type conversions between torch::Tensor and WaveTrack/WaveClip data types (see the sketch after this list).
    • (In Progress) EffectLabeler: With the help of Aldo Aguilar, we are hoping to contribute an effect capable of performing automatic track labeling. Such an effect would enable users to perform automatic speech-to-text transcription or annotation of different target sounds within a track.
  • Deep Learning Tools: an internal toolkit for managing and using deep learning models anywhere within Audacity. 
    • DeepModelManager: A class for fetching, downloading, installing, and uninstalling deep learning models from HuggingFace repositories.
    • DeepModel and ModelCard
      • DeepModel: a wrapper class for PyTorch models. Loads an internal resampling module, which is used for resampling input audio to the model’s sample rate and resampling output audio back to Audacity’s project rate. Takes care of exception handling if loading the model fails, as well as internal errors during the model’s forward pass.
      • ModelCard: class for holding model metadata.
  • Deep Model Manager UI: GUI elements for interacting with deep learning models hosted in HuggingFace. 
    • ManagerToolsPanel: The top panel, as seen in the image above. Contains controls for exploring models in HuggingFace and importing them into the Model Manager UI.
    • ModelCardPanel scroller: a scroller for navigating through the fetched models. Contains a short description of the model’s purpose, as well as a color-coded tag meant to inform the user of the model’s intended data domain (that is, models tagged with “music” are meant to be used with music data, while models tagged with “speech” are meant to be used with speech data).
    • DetailedModelCardPanel: a detailed view for deep models. Contains a longer description, model sample rate, additional tags, and a button that links to the HuggingFace repo’s README file, for even more information on the model.
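As an illustration of the torch::Tensor conversions mentioned in the EffectDeepLearning item above (hypothetical helpers using the libtorch C++ API, not the actual EffectDeepLearning methods):

```cpp
#include <torch/torch.h>
#include <vector>

// Wrap an audio buffer in a tensor of shape {1, numSamples}. from_blob
// does not own the memory, so clone() copies it into the tensor.
torch::Tensor BufferToTensor(const std::vector<float>& samples)
{
    return torch::from_blob(const_cast<float*>(samples.data()),
                            { 1, static_cast<long long>(samples.size()) },
                            torch::kFloat32)
        .clone();
}

// Copy a {1, numSamples} float tensor back into a plain buffer.
std::vector<float> TensorToBuffer(const torch::Tensor& tensor)
{
    const auto flat = tensor.contiguous().view({ -1 });
    const float* data = flat.data_ptr<float>();
    return std::vector<float>(data, data + flat.numel());
}
```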

Future Work

  • Finish addressing code review this week
  • Extract internal deep learning utilities to lib-deeplearning
  • Open a PR that incorporates EffectLabeler for deep learning-based sound labeling and tagging within Audacity

Special thanks to Bryan Pardo, Ethan Manilow, and Aldo Aguilar from the Interactive Audio Lab, as well as Dmitry Vedenko from the Audacity team, for all the helpful discussions and support I received throughout the project. I hope my contribution to Audacity provides the groundwork for bringing a new wave of effects based on deep learning to the hands of audio users.

Source Separation – GSoC 2021 Week 9

Hi all!

GSoC is starting to wrap up, and I’ve created two project boards to finalize the last of the work that needs to be completed for the Deep Learning Tools project for Audacity. The first project board is concerned with pending bug fixes and enhancements for the internal functionality of the Deep Model Manager (see the GitHub link). The second board is concerned with improving the UI for model selection (see the GitHub link). All of the high-priority tasks on the first project board are done, and I am planning to finish both project boards by the end of the week (with help from Aldo Aguilar in the Interactive Audio Lab).

The manager UI will contain a new detailed view for ModelCards that offers a link for opening the model in HuggingFace, as well as a longer description of the model within Audacity. Additionally, the colored domain tags should help users pick the right model more easily.

GSoC 2021 with Audacity – Week 9

This is the second-to-last week of the GSoC program. I have finalized the majority of the new code, and I have been meeting more frequently with Paul about the code review.

The over-computation

Currently, the brush stroke is calculated with Bresenham’s algorithm in the mouse coordinate system. However, the data collected this way requires more calculation than the FFT transform can handle; in other words, we collect too much spectral data but can only process a limited amount of it. Therefore, the whole brush stroke calculation needs to be refactored into sample-count hops versus frequency bins, so that we do not waste computation on the area between Fourier transform windows.
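For context, mapping continuous time and frequency onto the hop/bin grid of the short-time Fourier transform is simple arithmetic; here is a sketch under assumed parameters (fixed hop size and window size in samples):

```cpp
#include <cmath>

struct HopBin
{
    long long hop; // index of the analysis window
    int bin;       // frequency bin within that window
};

// Map a time (seconds) and frequency (Hz) onto the STFT grid.
// Assumes a fixed hop size and window size in samples.
HopBin ToHopBin(double seconds, double hz,
                double sampleRate, int hopSize, int windowSize)
{
    return {
        static_cast<long long>(seconds * sampleRate / hopSize),
        static_cast<int>(std::lround(hz * windowSize / sampleRate))
    };
}
```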

The code review

My mentor Paul has been reviewing my code since last week and has given me extremely detailed and helpful comments. Some of them concern code style or unused header imports; however, he has also spotted and pointed out critical bug fixes. I am currently resolving his comments, and the history of the conversation can be viewed in this PR link.

Before the final week

I hope the transformation and refactoring into hop-versus-bin space will be completed before next week, so that we can optimize the frequency snapping and launch it as soon as possible.

GSoC 2021 with Audacity – Week 8

This week I finished one additional feature: frequency snapping. This optional feature allows users to select spectral data more accurately.

The frequency snapping

It is an optional feature, associated with the smart selection in the spectral editing dialog, that allows more precise selection by the user: the brush stroke is calculated and snapped to the nearest frequency bins with the highest spectral energy.

Preparing for PR and final review

Originally I had 50+ commits, which can be overwhelming for code review, considering that some of the commits in between were already obsolete, while other changes reverted or refactored previously written code. I rebased the whole branch to pick out the important updates, reordering and combining multiple commits, and encountered quite a lot of conflicts that needed to be resolved.