GSoC 2021 with Audacity – Week 5

It has been an exciting week for me: the first brush tool prototype is complete! Currently, each windowed sample is passed via SpectrumTransformer into the newly added SpectralDataManager, which checks it against the previously selected data and zeros out all the selected frequency bins. (Link to the commit)
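To make the zeroing step concrete, here is a minimal sketch in Python. It is not the actual Audacity C++ code: the function name, the `selection` dictionary (window index mapped to a set of bin indices), and the toy spectra are all assumptions made for illustration.

```python
def zero_selected_bins(spectrum, window_index, selection):
    """Return a copy of one window's spectrum with its selected bins zeroed.

    spectrum     -- list of complex FFT coefficients for a single window
    window_index -- position of this window in the track
    selection    -- hypothetical store: {window index: set of bin indices}
    """
    selected_bins = selection.get(window_index, set())
    return [0j if k in selected_bins else value
            for k, value in enumerate(spectrum)]


# Toy data: three windows with four frequency bins each.
windows = [
    [1 + 0j, 2 + 1j, 3 - 1j, 4 + 0j],
    [5 + 0j, 6 + 0j, 7 + 0j, 8 + 0j],
    [9 + 0j, 1 + 0j, 2 + 0j, 3 + 0j],
]
# Bins the user brushed over; window 2 was not touched at all.
selection = {0: {1, 2}, 1: {2}}

cleaned = [zero_selected_bins(w, i, selection) for i, w in enumerate(windows)]
```

Windows without any selected bins pass through unchanged, so only the brushed region of the spectrogram is affected.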

The brush tool demo

I have chosen “Killing me softly” by Roberta Flack (one of my all-time favorites!) and extracted a four-second snippet from the beginning. I have also added a meow sound to it, since we all love cats and, more importantly, it contains pitch variation that cannot be selected effectively with the current tool (a horizontal line).

To use the brush tool, we simply drag across the meow sound and its overtones, then click the apply button; the selected frequency content is then zeroed out.

The full demo video is available here (with a before vs. after audio comparison):

https://drive.google.com/file/d/1bQJGncHWj_GqD19LOPeEp_og3j70akw8/view?usp=sharing

What’s next?

This new feature is still at a rather early stage, and there is plenty of room for improvement. For instance, we can definitely do better than zeroing out all the selected frequency bins, for example by averaging samples from the non-selected windows (horizontally), the non-selected frequencies (vertically), or both!
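One way the averaging idea could look is sketched below in Python. This is only an illustration under my own assumptions, not something that exists in the prototype: it works on a small grid of magnitudes `mags[window][bin]`, treats the selection as a set of `(window, bin)` pairs, and replaces each selected cell with the mean of the nearest non-selected cells in the chosen directions. The names `nearest_unselected` and `fill_selected` are hypothetical.

```python
def nearest_unselected(mags, selected, t, k, dt, dk):
    """Walk from cell (t, k) in direction (dt, dk) to the first unselected cell.

    Returns that cell's magnitude, or None if the edge of the grid is reached.
    """
    t, k = t + dt, k + dk
    while 0 <= t < len(mags) and 0 <= k < len(mags[0]):
        if (t, k) not in selected:
            return mags[t][k]
        t, k = t + dt, k + dk
    return None


def fill_selected(mags, selected, horizontal=True, vertical=True):
    """Replace selected cells by the mean of their nearest unselected neighbours.

    horizontal -- look at neighbouring windows (same bin, earlier/later in time)
    vertical   -- look at neighbouring bins (same window, lower/higher frequency)
    """
    directions = []
    if horizontal:
        directions += [(-1, 0), (1, 0)]
    if vertical:
        directions += [(0, -1), (0, 1)]

    result = [row[:] for row in mags]
    for t, k in selected:
        found = [nearest_unselected(mags, selected, t, k, dt, dk)
                 for dt, dk in directions]
        found = [value for value in found if value is not None]
        # Fall back to zeroing when no unselected neighbour exists.
        result[t][k] = sum(found) / len(found) if found else 0.0
    return result


# Toy grid: 3 windows x 3 bins, with the centre cell selected.
mags = [
    [1.0, 2.0, 3.0],
    [4.0, 99.0, 10.0],
    [7.0, 6.0, 9.0],
]
selected = {(1, 1)}

time_only = fill_selected(mags, selected, vertical=False)
freq_only = fill_selected(mags, selected, horizontal=False)
both = fill_selected(mags, selected)
```

The sketch only touches magnitudes. A real implementation would also have to decide what to do with the phase of the replaced bins, which is left open here.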

Moreover, I would also like to make the selection smarter. In photo editing, if we want to remove or isolate a subject from the background, we usually rely first on a tool like the magic wand to pick up most of the desired area intelligently, and then fine-tune the result with a drawing tool. That being said, I hope the tool will be able to guess and pick up the user’s intended selection (or at least most of it), so that the user can then add or remove spectral data along the edges with the brush tool.

A step even further would be to pick up the overtones automatically for the user during the “magic wand” stage. However, the overtones can be a bit tricky to calculate, since their shapes are kind of skewed in linear view and we need to use the logarithmic scale as the reference when performing the computation (the user can edit in logarithmic view, but we cannot simply “select a view” for the computation). Without introducing an advanced area approximation algorithm, one possible approach is to slide the fundamental frequency area across the frequency bins close to its multiples, then estimate and spot the overtones by calculating their spectral energy similarity.
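The sliding idea can be sketched roughly as follows, in Python and for a single window only. Everything here is an assumption for illustration: cosine similarity stands in for "spectral energy similarity", the template keeps the same width in linear bins (so the skew mentioned above is ignored), and the function names, search range, and threshold are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long energy profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_overtones(energy, lo, hi, max_harmonic=4, search=2, threshold=0.9):
    """Look for copies of the fundamental's energy profile near its multiples.

    energy   -- per-bin spectral energy of one window
    lo, hi   -- the selected fundamental occupies bins lo .. hi - 1
    search   -- how many bins to slide on each side of the expected position
    Returns a list of (harmonic number, start bin, similarity) tuples.
    """
    template = energy[lo:hi]
    width = hi - lo
    centre = (lo + hi - 1) / 2
    found = []
    for h in range(2, max_harmonic + 1):
        expected_start = round(h * centre - (width - 1) / 2)
        best = None
        for offset in range(-search, search + 1):
            start = expected_start + offset
            if start < 0 or start + width > len(energy):
                continue
            score = cosine_similarity(template, energy[start:start + width])
            if best is None or score > best[1]:
                best = (start, score)
        if best is not None and best[1] >= threshold:
            found.append((h, best[0], best[1]))
    return found


# Toy spectrum: flat background, a fundamental at bins 4-6 and a weaker
# second harmonic that sits one bin above the exact multiple.
energy = [0.1] * 24
energy[4:7] = [1.0, 4.0, 1.0]
energy[10:13] = [0.5, 2.0, 0.5]

overtones = find_overtones(energy, lo=4, hi=7)
```

On this toy data only the second harmonic passes the threshold, and it is found one bin away from the exact multiple, which is the reason for sliding instead of looking at the multiples directly. Handling the stretched shapes across many windows would need the logarithmic reference discussed above.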