Audio FFT

Supported bindings: ossia

It is pretty common for audio analysis tasks to need access to the audio spectrum.

However, this causes a dramatic situation at the ecosystem level: every plug-in ships with its own FFT implementation, which at best duplicates code for no good reason, and at worse may cause complex issues for FFT libraries which rely on global state shared across the process, such as FFTW.

With the declarative approach of Avendish, however, we can make it so that the plug-in does not directly depend on an FFT implementation: it just requests that a port gets its spectrum computed, and it happens automatically outside of the plug-in's code. This allows hosts such as ossia score to provide their own FFT implementation which will be shared across all Avendish plug-ins, which is strictly better for performance and memory usage.

Making an FFT port

This can be done by extending an audio input port (channel or bus) with a spectrum member.

For instance, given:

struct {
  float** samples{};
  int channels{};
} my_audio_input;

One can add the following spectrum struct:

struct {
  float** samples{};

  struct {
    float** amplitude{};
    float** phase{};
  } spectrum;

  int channels{};
} my_audio_input;

to get a deinterleaved view of the amplitude & phase:

spectrum.amplitude[0][4]; // The amplitude of the bin at index 4 for channel 0
spectrum.phase[1][2]; // The phase of the bin at index 2 for channel 1

It is also possible to use complex numbers instead:

struct {
  double** samples{};

  struct {
    std::complex<double>** bins;
  } spectrum;

  int channels{};
} my_audio_input;

spectrum.bins[0][4]; // The complex value of the bin at index 4 for channel 0

Using complex numbers allows to use the C++ standard math library functions for complex numbers: std::norm, std::proj...

Note that the length of the spectrum arrays is always N / 2 + 1, N being the current frame size. Note also that the FFT is normalized - the input is divided by the amount of samples.

WIP: an important upcoming feature is the ability to make configurable buffered processors, so that one can choose for instance to observe the spectrum over 1024 frames. Right now this has to be handled internally by the plug-in.

Windowing

A window function can be applied by defining an

enum window { <name of the window> };

Supported names currently are hanning, hamming. If there is none, there will be no windowing (technically, a rectangular window). The helper types described below use an Hanning window.

WIP: process with the correct overlap for the window size, e.g. 0.5 for Hanning, 0.67 for Hamming etc. once buffering is implemented.

Helper types

These three types are provided. They give separated amplitude / phase arrays.

halp::dynamic_audio_spectrum_bus<"A", double> a_bus_port;
halp::fixed_audio_spectrum_bus<"B", double, 2> a_stereo_port;
halp::audio_spectrum_channel<"C", double> a_channel_port;

Accessing a FFT object globally

See the section about feature injection to see how a plug-in can be injected with an FFT object which allows to control precisely how the FFT is done.

Example

#pragma once
#include <cmath>
#include <halp/audio.hpp>
#include <halp/controls.hpp>
#include <halp/fft.hpp>
#include <halp/meta.hpp>

namespace examples::helpers
{

/**
 * For the usual case where we just want the spectrum of an input buffer,
 * no need to template: we can ask it to be precomputed beforehand by the host.
 */
struct PeakBandFFTPort
{
  halp_meta(name, "Peak band (FFT port)")
  halp_meta(c_name, "avnd_peak_band_fft_port")
  halp_meta(uuid, "143f5cb8-d0b1-44de-a1a4-ccd5315192fa")

  // TODO implement user-controllable buffering to allow various fft sizes...
  int buffer_size = 1024;

  struct
  {
    // Here the host will fill audio.spectrum with a windowed FFT.
    // Option A (an helper type is provided)
    halp::audio_spectrum_channel<"In", double> audio;

    // Option B with the raw spectrum ; no window is defined.
    struct
    {
      halp_meta(name, "In 2");

      double* channel{};
      // complex numbers... using value_type = double[2] is also ok
      struct
      {
        std::complex<double>* bin;
      } spectrum{};
    } audio_2;
  } inputs;

  struct
  {
    halp::val_port<"Peak", double> peak;
    halp::val_port<"Band", int> band;

    halp::val_port<"Peak 2", double> peak_2;
    halp::val_port<"Band 2", int> band_2;
  } outputs;

  void operator()(int frames)
  {
    // Process with option A
    {
      outputs.peak = 0.;

      // Compute the band with the highest amplitude
      for(int k = 0; k < frames / 2; k++)
      {
        const double ampl = inputs.audio.spectrum.amplitude[k];
        const double phas = inputs.audio.spectrum.phase[k];
        const double mag_squared = ampl * ampl + phas * phas;

        if(mag_squared > outputs.peak)
        {
          outputs.peak = mag_squared;
          outputs.band = k;
        }
      }

      outputs.peak = std::sqrt(outputs.peak);
    }

    // Process with option B
    {
      outputs.peak_2 = 0.;

      // Compute the band with the highest amplitude
      for(int k = 0; k < frames / 2; k++)
      {
        const double mag_squared = std::norm(inputs.audio_2.spectrum.bin[k]);

        if(mag_squared > outputs.peak_2)
        {
          outputs.peak_2 = mag_squared;
          outputs.band_2 = k;
        }
      }

      outputs.peak_2 = std::sqrt(outputs.peak_2);
    }
  }
};

}