
Optimization Algorithm Implementation #26

Open
lyswhut opened this issue Jun 1, 2023 · 13 comments

Comments

@lyswhut

lyswhut commented Jun 1, 2023

This is a cool project. I tried to use it, but found that it seems to cause high CPU usage; it would be even better if the algorithm could be improved :)

[screenshots: CPU usage readings and other environment information]

@marcelblum

I have worked on optimizing this in a private fork for my use in an Electron app. Due to my use case I was able to optimize specifically for Chrome's Web Audio quirks. For example, Chrome has this bug, but it can be worked around to prevent processing during silence by checking if (inputs[0].length < 2) instead of

if (inputs[0].length && inputs[0][0].length == 0) {

and it can be further optimized by stopping the processing of all-zero data altogether once the block has been filled and there's no more tail. There are also some low-hanging-fruit micro-optimizations, like using pitchFactor = parameters.pitchFactor[0] instead of doing the slightly extra work of
const pitchFactor = parameters.pitchFactor[parameters.pitchFactor.length - 1];

on every process() call (with k-rate automation, all values in the array should be the same). Then you can make further assumptions to optimize: for example, if you know in advance that all input and output will be 2 channels, you don't have to call reallocateChannelsIfNeeded() on every process. With changes like these I have it working reasonably well with a 4096 block size and near-0% CPU use during silence, albeit with some unavoidable latency, and it still glitches if I try to push it with too many of these nodes processing simultaneously.
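A minimal sketch of the micro-optimizations described above, written against plain arrays shaped like an AudioWorklet's inputs and parameters (the function names here are illustrative, not taken from the actual fork):

```javascript
// k-rate read: with k-rate automation every value in the param array is
// identical, so reading index 0 avoids the length-1 lookup on every process().
function readKRateParam(paramArray) {
  return paramArray[0];
}

// Chromium silence workaround: an inactive input arrives as an empty
// inputs[0] or a single silent channel, so fewer than 2 channels means
// "skip processing" (this assumes a stereo input).
function shouldProcess(inputs) {
  return inputs[0].length >= 2;
}
```

In a real process() callback these checks would gate the expensive FFT work; note that the stereo assumption only holds if the node is constructed with a fixed 2-channel output.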

I think this is about as good as it can get in pure JS; taking it to the next level would mean using WASM for the FFT calculation, like this one. But I'm not sure whether there would be a net gain after the potential latency hit of passing all of that data back and forth between the AudioWorklet and the WASM module, or whether that's feasible for realtime use.

@lyswhut

lyswhut commented Jun 2, 2023

I'm also using it in Electron now.
if (inputs[0].length < 2) doesn't work for me when the player is paused; it's still 2 (Electron v22.3.12, Windows 10) 🙁

@marcelblum

marcelblum commented Jun 2, 2023

I'm using Electron 22 as well. I'm not sure what you mean by "when the player is paused". In my tests I found that if no audio is coming through (the input is "inactive" in Web Audio parlance), then inputs[0] is either empty or contains 1 channel of silent audio data (this is erroneous behavior on Chromium's part and not to spec). You might want to make sure you're cleaning up used one-shot source nodes, disconnecting nodes that are no longer in use, etc. to get this behavior consistently. But you can also add an explicit check for silence in the inputs if needed; it feels inefficient and I've tried to avoid doing it, but it can be necessary due to this Chrome bug, e.g.:

const checkForNotSilence = (value) => value !== 0;
//...
if (inputs[0][0].some(checkForNotSilence) || inputs[0][1].some(checkForNotSilence)) { // assumes 2-channel input
  // do process
} else {
  // don't process
}

Just keep in mind the above example is oversimplified: you still need to handle the tail in cases where you must process a larger block that contains partially silent buffer(s), since the worklet block size is larger than the Web Audio "render quantum" (hence the latency), but this is necessary for high-quality output.
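To illustrate where that latency comes from, here is a hypothetical sketch of the accumulation step: the processor collects 128-frame render quanta until a full block is ready, and only then can the expensive processing run (the class and method names are my own, not from the fork):

```javascript
// Collects fixed-size render quanta into a larger processing block.
class BlockAccumulator {
  constructor(blockSize = 4096) {
    this.blockSize = blockSize;
    this.buffer = new Float32Array(blockSize);
    this.filled = 0;
  }

  // Appends one render quantum; returns true once a full block is ready.
  push(quantum) {
    this.buffer.set(quantum, this.filled);
    this.filled += quantum.length;
    if (this.filled >= this.blockSize) {
      this.filled = 0; // the full block would be handed off for FFT work here
      return true;
    }
    return false;
  }
}
```

With a 4096-sample block and 128-frame quanta, 32 quanta must arrive before any processing can happen, which is exactly the latency being discussed.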

I'm attaching my "optimized for stereo input in Chromium" fork here; you're welcome to try it. It's no secret, I just haven't submitted any PRs to this repo because it's mostly specialized for this use case, though some of the optimizations could probably be applied to the main package to benefit all users.

@marcelblum

phase-vocoder.zip

@lyswhut

lyswhut commented Jun 3, 2023

Thanks for sharing the fork :)

I created a demo in gist using Electron Fiddle, you can load it using the link below:
https://gist.github.com/lyswhut/5f899a8aad24c578c27970c7f805d242

Now the player is not playing, but inputs[0].length is still 2:
[screenshot: inputs[0].length logged as 2 while paused]

@marcelblum

I see, you're using an <audio> player via createMediaElementSource. Try explicitly disconnecting mediaSource on pause and reconnecting on play each time to get the desired behavior. Also, when using my fork, IIRC you must force 2-channel output for the worklet node using {outputChannelCount: [2]}, because I added an assumption about that as an optimization to avoid checking for channel-count changes on every process. Here's a rewrite of your gist's renderer.js incorporating these changes:

let audio
let audioContext
let mediaSource
let pitchShifterNode
let pitchShifterNodePitchFactor


const initAudio = async() => {
    audio = new Audio()
    audio.controls = false
    audio.autoplay = true
    audio.preload = 'auto'
    audio.crossOrigin = 'anonymous'

    audioContext = new window.AudioContext()
    mediaSource = audioContext.createMediaElementSource(audio)

    // Load audio worklet module
    return audioContext.audioWorklet.addModule('./phase-vocoder.js').then(() => {
    // return audioContext.audioWorklet.addModule('./origin-phase-vocoder.js').then(() => {
        console.log('pitch shifter audio worklet loaded')
        pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor', {outputChannelCount: [2]})
        let pitchFactorParam = pitchShifterNode.parameters.get('pitchFactor')
        if (!pitchFactorParam) return
        pitchShifterNodePitchFactor = pitchFactorParam
        // Connect node
        pitchShifterNode.connect(audioContext.destination)
    })
}

const dom_input_audio_src = document.getElementById('input_audio_src')
const dom_btn_play = document.getElementById('btn_play')
dom_btn_play.disabled = true
dom_input_audio_src.value = 'https://raw.githubusercontent.com/lyswhut/test-load-local-file/master/music2.mp3'

initAudio().then(() => {
    audio.addEventListener('playing', () => {
        dom_btn_play.innerText = 'Pause'
    })
    audio.addEventListener('pause', () => {
        dom_btn_play.innerText = 'Play'
    })
    dom_btn_play.disabled = false

    dom_btn_play.addEventListener('click', () => {
        if (audio.paused) {
            mediaSource.connect(pitchShifterNode)
            if (audio.src) {
                audio.play()
                return
            } else {
                dom_btn_play.innerText = 'Loading...'
                audio.src = dom_input_audio_src.value
            }
        } else {
            audio.pause()
            mediaSource.disconnect(pitchShifterNode)
        }
    })
})

@lyswhut

lyswhut commented Jun 3, 2023

Cool, it works!
Here are my changes:

 let audio
 let audioContext
 let mediaSource
 let pitchShifterNode
 let pitchShifterNodePitchFactor
 
 
 const initAudio = async() => {
     audio = new Audio()
     audio.controls = false
     audio.autoplay = true
     audio.preload = 'auto'
     audio.crossOrigin = 'anonymous'
 
     audioContext = new window.AudioContext()
     mediaSource = audioContext.createMediaElementSource(audio)
 
     // Load audio worklet module
     return audioContext.audioWorklet.addModule('./phase-vocoder.js').then(() => {
     // return audioContext.audioWorklet.addModule('./origin-phase-vocoder.js').then(() => {
         console.log('pitch shifter audio worklet loaded')
-        pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor')
+        pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor', { outputChannelCount: [2] })
         let pitchFactorParam = pitchShifterNode.parameters.get('pitchFactor')
         if (!pitchFactorParam) return
         pitchShifterNodePitchFactor = pitchFactorParam
         
         // Connect node
-        mediaSource.connect(pitchShifterNode)
         pitchShifterNode.connect(audioContext.destination)
     })
 }
 
 const dom_input_audio_src = document.getElementById('input_audio_src')
 const dom_btn_play = document.getElementById('btn_play')
 dom_btn_play.disabled = true
 dom_input_audio_src.value = 'https://raw.githubusercontent.com/lyswhut/test-load-local-file/master/music2.mp3'
 
+let isConnected = false
+const connectNode = () => {
+  if (isConnected) return
+  mediaSource.connect(pitchShifterNode)
+  isConnected = true
+}
+const disconnectNode = () => {
+  if (!isConnected) return
+  mediaSource.disconnect()
+  isConnected = false
+}
 initAudio().then(() => {
+    audio.addEventListener('playing', connectNode)
+    audio.addEventListener('pause', disconnectNode)
+    audio.addEventListener('waiting', disconnectNode)
+    audio.addEventListener('emptied', disconnectNode)
+
     audio.addEventListener('playing', () => {
         dom_btn_play.innerText = 'Pause'
     })
     audio.addEventListener('pause', () => {
         dom_btn_play.innerText = 'Play'
     })
     dom_btn_play.disabled = false
 
     dom_btn_play.addEventListener('click', () => {
         if (audio.paused) {
             if (audio.src) {
                 audio.play()
                 return
             } else {
                 dom_btn_play.innerText = 'Loading...'
                 audio.src = dom_input_audio_src.value
             }
         } else {
             audio.pause()
         }
     })
 })

According to my tests, the block size needs to be at least 4096 so that the sound is not distorted. After applying this fork, CPU usage is reduced, and it is minimized when the audio is paused. I think that if we want to optimize it significantly, we need to use WASM to do the transformation; judging from this post, that works.

Thanks for your help! ❤️

@jeff-shell

CPU load is definitely high. I think a WASM conversion could help, but I'm not sure how much. I created an app that can run up to 4 instances of the vocoder worklet simultaneously, but performance is atrocious on mobile devices (somewhat expected). I know very little about WASM, but I wonder whether a thoughtful AssemblyScript rewrite of fft.js and the vocoder would be feasible. If anyone is familiar with this, it would be good to know!

@marcelblum

@jeff-shell I've been hoping to experiment with this some more when I get a chance. Rewriting fft.js shouldn't be necessary; rather, refactoring to use an existing Wasm FFT implementation (there are several) seems like the way to go. Ideally, of course, all the work would be done in a Wasm module. There are also a couple of timestretch Wasm projects already out there deserving of attention: https://signalsmith-audio.co.uk/code/stretch/ and https://bungee.parabolaresearch.com/

@jeff-shell

jeff-shell commented Oct 10, 2024

@marcelblum I discovered something along the way that might help optimize: renderSizeHint, from the most recent Web Audio API spec: https://webaudio.github.io/web-audio-api/#AudioContextOptions

This addition came in response to suggestions to make block size selection more flexible. For realtime manipulation, a 128-frame render quantum introduces quite a bit of overhead compared to many non-web audio processing tools, which use 1024 or even 2048.

Example: with a 48000 Hz buffer and a 128-frame block size, 48000 / 128 means the fixed per-call overhead is incurred 375 times per second; a block size of 1024, by contrast, incurs it about 47 times per second. I think this change alone would decrease the load quite a bit.
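That overhead arithmetic can be sketched with a trivial helper (the function name is mine; the numbers are the ones from the example above):

```javascript
// process() callbacks per second for a given sample rate and render
// quantum size: each callback incurs the fixed per-call overhead once.
const callsPerSecond = (sampleRate, quantumSize) => sampleRate / quantumSize;

// 48000 Hz at 128 frames -> 375 calls/s; at 1024 frames -> ~47 calls/s.
```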

Unfortunately, while you can "suggest" a value, there is no guarantee the browser will honor it, and there is no way to directly read the attribute back, since you are only providing a "hint". I suppose you could count calls to the AudioWorkletProcessor's process() method, but that would be pretty hacky. It also looks like the block size is explicitly defined in your module, so this could be an issue. Worth noting though.

Separately you can also manually resample buffers, I did test this and it helps a bit.

@jeff-shell

Very informative thread: WebAudio/web-audio-api#2450

@marcelblum

@jeff-shell It seems renderSizeHint isn't really supported yet though, right? In any case, while there are situations where being able to specify the render size is helpful, I'm not sure it would help that much here, since the same amount of audio data still needs to be processed overall; a different render size would just split it up differently. Note that this library takes a blockSize option and handles concatenating the buffers, delaying processing until the target block size has been collected (hence the latency). So yes, that bit of work could be avoided with a known preset renderSizeHint, but that doesn't change the fact that the FFT calculations are the most expensive part of the process. BTW, lowering blockSize lowers CPU use but also lowers quality.

@jeff-shell

jeff-shell commented Oct 12, 2024

@marcelblum I'm not sure how much support it currently has, tbh. True, the same amount of audio data is processed either way, but a larger block size gives the CPU cache more data at a time, leading to less frequent "interruptions" and fewer incursions of overhead (moving data between memory allocations, setting up buffers for processing, dispatching the processing work, running the node graph). This decreases the likelihood of buffer underruns at high CPU/memory usage (underruns are the problem I have encountered).

This is my understanding of it anyway, but I might be wrong; still learning about all of this! And yes, the overhead is probably a secondary consideration next to the overall FFT expense.
