Optimization Algorithm Implementation #26
I have worked on optimizing this in a private fork for my use in an Electron app. Because of my use case I was able to optimize specifically for Chrome's Web Audio quirks. For example, Chrome has this bug, but it can be worked around to avoid processing during silence by checking for silence before the call at line 93 in 841f37b, and it can be further optimized by stopping the processing of all-zero blocks altogether once the block has been filled and there's no more tail. There are also some low-hanging-fruit micro-optimizations, like using `pitchFactor = parameters.pitchFactor[0]` instead of doing the slightly extra work of line 47 in 841f37b on every `process()` call (with k-rate automation, all values in the array should be the same). Then you can make some more assumptions to optimize; for example, if you know in advance that all input and output will be 2 channels, then you don't have to call `reallocateChannelsIfNeeded()` on every `process()`. With things like this done, I have it working reasonably well with a 4096 block size and near 0% CPU use on silence, albeit with some unavoidable latency, and it still glitches if I try to push it with too many of these nodes processing simultaneously.
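To illustrate the k-rate micro-optimization mentioned above, here's a minimal sketch (hypothetical excerpt, not the fork's actual code; the simulated `parameters` object mimics what an `AudioWorkletProcessor` receives):

```javascript
// With k-rate automation the pitchFactor array holds a single value per
// render block, so reading index 0 once per process() call avoids the
// extra per-sample indexing work.
function readPitchFactor(parameters) {
  // k-rate params arrive as a Float32Array of length 1
  return parameters.pitchFactor[0];
}

// Simulated AudioWorkletProcessor parameters object:
const parameters = { pitchFactor: new Float32Array([1.5]) };
const pitchFactor = readPitchFactor(parameters);
console.log(pitchFactor); // 1.5
```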
I think this is about as good as it can get using pure JS; taking it to the next level would be to use WASM for the FFT calculation, like this one. But I'm not sure whether there would be a net gain after the potential latency hit of passing all that data back and forth between the AudioWorklet and the WASM module, or whether it's feasible for real-time use.
I am also using it for Electron now.
I'm using Electron 22 as well. I'm not sure what you mean by "when the player is paused". In my tests I found that if no audio is coming through (the input is "inactive" in Web Audio parlance), then processing can be skipped by checking for silence:

```js
const checkForNotSilence = (value) => value !== 0;
// ...
if (inputs[0][0].some(this.checkForNotSilence) || inputs[0][1].some(this.checkForNotSilence)) { // assumes 2-channel input
  // do process
} else {
  // don't process
}
```

Just keep in mind the above example is oversimplified, because you still need to handle the tail in cases where you process a larger block that contains partially silent buffer(s), since the worklet block size is larger than the Web Audio "render quantum" (hence the latency, but this is necessary for high-quality output). I'm attaching my "optimized for stereo input in Chromium" fork here; you're welcome to try it. It's not secret, I just haven't submitted any PRs to this repo because it's mostly specialized for this use case, though some of the optimizations could probably be applied to the main package to benefit all users.
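The "silence plus tail" decision described above can be sketched as a small helper (hypothetical names, not the fork's actual code; assumes the processor tracks how many unflushed samples remain in its overlap-add buffer):

```javascript
// Skip heavy processing only when the input is silent AND the overlap-add
// tail has fully drained; a silent input with a pending tail still needs
// processing so the remaining output gets flushed.
const isSilent = (channel) => !channel.some((v) => v !== 0);

function shouldProcess(inputChannels, tailSamplesRemaining) {
  if (!inputChannels.every(isSilent)) return true; // real audio present
  return tailSamplesRemaining > 0; // silent input, but tail still flushing
}

// Example: silent stereo input, 256 tail samples left to flush
const silent = [new Float32Array(128), new Float32Array(128)];
console.log(shouldProcess(silent, 256)); // true — must flush the tail
console.log(shouldProcess(silent, 0));   // false — safe to skip
```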
Thanks for sharing the fork :) I created a demo gist using Electron Fiddle; you can load it using the link below. Now the player is not playing, but
I see, you're using an `<audio>` element as the media source. Try connecting the source node only while playing, something like this:

```js
let audio
let audioContext
let mediaSource
let pitchShifterNode
let pitchShifterNodePitchFactor

const initAudio = async () => {
  audio = new Audio()
  audio.controls = false
  audio.autoplay = true
  audio.preload = 'auto'
  audio.crossOrigin = 'anonymous'
  audioContext = new window.AudioContext()
  mediaSource = audioContext.createMediaElementSource(audio)
  // Load audio worklet module
  return audioContext.audioWorklet.addModule('./phase-vocoder.js').then(() => {
  // return audioContext.audioWorklet.addModule('./origin-phase-vocoder.js').then(() => {
    console.log('pitch shifter audio worklet loaded')
    pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor', { outputChannelCount: [2] })
    let pitchFactorParam = pitchShifterNode.parameters.get('pitchFactor')
    if (!pitchFactorParam) return
    pitchShifterNodePitchFactor = pitchFactorParam
    // Connect node
    pitchShifterNode.connect(audioContext.destination)
  })
}

const dom_input_audio_src = document.getElementById('input_audio_src')
const dom_btn_play = document.getElementById('btn_play')
dom_btn_play.disabled = true
dom_input_audio_src.value = 'https://raw.githubusercontent.com/lyswhut/test-load-local-file/master/music2.mp3'

initAudio().then(() => {
  audio.addEventListener('playing', () => {
    dom_btn_play.innerText = 'Pause'
  })
  audio.addEventListener('pause', () => {
    dom_btn_play.innerText = 'Play'
  })
  dom_btn_play.disabled = false
  dom_btn_play.addEventListener('click', () => {
    if (audio.paused) {
      mediaSource.connect(pitchShifterNode)
      if (audio.src) {
        audio.play()
        return
      } else {
        dom_btn_play.innerText = 'Loading...'
        audio.src = dom_input_audio_src.value
      }
    } else {
      audio.pause()
      mediaSource.disconnect(pitchShifterNode)
    }
  })
})
```
Cool, it works! Here's the diff I applied:

```diff
 let audio
 let audioContext
 let mediaSource
 let pitchShifterNode
 let pitchShifterNodePitchFactor
 const initAudio = async () => {
   audio = new Audio()
   audio.controls = false
   audio.autoplay = true
   audio.preload = 'auto'
   audio.crossOrigin = 'anonymous'
   audioContext = new window.AudioContext()
   mediaSource = audioContext.createMediaElementSource(audio)
   // Load audio worklet module
   return audioContext.audioWorklet.addModule('./phase-vocoder.js').then(() => {
   // return audioContext.audioWorklet.addModule('./origin-phase-vocoder.js').then(() => {
     console.log('pitch shifter audio worklet loaded')
-    pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor')
+    pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor', { outputChannelCount: [2] })
     let pitchFactorParam = pitchShifterNode.parameters.get('pitchFactor')
     if (!pitchFactorParam) return
     pitchShifterNodePitchFactor = pitchFactorParam
     // Connect node
-    mediaSource.connect(pitchShifterNode)
     pitchShifterNode.connect(audioContext.destination)
   })
 }
 const dom_input_audio_src = document.getElementById('input_audio_src')
 const dom_btn_play = document.getElementById('btn_play')
 dom_btn_play.disabled = true
 dom_input_audio_src.value = 'https://raw.githubusercontent.com/lyswhut/test-load-local-file/master/music2.mp3'
+let isConnected = false
+const connectNode = () => {
+  if (isConnected) return
+  mediaSource.connect(pitchShifterNode)
+  isConnected = true
+}
+const disconnectNode = () => {
+  if (!isConnected) return
+  mediaSource.disconnect()
+  isConnected = false
+}
 initAudio().then(() => {
+  audio.addEventListener('playing', connectNode)
+  audio.addEventListener('pause', disconnectNode)
+  audio.addEventListener('waiting', disconnectNode)
+  audio.addEventListener('emptied', disconnectNode)
+
   audio.addEventListener('playing', () => {
     dom_btn_play.innerText = 'Pause'
   })
   audio.addEventListener('pause', () => {
     dom_btn_play.innerText = 'Play'
   })
   dom_btn_play.disabled = false
   dom_btn_play.addEventListener('click', () => {
     if (audio.paused) {
       if (audio.src) {
         audio.play()
         return
       } else {
         dom_btn_play.innerText = 'Loading...'
         audio.src = dom_input_audio_src.value
       }
     } else {
       audio.pause()
     }
   })
 })
```

According to my tests, the block size needs to be at least 4096 so that the sound is not distorted. After applying this fork, CPU usage is reduced, and it is minimal when the audio is paused. I think that if we want to optimize it significantly, we need to use WASM to do the conversion work; judging from this post, it works. Thanks for your help! ❤️
CPU load is definitely high. I think a WASM conversion could help, but I'm not sure how much. I created an app that can run up to 4 instances of the vocoder worklet simultaneously, but performance is atrocious on mobile devices (somewhat expected). I know very little about WASM, but I wonder if a thoughtful AssemblyScript rewrite of fft.js and the vocoder would be feasible. If anyone is familiar with this, it would be good to know!
@jeff-shell I've been hoping to experiment with this some more when I get a chance. Rewriting fft.js shouldn't be necessary; rather, refactoring to use an existing Wasm FFT implementation (there are several) seems like the way to go. Ideally, of course, all the work would be done in a Wasm module. There are a couple of timestretch Wasm projects already out there deserving of attention: https://signalsmith-audio.co.uk/code/stretch/ and https://bungee.parabolaresearch.com/
@marcelblum Discovered something along the way that might help optimize. From the most recent Web Audio API docs regarding the render quantum size: this update came as a response to suggestions to make the block size selection more flexible. For real-time manipulation, a 128-frame render quantum introduces quite a bit of overhead compared to many non-web-based audio processing tools that use 1024 or even 2048. Example: if you have a 48000 Hz buffer and use a 128-frame block size, 48000 / 128 means the fixed overhead is incurred 375 times per second; with a block size of 1024, by contrast, it's about 47 times per second. I think this change alone would decrease load quite a bit. Unfortunately, while you can "suggest" a value, there is no guarantee that it will be honored by the browser. There is also no way to directly access/store this attribute, because you are just providing a "hint". I suppose you could create a counter to measure calls to the AudioWorkletProcessor's `process()` method. Separately, you can also manually resample buffers; I did test this and it helps a bit.
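The overhead arithmetic above can be checked directly; the same numbers also show how a call counter inside `process()` could infer the block size the browser actually granted (sketch only, with the worklet part shown as comments since it only runs inside an `AudioWorkletProcessor`):

```javascript
// Calls to process() per second for a given render quantum size.
const callbacksPerSecond = (sampleRate, blockSize) => sampleRate / blockSize;

console.log(callbacksPerSecond(48000, 128));  // 375
console.log(callbacksPerSecond(48000, 1024)); // 46.875

// Counter idea (hypothetical, inside a processor's process() method):
//   this.calls++;
//   this.frames += inputs[0][0].length;
//   if (this.frames >= sampleRate) {
//     const grantedBlockSize = this.frames / this.calls;
//   }
```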
Very informative thread: WebAudio/web-audio-api#2450
@jeff-shell seems like
@marcelblum I'm not sure how much support it currently has, tbh. It's true that the same amount of audio data is being processed, but specifying a larger block size gives the CPU cache more data at a time, leading to less frequent "interruptions" and incursions of overhead (moving data between memory allocations, setting up buffers for processing, dispatching processing work, running the node graph). This helps decrease the likelihood of buffer underruns at high CPU/memory usage (underruns are the problem I have encountered). That's my understanding of it anyway, but I might be wrong; still learning about all of this! And yeah, overhead might be a secondary consideration compared to the overall FFT expense.
This is a cool project. I tried to use it, but found that it seems to cause high CPU usage; it would be even better if the algorithm could be improved :)