
VAD not emitting UserStoppedSpeaking Frame, causing the bot to stuck and not respond #1493


Open
sphatate opened this issue Apr 1, 2025 · 7 comments

@sphatate

sphatate commented Apr 1, 2025

We are using a VAD -> STT -> LLM -> TTS architecture.

We have often observed that the bot doesn't respond. After debugging, we found that the VAD is not emitting the UserStoppedSpeakingFrame.

Further debugging showed that the issue arises because the following condition is never satisfied:

if (
    self._vad_state == VADState.STOPPING
    and self._vad_stopping_count >= self._vad_stop_frames
):
    self._vad_state = VADState.QUIET
    self._vad_stopping_count = 0

In the condition above, _vad_stopping_count is always less than _vad_stop_frames, and I don't understand why. We have set stop_secs to 0.8, so:

_vad_stop_frames is ~35
_vad_stopping_count always stops incrementing somewhere between 10 and 13 (i.e., it never goes above 13), so the condition is never satisfied.
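To make the counter behaviour easier to reason about, here is a minimal sketch of a frame-counting VAD state machine. It is a simplified stand-in, not Pipecat's actual implementation: the class name ToyVAD, the helper run, and the start_frames value of 9 are assumptions made for illustration. It shows two ways a stopping counter can fail to reach its threshold: audio frames stop arriving, or an occasional speech-like frame resets the count.

```python
from enum import Enum


class VADState(Enum):
    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4


class ToyVAD:
    """Simplified stand-in for a frame-counting VAD (hypothetical, not Pipecat's class)."""

    def __init__(self, start_frames: int, stop_frames: int):
        self.start_frames = start_frames
        self.stop_frames = stop_frames
        self.state = VADState.QUIET
        self.starting_count = 0
        self.stopping_count = 0

    def feed(self, is_speech: bool) -> VADState:
        """Advance the state machine by one analyzed audio frame."""
        if is_speech:
            if self.state == VADState.QUIET:
                self.state = VADState.STARTING
                self.starting_count = 1
            elif self.state == VADState.STARTING:
                self.starting_count += 1
            elif self.state == VADState.STOPPING:
                # Any speech-like frame cancels the pending stop.
                self.state = VADState.SPEAKING
                self.stopping_count = 0
        else:
            if self.state == VADState.STARTING:
                self.state = VADState.QUIET
                self.starting_count = 0
            elif self.state == VADState.SPEAKING:
                self.state = VADState.STOPPING
                self.stopping_count = 1
            elif self.state == VADState.STOPPING:
                self.stopping_count += 1

        if self.state == VADState.STARTING and self.starting_count >= self.start_frames:
            self.state = VADState.SPEAKING
            self.starting_count = 0
        if self.state == VADState.STOPPING and self.stopping_count >= self.stop_frames:
            # Only here would a "user stopped speaking" event be emitted.
            self.state = VADState.QUIET
            self.stopping_count = 0
        return self.state


def run(frames, start_frames=9, stop_frames=35):
    """Feed a sequence of per-frame speech decisions; return (state, stopping_count)."""
    vad = ToyVAD(start_frames, stop_frames)
    for is_speech in frames:
        vad.feed(is_speech)
    return vad.state, vad.stopping_count


speech = [True] * 20
full_silence = run(speech + [False] * 35)              # enough silent frames arrive
audio_stops = run(speech + [False] * 13)               # input ends after 13 silent frames
noisy_silence = run(speech + ([False] * 12 + [True]) * 5)  # periodic speech-like frames
```

In this toy model only the first scenario returns to QUIET. The second stays in STOPPING with the counter frozen at 13, and the third keeps bouncing back to SPEAKING. Whether either pattern matches what happens over the WebSocket transport is not established here; it is only something worth checking in the logs.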

@sphatate sphatate changed the title VAD not emitting UserStoppedSpeaking Frame VAD not emitting UserStoppedSpeaking Frame, causing the bot to stuck and not respond Apr 1, 2025
@markbackman
Contributor

Two questions:

  • What are your VAD settings?
  • What version of Pipecat?

In 0.0.57, we added handling for the case where the VAD doesn't fire but a TranscriptionFrame is received. In that case, a completion still occurs.

In my experience, this works robustly.
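The idea behind that fallback can be sketched roughly as follows. This is a hypothetical illustration of the concept only, not Pipecat's code: the TurnTracker class and its method names are invented for this example.

```python
class TurnTracker:
    """Hypothetical turn tracker; illustrates the fallback idea, not Pipecat's code."""

    def __init__(self):
        self.user_speaking = False
        self.pending_text = []
        self.completions = []

    def on_user_started_speaking(self):
        self.user_speaking = True

    def on_user_stopped_speaking(self):
        self.user_speaking = False
        self._complete()

    def on_transcription(self, text):
        self.pending_text.append(text)
        if not self.user_speaking:
            # The VAD never reported speech, so do not wait for a stop
            # event that may never come: complete on the transcription.
            self._complete()

    def _complete(self):
        # Hand the accumulated text to the LLM (recorded in a list here).
        if self.pending_text:
            self.completions.append(" ".join(self.pending_text))
            self.pending_text = []


# VAD silent, transcription arrives: the fallback triggers a completion.
no_vad = TurnTracker()
no_vad.on_transcription("hello")

# VAD reported a start but never a stop: in this toy model the text stays pending.
stuck = TurnTracker()
stuck.on_user_started_speaking()
stuck.on_transcription("hi there")
```

Note that in this toy model the fallback covers a VAD that never fires at all. A VAD that reports a start but never a stop, which is what this issue describes, leaves the text pending until a stop event arrives.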

@sphatate
Author

sphatate commented Apr 3, 2025

Hi @markbackman

These are our VAD settings:

confidence = 0.5
start_secs = 0.2
stop_secs = 0.8
volume = 0.5

We are using Pipecat version 0.0.60.
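As a quick sanity check on these numbers, the sketch below relates stop_secs to the frame counts reported earlier in this issue. It assumes the stop threshold is simply stop_secs divided by the duration of one VAD frame; that relationship and the helper name silence_secs are assumptions for illustration, not something confirmed in this thread.

```python
stop_secs = 0.8    # from the settings above
stop_frames = 35   # approximate threshold reported earlier in this issue

# Assumption: threshold = stop_secs / (duration of one VAD frame).
frame_secs = stop_secs / stop_frames


def silence_secs(frame_count: int) -> float:
    """Silence duration represented by a given number of VAD frames."""
    return frame_count * frame_secs


print(f"implied frame duration: {frame_secs * 1000:.1f} ms")
for count in (10, 13):
    print(f"{count} frames = {silence_secs(count):.2f} s of silence")
```

Under that assumption, each frame covers roughly 23 ms, so a counter stalling at 10 to 13 frames corresponds to about 0.23 to 0.30 s of detected silence, well short of the configured 0.8 s.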

@sphatate
Author

sphatate commented Apr 3, 2025

Could this have something to do with the Deepgram or Azure transcriber? We are facing this issue with both.

Also, this does not happen with phone calls; it happens over WebSocket when we make web calls.

@markbackman
Contributor

This is not a known issue. It sounds like something in your pipeline might be blocking the UserStoppedSpeakingFrame. A few questions:

  • Have you customized any parts of Pipecat?
  • Do you have any wrappers around the services (STT, LLM, or TTS)?
  • Do you have any custom processors in your pipeline?

If yes to any of those, please make sure you're pushing frames down the pipeline in all cases.
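To illustrate what "pushing frames down the pipeline in all cases" means, here is a minimal, self-contained sketch. The Processor, DroppingFilter, ForwardingFilter, and Sink classes are hypothetical stand-ins written for this example, and frames are plain dicts; none of this is Pipecat's actual API.

```python
import asyncio


class Processor:
    """Minimal stand-in for a pipeline stage (hypothetical, not Pipecat's class)."""

    def __init__(self):
        self.next = None

    async def push_frame(self, frame):
        # Hand the frame to the next stage, if there is one.
        if self.next is not None:
            await self.next.process_frame(frame)

    async def process_frame(self, frame):
        await self.push_frame(frame)


class DroppingFilter(Processor):
    """Buggy: forwards the frames it handles and silently drops everything else."""

    async def process_frame(self, frame):
        if frame["type"] == "audio":
            await self.push_frame(frame)


class ForwardingFilter(Processor):
    """Fixed: acts on the frames it cares about, then forwards every frame."""

    async def process_frame(self, frame):
        if frame["type"] == "audio":
            frame = {**frame, "seen": True}
        await self.push_frame(frame)


class Sink(Processor):
    """Records the type of every frame that reaches the end of the pipeline."""

    def __init__(self):
        super().__init__()
        self.received = []

    async def process_frame(self, frame):
        self.received.append(frame["type"])


async def run_pipeline(stage):
    sink = Sink()
    stage.next = sink
    for frame_type in ("audio", "user_stopped_speaking"):
        await stage.process_frame({"type": frame_type})
    return sink.received


dropped = asyncio.run(run_pipeline(DroppingFilter()))
forwarded = asyncio.run(run_pipeline(ForwardingFilter()))
```

With the dropping variant, the "user_stopped_speaking" frame never reaches the sink, which would look downstream exactly like a VAD that never emitted it.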

@markbackman
Contributor

@sphatate any update? Otherwise, I'll close the issue.

@markbackman markbackman self-assigned this Apr 12, 2025
@sphatate
Author

I have not customized anything. This only happens over WebSocket when it is used for web calls. We are not facing this issue with phone calls on Twilio.

@markbackman
Contributor

Do you have a single-file repro that you can share, along with a description of what to look for?


2 participants