This repository provides a fast and efficient speech tokenization approach using bidirectional Mamba for spoken term detection. The proposed method employs a speech tokenizer that generates speaker-agnostic tokens, ensuring consistent token sequences across different utterances of the same word. The repository includes the implementation, datasets, and pre-trained models.
Paper: BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection
git clone https://github.com/anupsingh15/BEST-STD.git
cd BEST-STD
conda create -n best_std anaconda
Alternatively, you can recreate the Conda environment, with the additional dependencies already included, from the provided environment file:
conda env create -f environment.yml
conda activate best_std
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python -m pip install lightning
pip install vector-quantize-pytorch
pip install mamba-ssm
pip install "causal-conv1d>=1.4.0"
python -m pip install tslearn
pip install librosa
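To confirm the environment is set up correctly before training, a minimal sanity check such as the following (not part of the repository) imports the core dependencies and verifies that a CUDA-capable GPU is visible, since mamba-ssm and causal-conv1d rely on CUDA kernels at runtime:

```python
# Optional sanity check: confirm key dependencies import and a GPU is visible.
import torch
import lightning
import vector_quantize_pytorch
import mamba_ssm
import tslearn
import librosa

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("lightning:", lightning.__version__)
print("librosa:", librosa.__version__)
```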
To train the model, run:
python main.py
To create the database, build the index, and perform retrieval, run:
python retrieval/std.py
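The script above builds the database of token sequences, constructs the index, and runs retrieval. As a rough illustration of the general idea only (not the repository's actual implementation), spoken term detection over discrete tokens can be framed as an inverted index over token n-grams; `tokenize` output is represented here by toy integer sequences standing in for the trained tokenizer:

```python
# Illustrative sketch: inverted index over token n-grams for retrieval.
# The real pipeline lives in retrieval/std.py; token values below are toy data.
from collections import defaultdict

def ngrams(tokens, n=3):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_index(database):
    """database: {utterance_id: token sequence (list of ints)}"""
    index = defaultdict(set)
    for utt_id, tokens in database.items():
        for g in ngrams(tokens):
            index[g].add(utt_id)
    return index

def retrieve(index, query_tokens):
    """Rank utterances by the number of query n-grams they share."""
    scores = defaultdict(int)
    for g in ngrams(query_tokens):
        for utt_id in index.get(g, ()):
            scores[utt_id] += 1
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

db = {"utt1": [5, 8, 8, 2, 9, 1], "utt2": [7, 3, 5, 8, 2, 4]}
index = build_index(db)
print(retrieve(index, [5, 8, 2, 9]))
```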
For a demonstration of word tokenization, check the following Jupyter Notebook:
demo/word_tokenization.ipynb
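The notebook shows how different utterances of the same word map to similar token sequences. As a hypothetical illustration of how that consistency can be quantified (the notebook's loading and tokenization API is not reproduced here), a normalized edit distance between two token sequences can be computed as:

```python
# Illustrative sketch: comparing token sequences from two utterances of the
# same word. The sequences would come from the trained tokenizer (see
# demo/word_tokenization.ipynb); the values below are toy examples.

def edit_distance(a, b):
    """Standard Levenshtein distance between two token sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

seq_speaker_a = [12, 7, 7, 31, 5]   # toy tokens for one utterance of a word
seq_speaker_b = [12, 7, 31, 31, 5]  # toy tokens for another utterance
dist = edit_distance(seq_speaker_a, seq_speaker_b)
print("normalized edit distance:", dist / max(len(seq_speaker_a), len(seq_speaker_b)))
```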
- Dataset: LibriSpeech Word Alignments
- Pre-trained Models: Download from Google Drive
If you find our work useful, please cite:
@inproceedings{singh2025best,
title={BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection},
author={Singh, Anup and Demuynck, Kris and Arora, Vipul},
booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
organization={IEEE}
}
We are actively working on enhancing this method with new features and improvements. Stay tuned for upcoming upgrades, including:
- More efficient tokens
- Improved token consistency across different noise conditions
- Faster inference speed
- Support for additional languages