We are excited to open-source NAMO-SSLM, a small yet powerful real-time multi-modal model. The AI landscape is shifting from massive, resource-intensive models to lightweight, optimized small models, and for good reason: small models like NAMO-SSLM offer a compelling mix of efficiency, speed, and cost-effectiveness, making them the smarter choice for real-world applications.
Key features include:
- Runs on CPU: Runs in real time on consumer CPU devices.
- Multimodal (voice + vision): Native support for real-time speech, vision, and OCR capabilities.
- Low Latency, Real-Time Processing: Real-time streaming support with end-to-end latency as low as 80 ms (see the streaming sketch after this list).
- Multilingual Support: Supports multiple languages and hybrid-language speech such as Hinglish.
- Multi-turn RAG: Supports multi-turn retrieval-augmented generation to pull in rich context while keeping the conversation real-time (sketched below).
- Voiced + Silent Function / Tool Calling: Function-calling support with both voiced output and silent, text-only tool calls (sketched below).
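The CPU and latency bullets come down to token-level streaming. Below is a minimal, self-contained Python sketch of that pattern; `fake_model_stream` is a stub standing in for the real NAMO-SSLM decoder (the actual API is not shown here) and the timings are simulated, but the loop shape is what lets a caller start rendering or voicing output before the full reply is decoded.

```python
import time

def fake_model_stream(prompt):
    """Stub standing in for the real decoder (hypothetical); yields
    tokens one at a time, the way a streaming runtime would."""
    for tok in ["Hello", ",", " how", " can", " I", " help", "?"]:
        time.sleep(0.02)  # simulated per-token decode time on a CPU
        yield tok

def stream_reply(prompt):
    """Consume tokens as they arrive so output can start flowing
    before the full reply has been decoded."""
    start = time.perf_counter()
    first_token_ms = None
    pieces = []
    for tok in fake_model_stream(prompt):
        if first_token_ms is None:
            first_token_ms = (time.perf_counter() - start) * 1000
        pieces.append(tok)
        print(tok, end="", flush=True)  # render/voice incrementally
    print(f"\n[first token after {first_token_ms:.0f} ms]")
    return "".join(pieces)

stream_reply("What can you do?")
```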
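The multi-turn RAG bullet can be read as: retrieval is re-run on every turn, with the query blended with recent conversation so follow-up questions still hit the right documents. The sketch below makes that concrete with a toy keyword-overlap scorer and an in-memory `DOCS` list; both are illustrative stand-ins, not the actual retrieval stack.

```python
# Toy in-memory corpus; a real deployment would use a vector index.
DOCS = [
    "NAMO-SSLM runs in real time on consumer CPU devices.",
    "End-to-end streaming latency can be as low as 80 ms.",
    "Function calls can be voiced or executed silently.",
]

def retrieve(query, history, k=1):
    """Score documents by word overlap with the current query *and* the
    previous turn, so elliptical follow-ups still retrieve useful context."""
    terms = set((query + " " + (history[-1] if history else "")).lower().split())
    scored = sorted(DOCS, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]

history = []
for turn in ["How fast is the streaming latency?", "And does it need a GPU?"]:
    print(f"Q: {turn}\n  context: {retrieve(turn, history)}")
    history.append(turn)
```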
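For the voiced + silent tool-calling bullet, one plausible dispatch shape is sketched below: text events are voiced, while tool calls are executed silently and their results fed back to the model. The event format, the `TOOLS` table, and `handle_event` are all hypothetical illustrations, not the model's confirmed interface.

```python
import json

# Hypothetical event shapes: the runtime emits either plain text (to be
# voiced) or a structured tool call (to be executed without speaking).
TOOLS = {
    "get_time": lambda args: {"time": "14:32"},  # illustrative tool
}

def handle_event(event):
    """Speak text events aloud; run tool calls silently and return the
    result so it can be fed back to the model instead of being voiced."""
    if event["type"] == "text":
        print("SPEAK:", event["content"])
        return None
    if event["type"] == "tool_call":
        result = TOOLS[event["name"]](json.loads(event["arguments"]))
        return {"type": "tool_result", "name": event["name"],
                "content": json.dumps(result)}

# Events one assistant turn might produce (illustrative only):
for ev in [
    {"type": "tool_call", "name": "get_time", "arguments": "{}"},
    {"type": "text", "content": "It's 2:32 pm."},
]:
    feedback = handle_event(ev)
    if feedback:
        print("TO MODEL:", feedback)
```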
[video]
21.03.2025: Announced model launch.
- Launched real-time vision + text modality
- Launched real-time speech modality