
Keyword Spotting

Keyword Spotting vs Speech Recognition

|             | Keyword Spotting | Speech Recognition |
|-------------|------------------|--------------------|
| Power Usage | Low              | High               |
| Type        | Continuous       |                    |
| Location    | On-Device        | On-Device/Online   |

Types

| Single Shot         | Streaming                 |
|---------------------|---------------------------|
| Only keyword spoken | Keyword within a sentence |

Challenges

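| Aspect               | Constraint      | Comment                                                                                     | Metrics |
|----------------------|-----------------|---------------------------------------------------------------------------------------------|---------|
| System performance   | Latency         | Listening animation                                                                         |         |
|                      | Bandwidth       |                                                                                             |         |
| Preserving           | Security        | Safeguarding data being sent to cloud                                                       |         |
|                      | Privacy         |                                                                                             |         |
| Model                | Accuracy        | Listen continuously, but only trigger at the right time. Pick operating point accordingly.  |         |
|                      | Personalization | Trigger only for user, not for other users or for background noise                          |         |
| Resource constraints | Battery         |                                                                                             |         |
|                      | Memory          |                                                                                             |         |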
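
Picking an operating point means choosing the detection threshold that trades false accepts (triggering when no keyword was spoken) against false rejects (missing a spoken keyword). The sketch below is one possible way to do this, not a prescribed method: it assumes the model outputs a confidence in [0, 1], and the scores, labels, and function names are made up for illustration.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (false_accept_rate, false_reject_rate) at one threshold.

    scores: model confidences in [0, 1]
    labels: True where the keyword was actually spoken
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    false_rejects = sum(1 for s in positives if s < threshold)
    false_accepts = sum(1 for s in negatives if s >= threshold)
    return false_accepts / len(negatives), false_rejects / len(positives)


def pick_operating_point(scores, labels, max_false_accept_rate, thresholds):
    """Return (threshold, FAR, FRR) with the lowest FRR within the FAR budget.

    Returns None if no candidate threshold meets the budget.
    """
    best = None
    for t in thresholds:
        far, frr = rates_at_threshold(scores, labels, t)
        if far <= max_false_accept_rate and (best is None or frr < best[2]):
            best = (t, far, frr)
    return best


# Hypothetical validation data, for illustration only.
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [True, True, True, True, False, False, False, False]
thresholds = [i / 10 for i in range(1, 10)]

# Require zero false accepts on this toy set.
point = pick_operating_point(scores, labels, 0.0, thresholds)
print(point)
```

A stricter false-accept budget pushes the threshold up and raises the false-reject rate, which is the trade-off behind "listen continuously, but only trigger at the right time".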

Model

Spectrogram is just an image
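
To see why a spectrogram can be treated as an image, here is a minimal sketch that turns a 1-D audio signal into a 2-D grid of magnitudes (time frames by frequency bins). It uses a naive DFT for readability; the frame size, hop, and toy tone are assumptions chosen for the example, and a real pipeline would use an FFT and typically mel or MFCC features.

```python
import cmath
import math


def spectrogram(samples, frame_size, hop):
    """Magnitude spectrogram as a list of rows, one row per time frame.

    Each row holds frame_size // 2 + 1 frequency-bin magnitudes, so the
    result is a 2-D grid that can be fed to a model like a one-channel image.
    """
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for clarity, O(frame_size) work per bin.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            row.append(abs(acc))
        rows.append(row)
    return rows


# Toy input: a pure tone that lands exactly on frequency bin 4.
frame_size, hop = 32, 16
tone = [math.sin(2 * math.pi * 4 * n / frame_size) for n in range(128)]
image = spectrogram(tone, frame_size, hop)

print(len(image), "frames x", len(image[0]), "bins")
```

Because the output is a plain 2-D array, the same convolutional layers used for image classification apply directly.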

TinyConv

Since we are only focused on recognizing a few keywords, we can just use one Conv2D layer followed by a single dense layer.

flowchart LR

Input --> Conv --> FC --> Softmax --> Output
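
The pipeline in the flowchart can be sketched as a forward pass in plain Python. This is an illustrative sketch, not the reference TinyConv implementation: the input size, kernel size, stride, filter count, and class count are assumptions, and the weights are random rather than trained.

```python
import math
import random


def conv2d(image, kernels, stride):
    """Valid 2-D convolution of a one-channel image, followed by ReLU."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    maps = []
    for kernel in kernels:
        fmap = []
        for i in range(out_h):
            row = []
            for j in range(out_w):
                acc = sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                          for a in range(kh) for b in range(kw))
                row.append(max(0.0, acc))  # ReLU
            fmap.append(row)
        maps.append(fmap)
    return maps


def dense(features, weights, biases):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(f * w for f, w in zip(features, ws)) + b
            for ws, b in zip(weights, biases)]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tiny_conv(image, kernels, weights, biases, stride=2):
    """Input -> Conv -> FC -> Softmax, as in the flowchart."""
    maps = conv2d(image, kernels, stride)
    flat = [v for fmap in maps for row in fmap for v in row]
    return softmax(dense(flat, weights, biases))


# Hypothetical sizes: 12x10 spectrogram, two 3x3 filters, four classes.
rng = random.Random(0)
H, W, K, N_FILTERS, N_CLASSES, STRIDE = 12, 10, 3, 2, 4, 2
spectrogram = [[rng.random() for _ in range(W)] for _ in range(H)]
kernels = [[[rng.uniform(-1, 1) for _ in range(K)] for _ in range(K)]
           for _ in range(N_FILTERS)]
n_features = N_FILTERS * ((H - K) // STRIDE + 1) * ((W - K) // STRIDE + 1)
weights = [[rng.uniform(-1, 1) for _ in range(n_features)]
           for _ in range(N_CLASSES)]
biases = [0.0] * N_CLASSES

probs = tiny_conv(spectrogram, kernels, weights, biases, STRIDE)
print(probs)
```

The output is one probability per class (for example silence, unknown, and each keyword). The small parameter count is what makes this shape of model a fit for memory-constrained devices.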

Limitations

  • Limited vocabulary
  • Lower accuracy
  • Limited UX

Cascading

Multiple Inferences

  • Average inferences across multiple time slices

This is to avoid false positives for groups of words that begin with the same sounds. For example:

  • No
  • No good
  • Notion
  • Notice
  • Notable
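
Averaging across time slices can be sketched as a sliding window over per-slice keyword scores: the detector fires only when the average over the last few slices clears a threshold. The window length, threshold, class name, and score sequences below are all assumptions made up for illustration.

```python
from collections import deque


class SmoothedDetector:
    """Fire only when the keyword score, averaged over recent slices, is high."""

    def __init__(self, window, threshold):
        self.scores = deque(maxlen=window)  # keeps only the latest slices
        self.threshold = threshold

    def update(self, score):
        """Add one per-slice keyword score; return True if the keyword fires."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        average = sum(self.scores) / len(self.scores)
        return window_full and average >= self.threshold


def run(scores, window=3, threshold=0.7):
    detector = SmoothedDetector(window, threshold)
    return [detector.update(s) for s in scores]


# Hypothetical per-slice scores for the keyword "no".
# "notion": one confident slice, then the score collapses.
notion = [0.1, 0.9, 0.2, 0.1, 0.1]
# "no": the score stays high for several consecutive slices.
no = [0.1, 0.8, 0.9, 0.9, 0.2]

print(run(notion))
print(run(no))
```

A single high-scoring slice is not enough to trigger, so a brief match on the start of a longer word is suppressed, at the cost of a few slices of extra latency.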

Last Updated: 2024-12-26 ; Contributors: AhmedThahir, web-flow
