
Introduction

Last Updated: 2024-12-26; Contributors: AhmedThahir, web-flow

Human Perception of Sound

Dataset

Building

  • Who are the users?
  • What do they need?
  • What task are they trying to solve?
  • How do they interact with the system?
    • Distance
    • Environment
      • Background Noise
      • Reverb
  • Quality Control
    • Keep only recordings that a human can understand

Industry-Standard

  • Google Speech Commands dataset
    • Recorded as individual words, not sentences
    • 1000-4000 examples of each word

Good Characteristics of Model

Volume invariance: the model should give the same prediction whether a word is spoken loudly or quietly.
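
One simple way to approach volume invariance is to rescale every recording to the same peak level before it reaches the model. The sketch below is only an illustration, not a method taken from these notes: the function name `peak_normalize`, the 440 Hz test tone, and the gain values 0.9 and 0.05 are all assumptions chosen for the demo.

```python
import math

def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    return [s * target_peak / peak for s in samples]

# Hypothetical test signal: a 440 Hz tone sampled at 16 kHz
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(400)]

# The same "word" recorded at two very different volumes (assumed gains)
loud = [0.9 * s for s in tone]
quiet = [0.05 * s for s in tone]

loud_norm = peak_normalize(loud)
quiet_norm = peak_normalize(quiet)

# After normalization the two versions should be (numerically) identical
max_diff = max(abs(a - b) for a, b in zip(loud_norm, quiet_norm))
print(f"largest difference after normalization: {max_diff:.2e}")
```

Peak normalization is sensitive to a single loud click in the recording; scaling by RMS energy instead is a common alternative when that is a concern.
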

Pre-Processing

Which aspects of the signal should you send to the neural network?

  1. Align on start point
  2. Normalization of amplitude
  3. Denoise
  4. Convert to the frequency domain using the Fast Fourier Transform (FFT)
    1. Extract features
    2. Sliding window
  5. Cut on end point
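
The steps above can be sketched roughly as follows, using only the Python standard library. This is a minimal illustration under several assumptions: start and end points are found with a plain amplitude threshold, the denoising step is skipped, the feature extracted from each window is the magnitude spectrum, and every name and parameter here (`trim_silence`, `spectrogram`, `threshold=0.02`, `frame_len=64`, `hop=32`, the 1 kHz test tone) is hypothetical rather than taken from the notes.

```python
import cmath
import math

def trim_silence(samples, threshold=0.02):
    """Align on start point and cut on end point with an amplitude threshold."""
    active = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not active:
        return []
    return samples[active[0]:active[-1] + 1]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def spectrogram(samples, frame_len=64, hop=32):
    """Slide a Hann window over the signal; return one magnitude spectrum per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        spectrum = fft(frame)
        # Keep only the non-negative frequencies of the real-valued signal
        frames.append([abs(c) for c in spectrum[:frame_len // 2 + 1]])
    return frames

# Hypothetical input: silence, a 1 kHz tone, then silence (8 kHz sample rate)
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
signal = [0.0] * 100 + tone + [0.0] * 100

trimmed = trim_silence(signal)                 # steps 1 and 5
peak = max(abs(s) for s in trimmed)
normalized = [s / peak for s in trimmed]       # step 2
spec = spectrogram(normalized)                 # step 4 (FFT over a sliding window)

print(len(signal), "samples in,", len(trimmed), "after trimming,", len(spec), "frames out")
```

With a 64-sample window at 8 kHz, each FFT bin is 125 Hz wide, so the 1 kHz tone should dominate bin 8 of every frame. In practice a library FFT (for example NumPy's) would replace the hand-written one; it is written out here only to keep the example self-contained.
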

[Table: waveform, spectrogram, and MFCC plots for the words "Yes" and "No", each spoken at loud and quiet volume]

Mel Filterbanks
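
A mel filterbank is a set of overlapping triangular filters spaced evenly on the mel scale, which is roughly linear at low frequencies and logarithmic at high ones. The sketch below is one possible construction, not necessarily the one these notes had in mind. It assumes the common conversion $m = 2595 \log_{10}(1 + f/700)$, and the function names and the settings (10 filters, 512-point FFT, 16 kHz) are hypothetical.

```python
import math

def hz_to_mel(hz):
    """Convert frequency in Hz to mels (one common convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, fft_size, sample_rate):
    """Return num_filters triangular filters over fft_size // 2 + 1 FFT bins."""
    num_bins = fft_size // 2 + 1
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    # num_filters + 2 edge points, evenly spaced on the mel scale
    step = (high_mel - low_mel) / (num_filters + 1)
    mel_points = [low_mel + i * step for i in range(num_filters + 2)]
    # Map each edge point to the nearest lower FFT bin index
    bin_points = [int(math.floor((fft_size + 1) * mel_to_hz(m) / sample_rate))
                  for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        row = [0.0] * num_bins
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak of 1.0 at center
            row[k] = (right - k) / (right - center)
        filters.append(row)
    return filters

# Assumed settings for illustration only
filters = mel_filterbank(num_filters=10, fft_size=512, sample_rate=16000)
print(len(filters), "filters x", len(filters[0]), "bins")
```

Applying the filterbank to one frame's power spectrum is a weighted sum per filter, for example `[sum(w * p for w, p in zip(row, power)) for row in filters]`. Taking the log of those energies followed by a discrete cosine transform is the usual route to MFCCs.
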
