Introduction

Human Perception of Sound

Dataset

Building

  • Who are the users
  • What do they need
  • What task are they trying to solve
  • How do they interact with the system
    • Distance
    • Environment
      • Background Noise
      • Reverb
  • Quality Control
    • Only keep whatever a human can understand
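When recordings that cover the target environment are scarce, one option is to approximate the background-noise condition by mixing noise into clean clips at a chosen signal-to-noise ratio (SNR). The sketch below is only an illustration: the function name `mix_at_snr`, the synthetic tone, and the Gaussian noise are assumptions standing in for real speech and real background recordings, and the noise is assumed to be at least as long as the speech.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mix has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
speech = np.sin(2 * np.pi * 300 * t)   # stand-in for a spoken word
noise = rng.normal(size=16000)         # stand-in for background noise
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Lower `snr_db` values give a noisier mix. The quality-control rule above still applies: a mix that a human can no longer understand should be dropped.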

Industry-Standard

  • Google Speech Commands dataset
    • Recorded as individual words, not sentences
    • 1000-4000 examples of each word

Good Characteristics of Model

Volume Invariance
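One simple way to approach volume invariance is to rescale every clip to a fixed peak level before extracting features, so a loud and a quiet recording of the same word reach the model at the same scale. A minimal sketch, assuming mono audio held in a NumPy array; `peak_normalize` and its `eps` parameter are illustrative names, not part of any library.

```python
import numpy as np

def peak_normalize(signal, eps=1e-8):
    """Scale a mono signal so its largest absolute sample is 1.0."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    # eps guards against division by zero on an all-silent clip.
    return signal / max(peak, eps)

# A loud and a quiet version of the same 440 Hz tone.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
loud = 0.9 * np.sin(2 * np.pi * 440 * t)
quiet = 0.05 * np.sin(2 * np.pi * 440 * t)

# After normalization the two clips are numerically the same.
same = np.allclose(peak_normalize(loud), peak_normalize(quiet))
```

Peak normalization is sensitive to a single loud click; scaling by RMS energy instead is a common alternative.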

Pre-Processing

Which aspects of the signal should you send to the neural network?

  1. Align on start point
  2. Normalization of amplitude
  3. Denoise
  4. Convert to frequencies, using Fast Fourier transform
    1. Extract features
    2. Sliding window
  5. Cut on end point
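The steps above can be sketched roughly as follows. This is one possible arrangement under stated assumptions, not a reference pipeline: the helper names, the 0.02 silence threshold, and the 25 ms frame with 10 ms hop at 16 kHz are all illustrative choices, start and end points are found with a plain amplitude threshold, and denoising (step 3) is left out.

```python
import numpy as np

def trim_silence(signal, threshold=0.02):
    """Keep samples between the first and last one above the threshold."""
    active = np.flatnonzero(np.abs(signal) > threshold)
    if active.size == 0:
        return signal
    return signal[active[0]:active[-1] + 1]

def spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram from a sliding Hann window and the FFT."""
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Result has shape (n_frames, frame_len // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))

def preprocess(signal):
    signal = np.asarray(signal, dtype=np.float64)
    # Step 2: normalize amplitude (done first so the threshold is relative).
    signal = signal / max(np.max(np.abs(signal)), 1e-8)
    # Steps 1 and 5: align on the start point and cut at the end point.
    signal = trim_silence(signal)
    # Step 4: sliding window plus FFT.
    return spectrogram(signal)

# A 1 kHz tone padded with silence stands in for a recorded word.
sr = 16000
t = np.arange(sr // 2) / sr
word = 0.3 * np.sin(2 * np.pi * 1000 * t)
clip = np.concatenate([np.zeros(4000), word, np.zeros(4000)])
spec = preprocess(clip)
```

Each row of `spec` is one time frame and each column one frequency bin; with these settings the bins are 40 Hz apart, so the 1 kHz tone shows up in bin 25.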
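Word | Volume | Waveform | Spectrogram | MFCC
Yes  | Loud   | (figure) | (figure)    | (figure)
Yes  | Quiet  | (figure) | (figure)    | (figure)
No   | Loud   | (figure) | (figure)    | (figure)
No   | Quiet  | (figure) | (figure)    | (figure)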

Mel Filterbanks
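A mel filterbank is a set of triangular filters spaced evenly on the mel scale, which compresses high frequencies roughly the way human hearing does. The sketch below builds one with NumPy. It assumes the common 2595 · log10(1 + f / 700) mel formula (other variants exist), and the defaults of 26 filters, a 400-point FFT, and a 16 kHz sample rate are illustrative, not taken from these notes.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=400, sample_rate=16000):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    # n_filters + 2 edge points: each filter uses three consecutive ones.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)

    bank = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        left, center, right = hz_edges[m:m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: the lower of the two slopes, floored at zero.
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = mel_filterbank()
```

Given a magnitude spectrogram of shape (n_frames, n_bins), the mel energies are `spectrogram @ bank.T`. Taking the log of those energies and then a discrete cosine transform is the usual route to MFCCs.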

Last Updated: 2024-12-26 ; Contributors: AhmedThahir, web-flow
