Introduction
Human Perception of Sound
Dataset
Building
- Who are the users
- What do they need
- What task are they trying to solve
- How do they interact with the system
- Quality Control
- Only keep whatever a human can understand
Industry-Standard
- Google Speech Commands dataset
- Recorded as individual words, not sentences
- 1000-4000 examples of each word
Good Characteristics of Model
- Volume Invariance
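Volume invariance can be illustrated with a small sketch: if every clip is rescaled to the same peak amplitude before it reaches the model, a loud and a quiet recording of the same word become numerically identical. This is one simple way to obtain the property, not necessarily the one used here. The function name `peak_normalize`, the 16 kHz sample rate, and the synthetic tone standing in for a spoken word are all assumptions made for illustration.

```python
import numpy as np

def peak_normalize(signal, eps=1e-9):
    """Scale a waveform so that its largest absolute sample is 1.0."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    # eps guards against division by zero on an all-silent clip
    return signal / max(peak, eps)

# Synthetic stand-in for a spoken word: a 440 Hz tone, 1 second at 16 kHz
# (both values are assumptions, not taken from the outline).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
loud = 0.9 * np.sin(2 * np.pi * 440 * t)
quiet = 0.1 * loud  # the same "word", recorded more quietly

loud_norm = peak_normalize(loud)
quiet_norm = peak_normalize(quiet)

# After normalization the two versions should be indistinguishable.
same = np.allclose(loud_norm, quiet_norm)
```

Peak normalization is sensitive to a single loud click in the recording; scaling by RMS energy is a common alternative with the same intent.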
Pre-Processing
What aspects of the signal should you send to the neural network?
- Align on start point
- Normalization of amplitude
- Denoise
- Convert to frequencies, using the Fast Fourier Transform (FFT)
- Extract features
- Sliding window
- Cut on end point
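The steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions: the start and end points are found with a plain amplitude threshold, amplitude is normalized first so that the threshold is relative to the clip's peak, the denoising step is omitted, and the frame length (400 samples), hop (160 samples), and 16 kHz sample rate are illustrative values. All function names are hypothetical.

```python
import numpy as np

def trim_silence(signal, threshold=0.02):
    """Keep the span from the first to the last sample above threshold."""
    active = np.flatnonzero(np.abs(signal) > threshold)
    if active.size == 0:
        return signal[:0]  # nothing above threshold: return an empty clip
    return signal[active[0]:active[-1] + 1]

def spectrogram(signal, frame_len=400, hop=160):
    """Sliding-window FFT: one magnitude spectrum per frame."""
    if len(signal) < frame_len:
        # Zero-pad very short clips so that at least one frame exists
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1))

def preprocess(signal, threshold=0.02):
    """Normalize amplitude, cut on start/end point, convert to frequencies."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal)) if signal.size else 0.0
    if peak > 0:
        signal = signal / peak
    signal = trim_silence(signal, threshold)
    return spectrogram(signal)

# Example: 0.25 s silence, 0.5 s of a 1 kHz tone, 0.25 s silence at 16 kHz
sample_rate = 16000
tone = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(8000) / sample_rate)
clip = np.concatenate([np.zeros(4000), tone, np.zeros(4000)])
spec = preprocess(clip)  # shape: (n_frames, frame_len // 2 + 1)
```

Each row of `spec` is the spectrum of one window position, so the array can be read as a spectrogram with time along the first axis and frequency along the second.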
Word | Volume | Waveform | Spectrogram | MFCC |
Yes | Loud | | | |
Yes | Quiet | | | |
No | Loud | | | |
No | Quiet | | | |
Mel Filterbanks
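A mel filterbank can be sketched as a set of overlapping triangular filters whose centers are evenly spaced on the mel scale, so that low frequencies get finer resolution than high ones. The sketch below assumes the common conversion mel = 2595 * log10(1 + f / 700); the outline does not say which variant is intended. The filter count (40), FFT size (400), and sample rate (16 kHz) are likewise illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    """Convert Hz to mel using a common convention (an assumption here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=400, sample_rate=16000):
    """Return triangular filters with shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    # n_filters + 2 edge points, evenly spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # The triangle is the lower of the two slopes, floored at zero
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = mel_filterbank()

# Applying the bank to a (frames x frequency bins) power spectrogram;
# a flat dummy spectrogram is used here purely to show the shapes.
power = np.ones((3, 201))
mel_energies = power @ bank.T  # shape: (3, 40)
```

Taking the logarithm of the mel energies and then a discrete cosine transform along the filter axis is the usual route from these filterbank outputs to MFCCs.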