TinyML

Rather than adding more compute power, TinyML focuses on improving compute efficiency.

These notes mainly focus on the following applications: speech, computer vision, and natural language processing (NLP).

Topics

  • Hardware
      ◦ Architecture & Dataflow
      ◦ Metrics and Analysis
      ◦ Efficiency
      ◦ Micro-architecture/Circuits
  • Model Optimization (see the PyTorch sketch after this list)
      ◦ Quantization
      ◦ Pruning
      ◦ Knowledge distillation
      ◦ AutoML
  • Software: optimize DNN operations through compilation and kernel implementations (see the im2col sketch after this list)
      ◦ Domain-specific compilers, e.g., TVM
      ◦ Kernel implementations
      ◦ Mapping onto hardware
  • Systems
      ◦ Pre/Post Processing
      ◦ Distributed training
      ◦ Federated learning
      ◦ Environmental issues
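
To make the model-optimization topics above concrete, here is a minimal PyTorch sketch of symmetric INT8 quantization, magnitude pruning, and a knowledge-distillation loss. The function names are illustrative only, not from the course materials or any library.

```python
import torch
import torch.nn.functional as F

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale  # recover an approximation with q.float() * scale

def magnitude_prune(w: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude weights so ~`sparsity` of them are zero."""
    k = int(w.numel() * sparsity)
    if k == 0:
        return w.clone()
    threshold = w.abs().flatten().kthvalue(k).values
    return w * (w.abs() > threshold)

def distillation_loss(student_logits, teacher_logits, T: float = 4.0):
    """Soft-target distillation loss (Hinton et al.): KL divergence between
    temperature-softened teacher and student output distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T**2
```

Each of these trades a small amount of accuracy for large savings in memory footprint and compute, which is the recurring theme of the course.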
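
For the software topics, the sketch below shows how a convolution can be lowered to a single matrix multiply via im2col; rewrites of this kind are what kernel libraries hand-tune and what domain-specific compilers such as TVM automate for the target hardware. This is a minimal illustration assuming stride 1 and no padding.

```python
import torch
import torch.nn.functional as F

def conv2d_im2col(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Stride-1, unpadded convolution lowered to one matmul via im2col."""
    N, C, H, W = x.shape
    O, _, KH, KW = weight.shape
    cols = F.unfold(x, kernel_size=(KH, KW))   # (N, C*KH*KW, L): the im2col step
    out = weight.reshape(O, -1) @ cols         # batched matmul -> (N, O, L)
    return out.reshape(N, O, H - KH + 1, W - KW + 1)

# Sanity check against PyTorch's built-in convolution
x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
assert torch.allclose(conv2d_im2col(x, w), F.conv2d(x, w), atol=1e-4)
```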

Prerequisites

  • Computer architecture
  • Machine Learning
  • Python programming
  • PyTorch Basics

Reading

| Topic | Reading |
| --- | --- |
| Textbook | Efficient Processing of Deep Neural Networks |
| Introduction | A New Golden Age for Computer Architecture (PDF, HTML) |
| DNN Computations | Textbook Ch. 1–2 <br> What's the backward-forward FLOP ratio for Neural Networks? <br> Optimizing RNNs in cuDNN 5 <br> What are keys, queries, and values in attention mechanisms? <br> Attention Is All You Need |
| Hardware | Book: Chapter 3 <br> In-Datacenter Performance Analysis of a Tensor Processing Unit <br> Optional: Computer Architecture: A Quantitative Approach, Ch. 7 <br> Book: Chapter 5 <br> Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads <br> FYI: OCP Microscaling (MX) Format Specification <br> Book: Chapter 8 <br> Serving DNNs in Real Time with Project Brainwave <br> Ten Lessons From Three Generations Shaped Google's TPUv4i <br> Optional: EIE: Efficient Inference Engine on Compressed DNN <br> Optional: Survey on sparse hardware acceleration |
| Microarchitecture | Deep Learning with INT8 on Xilinx Devices <br> On-Chip Memory Design for Low-Power CNN Accelerators <br> Optional: Making Floating Point Math Highly Efficient for AI Hardware <br> Optional: Book: Chapter 10 |
| Quantization | Book: Chapter 7 <br> Quantization and Training of NNs for Efficient Integer-Only Inference <br> Training DNNs with 8-bit Floating-Point Numbers |
| Pruning | Book: Chapter 8 <br> Learning Both Weights and Connections for Efficient NNs <br> The Lottery Ticket Hypothesis |
| TinyML | TinyML: Progress, Challenges, and Roadmap |
| Knowledge Distillation | Distilling the Knowledge in a Neural Network <br> Knowledge Distillation: A Survey |
| Neural Architecture Search | Book: Chapter 9 <br> Neural Architecture Search with Reinforcement Learning <br> BRP-NAS: Prediction-based NAS using GCNs |
| AutoML | Codesign of a CNN and its Hardware Accelerator |
| Kernel Computation | Book: Chapter 4 <br> Fast Algorithms for CNNs <br> End-to-End ASR Model Compression using Reinforcement Learning <br> Optional: TNet |
| Mapping | Book: Chapter 6 <br> Optimizing RNNs on GPUs <br> DLA: Compiler and FPGA Overlay for DNN Inference Acceleration |
| TVM | TVM: An Automated End-to-End Optimizing Compiler for Deep Learning |
| Pre-/Postprocessing | AI Tax: The Hidden Cost of AI Data-Center Applications <br> Rethinking Data Storage and Preprocessing in Datacenters <br> Faster Neural Networks Straight from JPEG |
| Distributed Training | Horovod: Fast and Easy Distributed Deep Learning in TensorFlow <br> Large Scale Distributed Deep Learning |
| Federated Learning | Google AI Blog Post on FL <br> Communication-Efficient Learning of DNNs from Decentralized Data <br> Towards Federated Learning at Scale: System Design |
| Ethical/Environmental Issues | Chasing Carbon: The Elusive Environmental Footprint of Computing <br> On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? <br> The Carbon Footprint of ML Training Will Plateau, Then Shrink |
