Pre/Post Processing

Amdahl’s Law
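A quick reminder of the bound itself (standard formulation; the symbols $p$ and $s$ are mine, not from the original slides): if only a fraction $p$ of the end-to-end latency is spent in the DNN, then accelerating the DNN by a factor $s$ gives an overall speedup of

$$
S_{\text{overall}} = \frac{1}{(1-p) + \frac{p}{s}} \le \frac{1}{1-p}
$$

For example, if the DNN accounts for 50% of application latency, even an infinitely fast accelerator ($s \to \infty$) yields at most a 2× end-to-end speedup; the remaining overheads dominate.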

Layers of Overhead for DNN

At the application level, “overheads” such as pre/post processing and data transfers can take more time than the DNN inference itself.

Example: Face Recognition

Host + Accelerator

  • Model
      • Sits in CPU main memory
      • Transferred over PCIe to GPU memory
  • Input data
      • Arrives over Ethernet to the CPU
      • Transferred over PCIe to the GPU
  • Inference/Training runs on the GPU
  • Result sent back to the CPU over PCIe

Latency is impacted by these data transfers.
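A minimal sketch of this flow (assuming PyTorch with a CUDA GPU; the model, batch size, and timing helper are illustrative choices, not from the original notes), timing each PCIe transfer separately from the inference itself:

```python
# Sketch: time each host<->accelerator step of the flow described above.
# Assumes PyTorch with a CUDA GPU; model and batch size are illustrative.
import time
import torch
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def timed(fn):
    """Run fn and return (result, elapsed seconds), synchronizing the GPU around it."""
    if device.type == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    out = fn()
    if device.type == "cuda":
        torch.cuda.synchronize()
    return out, time.perf_counter() - t0

model = models.resnet50(weights=None).eval()  # model sits in CPU main memory
x_cpu = torch.randn(32, 3, 224, 224)          # input data has arrived on the CPU

model_gpu, t_model = timed(lambda: model.to(device))  # model over PCIe to GPU memory
x_gpu, t_input = timed(lambda: x_cpu.to(device))      # input over PCIe to the GPU
with torch.no_grad():
    y_gpu, t_infer = timed(lambda: model_gpu(x_gpu))  # inference on the GPU
y_cpu, t_result = timed(lambda: y_gpu.cpu())          # result back to the CPU over PCIe

print(f"model xfer:  {t_model:.4f} s")
print(f"input xfer:  {t_input:.4f} s")
print(f"inference:   {t_infer:.4f} s")
print(f"result xfer: {t_result:.4f} s")
```

The model transfer is typically paid once and amortized across many requests, while the input and result transfers recur on every inference, which is why they are the ones worth overlapping or eliminating.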

Solution 1: Multiple GPU Cores

Solution 2: Multiple CPU Cores

Solution 3: Dedicated GPU-GPU connections

NVLink

Dedicated FPGA for Packet Processing

Algorithm Codesign Opportunities

On-Device Deployment

The CPU, GPU, and NPU share the same memory on the SoC (System-on-Chip).
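As a rough illustration of what this saves (the batch size and link speed below are my assumptions, not from the notes): a batch of 32 RGB images at $224 \times 224$ in FP32 is

$$
32 \times 3 \times 224 \times 224 \times 4\ \text{B} \approx 19\ \text{MB},
$$

which over a PCIe 4.0 x16 link ($\approx 32$ GB/s) costs roughly 0.6 ms per direction. On a unified-memory SoC the NPU can read the buffer the CPU just wrote, so this per-inference copy over an external bus is avoided.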

Mobile-Cloud Inference
