
Microarchitecture

  • Arithmetic unit design
  • Memory organization

Processing Element

Should support dot product

  • Multiplier with 2 elements
  • Accumulator with 2 elements

Accumulator: Adder that keeps result in storage

Inference in INT8 precision => Multipliers are INT8, but adders and accumulators are wider (e.g. INT32), because they need a wide range to accumulate many products accurately
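The effect of accumulator width can be seen in a small software model. This is a minimal sketch, not a hardware description: `wrap_signed` and `mac_dot` are hypothetical helper names, and the 32-bit accumulator width is an assumed (though common) choice.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def mac_dot(a, b, acc_bits):
    """Dot product on one multiply-accumulate unit.

    Each step multiplies one INT8 pair and adds the product into an
    accumulator register that is `acc_bits` wide (assumed width).
    """
    acc = 0
    for x, y in zip(a, b):
        assert -128 <= x <= 127 and -128 <= y <= 127  # INT8 operands
        acc = wrap_signed(acc + x * y, acc_bits)
    return acc


a = [100] * 64
b = [100] * 64
exact = sum(x * y for x, y in zip(a, b))
wide = mac_dot(a, b, acc_bits=32)    # matches the exact result
narrow = mac_dot(a, b, acc_bits=8)   # overflows and wraps around
print(exact, wide, narrow)
```

A single INT8 x INT8 product already needs 16 bits, so an 8-bit accumulator overflows on the first addition here, while the wide accumulator stays exact.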

image-20240504170520675

Sequential

Step 1: image-20240504164012538
Step 2: image-20240504164010093

Parallel/Vectorized

Step 1: image-20240504164302118
Step 2: image-20240504164213548
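The figures are not recoverable here, so the following is only a sketch of the contrast as commonly understood: a sequential unit handles one multiply-accumulate per step, while a vectorized unit multiplies several element pairs in the same step and sums them. The function names and the `lanes` parameter are hypothetical.

```python
def dot_sequential(a, b):
    """One multiplier, one accumulator: one element pair per step."""
    acc = 0
    steps = 0
    for x, y in zip(a, b):
        acc += x * y
        steps += 1
    return acc, steps


def dot_vectorized(a, b, lanes):
    """`lanes` multipliers work in the same step; their products are summed
    (modelling an adder tree) and added to the accumulator."""
    acc = 0
    steps = 0
    for i in range(0, len(a), lanes):
        products = [x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
        acc += sum(products)
        steps += 1
    return acc, steps


a = list(range(8))
b = [2] * 8
print(dot_sequential(a, b))           # same result, 8 steps
print(dot_vectorized(a, b, lanes=4))  # same result, 2 steps
```

Both versions give the same dot product; the vectorized one trades extra multipliers for fewer steps.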

Pipelined

Initiation interval: How often we can start computation of a new element in a loop

Break down computation into multiple steps with intermediate registers
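A rough cycle count shows why the initiation interval matters. This is a sketch under simplifying assumptions: each pipeline stage takes one cycle, there are no stalls, and `depth` (number of stages) and `ii` (initiation interval in cycles) are hypothetical parameter names.

```python
def pipelined_cycles(n_items, depth, ii=1):
    """Cycles to process n_items when a new item starts every `ii` cycles.

    The first result appears after `depth` cycles; each further item
    finishes `ii` cycles after the previous one.
    """
    if n_items == 0:
        return 0
    return depth + (n_items - 1) * ii


def unpipelined_cycles(n_items, depth):
    """Without pipelining, an item must finish before the next one starts."""
    return n_items * depth


print(pipelined_cycles(100, depth=3, ii=1))  # 102
print(unpipelined_cycles(100, depth=3))      # 300
```

With an initiation interval of 1, the cost of the extra registers is paid once as latency, and throughput approaches one element per cycle.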

Interleaved

image-20240504170328804

Precision

Block Floating Point

  • One shared exponent for each block of values

image-20240504170905075
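A minimal sketch of block floating point quantization follows. It assumes signed 8-bit mantissas and a power-of-two scale chosen from the largest magnitude in the block; `bfp_quantize` and `bfp_dequantize` are hypothetical names, and real formats differ in block size and rounding.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Return (integer mantissas, shared exponent) for one block of values."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0:
        return [0] * len(block), 0
    limit = (1 << (mantissa_bits - 1)) - 1  # 127 for 8-bit mantissas
    # Smallest power-of-two scale that lets the largest value fit.
    exponent = math.ceil(math.log2(max_abs / limit))
    scale = 2.0 ** exponent
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in block]
    return mantissas, exponent


def bfp_dequantize(mantissas, exponent):
    """Reconstruct approximate values from mantissas and the shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]


block = [0.5, -1.0, 0.25, 0.75]
mantissas, exponent = bfp_quantize(block)
print(mantissas, exponent)
print(bfp_dequantize(mantissas, exponent))
```

Because the exponent is stored once per block, the multipliers only ever see integer mantissas; the exponent is applied once to the accumulated result.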

On-Chip Memory

Bit-width of address = ceil(log2(no of data entries)), i.e. no of data entries = 2^(bit-width of address)
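As a quick check of the address-width relation, the sketch below computes the number of address bits for a given memory depth. The function name `address_width` is hypothetical.

```python
def address_width(num_entries):
    """Number of address bits needed to index `num_entries` locations.

    Equivalent to ceil(log2(num_entries)), computed with integer
    arithmetic to avoid floating-point rounding.
    """
    if num_entries < 1:
        raise ValueError("memory must have at least one entry")
    return (num_entries - 1).bit_length()


for entries in (256, 1000, 1024, 1025):
    print(entries, address_width(entries))
```

A 1024-entry memory needs 10 address bits; one more entry pushes it to 11.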

Connecting RAM to MAC

Simple: image-20240504172833548

Use separate memories for 2 operands: image-20240504172818182

Increase no of read ports: image-20240504172334439

Problems with adding many read ports to SRAM

1. Large size
2. Increased power consumption
3. Slow
4. In FPGA, you need to duplicate your memories

Banking: Use multiple small memories image-20240504173115539
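Banking can be illustrated with a simple address-mapping model. This sketch assumes low-order interleaving (consecutive addresses go to consecutive banks) and one read port per bank; both are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def bank_of(addr, n_banks):
    """Bank that holds this address under low-order interleaving."""
    return addr % n_banks


def row_of(addr, n_banks):
    """Row inside the selected bank."""
    return addr // n_banks


def cycles_for_reads(addrs, n_banks):
    """Cycles needed to serve simultaneous reads when each bank has a
    single read port: reads that hit the same bank are serialized."""
    per_bank = Counter(bank_of(a, n_banks) for a in addrs)
    return max(per_bank.values()) if per_bank else 0


print(cycles_for_reads([0, 1, 2, 3], n_banks=4))   # 1: every read hits a different bank
print(cycles_for_reads([0, 4, 8, 12], n_banks=4))  # 4: all reads hit bank 0
```

Several small single-port memories can feed many MAC units per cycle as long as the access pattern spreads across banks; strided accesses that collide on one bank lose that benefit.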

Computing Paradigms

In-Sensor: image-20240504173510262
Why? Data movement from sensor to processor is costly. For example, if you only need the class label as output, why unnecessarily transfer an 8MP image to the processor?

Near-Memory: image-20240504175124559

In-Memory (Analog Processing): image-20240504175143868

  • Weights stored as charges
  • Activations delivered as analog voltages
  • By activating pre-charge circuitry on the word & bit lines, we can perform multiplication between input activation voltage & stored weights
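To put a rough number on the in-sensor argument, the sketch below compares the size of an 8MP frame with the size of a class label. The 3 bytes per pixel (8-bit RGB) and the 1-byte label are assumptions chosen for illustration, not figures from the notes.

```python
def image_bytes(megapixels, bytes_per_pixel):
    """Raw size of one frame in bytes."""
    return int(megapixels * 1_000_000 * bytes_per_pixel)


frame = image_bytes(8, bytes_per_pixel=3)  # assumed 8-bit RGB
label = 1                                  # assumed: class index fits in one byte
print(frame, label, frame // label)
```

Under these assumptions, classifying inside the sensor cuts the data sent to the processor by a factor of about 24 million per frame.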
Last Updated: 2024-05-12 ; Contributors: AhmedThahir, web-flow
