NPU Component Sizing


Size the scalar, vector, and tensor execution units of a neural processing unit (NPU) and assess how efficiently it processes different ML models (e.g., convolutional or transformer models). With Deneb, the designer can even evaluate the optimal sizing of different NPU components for individual layers of a deep learning model.
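
As a first intuition for this kind of analysis, the sketch below estimates the ideal execution time of a single convolution layer on a tensor unit of a given MAC-array size. The layer shape, array dimensions, and clock rate are illustrative assumptions, not Deneb parameters or outputs.

```python
# Hypothetical first-order model of tensor-unit sizing for one conv layer.
# All dimensions below are assumed for illustration.

def conv_macs(h, w, cin, cout, k):
    """Total multiply-accumulates for a stride-1 convolution layer."""
    return h * w * cin * cout * k * k

def ideal_layer_time_s(macs, array_rows, array_cols, clock_ghz):
    """Ideal execution time assuming the MAC array is fully packed."""
    cycles = macs / (array_rows * array_cols)
    return cycles / (clock_ghz * 1e9)

# Example: a 56x56x64 -> 64-channel 3x3 conv on a 32x32 MAC array at 1 GHz.
macs = conv_macs(56, 56, 64, 64, 3)
print(f"{macs / 1e6:.1f} MMACs, ideal time "
      f"{ideal_layer_time_s(macs, 32, 32, 1.0) * 1e3:.3f} ms")
```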

Other typical parameters that can benefit from Deneb's sizing analysis include:


  • Internal (SRAM, CCM) and external (DRAM) memory interface bus width and bandwidth (see the bandwidth sketch after this list).
  • Optimal buffer memory size and its effect on data movement efficiency.
  • Relative sizing of low and high precision compute units.
  • The effect of weight or activation compression on large ML model execution performance.
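
For instance, bus bandwidth and compression (the first and last items above) interact directly. The back-of-envelope model below estimates the DRAM bandwidth needed to stream weights at a target frame rate under different compression ratios; the weight size, frame rate, and ratios are assumptions for illustration, not Deneb results.

```python
# Back-of-envelope DRAM bandwidth requirement, assuming weights stream from
# DRAM on every inference while activations stay in on-chip buffers.

def required_bw_gbps(weight_bytes, target_fps, compression_ratio=1.0):
    """GB/s needed to stream (optionally compressed) weights at target_fps."""
    return weight_bytes / compression_ratio * target_fps / 1e9

weight_bytes = 25e6  # 25 MB of int8 weights (assumed)
for ratio in (1.0, 2.0, 4.0):
    bw = required_bw_gbps(weight_bytes, target_fps=30, compression_ratio=ratio)
    print(f"{ratio:.0f}x compression -> {bw:.2f} GB/s")
```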

Performance bottlenecks of on-device LLMs

Analyze the efficiency of running LLMs on edge devices such as smartphones or AR/VR headsets equipped with ML accelerators such as an NPU or GPU. Understand the effect of local buffer size on compute unit utilization and data movement requirements. Evaluate the DRAM or flash memory size and bandwidth required for a given model size and performance target. Assess the benefits of model parameter compression. Evaluate the feasibility of multi-tasking an LLM with another ML task such as a perception CNN.
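
A common first-order explanation of why LLM decode is bandwidth-bound: generating one token reads every model parameter once, so token rate is roughly memory bandwidth divided by model size in bytes. The sketch below applies that rule of thumb with assumed, illustrative numbers; it is not a Deneb simulation.

```python
# Bandwidth-bound estimate of LLM decode speed:
#   tokens/s ~= memory_bandwidth / model_bytes
# The model size and bandwidth below are illustrative assumptions.

def decode_tokens_per_sec(params_billion, bytes_per_param, bw_gbps):
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bw_gbps * 1e9 / model_bytes

# 7B-parameter model on a phone-class memory interface (~60 GB/s, assumed):
for bytes_per_param, label in ((2.0, "fp16"), (0.5, "4-bit")):
    rate = decode_tokens_per_sec(7, bytes_per_param, 60)
    print(f"{label}: {rate:.1f} tokens/s")
```

The 4-bit case also shows why parameter compression pays off: the same memory interface sustains roughly four times the token rate.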


ML-model multi-tasking

Demonstrate and evaluate the NPU's capability to run different ML models concurrently. Understand the buffer size requirements and the overhead of task context switching. Explore the potential for improved compute unit utilization and reduced latency through ML multi-tasking, and the resources required to achieve it.
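
One way to see the context-switching trade-off is a toy time-slicing model in which every switch spills and refills the local activation buffer over the external memory interface. The buffer size, bandwidth, and slice lengths below are assumptions chosen for illustration.

```python
# Toy time-slicing model for two ML tasks sharing one NPU. Context-switch
# cost is modeled as saving and restoring the local buffer over DRAM.

def switch_overhead_s(buffer_bytes, bw_gbps):
    """Seconds to spill and refill the local buffer on one context switch."""
    return 2 * buffer_bytes / (bw_gbps * 1e9)

def effective_utilization(slice_ms, buffer_mb, bw_gbps):
    """Fraction of time spent on useful work rather than switching."""
    overhead = switch_overhead_s(buffer_mb * 1e6, bw_gbps)
    slice_s = slice_ms / 1e3
    return slice_s / (slice_s + overhead)

for slice_ms in (0.5, 2.0, 8.0):
    u = effective_utilization(slice_ms, buffer_mb=4, bw_gbps=50)
    print(f"{slice_ms} ms slices: {u:.1%} useful work")
```

Longer slices amortize the switch cost but increase the latency each task sees, which is the kind of trade-off such an analysis quantifies.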


Explore NOC routing algorithms and their effect on multi-core compute performance

Analyze the efficiency of different NOC topologies and routing algorithms for facilitating data exchange among processing elements (PEs) in a multi-core AI/HPC SoC. Simulate different NOC routing algorithms such as deflection routing or wormhole routing. Evaluate the effect of router buffer size on NOC flow control and congestion.
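
As a concrete baseline, the sketch below implements XY (dimension-order) routing on a 2D mesh, the kind of deterministic algorithm a NOC study would compare against deflection or adaptive routing. The mesh endpoints are arbitrary examples.

```python
# Minimal XY (dimension-order) routing on a 2D mesh: route fully in the X
# dimension first, then in Y. Deterministic and deadlock-free on a mesh.

def xy_route(src, dst):
    """Return the list of (x, y) hops from src to dst."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # step in X until aligned
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then step in Y
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

path = xy_route((0, 0), (3, 2))
print(f"{len(path) - 1} hops: {path}")
```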


Image processing to display pipeline latency analysis

Evaluate image signal processor (ISP)-to-display pipeline latency and its effect on the video see-through experience. Evaluate the buffer requirements of various denoising and distortion-correction algorithms and the resulting image lag. Help designers assess the trade-offs between image resolution and quality vs. silicon area and power.
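
A simple way to frame this analysis is a photon-to-photon latency budget that sums per-stage latencies and compares the total against a comfort target. The stage names and numbers in the sketch below are illustrative assumptions, not measured values.

```python
# Photon-to-photon latency budget for a video see-through pipeline.
# Stage latencies and the 20 ms comfort target are assumed for illustration.

STAGES_MS = {
    "sensor exposure + readout": 8.0,
    "ISP denoise":               4.0,
    "distortion correction":     2.0,
    "composition / warp":        1.5,
    "display scanout":           8.3,  # one 120 Hz frame
}

total = sum(STAGES_MS.values())
for stage, ms in STAGES_MS.items():
    print(f"  {stage:28s} {ms:5.1f} ms")
print(f"end-to-end: {total:.1f} ms "
      f"({'within' if total <= 20.0 else 'over'} a 20 ms target)")
```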
