AI in EE

AI IN DIVISIONS

AI in Circuit Division

A 28nm 4.96 TOPS/W End-to-End Diffusion Accelerator with Reconfigurable Hyper-Precision Unified Non-Matrix Processing Engine (김주영 교수 연구실)

Title: A 28nm 4.96 TOPS/W End-to-End Diffusion Accelerator with Reconfigurable Hyper-Precision Unified Non-Matrix Processing Engine

Venue: ESSERC 2024

 

Abstract: This paper presents Picasso, an end-to-end diffusion accelerator. Picasso proposes a novel hyper-precision data type and reconfigurable architecture that can maximize hardware efficiency with extended dynamic range, with no compromise in accuracy. Picasso also proposes a unified engine operating all non-matrix operations in a streamlined processing flow and minimizes the end-to-end latency by sub-block pipeline scheduling. The accelerator is fabricated in 28nm CMOS technology and achieves an energy efficiency of 4.96 TOPS/W and a peak performance of 9.83 TOPS. Compared with prior works, Picasso achieves speedups of 8.4×-26.8× while improving energy and area efficiency by 1.1×-2.8× and 3.6×-30.5×, respectively.

 

Main Figure:10