{"id":132127,"date":"2022-08-01T10:10:05","date_gmt":"2022-08-01T01:10:05","guid":{"rendered":"http:\/\/192.249.19.202\/?post_type=ai-in-circuit&#038;p=132127"},"modified":"2026-04-26T05:07:20","modified_gmt":"2026-04-25T20:07:20","slug":"t-pim-a-2-21-to-161-08tops-w-processing-in-memory-accelerator-for-end-to-end-on-device-training","status":"publish","type":"ai-in-circuit","link":"http:\/\/ee.presscat.kr\/en\/ai-in-circuit\/t-pim-a-2-21-to-161-08tops-w-processing-in-memory-accelerator-for-end-to-end-on-device-training\/","title":{"rendered":"T-PIM: A 2.21-to-161.08TOPS\/W Processing-In-Memory Accelerator for End-to-End On-Device Training"},"content":{"rendered":"<p>Title : T-PIM: A 2.21-to-161.08TOPS\/W Processing-In-Memory Accelerator for End-to-End On-Device Training<\/p>\n<p>Authors : Jaehoon Heo, Junsoo Kim, Wontak Han, Sukbin Lim, Joo-Young Kim<\/p>\n<p>Publications : IEEE Custom Integrated Circuits Conference (CICC) 2022<\/p>\n<p>As the number of edge devices grows to tens of billions, the importance of intelligent computing has been shifted from cloud datacenters to edge devices. On-device training, which enables the personalization of a machine learning (ML) model for each user, is crucial in the success of edge intelligence. However, battery-powered edge devices cannot afford huge computations and memory accesses involved in the training. Processing-in-Memory (PIM) is a promising technology to overcome the memory bandwidth and energy problem by combining processing logic into the memory. Many PIM chips [1-5] have accelerated ML inference using analog or digital-based logic with sparsity handling. Two-way transpose PIM [6] supports backpropagation, but it lacks gradient calculation and weight update, required for end-to-end ML training.<\/p>\n<p>This paper presents T-PIM, the first PIM accelerator that can perform end-to-end on-device training with sparsity handling and support low-latency ML inference. 
T-PIM makes four key contributions: 1) T-PIM runs all four computational stages of ML training on a single chip (Fig. 1). 2) T-PIM supports various data-mapping strategies for the two major computational layer types, i.e., fully-connected (FC) and convolutional (CONV), in both computational directions, i.e., forward and backward. 3) T-PIM supports fully variable bit-widths for input data and power-of-two bit-widths for weight data using serial, configurable arithmetic units. 4) T-PIM accelerates ML training and reduces its energy consumption by exploiting fine-grained sparsity in all data types (activation, error, and weight).<\/p>\n<p>&nbsp;<\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-full wp-image-132128\" src=\"http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/08\/\uae40\uc8fc\uc6011.png\" alt=\"\" width=\"810\" height=\"689\" title=\"\" srcset=\"http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/08\/\uae40\uc8fc\uc6011.png 810w, http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/08\/\uae40\uc8fc\uc6011-300x255.png 300w, http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/08\/\uae40\uc8fc\uc6011-768x653.png 768w\" sizes=\"(max-width: 810px) 100vw, 810px\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1703<\/p>\n","protected":false},"featured_media":0,"template":"","class_list":["post-132127","ai-in-circuit","type-ai-in-circuit","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-circuit\/132127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-circuit"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/ai-in-circuit"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=132127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}