{"id":116920,"date":"2019-07-10T17:38:47","date_gmt":"2019-07-10T08:38:47","guid":{"rendered":"http:\/\/175.125.95.178\/ai-in-circuit\/16920\/"},"modified":"2026-04-06T02:20:03","modified_gmt":"2026-04-05T17:20:03","slug":"16920","status":"publish","type":"ai-in-circuit","link":"http:\/\/ee.presscat.kr\/en\/ai-in-circuit\/16920\/","title":{"rendered":"Paper by Dong-Hyeon Han, Jin-Su Lee, Jin-Mook Lee and Hoi-Jun Yoo was presented at Symposium on VLSI Circuits (VLSI), 2019"},"content":{"rendered":"<p>Title: 1.32 TOPS\/W Energy Efficient Deep Neural Network Learning Processor with Direct Feedback Alignment based Heterogeneous Core Architecture<\/p>\n<p>Authors: Dong-Hyeon Han, Jin-Su Lee, Jin-Mook Lee and Hoi-Jun Yoo<\/p>\n<p>An energy-efficient deep neural network (DNN) learning processor based on direct feedback alignment (DFA) is proposed.<\/p>\n<p>The proposed processor achieves 2.2 \u00d7 faster learning than previous learning processors through pipelined DFA (PDFA). Since the computation direction of back-propagation (BP) is the reverse of inference, the gradient of the 1st layer cannot be generated until the errors have propagated from the last layer back to the 1st layer. In contrast, the proposed processor applies DFA, which propagates the errors directly from the last layer. This means that PDFA can propagate errors during the next inference computation, and the weight update of the 1st layer does not need to wait for error propagation through all the layers. To enhance energy efficiency by 38.7%, the heterogeneous learning core (LC) architecture is optimized with an 11-stage pipeline data-path, which achieves 2 \u00d7 longer data reuse compared with conventional BP.
Furthermore, the direct error propagation core (DEPC) utilizes random number generators (RNGs) to remove the external memory access (EMA) caused by error propagation (EP), improving energy efficiency by 19.9%.<\/p>\n<p>The proposed PDFA-based learning processor is evaluated on an object tracking (OT) application and achieves 34.4 frames-per-second (FPS) throughput with 1.32 TOPS\/W energy efficiency.<\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" alt=\"\" data-delta=\"1\" data-fid=\"6842\" data-media-element=\"1\" src=\"http:\/\/ee.presscat.kr\/sites\/default\/files\/circuit%207.png\" title=\"\"><\/p>\n<p>Figure 1. Back-propagation vs. Pipelined DFA<\/p>\n<p>&nbsp;<\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" alt=\"\" data-delta=\"1\" height=\"508\" src=\"http:\/\/ee.presscat.kr\/sites\/default\/files\/circuit%208.png\" width=\"1181\" title=\"\"><\/p>\n<p>Figure 2. Layer-level vs. Neuron-level vs. Partial-sum-level Pipeline<\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" alt=\"\" data-delta=\"2\" height=\"581\" src=\"http:\/\/ee.presscat.kr\/sites\/default\/files\/circuit%209.png\" width=\"949\" title=\"\"><\/p>\n<p>Figure 3. 
Overall Architecture of Proposed Processor<\/p>\n","protected":false},"excerpt":{"rendered":"<p>643<\/p>\n","protected":false},"featured_media":126829,"template":"","class_list":["post-116920","ai-in-circuit","type-ai-in-circuit","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-circuit\/116920","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-circuit"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/ai-in-circuit"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media\/126829"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=116920"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}