{"id":131998,"date":"2022-07-25T18:46:30","date_gmt":"2022-07-25T09:46:30","guid":{"rendered":"http:\/\/192.249.19.202\/?post_type=ai-in-circuit&#038;p=131998"},"modified":"2026-04-05T18:10:09","modified_gmt":"2026-04-05T09:10:09","slug":"a-framework-for-accelerating-transformer-based-language-model-on-reram-based-architecture","status":"publish","type":"ai-in-circuit","link":"http:\/\/ee.presscat.kr\/en\/ai-in-circuit\/a-framework-for-accelerating-transformer-based-language-model-on-reram-based-architecture\/","title":{"rendered":"A Framework for Accelerating Transformer-based Language Model on ReRAM-based Architecture"},"content":{"rendered":"<p>Title : A Framework for Accelerating Transformer-based Language Model on ReRAM-based Architecture<\/p>\n<p>&nbsp;<\/p>\n<p>Author: Myeonggu Kang, Hyein Shin, Lee-Sup Kim<\/p>\n<p>&nbsp;<\/p>\n<p>Journal : IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems<\/p>\n<p>&nbsp;<\/p>\n<p>Abstract : Transformer-based language models have become the de-facto standard model for various NLP applications given the superior algorithmic performances. Processing a transformer-based language model on a conventional accelerator induces the memory wall problem, and the ReRAM-based accelerator is a promising solution to this problem. However, due to the characteristics of the self-attention mechanism and the ReRAM-based accelerator, the pipeline hazard arises when processing the transformer-based language model on the ReRAM-based accelerator. This hazard issue greatly increases the overall execution time. In this paper, we propose a framework to resolve the hazard issue. Firstly, we propose the concept of window self-attention to reduce the attention computation scope by analyzing the properties of the self-attention mechanism. After that, we present a window-size search algorithm, which finds an optimal window size set according to the target application\/algorithmic performance. We also suggest a hardware design that exploits the advantages of the proposed algorithm optimization on the general ReRAM-based accelerator. The proposed work successfully alleviates the hazard issue while maintaining the algorithmic performance, leading to a 5.8\u00d7 speedup over the provisioned baseline. 
It also delivers up to 39.2\u00d7\/643.2\u00d7 speedup\/higher energy efficiency over GPU, respectively.<img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-full wp-image-131999\" src=\"http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/07\/\uae40\uc774\uc12d2.png\" alt=\"\" width=\"922\" height=\"712\" title=\"\" srcset=\"http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/07\/\uae40\uc774\uc12d2.png 922w, http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/07\/\uae40\uc774\uc12d2-300x232.png 300w, http:\/\/ee.presscat.kr\/wp-content\/uploads\/2022\/07\/\uae40\uc774\uc12d2-768x593.png 768w\" sizes=\"(max-width: 922px) 100vw, 922px\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>888<\/p>\n","protected":false},"featured_media":0,"template":"","class_list":["post-131998","ai-in-circuit","type-ai-in-circuit","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-circuit\/131998","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-circuit"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/ai-in-circuit"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=131998"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
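The abstract does not spell out the window self-attention computation, but the general idea is that each token attends only to a fixed-size window of tokens rather than the full sequence. Below is a minimal NumPy sketch of that idea; the function name, the causal (left-only) window, and the single-head formulation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def window_self_attention(Q, K, V, window):
    """Single-head self-attention where token i attends only to the
    `window` most recent tokens (itself included). Q, K, V have shape
    (seq_len, d). Illustrative sketch, not the paper's formulation."""
    seq_len, d = Q.shape
    out = np.zeros_like(V)
    for i in range(seq_len):
        lo = max(0, i - window + 1)                 # left edge of the window
        scores = Q[i] @ K[lo:i + 1].T / np.sqrt(d)  # scaled dot-product scores
        weights = np.exp(scores - scores.max())     # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ V[lo:i + 1]              # weighted sum of windowed values
    return out

# Toy usage: 16 tokens, 8-dim embeddings, window of 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
y = window_self_attention(x, x, x, window=4)
print(y.shape)  # (16, 8)
```

Capping the attention scope at `window` keys bounds the per-token work regardless of sequence length; per the abstract, this reduced computation scope is what alleviates the pipeline hazard on the ReRAM-based accelerator.

The abstract likewise gives no detail on the window-size search algorithm beyond that it selects window sizes to meet a target algorithmic performance. As a purely hypothetical illustration of that kind of procedure (not the paper's algorithm), a greedy search might shrink each layer's window as far as a validation metric tolerates:

```python
def search_window_sizes(layers, candidates, evaluate, tolerance):
    """Hypothetical greedy search: for each layer, keep the smallest
    candidate window whose task score stays within `tolerance` of the
    full-window baseline. `evaluate` maps a {layer: window} dict to a
    score; the greedy strategy and stopping rule are assumptions."""
    sizes = {layer: max(candidates) for layer in layers}  # start from full windows
    baseline = evaluate(sizes)
    for layer in layers:
        for w in sorted(candidates):       # try the smallest windows first
            trial = {**sizes, layer: w}
            if evaluate(trial) >= baseline - tolerance:
                sizes[layer] = w           # smallest size that holds up
                break
    return sizes
```

A real search would also have to account for interactions between layers; the abstract says only that the algorithm trades window size against the target application's algorithmic performance.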