{"id":122811,"date":"2021-11-19T16:57:56","date_gmt":"2021-11-19T07:57:56","guid":{"rendered":"http:\/\/175.125.95.178\/ai-in-computer\/22811\/"},"modified":"2026-04-18T22:55:13","modified_gmt":"2026-04-18T13:55:13","slug":"22811","status":"publish","type":"ai-in-computer","link":"http:\/\/ee.presscat.kr\/en\/ai-in-computer\/22811\/","title":{"rendered":"TensorPRAM: Designing a Scalable Heterogeneous Deep Learning Accelerator with Byte-addressable PRAMs"},"content":{"rendered":"<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><span lang=\"EN-US\" style=\",sans-serif\">Sangwon Lee, Gyuyoung Park, and Myoungsoo Jung<\/span><\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><span lang=\"EN-US\" style=\",sans-serif\">12th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2020, Poster<\/span><\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span lang=\"EN-US\" style=\"font-size:10.0pt\"><span style=\"line-height:107%\"><span style=\",sans-serif\"><a href=\"https:\/\/www.usenix.org\/conference\/hotstorage20\/presentation\/lee\" rel=\"nofollow noopener\">https:\/\/www.usenix.org\/conference\/hotstorage20\/presentation\/lee<\/a><\/span><\/span><\/span><\/p>\n<p style=\"text-indent:5.0pt;text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><span lang=\"EN-US\" style=\",sans-serif\">We present TensorPRAM, a scalable heterogeneous deep learning accelerator that realizes FPGA-based domain specific architecture, and it can be used for forming a computational array for deep neural networks (DNNs). The current design of TensorPRAM includes a systolic-array hardware, which accelerates general matrix multiplication (GEMM) and convolution of DNNs. 
To reduce data movement overhead between a host and the accelerator, we further replace TensorPRAM\u2019s on-board memory with a dense, but byte-addressable storage class memory (PRAM). We prototype TensorPRAM by placing all the logic of a general processor, front-end host interface module, systolic-array and PRAM controllers into a single FPGA chip, such that one or more TensorPRAMs can be attached to the host over PCIe fabric as a scalable option. Our real system evaluations show that TensorPRAM can reduce the execution time of various DNN workloads, compared to a processor only accelerator and a systolic-array only accelerator by 99% and 48%, on average, respectively.<\/span><\/span><\/span><\/span><\/p>\n<p style=\"text-indent:5.0pt;text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><span lang=\"EN-US\" style=\",sans-serif\"><\/p>\n<div class=\"\"><img decoding=\"async\" class=\"\" src=\"\/wp-content\/uploads\/drupal\/\uc815\uba85\uc218\uad50\uc218\ub2d83.png\" alt=\"\" title=\"\"><\/div>\n<p><\/span><\/span><\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>821<\/p>\n","protected":false},"featured_media":0,"template":"","class_list":["post-122811","ai-in-computer","type-ai-in-computer","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-computer\/122811","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-computer"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/ai-in-computer"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=122811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
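<p>To illustrate the kind of computation the systolic-array hardware accelerates, the following is a minimal software simulation of an output-stationary systolic GEMM. This is a hedged sketch under our own assumptions (the dataflow choice and the name <code>systolic_gemm</code> are hypothetical illustrations, not the poster's actual RTL design): each processing element (PE) at grid position (i, j) holds one output element, while rows of A stream in from the left and columns of B from the top, skewed by one cycle per row/column so matching operands meet at the right PE on the right cycle.</p>

```python
def systolic_gemm(A, B):
    """Simulate an output-stationary systolic-array GEMM: C = A x B.

    A is m x k, B is k x n, and the PE grid is m x n. PE (i, j)
    accumulates C[i][j]; operand pair (A[i][s], B[s][j]) arrives at
    PE (i, j) on cycle t = s + i + j because of the input skew.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    # Total cycles: (m - 1) + (n - 1) skew plus k reduction steps.
    for t in range(m + n + k - 2):
        for i in range(m):
            for j in range(n):
                s = t - i - j  # which A-column / B-row reaches this PE now
                if 0 <= s < k:
                    C[i][j] += A[i][s] * B[s][j]  # one MAC per PE per cycle
    return C
```

<p>Summed over all cycles, each PE performs exactly the k multiply-accumulates of one output element, so the result equals the ordinary matrix product; the cycle loop makes the hardware's pipelined latency (m + n + k - 2 cycles) explicit.</p>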