{"id":167298,"date":"2024-07-12T15:23:07","date_gmt":"2024-07-12T06:23:07","guid":{"rendered":"http:\/\/ee.presscat.kr\/?post_type=research-achieve&#038;p=167298"},"modified":"2026-04-06T21:52:34","modified_gmt":"2026-04-06T12:52:34","slug":"167298","status":"publish","type":"research-achieve","link":"http:\/\/ee.presscat.kr\/en\/research-achieve\/167298\/","title":{"rendered":""},"content":{"rendered":"<div><span style=\"font-size: 14pt;color: #000000\"><strong>[Professor Myoungsoo Jung&#8217;s Research Team Pioneers the &#8216;CXL-GPU&#8217; Market: KAIST Develops a High-Capacity, High-Performance GPU]<\/strong><\/span><\/div>\n<div>\u00a0<\/div>\n<div><span style=\"color: #000000\"><strong><img decoding=\"async\" class=\"fr-draggable\" src=\"https:\/\/kaist.gov-dooray.com\/mails\/3843617433979195737\/contents\/3842864258736465856.3843537081640038163@dooray.com?type=raw\" alt=\"\" title=\"\"><\/strong><\/span><\/div>\n<div><span style=\"color: #000000\">&lt;Professor Myoungsoo Jung&#8217;s Research Team&gt;<\/span><\/div>\n<div>\u00a0<\/div>\n<p><span style=\"color: #000000\">Recently, big tech companies at the forefront of large-scale AI services have been competitively scaling up their models and data to deliver better performance to users. The latest large-scale language models require tens of terabytes (TB, 10^12 bytes) of memory for training. 
A domestic research team has developed a high-capacity, high-performance AI accelerator built on a next-generation interface technology, positioning it to compete with NVIDIA, which currently dominates the AI accelerator market.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">Professor Myoungsoo Jung&#8217;s research team announced on the 8th that they have developed a technology to optimize the memory read\/write performance of high-capacity GPU devices using the next-generation interface technology Compute Express Link (CXL).<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">The internal memory capacity of the latest GPUs is only a few tens of gigabytes (GB, 10^9 bytes), so a single GPU cannot train or run inference on such large models. To provide the memory capacity that large-scale AI models require, the industry generally connects multiple GPUs together. However, this approach sharply increases the total cost of ownership (TCO) because the latest GPUs are expensive.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\"><img decoding=\"async\" class=\"fr-draggable\" src=\"https:\/\/kaist.gov-dooray.com\/mails\/3843617433979195737\/contents\/3842864258736465856.3843535396340293895@dooray.com?type=raw\" width=\"750\" alt=\"\" title=\"\"><\/span><\/p>\n<p><span style=\"color: #000000\">&lt; Representative Image of the CXL-GPU &gt;<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">Therefore, the \u2018CXL-GPU\u2019 architecture, which directly connects large-capacity memory to GPU devices using the next-generation connection technology CXL, is under active review across the industry. However, the high capacity of CXL-GPU alone is not sufficient for practical AI service use. 
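<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">The scale mismatch above can be made concrete with a rough back-of-the-envelope sketch. The figures below (a 30 TB model footprint, 80 GB of local memory per GPU) are illustrative assumptions, not numbers reported by the research team:<\/span><\/p>

```python
import math

# Purely illustrative numbers (the article says only "tens of TB" of model
# memory vs. "tens of GB" per GPU): how many 80 GB GPUs would be needed
# just to hold a 30 TB model's memory footprint?
MODEL_BYTES = 30 * 10**12   # assumed 30 TB working set
GPU_BYTES = 80 * 10**9      # assumed 80 GB of local memory per GPU

gpus_needed = math.ceil(MODEL_BYTES / GPU_BYTES)
print(gpus_needed)          # 375 GPUs, purchased mainly for their memory
```

<p><span style=\"color: #000000\">Even before any interconnect overhead, memory capacity alone forces the purchase of hundreds of expensive GPUs; this is the TCO problem that CXL-GPU targets by adding memory instead of whole GPUs.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">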
Since large-scale AI services demand fast inference and training, the read\/write performance of the memory expansion device directly connected to the GPU must be comparable to that of the GPU&#8217;s local memory before the architecture can be used in real services.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">*CXL-GPU: It provides high capacity by integrating the memory space of memory expansion devices connected via CXL into the GPU&#8217;s memory space. The CXL controller automatically handles the operations needed to manage this integrated memory space, so the GPU accesses the expanded memory in the same way it accesses its local memory. Unlike the traditional approach of buying additional expensive GPUs just to gain memory capacity, CXL-GPU lets operators add memory resources to the GPU selectively, significantly reducing system construction costs.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">Our research team developed technology that addresses the causes of degraded memory read\/write performance in CXL-GPU devices. By allowing the memory expansion device to determine its memory write timing independently, the GPU can perform writes to the expansion device and to its own local memory simultaneously. 
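<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">The benefit of letting the expansion device choose its own write-commit timing can be sketched with a simple cost model. This is not real CXL or GPU code, and every latency value is an invented placeholder; it only shows how removing the wait changes the GPU-side cost:<\/span><\/p>

```python
# Conceptual cost model (all latencies are illustrative assumptions):
# compare GPU-side time when the GPU must wait for the expander to
# complete each write vs. when the expander commits writes on its own
# schedule and the GPU merely posts the write and moves on.
LOCAL_WRITE_NS = 10      # assumed local-memory write cost
EXPANDER_WRITE_NS = 100  # assumed expander commit cost
ISSUE_NS = 1             # assumed cost to post a write to the expander
N_WRITES = 1000

# Synchronous: each expander write stalls the GPU until it completes.
sync_gpu_time = N_WRITES * (LOCAL_WRITE_NS + EXPANDER_WRITE_NS)

# Independent timing: the GPU pays only the posting cost alongside its
# local write; the expander finishes the commit in the background.
async_gpu_time = N_WRITES * (LOCAL_WRITE_NS + ISSUE_NS)

print(sync_gpu_time, async_gpu_time)  # 110000 11000
```

<p><span style=\"color: #000000\">In this toy model the GPU-visible write cost drops by an order of magnitude once the stall is removed; the real mechanism operates in hardware, but the shape of the saving is the same.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">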
As a result, the GPU does not have to wait for writes to the expansion device to complete, which resolves the write-performance degradation.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\"><img decoding=\"async\" class=\"fr-draggable\" title=\"Proposed CXL-GPU Architecture\" src=\"https:\/\/news.kaist.ac.kr\/_prog\/download\/?editor_image=\/images\/000079\/image2.jpg_7.jpg\" alt=\"Proposed CXL-GPU Architecture\" \/><\/span><\/p>\n<p><span style=\"color: #000000\">&lt; Proposed CXL-GPU Architecture &gt;\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">Furthermore, the research team developed a technology in which the GPU device sends hints that let the memory expansion device perform memory reads in advance. <\/span><\/p>\n<p><span style=\"color: #000000\">With this technique, the expansion device can start its reads earlier, so that when the GPU actually needs the data it is served from the device&#8217;s cache (a small but fast temporary data store), yielding faster read performance.<\/span><\/p>\n<p><span style=\"color: #000000\"><img decoding=\"async\" class=\"fr-draggable\" src=\"https:\/\/kaist.gov-dooray.com\/mails\/3843617433979195737\/contents\/3842864258736465856.3843535396499373584@dooray.com?type=raw\" width=\"600\" alt=\"\" title=\"\"><\/span><\/p>\n<p><span style=\"color: #000000\">&lt; CXL-GPU Hardware Prototype &gt;<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">This research was conducted using the ultra-fast CXL controller and CXL-GPU prototype from Panmnesia*, a fabless semiconductor startup. In verifying the technology with Panmnesia&#8217;s CXL-GPU prototype, the research team confirmed that it could run AI services 2.36 times faster than existing GPU memory expansion technology. 
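<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">The read-hint mechanism described above can be sketched as a toy cache model. The class, addresses, and latency values below are all hypothetical, chosen only to show why a hinted read completes faster than an unhinted one:<\/span><\/p>

```python
# Conceptual model of the read-hint idea (all names/latencies are
# illustrative): the GPU sends a hint for an address it will need soon;
# the expander pulls that line into its cache ahead of time, so the
# later demand read is a fast cache hit instead of a full media access.
CACHE_HIT_NS, CACHE_MISS_NS = 20, 200   # assumed latencies

class Expander:
    def __init__(self):
        self.cache = set()
    def hint(self, addr):          # GPU-issued prefetch hint
        self.cache.add(addr)       # expander reads the line early
    def read(self, addr):          # demand read from the GPU
        return CACHE_HIT_NS if addr in self.cache else CACHE_MISS_NS

ex = Expander()
ex.hint(0x1000)                    # GPU knows it will touch 0x1000 soon
hinted = ex.read(0x1000)           # served from the expander's cache
unhinted = ex.read(0x2000)         # no hint: full media access
print(hinted, unhinted)            # 20 200
```

<p><span style=\"color: #000000\">The toy model only captures the hit-versus-miss distinction; in the actual prototype the hint lets the expander overlap its media access with GPU computation, which is where the measured speedup comes from.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">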
The results of this research will be presented at HotStorage, the USENIX Workshop on Hot Topics in Storage and File Systems, in Santa Clara this July.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">*Panmnesia possesses a proprietary CXL controller, built entirely on domestic technology, that has reduced the round-trip latency of CXL memory management operations to double-digit nanoseconds (1 ns = 10^-9 seconds) for the first time in the industry. This is more than three times faster than the latest CXL controllers worldwide. Panmnesia has used this high-speed CXL controller to connect multiple memory expansion devices directly to the GPU, enabling a single GPU to form a terabyte-scale memory space.<\/span><\/p>\n<p><span style=\"color: #000000\">Professor Jung stated, \u201cAccelerating the market adoption of CXL-GPU can significantly reduce the memory expansion costs for big tech companies operating large-scale AI services.\u201d<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\"><img decoding=\"async\" class=\"fr-draggable\" src=\"https:\/\/kaist.gov-dooray.com\/mails\/3843617433979195737\/contents\/3842864258736465856.3843535396705573424@dooray.com?type=raw\" width=\"500\" alt=\"\" title=\"\"><\/span><\/p>\n<p><span style=\"color: #000000\">&lt; Evaluation Results of CXL-GPU Execution Time 
&gt;\u00a0<\/span><\/p>\n<div>\n<div>\u00a0<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>364<\/p>\n","protected":false},"featured_media":167297,"template":"","research_category":[],"class_list":["post-167298","research-achieve","type-research-achieve","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/research-achieve\/167298","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/research-achieve"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/research-achieve"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media\/167297"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=167298"}],"wp:term":[{"taxonomy":"research_category","embeddable":true,"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/research_category?post=167298"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}