{"id":167900,"date":"2024-07-23T17:07:19","date_gmt":"2024-07-23T08:07:19","guid":{"rendered":"http:\/\/ee.presscat.kr\/?post_type=research-achieve&#038;p=167900"},"modified":"2026-04-13T10:24:54","modified_gmt":"2026-04-13T01:24:54","slug":"professor-changick-kims-research-team-develops-videomamba-a-high-efficiency-model-opening-a-new-paradigm-in-video-recognition","status":"publish","type":"research-achieve","link":"http:\/\/ee.presscat.kr\/en\/research-achieve\/professor-changick-kims-research-team-develops-videomamba-a-high-efficiency-model-opening-a-new-paradigm-in-video-recognition\/","title":{"rendered":"Professor Changick Kim&#8217;s Research Team Develops &#8216;VideoMamba,&#8217; a High-Efficiency Model Opening a New Paradigm in Video Recognition"},"content":{"rendered":"<p><span style=\"font-size: 14pt\"><strong><span style=\"color: #000000\">Professor Changick Kim&#8217;s Research Team Develops &#8216;VideoMamba,&#8217; a High-Efficiency Model Opening a New Paradigm in Video Recognition<\/span><\/strong><\/span><\/p>\n<p><span style=\"color: #000000\"><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone  wp-image-167899\" src=\"http:\/\/ee.presscat.kr\/wp-content\/uploads\/2024\/07\/1-1.png\" alt=\"\" width=\"677\" height=\"159\" title=\"\"><\/span><\/p>\n<p><span style=\"color: #000000\">&lt;(From left)\u00a0Professor Changick Kim,\u00a0Jinyoung Park\u00a0integrated Ph.D. candidate, Hee-Seon Kim\u00a0Ph.D. candidate, Kangwook Ko\u00a0Ph.D. candidate, and Minbeom Kim\u00a0Ph.D. candidate&gt;<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">On the 9th, Professor Changick Kim\u2019s research team announced the development of a high-efficiency video recognition model named &#8216;VideoMamba.&#8217; VideoMamba demonstrates superior efficiency and competitive performance compared to existing video models built on transformers, like those underpinning large language models such as ChatGPT. 
This breakthrough is seen as pioneering a new paradigm in the field of video analysis.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"alignnone  wp-image-167909\" src=\"http:\/\/ee.presscat.kr\/wp-content\/uploads\/2024\/07\/1-2.jpg\" alt=\"\" width=\"716\" height=\"185\" title=\"\"><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\"><strong>Figure 1: Comparison of VideoMamba\u2019s memory usage and inference speed with transformer-based video recognition models.\u00a0<\/strong><\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">VideoMamba is designed to address the high computational complexity of traditional transformer-based models. <\/span><\/p>\n<p><span style=\"color: #000000\">These models typically rely on the self-attention mechanism, whose computational cost grows quadratically with sequence length. VideoMamba instead uses a Selective State Space Model (SSM) mechanism that processes sequences with linear complexity. This allows VideoMamba to effectively capture the spatio-temporal information in videos and efficiently handle long-range dependencies within video data.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"alignnone  wp-image-167911\" src=\"http:\/\/ee.presscat.kr\/wp-content\/uploads\/2024\/07\/2.jpg\" alt=\"\" width=\"702\" height=\"316\" title=\"\"><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\"><strong>Figure 2: Detailed structure of the spatio-temporal forward and backward Selective State Space Model in VideoMamba.\u00a0<\/strong><\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">To maximize the efficiency of the video recognition model, Professor Kim\u2019s team incorporated spatio-temporal forward and backward SSMs into VideoMamba. By effectively integrating non-sequential spatial information with sequential temporal information, this design enhances video recognition performance. 
<\/span><\/p>\n<p><span style=\"color: #000000\">The research team validated VideoMamba&#8217;s performance across various video recognition benchmarks, where it achieved high accuracy with low GFLOPs (giga floating-point operations) and memory usage while demonstrating very fast inference speed.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">VideoMamba offers an efficient and practical solution for applications that require video analysis. For example, autonomous driving systems can analyze driving footage to accurately assess road conditions and recognize pedestrians and obstacles in real time, thereby helping to prevent accidents. <\/span><\/p>\n<p><span style=\"color: #000000\">In the medical field, it can analyze surgical videos to monitor a patient&#8217;s condition in real time and respond swiftly to emergencies. In sports, it can analyze players&#8217; movements and tactics during games to improve strategies, and detect signs of fatigue or potential injury during training to help prevent them. VideoMamba&#8217;s fast processing speed, low memory usage, and high performance provide significant advantages across these diverse video-analysis applications.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">The research team includes Jinyoung Park (integrated\u00a0Ph.D.\u00a0candidate), Hee-Seon Kim (Ph.D. candidate), and Kangwook Ko (Ph.D. candidate) as co-first authors, and Minbeom Kim (Ph.D. candidate) as a co-author, with Professor Changick Kim of the Department of Electrical and Electronic Engineering at KAIST as the corresponding author.<\/span><\/p>\n<p><span style=\"color: #000000\">The research findings will be presented at the European Conference on Computer Vision (ECCV) 2024, one of the top international conferences in computer vision, to be held in Milan, Italy, in September this year. 
(Paper title: VideoMamba: Spatio-Temporal Selective State Space Model).<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"color: #000000\">This work was supported by the Institute of Information &amp; communications Technology Planning &amp; Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2020-0-00153, Penetration Security Testing of ML Model Vulnerabilities and Defense).<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>586<\/p>\n","protected":false},"featured_media":167305,"template":"","research_category":[],"class_list":["post-167900","research-achieve","type-research-achieve","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/research-achieve\/167900","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/research-achieve"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/research-achieve"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media\/167305"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=167900"}],"wp:term":[{"taxonomy":"research_category","embeddable":true,"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/research_category?post=167900"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}