Sound Source Localization is All about Cross-Modal Alignment (정준선 교수 연구실)

Title: Sound Source Localization is All about Cross-Modal Alignment

Authors: A. Senocak, H. Ryu, J. Kim, T. Oh, H. Pfister, J. S. Chung

Conference: International Conference on Computer Vision

Abstract: Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective. However, prior arts and existing benchmarks do not account for a more important aspect of the problem, cross-modal semantic understanding, which is essential for genuine sound source localization. Cross-modal semantic understanding is important in understanding semantically mismatched audio-visual events, e.g., silent objects, or offscreen sounds. To account for this, we propose a crossmodal alignment task as a joint task with sound source localization to better learn the interaction between audio and visual modalities. Thereby, we achieve high localization performance with strong cross-modal semantic understanding. Our method outperforms the state-of-the-art approaches in both sound source localization and cross-modal retrieval. Our work suggests that jointly tackling both tasks is necessary to conquer genuine sound source localization.

Main Figure:

AI in EE

AI in Signal Division

AI in Computer Division

AI in Communication Division

AI in Signal Division

AI in Wave Division

AI in Circuit Division

AI in Device Division

Sound Source Localization is All about Cross-Modal Alignment (정준선 교수 연구실)

학부 소개

연구

EE-X

AI in EE

구성원

교육

입학

소식

기부

학부 소개

연구

EE-X

AI in EE

구성원

교육

대외협력

입학

소식