Kyung Hee University SW-Centered University Project Portfolio

Video Moment Retrieval and Highlight Detection Using Captions Generated by Multimodal Large Language Models

Author: 김동진

Department or affiliation (company name): Department of Computer Engineering

Email: rlaehdwls310@khu.ac.kr

Video moment retrieval and highlight detection aim to identify specific segments and highlights from video content based on a given text query. With the rapid growth of video content and the increasing overlap between these tasks, recent research has explored approaches that address both simultaneously. Furthermore, advancements in multimodal large language models (MLLMs) have shown promising results in video understanding tasks. In this study, we leverage MLLMs to generate captions that improve the performance of moment retrieval and highlight detection. Our results demonstrate the effectiveness of these captions in enhancing alignment between visual and textual information, ultimately bridging the gap between the two modalities.

Github

Posted: November 18, 2024

Member

김동진

Keyword

Artificial intelligence