Video Moment Retrieval and Highlight Detection Using Captions Generated by Multimodal Large Language Models

Author: 김동진
Department or affiliation (company): Computer Engineering
Video moment retrieval and highlight detection aim to identify specific segments and highlights from video content based on a given text query. With the rapid growth of video content and the increasing overlap between these tasks, recent research has explored approaches that address both simultaneously. Furthermore, advancements in multimodal large language models (MLLMs) have shown promising results in video understanding tasks. In this study, we leverage MLLMs to generate captions that improve the performance of moment retrieval and highlight detection. Our results demonstrate the effectiveness of these captions in enhancing alignment between visual and textual information, ultimately bridging the gap between the two modalities.


Published: November 18, 2024


Member

김동진

Keyword

  • Artificial intelligence