Video Moment Retrieval and Highlight Detection Using Captions Generated by Multimodal Large Language Models
Author
김동진
Department or affiliation (company name)
Department of Computer Engineering
Email
rlaehdwls310@khu.ac.kr
Video moment retrieval and highlight detection aim to identify specific segments and highlights from video content based on a given text query. With the rapid growth of video content and the increasing overlap between these tasks, recent research has explored approaches that address both simultaneously. Furthermore, advancements in multimodal large language models (MLLMs) have shown promising results in video understanding tasks. In this study, we leverage MLLMs to generate captions that improve the performance of moment retrieval and highlight detection. Our results demonstrate the effectiveness of these captions in enhancing alignment between visual and textual information, ultimately bridging the gap between the two modalities.
Published: November 18, 2024
Member
김동진
Keyword
- Artificial Intelligence