Seminars and Talks

Supercharging Multimodal Video Representations

by Rohit Girdhar

Date:	Friday, Jan. 19
Time:	16:00
Location:	Online Call via Zoom

Our guest speaker is Rohit Girdhar from the GenAI Research group, Meta.

You are all cordially invited to the CVG Seminar on January 19th at 4 pm CET via Zoom (passcode is 659431).

Abstract

The last few years have seen an explosion in the capabilities of representations learned by large models trained on lots of data. From LLMs like GPT4 for natural language processing, to multimodal models like CLIP or Flamingo for visual reasoning, to text-to-image models like DALLE-3 for image generation, these models have revolutionized the way computers understand these different modalities. One modality, however, has been somewhat left behind: video. While GPT4V and DALLE-3 have made huge strides in image understanding and generation, understanding or generating videos is still an open problem. What are the reasons for this, and will video representations ever catch up? I believe that, rather than treating this as a competition between video and the other modalities, we should view the strong language, image, and generative representations as an asset for bootstrapping strong video representations. In this talk, I will share some of my recent work on building better video representations by leveraging these advanced representations, specifically for the tasks of video understanding, multimodal understanding, and video generation.

Bio

Rohit is a Research Scientist in the GenAI Research group at Meta. His current research focuses on understanding and generating multimodal data using minimal human supervision. He obtained an MS and a PhD in Robotics from Carnegie Mellon University, where he worked on learning from and understanding videos. He was previously part of the Facebook AI Research (FAIR) group at Meta, and has spent time at DeepMind, Adobe, and Facebook as an intern. His research has won multiple international challenges and has been recognized through a Best Paper (Finalist) Award at CVPR’22, a Best Paper Award at the ICCV’19 HVU Workshop, a Siebel Scholarship at CMU, and a Gold Medal and Research Award for undergraduate research at IIIT Hyderabad.