Seminars and Talks

Understanding and Harnessing Foundation Models

by Narek Tumanyan

Date:	Friday, Mar. 22
Time:	14:30
Location:	Online Call via Zoom

Our guest speaker is Narek Tumanyan from the Weizmann Institute of Science.

You are all cordially invited to the CVG Seminar on March 22nd at 2:30 pm CET via Zoom (passcode is 696673).

Abstract

The field of computer vision has been undergoing a paradigm shift, moving from task-specific models to "foundation models" - large-scale networks trained on massive amounts of data that can be adapted to a variety of downstream tasks. However, current state-of-the-art foundation models are largely "black boxes". That is, despite being successfully leveraged for downstream tasks, the underlying mechanisms responsible for their performance are not well understood. In this talk, we will study the internal representations of two prominent foundation models: DINO-ViT - a self-supervised vision transformer, and Stable Diffusion - a text-to-image generative latent diffusion model. This will enable us to:

Unveil novel visual descriptors;
Devise efficient frameworks for semantic image manipulation based on these descriptors.

We demonstrate how gaining an understanding of internal representations enables more creative use of foundation models and expands their capabilities to a broader set of tasks.

Bio

I am a PhD student at the Weizmann Institute of Science, Faculty of Mathematics and Computer Science, advised by Tali Dekel. My research focuses on analyzing and understanding the internal representations of large-scale models and leveraging them as priors for downstream tasks in images and videos, such as image manipulation, editing, and point tracking. I completed my Master’s degree at the Weizmann Institute in Tali Dekel's lab, where I also started my PhD in March 2023.