Seminars and Talks

Machine Learning Tools for Content Synthesis and Editing
by Omri Avrahami
Date: Friday, Apr. 11
Time: 14:45
Location: N10_302, Institute of Computer Science

Our guest speaker is Omri Avrahami from Hebrew University of Jerusalem.

You are all cordially invited to the CVG Seminar on April 11th, 2025 at 2:45 pm CEST

Abstract

Classical computer graphics approaches for realistic content synthesis require an elaborate underlying scene representation, which typically describes the geometry and physics of a scene and specifies lighting, camera position, etc. In contrast, generative neural models learn to synthesize diverse visual content from large image datasets, but typically without providing precise fine-grained control. In our work, we aim to develop new tools for visual content synthesis and editing using generative models by exploring various ways to narrow this gap between classical content generation techniques and neural data-driven approaches.

Bio

Omri is a Computer Science Ph.D. student at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, under the joint supervision of Prof. Dani Lischinski and Prof. Ohad Fried. Omri is interested in developing new tools for content synthesis and editing, popularly known as Generative AI. For his latest research, visit https://omriavrahami.com.

Learnings from Building Video Game World Models
by Abdelhak Lemkhenter
Date: Friday, Mar. 21
Time: 15:00
Location: N10_302, Institute of Computer Science

Our guest speaker is Abdelhak Lemkhenter from Microsoft, Cambridge.

You are all cordially invited to the CVG Seminar on March 21st, 2025 at 3:00 pm CET

Abstract

Over the last few years, the research community has continuously pushed the boundary of video generative modelling, with many impressive demos of open- and closed-source models. This has led to increasing interest in the steerability of such models and their ability to capture the different dynamics present in the data. In this talk, we will discuss recent advances in world modelling applied to video games, an interesting setting for training such models. We will discuss the recently published World and Human Action Model (WHAM) through the lens of its design, its evaluation, and the key learnings that came from scaling world models to a modern video game title.

Bio

Abdelhak Lemkhenter is a Researcher at Microsoft Research Cambridge, currently working on few-shot imitation learning and world modeling in complex modern video games. His research interests also include robust and scalable representation learning and data-centric learning. He completed his PhD in Informatics at the University of Bern and obtained his Master's degree from École Centrale Paris.

Synthetic Realities: possibilities, frontiers and societal challenges
by Anderson Rocha
Date: Friday, Jul. 5
Time: 14:30
Location: N10_302, Institute of Computer Science

Our guest speaker is Prof. Anderson Rocha from the University of Campinas (Unicamp), Brazil.

You are all cordially invited to the CVG Seminar on July 5th at 2:30 pm CEST

Abstract

We explore the burgeoning landscape of synthetic realities (AI-enabled synthetic content allied with narratives and contexts), detailing their impact, technological advancements, and ethical quandaries. Synthetic realities provide innovative solutions and opportunities for immersive experiences across various sectors, including education, healthcare, and commerce. However, these advancements also usher in substantial challenges, such as the propagation of misinformation, privacy concerns, and ethical dilemmas. In this talk, we discuss the specifics of synthetic media, including deepfakes and their generation techniques, and the imperative need for robust detection methods to combat the potential misuse of such technologies, as well as concerted efforts on regulation, standardization, and technological literacy. We show the dual-edged nature of synthetic realities and advocate for interdisciplinary research, informed public discourse, and collaborative efforts to harness their benefits while mitigating risks. This talk contributes to the discourse on the responsible development and application of artificial intelligence and synthetic media in modern society.

Bio

Anderson Rocha (IEEE Fellow) is a Full Professor of Artificial Intelligence and Digital Forensics at the Institute of Computing, University of Campinas (Unicamp), Brazil, where he heads the Artificial Intelligence Lab, Recod.ai. He is a three-term elected member of the IEEE Information Forensics and Security Technical Committee (IFS-TC), a former chair of that committee, and chair-elect for the 2025-2026 term. He is a Microsoft Research Faculty Fellow, a Google Research Faculty Fellow, and a Tan Chin Tuan (TCT) Fellow; since 2023, he has also been an Asia Pacific Artificial Intelligence Association Fellow. He is ranked among the top 2% of research scientists worldwide, according to the PlosOne/Stanford and Research.com studies. Finally, he is a LinkedIn Top Voice in Artificial Intelligence for continuously raising awareness of AI and its potential impacts on society at large.

Sparse-view 3D in the Wild
by Jason Y. Zhang
Date: Friday, Apr. 26
Time: 16:00
Location: Online Call via Zoom

Our guest speaker is Jason Y. Zhang from Carnegie Mellon University.

You are all cordially invited to the CVG Seminar on April 26th at 4 pm CEST

  • via Zoom (passcode is 003713).

Abstract

Reconstructing 3D scenes and objects from images alone has been a long-standing goal in computer vision. However, typical methods require a large number of images with precisely calibrated camera poses, which is cumbersome for end users. We propose a probabilistic framework that can predict distributions over relative camera rotations. These distributions are then composed into coherent camera poses given sparse image sets. To improve precision, we then propose a diffusion-based model that represents camera poses as a distribution over rays instead of camera extrinsics. We demonstrate that our system is capable of recovering accurate camera poses from a variety of self-captures and is sufficient for high-quality 3D reconstruction.

Bio

Jason Y. Zhang is a final-year PhD student at Carnegie Mellon University, advised by Deva Ramanan and Shubham Tulsiani. Jason completed his undergraduate degree at UC Berkeley, where he worked with Jitendra Malik and Angjoo Kanazawa. He is interested in scaling single-view and multi-view 3D to unconstrained environments. Jason is supported in part by the NSF GRFP.

Understanding and Harnessing Foundation Models
by Narek Tumanyan
Date: Friday, Mar. 22
Time: 14:30
Location: Online Call via Zoom

Our guest speaker is Narek Tumanyan from the Weizmann Institute of Science.

You are all cordially invited to the CVG Seminar on March 22nd at 2:30 pm CET

  • via Zoom (passcode is 696673).

Abstract

The field of computer vision has been undergoing a paradigm shift, moving from task-specific models to "foundation models" - large-scale networks trained on massive amounts of data that can be adapted to a variety of downstream tasks. However, current state-of-the-art foundation models are largely "black boxes". That is, despite being successfully leveraged for downstream tasks, the underlying mechanisms responsible for their performance are not well understood. In this talk, we will study the internal representations of two prominent foundation models: DINO-ViT - a self-supervised vision transformer, and StableDiffusion - a text-to-image generative latent diffusion model. This will enable us to

  1. Unveil novel visual descriptors;
  2. Devise efficient frameworks of semantic image manipulation based on the novel visual descriptors.

We demonstrate how gaining understanding of internal representations enables a more creative usage of foundation models and expands their capacities to a broader set of tasks.

Bio

I am a PhD student at the Weizmann Institute of Science, Faculty of Mathematics and Computer Science, advised by Tali Dekel. My research focuses on analyzing and understanding the internal representations of large-scale models and leveraging them as priors for downstream tasks in images and videos, such as image manipulation, editing, and point tracking. I completed my Master's degree at the Weizmann Institute in Tali Dekel's lab, where I also began my PhD in March 2023.