Seminars and Talks

Three Views on View Synthesis
by Kyle Sargent
Date: Friday, Dec. 15
Time: 16:00
Location: Online Call via Zoom

Our guest speaker is Kyle Sargent from the Stanford Vision Lab.

You are all cordially invited to the CVG Seminar on December 15th at 4 pm CET

  • via Zoom (passcode is 520944).


Novel view synthesis from a single image is an important problem in computer vision. Several sources of randomness and ill-posedness make the problem extremely challenging. I will present three papers from the course of my research career, each taking a very different perspective and technical approach to this problem. As the talk progresses, I will explain how I have come to regard 3D generative modeling and 3D novel view synthesis as closely connected, and give supporting evidence. The final paper I will present is ZeroNVS: Zero-shot 360-degree View Synthesis from a Single Real Image, my most recent paper, which is currently in submission.


Kyle Sargent is a second-year PhD student in the Stanford Vision Lab, advised by Jiajun Wu and Fei-Fei Li. He works on 3D generative models and novel view synthesis. He has written several papers for top vision conferences, including two first- or co-first-authored Best Paper Finalists, at CVPR 2022 and ICCV 2023. Prior to joining Stanford, he was an AI Resident at Google Research, and prior to that, he was an undergraduate at Harvard.

Using Deep Generative Models for Representation Learning and Beyond
by Daiqing Li
Date: Thursday, Dec. 7
Time: 16:00
Location: Online Call via Zoom

Our guest speaker is Daiqing Li from Playground.

You are all cordially invited to the CVG Seminar on December 7th at 4 pm CET

  • via Zoom (passcode is 102781).


Diffusion-based deep generative models have demonstrated remarkable performance in text-conditioned synthesis of images, videos, and 3D content. In this talk, I will discuss how to use large-scale text-to-image (T2I) models as vision foundation models for representation learning and other downstream tasks, such as synthetic dataset generation and semantic segmentation.


Daiqing Li is currently a research lead at Playground, where his primary focus is advancing pixel foundation models. Previously, Daiqing was a senior research scientist at the NVIDIA Toronto AI Lab, where his research spanned computer vision, computer graphics, generative models, and machine learning. He collaborates closely with Sanja Fidler and Antonio Torralba at NVIDIA, and several of his works have been integrated into NVIDIA products, notably Omniverse and Clara. Daiqing graduated from the University of Toronto and has been recognized as the runner-up for the MICCAI Young Scientist Award. His recent research focuses on using generative models for dataset synthesis, perception tasks, and representation learning. He is the author of SemanticGAN, BigDatasetGAN, and DreamTeacher.

Event-based optical flow and stereo depth estimation using contrast maximization
by Guillermo Gallego
Date: Monday, Apr. 17
Time: 09:00
Location: N10_302, Institute of Computer Science

Our guest speaker is Dr. Guillermo Gallego from TU Berlin.

You are all cordially invited to the CVG Seminar on April 17th at 9 am CET.


Event cameras are novel vision sensors that mimic functions of the human retina and offer potential advantages over traditional cameras (low latency, high speed, high dynamic range, etc.). They acquire visual information in the form of pixel-wise brightness changes, called events. This talk presents event processing approaches for motion estimation in computer vision and robotics applications. In particular, we will discuss recent advances by the Robotic Interactive Perception Lab at TU Berlin in extending the contrast maximization framework to optical flow and stereo depth estimation.


Guillermo Gallego is Associate Professor at TU Berlin and the Einstein Center Digital Future, Berlin, Germany. He is also a PI of the Science of Intelligence Excellence Cluster. He received his PhD degree in Electrical and Computer Engineering from the Georgia Institute of Technology, USA, in 2011. From 2011 to 2014 he was a Marie Curie researcher with Universidad Politecnica de Madrid, Spain, and from 2014 to 2019 he was a postdoctoral researcher with the Robotics and Perception Group at the University of Zurich, Switzerland. He serves as Associate Editor for IEEE Trans. on Pattern Analysis and Machine Intelligence, IEEE Robotics and Automation Letters, and the International Journal of Robotics Research.

Understanding Long Videos with Minimal Supervision
by Tengda Han
Date: Friday, Mar. 17
Time: 15:00
Location: Online Call via Zoom

Our guest speaker is Tengda Han from the Visual Geometry Group (VGG), University of Oxford.

You are all cordially invited to the CVG Seminar on the 17th of March at 3:00 pm CET

  • via Zoom (passcode is 690015).


Videos are an appealing data source for training computer vision models. An almost unlimited supply of videos exists online, but exhaustive manual annotation is infeasible. In this talk, I will briefly introduce a few methods to learn strong video representations with minimal human annotations, with an emphasis on long videos that go beyond a few seconds.



Tengda Han is a post-doctoral research fellow at the Visual Geometry Group at the University of Oxford. He obtained his PhD from the same group in 2022, supervised by Andrew Zisserman. His current research focuses on self-supervised learning, efficient learning, and video understanding.

Advanced Diffusion Models: Accelerated Sampling, Smooth Diffusion, and 3D Shape Generation
by Karsten Kreis
Date: Thursday, Dec. 22
Time: 17:30
Location: Online Call via Zoom

Our guest speaker is Karsten Kreis from NVIDIA’s Toronto AI Lab.

You are all cordially invited to the CVG Seminar on the 22nd of December at 5:30 pm CET

  • via Zoom (passcode is 052316).


Denoising diffusion-based generative models have led to multiple breakthroughs in deep generative learning. In this talk, I will discuss recent works by the NVIDIA Toronto AI Lab on diffusion models. In the first part, I will present GENIE: Higher-Order Denoising Diffusion Solvers, a novel method for accelerated sampling from diffusion models, leveraging higher-order methods together with an efficient model distillation technique to solve the generative differential equations of diffusion models. Next, I will discuss our work on Critically-Damped Langevin Diffusion. Taking inspiration from statistical mechanics and Markov chain Monte Carlo, we introduce an auxiliary velocity variable into the diffusion process, which allows the diffusion to converge to the Gaussian prior more smoothly and quickly. This makes critically-damped Langevin diffusion ideally suited for diffusion-based generative modeling. Finally, I will briefly recapitulate Latent Score-based Generative Models and then present LION: Latent Point Diffusion Models for 3D Shape Generation, which achieves state-of-the-art 3D shape synthesis and enables various artistic applications, such as voxel-guided shape generation.



Karsten Kreis is a senior research scientist at NVIDIA’s Toronto AI Lab. Prior to joining NVIDIA, he worked on deep generative modeling at D-Wave Systems and co-founded Variational AI, a startup utilizing generative models for drug discovery. Before switching to deep learning, Karsten did his M.Sc. in quantum information theory at the Max Planck Institute for the Science of Light and his Ph.D. in computational and statistical physics at the Max Planck Institute for Polymer Research. Currently, Karsten's research focuses on developing novel generative learning methods and on applying deep generative models to problems in areas such as computer vision, graphics, and digital artistry, as well as in the natural sciences.