| Date: | Thursday, Dec. 7 |
| --- | --- |
| Time: | 16:00 |
| Location: | Online Call via Zoom |
Our guest speaker is Daiqing Li from Playground.
You are all cordially invited to the CVG Seminar on December 7th at 4 pm CET
Diffusion-based deep generative models have demonstrated remarkable performance in text-conditioned synthesis tasks across images, videos, and 3D. In this talk, I will discuss how to use large-scale T2I models as vision foundation models for representation learning and other downstream tasks, such as synthetic dataset generation and semantic segmentation.
Daiqing Li is currently a research lead at Playground, where his primary focus is advancing pixel foundation models. Previously, he was a senior research scientist at the NVIDIA Toronto AI Lab, where his research spanned computer vision, computer graphics, generative models, and machine learning. He collaborated closely with Sanja Fidler and Antonio Torralba at NVIDIA, and several of his works have been integrated into NVIDIA products, notably Omniverse and Clara. Daiqing graduated from the University of Toronto and was runner-up for the MICCAI Young Scientist Award. His recent research focuses on using generative models for dataset synthesis, perception tasks, and representation learning. He is the author of SemanticGAN, BigDatasetGAN, and DreamTeacher.
| Date: | Monday, Apr. 17 |
| --- | --- |
| Time: | 09:00 |
| Location: | N10_302, Institute of Computer Science |
Our guest speaker is Dr. Guillermo Gallego from TU Berlin.
You are all cordially invited to the CVG Seminar on April 17th at 9 am CET
Event cameras are novel vision sensors that mimic functions from the human retina and offer potential advantages over traditional cameras (low latency, high speed, high dynamic range, etc.). They acquire visual information in the form of pixel-wise brightness changes, called events. This talk presents event processing approaches for motion estimation in computer vision and robotics applications. In particular, we will discuss recent advances by the Robotic Interactive Perception Lab at TU Berlin in extending the contrast maximization framework to optical flow and stereo depth estimation.
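For readers unfamiliar with the contrast maximization framework mentioned above, the core idea is to warp events along candidate motion trajectories and score each candidate by the sharpness (variance) of the resulting image of warped events. Below is a minimal sketch under a constant-optical-flow assumption; the function names, the grid search, and the polarity-agnostic accumulation are illustrative simplifications, not the lab's implementation.

```python
import numpy as np

def contrast(events_xy, events_t, flow, shape):
    """Score a candidate flow by the contrast of the image of warped events.

    events_xy: (N, 2) pixel coordinates of events
    events_t:  (N,) event timestamps in seconds
    flow:      (2,) candidate velocity in pixels/second
    shape:     sensor resolution (H, W)
    """
    # Warp each event back to t = 0 along the candidate motion trajectory.
    warped = events_xy - events_t[:, None] * flow
    H, W = shape
    ix = np.clip(np.round(warped[:, 0]).astype(int), 0, W - 1)
    iy = np.clip(np.round(warped[:, 1]).astype(int), 0, H - 1)
    # Accumulate event counts per pixel (polarity ignored for simplicity).
    img = np.zeros(shape)
    np.add.at(img, (iy, ix), 1.0)
    # Well-aligned events pile up on few pixels, maximizing the variance.
    return img.var()

def best_flow(events_xy, events_t, shape):
    # Coarse grid search over candidate flows; practical systems optimize
    # the same objective with gradient-based methods instead.
    vs = np.linspace(-50.0, 50.0, 21)
    candidates = [np.array([vx, vy]) for vx in vs for vy in vs]
    return max(candidates,
               key=lambda v: contrast(events_xy, events_t, v, shape))
```

The same objective generalizes beyond constant flow: the talk's extensions to optical flow and stereo depth estimation replace the simple warp above with richer motion or depth models.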
Guillermo Gallego is an Associate Professor at TU Berlin and the Einstein Center Digital Future, Berlin, Germany. He is also a PI of the Science of Intelligence Excellence Cluster. He received his PhD in Electrical and Computer Engineering from the Georgia Institute of Technology, USA, in 2011. From 2011 to 2014 he was a Marie Curie researcher at Universidad Politecnica de Madrid, Spain, and from 2014 to 2019 he was a postdoctoral researcher with the Robotics and Perception Group at the University of Zurich, Switzerland. He serves as an Associate Editor for IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Robotics and Automation Letters, and the International Journal of Robotics Research.
| Date: | Friday, Mar. 17 |
| --- | --- |
| Time: | 15:00 |
| Location: | Online Call via Zoom |
Our guest speaker is Tengda Han from the Visual Geometry Group (VGG), University of Oxford.
You are all cordially invited to the CVG Seminar on the 17th of March at 3:00 pm CET
Videos are an appealing data source for training computer vision models: an almost infinite supply is available online, but exhaustive manual annotation is infeasible. In this talk, I will briefly introduce a few methods for learning strong video representations with minimal human annotation, with an emphasis on long videos that go beyond a few seconds.
Tengda Han is a post-doctoral research fellow at the Visual Geometry Group at the University of Oxford. He obtained his PhD from the same group in 2022, supervised by Andrew Zisserman. His current research focuses on self-supervised learning, efficient learning, and video understanding.
| Date: | Thursday, Dec. 22 |
| --- | --- |
| Time: | 17:30 |
| Location: | Online Call via Zoom |
Our guest speaker is Karsten Kreis from NVIDIA’s Toronto AI Lab.
You are all cordially invited to the CVG Seminar on the 22nd of December at 5:30 pm CET
Denoising diffusion-based generative models have led to multiple breakthroughs in deep generative learning. In this talk, I will discuss recent works by the NVIDIA Toronto AI Lab on diffusion models. In the first part, I will present GENIE: Higher-Order Denoising Diffusion Solvers, a novel method for accelerated sampling from diffusion models, leveraging higher-order methods together with an efficient model distillation technique to solve the generative differential equations of diffusion models. Next, I will discuss our work on Critically-Damped Langevin Diffusion. Taking inspiration from statistical mechanics and Markov chain Monte Carlo, we introduce an auxiliary velocity variable into the diffusion process, which allows the diffusion to converge to the Gaussian prior more smoothly and quickly. This makes critically-damped Langevin diffusion ideally suited for diffusion-based generative modeling. Finally, I will briefly recapitulate Latent Score-based Generative Models and then present LION: Latent Point Diffusion Models for 3D Shape Generation, which achieves state-of-the-art 3D shape synthesis and enables various artistic applications, such as voxel-guided shape generation.
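For intuition on the critically-damped Langevin diffusion, the forward process couples each data variable x_t with an auxiliary velocity v_t. The following is a sketch of the coupled SDE, with M a mass parameter, Γ the friction coefficient, and β a time rescaling; the notation is a reconstruction and may differ from the paper in minor conventions.

```latex
% Forward SDE of critically-damped Langevin diffusion (sketch).
% x_t: data variable, v_t: auxiliary velocity, W_t: standard Wiener process.
\begin{aligned}
  \mathrm{d}x_t &= M^{-1} v_t \,\beta \,\mathrm{d}t \\
  \mathrm{d}v_t &= -x_t \,\beta \,\mathrm{d}t
                   \;-\; \Gamma M^{-1} v_t \,\beta \,\mathrm{d}t
                   \;+\; \sqrt{2 \Gamma \beta}\, \mathrm{d}W_t
\end{aligned}
```

Noise enters only through the velocity channel, and choosing the critical damping Γ² = 4M lets the joint process relax to its Gaussian equilibrium quickly and without oscillation, which is the smooth, fast convergence described in the abstract.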
Karsten Kreis is a senior research scientist at NVIDIA’s Toronto AI Lab. Prior to joining NVIDIA, he worked on deep generative modeling at D-Wave Systems and co-founded Variational AI, a startup utilizing generative models for drug discovery. Before switching to deep learning, Karsten did his M.Sc. in quantum information theory at the Max Planck Institute for the Science of Light and his Ph.D. in computational and statistical physics at the Max Planck Institute for Polymer Research. Currently, Karsten's research focuses on developing novel generative learning methods and on applying deep generative models to problems in areas such as computer vision, graphics, and digital artistry, as well as the natural sciences.
| Date: | Friday, Dec. 9 |
| --- | --- |
| Time: | 14:30 |
| Location: | Online Call via Zoom |
Our guest speaker is Yuki Asano from the University of Amsterdam.
You are all cordially invited to the CVG Seminar on the 9th of December at 2:30 pm CET
This talk is about pushing the limits of what can be learned without using any human annotations. After an overview of what self-supervised learning is, we will dive into how clustering can be combined with representation learning using optimal transport, and how this can be leveraged to segment objects in images without supervision [1]. Finally, as augmentations are crucial to all of self-supervised learning, we will analyze them in more detail in a recent preprint [2]. Here, we show that it is possible to extrapolate to semantic classes such as those of ImageNet using just a single datum as visual input, when combined with strong augmentations.
[1] Self-Supervised Learning of Object Parts for Semantic Segmentation [arXiv]
[2] Extrapolating from a Single Image to a Thousand Classes using Distillation [arXiv]
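As a generic illustration of how optimal transport is commonly combined with clustering in this line of work, a few Sinkhorn-Knopp iterations can turn similarity scores into balanced cluster assignments, preventing all samples from collapsing into a single cluster. The sketch below uses illustrative function and variable names and is not the exact procedure of [1].

```python
import torch

def sinkhorn_assign(logits, eps=0.05, n_iters=3):
    """Turn similarity scores into balanced soft cluster assignments.

    logits: (B, K) similarities between B samples and K cluster prototypes.
    Returns a (B, K) matrix of soft pseudo-labels whose columns (clusters)
    receive roughly equal total mass, avoiding degenerate solutions.
    """
    Q = torch.exp(logits / eps).T  # (K, B) initial transport plan
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)  # rows: equal mass per cluster
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)  # columns: unit mass per sample
        Q /= B
    return (Q * B).T  # rows sum to 1: one soft label per sample
```

In methods of this family, such soft assignments computed from one augmented view typically serve as pseudo-labels for a cross-entropy loss on another view, which is what ties the clustering to representation learning.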
Yuki Asano is an assistant professor of computer vision and machine learning at the Qualcomm-UvA lab at the University of Amsterdam, where he works with Cees Snoek, Max Welling, and Efstratios Gavves. His current research interests are multi-modal and self-supervised learning, and ethics in computer vision. Prior to his current appointment, he completed his PhD at the Visual Geometry Group (VGG) at the University of Oxford, working with Andrea Vedaldi and Christian Rupprecht. During his PhD, he also interned at Facebook AI Research and worked at TransferWise. Before the PhD, he studied physics at the University of Munich (LMU) and economics in Hagen, and completed an MSc in Mathematical Modelling and Scientific Computing at the Mathematical Institute in Oxford.