Date: | Friday, Dec. 3 |
---|---|
Time: | 14:30 |
Location: | NS302 |
Our guest speaker is Grigorios Chrysos from EPFL, and you are all cordially invited to the CVG Seminar on Dec 3rd at 2:30 p.m., on Zoom (passcode is 825054) or in person (room 302 at the Institute of Informatics).
Date: | Friday, Nov. 5 |
---|---|
Time: | 17:30 |
Location: | Online Call via Zoom |
Our guest speaker is Jiajun Wu from Stanford University, and you are all cordially invited to the CVG Seminar on Nov 5th at 5:30 p.m. on Zoom (passcode is 769629).
Much of our visual world is highly regular: objects are often symmetric and have repetitive parts; indoor scenes such as corridors often consist of objects organized in a repetitive layout. How can we infer and represent such regular structures from raw visual data, and later exploit them for better scene recognition, synthesis, and editing? In this talk, I will present our recent work on developing neuro-symbolic methods for scene understanding. Here, symbolic programs and neural nets play complementary roles: symbolic programs are more data-efficient to train and generalize better to new scenarios, as they robustly capture high-level structure; deep nets effectively extract complex, low-level patterns from cluttered visual data. I will demonstrate the power of such hybrid models in three different domains: 2D image editing, 3D shape modeling, and human motion understanding.
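The abstract describes a division of labor rather than a specific algorithm, but a toy sketch can make it concrete: imagine a hypothetical neural detector that proposes object positions along a corridor, and a small symbolic "repeat" program fit to those detections, so the regular layout is captured explicitly and can be extrapolated or edited. This is only an illustrative sketch with assumed names, not the systems presented in the talk.

```python
# Toy sketch (not the speaker's actual system): a symbolic "repeat" program
# explains object positions produced by a hypothetical neural detector.
# The neural net handles low-level perception; the fitted symbolic program
# captures the high-level regular structure and generalizes to longer layouts.
import numpy as np

def fit_repeat_program(positions):
    """Fit a 1D repetition program: x_i = start + i * spacing."""
    positions = np.sort(np.asarray(positions, dtype=float))
    idx = np.arange(len(positions))
    # Least-squares fit of spacing and start to the detected positions.
    spacing, start = np.polyfit(idx, positions, deg=1)
    return {"start": start, "spacing": spacing, "count": len(positions)}

def execute_repeat_program(program, count=None):
    """Run the program forward, optionally extrapolating to more repetitions."""
    n = count if count is not None else program["count"]
    return program["start"] + program["spacing"] * np.arange(n)

# Noisy detections along a corridor wall (e.g. door x-coordinates from a
# detector); the fitted program recovers the regular spacing and can
# synthesize a longer corridor by extrapolation.
detections = [1.02, 2.98, 5.01, 7.03, 8.97]
program = fit_repeat_program(detections)
print(program)
print(execute_repeat_program(program, count=8))
```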
Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the ACM Doctoral Dissertation Award Honorable Mention, the AAAI/ACM SIGAI Doctoral Dissertation Award, the MIT George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, the 2020 Samsung AI Researcher of the Year award, the IROS Best Paper Award on Cognitive Robotics, and faculty research awards and graduate fellowships from Samsung, Amazon, Facebook, Nvidia, and Adobe.
Date: | Friday, Oct. 1 |
---|---|
Time: | 15:00 |
Location: | Online Call via Zoom |
Our guest speaker is Professor Chenyang Tao from Amazon Research (and before that, Duke University), and you are all cordially invited to the CVG Seminar on Oct 1st at 3:00 p.m. CEST on Zoom (passcode is 145086).
Date: | Thursday, Jul. 22 |
---|---|
Time: | 16:00 |
Location: | Online Call via Zoom |
Our guest speaker is Vincent Sitzmann from MIT CSAIL, and you are all cordially invited to the CVG Seminar on July 22nd at 4:00 p.m. CEST on Zoom (passcode is 566141), where Vincent will give a talk titled “Light Field Networks”.
Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: Rendering and processing operations need to be differentiable, and the type of information they encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is a highly underdetermined problem, highlighting the need for structure and inductive biases without which models converge to spurious explanations.
Focusing on 3D structure, a fundamental feature of natural scenes, I will demonstrate how we can equip neural networks with inductive biases that enable them to learn 3D geometry, appearance, and even semantic information, self-supervised only from posed images. I will then discuss our recent work overcoming a key limitation of existing 3D-structured neural scene representations, differentiable ray marching, by directly parameterizing the 360-degree, 4D light field of 3D scenes.
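The second paragraph contrasts ray-marching-based representations with directly parameterizing the light field. As a rough, hedged sketch (the architecture sizes and ray encoding below are assumptions for illustration, not the exact model from the talk), a light field network can be thought of as a single MLP that maps a ray, rather than a 3D point, to the observed color, so rendering needs only one network evaluation per ray:

```python
# Minimal sketch of the idea behind a light field network: instead of
# ray-marching through a 3D volume, one MLP maps a ray directly to color.
# Layer sizes and the Pluecker-style ray encoding are illustrative choices.
import torch
import torch.nn as nn

class LightFieldMLP(nn.Module):
    def __init__(self, hidden=256, depth=6):
        super().__init__()
        layers, in_dim = [], 6  # 6D ray parameterization (direction + moment)
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers += [nn.Linear(hidden, 3)]  # RGB output
        self.net = nn.Sequential(*layers)

    def forward(self, origins, directions):
        d = nn.functional.normalize(directions, dim=-1)
        m = torch.cross(origins, d, dim=-1)          # moment vector of the ray
        return self.net(torch.cat([d, m], dim=-1))   # one evaluation per ray

# One forward pass per ray; no per-ray sampling along a 3D volume is needed.
model = LightFieldMLP()
origins = torch.randn(1024, 3)
dirs = torch.randn(1024, 3)
colors = model(origins, dirs)   # shape (1024, 3)
```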
Vincent Sitzmann is a Postdoc with Joshua Tenenbaum, William Freeman, and Fredo Durand at MIT CSAIL, and an incoming Assistant Professor, also at MIT. Previously, he finished his PhD at Stanford University. His primary research interests lie in the self-supervised learning of neural representations of 3D scenes, and their applications in computer graphics, computer vision, and robotics.
Date: | Thursday, Jun. 24 |
---|---|
Time: | 17:30 |
Location: | Online Call via Zoom |
Our guest speaker is Yang Song from Stanford University, and you are all cordially invited to the CVG Seminar on June 24th at 5:30 p.m. CEST on Zoom (passcode is 299064), where Yang will give a talk titled “Generative Modeling by Estimating Gradients of the Data Distribution”.
Existing generative methods are typically based on training explicit probability representations with maximum likelihood (e.g., VAEs) or learning implicit sampling procedures with adversarial training (e.g., GANs). The former requires variational inference or special model architectures for tractable training, while the latter can be unstable. To address these difficulties, we explore an alternative approach based on estimating gradients of probability densities. We can estimate gradients of distributions by training flexible neural network models with denoising score matching, and use these models for sample generation, exact likelihood computation, posterior inference, and data manipulation by leveraging techniques from MCMC and stochastic differential equations. Our framework enables free-form model architectures, requires no adversarial optimization, and achieves state-of-the-art performance in many applications such as image and audio generation.
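As a rough illustration of the two ingredients the abstract names, denoising score matching and MCMC-based sampling, the sketch below shows a generic score-matching loss and an annealed Langevin sampling loop. The score-network interface, noise schedule, and step sizes are placeholder assumptions rather than the speaker's exact setup.

```python
# Hedged sketch of score-based generative modeling: a denoising score matching
# objective for training a score network s_theta(x, sigma), and annealed
# Langevin dynamics for sampling. Details are illustrative placeholders.
import torch

def denoising_score_matching_loss(score_net, x, sigma):
    """E || s_theta(x + sigma*z, sigma) + z/sigma ||^2, weighted by sigma^2."""
    z = torch.randn_like(x)
    x_noisy = x + sigma * z
    target = -z / sigma                     # score of the Gaussian perturbation
    pred = score_net(x_noisy, sigma)        # assumed interface: (x, sigma) -> score
    sq_err = (pred - target) ** 2
    return ((sigma ** 2) * sq_err.flatten(start_dim=1).sum(dim=1)).mean()

@torch.no_grad()
def annealed_langevin_sampling(score_net, shape, sigmas, steps=100, eps=2e-5):
    """Sample by running unadjusted Langevin dynamics at decreasing noise levels."""
    x = torch.randn(shape)
    for sigma in sigmas:                    # sigmas sorted from large to small
        step = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(steps):
            noise = torch.randn_like(x)
            # x <- x + step * score + sqrt(2 * step) * noise
            x = x + step * score_net(x, sigma) + (2 * step) ** 0.5 * noise
    return x
```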