Seminars and Talks

Tackling the Challenge of Uncertainty Estimation and Robustness to Distributional Shift in Real-World Applications
by Dr. Andrey Malinin
Date: Friday, Jan. 14
Time: 14:30
Location: Online Call via Zoom

Our guest speaker is Andrey Malinin from Yandex Research, and you are all cordially invited to the CVG Seminar on Jan 14th at 2:30 p.m. on Zoom (passcode is 979103).

Abstract

While much research has been done on developing methods for improving robustness to distributional shift and uncertainty estimation, most of these methods were developed only for small-scale regression or image-classification tasks, and limited work has examined developing standard datasets and benchmarks for assessing these approaches. Furthermore, many tasks of practical interest involve different modalities, such as tabular data, audio, text, or sensor data, which pose significant challenges involving regression and discrete or continuous structured prediction. In this work, we propose the Shifts Dataset for evaluating uncertainty estimates and robustness to distributional shift. The dataset, collected from industrial sources and services, is composed of three tasks, each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these modalities and tasks are affected by real, "in-the-wild" distributional shifts and pose interesting challenges with respect to uncertainty estimation. We hope that this dataset will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, as well as assessment criteria and baselines, and accelerate the development of safe and reliable machine learning for real-world, risk-critical applications.
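
For readers unfamiliar with how such benchmarks score uncertainty, a common criterion (used in the Shifts line of work) is the area under the error-retention curve: samples are rejected in order of decreasing predicted uncertainty, and a good uncertainty estimate makes the error on the retained samples fall quickly. A minimal NumPy sketch of that idea follows; the function name and normalization are illustrative, not the benchmark's official implementation.

```python
import numpy as np

def error_retention_auc(errors, uncertainties):
    """Approximate area under the error-retention curve: samples are
    sorted by predicted uncertainty and the most uncertain ones are
    rejected first. Uncertainty that tracks the true error gives a
    lower area, since errors drop quickly as samples are rejected."""
    order = np.argsort(uncertainties)           # most confident first
    sorted_errors = errors[order]
    n = len(errors)
    # mean error when retaining the k most confident samples, k = 1..n
    retained_mean = np.cumsum(sorted_errors) / np.arange(1, n + 1)
    return retained_mean.mean()                 # average over retention fractions

# toy usage: informative uncertainty scores a lower (better) AUC
rng = np.random.default_rng(0)
errors = rng.exponential(size=1000)
informative = error_retention_auc(errors, errors + 0.1 * rng.normal(size=1000))
random_unc = error_retention_auc(errors, rng.normal(size=1000))
print(informative < random_unc)  # expected: True
```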

An additional challenge for uncertainty estimation in real-world tasks is that standard approaches, such as model ensembles, are computationally expensive. Ensemble Distribution Distillation (EnDD) is an approach that allows a single model to efficiently capture both the predictive performance and the uncertainty estimates of an ensemble. Although EnDD is theoretically principled, this work shows that the original Dirichlet log-likelihood criterion exhibits poor convergence when applied to large-scale tasks where the number of classes is very high. Specifically, we show that in such conditions the original criterion focuses on the distribution of the ensemble's tail-class probabilities rather than on the probabilities of the correct and closely related classes. We propose a new training objective that resolves the gradient issues of EnDD and enables its application to tasks with many classes, as we demonstrate on the ImageNet, LibriSpeech, and WMT17 En-De datasets, containing 1,000, 5,000, and 40,000 classes, respectively.
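
For context, here is a minimal PyTorch sketch of the kind of Dirichlet negative log-likelihood criterion the abstract refers to; the parameterization (alpha = exp(logits)) and names are illustrative rather than the talk's exact formulation. With tens of thousands of classes, the sum over classes in this objective is dominated by the many tail-class terms, which is the gradient pathology the proposed objective addresses.

```python
import torch

def endd_nll(student_logits, ensemble_probs, eps=1e-8):
    """Sketch of a negative Dirichlet log-likelihood for Ensemble
    Distribution Distillation. The student's logits parameterize a
    Dirichlet (alpha = exp(logits)), and each ensemble member's
    categorical prediction is treated as a sample from that Dirichlet.
    student_logits: [B, C]; ensemble_probs: [B, M, C]."""
    alpha = torch.exp(student_logits) + eps        # concentration parameters
    alpha0 = alpha.sum(-1)                         # precision, shape [B]
    log_norm = torch.lgamma(alpha0) - torch.lgamma(alpha).sum(-1)    # [B]
    log_pi = torch.log(ensemble_probs + eps)       # [B, M, C]
    # (alpha_c - 1) * log pi_c, summed over classes, averaged over M members;
    # with very many classes this sum is dominated by tail-class terms
    fit = ((alpha.unsqueeze(1) - 1.0) * log_pi).sum(-1).mean(1)      # [B]
    return -(log_norm + fit).mean()
```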

Bio

Andrey is a Senior Research Scientist at Yandex Research in Moscow, Russia. Prior to that, he completed his PhD on uncertainty estimation and speech processing at Cambridge University under the supervision of Professor Mark Gales. His primary research interest is Bayesian-inspired approaches to uncertainty estimation for deep learning and their practical application at scale to tasks in NLP, NMT, speech, and computer vision. He also uses generative neural models, such as normalizing flows, variational autoencoders, and generative adversarial networks, to create digital generative neural art.

(Conditional) Image Generation with High-Degree Polynomial Expansions
by Grigorios Chrysos
Date: Friday, Dec. 3
Time: 14:30
Location: NS302

Our guest speaker is Grigorios Chrysos from EPFL, and you are all cordially invited to the CVG Seminar on Dec 3rd at 2:30 p.m. on Zoom (passcode is 825054) or in person (room 302 at the Institute of Informatics).

Abstract

Despite the impressive performance of Neural Networks (NNs), there are alternative classes of functions that can obtain similar approximation performance. In this talk, we will focus on Polynomial Networks (PNs), which use high-degree polynomial expansions to approximate the target function. The unknown parameters of PNs can be naturally represented as high-order tensors. We will exhibit how tensor decompositions can both reduce the number of learnable parameters and transform PNs into simple recursive formulations. In the second part of the talk, we will extend PNs to conditional tasks with multiple (possibly diverse) inputs. We will show how PNs have been used for learning generative models on image, audio, and non-Euclidean signals. Lastly, we will showcase how conditional PNs can be used to recover attribute combinations missing from the training set, e.g. in image generation.
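
As a concrete illustration of the "simple recursive formulation" mentioned above, here is a minimal PyTorch sketch in the spirit of the CCP parameterization from the Π-nets line of work; layer sizes and names are illustrative. Each step multiplies a fresh linear projection of the input with the running term, so the output is a polynomial of the chosen degree in the input, with no elementwise activations.

```python
import torch
import torch.nn as nn

class PolynomialNetwork(nn.Module):
    """Sketch of a degree-N polynomial expansion with a shared-factor
    (CCP-style) tensor decomposition: the recursion x <- U_n(z) * x + x
    raises the polynomial degree by one per step."""
    def __init__(self, in_dim, hidden, out_dim, degree=3):
        super().__init__()
        self.U = nn.ModuleList(
            nn.Linear(in_dim, hidden, bias=False) for _ in range(degree))
        self.C = nn.Linear(hidden, out_dim)   # final linear read-out

    def forward(self, z):
        x = self.U[0](z)                      # degree-1 term
        for U_n in self.U[1:]:
            x = U_n(z) * x + x                # Hadamard product + skip
        return self.C(x)

net = PolynomialNetwork(in_dim=64, hidden=128, out_dim=10, degree=3)
y = net(torch.randn(4, 64))                   # output shape [4, 10]
```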

Bio

Grigorios Chrysos is a postdoctoral researcher at École Polytechnique Fédérale de Lausanne (EPFL), following the completion of his PhD at Imperial College London (2020). Previously, he graduated from the National Technical University of Athens with a Diploma/MEng in Electrical and Computer Engineering (2014). He has co-organised workshops at top conference venues, e.g. CVPR and ICCV, on deformable models. His current research interests lie in machine learning and its interface with computer vision. In particular, he is working on generative models, tensor decompositions, and modelling high-dimensional distributions with polynomial expansions. His recent work has been published in top-tier conferences (CVPR, ICML, ICLR, NeurIPS) and prestigious journals (T-PAMI, IJCV, T-IP, Proceedings of the IEEE). He also serves as a reviewer for the aforementioned conferences and journals.

Understanding the Visual World through Code
by Professor Jiajun Wu
Date: Friday, Nov. 5
Time: 17:30
Location: Online Call via Zoom

Our guest speaker is Jiajun Wu from Stanford University, and you are all cordially invited to the CVG Seminar on Nov 5th at 5:30 p.m. on Zoom (passcode is 769629).

Abstract

Much of our visual world is highly regular: objects are often symmetric and have repetitive parts; indoor scenes such as corridors often consist of objects organized in a repetitive layout. How can we infer and represent such regular structures from raw visual data, and later exploit them for better scene recognition, synthesis, and editing? In this talk, I will present our recent work on developing neuro-symbolic methods for scene understanding. Here, symbolic programs and neural nets play complementary roles: symbolic programs are more data-efficient to train and generalize better to new scenarios, as they robustly capture high-level structure; deep nets effectively extract complex, low-level patterns from cluttered visual data. I will demonstrate the power of such hybrid models in three different domains: 2D image editing, 3D shape modeling, and human motion understanding.
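
To make the division of labor concrete, here is a toy, purely illustrative Python sketch (not the speaker's method): a symbolic repetition program explains the high-level layout of detected objects, while a neural detector would supply the noisy low-level positions the program is fit to.

```python
import numpy as np

def chamfer(a, b):
    """Mean distance from each point in a to its nearest point in b."""
    return float(np.mean([np.min(np.abs(b - x)) for x in a]))

def fit_repeat_program(positions, max_n=10):
    """Search for the symbolic program `for i in range(n): place(start + i*step)`
    that best explains a set of detected object positions, scoring candidates
    by a two-sided nearest-neighbor (Chamfer) error."""
    positions = np.sort(np.asarray(positions, dtype=float))
    best = None
    for n in range(2, min(max_n, len(positions)) + 1):
        start = positions[0]
        step = (positions[-1] - positions[0]) / (n - 1)
        pred = start + step * np.arange(n)
        err = chamfer(pred, positions) + chamfer(positions, pred)
        if best is None or err < best[0]:
            best = (err, n, start, step)
    return best  # (error, n, start, step)

# noisy detections of five roughly evenly spaced objects, e.g. doors in a corridor
print(fit_repeat_program([0.1, 1.0, 2.05, 2.9, 4.0]))  # selects n = 5
```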

Bio

Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. Wu's research has been recognized through the ACM Doctoral Dissertation Award Honorable Mention, the AAAI/ACM SIGAI Doctoral Dissertation Award, the MIT George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, being named a 2020 Samsung AI Researcher of the Year, the IROS Best Paper Award on Cognitive Robotics, and faculty research awards and graduate fellowships from Samsung, Amazon, Facebook, Nvidia, and Adobe.

Simpler, Faster, Stronger: Supercharging Contrastive Learning with Novel Mutual Information Bounds
by Chenyang Tao
Date: Friday, Oct. 1
Time: 15:00
Location: Online Call via Zoom

Our guest speaker is Chenyang Tao from Amazon Research (and before that Duke University), and you are all cordially invited to the CVG Seminar on Oct 1st at 3:00 p.m. CEST on Zoom (passcode is 145086).

Abstract

Contrastive representation learners, such as MoCo and SimCLR, have been tremendously successful in recent years. These successes are empowered by recent advances in contrastive variational mutual information (MI) estimators, in particular InfoNCE and its variants. A major drawback of InfoNCE-type estimators is that they not only depend crucially on costly large-batch training but also sacrifice bound tightness for variance reduction. In this talk, we show how insights from unnormalized statistical modeling and convex optimization can overcome these limitations. We present a novel, simple, and powerful contrastive MI estimation framework based on Fenchel-Legendre Optimization (FLO). The derived FLO estimator is theoretically tight, and it provably converges under stochastic gradient descent. We show how an important variant, named FlatNCE, overcomes the notorious log-K curse suffered by InfoNCE and excels at self-supervised learning, with just a one-line change of code. We also introduce new tools to monitor and diagnose contrastive training and demonstrate extended applications in fields such as causal machine learning. We will conclude the talk with some ideas for future work.
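
As a rough illustration of the "one-line change", here is a PyTorch sketch contrasting standard InfoNCE with a FlatNCE-style self-normalized objective; the exact form follows a reading of the FlatNCE paper and should be treated as illustrative, not the authors' reference code.

```python
import torch
import torch.nn.functional as F

def info_nce(scores):
    """scores: [B, B] critic values g(x_i, y_j); the diagonal holds the
    positive pairs. Standard InfoNCE is cross-entropy against the
    positive index, which saturates at log(batch size)."""
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

def flat_nce(scores):
    """FlatNCE-style variant (sketch): the loss value is constant, but
    its gradient is self-normalized, which removes the log-K saturation
    of InfoNCE without requiring large batches."""
    pos = scores.diagonal().unsqueeze(1)                          # [B, 1]
    mask = ~torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    neg_minus_pos = (scores - pos)[mask].view(scores.size(0), -1) # [B, B-1]
    clogits = torch.logsumexp(neg_minus_pos, dim=1)
    return torch.exp(clogits - clogits.detach()).mean()
```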

Bio

Chenyang Tao is a senior research associate affiliated with Duke Statistics and Electrical & Computer Engineering, where he leads a small research group of PhD students and postdocs with multidisciplinary backgrounds working on both fundamental AI research and applications. Tao obtained his PhD in Applied Mathematics from Fudan University (2011-2016) and was a visiting scholar at the University of Warwick (2014-2016) and a visiting scientist at RTI International (2017-2019). He joined Duke in 2017 and has worked enthusiastically to develop novel machine learning algorithms from new theoretical insights, as evidenced by a strong publication record in top-tier machine learning conferences (more than 10 papers in ICML, NeurIPS, ICLR, etc.). He has dived deep into three topics: (i) probabilistic inference (e.g., variational Bayes, adversarial learning, optimal transport, energy models); (ii) representation optimization (e.g., contrastive learning, mutual information estimation, fairness, self-supervised learning); (iii) causal machine learning (e.g., counterfactual reasoning, causal representation transfer). The techniques he has developed have proven widely useful in applications such as NLP, imbalanced-data learning, and time-to-event analysis. Outside work, he likes hiking, camping, fishing, kayaking, and binge-watching sci-fi franchises. ^_^

Light Field Networks
by Vincent Sitzmann
Date: Thursday, Jul. 22
Time: 16:00
Location: Online Call via Zoom

Our guest speaker is Vincent Sitzmann from MIT CSAIL, and you are all cordially invited to the CVG Seminar on July 22nd at 4:00 p.m. CEST on Zoom (passcode is 566141), where Vincent will give a talk titled “Light Field Networks”.

Abstract

Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: Rendering and processing operations need to be differentiable, and the type of information they encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is a highly underdetermined problem, highlighting the need for structure and inductive biases without which models converge to spurious explanations. 
Focusing on 3D structure, a fundamental feature of natural scenes, I will demonstrate how we can equip neural networks with inductive biases that enable them to learn 3D geometry, appearance, and even semantic information, self-supervised only from posed images. I will then discuss our recent work overcoming a key limitation of existing 3D-structured neural scene representations, differentiable ray-marching, by directly parameterizing the 360-degree, 4D light field of 3D scenes.
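
The core idea of that light-field parameterization can be sketched in a few lines of PyTorch: the network maps a ray, given in Plücker coordinates, directly to a color, so rendering costs a single network evaluation per ray instead of many samples marched along it. This sketch omits the conditioning and meta-learning machinery of the actual work; names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightFieldNetwork(nn.Module):
    """Sketch: an MLP maps a ray directly to a color, with no sampling
    or ray-marching. Plücker coordinates (d, o x d) parameterize the
    ray independently of which point o on the ray is chosen."""
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))                 # RGB output

    def forward(self, origins, dirs):
        d = F.normalize(dirs, dim=-1)             # unit ray direction
        m = torch.cross(origins, d, dim=-1)       # moment of the ray
        return self.mlp(torch.cat([d, m], dim=-1))

lfn = LightFieldNetwork()
rgb = lfn(torch.randn(8, 3), torch.randn(8, 3))   # one query per ray
```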

Bio

Vincent Sitzmann is a postdoc with Joshua Tenenbaum, William Freeman, and Frédo Durand at MIT CSAIL, and an incoming Assistant Professor, also at MIT. Previously, he finished his PhD at Stanford University. His primary research interests lie in the self-supervised learning of neural representations of 3D scenes and their applications in computer graphics, computer vision, and robotics.