Seminars and Talks

CV applications in medicine and biology
by Dmitry Dylov
Date: Friday, Jun. 3
Time: 14:30
Location: Online Call via Zoom

Our guest speaker is Dmitry Dylov from Skoltech, and you are all cordially invited to the CVG Seminar on June 3rd at 2:30 p.m. on Zoom (passcode: 859119) or in person (room 302 at the Institute of Informatics).


Dr. Dmitry V. Dylov is an Associate Professor at the Center for Computational and Data-Intensive Science and Engineering at Skoltech. He earned his Ph.D. in Electrical Engineering from Princeton University (Princeton, NJ, USA) in 2010 and his M.Sc. in Applied Physics and Mathematics from the Moscow Institute of Physics and Technology (Moscow, Russia) in 2006.

Prior to joining Skoltech, Dr. Dylov was a Lead Scientist at the GE Global Research Center (Niskayuna, NY, USA), where he led projects ranging from bioimaging and computational optics to medical image analytics. Dr. Dylov's innovation record includes IP contributions to GE Healthcare, frequent technical consulting for emerging start-ups, and the founding of two spin-off companies with clinical validation in major US hospitals (MSKCC, MGH, UCSF, Albany Med).

Dr. Dylov has established a new theoretical and computational paradigm for treating noise in imaging systems, resulting in impactful publications in reputable journals such as Physical Review Letters and Nature Photonics. His career record includes more than 50 peer-reviewed publications, 16 international patents, and more than 80 invited and contributed talks. Dr. Dylov earned the McGraw Teaching Excellence certificate at Princeton and was an instructor in the Edison Engineering Development Program at GE. He has served as an avid professional service volunteer, a scientific reviewer, and an advocate for educational outreach within the SPIE, OSA, APS, and IEEE societies.

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
by Greg Yang
Date: Friday, Apr. 8
Time: 14:30
Location: NS302

Our guest speaker is Greg Yang from Microsoft Research, and you are all cordially invited to the CVG Seminar on April 8th at 2:30 p.m. on Zoom (passcode: 757974) or in person (room 302 at the Institute of Informatics).


You can’t train GPT-3 on a single GPU, much less tune its hyperparameters (HPs)… or so it seems. I’m here to tell you this is not true: you can tune its HPs on a single GPU even if you can’t train it that way! In the first half of this talk, I’ll describe how, in the so-called maximal update parametrization (abbreviated µP), narrow and wide neural networks share the same set of optimal HPs. This lets us tune any large model by just tuning a small version of it, a procedure we call µTransfer. In particular, this allowed us to tune the 6.7-billion-parameter version of GPT-3 using only 7% of its pretraining compute budget and, with some asterisks, achieve performance comparable to the original GPT-3 model with twice the parameter count. In the second half of this talk, I’ll discuss the theoretical reason µP has this special property and its connection to the study of infinite-width neural networks and, more generally, the theory of Tensor Programs. The first half will target general practitioners and empirical researchers in machine learning, while the second half targets those who are more theoretically curious. This talk is based on the Tensor Programs series of work.
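The core mechanics of µTransfer can be sketched in a few lines. The sketch below is illustrative and not the official `mup` package API; the specific scaling rules shown (per-layer Adam learning rates for hidden weights scaling as 1/width, and the tuned base value being rescaled relative to a small proxy model) are the commonly stated µP rules, with hypothetical function names and numbers.

```python
# Minimal sketch of zero-shot HP transfer under muP (illustrative, not the
# official `mup` library): tune an HP on a cheap narrow proxy model, then
# rescale it for the full-width model instead of re-tuning at scale.

def mup_adam_lr(base_lr: float, base_width: int, width: int) -> float:
    """Per-layer Adam LR for hidden (matrix-like) weights under muP:
    the LR tuned at base_width is rescaled by base_width / width."""
    return base_lr * base_width / width

def mup_output_init_std(base_std: float, base_width: int, width: int) -> float:
    """Output-layer init std under muP also shrinks with width relative
    to the base model (so logits stay O(1) as width grows)."""
    return base_std * base_width / width

# Sweep HPs on a narrow proxy (width 128), then transfer to width 4096.
base_lr = 3e-3  # hypothetical value found by sweeping on the proxy
transferred_lr = mup_adam_lr(base_lr, base_width=128, width=4096)
print(transferred_lr)
```

At the base width the rule is the identity, so the proxy model itself trains with the tuned value unchanged.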


Greg Yang is a researcher at Microsoft Research in Redmond, Washington. He joined MSR after obtaining a Bachelor's degree in Mathematics and a Master's degree in Computer Science from Harvard University, advised respectively by Shing-Tung Yau and Alexander Rush. He won the Hoopes Prize at Harvard for best undergraduate thesis, as well as an Honorable Mention for the AMS-MAA-SIAM Morgan Prize, the highest honor for an undergraduate in mathematics. He gave an invited talk at the International Congress of Chinese Mathematicians in 2019.

Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
by Alaa El-Nouby
Date: Friday, Mar. 11
Time: 15:30
Location: Online Call via Zoom

Our guest speaker is Alaa El-Nouby from Meta AI Research and Inria Paris, and you are all cordially invited to the CVG Seminar on March 11th at 3:30 p.m. on Zoom (passcode: 913674).


Pre-training models on large-scale datasets like ImageNet is a standard practice in computer vision. This paradigm is especially effective for tasks with small training sets, for which high-capacity models tend to overfit. In this work, we consider a self-supervised pre-training scenario that leverages only the target task data. We consider datasets, like Stanford Cars, Sketch, or COCO, which are order(s) of magnitude smaller than ImageNet. Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets from different domains. On COCO, when pre-training solely on COCO images, the detection and instance segmentation performance surpasses supervised ImageNet pre-training in a comparable setting.
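The pretext task behind denoising autoencoders such as BEiT can be sketched simply: hide a fraction of image patches and train the model to reconstruct what was hidden. The snippet below is a toy illustration of the masking step only; the patch size, mask ratio, and function name are hypothetical choices, not values from the paper.

```python
import numpy as np

# Toy sketch of masked-patch corruption, the input side of a denoising
# autoencoder pretext task: zero out a random subset of patches.
def mask_patches(image: np.ndarray, patch: int = 4, ratio: float = 0.4,
                 seed: int = 0):
    """Return (masked_image, mask) where `mask` marks hidden patches."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    gh, gw = h // patch, w // patch          # patch-grid dimensions
    n_mask = int(ratio * gh * gw)            # how many patches to hide
    idx = rng.choice(gh * gw, size=n_mask, replace=False)
    masked = image.copy()
    mask = np.zeros((gh, gw), dtype=bool)
    for i in idx:
        r, c = divmod(i, gw)
        mask[r, c] = True
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return masked, mask

img = np.ones((16, 16))                      # dummy single-channel image
masked, mask = mask_patches(img)
print(mask.sum())                            # -> 6 hidden patches (0.4 * 16)
```

The model is then trained to predict the content of the hidden patches (discrete visual tokens in BEiT), which is what makes the objective depend less on the scale of the pre-training data than embedding-comparison methods.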


Alaa El-Nouby is a PhD student at Meta AI Research and Inria Paris, advised by Hervé Jégou and Ivan Laptev. His research interests are metric learning, self-supervised learning, and transformers for computer vision. Prior to pursuing his PhD, Alaa received his MSc from the University of Guelph and the Vector Institute, advised by Graham Taylor, where he conducted research in spatio-temporal representation learning and text-to-image synthesis with generative models.

Tackling the Generative Learning Trilemma with Accelerated Diffusion Models
by Arash Vahdat
Date: Thursday, Feb. 10
Time: 17:30
Location: Online Call via Zoom

Our guest speaker is Arash Vahdat from NVIDIA Research and you are all cordially invited to the CVG Seminar on Feb 10th at 5:30 p.m. on Zoom (passcode is 908626).


A wide variety of deep generative models has been developed in the past decade. Yet these models often struggle to simultaneously address three key requirements: high sample quality, mode coverage, and fast sampling. We call the challenge imposed by these requirements the generative learning trilemma, as existing models often trade one requirement off against another. In particular, denoising diffusion models (DDMs) have shown impressive sample quality and diversity, but their expensive sampling does not yet allow them to be applied in many real-world applications.

In this talk, I will cover our two recent works on reformulating DDMs specifically for fast sampling. In the first part, I will present LSGM, a framework that allows training DDMs in a latent space. In this work, we show that by mapping data to a latent space, we can learn smoother generative processes in a smaller space, resulting in fewer network evaluations and faster sampling. In the second part, I will present denoising diffusion GANs that model the denoising distribution in DDMs using conditional GANs. In this work, we show that our multi-modal denoising distributions, in contrast to unimodal Gaussian distributions, can reduce the number of denoising steps in DDMs to as few as two steps.
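The sampling cost at issue can be made concrete with a toy variance-preserving noise schedule: a standard DDM reverses T small noising steps, so producing one sample costs T network evaluations, which is what denoising diffusion GANs cut to a handful of large steps. The schedule values below are illustrative, not those of any particular paper.

```python
import numpy as np

# Toy variance-preserving diffusion schedule: sampling reverses T steps,
# so sampling cost is T network evaluations per image.
T = 1000
betas = np.linspace(1e-4, 2e-2, T)       # per-step noise variances (illustrative)
alphas_bar = np.cumprod(1.0 - betas)     # fraction of signal retained after t steps

# Forward (noising) marginal: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
def q_sample(x0: np.ndarray, t: int, eps: np.ndarray) -> np.ndarray:
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

print(alphas_bar[0], alphas_bar[-1])     # near 1 at t=0, near 0 at t=T-1
```

Because each reverse step undoes only a small amount of Gaussian noise, the unimodal Gaussian denoising assumption holds and T must be large; modeling the denoising distribution with a conditional GAN, as in the second work, allows multi-modal denoising over much larger steps.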


Arash Vahdat is a principal research scientist at NVIDIA Research specializing in machine learning and computer vision. Before joining NVIDIA, he was a research scientist at D-Wave Systems, where he worked on deep generative learning and weakly supervised learning. Prior to D-Wave, Arash was a research faculty member at Simon Fraser University (SFU), where he led research on deep video analysis and taught graduate-level courses on big data analysis. Arash obtained his Ph.D. and M.Sc. from SFU under Greg Mori's supervision, working on latent variable frameworks for visual analysis. His current areas of research include deep generative learning, weakly supervised learning, efficient neural networks, and probabilistic deep learning.

Tackling the Challenge of Uncertainty Estimation and Robustness to Distributional Shift in Real-World Applications
by Andrey Malinin
Date: Friday, Jan. 14
Time: 14:30
Location: Online Call via Zoom

Our guest speaker is Andrey Malinin from Yandex Research, and you are all cordially invited to the CVG Seminar on Jan 14th at 2:30 p.m. on Zoom (passcode: 979103).


While much research has been done on developing methods for improving robustness to distributional shift and uncertainty estimation, most of these methods were developed only for small-scale regression or image classification tasks. Limited work has examined developing standard datasets and benchmarks for assessing these approaches. Furthermore, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer significant challenges involving regression and discrete or continuous structured prediction. In this work, we propose the Shifts Dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, with each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, "in-the-wild" distributional shifts and pose interesting challenges with respect to uncertainty estimation. We hope that this dataset will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, assessment criteria and baselines, and accelerate the development of safe and reliable machine learning in real-world risk-critical applications.

An additional challenge to uncertainty estimation in real-world tasks is that standard approaches, such as model ensembles, are computationally expensive. Ensemble Distribution Distillation (EnDD) is an approach that allows a single model to efficiently capture both the predictive performance and the uncertainty estimates of an ensemble. Although theoretically principled, the original Dirichlet log-likelihood criterion for EnDD exhibits poor convergence when applied to large-scale tasks where the number of classes is very high. Specifically, we show that in such conditions the original criterion focuses on the distribution of the ensemble tail-class probabilities rather than on the probability of the correct and closely related classes. We propose a new training objective that resolves the gradient issues of EnDD and enables its application to tasks with many classes, as we demonstrate on the ImageNet, LibriSpeech, and WMT17 En-De datasets, containing 1,000, 5,000, and 40,000 classes, respectively.
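To make the original criterion concrete: the student network predicts Dirichlet concentration parameters, and the loss is the negative Dirichlet log-likelihood of the ensemble members' probability vectors. The sketch below implements that vanilla criterion (the one the abstract says breaks down with many classes); the function name and toy numbers are illustrative, not from the paper.

```python
import numpy as np
from math import lgamma

# Sketch of the original EnDD objective: negative log-likelihood of the
# ensemble members' categorical predictions under the student's Dir(alpha).
def dirichlet_nll(alpha: np.ndarray, ensemble_probs: np.ndarray) -> float:
    """Mean NLL of ensemble probability vectors under Dirichlet(alpha)."""
    # log normalizer: log Gamma(sum alpha) - sum log Gamma(alpha_k)
    log_norm = lgamma(float(alpha.sum())) - sum(lgamma(float(a)) for a in alpha)
    # per-member log density: log_norm + sum_k (alpha_k - 1) * log p_k
    log_liks = log_norm + ((alpha - 1.0) * np.log(ensemble_probs)).sum(axis=1)
    return float(-log_liks.mean())

# Three hypothetical ensemble members' class probabilities (4 classes):
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.60, 0.20, 0.10, 0.10],
                  [0.80, 0.10, 0.05, 0.05]])
alpha = np.array([8.0, 2.0, 1.5, 1.5])   # student's predicted concentrations
print(dirichlet_nll(alpha, probs))
```

With thousands of classes, most of the `alpha_k` correspond to tail classes with tiny ensemble probabilities, and the `(alpha_k - 1) * log p_k` terms for those classes come to dominate the gradient; this is the pathology the proposed objective addresses.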


Andrey is a Senior Research Scientist at Yandex Research in Moscow, Russia. Prior to that, he completed his PhD in uncertainty estimation and speech processing at Cambridge University under the supervision of Professor Mark Gales. His primary research interest is Bayesian-inspired approaches to uncertainty estimation for deep learning and their practical application at scale to tasks in NLP, NMT, speech, and computer vision. He also uses generative neural models, such as flows, variational autoencoders, and generative adversarial networks, to create digital generative neural art.