We have openings for 4 PhD positions in machine learning and computer vision. If you are eager to get involved in cutting-edge, creative research, apply directly here on our website.
Our research contributed to the discovery of EEG patterns that help predict the outcome of comatose patients after cardiac arrest. Read more about it in the official news article here.
S. Jonas, A. Rossetti, M. Oddo, S. Jenni, P. Favaro and F. Zubler, "EEG-based Outcome Prediction after Cardiac Arrest with Convolutional Neural Networks: Performance and Visualization of Discriminative Features", in Human Brain Mapping, 2019.
PhD student Adam Bielski just had his paper accepted (as a spotlight!) at the upcoming NeurIPS conference. It is his first publication since he started his PhD in our group. Congratulations on your excellent work, Adam!
Please find the abstract below, and keep an eye on our publications page, which will be updated with details about the NeurIPS paper.
We introduce a novel framework to build a model that can learn how to segment objects from a collection of images without any human annotation. Our method builds on the observation that the location of object segments can be perturbed locally relative to a given background without affecting the realism of a scene. Our approach is to first train a generative model of a layered scene. The layered representation consists of a background image, a foreground image and the mask of the foreground. A composite image is then obtained by overlaying the masked foreground image onto the background. The generative model is trained in an adversarial fashion against a discriminator, which forces the generative model to produce realistic composite images. To force the generator to learn a representation where the foreground layer corresponds to an object, we perturb the output of the generative model by introducing a random shift of both the foreground image and mask relative to the background. Because the generator is unaware of the shift before computing its output, it must produce layered representations that are realistic for any such random perturbation. Finally, we learn to segment an image by defining an autoencoder consisting of an encoder, which we train, and the pre-trained generator as the decoder, which we freeze. The encoder maps an image to a feature vector, which is fed as input to the generator to give a composite image matching the original input image. Because the generator outputs an explicit layered representation of the scene, the encoder learns to detect and segment objects. We demonstrate this framework on real images of several object categories.
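To make the layered composition and the random-shift perturbation described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the grayscale list-of-lists image type, the helper names `shift`, `composite` and `perturbed_composite`, the zero padding of uncovered pixels and the `max_shift` bound are all assumptions made for illustration. In the actual model the three layers come from a trained generator and the composite is judged by a discriminator.

```python
import random
from typing import List

# Hypothetical image type for this sketch: a grayscale image as rows of floats.
Image = List[List[float]]


def shift(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Translate img by (dx, dy) pixels; pixels shifted in from outside get `fill` (assumed zero padding)."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def composite(background: Image, foreground: Image, mask: Image) -> Image:
    """Overlay the masked foreground onto the background: m * f + (1 - m) * b per pixel."""
    return [
        [m * f + (1.0 - m) * b for b, f, m in zip(b_row, f_row, m_row)]
        for b_row, f_row, m_row in zip(background, foreground, mask)
    ]


def perturbed_composite(background: Image, foreground: Image, mask: Image,
                        max_shift: int = 1, rng=random) -> Image:
    """Shift foreground and mask by the same random offset, then composite.

    The offset is drawn after the layers exist, so whatever produced the
    layers could not anticipate it."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return composite(background, shift(foreground, dx, dy), shift(mask, dx, dy))
```

Only the foreground and its mask move; the background stays put. If the mask failed to cover the whole object, or part of the object were painted into the background layer, the shifted composite would show visible seams, which is the kind of unrealistic image a discriminator can penalize.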
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the biggest conferences in computer science. This year it took place in Long Beach, California, from June 16 to 20. With an unprecedented 1300 accepted papers out of over 5000 submissions, the conference appears to be growing exponentially. From the graphic below we can also see that the top keywords of interest are image, detection, 3D object, video, segmentation, adversarial, recognition, and visual. The frequency of the keywords graph, representation, and cloud has doubled.
Our group has two accepted papers at CVPR 2019:
Learning to Extract Flawless Slow Motion from Blurry Videos
M. Jin, Z. Hu and P. Favaro
Full text and more information is available on our publications page.
Credit: Conference statistics and plots by Hoseong Lee.
We are pleased to announce that two students in our group have recently and successfully defended their doctoral theses. Qiyang Hu and Attila Szabó now hold the title of Doctor of Philosophy. Congratulations!
Both theses can be found here.
Prof. Dr. Paolo Favaro and the members of CVG congratulate Qiyang and Attila on their incredible achievement and wish them all the best for the future.
Learning Controllable Representations for Image Synthesis
In this thesis, we focus on learning controllable representations and applying the learned feature representations to image synthesis, video generation, and even 3D reconstruction. We propose different methods to disentangle feature representations in neural networks and analyze the challenges of disentanglement, such as the reference ambiguity and the shortcut problem that arise when using weak labels. We use the disentangled feature representation to transfer attributes between images, for example exchanging the hairstyle between two face images. Furthermore, we study how another type of feature, the sketch, works in a neural network. A sketch can provide the shape and contour of an object, such as the silhouette of a face seen from the side. We leverage this silhouette constraint to improve 3D face reconstruction from 2D images. A sketch can also indicate the direction in which an object moves, so we investigate how to manipulate an object so that it follows a trajectory drawn by the user. We propose a method that automatically generates video clips from a single input image, using the sketch as motion and trajectory guidance to animate the object in that image. We demonstrate the effectiveness of our approaches on several synthetic and real datasets.
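One way to picture trajectory guidance from a user sketch is to turn the drawn stroke into one target position per video frame. The helper below is a hypothetical preprocessing step written for illustration, not the method from the thesis: it assumes the sketch arrives as a polyline of (x, y) points and spaces the per-frame positions evenly by arc length. How such positions would condition a video generator is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_trajectory(stroke: List[Point], num_frames: int) -> List[Point]:
    """Hypothetical helper: turn a sketched stroke (a polyline) into one
    target position per frame, spaced evenly by arc length."""
    if num_frames < 2 or len(stroke) < 2:
        raise ValueError("need at least two frames and two stroke points")
    # Cumulative arc length at each vertex of the stroke.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    points: List[Point] = []
    seg = 0
    for i in range(num_frames):
        target = total * i / (num_frames - 1)
        # Advance to the segment that contains this arc length.
        while seg < len(stroke) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else min(1.0, (target - cum[seg]) / seg_len)
        (x0, y0), (x1, y1) = stroke[seg], stroke[seg + 1]
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points
```

For an L-shaped stroke `[(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]` and 5 frames, the positions are `(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)`: the object moves at constant speed along the stroke and turns the corner at the middle frame.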
Learning Interpretable Representations of Images
Computers represent images with pixels, and each pixel contains three numbers for the red, green and blue colour values. These numbers are meaningless for humans, and they are mostly useless when used directly with classical machine learning techniques such as linear classifiers. Interpretable representations are attributes that humans understand: the colour of the hair, the viewpoint of a car or the 3D shape of an object in the scene. Many computer vision tasks can be viewed as learning interpretable representations; for example, a supervised classification algorithm directly learns to represent images by their class labels.
In this work we aim to learn interpretable representations (or features) indirectly, with lower levels of supervision. This approach saves the cost of dataset annotation and offers the flexibility of using the features for multiple follow-up tasks. We made contributions in three main areas: weakly supervised learning, unsupervised learning and 3D reconstruction.
In the weakly supervised case we use image pairs as supervision. Each pair shares a common attribute and differs in a varying attribute. We propose a training method that learns to disentangle the two attributes into distinct feature vectors. These features are then used for attribute transfer and classification. We also show theoretical results on the ambiguities of the learning task and on ways to avoid degenerate solutions.
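The role of the pairs can be illustrated with a deliberately tiny toy, which is an assumption of ours and not the thesis model: "images" are plain vectors, the hypothetical `encode` simply slices a vector into varying and common features at index `split`, and `decode` concatenates them back. Since both images of a pair share the common attribute, rebuilding one image from its own varying features and its partner's common features should reproduce it. The same swap performed across unrelated images gives attribute transfer.

```python
from typing import List, Tuple

Vector = List[float]


def encode(x: Vector, split: int) -> Tuple[Vector, Vector]:
    """Hypothetical toy encoder: the first `split` entries play the role of the
    varying-attribute features, the rest the common-attribute features."""
    return x[:split], x[split:]


def decode(varying: Vector, common: Vector) -> Vector:
    """Toy decoder: concatenates the two feature vectors back into an 'image'."""
    return varying + common


def pair_swap_loss(x1: Vector, x2: Vector, split: int) -> float:
    """x1 and x2 share the common attribute, so rebuilding x1 from its own
    varying features and x2's common features should reproduce x1.
    Returns the squared reconstruction error (zero for a consistent split)."""
    v1, _ = encode(x1, split)
    _, c2 = encode(x2, split)
    rec = decode(v1, c2)
    return sum((a - b) ** 2 for a, b in zip(rec, x1))


def transfer(src: Vector, dst: Vector, split: int) -> Vector:
    """Attribute transfer: the varying attribute of src combined with the common attribute of dst."""
    v_src, _ = encode(src, split)
    _, c_dst = encode(dst, split)
    return decode(v_src, c_dst)
```

With the pair `[1.0, 2.0, 5.0]` and `[3.0, 4.0, 5.0]`, whose last entry is shared, `split=2` gives zero loss, while `split=1` leaks part of the varying attribute into the common features and gives a loss of 4.0. Note that `split=3`, which puts everything into the varying features, also gives zero loss: a toy version of the degenerate solutions the thesis discusses.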
We present a method for unsupervised representation learning that separates semantically meaningful concepts. Through explanations and ablation studies, we show how the components of our proposed method work: a mixing autoencoder, a generative adversarial network and a classifier.
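The mixing step of a mixing autoencoder can be sketched as follows. The sketch rests on assumptions made for illustration: each image's feature vector is split into a fixed number of chunks, and a random binary mask decides, chunk by chunk, which image each part of the mixed feature comes from. In the full method the mixed features would be decoded and the result assessed by the adversarial network and the classifier; those networks are not shown, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical representation: a feature vector stored as a list of chunks.
Chunk = List[float]


def mix_features(f1: Sequence[Chunk], f2: Sequence[Chunk], mask: Sequence[bool]) -> List[Chunk]:
    """Combine two chunked feature vectors: chunk i comes from f1 where mask[i] is True, else from f2."""
    if not (len(f1) == len(f2) == len(mask)):
        raise ValueError("f1, f2 and mask must have the same number of chunks")
    return [list(a) if m else list(b) for a, b, m in zip(f1, f2, mask)]


def random_mask(num_chunks: int, rng: random.Random) -> List[bool]:
    """Draw an independent fair coin per chunk (an assumed sampling scheme)."""
    return [rng.random() < 0.5 for _ in range(num_chunks)]
```

For example, mixing `[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]` with `[[4.0, 4.0], [5.0, 5.0], [6.0, 6.0]]` under the mask `[True, False, True]` yields `[[1.0, 1.0], [5.0, 5.0], [3.0, 3.0]]`. Working with whole chunks rather than individual entries is what lets each chunk come to stand for one concept.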
We propose a method for learning single-image 3D reconstruction. It uses only the images: no human annotation, stereo, synthetic renderings or ground-truth depth maps are needed. We train a generative model that learns the 3D shape distribution and an encoder that reconstructs the 3D shape. To this end we exploit the notion of image realism: the 3D reconstruction of an object has to look realistic when it is rendered from different random angles. We prove the efficacy of our method from first principles.
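The realism constraint relies on rendering a reconstructed shape from random viewpoints. The sketch below illustrates only that rendering step, under simplifying assumptions of ours: the shape is a cubic voxel occupancy grid, the "renderer" is an orthographic silhouette projection, and viewpoints are limited to rotations by multiples of 90 degrees about the vertical axis. It is not the renderer used in the thesis; in the full method, such renders would be judged for realism by a discriminator.

```python
import random
from typing import List

# Hypothetical representation: a cubic occupancy grid indexed vox[y][z][x] (1 = filled).
Voxels = List[List[List[int]]]
Silhouette = List[List[int]]


def rotate_y90(vox: Voxels) -> Voxels:
    """Rotate the grid by 90 degrees about the vertical (y) axis."""
    n = len(vox)
    return [[[vox[y][n - 1 - x][z] for x in range(n)] for z in range(n)] for y in range(n)]


def project(vox: Voxels) -> Silhouette:
    """Orthographic silhouette seen along the z axis: a pixel is on if any voxel behind it is filled."""
    n = len(vox)
    return [[max(vox[y][z][x] for z in range(n)) for x in range(n)] for y in range(n)]


def render_random_view(vox: Voxels, rng: random.Random) -> Silhouette:
    """Render the shape from a random viewpoint (restricted here to multiples of 90 degrees)."""
    for _ in range(rng.randrange(4)):
        vox = rotate_y90(vox)
    return project(vox)
```

Because the viewpoint is random, a reconstruction that only looks right from the input view (for example, a flat cut-out) would produce unrealistic silhouettes from other angles, which is what the realism criterion is meant to rule out.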