News

Latest publications in ICLR 2025
Feb. 2, 2025

A paper from our group got accepted to ICLR 2025!



Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling

Aram Davtyan, Leello Tadesse Dadi, Volkan Cevher, and Paolo Favaro, in the International Conference on Learning Representations (ICLR), 2025.

Conditional Flow Matching (CFM), a simulation-free method for training continuous normalizing flows, provides an efficient alternative to diffusion models for key tasks like image and video generation. The performance of CFM in solving these tasks depends on the way data is coupled with noise. A recent approach uses minibatch optimal transport (OT) to reassign noise-data pairs in each training step to streamline sampling trajectories and thus accelerate inference. However, its optimization is restricted to individual minibatches, limiting its effectiveness on large datasets. To address this shortcoming, we introduce LOOM-CFM (Looking Out Of Minibatch-CFM), a novel method to extend the scope of minibatch OT by preserving and optimizing these assignments across minibatches over training time. Our approach demonstrates consistent improvements in the sampling speed-quality trade-off across multiple datasets. LOOM-CFM also enhances distillation initialization and supports high-resolution synthesis in latent space training.

Paper: https://openreview.net/forum?id=rsGPrJDIhh
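
For readers who want to see the mechanics, here is a minimal Python sketch of conditional flow matching with a data-noise coupling that persists across minibatches, in the spirit of the abstract above. It is an illustration only, not the authors' code; the names `velocity_net`, `noise_bank`, and `ot_reassign`, as well as the linear interpolation path, are our assumptions.

```python
import torch
from scipy.optimize import linear_sum_assignment


def ot_reassign(data, noise):
    """Permute the noise within a minibatch so the squared-distance transport
    cost to the data is minimal (Hungarian algorithm on the cost matrix)."""
    cost = torch.cdist(data.flatten(1), noise.flatten(1)).pow(2)
    _, col = linear_sum_assignment(cost.detach().cpu().numpy())
    return noise[torch.as_tensor(col, device=noise.device)]


def cfm_step(velocity_net, optimizer, data, idx, noise_bank):
    """One training step. `noise_bank` holds one noise sample per dataset index
    (on the same device as the data), so the coupling is carried across steps
    instead of being recomputed from scratch inside each minibatch."""
    noise = ot_reassign(data, noise_bank[idx])   # refine the stored coupling
    noise_bank[idx] = noise                      # persist it across minibatches

    t = torch.rand(data.size(0), device=data.device).view(-1, 1, 1, 1)
    x_t = (1.0 - t) * noise + t * data           # linear probability path
    target_v = data - noise                      # conditional velocity target

    loss = (velocity_net(x_t, t.flatten()) - target_v).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Each time a sample is revisited, its stored noise is re-matched within the current minibatch and written back, so the assignment keeps improving over training time rather than being limited to a single batch.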

Latest publications in AAAI 2025
Dec. 11, 2024

A paper from our group got accepted to AAAI 2025!



CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation

Aram Davtyan, Sepehr Sameni, Björn Ommer, and Paolo Favaro, in the AAAI Conference on Artificial Intelligence, 2025.

The field of video generation has expanded significantly in recent years, with controllable and compositional video generation garnering considerable interest. Traditionally, achieving this has relied on annotations such as text, object bounding boxes, and motion cues, which require substantial human effort and thus limit scalability. We therefore address the challenge of controllable and compositional video generation without any annotations by introducing a novel unsupervised approach. Once trained from scratch on a dataset of unannotated videos, our model can effectively compose scenes by assembling predefined object parts and animating them in a plausible and controlled manner. The core innovation of our method lies in its training process, where video generation is conditioned on a randomly selected subset of pre-trained self-supervised local features. This conditioning compels the model to learn how to inpaint the missing information in the video both spatially and temporally, and thereby to understand the inherent compositionality and dynamics of the scene. The abstraction level of the conditioning and its imposed invariance to minor visual perturbations enable control over object motion by simply moving the features to the desired future locations. We call our model CAGE, which stands for visual Composition and Animation for video GEneration. We conduct extensive experiments to validate the effectiveness of CAGE across various scenarios, demonstrating its capability to accurately follow the control and to generate high-quality videos with coherent scene composition and realistic animation.

Project website: https://araachie.github.io/cage

Paper: https://arxiv.org/abs/2403.14368
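
As a rough illustration of the conditioning described above (not the paper's implementation), the sketch below extracts local features with a frozen self-supervised encoder and keeps only a random sparse subset of them as the control signal; `feature_extractor`, its output layout, and the keep ratio are assumptions.

```python
import torch


def sparse_feature_conditioning(frames, feature_extractor, keep_ratio=0.1):
    """frames: [B, T, C, H, W] video clips. Returns per-patch features from a
    frozen self-supervised encoder with all but a random subset zeroed out."""
    B, T, C, H, W = frames.shape
    with torch.no_grad():
        feats = feature_extractor(frames.reshape(B * T, C, H, W))   # [B*T, N, D]
        feats = feats.reshape(B, T, feats.size(1), feats.size(2))   # [B, T, N, D]

    # Keep only a small random subset of the local (patch) features
    keep = torch.rand(B, T, feats.size(2), device=frames.device) < keep_ratio
    cond = feats * keep.unsqueeze(-1)            # dropped tokens become zero
    return cond, keep
```

During training, the generator is asked to reconstruct the full video from this sparse conditioning, which is what forces it to inpaint the missing content in space and time; at inference, relocating the retained features provides the kind of motion control the abstract describes.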

Latest publications in NeurIPS 2024
Nov. 25, 2024

A paper from our group got accepted to NeurIPS 2024!



Blind Image Restoration via Fast Diffusion Inversion

Hamadi Chihaoui, Abdelhak Lemkhenter and Paolo Favaro, in the 38th Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.

Image Restoration (IR) methods based on a pre-trained diffusion model have demonstrated state-of-the-art performance. However, they have two fundamental limitations: 1) they often assume that the degradation operator is completely known and 2) they alter the diffusion sampling process, which may result in restored images that do not lie on the data manifold. To address these issues, we propose Blind Image Restoration via fast Diffusion inversion (BIRD), a blind IR method that jointly optimizes for the degradation model parameters and the restored image. To ensure that the restored images lie on the data manifold, we propose a novel sampling technique on a pre-trained diffusion model. A key idea in our method is not to modify the reverse sampling, i.e., not to alter any of the intermediate latents, once an initial noise is sampled. This is ultimately equivalent to casting the IR task as an optimization problem in the space of the input noise. Moreover, to mitigate the computational cost associated with inverting a fully unrolled diffusion model, we leverage the inherent capability of these models to skip ahead in the forward diffusion process using large time steps. We experimentally validate BIRD on several image restoration tasks and show that it achieves state-of-the-art performance.

Project page: https://hamadichihaoui.github.io/BIRD/

Paper: https://arxiv.org/abs/2405.19572
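
To make the "optimization in the space of the input noise" idea concrete, here is a hypothetical sketch: the pre-trained sampler is left untouched and run with a few large steps, while the initial noise and the unknown degradation parameters are optimized so that degrading the sampled image reproduces the observation. `denoiser`, `ddim_step`, and `degrade` are placeholders, not BIRD's actual API.

```python
import torch


def restore(y, denoiser, ddim_step, degrade, init_params,
            num_steps=10, iters=200, lr=1e-2):
    """y: degraded observation. Optimizes the initial noise z (and the unknown
    degradation parameters, e.g. a blur kernel) to fit y."""
    z = torch.randn_like(y, requires_grad=True)          # initial noise
    params = init_params.clone().requires_grad_(True)    # degradation parameters
    opt = torch.optim.Adam([z, params], lr=lr)

    # A few large time steps instead of the full fine-grained schedule
    timesteps = torch.linspace(999, 0, num_steps + 1).long()

    for _ in range(iters):
        x = z
        for i in range(num_steps):
            # The reverse sampling itself is never modified: the restored image
            # is a deterministic function of z, so the search happens in z-space.
            x = ddim_step(denoiser, x, timesteps[i], timesteps[i + 1])

        loss = (degrade(x, params) - y).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```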

Latest publications in CVPR 2024
June 16, 2024

A paper from our group got accepted to CVPR 2024!



Masked and Shuffled Blind Spot Denoising for Real-World Images

Hamadi Chihaoui and Paolo Favaro, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

We introduce a novel approach to single image denoising based on the Blind Spot Denoising principle, which we call MAsked and SHuffled Blind Spot Denoising (MASH). We focus on the case of correlated noise, which often plagues real images. MASH is the result of a careful analysis of the relationship between the level of blindness (masking) of the input and the (unknown) noise correlation. Moreover, we introduce a shuffling technique to weaken the local correlation of noise, which in turn yields an additional denoising performance improvement. We evaluate MASH via extensive experiments on real-world noisy image datasets, demonstrating results on par with or better than existing self-supervised denoising methods.

Paper: https://arxiv.org/abs/2404.09389
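
A minimal sketch of the two ingredients named in the abstract, for illustration only: blind-spot training (mask input pixels and supervise only at the masked locations) and a pixel-shuffle rearrangement that weakens local noise correlation. The `denoiser` network, mask ratio, and stride are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F


def shuffle_downsample(noisy, stride=2):
    """Rearrange the image into stride**2 sub-images so that neighboring
    (correlated) noisy pixels land in different sub-images."""
    return F.pixel_unshuffle(noisy, stride)          # [B, C*s*s, H/s, W/s]


def blind_spot_step(denoiser, noisy, mask_ratio=0.2):
    """One self-supervised step: hide a random subset of pixels ("blind spots")
    and supervise the network only at those hidden positions."""
    mask = (torch.rand_like(noisy[:, :1]) < mask_ratio).float()   # [B, 1, H, W]
    pred = denoiser(noisy * (1.0 - mask))            # blind spots zeroed out
    num_masked = mask.sum() * noisy.size(1)          # masked entries over channels
    return ((pred - noisy) * mask).pow(2).sum() / num_masked.clamp(min=1.0)
```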

Latest publications in AAAI 2024
Feb. 20, 2024

A paper from our group got accepted to AAAI 2024!



Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation

Aram Davtyan and Paolo Favaro, in the AAAI Conference on Artificial Intelligence, 2024.

We propose a novel unsupervised method to autoregressively generate videos from a single frame and a sparse motion input. Our trained model can generate unseen realistic object-to-object interactions. Although our model has never been given the explicit segmentation and motion of each object in the scene during training, it is able to implicitly separate their dynamics and extents. Key components in our method are the randomized conditioning scheme, the encoding of the input motion control, and the randomized and sparse sampling that enables generalization to out-of-distribution but realistic correlations. Our model, which we call YODA, therefore has the ability to move objects without physically touching them. Through extensive qualitative and quantitative evaluations on several datasets, we show that YODA is on par with or better than prior state-of-the-art video generation methods in terms of both controllability and video quality.

Project website: https://araachie.github.io/yoda

Paper: https://arxiv.org/abs/2306.03988
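
As an illustration of how a sparse motion control could be encoded for a next-frame generator (hypothetical names, not the released model): a dense flow field is sub-sampled at a handful of locations and turned into a sparse control map plus a validity mask, which is concatenated with the conditioning frame.

```python
import torch


def encode_sparse_motion(flow, num_controls=5):
    """flow: [B, 2, H, W] dense motion, e.g. from an off-the-shelf flow network.
    Returns a [B, 3, H, W] control map: two flow channels plus a validity mask,
    non-zero only at a handful of randomly chosen control locations."""
    B, _, H, W = flow.shape
    control = torch.zeros(B, 3, H, W, device=flow.device)
    for b in range(B):
        ys = torch.randint(0, H, (num_controls,))
        xs = torch.randint(0, W, (num_controls,))
        control[b, :2, ys, xs] = flow[b, :, ys, xs]  # keep a few motion vectors
        control[b, 2, ys, xs] = 1.0                  # mark controlled locations
    return control


def next_frame(generator, frame, control):
    """Condition a (placeholder) next-frame generator on the current frame and
    the sparse motion control by channel-wise concatenation."""
    return generator(torch.cat([frame, control], dim=1))
```

Randomizing which locations receive control during training, as the abstract suggests, is what lets the model respond to sparse, user-provided motion inputs at inference time.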