Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Mehdi Noroozi[1] and Paolo Favaro[1]

[1] Universität Bern, Bern, Switzerland

Abstract

We propose a novel unsupervised learning approach to build features suitable for object detection and classification. The features are pre-trained on a large dataset without human annotation and later transferred via fine-tuning to a different, smaller, labeled dataset. The pre-training consists of solving jigsaw puzzles of natural images. To facilitate the transfer of features to other tasks, we introduce the context-free network (CFN), a siamese-ennead convolutional neural network. The features correspond to the columns of the CFN, and they process image tiles independently (i.e., free of context). The later layers of the CFN then use the features to identify their geometric arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. We pre-train the CFN on the training set of the ILSVRC2012 dataset and transfer the features to the combined training and validation set of PASCAL VOC 2007 for object detection (via Fast R-CNN) and classification. These features outperform all current unsupervised features with 51.8% for detection and 68.6% for classification, and reduce the gap with supervised learning (56.5% and 78.2%, respectively).
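The pretext task described in the abstract can be illustrated with a minimal sketch of how one jigsaw training sample might be built: an image is cut into tiles, the tiles are reordered by a permutation, and the index of that permutation serves as the label the network has to predict. The 3x3 grid (nine tiles, matching the nine columns implied by "siamese-ennead"), the randomly drawn permutation set, the toy 9x9 image, and all function names are assumptions made for illustration; the released Caffe code may differ, and the network itself is not shown.

```python
import random

GRID = 3  # assumption: 3x3 grid, i.e. nine tiles, one per CFN column


def split_into_tiles(image, grid=GRID):
    """Cut a 2-D image (list of rows) into grid*grid tiles in row-major order."""
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // grid, width // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            tile = [row[c * tile_w:(c + 1) * tile_w]
                    for row in image[r * tile_h:(r + 1) * tile_h]]
            tiles.append(tile)
    return tiles


def make_permutation_set(num_tiles, size, rng):
    """Draw `size` distinct tile orderings (hypothetical: picked at random here)."""
    perms = set()
    while len(perms) < size:
        perms.add(tuple(rng.sample(range(num_tiles), num_tiles)))
    return sorted(perms)


def make_puzzle(image, permutations, rng):
    """Return (shuffled_tiles, label); label is the index of the permutation used."""
    tiles = split_into_tiles(image)
    label = rng.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


rng = random.Random(0)
# Toy 9x9 "image" whose pixel values encode their own position.
image = [[row * 9 + col for col in range(9)] for row in range(9)]
permutations = make_permutation_set(GRID * GRID, size=10, rng=rng)
shuffled, label = make_puzzle(image, permutations, rng)
```

In this sketch each of the nine shuffled tiles would be fed to its own column of a shared-weight network, and the later layers would classify which permutation index (`label`) was applied.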

ECCV paper (hosted by arXiv)

Video

Code and Data

Code

Code is available on GitHub

Trained Models

CFN - Jigsaw Puzzle Task: [caffemodel] [definition]

CFN - Recognition Task: [caffemodel] [definition]

Bibtex

@inproceedings{noroozi2016,
  author = {Mehdi Noroozi and Paolo Favaro},
  title = {Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles},
  booktitle = {ECCV},
  year = {2016}
}
