Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers (JPDVT)
Official PyTorch Implementation
[CVPR 2024] Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
This GitHub repository is currently undergoing organization. Stay tuned for the upcoming release of fully functional code!

git clone https://github.com/JinyangMarkLiu/JPDVT.git
cd JPDVT
Download the datasets you need. Below are brief instructions for setting up some of the datasets we used.
You can use this script to download and prepare the ImageNet dataset. If you need to download the dataset, please uncomment the first part of the script.
Download the JPwLEG-3 dataset from this Google Drive. Only the select_image part is used in our experiments.
We provide scripts for training image models and video models.
On the ImageNet dataset:
torchrun --nnodes=1 --nproc_per_node=4 train_JPDVT.py --dataset imagenet --data-path <imagenet-train-path> --image-size 192 --crop
On the MET dataset:
torchrun --nnodes=1 --nproc_per_node=4 train_JPDVT.py --dataset met --data-path <met-data-path> --image-size 288 --epochs 1000
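If you launch several runs, it can be convenient to assemble the two commands above from Python. The following is a minimal sketch, not part of the repository: the helper name build_train_command and the placeholder data paths are hypothetical, and it uses only the flags that appear in the commands above. It mirrors those commands exactly (--crop only for ImageNet, --epochs only for MET), since whether each flag applies to the other dataset is not stated here.

```python
import shlex


def build_train_command(dataset, data_path, image_size,
                        nproc_per_node=4, crop=False, epochs=None):
    """Build a torchrun argument list for train_JPDVT.py.

    Hypothetical helper: it only reuses flags shown in the README commands.
    """
    cmd = [
        "torchrun",
        "--nnodes=1",
        f"--nproc_per_node={nproc_per_node}",
        "train_JPDVT.py",
        "--dataset", dataset,
        "--data-path", data_path,
        "--image-size", str(image_size),
    ]
    if crop:
        # Appears in the ImageNet command only.
        cmd.append("--crop")
    if epochs is not None:
        # Appears in the MET command only.
        cmd += ["--epochs", str(epochs)]
    return cmd


# Placeholder paths (assumptions); replace them with your own dataset locations.
imagenet_cmd = build_train_command(
    "imagenet", "/path/to/imagenet/train", 192, crop=True)
met_cmd = build_train_command("met", "/path/to/met", 288, epochs=1000)

# Print the commands so they can be inspected before launching.
print(shlex.join(imagenet_cmd))
print(shlex.join(met_cmd))
```

To actually start a run, pass one of the lists to subprocess.run(cmd, check=True) from the repository root, and set nproc_per_node to the number of GPUs available on your machine.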
If you find our paper/project useful, please consider citing our paper:
@InProceedings{Liu_2024_CVPR,
    author    = {Liu, Jinyang and Teshome, Wondmgezahu and Ghimire, Sandesh and Sznaier, Mario and Camps, Octavia},
    title     = {Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {23009-23018}
}
Our codebase is mainly based on improved diffusion, make a video, and DiT.