[CVPR 2024] Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers

JinyangMarkLiu/JPDVT

Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers (JPDVT)
Official PyTorch Implementation

[CVPR 2024] Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers

This GitHub repository is currently undergoing organization. Stay tuned for the upcoming release of fully functional code!

Main Arch

Setup

git clone https://github.com/JinyangMarkLiu/JPDVT.git
cd JPDVT

Preparing Data

Download the datasets you need. Below are brief setup instructions for some of the datasets we used.

ImageNet

You can use this script to download and prepare the ImageNet dataset. If you need to download the dataset, please uncomment the first part of the script.

JPwLEG-3

Download the JPwLEG-3 dataset from this Google Drive. Only the select_image part is used in our experiments.

Training

We provide training scripts for training image models and video models.

Training image models

On the ImageNet dataset:

torchrun --nnodes=1 --nproc_per_node=4 train_JPDVT.py --dataset imagenet --data-path <imagenet-train-path> --image-size 192 --crop

On the MET dataset:

torchrun --nnodes=1 --nproc_per_node=4 train_JPDVT.py --dataset met --data-path <met-data-path> --image-size 288 --epochs 1000
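Both commands above launch four processes on a single node, which normally means one process per GPU. If your machine has a different number of GPUs, `--nproc_per_node` should be changed to match. The sketch below is a small, hypothetical helper (not part of this repository) that assembles the same `torchrun` command line from Python so the process count can be varied. It uses only the flags shown above; the function name `build_train_command` and the example paths are assumptions made for illustration.

```python
import shlex


def build_train_command(dataset, data_path, image_size,
                        nproc_per_node=4, extra_args=()):
    """Assemble a single-node torchrun invocation as an argument list.

    Hypothetical helper: it only mirrors the flags shown in the
    training commands above and does not inspect train_JPDVT.py.
    """
    cmd = [
        "torchrun",
        "--nnodes=1",
        f"--nproc_per_node={nproc_per_node}",
        "train_JPDVT.py",
        "--dataset", dataset,
        "--data-path", data_path,
        "--image-size", str(image_size),
    ]
    # Dataset-specific flags, e.g. --crop or --epochs 1000.
    cmd.extend(extra_args)
    return cmd


# ImageNet settings from above; the path is a placeholder.
imagenet_cmd = build_train_command(
    "imagenet", "/path/to/imagenet/train", 192, extra_args=["--crop"]
)

# MET settings from above, adapted to a 2-GPU machine as an example.
met_cmd = build_train_command(
    "met", "/path/to/met", 288, nproc_per_node=2,
    extra_args=["--epochs", "1000"],
)

# Print shell-ready command lines; pass the lists to subprocess.run
# from the repository root to actually launch training.
print(shlex.join(imagenet_cmd))
print(shlex.join(met_cmd))
```

Changing the number of processes changes the number of workers taking part in training, so results obtained with fewer GPUs may not match a four-GPU run exactly.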

Testing

BibTeX

If you find our paper/project useful, please consider citing our paper:

@InProceedings{Liu_2024_CVPR,
    author    = {Liu, Jinyang and Teshome, Wondmgezahu and Ghimire, Sandesh and Sznaier, Mario and Camps, Octavia},
    title     = {Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {23009-23018}
}

Acknowledgments

Our codebase is mainly based on improved diffusion, make a video, and DiT.
