- We provide pre-trained models for view synthesis with 3D Gaussian splatting and scale-consistent depth estimation from multi-view posed images.
- We assume that the downloaded weights are stored in the `pretrained` directory. We recommend creating a symbolic link named `pretrained` that points to `YOUR_MODEL_PATH`: `ln -s YOUR_MODEL_PATH pretrained`.
- Each model file name on this page includes a prefix of the file's SHA-256 checksum (the last hyphen-separated field before `.pth`), so you can verify the integrity of a download by running `sha256sum filename` and comparing the start of its output with that prefix.
- The models are trained on the RealEstate10K (re10k) and/or DL3DV (dl3dv) datasets at resolutions of 256x256, 256x448, and 448x768. The number of training views ranges from 2 to 10.
- The "→" symbol indicates that a model is trained in two stages. For example, "re10k → (re10k+dl3dv)" means the model is first trained on the RealEstate10K dataset and then fine-tuned on a combination of the RealEstate10K and DL3DV datasets. The same notation is used for training resolutions.
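The `sha256sum` check can also be scripted when verifying several downloads at once. The sketch below is a hypothetical helper (`verify_checkpoint` and `sha256_hexdigest` are not part of DepthSplat); it assumes, as holds for every file name listed on this page, that the checksum prefix is the last hyphen-separated field before `.pth`, and compares it with the SHA-256 digest of the file contents.

```python
import hashlib
import re
from pathlib import Path


def sha256_hexdigest(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path):
    """Hypothetical helper: True if the file's SHA-256 digest starts with the
    hex prefix embedded at the end of its name (e.g. '...-cfeab6b1.pth')."""
    path = Path(path)
    # Assumption: the prefix is the last '-'-separated field of the file stem.
    prefix = path.stem.rsplit("-", 1)[-1]
    if not re.fullmatch(r"[0-9a-f]+", prefix):
        raise ValueError(f"no checksum prefix found in {path.name!r}")
    return sha256_hexdigest(path).startswith(prefix)
```

For example, `verify_checkpoint("pretrained/depthsplat-gs-base-re10k-256x256-view2-ca7b6795.pth")` should return `True` for an intact download. Reading in chunks keeps memory flat even for the larger checkpoints.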
Model | Training Data | Training Resolution | Training Views | Params (M) | Download |
---|---|---|---|---|---|
depthsplat-gs-small-re10k-256x256-view2-cfeab6b1.pth | re10k | 256x256 | 2 | 37 | download |
depthsplat-gs-base-re10k-256x256-view2-ca7b6795.pth | re10k | 256x256 | 2 | 117 | download |
depthsplat-gs-large-re10k-256x256-view2-e0f0f27a.pth | re10k | 256x256 | 2 | 360 | download |
depthsplat-gs-base-re10k-256x448-view2-fea94f65.pth | re10k | 256x448 | 2 | 117 | download |
depthsplat-gs-base-dl3dv-256x448-randview2-6-02c7b19d.pth | re10k → dl3dv | 256x448 | 2-6 | 117 | download |
depthsplat-gs-small-re10kdl3dv-448x768-randview4-10-c08188db.pth | re10k → (re10k+dl3dv) | 256x448 → 448x768 | 4-10 | 37 | download |
depthsplat-gs-base-re10kdl3dv-448x768-randview2-6-f8ddd845.pth | re10k → (re10k+dl3dv) | 256x448 → 448x768 | 2-6 | 117 | download |
- The depth models are trained with the following procedure:
  - Initialize the monocular feature extractor with Depth Anything V2 and the multi-view Transformer with UniMatch.
  - Train the full DepthSplat model end-to-end on the mixed RealEstate10K and DL3DV datasets.
  - Fine-tune the pre-trained depth model with ground-truth depth supervision on the ScanNet, TartanAir, and VKITTI2 depth datasets.
- The depth models are fine-tuned with a random number (2-8) of input images at a training resolution of 352x640.
- The scale of the predicted depth is aligned with the scale of the camera poses' translations.
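Because the predicted depth shares the scale of the pose translations, it can in principle be converted to metric units whenever the true metric distance between two input cameras is known. The sketch below illustrates only this arithmetic, under two assumptions not stated on this page: the translations are camera centres (camera-to-world), and the prediction is exactly proportional to the translation scale. `rescale_depth` is a hypothetical helper, not part of DepthSplat.

```python
import math


def rescale_depth(depth, center_a, center_b, metric_baseline):
    """Hypothetical helper: convert depth predicted in pose-translation units
    into metric units, given the measured distance between two camera centres.

    depth:           2D list of depth values, in the same units as the poses
    center_a/b:      3D camera centres (assumed camera-to-world translations)
    metric_baseline: true distance between the two cameras, e.g. in metres
    """
    pose_baseline = math.dist(center_a, center_b)
    if pose_baseline == 0.0:
        raise ValueError("camera centres coincide; the scale is undefined")
    # Depth and translation share one scale, so a single factor maps both.
    scale = metric_baseline / pose_baseline
    return [[d * scale for d in row] for row in depth]
```

For instance, if the two camera centres are 0.5 units apart in the input poses but 0.1 m apart in reality, every depth value is multiplied by 0.2. The same reasoning runs the other way: scaling all input translations by a factor s should scale the predicted depth by s.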
Model | Training Data | Training Resolution | Training Views | Params (M) | Download |
---|---|---|---|---|---|
depthsplat-depth-small-352x640-randview2-8-e807bd82.pth | (re10k+dl3dv) → (scannet+tartanair+vkitti2) | 448x768 → 352x640 | 2-8 | 36 | download |
depthsplat-depth-base-352x640-randview2-8-65a892c5.pth | (re10k+dl3dv) → (scannet+tartanair+vkitti2) | 448x768 → 352x640 | 2-8 | 111 | download |
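The checkpoint names in both tables follow a regular pattern: task (`gs` or `depth`), model size, an optional dataset tag, resolution, number of views, and checksum prefix. Comparing names with the tables, the dataset tag and resolution appear to record only the final training stage (e.g. `dl3dv` for "re10k → dl3dv", `448x768` for "256x448 → 448x768"), so the tables remain the reference for the full schedule. The parser below is a hypothetical sketch whose regular expression is inferred from the names listed here; it is not part of DepthSplat.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Pattern inferred from the checkpoint names listed on this page.
_NAME = re.compile(
    r"^depthsplat-(?P<task>gs|depth)-(?P<size>small|base|large)-"
    r"(?:(?P<data>[a-z0-9]+)-)?"        # dataset tag; absent for depth models
    r"(?P<r0>\d+)x(?P<r1>\d+)-"         # (final-stage) training resolution
    r"(?P<rand>rand)?view(?P<vmin>\d+)(?:-(?P<vmax>\d+))?-"
    r"(?P<sha>[0-9a-f]+)\.pth$"         # SHA-256 prefix
)


@dataclass
class CheckpointInfo:
    task: str                    # "gs" (Gaussian splatting) or "depth"
    size: str                    # "small", "base" or "large"
    data: Optional[str]          # e.g. "re10k", "re10kdl3dv"; None if absent
    resolution: Tuple[int, int]  # numbers in the order written, e.g. (448, 768)
    min_views: int
    max_views: int
    random_views: bool           # True for "randview" names
    sha256_prefix: str


def parse_checkpoint_name(name: str) -> Optional[CheckpointInfo]:
    """Hypothetical helper: split a checkpoint file name into its fields,
    returning None if the name does not follow the pattern above."""
    m = _NAME.match(name)
    if m is None:
        return None
    vmin = int(m["vmin"])
    vmax = int(m["vmax"]) if m["vmax"] else vmin  # "view2" means exactly 2 views
    return CheckpointInfo(
        task=m["task"],
        size=m["size"],
        data=m["data"],
        resolution=(int(m["r0"]), int(m["r1"])),
        min_views=vmin,
        max_views=vmax,
        random_views=m["rand"] is not None,
        sha256_prefix=m["sha"],
    )
```

For example, `parse_checkpoint_name("depthsplat-depth-small-352x640-randview2-8-e807bd82.pth")` yields a `depth` model with no dataset tag, resolution `(352, 640)`, and 2 to 8 views.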