Authors: Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu
DecAlign is a novel hierarchical cross-modal alignment framework that explicitly disentangles multimodal representations into modality-unique (heterogeneous) and modality-common (homogeneous) components. This decomposition enables fine-grained alignment through prototype-guided optimal transport and enhances semantic consistency via latent distribution matching. By mitigating distributional discrepancies while preserving modality-specific characteristics, DecAlign yields consistent performance gains across multiple multimodal benchmarks.
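DecAlign's actual prototype-guided alignment objective is defined in the paper and code; as a rough intuition, the optimal-transport component can be sketched with a generic entropy-regularized Sinkhorn solver that couples modality features to a set of prototypes. Everything below (feature values, prototype count, `eps`, iteration count) is illustrative, not DecAlign's configuration:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=200):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    cost: (n, m) cost matrix; a: (n,) and b: (m,) marginal histograms.
    Returns an (n, m) transport plan whose marginals approximate a and b.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # scale columns to match b
        u = a / (K @ v)              # scale rows to match a
    return u[:, None] * K * v[None, :]

# Toy example: couple 3 one-dimensional features to 2 "prototypes"
# under a squared-distance cost (purely illustrative numbers).
feats = np.array([[0.0], [1.0], [2.0]])
protos = np.array([[0.0], [2.0]])
cost = (feats - protos.T) ** 2        # (3, 2) pairwise costs
P = sinkhorn(cost, np.full(3, 1 / 3), np.full(2, 1 / 2))
print(P.sum())  # a valid coupling has total mass 1
```

The transport plan `P` softly assigns each feature to nearby prototypes, which is the mechanism that makes prototype-based alignment fine-grained rather than a single global distribution match.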
Clone this repository:

```shell
git clone https://github.com/taco-group/DecAlign.git
```
Prepare the Python environment:

```shell
cd DecAlign
conda create --name decalign python=3.9 -y
conda activate decalign
```
Install all the required libraries:

```shell
pip install -r requirements.txt
```
Preprocessing of the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets follows MMSA. We provide the processed datasets through the links below:
CMU-MOSI: https://drive.google.com/drive/folders/1A6lpSk1ErSXhXHEJcNqFyOomSkP81Xw7?usp=drive_link
CMU-MOSEI: https://drive.google.com/drive/folders/1XZ4z94I-AlXNQfsWmW01_iROtjWmlmdh?usp=drive_link
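MMSA-style processed datasets are typically distributed as pickle files containing train/valid/test splits of pre-extracted text, audio, and vision features. The snippet below sketches that layout and round-trips it through `pickle`, the way a downloaded `.pkl` would be read; the key names, feature dimensions, and sequence length here are assumptions based on the MMSA toolkit and may differ in the files linked above:

```python
import io
import pickle

import numpy as np

# Hypothetical split layout mimicking MMSA-processed data
# (keys and shapes are illustrative, not guaranteed by the links above).
split = {
    "text": np.zeros((2, 50, 768), dtype=np.float32),    # e.g. BERT token features
    "audio": np.zeros((2, 50, 5), dtype=np.float32),     # acoustic features
    "vision": np.zeros((2, 50, 20), dtype=np.float32),   # visual features
    "regression_labels": np.array([1.2, -0.4], dtype=np.float32),
}
dataset = {"train": split, "valid": split, "test": split}

# Round-trip through pickle, mimicking `pickle.load(open(path, "rb"))`
# on a downloaded file.
buf = io.BytesIO()
pickle.dump(dataset, buf)
buf.seek(0)
loaded = pickle.load(buf)

for name in ("train", "valid", "test"):
    print(name, loaded[name]["text"].shape)
```

Inspecting the splits and feature shapes this way before training is a quick check that the downloaded files match what the data loader expects.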