Authors: Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, Zhengzhong Tu
DecAlign is a novel hierarchical cross-modal alignment framework that explicitly disentangles multimodal representations into modality-unique (heterogeneous) and modality-common (homogeneous) components. This decomposition enables fine-grained alignment through prototype-guided optimal transport and enhances semantic consistency via latent distribution matching. By mitigating distributional discrepancies while preserving modality-specific characteristics, DecAlign yields consistent performance gains across multiple multimodal benchmarks.
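DecAlign's actual prototype-guided alignment objective is defined in the paper and code; as a rough intuition, the optimal-transport component can be sketched with a generic entropy-regularized Sinkhorn solver that couples modality features to a set of prototypes. Everything below (feature values, prototype count, `eps`, iteration count) is illustrative, not DecAlign's configuration:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=200):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    cost: (n, m) cost matrix; a: (n,) and b: (m,) marginal histograms.
    Returns an (n, m) transport plan whose marginals approximate a and b.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # scale columns to match b
        u = a / (K @ v)              # scale rows to match a
    return u[:, None] * K * v[None, :]

# Toy example: couple 3 one-dimensional features to 2 "prototypes"
# under a squared-distance cost (purely illustrative numbers).
feats = np.array([[0.0], [1.0], [2.0]])
protos = np.array([[0.0], [2.0]])
cost = (feats - protos.T) ** 2        # (3, 2) pairwise costs
P = sinkhorn(cost, np.full(3, 1 / 3), np.full(2, 1 / 2))
print(P.sum())  # a valid coupling has total mass 1
```

The transport plan `P` softly assigns each feature to nearby prototypes, which is the mechanism that makes prototype-based alignment fine-grained rather than a single global distribution match.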
Clone this repository:

```shell
git clone https://github.com/taco-group/DecAlign.git
```
Prepare the Python environment:

```shell
cd DecAlign
conda create --name decalign python=3.9 -y
conda activate decalign
```
Install all the required libraries:

```shell
pip install -r requirements.txt
```
Preprocessing of the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets follows MMSA. We provide the processed datasets through the links below:
CMU-MOSI: https://drive.google.com/drive/folders/1A6lpSk1ErSXhXHEJcNqFyOomSkP81Xw7?usp=drive_link
CMU-MOSEI: https://drive.google.com/drive/folders/1XZ4z94I-AlXNQfsWmW01_iROtjWmlmdh?usp=drive_link
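MMSA-style processed datasets are typically distributed as pickle files containing train/valid/test splits of pre-extracted text, audio, and vision features. The snippet below sketches that layout and round-trips it through `pickle`, the way a downloaded `.pkl` would be read; the key names, feature dimensions, and sequence length here are assumptions based on the MMSA toolkit and may differ in the files linked above:

```python
import io
import pickle

import numpy as np

# Hypothetical split layout mimicking MMSA-processed data
# (keys and shapes are illustrative, not guaranteed by the links above).
split = {
    "text": np.zeros((2, 50, 768), dtype=np.float32),    # e.g. BERT token features
    "audio": np.zeros((2, 50, 5), dtype=np.float32),     # acoustic features
    "vision": np.zeros((2, 50, 20), dtype=np.float32),   # visual features
    "regression_labels": np.array([1.2, -0.4], dtype=np.float32),
}
dataset = {"train": split, "valid": split, "test": split}

# Round-trip through pickle, mimicking `pickle.load(open(path, "rb"))`
# on a downloaded file.
buf = io.BytesIO()
pickle.dump(dataset, buf)
buf.seek(0)
loaded = pickle.load(buf)

for name in ("train", "valid", "test"):
    print(name, loaded[name]["text"].shape)
```

Inspecting the splits and feature shapes this way before training is a quick check that the downloaded files match what the data loader expects.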