OV-MER transitions from traditional MER to a framework that enables the prediction of any number and category of emotions, thereby advancing emotion AI toward real-world applicability by capturing the full spectrum of human emotions.
(a) Task Comparison: We compare three tasks (one-hot MER, multi-label MER, and OV-MER) across three aspects (label space, label number, and annotation manner).
(b) Label Comparison: We provide an example to visualize the one-hot and OV labels.
OV-MERD is the first dataset constructed for the OV-MER task. It is available at: https://huggingface.co/datasets/MERChallenge/MER2025
dataset
├── mer2025-dataset
| ├── video # all training data, including 132,171 samples
| ├── audio # pre-extracted audio
| ├── openface_face # pre-extracted face files
| ├── subtitle_chieng.csv # pre-extracted subtitle content
| ├── track2_train_ovmerd.csv # OV-MERD Dataset (OV labels)
| ├── track3_train_ovmerd.csv # OV-MERD Dataset (Description)
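Once the files above are downloaded, the OV labels can be read directly from the CSV. The sketch below is a minimal example using pandas; the column names `name` and `openset` and the comma-separated label format are assumptions, so check the actual CSV header before use.

```python
import pandas as pd

# Load the OV label file shipped with the dataset.
df = pd.read_csv("mer2025-dataset/track2_train_ovmerd.csv")

# Inspect a few samples; each row is assumed to pair a sample name with a
# comma-separated list of open-vocabulary emotion labels.
for _, row in df.head(5).iterrows():
    sample_id = row["name"]                                        # assumed column name
    labels = [x.strip() for x in str(row["openset"]).split(",")]   # assumed column name/format
    print(sample_id, labels)
```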
For the MER-Caption+ dataset, we adopt a model-led, human-assisted annotation strategy to balance label quality and dataset size. The dataset is available at: https://huggingface.co/datasets/MERChallenge/MER2025 (a programmatic download sketch follows the directory layout below).
dataset
├── mer2025-dataset
| ├── video # all training data, including 132,171 samples
| ├── audio # pre-extracted audio
| ├── openface_face # pre-extracted face files
| ├── subtitle_chieng.csv # pre-extracted subtitle content
| ├── track2_train_mercaptionplus.csv # MER-Caption+ Dataset (OV labels)
| ├── track3_train_mercaptionplus.csv # MER-Caption+ Dataset (Description)
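If you prefer fetching the repository programmatically rather than via the website, a minimal sketch using `huggingface_hub` is shown below; whether the dataset requires accepting access terms first is an assumption, so follow the instructions on the dataset page.

```python
from huggingface_hub import snapshot_download

# Download the MER2025 dataset repository into a local folder whose layout
# matches the tree shown above. Run `huggingface-cli login` first if the
# dataset is gated (assumption).
snapshot_download(
    repo_id="MERChallenge/MER2025",
    repo_type="dataset",
    local_dir="mer2025-dataset",
)
```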
We build MER-UniBench, which covers typical MER tasks with tailored metrics and offers comprehensive evaluation results for MLLM-based emotion understanding.
dataset # Available at: https://pan.baidu.com/s/1kbfs5pG_hAri0QwvQl-Ecg?pwd=b9vn
├── mer2023-dataset-process
├── mer2024-dataset-process
├── sims-dataset
├── simsv2-dataset
├── cmumosi-dataset
├── cmumosei-dataset
├── iemocap-dataset
├── meld-dataset
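The official MER-UniBench metrics are defined in this repository and the accompanying paper. Purely as an illustration of set-level scoring for open-vocabulary labels, the sketch below computes precision/recall/F1 with exact string matching; the actual metric may group synonymous emotions before matching.

```python
def set_level_metrics(pred_labels, true_labels):
    """Illustrative set-level precision/recall/F1 over OV emotion labels.

    Exact string matching only; the official MER-UniBench metric may merge
    synonymous labels (e.g., 'happy' and 'joyful') before comparison.
    """
    pred = {p.strip().lower() for p in pred_labels}
    true = {t.strip().lower() for t in true_labels}
    if not pred or not true:
        return 0.0, 0.0, 0.0
    hit = len(pred & true)
    precision, recall = hit / len(pred), hit / len(true)
    f1 = 2 * precision * recall / (precision + recall) if hit else 0.0
    return precision, recall, f1

print(set_level_metrics(["happy", "excited"], ["happy", "proud", "excited"]))  # (1.0, 0.667, 0.8)
```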
OV-MERD+ will be released at the end of the MER2025 challenge; its original version, OV-MERD, is already available at: https://huggingface.co/datasets/MERChallenge/MER2025
dataset
├── mer2025-dataset
| ├── video # all training data, including 132,171 samples
| ├── audio # pre-extracted audio
| ├── openface_face # pre-extracted face files
| ├── subtitle_chieng.csv # pre-extracted subtitle content
| ├── track2_train_ovmerd.csv # OV-MERD Dataset (OV labels)
| ├── track3_train_ovmerd.csv # OV-MERD Dataset (Description)
💡 OV-MERD Paper ✨
OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian, Haiyang Sun, Licai Sun, Haoyu Chen, Lan Chen, Hao Gu, Zhuofan Wen, Shun Chen, Siyuan Zhang, Hailiang Yao, Bin Liu, Rui Liu, Shan Liang, Ya Li, Jiangyan Yi, Jianhua Tao
We provide zero-shot baselines for MLLMs on the OV-MER task under ./OV-MER; an illustrative prompting sketch follows the folder list below.
OV-MER
├── Chat-UniVi
├── LLaMA-VID
├── ...
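Each baseline folder wraps the corresponding model's own inference code. The snippet below only illustrates the shape of a zero-shot setup: an instruction asking for all expressed emotions plus a parser that turns the free-form answer into a label set. The prompt wording is an assumption, and `run_mllm` is a hypothetical stand-in for each model's generation call.

```python
# Illustrative zero-shot setup; the prompt wording is an assumption and
# `run_mllm` is a hypothetical stand-in for a model-specific generation call.
PROMPT = (
    "Watch the video, listen to the audio, and read the subtitle, then list "
    "all emotions expressed by the main character, separated by commas."
)

def parse_labels(response: str) -> list[str]:
    # Convert a free-form answer into an open-vocabulary label set.
    return [x.strip().lower() for x in response.split(",") if x.strip()]

# labels = parse_labels(run_mllm(video_path, PROMPT))  # run_mllm: hypothetical helper
print(parse_labels("Happy, Excited, slightly nervous"))
```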
We provide a specifically designed framework, AffectGPT, for the OV-MER task under ./AffectGPT.
AffectGPT
├── models # Available at: https://pan.baidu.com/s/1IvC4H7Xt1AzMFocGMBBbHQ?pwd=hzf9
│ ├── chinese-hubert-large # audio encoders
│ ├── clip-vit-large-patch14 # video encoders
│ ├── Qwen2.5-7B-Instruct # LLM
├── output # Available at: https://pan.baidu.com/s/1wtKBxHQP4eCUSAVuBrOzag?pwd=27sh
│ ├── emercoarse_highlevelfilter4_outputhybird_bestsetup_bestfusion_lz # trained on MER-Caption+ with face input
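The checkpoints above pair a HuBERT audio encoder and a CLIP video encoder with Qwen2.5-7B-Instruct. The sketch below only illustrates the general idea of projecting encoder features into the LLM embedding space; it is not the actual AffectGPT architecture, and all feature dimensions are assumptions that should be read from the model configs.

```python
import torch
import torch.nn as nn

class MultimodalAdapter(nn.Module):
    """Toy adapter: project audio/video features into the LLM token space.

    Dimensions are assumptions (chinese-hubert-large ~1024-d, CLIP ViT-L/14
    ~768-d, Qwen2.5-7B hidden size 3584); the real fusion strategy is defined
    by the code in ./AffectGPT.
    """
    def __init__(self, audio_dim=1024, video_dim=768, llm_dim=3584):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, llm_dim)
        self.video_proj = nn.Linear(video_dim, llm_dim)

    def forward(self, audio_feats, video_feats):
        # Projected audio/video tokens would be prepended to the text tokens.
        return torch.cat([self.audio_proj(audio_feats),
                          self.video_proj(video_feats)], dim=1)

adapter = MultimodalAdapter()
fused = adapter(torch.randn(1, 50, 1024), torch.randn(1, 8, 768))
print(fused.shape)  # torch.Size([1, 58, 3584])
```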
If you find AffectGPT useful for your research and applications, please cite using this BibTeX:
# MER-Caption dataset, MER-Caption+ dataset, AffectGPT Framework
@article{lian2025affectgpt,
title={AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models},
author={Lian, Zheng and Chen, Haoyu and Chen, Lan and Sun, Haiyang and Sun, Licai and Ren, Yong and Cheng, Zebang and Liu, Bin and Liu, Rui and Peng, Xiaojiang and others},
journal={arXiv preprint arXiv:2501.16566},
year={2025}
}
# OV-MERD dataset
@article{lian2024open,
title={Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark},
author={Lian, Zheng and Sun, Haiyang and Sun, Licai and Chen, Lan and Chen, Haoyu and Gu, Hao and Wen, Zhuofan and Chen, Shun and Zhang, Siyuan and Yao, Hailiang and others},
journal={arXiv preprint arXiv:2410.01495},
year={2024}
}
# EMER task
@article{lian2023explainable,
title={Explainable Multimodal Emotion Recognition},
author={Lian, Zheng and Sun, Haiyang and Sun, Licai and Gu, Hao and Wen, Zhuofan and Zhang, Siyuan and Chen, Shun and Xu, Mingyu and Xu, Ke and Chen, Kang and others},
journal={arXiv preprint arXiv:2306.15401},
year={2023}
}
# MER2023 Dataset
@inproceedings{lian2023mer,
title={MER 2023: Multi-label Learning, Modality Robustness, and Semi-supervised Learning},
author={Lian, Zheng and Sun, Haiyang and Sun, Licai and Chen, Kang and Xu, Mingyu and Wang, Kexin and Xu, Ke and He, Yu and Li, Ying and Zhao, Jinming and others},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={9610--9614},
year={2023}
}
# MER2024 Dataset
@inproceedings{lian2024mer,
title={MER 2024: Semi-supervised Learning, Noise Robustness, and Open-vocabulary Multimodal Emotion Recognition},
author={Lian, Zheng and Sun, Haiyang and Sun, Licai and Wen, Zhuofan and Zhang, Siyuan and Chen, Shun and Gu, Hao and Zhao, Jinming and Ma, Ziyang and Chen, Xie and others},
booktitle={Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing},
pages={41--48},
year={2024}
}
We evaluate the performance of various LLM-based baselines on OV-MERD, including SECap, SALMONN, Qwen-Audio, Otter, OneLLM, PandaGPT, VideoChat, VideoChat2, Video-LLaMA, Video-LLaVA, Video-ChatGPT, LLaMA-VID, mPLUG-Owl, and Chat-UniVi. We extend our gratitude to the authors for their excellent work.
This project is released under the Apache 2.0 license as found in the LICENSE file. The service is a research preview intended for non-commercial use ONLY. Please get in touch with us if you find any potential violations.