🌐 Homepage | 🤗 Dataset | 📖 Paper | GitHub
This repo contains the evaluation code for the paper "Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models".
We introduce MMIR, the first benchmark for evaluating Multimodal Large Language Models (MLLMs) on detecting and reasoning about inconsistencies in layout-rich multimodal content. MMIR features 534 challenging samples across five reasoning-heavy inconsistency categories: Factual Contradiction, Identity Misattribution, Contextual Mismatch, Quantitative Discrepancy, and Temporal/Spatial Incoherence.
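For a concrete picture, here is a minimal sketch of a sample and its label space. The field and enum names are our own illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class InconsistencyCategory(Enum):
    """The five reasoning-heavy categories defined by MMIR."""
    FACTUAL_CONTRADICTION = "factual_contradiction"
    IDENTITY_MISATTRIBUTION = "identity_misattribution"
    CONTEXTUAL_MISMATCH = "contextual_mismatch"
    QUANTITATIVE_DISCREPANCY = "quantitative_discrepancy"
    TEMPORAL_SPATIAL_INCOHERENCE = "temporal_spatial_incoherence"

@dataclass
class MMIRSample:
    # Hypothetical fields, for illustration only -- see the 🤗 Dataset page for the real schema.
    image_path: str                  # rendered layout-rich artifact (e.g., webpage, slide, poster)
    category: InconsistencyCategory  # the kind of inconsistency introduced into the artifact
```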
The MMIR benchmark was constructed through a four-stage curation pipeline to ensure high-quality, diverse, and challenging test cases. Please refer to our Hugging Face 🤗 Dataset for more details.
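As a quick start, something like the following should pull the benchmark with the standard `datasets` library. The dataset id and split name below are placeholders, so check the 🤗 Dataset page for the exact values:

```python
# Minimal sketch, assuming the standard Hugging Face `datasets` API.
from datasets import load_dataset

# Placeholder repo id -- replace with the id shown on the 🤗 Dataset page.
mmir = load_dataset("ORG_NAME/MMIR")
print(mmir)             # available splits and their sizes
print(mmir["test"][0])  # one raw sample (split name may differ)
```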
Please refer to our eval folder for evaluation instructions and scripts.
Results on MMIR in the open-ended and multiple-choice (MCQ) settings, broken down by artifact type (all scores in %):

| Model | Web (Open-ended) | Office (Open-ended) | Poster (Open-ended) | Overall (Open-ended) | Web (MCQ) | Office (MCQ) | Poster (MCQ) | Overall (MCQ) |
|---|---|---|---|---|---|---|---|---|
| o1 (1217) | 47.91 | 59.19 | 38.73 | 51.40 | 47.91 | 58.52 | 46.47 | 52.15 |
| GPT-4o (1120) | 25.00 | 42.60 | 30.98 | 33.14 | 37.29 | 58.96 | 47.88 | 47.75 |
| Qwen2.5-VL-7B | 8.54 | 29.14 | 11.97 | 17.60 | 14.37 | 33.18 | 16.90 | 22.56 |
| LLaVA-NeXT-7B | 10.20 | 21.97 | 7.04 | 14.70 | 11.45 | 25.33 | 5.63 | 16.47 |
| InternVL2.5-8B | 7.70 | 24.21 | 4.92 | 14.23 | 9.37 | 23.54 | 11.97 | 15.63 |
| Phi-3.5-Vision-4B | 6.87 | 24.43 | 7.04 | 14.23 | 1.66 | 8.52 | 0.00 | 4.30 |
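For reference, MCQ-setting scores like those above reduce to plain accuracy. Here is a minimal sketch, assuming each sample stores a ground-truth option letter and the model's predicted letter (the field names are our assumptions; the official scoring lives in the eval folder):

```python
def mcq_accuracy(samples: list[dict]) -> float:
    """Percentage of samples whose predicted option matches the ground truth."""
    correct = sum(
        s["prediction"].strip().upper() == s["answer"].strip().upper()
        for s in samples
    )
    return 100.0 * correct / len(samples)

# Example: 2 of 3 correct -> 66.67
print(round(mcq_accuracy([
    {"answer": "A", "prediction": "a"},
    {"answer": "B", "prediction": "B"},
    {"answer": "C", "prediction": "D"},
]), 2))
```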
For any questions, please reach out to:
- Qianqi Yan: qyan79@ucsc.edu
- Xin Eric Wang: xwang366@ucsc.edu
If you find MMIR useful, please cite our paper. BibTeX:
```bibtex
@misc{yan2025multimodalinconsistencyreasoningmmir,
      title={Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models},
      author={Qianqi Yan and Yue Fan and Hongquan Li and Shan Jiang and Yang Zhao and Xinze Guan and Ching-Chen Kuo and Xin Eric Wang},
      year={2025},
      eprint={2502.16033},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.16033},
}
```