Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training. (Updated Feb 18, 2023)
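Since Conceptual 12M is distributed as (image-URL, caption) pairs rather than raw images, a consumer typically parses a tab-separated listing and fetches images themselves. The snippet below is a minimal sketch of that parsing step; the sample rows, URLs, and column order are illustrative assumptions, not the actual dataset contents.

```python
import csv
import io

# Illustrative stand-in for a TSV listing of (image URL, caption) rows;
# the real Conceptual 12M file is much larger and its URLs differ.
sample_tsv = (
    "http://example.com/cat.jpg\tA cat sitting on a mat.\n"
    "http://example.com/dog.jpg\tA dog running on grass.\n"
)

def load_pairs(tsv_text):
    """Parse (image-URL, caption) pairs from TSV text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [(url, caption) for url, caption in reader]

pairs = load_pairs(sample_tsv)
print(len(pairs))   # 2
print(pairs[0][0])  # http://example.com/cat.jpg
```

Downloading the referenced images (e.g. with an HTTP client, handling dead links) would be a separate step layered on top of this parsing.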
Dataset and Evaluation Scripts for Obstacle Detection via Semantic Segmentation in a Marine Environment
This study introduces MultiBanFakeDetect, a novel multimodal dataset for Bangla fake news detection, combining textual and visual information. It features TextFakeNet for text analysis and MultiFusionFake for integrating multimodal data.
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).
Wearanize+ is a research project in which multiple wearable devices were used to record participants' overnight sleep.
The official code of "Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search"
A comprehensive multimodal dataset for player engagement analysis in video games, featuring synchronized EEG, eye tracking, heart rate, OpenFace facial analysis, and controller input data from 39 participants playing FIFA'23 and Street Fighter V.