I'm a researcher at Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab), currently focusing on Multimodal Large Language Models (MLLMs) and on unified understanding and generation models as a path towards AGI.
My academic journey includes a Master's degree from ShanghaiTech University (supervised by Professor Wang Hao) and a Bachelor's degree from Beijing Institute of Technology (BIT).
My Recent Work & Interests:
- Multimodal Large Language Models (MLLMs): I developed DocParser, a tool that processes academic papers from arXiv using their LaTeX source files. Leveraging DocParser, we released DocGenome, a rich academic dataset with annotations covering layout, OCR, and entity relationships, intended to improve MLLM understanding of text-rich images.
- Unified Understanding & Generation Models: While I haven't initiated a specific project in this area yet, I believe it represents a crucial step towards achieving true Artificial General Intelligence.
As a newcomer to this domain, I am actively learning from foundational works like Qwen-VL and InternVL, and I aim to contribute meaningfully to this emerging field.
Explore More: