🧐
keep learning

Hi there 👋 I'm Mao Song (毛松)

I'm a researcher at Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab), currently focusing on Multimodal Large Language Models (MLLMs) and on unified understanding & generation models as a path toward AGI.

I hold a Master's degree from ShanghaiTech University, where I was supervised by Professor Wang Hao, and a Bachelor's degree from Beijing Institute of Technology (BIT).

My Recent Work & Interests:

  • Multimodal Large Language Models (MLLMs): I developed DocParser, a tool that processes academic papers from arXiv using their LaTeX source files. Building on DocParser, we released DocGenome, an academic document dataset with annotations for layout, OCR, and entity relationships, aimed at improving MLLMs' understanding of text-rich images.
  • Unified Understanding & Generation Models: I haven't started a dedicated project in this area yet, but I believe it is a crucial step toward true Artificial General Intelligence.

As a newcomer to this field, I am actively learning from foundational works such as Qwen-VL and InternVL, with the aim of contributing meaningfully to it.

Explore More:

Pinned

  1. Paper_Reading

    Paper-reading notes (Bilibili / B站)