Xiaodong Wang · 王晓东

Video Understanding · Video Generation
Email: wangxd220 [@] gmail.com
Email: wangxiaodong21s [@] stu.pku.edu.cn
🎓 Ph.D. candidate
📍 Shenzhen, China
🔎 Vision × Language
Portrait of Xiaodong Wang, taken on the Tsinghua University campus

👋About

I am a CS Ph.D. candidate at the School of Computer Science and the School of Electronic and Computer Engineering, Peking University. I am fortunate to be advised by Prof. Peixi Peng.

Before that, I received my M.S. degree in Software Engineering from Peking University and my B.S. degree in Data Science from Beijing Information Science and Technology University, where I graduated with the President Scholarship, the university's highest student honor.

I am currently an intern at ByteDance, serving as the student leader of a collaborative research project. Previously, I interned at Microsoft Research Asia (MSRA), where I worked closely with Dr. Chenfei Wu and Dr. Nan Duan, and at SenseTime Research, Megvii Research, and the Institute of Computing Technology, Chinese Academy of Sciences.

Research interests

Video Understanding · Video Generation

In short, I enjoy teaching models how to watch videos, imagine new worlds, and talk about them like a curious researcher.

Highlighted Projects

Open-R1-Video (the 1st Open-Source R1-like Video-LLM, 370+ GitHub⭐)
Visual ChatGPT (the 1st multimodal AI agent, 34k+ GitHub⭐, 900+ Citations📑)

If you see opportunities for collaboration, please feel free to email me — I'd love to have a coffee chat.

🎓Education & Experience

Education

Peking University
CS Ph.D. candidate
Sep 2024 – Jul 2028 (expected)
Peking University
M.S. in Software Engineering (Exam-free admission)
Beijing, China · GPA: 3.66 / 4.00
Sep 2021 – Jul 2024
Beijing Information Science and Technology University
B.S. in Data Science
Beijing, China · GPA: 4.43 / 5.00 · Rank: 1 / 32
Sep 2017 – Jul 2021

Professional Experience

ByteDance
Research Intern, Douyin Group
SenseTime
Research Intern, SenseTime-FVG
Jan 2024 – Aug 2024 · Beijing, China
Microsoft Research Asia (MSRA)
Research Intern, Natural Language Computing Group
Supervised by Chenfei Wu & Nan Duan
May 2022 – Nov 2023 · Beijing, China
Megvii (Face Detection Team)
Algorithm Intern, Megvii Research
Jun 2021 – Sep 2021 · Beijing, China

📝Selected Publications

For the full list and up-to-date citations, please check my Google Scholar.

  1. LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding [AAAI 2026 · CCF A]
    Xiaodong Wang, Langling Huang, Zhirong Wu, Xu Zhao, Teng Xu, Xuhong Xia, Peixi Peng✉️
    AAAI 2026
  2. LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model [AAAI 2026 · CCF A]
    Xiaodong Wang, Zhirong Wu, Peixi Peng✉️
    AAAI 2026
  3. Enhancing Zero-shot 3D Photography via Mesh-represented Image Inpainting [ICME 2024 · Oral · CCF B]
    Yuejian Fang*, Xiaodong Wang*✉️
    2024 IEEE International Conference on Multimedia and Expo (ICME)
  4. Learning Invariant Representation with Consistency and Diversity for Semi-supervised Source Hypothesis Transfer [ICASSP 2024 · CCF B]
    Xiaodong Wang, Junbao Zhuo, Shuhao Cui, Shuhui Wang, Yuejian Fang
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  5. Learning 3D Photography Videos via Self-supervised Diffusion on Single Images (NUWA-3D) [IJCAI 2023 · CCF A]
    Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
    The 32nd International Joint Conference on Artificial Intelligence (IJCAI'23), 2023
  6. Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [Preprint · 900+ citations]
    Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan
    Preprint, March 2023
  7. ORES: Open-vocabulary Responsible Visual Synthesis [AAAI 2024 · CCF A]
    Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan
    AAAI Conference on Artificial Intelligence, 2024
  8. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation [ACL 2023 · Oral · CCF A]
    Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan
    The 61st Annual Meeting of the Association for Computational Linguistics (ACL'23), Oral
  9. Revisiting Unsupervised Domain Adaptation Models: a Smoothness Perspective [ACCV 2022 · CCF C]
    Xiaodong Wang, Junbao Zhuo, Mengru Zhang, Shuhui Wang, Yuejian Fang
    The 16th Asian Conference on Computer Vision (ACCV'22), 2022
  10. Background Cleaning and Direction Weight in Salient Object Detection [PRCV 2020 · CCF C]
    Xiaodong Wang, Xiaoming Huang
    Chinese Conference on Pattern Recognition and Computer Vision (PRCV'20), 2020

🏅Awards

  • MSRA Stars of Tomorrow (Award of Excellent Intern) · 2023
  • Merit Student, Peking University · 2022
  • Beijing Outstanding Graduate · 2021
  • President Scholarship (Highest Student Honor at BISTU) · 2020
  • National Scholarship · 2018

🤝Service

Reviewer

  • NeurIPS, ICML, ICLR, CVPR, AAAI