Xiaodong Wang | Video Understanding & Generation

👋About

I am a CS Ph.D. candidate in the School of Computer Science & School of Electronic and Computer Engineering, Peking University. I am fortunate to be advised by Prof. Peixi Peng.

Before that, I received my M.S. degree in Software Engineering from Peking University and my B.S. degree in Data Science from Beijing Information Science and Technology University (BISTU), where I was awarded the President Scholarship, the university's highest student honor.

I am currently an intern at ByteDance, serving as the student leader of a collaborative research project. Previously, I interned at Microsoft Research Asia (MSRA), where I worked closely with Dr. Chenfei Wu and Dr. Nan Duan, and at SenseTime Research, Megvii Research, and the Institute of Computing Technology, Chinese Academy of Sciences.

Research interests

• Video Understanding
• Video Generation

In short, I enjoy teaching models how to watch videos, imagine new worlds, and talk about them like a curious researcher.

Highlighted Projects

• Open-R1-Video (the 1st open-source R1-like Video-LLM, 370+ GitHub⭐)
• Visual ChatGPT (the 1st multimodal AI agent, 34k+ GitHub⭐, 900+ citations📑)

If you see opportunities for collaboration, please feel free to email me. I'd love to have a coffee chat.

🎓Education & Experience

Education

Peking University

CS Ph.D. candidate

Sep 2024 – Jul 2028 (expected)

Peking University

M.S. in Software Engineering (Exam-free admission)

Beijing, China · GPA: 3.66 / 4.00

Sep 2021 – Jul 2024

Beijing Information Science and Technology University

B.S. in Data Science

Beijing, China · GPA: 4.43 / 5.00 · Rank: 1 / 32

Sep 2017 – Jul 2021

Professional Experience

ByteDance

Research Intern, Douyin Group

SenseTime

Research Intern, SenseTime-FVG

Jan 2024 – Aug 2024 · Beijing, China

Microsoft Research Asia (MSRA)

Research Intern, Natural Language Computing Group
Supervised by Chenfei Wu & Nan Duan

May 2022 – Nov 2023 · Beijing, China

Megvii (Face Detection Team)

Algorithm Intern, Megvii Research

Jun 2021 – Sep 2021 · Beijing, China

📝Selected Publications

For the full list and up-to-date citations, please check my Google Scholar.

LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding · AAAI 2026 · CCF A

Xiaodong Wang, Langling Huang, Zhirong Wu, Xu Zhao, Teng Xu, Xuhong Xia, Peixi Peng✉️

AAAI 2026

PDF
LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model · AAAI 2026 · CCF A

Xiaodong Wang, Zhirong Wu, Peixi Peng✉️

AAAI 2026

AAAI-version ArXiv-version
Reinforcement Learning for Versatile Video Reasoning Capabilities in Base Multimodal LLMs · Preprint

Xiaodong Wang, Zhirong Wu, Langling Huang, Yuxi Zheng, Jinfa Huang, Peixi Peng✉️

OpenReview 2025.11

PDF
LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs · Preprint

Xiaodong Wang, Jinfa Huang, Li Yuan, Peixi Peng✉️

ArXiv 2025.06

PDF
ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos · Preprint

Xiaodong Wang, Peixi Peng✉️

ArXiv 2025.05

PDF
Enhancing Zero-shot 3D Photography via Mesh-represented Image Inpainting · ICME 2024 Oral · CCF B

Yuejian Fang*, Xiaodong Wang*✉️

2024 IEEE International Conference on Multimedia and Expo (ICME)
Learning Invariant Representation with Consistency and Diversity for Semi-supervised Source Hypothesis Transfer · ICASSP 2024 · CCF B

Xiaodong Wang, Junbao Zhuo, Shuhao Cui, Shuhui Wang, Yuejian Fang

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images (NUWA-3D) · IJCAI 2023 · CCF A

Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

The 32nd International Joint Conference on Artificial Intelligence (IJCAI'23), 2023

PDF Supplement
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models · Preprint · 900+ citations

Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan

Preprint, March 2023

arXiv GitHub (34k⭐)
ORES: Open-vocabulary Responsible Visual Synthesis · AAAI 2024 · CCF A

Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

AAAI Conference on Artificial Intelligence, 2024

PDF Code
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation · ACL 2023 · Oral · CCF A

Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

The 61st Annual Meeting of the Association for Computational Linguistics (ACL'23), Oral

PDF Project page
Revisiting Unsupervised Domain Adaptation Models: a Smoothness Perspective · ACCV 2022 · CCF C

Xiaodong Wang, Junbao Zhuo, Mengru Zhang, Shuhui Wang, Yuejian Fang

The 16th Asian Conference on Computer Vision (ACCV'22), 2022

PDF Code
Background Cleaning and Direction Weight in Salient Object Detection · PRCV 2020 · CCF C

Xiaodong Wang, Xiaoming Huang

Chinese Conference on Pattern Recognition and Computer Vision (PRCV'20), 2020

PDF

🏅Awards

MSRA Stars of Tomorrow (Award of Excellent Intern) · 2023
Merit Student, Peking University · 2022
Beijing Outstanding Graduate · 2021
President Scholarship (Highest Student Honor at BISTU) · 2020
National Scholarship · 2018

🤝Service

Reviewer

NeurIPS, ICML, ICLR, CVPR, AAAI