👋About
I am a CS Ph.D. candidate in the School of Computer Science & School of Electronic and Computer Engineering, Peking University. I am fortunate to be advised by Prof. Peixi Peng.
Before that, I received my M.S. degree in Software Engineering from Peking University, and my B.S. degree in Data Science from Beijing Information Science and Technology University, graduating with the President Scholarship, the highest student honor.
I am currently an intern at ByteDance, serving as the student leader in a collaborative research project. Previously, I interned at Microsoft Research Asia (MSRA), where I worked closely with Dr. Chenfei Wu and Dr. Nan Duan, as well as at SenseTime Research, Megvii Research, and the Institute of Computing Technology, Chinese Academy of Sciences.
Research interests
In short, I enjoy teaching models how to watch videos, imagine new worlds, and talk about them like a curious researcher.
Highlighted Projects
• Open-R1-Video (the 1st Open-Source R1-like Video-LLM, 370+ GitHub⭐)
• Visual ChatGPT (the 1st multimodal AI agent, 34k+ GitHub⭐, 900+ Citations📑)
If you see opportunities for collaboration, please feel free to email me — I'd love to have a coffee chat.
🎓Education & Experience
Education
Professional Experience
📝Selected Publications
For the full list and up-to-date citations, please check my Google Scholar.
- LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding. AAAI 2026 (CCF A).
- LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model. AAAI 2026 (CCF A).
- Enhancing Zero-shot 3D Photography via Mesh-represented Image Inpainting. 2024 IEEE International Conference on Multimedia and Expo (ICME), Oral (CCF B).
- Learning Invariant Representation with Consistency and Diversity for Semi-supervised Source Hypothesis Transfer. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024 (CCF B).
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images (NUWA-3D). The 32nd International Joint Conference on Artificial Intelligence (IJCAI'23), 2023 (CCF A).
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models. Preprint, March 2023 (900+ citations).
- ORES: Open-vocabulary Responsible Visual Synthesis. AAAI Conference on Artificial Intelligence, 2024 (CCF A).
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation. The 61st Annual Meeting of the Association for Computational Linguistics (ACL'23), Oral (CCF A).
- Revisiting Unsupervised Domain Adaptation Models: a Smoothness Perspective. The 16th Asian Conference on Computer Vision (ACCV'22), 2022 (CCF C).
- Background Cleaning and Direction Weight in Salient Object Detection. Chinese Conference on Pattern Recognition and Computer Vision (PRCV'20), 2020 (CCF C).
🏅Awards
- MSRA Stars of Tomorrow (Award of Excellent Intern), 2023
- Merit Student, Peking University, 2022
- Beijing Outstanding Graduates, 2021
- President Scholarship (Highest Student Honor in BISTU), 2020
- National Scholarship, 2018
🤝Service
Reviewer
- NeurIPS, ICML, ICLR, CVPR, AAAI