LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model

Xiaodong Wang¹,², Zhirong Wu¹, Peixi Peng¹,²
¹Peking University  ²Peng Cheng Laboratory

1. Long Video Prediction Compared with Vista

Videos in this section are: 15 seconds, 8 Hz, 480×720 resolution.

2. Short Video Prediction Compared with Vista

Videos in this section are: 3 seconds, 8 Hz, 480×720 resolution.

3. Trajectory Controllability Comparison with Vista

Videos in this section are: 2.5 seconds, 10 Hz, 480×720 resolution.

4. Additional Results of Trajectory Controllability

Videos in this section are: 2.5 seconds, 10 Hz, 480×720 resolution.

5. Longer Video Prediction Compared with Vista

Videos in this section are: 90 seconds, 8 Hz, 480×720 resolution.