MoVerse

Real-Time Video World Modeling with Panoramic Gaussian Scaffold

Orange Team, Youku Moku-Lab, HUJING Digital Media & Entertainment Group

Yang Zhou, Ziheng Wang, Yuqin Lu, Haofeng Liu, Jun Liang, Shengfeng He, Jing Li

One image. One world. Real time.

❯ moverse --input single_image.jpg --output navigable_world/

⠋ Generating 360° panorama...

○ Building 3D panoramic Gaussian scaffold

○ Rendering navigable world

~/moverse.mp4

> 01_PIPELINE overview

Given a single narrow-field-of-view image, MoVerse separates world construction from observation rendering. Stages I and II build a reusable panoramic 3D Gaussian scaffold offline; Stage III translates scaffold renderings along user-specified camera trajectories into photorealistic video at 8 FPS on a single RTX 4090.

MoVerse pipeline: (a) Panoramic Generation → (b) Gaussian Generation & Rendering → (c) Autoregressive Video Refinement

INPUT: NFOV image → STAGE_I: Panorama generation → STAGE_II: 3DGS scaffold → STAGE_III: Video render → OUTPUT: Real-time roam
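The split between offline world construction and online rendering can be sketched as follows. This is a minimal illustration, not the MoVerse implementation: `Scaffold`, `build_scaffold`, `stream_frames`, and `frame_budget_ms` are hypothetical names, and the stage bodies are placeholders. The only figure taken from the page is the 8 FPS target, which implies a budget of 125 ms per frame.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Scaffold:
    """Hypothetical stand-in for the reusable panoramic 3D Gaussian scaffold."""
    source_image: str


def build_scaffold(image_path: str) -> Scaffold:
    # Placeholder for Stages I and II (panorama generation, then lifting).
    # Runs once per scene, offline; the result is reused for every trajectory.
    return Scaffold(source_image=image_path)


def frame_budget_ms(fps: float) -> float:
    # Time available per frame if the renderer is to sustain `fps`.
    return 1000.0 / fps


def stream_frames(scaffold: Scaffold, trajectory: Iterable[str]) -> Iterator[Tuple[str, str]]:
    # Placeholder for Stage III: one output frame per camera pose,
    # each conditioned on the same persistent scaffold.
    for pose in trajectory:
        yield (scaffold.source_image, pose)


if __name__ == "__main__":
    scaffold = build_scaffold("single_image.jpg")   # built once
    for frame in stream_frames(scaffold, ["pose_0", "pose_1", "pose_2"]):
        print(frame)
    print(f"per-frame budget at 8 FPS: {frame_budget_ms(8.0)} ms")
```

Because the scaffold is built once and only read afterwards, the per-frame cost in this sketch depends on the renderer alone, which is the point of separating construction from observation.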

> 02_ROAMING interactive scenes / real-time output

Select a scene below and watch MoVerse turn a single input photograph into a free-roaming video walkthrough. The camera trajectory is user-controlled; the scaffold keeps geometry consistent across revisits, while the causal renderer streams temporally coherent frames in real time.

~/roam/alcove.mp4

> 03_PANORAMA stage I — single image → 360° ERP

Stage I expands the input image into a gravity-aligned, horizontally periodic 360° panorama with topology-aware latent diffusion. The resulting panorama provides the omnidirectional evidence that Stage II lifts into the 3D scaffold.
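To make "gravity-aligned" and "horizontally periodic" concrete, here is a small sketch of the standard equirectangular (ERP) mapping between pixels and viewing directions. The axis convention (y up, z forward at the image centre) and the function names are assumptions chosen for illustration; the page does not state which convention MoVerse uses.

```python
import math
from typing import Tuple


def erp_pixel_to_direction(u: float, v: float, width: int, height: int) -> Tuple[float, float, float]:
    """Map an ERP pixel (u = column, v = row) to a unit viewing direction.

    Assumed convention: longitude spans [-pi, pi) left to right,
    latitude spans +pi/2 (top) to -pi/2 (bottom), y axis points up.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)            # gravity-aligned: image rows map to elevation
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def wrap_column(u: int, width: int) -> int:
    # Horizontal periodicity: the left and right image edges are neighbours,
    # so column indices wrap around instead of being clamped.
    return u % width


if __name__ == "__main__":
    width, height = 8, 4
    print(erp_pixel_to_direction(0, 0, width, height))
    print(wrap_column(-1, width), wrap_column(width, width))
```

In this convention, stepping one column past the right edge lands on the leftmost column, which is why a panorama generator has to keep content continuous across that seam.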

~/panorama/bridge — input → 360° ERP → interactive viewer

> 04_SCAFFOLD stage II — panorama → 3D Gaussian scaffold

Stage II lifts the panorama into a panoramic 3D Gaussian scaffold using feed-forward residual prediction in angular–inverse-depth space. The scaffold is a persistent, splattable scene asset and is what the video renderer in Stage III conditions on along the user-specified trajectory.
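One way to read "residual prediction in angular–inverse-depth space" is that each Gaussian centre starts on its panorama pixel's ray at an initial inverse depth, and the network predicts small offsets to the two angles and to the inverse depth. The sketch below illustrates only that reading. The parameterization, the clamping, and every name in it are assumptions, not details confirmed by the page.

```python
import math
from typing import Tuple


def lift_to_center(
    lon: float,
    lat: float,
    inv_depth: float,
    d_lon: float = 0.0,
    d_lat: float = 0.0,
    d_inv_depth: float = 0.0,
    min_inv_depth: float = 1e-6,
) -> Tuple[float, float, float]:
    """Turn (longitude, latitude, inverse depth) plus residuals into a 3D centre.

    Hypothetical parameterization: residuals are added in angular and
    inverse-depth coordinates, then converted to Cartesian (y up).
    """
    lon = lon + d_lon
    # Keep latitude inside the valid range of the sphere.
    lat = max(-math.pi / 2, min(math.pi / 2, lat + d_lat))
    # Inverse depth must stay positive; very distant content approaches zero.
    rho = max(min_inv_depth, inv_depth + d_inv_depth)
    r = 1.0 / rho
    return (
        r * math.cos(lat) * math.sin(lon),
        r * math.sin(lat),
        r * math.cos(lat) * math.cos(lon),
    )


if __name__ == "__main__":
    # A point straight ahead at inverse depth 0.5, i.e. 2 units away.
    print(lift_to_center(0.0, 0.0, 0.5))
    # The same ray after a residual that halves the distance.
    print(lift_to_center(0.0, 0.0, 0.5, d_inv_depth=0.5))
```

Working in inverse depth keeps far-away content numerically well behaved, since distant surfaces map to values near zero rather than to unbounded distances.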

~/scaffold/alcove.ply

> 05_CITE bibtex

@article{moverse2026,
  title   = {MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold},
  author  = {Yang Zhou and Ziheng Wang and Yuqin Lu and Haofeng Liu and Jun Liang and Shengfeng He and Jing Li},
  journal = {arXiv preprint arXiv:2606.13376},
  year    = {2026}
}