MoVerse

Real-Time Video World Modeling with Panoramic Gaussian Scaffold

Orange Team, Youku Moku-Lab, HUJING Digital Media & Entertainment Group

Yang Zhou, Ziheng Wang, Yuqin Lu, Haofeng Liu, Jun Liang, Shengfeng He, Jing Li

One image. One world. Real time.
moverse --input single_image.jpg --output navigable_world/
Generating 360° panorama...
○ Building 3D panoramic Gaussian scaffold
○ Rendering navigable world
~/moverse.mp4
> 01_PIPELINE overview

Given a single narrow-field-of-view image, MoVerse separates world construction from observation rendering. Stages I and II build a reusable panoramic 3D Gaussian scaffold offline; Stage III translates scaffold renderings along user-specified camera trajectories into photorealistic video at 8 FPS on a single RTX 4090.

MoVerse pipeline: (a) Panoramic Generation → (b) Gaussian Generation & Rendering → (c) Autoregressive Video Refinement
INPUT: NFOV image
STAGE_I: Panorama generation
STAGE_II: 3DGS scaffold
STAGE_III: Video render
OUTPUT: Real-time roam
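The split between offline world construction and online rendering can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: `Scaffold`, `build_scaffold`, `roam`, and the `Pose` tuple are names invented for illustration, and the bodies are string placeholders rather than the actual models. The only figure taken from the text is the 8 FPS rate, which implies a per-frame budget of 125 ms.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical pose type: camera position only, for illustration.
Pose = Tuple[float, float, float]

# 8 FPS (as reported above) leaves 1000 / 8 = 125 ms per frame.
FRAME_BUDGET_MS = 1000.0 / 8


@dataclass(frozen=True)
class Scaffold:
    """Placeholder for the persistent panoramic 3D Gaussian scaffold."""
    source_image: str


def build_scaffold(image_path: str) -> Scaffold:
    """Stand-in for Stages I and II: run once, offline, per input image."""
    return Scaffold(source_image=image_path)


def roam(scaffold: Scaffold, trajectory: List[Pose]) -> Iterator[str]:
    """Stand-in for Stage III: yield one refined frame per camera pose."""
    for index, pose in enumerate(trajectory):
        coarse = f"splat[{scaffold.source_image}]@{pose}"  # scaffold rendering
        yield f"frame{index}:refined({coarse})"            # video refinement


# The scaffold is built once and reused for any number of trajectories.
scaffold = build_scaffold("single_image.jpg")
walk_forward = list(roam(scaffold, [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]))
step_right = list(roam(scaffold, [(1.0, 0.0, 0.0)]))
```

The point of the sketch is the call pattern: `build_scaffold` sits outside the per-frame loop, so only the Stage III work has to fit inside the frame budget.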
> 02_ROAMING interactive scenes / real-time output

Select a scene below and watch MoVerse turn a single input photograph into a free-roaming video walkthrough. The camera trajectory is user-controlled; the scaffold keeps geometry consistent across revisits, while the causal renderer streams temporally coherent frames in real time.

~/roam/alcove.mp4
> 03_PANORAMA stage I — single image → 360° ERP

Stage I expands the input image into a gravity-aligned, horizontally periodic 360° equirectangular (ERP) panorama with topology-aware latent diffusion. The resulting panorama is the omnidirectional evidence that Stage II lifts into the 3D scaffold.
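To make "gravity-aligned" and "horizontally periodic" concrete, here is a minimal sketch of standard equirectangular geometry. It is not taken from the MoVerse code: the axis convention (+y up along gravity, +z at the image center), the half-pixel offset, and the function names are assumptions chosen for illustration.

```python
import math


def erp_pixel_to_direction(u, v, width, height):
    """Map an ERP pixel (u, v) to a unit view direction (x, y, z).

    Assumed convention: +y is up (opposite gravity), +z is the image
    center, and pixel centers sit at integer coordinates + 0.5.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi    # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi   # +pi/2 at top row
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def wrap_column(u, width):
    """Horizontal periodicity: column indices wrap around the 360° seam."""
    return u % width


# The image center looks along +z; each row is a ring of constant latitude.
center = erp_pixel_to_direction(511.5, 255.5, 1024, 512)
```

Under this convention every image row corresponds to one elevation angle relative to the horizon, and column `width` is the same viewing direction as column `0`, so any operation that looks at horizontal neighbors should index through something like `wrap_column`.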

~/panorama/bridge — input → 360° ERP → interactive viewer
input NFOV image
generated ERP panorama
> 04_SCAFFOLD stage II — panorama → 3D Gaussian scaffold

Stage II lifts the panorama into a panoramic 3D Gaussian scaffold using feed-forward residual prediction in angular–inverse-depth space. The scaffold is a persistent, splattable scene asset; the Stage III video renderer conditions on its renderings along the user-specified trajectory.
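One plausible reading of "residual prediction in angular–inverse-depth space" is that each Gaussian center is parameterized by longitude, latitude, and inverse depth, and that a network predicts small offsets in those three coordinates. The sketch below only illustrates that parameterization. The function name, the residual layout, the clamping constant, and the axis convention (+y up, +z at the panorama center) are assumptions, not the authors' implementation.

```python
import math


def lift_to_center(lon, lat, inv_depth, residual=(0.0, 0.0, 0.0),
                   min_inv_depth=1e-4):
    """Turn (lon, lat, inverse depth) plus a residual into a 3D center.

    `residual` is a hypothetical (d_lon, d_lat, d_inv_depth) offset such as
    a feed-forward network might predict. Assumed axes: +y up, +z forward.
    """
    d_lon, d_lat, d_rho = residual
    lon_r = lon + d_lon
    # Keep latitude on the sphere.
    lat_r = max(-math.pi / 2.0, min(math.pi / 2.0, lat + d_lat))
    # Clamp inverse depth away from zero so the radius stays finite.
    rho = max(min_inv_depth, inv_depth + d_rho)
    radius = 1.0 / rho
    return (
        radius * math.cos(lat_r) * math.sin(lon_r),
        radius * math.sin(lat_r),
        radius * math.cos(lat_r) * math.cos(lon_r),
    )


# A pixel at the panorama center with inverse depth 0.5 sits 2 units away.
near = lift_to_center(0.0, 0.0, 0.5)
# A residual of +0.5 in inverse depth pulls the same point in to 1 unit.
nearer = lift_to_center(0.0, 0.0, 0.5, residual=(0.0, 0.0, 0.5))
```

A general property of inverse-depth parameterizations is that very distant content such as sky maps to values near zero rather than to unbounded depths, which keeps the quantity being predicted in a compact range.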

~/scaffold/alcove.ply
> 05_CITE bibtex
@article{moverse2026,
  title   = {MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold},
  author  = {Yang Zhou and Ziheng Wang and Yuqin Lu and Haofeng Liu and Jun Liang and Shengfeng He and Jing Li},
  journal = {arXiv preprint arXiv:2606.13376},
  year    = {2026}
}