SmartDirector

Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

Zhida Zhang¹, Jie Ma², Zhan Peng³, Yang Han², Haoxue Wu², Jun Liang², Jie Cao¹, Jing Li²

¹ New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
² Youku Moku-Lab
³ Huazhong University of Science and Technology


ABSTRACT

The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantly rely on sparse conditioning signals such as text prompts or first/last frames, which limits precise control over narrative structure and temporal pacing.

In this paper, we propose SmartDirector, a framework that enhances the narrative capacity of video generation models through multiple keyframes. SmartDirector supports flexible generation scenarios including single-shot generation, multi-shot narrative synthesis, and video extension. The framework operates in two stages: Director-Gen generates a low-resolution video conditioned on the provided keyframes, and Director-SR refines the output by exploiting high-resolution keyframes as semantic anchors to recover fine-grained details. To enable robust multi-keyframe training, we construct a data pipeline that curates single-shot and multi-shot sequences from movies. Extensive experiments demonstrate that SmartDirector substantially outperforms existing state-of-the-art approaches. We will release the code to facilitate further research.
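The two-stage data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: `Keyframe`, `two_stage_generate`, `downscale`, `gen_model`, and `sr_model` are hypothetical names, and both models are passed in as plain callables so that only the routing of inputs is shown. The generation stage sees low-resolution copies of the keyframes, while the super-resolution stage receives the original high-resolution keyframes as anchors.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Keyframe:
    t: float    # position on the output timeline, in seconds
    image: Any  # image payload; any array or placeholder type


def two_stage_generate(
    keyframes: Sequence[Keyframe],
    prompt: str,
    downscale: Callable[[Any], Any],
    gen_model: Callable[[List[Keyframe], str], Any],
    sr_model: Callable[[Any, List[Keyframe], str], Any],
) -> Any:
    """Hypothetical routing of inputs through a generate-then-refine pipeline."""
    # Stage 1 (Director-Gen role): condition on low-resolution copies of the
    # keyframes, keeping each keyframe's timestamp unchanged.
    low_res_keys = [Keyframe(k.t, downscale(k.image)) for k in keyframes]
    low_res_video = gen_model(low_res_keys, prompt)
    # Stage 2 (Director-SR role): refine the low-res video, passing the original
    # high-resolution keyframes along as detail anchors.
    return sr_model(low_res_video, list(keyframes), prompt)


if __name__ == "__main__":
    # Toy stand-ins: "images" are (height, width) tuples; models just report shapes.
    shrink = lambda img: (img[0] // 4, img[1] // 4)
    gen = lambda keys, prompt: {"res": keys[0].image, "n": len(keys)}
    sr = lambda video, keys, prompt: {"res": keys[0].image, "from": video["res"]}
    keys = [Keyframe(0.0, (1080, 1920)), Keyframe(2.0, (1080, 1920))]
    print(two_stage_generate(keys, "rainy alley", shrink, gen, sr))
    # prints {'res': (1080, 1920), 'from': (270, 480)}
```

The 4x downscale factor and the callable interfaces are assumptions chosen only to make the separation of the two stages visible.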

MULTI-KEYFRAME GENERATION

Given multiple keyframes as conditions, SmartDirector generates coherent videos with smooth transitions and consistent narratives across shots.

SINGLE-FRAME GENERATION

Given a single keyframe at any temporal position, SmartDirector generates a complete video. In each example, the position of the keyframe thumbnail marks where that keyframe falls on the generated video's timeline.
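Placing a keyframe "at any temporal position" amounts to mapping a timestamp to a frame index on the output timeline. The helper below sketches that mapping under assumed settings: a fixed frame rate and clip duration, with a per-frame boolean mask marking which frames are pinned to keyframes. `place_keyframes` is a hypothetical name, and SmartDirector's actual conditioning interface may differ (for example, by operating on compressed latent frames rather than pixel frames).

```python
from typing import List, Sequence, Tuple


def place_keyframes(
    timestamps: Sequence[float], duration_s: float, fps: float
) -> Tuple[List[int], List[bool]]:
    """Map keyframe timestamps (seconds) to frame indices plus a per-frame mask.

    mask[i] is True when frame i is pinned to a keyframe and False when the
    model would have to generate it.
    """
    num_frames = round(duration_s * fps)
    mask = [False] * num_frames
    indices = []
    for t in timestamps:
        if not 0.0 <= t <= duration_s:
            raise ValueError(f"keyframe at t={t}s lies outside [0, {duration_s}]s")
        # Nearest frame; a keyframe at exactly t == duration maps to the last frame.
        idx = min(round(t * fps), num_frames - 1)
        mask[idx] = True
        indices.append(idx)
    return indices, mask


# One keyframe in the middle of a 5 s clip at an assumed 24 fps.
idx, mask = place_keyframes([2.5], duration_s=5.0, fps=24)
print(idx, sum(mask), len(mask))  # [60] 1 120
```

The same function handles the multi-keyframe case: passing several timestamps pins several frames, and the model fills in everything between them.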

VIDEO & MIXED-MODAL CONDITIONED GENERATION

SmartDirector supports video-conditioned generation in three modes: forward continuation, backward generation, and in-between interpolation.

Forward: Video → Future

Backward: Video → Past

In-Between Interpolation
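The three video-conditioned modes differ mainly in where the given footage sits on the output timeline. A minimal sketch, assuming conditioning is expressed as a per-frame mask (True means the frame is supplied by the input video, False means it must be generated); `video_condition_mask` and its mode strings are hypothetical, and for in-between interpolation the sketch assumes the same number of context frames at each end.

```python
from typing import List


def video_condition_mask(mode: str, cond_frames: int, total_frames: int) -> List[bool]:
    """Per-frame mask: True = taken from the conditioning video, False = generated."""
    if not 0 < cond_frames < total_frames:
        raise ValueError("cond_frames must be in (0, total_frames)")
    if mode == "forward":
        # Given clip opens the timeline; generate what follows (video -> future).
        known = list(range(cond_frames))
    elif mode == "backward":
        # Given clip closes the timeline; generate what precedes (video -> past).
        known = list(range(total_frames - cond_frames, total_frames))
    elif mode == "in_between":
        # Context clips at both ends; generate the frames between them.
        if 2 * cond_frames >= total_frames:
            raise ValueError("no frames left to interpolate")
        known = [*range(cond_frames), *range(total_frames - cond_frames, total_frames)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    mask = [False] * total_frames
    for i in known:
        mask[i] = True
    return mask


print(video_condition_mask("in_between", 2, 8))
# [True, True, False, False, False, False, True, True]
```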

NARRATIVE PACING CONTROL

Using the same set of keyframes, SmartDirector can generate videos with different narrative pacing styles, ranging from slow, suspenseful tension to fast-paced action sequences.

▲ Shared Input Keyframes

SUSPENSE

This cyberpunk noir prompt depicts a tense rainy alley scene where slow, suffocating dolly-ins and hesitant push-ins amplify psychological dread as a vigilant woman transitions to resolve.

t=0.0s · t=3.0s · t=4.2s

DOCUMENTARY

Adopting a neutral cyberpunk noir aesthetic, this prompt uses steady tracking and arcing shots to objectively observe a woman's shift from alertness to determination in a rainy alley.

t=0.0s · t=1.8s · t=3.4s

ACTION

This high-intensity cyberpunk noir prompt captures a woman's aggressive movement through a rainy alley using rapid zooms, high-frequency camera shake, and sharp audio cues to maximize kinetic energy.

t=0.0s · t=1.0s · t=2.2s
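The timestamps listed under each style make the pacing difference easy to quantify. Assuming each set of three timestamps marks where the shared keyframes land in that style's output (the page does not state this explicitly), the gaps between consecutive keyframes can be computed as below. `PACING` and `keyframe_gaps` are illustrative names; the values are copied from the three examples above.

```python
from typing import Dict, List

# Timestamps (seconds) copied from the three pacing examples.
PACING: Dict[str, List[float]] = {
    "suspense": [0.0, 3.0, 4.2],
    "documentary": [0.0, 1.8, 3.4],
    "action": [0.0, 1.0, 2.2],
}


def keyframe_gaps(timestamps: List[float]) -> List[float]:
    """Seconds between consecutive timestamps, rounded to 0.1 s to hide float noise."""
    return [round(b - a, 1) for a, b in zip(timestamps, timestamps[1:])]


for style, ts in PACING.items():
    span = round(ts[-1] - ts[0], 1)
    print(f"{style}: gaps={keyframe_gaps(ts)} span={span}s")
# suspense: gaps=[3.0, 1.2] span=4.2s
# documentary: gaps=[1.8, 1.6] span=3.4s
# action: gaps=[1.0, 1.2] span=2.2s
```

Under that assumption, the action cut reaches the last keyframe in roughly half the time of the suspense cut (2.2 s vs. 4.2 s), and the suspense cut spends a long 3.0 s hold before the second keyframe.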

VIDEO SUPER-RESOLUTION

By leveraging keyframe-conditioned generation, SmartDirector anchors identity information from reference frames during super-resolution. This enables identity-consistent restoration of degraded facial details and corrupted text, a capability beyond conventional SR methods that operate without such reference frames.
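How Director-SR uses its reference frames internally is not specified here. Purely as an illustration of keyframe anchoring, and not the authors' method, the hypothetical helpers below assign each output frame the temporally nearest high-resolution keyframe as its reference, one simple heuristic a reference-guided SR stage could use. `nearest_keyframe` and `assign_references` are assumed names, and keyframe positions are assumed to be sorted frame indices.

```python
import bisect
from typing import List, Sequence


def nearest_keyframe(frame_idx: int, key_indices: Sequence[int]) -> int:
    """Position in key_indices (sorted ascending) of the keyframe nearest frame_idx.

    Ties go to the earlier keyframe.
    """
    if not key_indices:
        raise ValueError("need at least one keyframe")
    pos = bisect.bisect_left(key_indices, frame_idx)
    # The nearest keyframe is either just before the insertion point or at/after it.
    candidates = [p for p in (pos - 1, pos) if 0 <= p < len(key_indices)]
    return min(candidates, key=lambda p: abs(key_indices[p] - frame_idx))


def assign_references(num_frames: int, key_indices: Sequence[int]) -> List[int]:
    """Reference keyframe (as a position in key_indices) for every output frame."""
    return [nearest_keyframe(f, key_indices) for f in range(num_frames)]


print(assign_references(6, [0, 5]))  # [0, 0, 0, 1, 1, 1]
```

Binary search keeps each lookup logarithmic in the number of keyframes, which only matters for long clips with many keyframes.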

Comparison (two examples): Original vs. Ours

ETHICAL CONSIDERATIONS

The conditioning images and videos used in these examples are sourced from publicly available channels or generated by models, and are intended solely to demonstrate the capabilities of this research. If you have any concerns, please contact us (wuhaoxue.whx@alibaba-inc.com) and we will promptly remove the relevant examples.

CITATION

@article{zhang2026smartdirector,
  title = {SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control},
  author = {Zhang, Zhida and Ma, Jie and Peng, Zhan and Wu, Haoxue and Han, Yang and Liang, Jun and Cao, Jie and Li, Jing},
  journal = {arXiv preprint arXiv:2605.27891},
  year = {2026}
}