Precise camera pose control is critical for video diffusion, yet maintaining geometric consistency remains a challenge. Existing methods that directly inject numerical camera parameters into the diffusion backbone often fail to bridge the gap between abstract coordinates and visual content, leading to structural distortions. To address this issue, we propose CameraNoise, a flow-to-noise warping method that encodes camera motion into a temporally coherent stochastic representation. Unlike conventional conditioning, CameraNoise embeds camera poses directly into the noise space. This decouples motion from scene appearance while faithfully preserving trajectory dynamics. Specifically, we introduce a novel Geometry-guided Reprojection Flow and a noise warping algorithm, which jointly preserve the Gaussian prior of diffusion and ensure consistent noise propagation under camera transformations. By integrating CameraNoise into the diffusion process, our framework delivers stable, high-fidelity videos. Extensive experiments demonstrate that our approach significantly outperforms prior methods in both visual quality and trajectory faithfulness.
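To make the two ingredients named above concrete, the following is a minimal sketch and not the CameraNoise algorithm itself. It assumes a pinhole camera with intrinsics `K`, a per-pixel depth map, and a relative pose `(R, t)` with the convention `X_target = R @ X_source + t`. The function names `reprojection_flow` and `warp_noise` are hypothetical. The warp uses a simple nearest-pixel forward splat with a 1/sqrt(n) rescale, one common way to keep warped noise unit-Gaussian; the paper's actual warping scheme may differ.

```python
import numpy as np


def reprojection_flow(depth, K, R, t):
    """Pixel flow of a static scene under a camera move (hypothetical helper).

    Assumed convention: X_target = R @ X_source + t, pinhole intrinsics K,
    depth measured along the source camera's z-axis. Points that end up
    behind the target camera are not handled in this sketch.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # (3, h*w)
    # Back-project each pixel to a 3D point in the source camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Express the points in the target camera frame and project them.
    proj = K @ (R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1))
    uv = proj[:2] / proj[2:3]
    return (uv - pix[:2]).T.reshape(h, w, 2)  # [..., 0] = du, [..., 1] = dv


def warp_noise(noise, flow, rng):
    """Forward-warp an (h, w, c) noise field along `flow` (hypothetical helper).

    Each source pixel is sent to its nearest target pixel. If n source
    pixels land on one target, their sum is divided by sqrt(n), which is
    again N(0, 1). Targets that receive nothing get fresh N(0, 1) samples.
    Every target draws from a disjoint set of sources, so the output stays
    i.i.d. standard normal as long as `flow` does not depend on `noise`.
    """
    h, w = noise.shape[:2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    tu = np.rint(u + flow[..., 0]).astype(int)
    tv = np.rint(v + flow[..., 1]).astype(int)
    valid = (tu >= 0) & (tu < w) & (tv >= 0) & (tv < h)

    acc = np.zeros_like(noise)
    cnt = np.zeros((h, w))
    # np.add.at accumulates correctly when several sources share a target.
    np.add.at(acc, (tv[valid], tu[valid]), noise[valid])
    np.add.at(cnt, (tv[valid], tu[valid]), 1)

    out = rng.standard_normal(noise.shape)  # fills disoccluded targets
    hit = cnt > 0
    out[hit] = acc[hit] / np.sqrt(cnt[hit])[:, None]
    return out
```

Applying `warp_noise` frame by frame along the flow of a camera trajectory yields a noise sequence that moves with the camera while each frame remains standard normal, which is the property the abstract refers to as preserving the Gaussian prior.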
Camera1: Move-Up Shot.
Camera2: Counterclockwise Rotation Shot.
Camera3: Move-Down Shot.
Inference Prompts:
1) Low speed: a boat splashing down a steep water slope, huge arcs of water frozen in the air, wet rails, bright reflections.
2) Medium speed: a boat racing rapidly down a steep water slope, explosive arcs of water bursting into the air, wet rails, bright reflections.
3) High speed: a boat speeding down a steep water slope at high speed, massive splashes frozen in midair, wet rails, bright reflections.
Scene1: A vibrant forest scene is filled with various birds flying, surrounded by trees, green mossy ground, and sunlight.
Scene2: A golden retriever stands in a sunlit grassy field, with trees and open green space in the background.
Scene3: A cowboy rides a horse along a winding dirt road through a golden and sunlit field with fences.
We observe that all these methods exhibit varying degrees of degradation when applied to new scenes:
1) CameraCtrl shows declines in both camera control accuracy and visual content quality;
2) MotionCtrl almost completely loses its camera control capability in the new scenes;
3) Go-with-the-Flow suffers from a noticeable drop in visual content quality.
Camera pose type 1: move-up shot.
Camera pose type 2: move-down shot.
Camera pose type 3: move-left shot.
Camera pose type 4: move-right shot.
Camera pose type 5: move-clockwise shot.
Camera pose type 6: move-in shot.
Based on these results, we summarize the characteristics and limitations of current mainstream methods under OOD conditions:
1) MotionCtrl and CameraCtrl: deviate substantially from the target camera trajectory, indicating limited robustness in camera control;
2) Go-with-the-Flow: prone to excessive camera motion and occasional content collapse;
3) GEN3C: produces static scenes in which objects cannot move, resulting in rigid video content. In addition, because it relies on 3D feature modeling, it is susceptible to scene penetration (e.g., camera pose type 5);
4) Our method: demonstrates superior performance in OOD scenarios in terms of camera control accuracy, content consistency, and motion dynamics.
@inproceedings{zhao2026cameranoise,
title={CameraNoise: Enabling Faithful Camera Control in Video Diffusion through Geometry-Flow-Guided Noise Warping},
author={Zhao, Haoyu and Gu, Jiaxi and Chen, Haoran and Zheng, Qingping and Jin, Yeying and Yang, Hongyi and Cheng, Junqi and Zhang, Yuang and Lu, Zenghui and Yu, Huan and Jiang, Jie and Shu, Peng and Wu, Zuxuan and Jiang, Yu-Gang},
booktitle={Forty-third International Conference on Machine Learning (ICML)},
year={2026}
}