LaVR

Generate views of a dynamic scene from a novel camera trajectory

Baseline method (middle) vs. ours (right).

Video Demos

ReCamMaster makes the man clip through the table, while Gen3C and TrajectoryCrafter distort his body.

Our method better preserves the shape of both the candles and the table.

Our method is the only one to correctly render the front of the tea table.

ReCamMaster hallucinates an extra tail on the side of the cat's body, while Gen3C and TrajectoryCrafter distort the cat.

ReCamMaster hallucinates an extra arm, while Gen3C and TrajectoryCrafter distort the human body.

ReCamMaster and TrajectoryCrafter make the cars disappear or run off the road.
The camera trajectory of Gen3C's rendered video does not appear to follow an upward arc; its rotation is incorrect.

ReCamMaster dislocates the basketball rim into the backboard, while TrajectoryCrafter distorts the entire hoop.

ReCamMaster incorrectly renders the shoe on the desk;
Gen3C's rendered scene seems to be wiggling unnaturally near the end of the video;
TrajectoryCrafter shows the wrong table geometry.

ReCamMaster incorrectly shows a woman standing beside the fridge when she should be by the table.
Gen3C shows a weird floating lamp and incorrectly stretches the body and hand of the woman in the middle.
TrajectoryCrafter distorts the fridge.

ReCamMaster hallucinates some spurious objects on the right;
Gen3C and TrajectoryCrafter severely distort the wooden strips behind the person.

ReCamMaster shows the wrong hand position near the end of the video --- it should be near her jaw.
Gen3C and TrajectoryCrafter distort the woman's body.

ReCamMaster hallucinates unrealistic structures for the curtain in the back;
Gen3C and TrajectoryCrafter distort the people in the front.

ReCamMaster shows flickering on the laptop screen and dissolves the steel frame outside the window;
Gen3C and TrajectoryCrafter make the laptop edges curved rather than rectangular.

Proposed Paradigm vs. Baselines'

(a) Non-3D(4D)-conditioned methods, e.g., ReCamMaster, achieve high visual quality but lack geometric awareness, leading to inconsistencies.

(b) Conditioning on 4D point cloud renders, as in Gen3C and TrajectoryCrafter, provides consistency but reduces quality, because point-cloud re-rendering is sensitive to depth estimation errors.

(c) Our proposed architecture utilizes the implicit geometric knowledge of a pre-trained large 4D reconstruction model (LRM) to achieve both high quality and consistency.
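To make the sensitivity noted in (b) concrete, here is a minimal sketch of how a depth error shifts a re-rendered point. It assumes a simplified pinhole camera that translates sideways with no rotation. The function name `reproject_x` and all numeric values are hypothetical and chosen for illustration; this is not the implementation used by any of the methods above.

```python
def reproject_x(u, depth, focal, cx, baseline):
    """Unproject pixel column u at the given depth, shift the camera
    `baseline` units along +x, and project back to a pixel column."""
    # Unproject: pixel column -> x coordinate in the source camera frame.
    x = (u - cx) * depth / focal
    # Express the point in the target camera frame (pure x translation).
    x_new = x - baseline
    # Project back onto the image plane of the target camera.
    return focal * x_new / depth + cx


# Hypothetical intrinsics (pixels) and camera shift (scene units).
focal, cx, baseline = 500.0, 320.0, 0.2
u, true_depth = 400.0, 2.0

exact = reproject_x(u, true_depth, focal, cx, baseline)
# Same pixel, but with depth overestimated by 10%.
noisy = reproject_x(u, true_depth * 1.1, focal, cx, baseline)
error_px = abs(noisy - exact)

print(f"exact={exact:.2f}px  noisy={noisy:.2f}px  error={error_px:.2f}px")
```

In this toy setup the reprojected column equals `u - focal * baseline / depth`, so any depth error turns directly into a pixel offset in the rendered conditioning frame, and the offset grows with the size of the camera motion.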

LaVR: Latent Space Conditioned Video Re-rendering using
Large 4D Reconstruction Models

Mingyang Xie¹,² Numair Khan¹ Tianfu Wang² Naina Dhingra¹ Seonghyeon Nam¹ Haitao Yang¹
Zhuo Hui¹ Christopher Metzler² Andrea Vedaldi¹,³ Hamed Pirsiavash¹,⁴ Lei Luo¹

¹Meta Reality Labs ²University of Maryland ³University of Oxford ⁴UC Davis

Paper arXiv

Quantitative Results

Our re-rendered videos exhibit better consistency and more accurate camera trajectories than baselines'.
