PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Authors
Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon Y. Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman
Abstract
Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.
Concepts
The Big Picture
Imagine exploring a virtual garden. The roses look stunning: photorealistic petals, perfect lighting, gorgeous detail. You reach out to touch one. It does nothing. It just sits there, frozen, a beautiful but lifeless ornament.
This is the dirty secret of modern 3D graphics: we’ve gotten remarkably good at making things look real, but making them behave real is a different problem entirely.
The gap isn’t just aesthetic. For VR, robotics training, and interactive games, objects need to respond to forces the way real objects do, with the right stiffness, the right flex, the right jiggle. A rose bends differently than a rubber ball. A beanie hat deforms differently than a telephone cord. Capturing that variability means knowing the precise physical properties of each object: how stiff it is, how springy, how much it resists being pushed or pulled. And there’s no convenient database of stiffness values for every object you might want to simulate.
Researchers from MIT, Stanford, Columbia, and Cornell have found an elegant way around this bottleneck. Their system, PhysDreamer, teaches static 3D objects how to move, not by measuring material properties directly, but by borrowing physics intuition from AI video models that have absorbed millions of hours of how the real world behaves.
Key Insight: PhysDreamer extracts implicit physics knowledge from video generation models (which have learned how objects look when they move) and uses it to infer the material properties needed to simulate how objects should move under any new force.
How It Works
The core idea is almost philosophical: if a video model has watched enough footage of flowers bending in the wind, it knows, implicitly, what “flower-bending” looks like. PhysDreamer’s job is to extract that knowledge and translate it into the language of physics simulators.

The pipeline works in three stages:
-
Start with a static 3D object. PhysDreamer represents it using 3D Gaussian Splatting, a technique that models a scene as millions of tiny, fuzzy ellipsoids. The result is a photorealistic, mathematically flexible 3D representation that can be rendered from any viewpoint.
-
Ask the video model: “What would this look like if I poked it?” Given a still image and a force input, a pre-trained video diffusion model generates a plausible video of how the object might respond. This video becomes the supervision signal, a physics oracle describing what realistic motion looks like.
-
Optimize a physical material field to match those motions. PhysDreamer uses the Material Point Method (MPM), a physics simulator that tracks how materials deform when pushed or pulled, to model elastic deformation. A neural field encodes how stiff or springy each part of the object is. The system runs the simulation, renders it, compares it frame-by-frame to the video model’s output, then traces errors backward through the entire chain (from rendered image back through the simulator) to correct its material estimates. Repeat until the simulated physics matches the video-predicted physics.

The power of this approach is that every step is mathematically connected. Because both the renderer and the simulator allow errors to flow backward through their calculations, the team can compare actual rendered images directly rather than abstract motion descriptors. That’s a richer, more grounded signal.
Once the material field is estimated, it’s free. Apply any new force (a gust of wind, a finger push, a violent shake) and the physics simulator generates a physically consistent response from any camera angle. The video generation model is only needed during estimation, not at interaction time.
The team tested PhysDreamer on a deliberately diverse set of elastic objects: flowers, potted plants, a beanie hat, and a telephone cord. A user study comparing PhysDreamer against state-of-the-art alternatives found that participants consistently rated its synthesized interactions as more realistic, in both motion quality and visual fidelity.
Why It Matters
This work sits at an interesting intersection. It shows that expensive lab measurements aren’t required to characterize material properties, while also demonstrating a new use case for video generation models: not just creating content, but teaching physics.
Consider robotics, where simulation is only useful if simulated objects behave like their real counterparts. PhysDreamer offers a path toward automatically building physically accurate simulators from nothing more than images. For VR and AR, it could make the difference between environments that look immersive and environments that feel immersive. And in film and game production, artists could give static 3D scans interactive dynamics without manually tuning material parameters for every asset.
Open questions remain. The current system handles elastic materials, objects that deform and spring back. Extending to plastic deformation, fracture, or fluid dynamics would require different physics machinery. The approach also depends on video models having internalized the relevant physics, which may not hold for unusual or extreme materials. And like all neural-field methods, it requires per-object optimization, so it’s not yet real-time.
Bottom Line: PhysDreamer shows that video generation models are more than content factories. They’re repositories of implicit physical knowledge that can be distilled into simulation-ready material parameters, turning static 3D objects into physically interactive ones without a single lab measurement.
IAIFI Research Highlights
PhysDreamer connects generative AI with classical physics simulation, using video diffusion models as material priors that feed directly into differentiable MPM-based physics engines, tightly coupling learned and model-based approaches.
The work shows that video generation models implicitly encode material dynamics, and that this knowledge can be extracted through differentiable rendering for downstream simulation tasks.
By estimating spatially-varying elastic material fields from purely visual data, PhysDreamer contributes to the problem of inverse physics: inferring physical properties from observation rather than direct measurement.
Future work may extend PhysDreamer to non-elastic materials, real-time interaction, and broader object categories; the paper and project videos are available at physdreamer.github.io.
Original Paper Details
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
[arXiv:2404.13026](https://arxiv.org/abs/2404.13026)
["Tianyuan Zhang", "Hong-Xing Yu", "Rundi Wu", "Brandon Y. Feng", "Changxi Zheng", "Noah Snavely", "Jiajun Wu", "William T. Freeman"]
Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.