AI Feynman: a Physics-Inspired Method for Symbolic Regression

The Big Picture

Johannes Kepler spent four years and made 40 failed attempts before he realized that Mars traces an ellipse around the sun. He had the data, precise astronomical tables compiled by Tycho Brahe, but extracting the underlying equation from raw numbers was brutally hard. Today, scientists face the same challenge millions of times over, staring at experimental data and asking: what formula is hiding in here?

This is symbolic regression, the task of discovering a mathematical expression that accurately matches a dataset. Not just a curve that fits the data, but the actual equation, written in symbols, that could appear in a textbook. It’s fundamentally different from what most machine learning does.

A neural network that predicts planetary positions with 99.9% accuracy is useful. But it doesn’t tell you the orbit is an ellipse. Kepler’s law, written in four symbols, does.

The trouble is that the space of possible mathematical expressions grows exponentially with length. There are more candidate formulas than atoms in the observable universe, so brute force alone cannot cover it. Researchers at MIT, led by Silviu-Marian Udrescu and Max Tegmark, took a different route: instead of searching blindly through that exponential space, they asked what physicists know about how equations tend to behave, and built those insights directly into an algorithm called AI Feynman.

Key Insight: By embedding physics-inspired heuristics (symmetry detection, dimensional analysis, separability) into a recursive algorithm guided by neural networks, AI Feynman discovered all 100 equations from the Feynman Lectures on Physics and improved the state-of-the-art success rate on a harder benchmark from 15% to 90%.

How It Works

The core observation driving AI Feynman is that the equations physicists care about aren’t random. They have structure. They respect units. They decompose into simpler pieces. They exhibit symmetry. AI Feynman encodes six such properties into a recursive algorithm that chips away at complex equations by exploiting whichever simplifications apply.

The algorithm works like this:

Dimensional analysis first. If the variables have known physical units, the algorithm applies the Buckingham Pi theorem, a rule from physics that lets you combine variables into unit-free ratios, reducing the number of independent variables you need to track. Newton’s law of gravity, with 9 variables, can collapse to 6 such ratios. Fewer variables means a dramatically simpler search.
Neural network fitting is the algorithmic workhorse. A standard feedforward neural network is trained on the mystery data. The network itself isn’t the answer; it’s a probe. Once trained, the algorithm uses it to test for hidden structure.
Symmetry detection uses the trained network to check whether the function remains unchanged when variables are shifted or scaled. If adding the same constant to both $x_1$ and $x_2$ doesn’t change the output, then the two variables enter the formula only through their difference $x_1 - x_2$, and two variables collapse into one. This kind of translational symmetry detection can recursively strip variables from the problem.
Separability detection checks whether the function factors into a product or sum of two parts with no shared variables. If $f(x_1, x_2, x_3) = g(x_1) \cdot h(x_2, x_3)$, the problem splits in two. The algorithm tests this by checking whether the network’s partial derivatives respect a factored structure.
Polynomial fitting handles the case where the function, or a simplified sub-function, is a polynomial. This reduces to solving a linear system: fast and exact.
Brute-force symbolic search is the last resort for small, simple sub-expressions: try all formulas up to some length using a library of elementary functions.
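
Three of the steps above (translational symmetry, multiplicative separability, and polynomial fitting) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `probe` is a hypothetical closed-form stand-in for the trained network, all function names are invented for this example, and the separability check evaluates the function at recombined points (one simple way to test factorization) rather than inspecting derivatives. A real trained network is only approximate, so the tolerances would need to be much looser.

```python
import numpy as np
from itertools import combinations_with_replacement

def probe(x):
    """Hypothetical stand-in for the trained network: f = (x1 - x2)^2 * exp(x3)."""
    return (x[:, 0] - x[:, 1]) ** 2 * np.exp(x[:, 2])

def has_translational_symmetry(f, x, i, j, shift=0.5, tol=1e-8):
    """True if adding the same constant to columns i and j leaves f unchanged."""
    shifted = x.copy()
    shifted[:, [i, j]] += shift
    return bool(np.max(np.abs(f(shifted) - f(x))) < tol)

def is_multiplicatively_separable(f, x, left, tol=1e-8):
    """True if f(a, b) = g(a) * h(b), with a = x[:, left] and b the other columns.

    Checks f(a, b) * f(a0, b0) == f(a, b0) * f(a0, b) for a reference point (a0, b0).
    """
    ref = x[:1]                               # reference row (a0, b0)
    a_b0 = np.repeat(ref, len(x), axis=0)     # start from (a0, b0) in every row
    a_b0[:, left] = x[:, left]                # now (a, b0)
    a0_b = x.copy()
    a0_b[:, left] = ref[:, left]              # now (a0, b)
    gap = f(x) * f(ref) - f(a_b0) * f(a0_b)
    return bool(np.max(np.abs(gap)) < tol)

def fit_polynomial(x, y, degree):
    """Least-squares fit over all monomials up to `degree` (a linear system)."""
    n_vars = x.shape[1]
    terms = [t for d in range(degree + 1)
             for t in combinations_with_replacement(range(n_vars), d)]
    # Each column is one monomial; the empty tuple gives the constant term.
    design = np.column_stack([np.prod(x[:, list(t)], axis=1) for t in terms])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return terms, coef

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=(200, 3))

sym_12 = has_translational_symmetry(probe, x, 0, 1)   # x1, x2 shifted together
sym_13 = has_translational_symmetry(probe, x, 0, 2)   # x1, x3 shifted together
sep_3 = is_multiplicatively_separable(probe, x, [2])  # split off x3
sep_1 = is_multiplicatively_separable(probe, x, [0])  # split off x1
terms, coef = fit_polynomial(x[:, :2], 2 + 3 * x[:, 0] * x[:, 1], degree=2)
print(sym_12, sym_13, sep_3, sep_1)
print(dict(zip(terms, np.round(coef, 6))))
```

For this stand-in function, the pair $(x_1, x_2)$ passes the symmetry test and $x_3$ splits off as a separate factor, so a three-variable problem reduces to two one-variable problems. That is the kind of reduction the recursive algorithm is looking for at each step.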

Newton’s gravitational law illustrates how these steps chain together. Starting with 9 variables, dimensional analysis reduces the problem to 6 unit-free combinations. The neural network then detects two translational symmetries (the force depends only on differences of coordinates, not absolute positions), dropping the count to 4 variables. Multiplicative separability splits the 4-variable problem into two smaller ones. Each gets solved independently, one by polynomial fitting after a simple inversion. The original 9-variable problem is cracked without ever searching through formulas with 9 arguments.
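
The first reduction in this chain, from 9 variables to 6, can be checked with a small unit-exponent matrix. The sketch below assumes the nine inputs are the gravitational constant, two masses, and six Cartesian coordinates (the variable names are labels chosen for this example), and it counts the independent unit-free combinations as the number of variables minus the rank of the matrix, as the Buckingham Pi theorem prescribes.

```python
import numpy as np

# Assumed inputs: G, two masses, and the coordinates of both bodies.
names = ["G", "m1", "m2", "x1", "x2", "y1", "y2", "z1", "z2"]

# Rows hold the exponents of metre, kilogram, second for each input.
units = np.array([
    [ 3, 0, 0, 1, 1, 1, 1, 1, 1],   # metre
    [-1, 1, 1, 0, 0, 0, 0, 0, 0],   # kilogram
    [-2, 0, 0, 0, 0, 0, 0, 0, 0],   # second
], dtype=float)

rank = int(np.linalg.matrix_rank(units))
n_dimensionless = len(names) - rank          # Buckingham Pi count

# Null-space basis from the SVD: each row is a set of exponents whose
# product of variables carries no units at all.
_, _, vt = np.linalg.svd(units)
null_basis = vt[rank:]

# The force has units kg * m / s^2; find exponents that reproduce them,
# so that dividing the force by that product makes the target unit-free.
force_units = np.array([1.0, 1.0, -2.0])
p, *_ = np.linalg.lstsq(units, force_units, rcond=None)

print(rank, n_dimensionless)
print(np.allclose(units @ null_basis.T, 0.0), np.allclose(units @ p, force_units))
```

With three base units and nine inputs, the rank is 3 and six unit-free combinations remain, matching the count quoted above. The particular basis returned by the SVD is not unique; any invertible mixing of its rows spans the same set of unit-free combinations.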

Why It Matters

AI Feynman recovers all 100 equations from the Feynman Lectures on Physics. The previous best publicly available software, Eureqa (based on genetic algorithms), found only 71. On a harder test set of physics-based equations, the gap widens: 90% success versus 15%. That’s a 6x improvement, not a marginal gain.

But what matters more is how the algorithm wins. AI Feynman shows that the right way to bring AI into physics isn’t to throw a generic optimizer at the problem and hope it converges. It’s to encode what physicists already know (that real equations have units, symmetries, and compositional structure) and let the AI search within that constrained, meaningful space.

Neural networks here aren’t black-box predictors. They’re scientific instruments for detecting hidden structure in data. The trained network is interrogated, not trusted: does the function have a symmetry? Does it factorize? The answers guide the decomposition. This is a different kind of human-AI collaboration, where physical intuition sets the constraints and machine learning does the grunt work.

The same idea could apply wherever underlying laws might be compact and structured, even when we don’t yet know what they are: materials science, biology, fluid dynamics.

Bottom Line: AI Feynman doesn’t just fit data better. It uses physics-inspired tricks to recursively decompose hard symbolic regression problems into solvable pieces, achieving a 6x improvement over previous methods on challenging benchmarks.

IAIFI Research Highlights

Interdisciplinary Research Achievement
AI Feynman sits squarely at the intersection of machine learning and theoretical physics, encoding symmetry, dimensional analysis, and separability into a neural network-guided symbolic regression engine that recovers real physics equations from data.

Impact on Artificial Intelligence
The work resets expectations for symbolic regression, improving success rates from 15% to 90% on hard benchmarks by putting physics-inspired recursive decomposition, guided by neural network probes, ahead of blind brute-force search.

Impact on Fundamental Interactions
By automatically rediscovering equations from the Feynman Lectures, including multi-variable laws like Newton's gravitation, the method opens a concrete path toward machine-assisted discovery of physical laws from experimental data.

Outlook and References
Future extensions include handling noisy data, larger equation spaces, and applications to open problems where governing equations remain unknown. The work was published in *Science Advances* (2020) and the AI Feynman benchmark dataset is publicly available. See [arXiv:1905.11481](https://arxiv.org/abs/1905.11481).

Original Paper Details

Title
AI Feynman: a Physics-Inspired Method for Symbolic Regression

arXiv ID
[1905.11481](https://arxiv.org/abs/1905.11481)

Authors
Silviu-Marian Udrescu, Max Tegmark
