E Pluribus Unum Ex Machina: Learning from Many Collider Events at Once
Authors
Benjamin Nachman, Jesse Thaler
Abstract
There have been a number of recent proposals to enhance the performance of machine learning strategies for collider physics by combining many distinct events into a single ensemble feature. To evaluate the efficacy of these proposals, we study the connection between single-event classifiers and multi-event classifiers under the assumption that collider events are independent and identically distributed (IID). We show how one can build optimal multi-event classifiers from single-event classifiers, and we also show how to construct multi-event classifiers such that they produce optimal single-event classifiers. This is illustrated for a Gaussian example as well as for classification tasks relevant for searches and measurements at the Large Hadron Collider. We extend our discussion to regression tasks by showing how they can be phrased in terms of parametrized classifiers. Empirically, we find that training a single-event (per-instance) classifier is more effective than training a multi-event (per-ensemble) classifier, at least for the cases we studied, and we relate this fact to properties of the loss function gradient in the two cases. While we did not identify a clear benefit from using multi-event classifiers in the collider context, we speculate on the potential value of these methods in cases involving only approximate independence, as relevant for jet substructure studies.
Concepts
The Big Picture
Imagine you’re a detective trying to identify a suspect. You could analyze one fingerprint carefully, or you could dump 50 fingerprints into a blender, study the smeared result, and reason backward to the individual. The blender approach sounds absurd. Yet that’s roughly the logic behind a growing class of machine learning proposals for particle physics: take multiple collision events from the Large Hadron Collider (LHC), merge them into a single combined snapshot, and feed them to a neural network. More data at once must be better, right?
At the LHC, protons smash together millions of times per second. Each collision sprays a unique fingerprint of particles across the detector. Physicists have been racing to apply deep learning to these events, whether to hunt for new particles, measure known ones with higher precision, or study the structure of matter itself. A handful of recent papers proposed that by processing batches of events simultaneously, machine learning models might extract patterns invisible to single-event analysis. Surely a network that sees 100 events at once can pick up on statistical regularities that a one-at-a-time approach would miss?
Benjamin Nachman and Jesse Thaler set out to test this rigorously. Their answer is clean and definitive.
Key Insight: When collider events are statistically independent (which they are, to an excellent approximation), there is no informational advantage to training a multi-event classifier over a single-event classifier. The two approaches are provably equivalent, and in practice, the single-event approach wins.
How It Works
The result turns on a concept called IID (independent and identically distributed), meaning each collision is drawn from the same rulebook of physics, with no memory of previous collisions. No collision “knows about” any other. At the LHC, this is essentially exact: the probability of observing a batch of collisions is simply the product of each individual collision’s probability.
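As a minimal numerical sketch of that factorization (using a toy Gaussian density in place of a real collider distribution), the joint likelihood of an IID batch is just the product of per-event likelihoods, or equivalently a sum in log space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-event probability density: a standard normal stands in
# for the true physics distribution of a collision observable
def p_event(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

batch = rng.normal(size=5)  # five independent "events"

# IID ==> joint probability of the batch = product of per-event densities
p_joint = np.prod(p_event(batch))

# Equivalent in log space: sum of per-event log-likelihoods
log_p_joint = np.sum(np.log(p_event(batch)))

assert np.isclose(p_joint, np.exp(log_p_joint))
```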
That factorization has a direct consequence for classifiers. The likelihood ratio, a score capturing how much more or less likely an event is under the “signal” hypothesis than the “background” hypothesis, is the key quantity any optimal classifier must learn. For a set of N events, this ratio is just the product of per-event likelihood ratios. So:
- An optimal per-ensemble classifier can always be built from an optimal per-instance classifier (take the product of single-event scores)
- Conversely, a per-ensemble classifier can be designed to recover an optimal per-instance classifier as a byproduct
Neither direction requires training on ensembles. The information content is identical. The paper’s title, E Pluribus Unum Ex Machina (“from many, one, by machine”), is a wry nod to this equivalence: the machine can extract the one from the many, or the many from the one, with no fundamental loss.
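The first direction can be sketched concretely. Below, a closed-form toy function stands in for a trained per-instance classifier (the Gaussian means and event values are illustrative, not from the paper): a classifier trained with a standard loss approximates f(x) = p_sig(x) / (p_sig(x) + p_bkg(x)), the monotonic map f / (1 - f) recovers the per-event likelihood ratio, and the IID product gives the optimal N-event score:

```python
import numpy as np

# Stand-in for a trained per-instance classifier:
# f(x) ~ p_sig(x) / (p_sig(x) + p_bkg(x)) for two unit-width Gaussians
def f_instance(x, mu_sig=1.0, mu_bkg=-1.0):
    p_sig = np.exp(-(x - mu_sig)**2 / 2)
    p_bkg = np.exp(-(x - mu_bkg)**2 / 2)
    return p_sig / (p_sig + p_bkg)

def ensemble_likelihood_ratio(events):
    # Per-event likelihood ratio via the monotonic map f / (1 - f)
    lr = f_instance(events) / (1.0 - f_instance(events))
    # IID ==> the ensemble likelihood ratio is the product of per-event ratios
    return np.prod(lr)

events = np.array([0.3, 1.2, -0.5, 0.8])
score = ensemble_likelihood_ratio(events)  # optimal N-event classifier score
```

For this Gaussian toy the per-event ratio reduces to exp(2x), so the ensemble score is exp(2 Σx), which makes the product structure easy to verify by hand.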

To move beyond theory, Nachman and Thaler ran empirical comparisons on three benchmark tasks: a simple two-Gaussian classification problem, a dijet resonance search (looking for a new particle decaying to two jets of hadrons), and a top quark mass measurement (extracting the mass of the heaviest known elementary particle). In each case, they compared models trained on individual events against models trained on ensembles.
Per-instance training won across the board. The reason comes down to gradient quality: when you train on ensembles, each individual event’s contribution gets diluted across all other events in the batch. The gradient signal telling the network “this feature matters” weakens when averaged over many events simultaneously. Training on single events keeps that signal sharp.
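A toy calculation illustrates the dilution. Here the ensemble prediction is modeled as a simple average of N identical per-event scores under a binary cross-entropy loss; this averaging scheme is an illustrative stand-in, not the paper's exact setup, but it shows how the chain rule spreads the gradient across the batch:

```python
def bce_grad_wrt_score(s, y):
    # d/ds of binary cross-entropy -[y*log(s) + (1-y)*log(1-s)]
    return -(y / s) + (1 - y) / (1 - s)

s = 0.6  # a per-event classifier score
y = 1.0  # true label

# Per-instance training: the event receives its full gradient
g_instance = bce_grad_wrt_score(s, y)

# Per-ensemble training (toy aggregation: prediction = mean of N scores):
# the chain rule through the mean dilutes each event's gradient by 1/N
N = 100
g_ensemble_per_event = bce_grad_wrt_score(s, y) / N

assert abs(g_ensemble_per_event * N - g_instance) < 1e-12
```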

The analysis also covers regression tasks, not just “which class?” but “what is the value of a continuous parameter, like the top quark mass?” Regression can be recast as a parametrized classification problem, and per-instance approaches remain at least as effective as per-ensemble ones here too.
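In schematic form (generic symbols, not the paper's exact notation), the standard likelihood-ratio trick underlies this recasting: a classifier \(f(x,\theta)\) trained to separate data generated at parameter value \(\theta\) from a fixed reference sample learns

```latex
f(x,\theta) \approx \frac{p(x\,|\,\theta)}{p(x\,|\,\theta) + p_{\mathrm{ref}}(x)}
\quad\Longrightarrow\quad
\frac{f(x,\theta)}{1 - f(x,\theta)} \approx \frac{p(x\,|\,\theta)}{p_{\mathrm{ref}}(x)},
```

so the regression estimate is the maximum-likelihood value over the per-instance log-ratios,

```latex
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N}
\log \frac{f(x_i,\theta)}{1 - f(x_i,\theta)}.
```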
Why It Matters
Several multi-event proposals in the literature claimed advantages over single-event methods, backed by compelling empirical results. Nachman and Thaler don’t dispute those results. But they show that any such advantage can be matched by a carefully constructed single-event approach. This saves physicists from building complex architectures that process event ensembles when simpler per-instance networks would do as well or better.
The paper also identifies exactly where multi-event methods might genuinely help: situations where the IID assumption breaks down. The clearest example is jet substructure, the internal structure of jets, the collimated sprays of particles produced when high-energy quarks and gluons fragment into hadrons. Inside a jet, successive particle emissions are not fully independent. They're correlated by the underlying QCD (quantum chromodynamics) radiation pattern, the rules governing how quarks and gluons shed energy as they fly apart. Ensemble approaches that capture those collective correlations might have real advantages that no single-emission classifier can replicate. The authors leave this as an open question for future work.
The framework itself, translating between per-instance and per-ensemble learning and recasting regression as parametrized classification, is also useful beyond collider physics. It applies anywhere datasets consist of independent draws from parametrized distributions.
Bottom Line: Training a single-event classifier and combining scores is not just equivalent to training a multi-event classifier; it’s actually more effective in practice. The mathematical structure of IID collider data makes the ensemble approach redundant, at least until correlations enter the picture.
IAIFI Research Highlights
This work applies statistical theory to a live machine learning challenge in experimental particle physics, proving a formal equivalence between single-event and multi-event classifiers that directly guides how LHC analysis pipelines are designed.
The paper provides a concrete example where complex, set-based neural architectures offer no benefit over standard per-instance classifiers. For the broader ML community, it's a useful reminder that ensemble inputs can dilute gradients rather than add signal.
By clarifying the optimal strategy for classifying and measuring particle collision events, this research sharpens the tools physicists use to search for new particles and measure Standard Model parameters at the LHC.
Future work may uncover genuine advantages for multi-event methods in jet substructure studies, where particle emissions are approximately but not exactly independent; the full analysis is available at [arXiv:2101.07263](https://arxiv.org/abs/2101.07263).