EPiC-GAN: Equivariant Point Cloud Generation for Particle Jets
Authors
Erik Buhmann, Gregor Kasieczka, Jesse Thaler
Abstract
With the vast data-collecting capabilities of current and future high-energy collider experiments, there is an increasing demand for computationally efficient simulations. Generative machine learning models enable fast event generation, yet so far these approaches are largely constrained to fixed data structures and rigid detector geometries. In this paper, we introduce EPiC-GAN - equivariant point cloud generative adversarial network - which can produce point clouds of variable multiplicity. This flexible framework is based on deep sets and is well suited for simulating sprays of particles called jets. The generator and discriminator utilize multiple EPiC layers with an interpretable global latent vector. Crucially, the EPiC layers do not rely on pairwise information sharing between particles, which leads to a significant speed-up over graph- and transformer-based approaches with more complex relation diagrams. We demonstrate that EPiC-GAN scales well to large particle multiplicities and achieves high generation fidelity on benchmark jet generation tasks.
Concepts
The Big Picture
Imagine trying to photograph a fireworks show, except each burst produces a different number of sparks, traveling in different directions, at a different rate every single time. Now imagine you need to simulate millions of those shows to design better cameras. That’s roughly the challenge facing physicists at the Large Hadron Collider, where proton collisions spray out cascades of particles called jets, and no two jets are quite alike.
These collisions produce enormous volumes of data, and the analyses built on them need equally enormous simulation budgets. The most accurate tool physicists have is Monte Carlo simulation, modeling particle interactions one random virtual collision at a time. Accurate, but slow. With the upcoming high-luminosity LHC upgrade promising ten times more collisions per second, that bottleneck is about to get far worse.
Generative machine learning models offer a promising shortcut: train a neural network on what jets look like, then generate synthetic jets at graphics-card speed. The catch is that most generative models expect fixed-size, tidy inputs like images or spreadsheets. Jets aren’t tidy. They’re clouds of points, variable in number, with no natural ordering.
A new paper from researchers at the University of Hamburg and MIT’s Center for Theoretical Physics introduces EPiC-GAN (Equivariant Point Cloud Generative Adversarial Network), which handles variable-size particle clouds with high fidelity and at a speed that scales far better than competing methods.
Key Insight: EPiC-GAN generates realistic particle jets of any size by treating them as unordered point clouds. It matches the best existing methods in fidelity while running significantly faster than graph-based alternatives.
How It Works
The core idea is a building block called an EPiC layer, a computational unit that transforms both individual particles and the jet as a whole without ever directly comparing particles to each other.

Rather than having every particle “talk” to every other particle (which gets expensive fast), each EPiC layer routes communication through a shared global summary of the whole jet. The layer works in two steps:
- Global update: Pool information from all particles using a sum and mean, then use that aggregate to update the global attribute vector g, a compact representation of the entire jet’s state.
- Point update: Update each individual particle using the new global vector and the particle’s own attributes, broadcasting jet-level context back down to every constituent.
This design is permutation equivariant, meaning the result is identical regardless of what order the particles are listed in. Because no particle needs to inspect any other directly, computational cost scales linearly with particle count. Compare that to message-passing networks, where every particle checks every other, or transformer architectures (the design underlying large language models), both of which scale quadratically. At 150 particles per jet, that difference is enormous.
The GAN stacks multiple EPiC layers in both a generator and a discriminator. The generator starts from random noise (a global vector plus per-particle vectors) and progressively refines them until they resemble real jets. The discriminator does the reverse, ingesting a jet point cloud and trying to tell synthetic from real. The two networks train against each other in the standard adversarial fashion.
Why It Matters

The team benchmarked EPiC-GAN on the JetNet dataset, the standard evaluation suite for point-cloud jet generation. JetNet30 contains jets with up to 30 particles; JetNet150 scales that up fivefold. On both, EPiC-GAN matches or exceeds the fidelity of MP-GAN (message-passing GAN), which relies on full pairwise particle interactions. Momentum distributions, angular spread, jet mass, and Wasserstein distance (a measure of how closely two probability distributions match) all come out right.
The real story is timing. At 150 particles per jet, EPiC-GAN generates samples roughly five times faster than MP-GAN. At larger jet sizes, the kind that will matter for future calorimeter simulations, the gap grows wider still. Quadratic scaling cripples graph networks as jets grow; linear scaling keeps EPiC-GAN fast.
There’s an unexpected bonus, too. The shared summary vector in each EPiC layer turns out to be interpretable: individual components of g correlate with known physical quantities like jet mass and transverse momentum. Most deep generative models are black boxes, so this is a welcome exception.
The applications go well beyond jets. EPiC-GAN could speed up calorimeter simulation (modeling how particle showers deposit energy in detectors, one of the most expensive computations in collider physics), background modeling for anomaly detection searches, and tuning of Monte Carlo simulations for systematic uncertainties. Any physics problem involving variable-size point clouds is a natural fit.
Bottom Line: You don’t need expensive pairwise interactions to generate high-fidelity particle jets. By routing information through a shared global vector instead of direct particle-to-particle messages, EPiC-GAN achieves competitive accuracy at a fraction of the computational cost and scales gracefully to the larger jet sizes that next-generation collider experiments will demand.
IAIFI Research Highlights
EPiC-GAN connects deep set theory from machine learning with the physics of particle jets, using permutation equivariance to enforce physical symmetries that make the generative model both more principled and more efficient.
The EPiC layer is a new approach to generative modeling: a global latent vector that enables inter-point communication without pairwise edges, offering a scalable design pattern for point cloud generation in and beyond physics.
By enabling fast, high-fidelity jet simulation at variable multiplicity, EPiC-GAN addresses a real computational bottleneck facing the HL-LHC program and future collider experiments that will require billions of simulated events for precision analyses.
Future directions include full calorimeter shower simulation and anomaly detection background generation; the work is available at [arXiv:2301.08128](https://arxiv.org/abs/2301.08128) with open-source code on GitHub.
Original Paper Details
EPiC-GAN: Equivariant Point Cloud Generation for Particle Jets
2301.08128
["Erik Buhmann", "Gregor Kasieczka", "Jesse Thaler"]
With the vast data-collecting capabilities of current and future high-energy collider experiments, there is an increasing demand for computationally efficient simulations. Generative machine learning models enable fast event generation, yet so far these approaches are largely constrained to fixed data structures and rigid detector geometries. In this paper, we introduce EPiC-GAN - equivariant point cloud generative adversarial network - which can produce point clouds of variable multiplicity. This flexible framework is based on deep sets and is well suited for simulating sprays of particles called jets. The generator and discriminator utilize multiple EPiC layers with an interpretable global latent vector. Crucially, the EPiC layers do not rely on pairwise information sharing between particles, which leads to a significant speed-up over graph- and transformer-based approaches with more complex relation diagrams. We demonstrate that EPiC-GAN scales well to large particle multiplicities and achieves high generation fidelity on benchmark jet generation tasks.