Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample

The Big Picture

Imagine trying to understand a city’s population by counting how many buildings stand within a mile of each other. You’d miss the rich neighborhoods clustered near the park, the sparse suburbs, the dense downtown core.

For decades, cosmologists have measured galaxies using the two-point correlation function, which counts how often galaxy pairs appear at different separations. It works well, but it treats dense cosmic filaments and empty voids as equally informative. They’re not.

The universe isn’t smooth. After 13.8 billion years of gravitational collapse, matter has clumped into a vast cosmic web of filaments, sheets, clusters, and enormous empty voids. All that structure encodes secrets about dark matter, dark energy, and the growth rate of the universe. Classic two-point statistics handle large scales well but fall short at small scales, where gravity creates complex, tangled structures that resist clean mathematical description. Cosmologists have been leaving information on the table.

A team led by Enrique Paillas, including IAIFI’s Carolina Cuesta-Lazaro, found a way around this limitation. They combined density-split clustering with machine-learning emulators to pin down the universe’s fundamental properties more precisely than before from the BOSS CMASS galaxy survey.

Key Insight: By splitting the galaxy field into regions of different local density and measuring clustering statistics in each environment separately, researchers capture non-Gaussian information that standard two-point analyses miss, squeezing more science from existing survey data.

How It Works

The idea is simple. Rather than treating all galaxies equally, the team first smoothed the galaxy density field and divided space into quintiles: five bins ranked by local galaxy density, from the emptiest voids to the most crowded clusters. Correlation functions measured in each environment carry distinct cosmological fingerprints.

To make this practically useful, they needed a theoretical model that could predict density-split correlation functions for any set of cosmological parameters without running a full N-body simulation (a computer model tracking how millions of particles attract and clump under gravity over cosmic time) every single time. Their solution: train neural network emulators on a library of high-fidelity mock galaxy catalogues. These emulators learn to interpolate across a vast parameter space, predicting clustering statistics in milliseconds rather than months.

The pipeline involves:

N-body simulations of dark matter clustering across a range of cosmological parameters
Halo occupation distribution (HOD) modeling, a statistical recipe for deciding which dark matter halos are massive or concentrated enough to host a galaxy, and how many
Forward modeling that translates theoretical predictions into telescope-observable signals
Corrections for redshift-space distortions (how galaxies’ velocities smear their apparent distances along the line of sight) and the Alcock-Paczynski effect (geometric distortions from assuming the wrong cosmology)
Emulators trained to predict both the standard two-point function and density-split statistics simultaneously

The galaxy sample, BOSS DR12 CMASS, covers roughly 650,000 massive galaxies at redshifts z ≈ 0.43–0.70 and is one of the richest datasets available for this kind of analysis. The team pushed measurements down to scales of just 1 h⁻¹ Mpc, deep into the regime where gravity creates messy, nonlinear structure but where their emulators remain accurate.

One important piece of the analysis was modeling assembly bias: whether galaxies’ tendency to occupy halos depends not just on halo mass but also on the surrounding environment. The team found genuine signatures. Galaxies prefer halos that are more concentrated or differently shaped depending on local density. It’s a subtle effect, but ignoring it would bias the cosmological results.

Why It Matters

The results land squarely in agreement with Planck 2018. The key numbers: ω_cdm = 0.1201 ± 0.0022 (cold dark matter density), σ_8 = 0.792 ± 0.034 (how clumpy matter is today), n_s = 0.970 ± 0.018 (the tilt of primordial fluctuations), and a growth rate fσ_8 = 0.462 ± 0.020 (how fast cosmic structure has been building). That agreement matters. The universe’s growth rate, as traced by these massive galaxies, is consistent with predictions from the CMB (the faint thermal glow left over from the Big Bang), easing some of the tensions that have dogged cosmology in recent years.

The team also tested whether any physics beyond standard ΛCDM (the current best model of the universe, built from ordinary matter, cold dark matter, and a constant dark energy) was hiding in the data. They varied the running of the spectral index, the dark energy equation of state, and the density of relic neutrinos. None showed statistically compelling deviations from the base model. A clean null result, but one that still puts meaningful constraints on exotic physics.

They validated their pipeline against mock catalogues 84 times larger than the actual CMASS volume and recovered unbiased constraints even at that scale. The next generation of galaxy surveys from DESI, Euclid, and the Vera Rubin Observatory will deliver exactly that kind of expanded dataset, and this methodology is ready for them.

Bottom Line: Density-split clustering, powered by machine-learning emulators, extracts cosmological information from scales and environments that traditional two-point analyses ignore. The technique is already validated for the much larger galaxy surveys arriving in the next decade.

IAIFI Research Highlights

Interdisciplinary Research Achievement
This work ties together AI and cosmology by deploying neural network emulators, trained on N-body simulations, to perform Bayesian inference on one of the largest galaxy surveys ever assembled. A problem that would be computationally prohibitive with brute-force simulation becomes tractable through learned surrogate models.

Impact on Artificial Intelligence
The emulator framework demonstrates how machine learning can stand in for expensive forward simulations in high-dimensional parameter inference, an approach transferable to any field that depends on simulation-based analysis.

Impact on Fundamental Interactions
By probing galaxy clustering at nonlinear scales through environment-dependent statistics, the analysis delivers competitive constraints on dark matter density, the growth rate of cosmic structure, and extensions to ΛCDM including dark energy and neutrino physics.

Outlook and References
With DESI and Euclid already collecting data, this pipeline, validated at 84× the CMASS volume, is positioned to sharpen cosmological constraints in the near future; the paper is available at [arXiv:2309.16541](https://arxiv.org/abs/2309.16541).

Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample

Authors

Abstract

Concepts

The Big Picture

How It Works

Why It Matters

IAIFI Research Highlights