Deep Learning + Deep Thinking = Deeper Understanding

Navigate the landscape of IAIFI research through summaries and connections.

574 papers · 150 Theoretical Physics · 70 Experimental Physics · 202 Foundational AI · 152 Astrophysics

Latest

Theoretical Physics Mar 2, 2026

Naturalness and Fisher Information

James Halverson, Thomas R. Harvey, Michael Nee

Fine-tuning and naturalness, the sensitivity of low-energy observables to small changes in the fundamental parameters of a theory, are cornerstones of physics beyond the Standard Model. We propose a new measure of fine-tuning based on information theory. To each point in parameter space we associate a probability distribution over observables. Divergence measures encode the sensitivity of observables to model parameters and determine a Riemannian metric on parameter space. By Chentsov's theorem, the physically motivated metric is the Fisher information metric, up to scaling. We propose a rescaled fine-tuning matrix $\mathcal{F}_{ij}$ derived from the Fisher information matrix, whose non-zero eigenvalues serve as our measure of fine-tuning. When the number of observables exceeds the number of parameters, $\mathcal{F}_{ij}$ admits a natural geometric interpretation as the pullback of the Euclidean metric from observable space to the submanifold of admissible predictions, with large eigenvalues corresponding to highly stretched directions and indicative of fine-tuning. Our measure reproduces the familiar Barbieri--Giudice criterion as a special case, while generalising it to multiple correlated parameters. We illustrate its behaviour on dimensional transmutation, the Wilson--Fisher fixed point, a simple model of the hierarchy problem, and the electron Yukawa coupling, finding agreement with physical intuition in each case.
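
A minimal numerical sketch of the geometric construction (our illustration, with a toy observable map and none of the paper's rescaling conventions for $\mathcal{F}_{ij}$): the Jacobian of the observables defines the pullback metric, whose large eigenvalues flag stretched, finely tuned directions.

```python
# Illustrative only: toy observable map, not the paper's F_ij conventions.
import jax
import jax.numpy as jnp

def observables(log_theta):
    """Toy map from two log-parameters to three observables."""
    a, b = log_theta
    return jnp.array([jnp.exp(a) - jnp.exp(b),   # a finely tuned difference
                      jnp.exp(a) + jnp.exp(b),
                      a * b])

log_theta = jnp.log(jnp.array([1.000, 0.999]))   # nearly degenerate parameter point
J = jax.jacfwd(observables)(log_theta)           # sensitivities d(obs)/d(log-params)
F = J.T @ J                                      # pullback of the Euclidean metric
print("fine-tuning eigenvalues:", jnp.linalg.eigvalsh(F))
```

With logarithmic observables as well, the single-parameter, single-observable case reduces to the squared Barbieri--Giudice sensitivity $(\partial \ln O / \partial \ln \theta)^2$.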

fisher information metric naturalness measure information geometry eigenvalue decomposition quantum field theory effective field theory renormalization standard model
Experimental Physics Feb 27, 2026

End-to-end Differentiable Calibration and Reconstruction for Optical Particle Detectors

Omar Alterkait, César Jesús-Valls, Ryo Matsumoto et al.

Large-scale homogeneous detectors with optical readouts are widely used in particle detection, with Cherenkov and scintillator neutrino detectors as prominent examples. Analyses in experimental physics rely on high-fidelity simulators to translate sensor-level information into physical quantities of interest. This task critically depends on accurate calibration, which aligns simulation behavior with real detector data, and on tracking, which infers particle properties from optical signals. We present the first end-to-end differentiable optical particle detector simulator, enabling simultaneous calibration and reconstruction through gradient-based optimization. Our approach unifies simulation, calibration, and tracking, which are traditionally treated as separate problems, within a single differentiable framework. We demonstrate that it achieves smooth and physically meaningful gradients across all key stages of light generation, propagation, and detection while maintaining computational efficiency. We show that gradient-based calibration and reconstruction greatly simplify existing analysis pipelines while matching or surpassing the performance of conventional non-differentiable methods in both accuracy and speed. Moreover, the framework's modularity allows straightforward adaptation to diverse detector geometries and target materials, providing a flexible foundation for experiment design and optimization. The results demonstrate the readiness of this technique for adoption in current and future optical detector experiments, establishing a new paradigm for simulation and reconstruction in particle physics.
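
As a flavor of what gradient-based joint calibration and reconstruction means in practice, here is a deliberately tiny sketch (a toy attenuation-plus-inverse-square light model of our own, not the paper's simulator), where a single gradient-descent loop recovers a vertex position and an attenuation length simultaneously:

```python
# Toy model (ours, not the paper's simulator): one optimizer jointly calibrates
# the attenuation length and reconstructs the vertex; the estimates should
# approach the injected truth of vertex (3, 4) and attenuation length 8.
import jax
import jax.numpy as jnp

sensors = jnp.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])

def log_charge(vertex, log_atten):
    d = jnp.linalg.norm(sensors - vertex, axis=1)
    return -d / jnp.exp(log_atten) - 2.0 * jnp.log(d)   # attenuation + inverse square

target = log_charge(jnp.array([3., 4.]), jnp.log(8.0))  # "data" from a known truth

def loss(params):
    vertex, log_atten = params
    return jnp.sum((log_charge(vertex, log_atten) - target) ** 2)

@jax.jit
def step(params):
    grads = jax.grad(loss)(params)
    return jax.tree_util.tree_map(lambda p, g: p - 0.05 * g, params, grads)

params = (jnp.array([4., 6.]), jnp.log(5.0))             # initial guesses
for _ in range(2000):
    params = step(params)
print("vertex:", params[0], "attenuation length:", jnp.exp(params[1]))
```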

detector simulation differentiable simulation calibration event reconstruction automatic differentiation
Foundational AI Feb 25, 2026

Asymptotically Fast Clebsch-Gordan Tensor Products with Vector Spherical Harmonics

YuQing Xie, Ameya Daigavane, Mit Kotak et al.

$E(3)$-equivariant neural networks have proven to be effective in a wide range of 3D modeling tasks. A fundamental operation of such networks is the tensor product, which allows interaction between different feature types. Because this operation scales poorly, there has been considerable work towards accelerating this interaction. However, recent work has pointed out that most speedups come from a reduction in expressivity rather than true algorithmic improvements on computing Clebsch-Gordan tensor products. A modification of the Gaunt tensor product can give a true asymptotic speedup but is incomplete and misses many interactions. In this work, we provide the first complete algorithm which truly provides asymptotic benefits for Clebsch-Gordan tensor products. For the full CGTP, our algorithm brings runtime complexity from the naive $O(L^6)$ to $O(L^4\log^2 L)$, close to the lower bound of $O(L^4)$. We first show how generalizing fast Fourier based convolution naturally leads to the previously proposed Gaunt tensor product. To remedy antisymmetry issues, we generalize from scalar signals to irrep-valued signals, giving us tensor spherical harmonics. We prove a generalized Gaunt formula for the tensor harmonics. Finally, we show that we only need up to vector-valued signals to recover the missing interactions of the Gaunt tensor product.
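
For reference, the operation being accelerated is the textbook Clebsch-Gordan tensor product, which couples two irreps of degrees $\ell_1$ and $\ell_2$ into every compatible output degree:

$$
V_{\ell_1} \otimes V_{\ell_2} \;\cong\; \bigoplus_{\ell_3=|\ell_1-\ell_2|}^{\ell_1+\ell_2} V_{\ell_3},
\qquad
(u \otimes v)^{(\ell_3)}_{m_3} \;=\; \sum_{m_1,m_2} C^{\ell_3 m_3}_{\ell_1 m_1\,\ell_2 m_2}\, u^{(\ell_1)}_{m_1}\, v^{(\ell_2)}_{m_2}.
$$

Evaluating this for all compatible degree triples up to a cutoff $L$ is what drives the naive cost quoted above.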

equivariant neural networks clebsch-gordan tensor product group theory vector spherical harmonics geometric deep learning
Experimental Physics Feb 19, 2026

Building an AI-native Research Ecosystem for Experimental Particle Physics: A Community Vision

Thea Klaeboe Aarrestad, Alaa Abdelhamid, Haider Abidi et al.

Experimental particle physics seeks to understand the universe by probing its fundamental particles and forces and exploring how they govern the large-scale processes that shape cosmic evolution. This whitepaper presents a vision for how Artificial Intelligence (AI) can accelerate discovery in this field. We outline grand challenges that must be addressed to enable transformative breakthroughs and describe how current and planned experimental facilities can implement this vision to advance our understanding of the vast and complex physical world from the smallest to the largest scales. We show how facilities currently under construction, such as the HL-LHC, DUNE, and soon the EIC, can both benefit from and serve as proving grounds for this vision, while also enabling a longer-term goal for how future experiments -- like FCC-ee at CERN, IceCube-Gen2, a Muon Collider in the U.S., and smaller to mid-scale projects -- can be fully AI-native. We describe how a truly national-scale collaboration, jointly managed across large funding partners, and involving both DOE laboratories and universities, can make this happen.

collider physics trigger systems event reconstruction detector simulation particle tracking

All Posts

Foundational AI Feb 17, 2026

Machine learning electronic structure and atomistic properties from the external potential

Jigyasa Nigam, Tess Smidt, Geneviève Dusson

Electronic structure calculations remain a major bottleneck in atomistic simulations and, not surprisingly, have attracted significant attention in machine learning (ML). Most existing approaches learn a direct map from molecular geometries, typically represented as graphs or encoded local environments, to molecular properties or use ML as a surrogate for electronic structure theory by targeting quantities such as Fock or density matrices expressed in an atomic orbital (AO) basis. Inspired by the Hohenberg-Kohn theorem, in this work, we propose an operator-centered framework in which the external (nuclear) potential, expressed in an AO basis, serves as the model input. From this operator, we construct hierarchical, body-ordered representations of atomic configurations that closely mirror the principles underlying several popular atom-centered descriptors. At the same time, the matrix-valued nature of the external potential provides a natural connection to equivariant message-passing neural networks. In particular, we show that successive products of the external potential provide a scalable route to equivariant message passing and enable an efficient description of long-range effects. We demonstrate that this approach can be used to model molecular properties, such as energies and dipole moments, from the external potential, or learn effective operator-to-operator maps, including mappings to the Fock matrix and the reduced density matrix from which multiple molecular observables can be simultaneously derived.

external potential representation equivariant neural networks operator-to-operator learning representation learning symmetry preservation
Astrophysics Feb 16, 2026

UV and Optical Signatures of Late-time Disk Instabilities in Tidal Disruption Events

Daichi Tsuna, V. Ashley Villar, Anthony L. Piro et al.

Tidal disruption events (TDEs) are unique probes of evolving accretion in supermassive black holes. Recent models of TDE disks show that they undergo brief thermal instabilities with temporary super-Eddington accretion at late times, which has been suggested as a possible explanation for the ubiquitous late radio emergence in TDEs. We model the ultraviolet (UV) and optical signatures of such disk instabilities, expected from the accretion power being reprocessed by the optically-thick outflow following super-Eddington accretion. Our model predicts brief UV-bright transients lasting for days, with luminosities of $10^{42}$-$10^{43}$ erg s$^{-1}$ in near-UV and $10^{41}$-$10^{42}$ erg s$^{-1}$ in optical for a typical TDE by a $10^6~M_\odot$ black hole. These could be detectable by near-future surveys such as ULTRASAT, the Vera C. Rubin Observatory, and the Argus Array, for TDEs at redshifts out to $\approx 0.1$. We further conduct a search for these transients in existing nearby TDEs using data from the Zwicky Transient Facility, placing upper limits on the flare rate for each TDE of $1$-$2$ yr$^{-1}$, depending on the outflow mass. In the era of future surveys, combined UV/optical and radio monitoring would provide an important test of the disk-instability phenomenon, as well as of its proposed explanation for the late-time radio emission in TDEs.

tidal disruption accretion super-eddington outflows transient light curve modeling multiwavelength monitoring phase transitions
Astrophysics Feb 10, 2026

The Landscape of Unstable Mass Transfer in Interacting Binaries and Its Imprint on the Population of Luminous Red Novae

Angela A. G. Twum, Alejandro Vigna-Gómez, Morgan MacLeod et al.

A common-envelope (CE) phase occurs when a star engulfs its companion and is widely considered the primary channel for producing Luminous Red Novae (LRNe). In this study, we combine binary-population synthesis with stellar-evolution calculations to systematically estimate the mass, velocity, and launching radius of ejecta produced during coalescence across a range of binary configurations. Our aim is to quantify how unstable mass-transfer dynamics in binaries at various evolutionary stages shape CE outcomes, enabling a predictive framework for modeling the LRN luminosity function. We find a bimodal distribution of plateau luminosities with significant implications for binary mass stability criteria that can be tested with forthcoming LSST observations. This bimodality emerges from differing mass-ejection outcomes during common-envelope interactions, which can lead either to stellar mergers, often accompanied by tidal disruption of the companion, or to successful envelope ejection. Although our predicted plateau luminosities and timescales broadly match existing observations, the models underpredict the number of LRNe with long-duration plateaus ($t_p \gtrsim 100\, \text{d}$) by about a third. We propose that these long-duration events arise from highly extended progenitors whose envelopes are ejected over multiple orbits (i.e., non-impulsively), producing relatively faint, long-lived transients. By constraining ejecta properties and incorporating pre-outburst progenitor imaging, we show how our models can clarify the physical processes that drive unstable mass transfer in these events. Finally, we argue that common-envelope interactions involving white-dwarf accretors can yield exotic outcomes, including red giants containing embedded white dwarfs that resemble Thorne-Żytków objects (TŻOs), along with calcium-rich supernovae that preserve hydrogen envelopes.

common-envelope evolution stellar evolution binary population synthesis luminous red novae simulation-based inference
Theoretical Physics Feb 9, 2026

Predicting magnetism with first-principles AI

Max Geier, Liang Fu

Computational discovery of magnetic materials remains challenging because magnetism arises from the competition between kinetic energy and Coulomb interaction that is often beyond the reach of standard electronic-structure methods. Here we tackle this challenge by directly solving the many-electron Schrödinger equation with neural-network variational Monte Carlo, which provides a highly expressive variational wavefunction for strongly correlated systems. Applying this technique to transition metal dichalcogenide moiré semiconductors, we predict itinerant ferromagnetism in WSe$_2$/WS$_2$ and an antiferromagnetic insulator in the twisted $\Gamma$-valley homobilayer, using the same neural network without any physics input beyond the microscopic Hamiltonian. Crucially, both types of magnetic states are obtained from a single calculation within the $S_z=0$ sector, removing the need to compute and compare multiple $S_z$ sectors. This significantly reduces computational cost and paves the way for faster and more reliable magnetic material design.
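
To make the method concrete, here is a minimal sketch of the variational Monte Carlo ingredient (a one-dimensional toy wavefunction of our own, nothing like the paper's network or Hamiltonian): a parameterized $\log|\psi|$ and its local-energy estimator, differentiated automatically.

```python
# One-dimensional toy (not the paper's network or Hamiltonian): a
# parameterized log|psi| and the VMC local energy  E_loc = -psi''/(2 psi) + V.
import jax
import jax.numpy as jnp

def log_psi(params, x):
    w, b, a = params                             # tiny "network": one tanh feature
    return -a * x**2 + jnp.tanh(w * x + b)

dlogp = jax.grad(log_psi, argnums=1)
d2logp = jax.grad(dlogp, argnums=1)

def local_energy(params, x):
    # psi''/psi = (log psi)'' + ((log psi)')^2
    kinetic = -0.5 * (d2logp(params, x) + dlogp(params, x) ** 2)
    return kinetic + 0.5 * x**2                  # harmonic trap V(x) = x^2/2

params = (0.3, 0.1, 0.45)
xs = jax.random.normal(jax.random.PRNGKey(0), (4096,))
# NB: a real VMC draws xs from |psi|^2 via MCMC; the normal samples here make
# this average purely illustrative.
print("energy estimate:", jax.vmap(lambda x: local_energy(params, x))(xs).mean())
```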

neural wavefunction quantum states equivariant neural networks itinerant magnetism symmetry preservation
Astrophysics Feb 8, 2026

Dynamic Black-hole Emission Tomography with Physics-informed Neural Fields

Berthy T. Feng, Andrew A. Chael, David Bromley et al.

With the success of static black-hole imaging, the next frontier is the dynamic and 3D imaging of black holes. Recovering the dynamic 3D gas near a black hole would reveal previously-unseen parts of the universe and inform new physics models. However, only sparse radio measurements from a single viewpoint are possible, making the dynamic 3D reconstruction problem significantly ill-posed. Previously, BH-NeRF addressed the ill-posed problem by assuming Keplerian dynamics of the gas, but this assumption breaks down near the black hole, where the strong gravitational pull of the black hole and increased electromagnetic activity complicate fluid dynamics. To overcome the restrictive assumptions of BH-NeRF, we propose PI-DEF, a physics-informed approach that uses differentiable neural rendering to fit a 4D (time + 3D) emissivity field given EHT measurements. Our approach jointly reconstructs the 3D velocity field with the 4D emissivity field and enforces the velocity as a soft constraint on the dynamics of the emissivity. In experiments on simulated data, we find significantly improved reconstruction accuracy over both BH-NeRF and a physics-agnostic approach. We demonstrate how our method may be used to estimate other physics parameters of the black hole, such as its spin.
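
The soft physics constraint described above can be sketched as follows (placeholder MLPs and weights, not the PI-DEF architecture): the advection residual $\partial_t e + v\cdot\nabla e$ of the emissivity field is computed by automatic differentiation and driven toward zero alongside the data fit.

```python
# Placeholder MLPs and weights (not PI-DEF itself): the advection residual
# de/dt + v . grad(e) is computed with autodiff and used as a soft penalty.
import jax
import jax.numpy as jnp

def init(key, d_in, d_h, d_out):
    k1, k2 = jax.random.split(key)
    return {"W1": jax.random.normal(k1, (d_h, d_in)) / jnp.sqrt(d_in),
            "b1": jnp.zeros(d_h),
            "W2": jax.random.normal(k2, (d_out, d_h)) / jnp.sqrt(d_h),
            "b2": jnp.zeros(d_out)}

def mlp(p, inp):
    return p["W2"] @ jnp.tanh(p["W1"] @ inp + p["b1"]) + p["b2"]

def emissivity(p, xyz, t):        # scalar field e(x, t)
    return mlp(p, jnp.concatenate([xyz, jnp.array([t])]))[0]

def velocity(p, xyz):             # 3-vector field v(x)
    return mlp(p, xyz)

def advection_residual(p_e, p_v, xyz, t):
    de_dt = jax.grad(emissivity, argnums=2)(p_e, xyz, t)
    grad_e = jax.grad(emissivity, argnums=1)(p_e, xyz, t)
    return de_dt + velocity(p_v, xyz) @ grad_e   # penalized toward zero in training

p_e = init(jax.random.PRNGKey(0), 4, 32, 1)
p_v = init(jax.random.PRNGKey(1), 3, 32, 3)
print(advection_residual(p_e, p_v, jnp.array([1., 0., 0.]), 0.5))
```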

black hole tomography physics-informed neural networks inverse problems differentiable neural rendering neural radiance fields
Foundational AI Feb 5, 2026

Smoothness Errors in Dynamics Models and How to Avoid Them

Edward Berman, Luisa Li, Jung Yeon Park et al.

Modern neural networks have shown promise for solving partial differential equations over surfaces, often by discretizing the surface as a mesh and learning with a mesh-aware graph neural network. However, graph neural networks suffer from oversmoothing, where a node's features become increasingly similar to those of its neighbors. Unitary graph convolutions, which are mathematically constrained to preserve smoothness, have been proposed to address this issue. Despite this, in many physical systems, such as diffusion processes, smoothness naturally increases and unitarity may be overconstraining. In this paper, we systematically study the smoothing effects of different GNNs for dynamics modeling and prove that unitary convolutions hurt performance for such tasks. We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems. We also generalize unitary and relaxed unitary convolutions from graphs to meshes. In experiments on PDEs such as the heat and wave equations over complex meshes and on weather forecasting, we find that our method outperforms several strong baselines, including mesh-aware transformers and equivariant neural networks.
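
One plausible reading of the relaxation, sketched below (our construction, not necessarily the authors'): a unitary graph convolution built from the matrix exponential of a skew-symmetric, edge-supported operator preserves the feature norm exactly, while a factor $\gamma \le 1$ relaxes it so the natural smoothing of diffusive dynamics can proceed.

```python
# Our reading of "relaxed unitary", not the authors' exact construction:
# expm of a skew-symmetric, edge-supported operator is orthogonal and so
# preserves feature norms; gamma < 1 relaxes that constraint.
import jax
import jax.numpy as jnp
from jax.scipy.linalg import expm

A = jnp.array([[0., 1., 0.],      # toy 3-node path-graph adjacency
               [1., 0., 1.],
               [0., 1., 0.]])
W = jax.random.normal(jax.random.PRNGKey(0), (3, 3))
S = (A * W) - (A * W).T           # skew-symmetric, supported on graph edges

def unitary_step(x):
    return expm(S) @ x            # norm-preserving

def relaxed_step(x, gamma=0.9):
    return gamma * (expm(S) @ x)  # contracts norms, allowing smoothing

x = jnp.array([1., -2., 0.5])
print(jnp.linalg.norm(x),
      jnp.linalg.norm(unitary_step(x)),   # equal to the input norm
      jnp.linalg.norm(relaxed_step(x)))   # smaller by gamma
```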

graph neural networks relaxed unitary convolutions oversmoothing mesh-based pde solving spectral methods
Foundational AI Feb 4, 2026

Does SGD Seek Flatness or Sharpness? An Exactly Solvable Model

Yizhou Xu, Pierfrancesco Beneventano, Isaac Chuang et al.

A large body of theory and empirical work hypothesizes a connection between the flatness of a neural network's loss landscape during training and its performance. However, there have been conceptually opposite pieces of evidence regarding when SGD prefers flatter or sharper solutions during training. In this work, we partially but causally clarify the flatness-seeking behavior of SGD by identifying and exactly solving an analytically solvable model that exhibits both flattening and sharpening behavior during training. In this model, the SGD training has no a priori preference for flatness, but only a preference for minimal gradient fluctuations. This leads to the insight that, at least within this model, it is the data distribution that uniquely determines the sharpness at convergence, and that a flat minimum is preferred if and only if the noise in the labels is isotropic across all output dimensions. When the noise in the labels is anisotropic, the model instead prefers sharpness and can converge to an arbitrarily sharp solution, depending on the imbalance in the label-noise spectrum. We reproduce this key insight in controlled settings with different model architectures such as MLP, RNN, and transformers.

implicit bias of sgd sharpness-flatness tradeoff label noise geometry loss function design stochastic processes
Foundational AI Feb 4, 2026

Turbulence teaches equivariance to neural networks

Ryley McConkey, Julia Balla, Jeremiah Bailey et al.

We investigate how the rotational nature of turbulence affects learned mappings between quantities governed by the Navier-Stokes equations. By varying the degree of anisotropy in a turbulence dataset, we explore how statistical symmetry affects these mappings. To do this, we train super-resolution models at different wall-normal locations in a channel flow, where anisotropy varies naturally, and test their generalization. By evaluating the learned mappings on new coordinate frames and new flow conditions, we find that coordinate-frame generalization is a key part of the generalization problem. Turbulent flows naturally present a wide range of local orientations, so respecting the symmetries of the Navier-Stokes equations improves generalization to new flows. Importantly, turbulence's rotational structure can embed these symmetries into learned mappings -- an effect that strengthens with isotropy and dataset size. This is because a more isotropic dataset samples a wider range of orientations, more fully covering the rotational symmetries of the Navier-Stokes equations. The dependence on isotropy means equivariance error is also scale-dependent, consistent with Kolmogorov's hypothesis. Therefore, turbulence provides its own data augmentation (we term this implicit data augmentation). We expect this effect to apply broadly to learned mappings between tensorial flow quantities, making it relevant to most machine learning applications in turbulence.
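
The quantity at stake, equivariance error, can be measured in a few lines (our toy stand-in for a trained model; a full field test would also rotate the sampling coordinates, not just the vectors):

```python
# Our toy diagnostic, not the paper's code: compare "rotate then apply" with
# "apply then rotate" for a map acting on 2D vectors. (A complete field test
# would rotate the sample coordinates as well as the vector components.)
import jax
import jax.numpy as jnp

def rotate(v, theta):
    R = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
                   [jnp.sin(theta),  jnp.cos(theta)]])
    return v @ R.T

def f(v):  # stand-in for a trained super-resolution model
    return jnp.tanh(v @ jnp.array([[1.2, 0.3], [-0.1, 0.8]]))

v = jax.random.normal(jax.random.PRNGKey(0), (128, 2))
theta = jnp.pi / 5
num = jnp.linalg.norm(f(rotate(v, theta)) - rotate(f(v), theta))
den = jnp.linalg.norm(rotate(f(v), theta))
print("relative equivariance error:", num / den)   # zero for an equivariant f
```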

equivariant neural networks symmetry preservation implicit data augmentation data augmentation superresolution
Astrophysics Feb 2, 2026

Physics-Informed Neural Networks for Modeling Galactic Gravitational Potentials

Charlotte Myers, Nathaniel Starkman, Lina Necib

We introduce a physics-informed neural framework for modeling static and time-dependent galactic gravitational potentials. The method combines data-driven learning with embedded physical constraints to capture complex, small-scale features while preserving global physical consistency. We quantify predictive uncertainty through a Bayesian framework, and model time evolution using a neural ODE approach. Applied to mock systems of varying complexity, the model achieves reconstruction errors at the sub-percent level ($0.14\%$ mean acceleration error) and improves dynamical consistency compared to analytic baselines. This method complements existing analytic methods, enabling physics-informed baseline potentials to be combined with neural residual fields to achieve both interpretable and accurate potential models.
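
A minimal sketch of the physics-informed ingredients (toy network and illustrative units; the paper's framework adds Bayesian uncertainties and a neural ODE for time dependence): accelerations from $-\nabla\Phi$, and a Poisson residual $\nabla^2\Phi - 4\pi G\rho$ as the embedded constraint.

```python
# Toy network, illustrative units (not the paper's model): acceleration from
# -grad(Phi), with the Poisson equation as the physical consistency check.
import jax
import jax.numpy as jnp

def phi(params, x):
    W1, b1, w2 = params
    return jnp.tanh(W1 @ x + b1) @ w2            # scalar potential Phi(x)

def acceleration(params, x):
    return -jax.grad(phi, argnums=1)(params, x)  # a = -grad(Phi)

def poisson_residual(params, x, rho, G=4.3009e-6):   # G in kpc (km/s)^2 / Msun
    laplacian = jnp.trace(jax.hessian(phi, argnums=1)(params, x))
    return laplacian - 4.0 * jnp.pi * G * rho    # zero for a consistent pair

params = (jax.random.normal(jax.random.PRNGKey(0), (16, 3)) / jnp.sqrt(3.0),
          jnp.zeros(16),
          jax.random.normal(jax.random.PRNGKey(1), (16,)) / 4.0)
x = jnp.array([8.0, 0.0, 0.02])                  # roughly the solar position, kpc
print(acceleration(params, x), poisson_residual(params, x, rho=0.01))
```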

physics-informed neural networks bayesian inference uncertainty quantification neural ode dark matter
Foundational AI Feb 1, 2026

High-accuracy sampling for diffusion models and log-concave distributions

Fan Chen, Sinho Chewi, Constantinos Daskalakis et al.

We present algorithms for diffusion model sampling which obtain $\delta$-error in $\mathrm{polylog}(1/\delta)$ steps, given access to $\widetilde O(\delta)$-accurate score estimates in $L^2$. This is an exponential improvement over all previous results. Specifically, under minimal data assumptions, the complexity is $\widetilde O(d\,\mathrm{polylog}(1/\delta))$ where $d$ is the dimension of the data; under a non-uniform $L$-Lipschitz condition, the complexity is $\widetilde O(\sqrt{dL}\,\mathrm{polylog}(1/\delta))$; and if the data distribution has intrinsic dimension $d_\star$, then the complexity reduces to $\widetilde O(d_\star\,\mathrm{polylog}(1/\delta))$. Our approach also yields the first $\mathrm{polylog}(1/\delta)$ complexity sampler for general log-concave distributions using only gradient evaluations.

diffusion models first-order rejection sampling score-based models high-accuracy convergence log-concave sampling
Foundational AI Jan 29, 2026

Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models

Archer Wang, Emile Anand, Yilun Du et al.

Decomposing complex data into factorized representations can reveal reusable components and enable synthesizing new samples via component recombination. We investigate this in the context of diffusion-based models that learn factorized latent spaces without factor-level supervision. In images, factors can capture background, illumination, and object attributes; in robotic videos, they can capture reusable motion components. To improve both latent factor discovery and quality of compositional generation, we introduce an adversarial training signal via a discriminator trained to distinguish between single-source samples and those generated by recombining factors across sources. By optimizing the generator to fool this discriminator, we encourage physical and semantic consistency in the resulting recombinations. Our method outperforms implementations of prior baselines on CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D, achieving lower FID scores and better disentanglement as measured by MIG and MCC. Furthermore, we demonstrate a novel application to robotic video trajectories: by recombining learned action components, we generate diverse sequences that significantly increase state-space coverage for exploration on the LIBERO benchmark.

diffusion models compositional latent recombination disentangled representations discriminator-guided diffusion representation learning
Foundational AI Jan 29, 2026

The Ensemble Inverse Problem: Applications and Methods

Zhengyan Huan, Camila Pazos, Martin Klassen et al.

We introduce a new multivariate statistical problem that we refer to as the Ensemble Inverse Problem (EIP). The aim of EIP is to invert for an ensemble that is distributed according to the pushforward of a prior under a forward process. In high energy physics (HEP), this is related to a widely known problem called unfolding, which aims to reconstruct the true physics distribution of quantities, such as momentum and angle, from measurements that are distorted by detector effects. In recent applications, the EIP also arises in full waveform inversion (FWI) and inverse imaging with unknown priors. We propose non-iterative inference-time methods that construct posterior samplers based on a new class of conditional generative models, which we call ensemble inverse generative models. For the posterior modeling, these models additionally use the ensemble information contained in the observation set on top of single measurements. Unlike existing methods, our proposed methods avoid explicit and iterative use of the forward model at inference time via training across several sets of truth-observation pairs that are consistent with the same forward model, but originate from a wide range of priors. We demonstrate that this training procedure implicitly encodes the likelihood model. The use of ensemble information helps posterior inference and enables generalization to unseen priors. We benchmark the proposed method on several synthetic and real datasets in inverse imaging, HEP, and FWI. The code is available at https://github.com/ZhengyanHuan/The-Ensemble-Inverse-Problem--Applications-and-Methods.

inverse problems generative models ensemble inverse learning posterior estimation unfolding
Foundational AI Jan 22, 2026

Active learning for photonics

Ryan Lopez, Charlotte Loh, Rumen Dangovski et al.

We explore the integration of analytic approximate Bayesian last-layer neural networks (LL-BNNs) with uncertainty-driven sample selection to accelerate photonic band gap prediction. We employ an analytic LL-BNN formulation, corresponding to the infinite Monte Carlo sample limit, to obtain uncertainty estimates that are strongly correlated with the true predictive error on unlabeled candidate structures. These uncertainty scores drive an active learning strategy that prioritizes the most informative simulations during training. Applied to the task of predicting band gap sizes in two-dimensional, two-tone photonic crystals, our approach achieves up to a 2.6x reduction in required training data compared to a random sampling baseline while maintaining predictive accuracy. The efficiency gains arise from concentrating computational resources on high-uncertainty regions of the design space rather than sampling uniformly. Given the substantial cost of full band structure simulations, especially in three dimensions, this data efficiency enables rapid and scalable surrogate modeling. Our results suggest that analytic LL-BNN-based active learning can substantially accelerate topological optimization and inverse design workflows for photonic crystals, and more broadly, offers a general framework for data-efficient regression across scientific machine learning domains.
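
The analytic last-layer step admits a compact sketch (our simplification, with made-up feature dimensions): Bayesian linear regression on frozen network features yields closed-form predictive variances, and the highest-variance candidates are sent to the simulator next.

```python
# Our simplification with made-up dimensions: exact Bayesian linear regression
# on frozen last-layer features; acquisition = largest predictive variance.
import jax
import jax.numpy as jnp

def llbnn_posterior(feats, y, noise_var=0.01, prior_var=1.0):
    d = feats.shape[1]
    precision = feats.T @ feats / noise_var + jnp.eye(d) / prior_var
    cov = jnp.linalg.inv(precision)
    mean = cov @ feats.T @ y / noise_var
    return mean, cov

def predictive_var(cov, feats, noise_var=0.01):
    return jnp.einsum("nd,de,ne->n", feats, cov, feats) + noise_var

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
train_f = jax.random.normal(key1, (50, 8))       # frozen-network features
y = train_f @ jnp.ones(8) + 0.1 * jax.random.normal(key2, (50,))
mean, cov = llbnn_posterior(train_f, y)
pool_f = jax.random.normal(jax.random.PRNGKey(1), (500, 8))
scores = predictive_var(cov, pool_f)
print("next simulations:", jnp.argsort(-scores)[:10])   # most informative designs
```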

active learning uncertainty quantification last layer bnn bayesian inference photonic band gap prediction
Theoretical Physics Jan 20, 2026

Universality of Neural Network Field Theory

Christian Ferko, James Halverson, Aaron Mutchler

We prove that any quantum field theory, or more generally any probability distribution over tempered distributions in $\mathbb{R}^d$, admits a neural network description with a countable infinity of parameters. As an example, we realize the $2d$ Liouville theory as a neural network and numerically compute the three-point function of vertex operators, finding agreement with the DOZZ formula.

quantum field theory neural network field theory conformal field theory borel isomorphism stochastic processes
Foundational AI Jan 20, 2026

Meta Flow Maps enable scalable reward alignment

Peter Potaptchik, Adhi Saravanan, Abbas Mammadov et al.

Controlling generative models is computationally expensive. This is because optimal alignment with a reward function--whether via inference-time steering or fine-tuning--requires estimating the value function. This task demands access to the conditional posterior $p_{1|t}(x_1|x_t)$, the distribution of clean data $x_1$ consistent with an intermediate state $x_t$, a requirement that typically compels methods to resort to costly trajectory simulations. To address this bottleneck, we introduce Meta Flow Maps (MFMs), a framework extending consistency models and flow maps into the stochastic regime. MFMs are trained to perform stochastic one-step posterior sampling, generating arbitrarily many i.i.d. draws of clean data $x_1$ from any intermediate state. Crucially, these samples provide a differentiable reparametrization that unlocks efficient value function estimation. We leverage this capability to solve bottlenecks in both paradigms: enabling inference-time steering without inner rollouts, and facilitating unbiased, off-policy fine-tuning to general rewards. Empirically, our single-particle steered-MFM sampler outperforms a Best-of-1000 baseline on ImageNet across multiple rewards at a fraction of the compute.

stochastic flow maps flow matching posterior estimation reward optimization diffusion models
Astrophysics Jan 20, 2026

Opportunities in AI/ML for the Rubin LSST Dark Energy Science Collaboration

LSST Dark Energy Science Collaboration, Eric Aubourg, Camille Avestruz et al.

The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will produce unprecedented volumes of heterogeneous astronomical data (images, catalogs, and alerts) that challenge traditional analysis pipelines. The LSST Dark Energy Science Collaboration (DESC) aims to derive robust constraints on dark energy and dark matter from these data, requiring methods that are statistically powerful, scalable, and operationally reliable. Artificial intelligence and machine learning (AI/ML) are already embedded across DESC science workflows, from photometric redshifts and transient classification to weak lensing inference and cosmological simulations. Yet their utility for precision cosmology hinges on trustworthy uncertainty quantification, robustness to covariate shift and model misspecification, and reproducible integration within scientific pipelines. This white paper surveys the current landscape of AI/ML across DESC's primary cosmological probes and cross-cutting analyses, revealing that the same core methodologies and fundamental challenges recur across disparate science cases. Since progress on these cross-cutting challenges would benefit multiple probes simultaneously, we identify key methodological research priorities, including Bayesian inference at scale, physics-informed methods, validation frameworks, and active learning for discovery. With an eye on emerging techniques, we also explore the potential of the latest foundation model methodologies and LLM-driven agentic AI systems to reshape DESC workflows, provided their deployment is coupled with rigorous evaluation and governance. Finally, we discuss critical software, computing, data infrastructure, and human capital requirements for the successful deployment of these new methodologies, and consider associated risks and opportunities for broader coordination with external actors.

dark energy uncertainty quantification bayesian inference simulation-based inference dark matter
Foundational AI Jan 12, 2026

PFT: Phonon Fine-tuning for Machine Learned Interatomic Potentials

Teddy Koker, Abhijeet Gangan, Mit Kotak et al.

Many materials properties depend on higher-order derivatives of the potential energy surface, yet machine learned interatomic potentials (MLIPs) trained with a standard loss on energy, force, and stress errors can exhibit error in curvature, degrading the prediction of vibrational properties. We introduce phonon fine-tuning (PFT), which directly supervises second-order force constants of materials by matching MLIP energy Hessians to DFT-computed force constants from finite displacement phonon calculations. To scale to large supercells, PFT stochastically samples Hessian columns and computes the loss with a single Hessian-vector product. We also use a simple co-training scheme to incorporate upstream data to mitigate catastrophic forgetting. On the MDR Phonon benchmark, PFT improves Nequix MP by 55% on average across phonon thermodynamic properties and achieves state-of-the-art accuracy among models trained on Materials Project trajectories. PFT also generalizes to improve properties beyond second-derivatives, improving thermal conductivity predictions that rely on third-order derivatives of the potential energy.
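
The Hessian-column trick is easy to sketch (with a toy energy function standing in for the MLIP): a single Hessian-vector product against a one-hot vector yields one force-constant column without ever materializing the full Hessian.

```python
# Toy energy standing in for an MLIP: one Hessian-vector product against a
# sampled one-hot vector recovers one force-constant column, as in the
# stochastic column sampling the abstract describes.
import jax
import jax.numpy as jnp

def energy(positions):                                # toy stand-in for an MLIP
    d = positions[1:] - positions[:-1]
    return jnp.sum(jnp.sum(d**2, axis=1) ** 2)

def hessian_column(positions, col):
    flat = positions.reshape(-1)
    e = lambda z: energy(z.reshape(positions.shape))
    v = jnp.zeros(flat.shape).at[col].set(1.0)        # one-hot direction
    _, hvp = jax.jvp(jax.grad(e), (flat,), (v,))      # single HVP, no full Hessian
    return hvp

pos = jnp.array([[0., 0., 0.], [1.1, 0., 0.], [2.0, 0.2, 0.]])
col = 4                                               # sampled stochastically in PFT
print(hessian_column(pos, col))                       # compare to a DFT force-constant column
```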

interatomic potentials phonon properties fine-tuning hessian supervision loss function design
Theoretical Physics Jan 9, 2026

String Theory from Infinite Width Neural Networks

Samuel Frank, James Halverson

We realize bosonic string theory with ensembles of infinite width neural networks. The string tension is tuned by the variance of the output weights. The construction provides a new computation of the foundational Virasoro-Shapiro and Veneziano amplitudes as neural network correlators.

string theory neural network field theory conformal field theory scattering amplitudes neural network gaussian process
Theoretical Physics Jan 7, 2026

A glimpse into the Ultrametric spectrum

An Huang, Christian B. Jepsen

The non-relativistic string spectrum is built from integer-spaced energy quanta in such a way that the high-temperature asymptotics, via the Hardy-Ramanujan formula for integer partitions, reduces to standard two-dimensional thermodynamics. Here we explore deformed realizations of this behavior motivated by $p$-adic string theory and Lorentzian versions thereof with a non-trivial spectrum. We study the microstate scaling that results from associating quantum harmonic oscillators to the normal modes of tree graphs rather than string graphs, and observe that Hardy-Ramanujan scaling is not realized. But by computing the eigenvalues of the derivative operator on the $p$-adic circle and by determining the eigenspectrum of the Neumann-to-Dirichlet operator, we uncover a spectrum of exponentially growing energies but with exponentially growing degeneracies balanced in such a way that Hardy-Ramanujan scaling is realized, but modulated with log-periodic fluctuations.
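
For reference, the Hardy-Ramanujan asymptotic invoked here is the standard partition-counting formula

$$
p(n) \;\sim\; \frac{1}{4\sqrt{3}\,n}\,\exp\!\left(\pi\sqrt{\frac{2n}{3}}\right), \qquad n \to \infty,
$$

so an integer-spaced spectrum carries entropy $S(E) \sim \sqrt{E}$, the two-dimensional thermodynamic scaling against which the deformed, tree-graph spectra are compared.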

ultrametric spectrum string theory p-adic string theory spectral methods eigenvalue decomposition
Astrophysics Jan 6, 2026

MARVEL: A Multi Agent-based Research Validator and Enabler using Large Language Models

Nikhil Mukund, Yifang Luo, Fan Zhang et al.

We present MARVEL (https://ligogpt.mit.edu/marvel), a locally deployable, open-source framework for domain-aware question answering and assisted scientific research. It is designed to address the increasing demands of a digital assistant for scientific groups that can read highly technical data, cite precisely, and operate within authenticated networks. MARVEL combines a fast path for straightforward queries with a more deliberate DeepSearch mode that integrates retrieval-augmented generation and Monte Carlo Tree Search. It explores complementary subqueries, allocates more compute to promising branches, and maintains a global evidence ledger that preserves sources during drafting. We applied this framework in the context of gravitational-wave research related to the Laser Interferometer Gravitational-wave Observatory. Answers are grounded in a curated semantic index of research literature, doctoral theses, LIGO documents, and long-running detector electronic logbooks, with targeted web searches when appropriate. Because direct benchmarking against commercial LLMs cannot be performed on private data, we evaluated MARVEL on two publicly available surrogate datasets that capture comparable semantic and technical characteristics. On these benchmarks, MARVEL matches a GPT-4o mini baseline on literature-centric queries and substantially outperforms it on detector-operations content, where domain retrieval and guided reasoning are decisive. By making the complete framework and evaluation datasets openly available, we aim to provide a reproducible foundation for developing domain-specific scientific assistants.

retrieval-augmented generation monte carlo methods multi-agent orchestration domain-specific qa embeddings
Theoretical Physics Dec 31, 2025

Green's function on the Tate curve

An Huang, Rebecca Rohrlich, Yaojia Sun et al.

Motivated by the question of defining a $p$-adic string worldsheet action in genus one, we define a Laplacian operator on the Tate curve, and study its Green's function. We show that the Green's function exists. We provide an explicit formula for the Green's function, which turns out to be a non-Archimedean counterpart of the Archimedean Green's function on a flat torus. In particular, it turns out that this Green's function recovers the Néron local height function for the Tate curve in the $p\to\infty$ limit, when the $j$-invariant has odd valuation. So this non-Archimedean height function now acquires a physical meaning in terms of the large $p$ limit of a non-Archimedean conformal field theory two point function on the Tate curve, as well as a direct analytic interpretation as a Green's function, on the same footing as in the Archimedean place.

p-adic analysis string theory conformal field theory néron height function bruhat-tits tree
Theoretical Physics Dec 26, 2025

Machine Learning Invariants of Tensors

Athithan Elamaran, Christian Ferko, Sterling Scarlett

We propose a data-driven approach to identifying the functionally independent invariants that can be constructed from a tensor with a given symmetry structure. Our algorithm proceeds by first enumerating graphs, or tensor networks, that represent inequivalent contractions of a product of tensors, computing instances of these scalars using randomly generated data, and then seeking linear relations between invariants using numerical linear algebra. Such relations yield syzygies, or functional dependencies relating different invariants. We apply this approach in an extended case study of the independent invariants that can be constructed from an antisymmetric $3$-form $H_{\mu\nu\rho}$ in six dimensions, finding five independent invariants. This result confirms that the most general Lagrangian for such a $3$-form, which depends on $H_{\mu\nu\rho}$ but not its derivatives, is an arbitrary function of five variables, and we give explicit formulas relating other invariants to the five independent scalars in this generating set.
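
The numerical core of the algorithm can be sketched in a few lines (toy matrix invariants with a deliberately planted relation, rather than the paper's $3$-form case study): evaluate candidate invariants on random data, then read syzygies off the null space of the value matrix.

```python
# Toy version of the pipeline (ours): evaluate candidate invariants on random
# symmetric matrices; a linear relation shows up as a zero singular value, and
# the corresponding right-singular vector gives the syzygy coefficients.
import jax
import jax.numpy as jnp

def candidate_invariants(A):
    """Toy candidates; the third is a planted combination of the first two."""
    return jnp.array([jnp.trace(A) ** 2,
                      jnp.trace(A @ A),
                      0.5 * jnp.trace(A @ A) + 0.5 * jnp.trace(A) ** 2])

key = jax.random.PRNGKey(0)
samples = jax.random.normal(key, (64, 5, 5))
samples = 0.5 * (samples + jnp.swapaxes(samples, 1, 2))  # symmetrize
M = jax.vmap(candidate_invariants)(samples)              # samples x invariants
_, s, vt = jnp.linalg.svd(M, full_matrices=False)
print("singular values:", s)          # one value near zero -> one syzygy
print("syzygy coefficients:", vt[-1]) # proportional to (1, 1, -2)
```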

group theory symmetry preservation invariant enumeration tensor networks syzygy detection
Theoretical Physics Dec 22, 2025

Entanglement cohomology for GHZ and W states

Christian Ferko, Keiichiro Furuya

Entanglement cohomology assigns a graded cohomology ring to a multipartite pure state, providing homological invariants that are stable under local unitaries and characterize inequivalent patterns of entanglement. In this work we derive exact expressions for the dimensions of these cohomology groups in two canonical entanglement classes, generalized GHZ and W states on an arbitrary number of parties and local Hilbert space dimensions, thus proving conjectures of arXiv:1901.02011. Using the additional structure of the Hodge star and wedge product operations, we propose two new classes of local unitary invariants: the spectrum of the natural Laplacian acting on entanglement $k$-forms, and the intersection numbers obtained from wedge products of representatives for cohomology classes. We present numerical experiments which investigate these invariants in particular states, suggesting that they may provide useful quantities for describing multipartite entanglement.

quantum states entanglement entanglement cohomology hodge theory intersection numbers
Theoretical Physics Dec 19, 2025

Constraining primordial non-Gaussianity from DESI DR1 quasars and Planck PR4 CMB Lensing

Sofia Chiarenza, Alex Krolewski, Marco Bonici et al.

We present the first measurement of local-type primordial non-Gaussianity from the cross-correlation between $1.2$ million spectroscopically confirmed quasars from the first data release (DR1) of the Dark Energy Spectroscopic Instrument (DESI) and the Planck PR4 CMB lensing reconstructions. The analysis is performed in three tomographic redshift bins covering $0.8 < z < 3.5$, over a sky fraction of $\sim 20\%$. We adopt a catalog-based pseudo-$C_\ell$ estimator and apply linear imaging weights validated on noiseless mocks. Compared to previous analyses using photometric quasar samples, our results benefit from the high purity of the DESI spectroscopic sample, the reduced noise of PR4 lensing, and the absence of excess large-scale power in the spectroscopic quasar auto-correlation. Fitting simultaneously for the non-Gaussianity parameter $f_{\mathrm{NL}}$ and the linear bias amplitude in each redshift bin, we obtain $f_{\mathrm{NL}} = 2^{+28}_{-34}$ for a response parameter $p=1.6$, and $f_{\mathrm{NL}} = 6^{+20}_{-24}$ for $p=1.0$. These results improve the constraints on $f_{\mathrm{NL}}$ by $\sim 35\%$ compared to the previous analysis based on the Legacy Imaging Survey DR9. Our results demonstrate the statistical power of DESI quasars for probing inflationary physics, and highlight the promise of future DESI data releases.
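
For context, the local-type scale-dependent bias that underlies fits of this kind takes the standard form (our statement of the textbook result; the paper's conventions may differ slightly)

$$
\Delta b(k,z) \;=\; 3\, f_{\mathrm{NL}}\,(b - p)\,\frac{\delta_c\, \Omega_m H_0^2}{c^2 k^2\, T(k)\, D(z)},
$$

where $\delta_c \approx 1.686$, $T(k)$ is the transfer function and $D(z)$ the growth factor; the response parameter $p$ interpolates between tracers obeying a universal mass function ($p = 1$) and recently merged objects ($p \simeq 1.6$), the two cases quoted above.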

primordial non-gaussianity scale-dependent bias cmb lensing cross-correlation cosmic microwave background spectral methods
Astrophysics Dec 19, 2025

A Search for Binary Black Hole Mergers in LIGO O1-O3 Data with Convolutional Neural Networks

Ethan Silver, Plamen Krastev, Edo Berger

Since the first detection of gravitational waves in 2015 by LIGO from the binary black hole merger GW150914, gravitational-wave astronomy has developed significantly, with over 200 compact binary merger events cataloged. The use of neural networks has the potential to significantly speed up the detection, classification, and especially parameter estimation for gravitational wave events compared to current techniques, which is particularly important for electromagnetic follow-up of events. In this work, we present a machine learning pipeline using neural networks to detect gravitational wave events. We generate training data using real LIGO data to train and refine neural networks that can detect binary black hole (BBH) mergers, and apply these models to search through LIGO's first three observing runs. We detect 57 of the 75 cataloged BBH events with two-detector data in O1, O2, and O3, with 57 false positives that can mostly be ruled out with parameter inference and human inspection. Finally, we extensively test this pipeline on time-shifted data to characterize its False Alarm Rate (FAR). These results are an important step in developing machine learning-based GW searches, enabling low-latency detection and multi-messenger astronomy.

gravitational waves convolutional networks signal detection classification hypothesis testing
Theoretical Physics Dec 15, 2025

Bridging Simulations and EFT: A Hybrid Model of the Lyman-Alpha Forest Field

Roger de Belsunce, Boryana Hadzhiyska, Mikhail M. Ivanov

The Lyman-alpha (Lya) forest is a unique probe of cosmology and the intergalactic medium at high redshift and small scales. The statistical power of the ongoing Dark Energy Spectroscopic Instrument (DESI) demands precise theoretical tools to model the Lya forest. We present a hybrid effective field theory (HEFT) forward model in redshift space that combines the accuracy of non-linear particle displacements, computed using the N-body simulation suite AbacusSummit, with the predictive power of an analytical, perturbative bias forward model in the framework of effective field theory (EFT). The residual noise between the model and the simulated Lya field has a nearly white (scale- and orientation-independent) power spectrum on quasi-linear scales, substantially simplifying its modeling compared to a purely perturbative description. As a consequence of the improved control over the 3D Lya forest stochasticity, we find agreement between the modeled and the true power spectra at the 5 per cent level down to scales of $k \leq 1\,h$/Mpc. This procedure offers a promising path toward constructing efficient and accurate emulators to predict large-scale clustering summary statistics for full-shape cosmological analyses of Lya forest data from both DESI and its successor, DESI-II.

effective field theory lyman-alpha forest cosmological simulation lagrangian methods bias expansion
Theoretical Physics Dec 11, 2025

Electronic crystals and quasicrystals in semiconductor quantum wells: an AI-powered discovery

Filippo Gaggioli, Pierre-Antoine Graham, Liang Fu

The homogeneous electron gas is a cornerstone of quantum condensed matter physics, providing the foundation for developing density functional theory and understanding electronic phases in semiconductors. However, theoretical understanding of strongly-correlated electrons in realistic semiconductor systems remains limited. In this work, we develop a neural-network-based variational approach to study quantum wells in three-dimensional geometry for a variety of electron densities and well thicknesses. Starting from first principles, our unbiased AI-powered method reveals metallic and crystalline phases with both monolayer and bilayer charge distributions. In the emergent bilayer, we discover a new quantum phase of matter: the electronic quasicrystal.

electronic quasicrystal monte carlo methods neural network variational monte carlo attention mechanisms quantum states
Astrophysics Dec 9, 2025

Self-lensing flares from black hole binaries V: systematic searches in LSST

Kevin Park, Zoltan Haiman, Chengcheng Xin et al.

The Vera C. Rubin Observatory has now seen first light, and over a 10-year duration, LSST is projected to catalogue tens of millions of quasars, many of which are expected to be associated with sub-parsec supermassive black hole binaries (SMBHBs). Out of these SMBHBs, up to thousands of relatively massive binary-quasars are expected to exhibit gravitational self-lensing flares (SLFs) that last for at least 20-30 days. We assess the effectiveness of the Lomb-Scargle (LS) periodogram and matched filters (MFs) as methods for systematic searches for these binaries, using toy models of hydrodynamical, Doppler, and self-lensing variability from equal-mass, eccentric SMBHBs. We inject SLFs into random realizations of damped random walk (DRW) light curves, representing stochastic quasar variability, and compute the LS periodogram with and without the SLF. We find that periodograms of SLF+DRW light curves do not have maximum peak heights that could not arise from DRW-only periodograms. On the other hand, the matched filter signal-to-noise ratio (SNR) can distinguish SLFs from noise even with LSST-like cadences and DRW noise. Furthermore, we develop a three-step procedure with matched filters, which can also recover injected binary parameters from these light curves. We expect this method to be computationally efficient enough to be applicable to millions of quasar light curves in LSST.
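
A bare-bones version of the matched-filter step (white noise and a Gaussian flare template for illustration; the paper's analysis handles DRW noise and LSST cadence):

```python
# Minimal matched-filter sketch (toy white noise, not the paper's DRW model):
# slide a flare template over a light curve and report the peak SNR.
import jax
import jax.numpy as jnp

t = jnp.arange(512.)
template = jnp.exp(-0.5 * ((t - 256.) / 10.) ** 2)    # stand-in for a ~weeks-long flare
template = template / jnp.linalg.norm(template)       # unit norm

noise = jax.random.normal(jax.random.PRNGKey(0), (512,))
data = noise + 4.0 * jnp.roll(template, 60)           # injected flare, offset by 60

def mf_snr(data, template):
    # circular cross-correlation at all lags; with a unit-norm template and
    # unit-variance white noise, the output is directly in sigma units
    return jnp.fft.ifft(jnp.fft.fft(data) * jnp.conj(jnp.fft.fft(template))).real

snr = mf_snr(data, template)
print("peak SNR:", snr.max(), "at lag", jnp.argmax(snr))   # ~4 sigma near lag 60
```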

self-lensing flares supermassive black hole binaries signal detection matched filter stochastic processes
Astrophysics Dec 6, 2025

A Fully Photometric Approach to Type Ia Supernova Cosmology in the LSST Era: Host Galaxy Redshifts and Supernova Classification

Ayan Mitra, Richard Kessler, Rebecca C. Chen et al.

The upcoming Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) is expected to discover nearly a million Type Ia supernovae (SNeIa), offering an unprecedented opportunity to constrain dark energy. The vast majority of these events will lack spectroscopic classification and redshifts, necessitating a fully photometric approach to maximize cosmology constraining power. We present detailed simulations based on the Extended LSST Astronomical Time Series Classification Challenge (ELAsTiCC), and a cosmological analysis using photometrically classified SNeIa with host galaxy photometric redshifts. This dataset features realistic multi-band light curves, non-SNIa contamination, host mis-associations, and transient-host correlations across the high-redshift Deep Drilling Fields (DDF) ($\sim 50$ deg$^2$). We also include a spectroscopically confirmed low-redshift sample based on the Wide Fast Deep (WFD) fields. We employ a joint SN+host photometric redshift fit, a neural-network-based photometric classifier (SCONE), and BEAMS with Bias Corrections (BBC) methodology to construct a bias-corrected Hubble diagram. We produce statistical + systematic covariance matrices, and perform cosmology fitting with a prior using Cosmic Microwave Background constraints. We fit and present results for the $w$CDM dark energy model, and the more general Chevallier-Polarski-Linder (CPL) $w_0w_a$ model. With a simulated sample of $\sim$6000 events, we achieve a Figure of Merit (FoM) value of about 150, which is significantly larger than the DES-SN5YR FoM of 54. Averaging analysis results over 25 independent samples, we find small but significant biases, indicating a need for further analysis testing and development.

supernova classification dark energy photometric redshift estimation bayesian inference classification
Astrophysics Dec 5, 2025

SN 2024afav: A Superluminous Supernova with Multiple Light Curve Bumps and Spectroscopic Signatures of Circumstellar Interaction

Harsh Kumar, Peter K. Blanchard, Edo Berger et al.

We present a comprehensive optical and near-infrared spectroscopic study of SN 2024afav - a hydrogen-poor superluminous supernova (SLSN-I) that peaks at $\approx$ -20.7 mag and exhibits an unusual multi-bumped light curve. Our spectroscopic observations, spanning phases of -14 to +160 d, reveal several unusual features: (i) a narrow (1,800 km s$^{-1}$) and blueshifted (11,000 km s$^{-1}$) absorption from H$\alpha$ starting at +20 d; (ii) persistent optical and NIR He I lines at all available phases, showing double absorption structure in NIR spectra at +23 d, with a high velocity component at a similar velocity to H$\alpha$; (iii) early appearance of nebular [O III] emission starting at $\approx$ +50 d; and (iv) strong [O II] + [Ca II] 7300 Å emission complex starting at $\approx$ +110 d. These unusual features, and their onset at the time of the light curve bumps, provide compelling evidence of circumstellar interaction between the SN ejecta and a nearby hydrogen-rich shell, as well as the presence of helium in both the outer layers of the progenitor star and in the circumstellar medium. A comparison of SN 2024afav to other SLSNe-I showing bumpy light curves and similar spectral properties (PTF10hgi, SN 2017egm, SN 2019hge), points to a rare sub-group of SLSNe-I in which CSM interaction provides an important modulation to the energy input.

circumstellar interaction supernova classification light curve bumps spectral methods stellar evolution
Theoretical Physics Dec 5, 2025

Lattice field theory for superconducting circuits

Joshua Lin, Max Hays, Stephen Sorokanich et al.

Large superconducting quantum circuits have a number of important applications in quantum computing. Accurately predicting the performance of these devices from first principles is challenging, as it requires solving the many-body Schrödinger equation. This work introduces a new, general ab-initio method for analyzing large quantum circuits based on lattice field theory, a tool commonly applied in nuclear and particle physics. This method is competitive with state-of-the-art techniques such as tensor networks, but avoids introducing systematic errors due to truncation of the infinite-dimensional Hilbert space associated with superconducting phases. The approach is applied to fluxonium, a specific many-component superconducting qubit with favorable qualities for quantum computation. A systematic study of the influence of impedance on fluxonium is conducted that parallels previous experimental studies, and ground capacitance effects are explored. The qubit frequency and charge noise dephasing rate are extracted from statistical analyses of charge noise, where thousands of instantiations of charge disorder in the Josephson junction array of a fixed fluxonium qubit are explicitly averaged over at the microscopic level. This is difficult to achieve with any other existing method.

lattice gauge theory quantum field theory monte carlo methods fluxonium qubit path integral formulation
Astrophysics Dec 3, 2025

The DREAMS Project: Disentangling the Impact of Halo-to-Halo Variance and Baryonic Feedback on Milky Way Dark Matter Speed Distributions

Ethan Lilie, Jonah C. Rose, Mariangela Lisanti et al.

Direct detection experiments require information about the local dark matter speed distribution to produce constraints on dark matter candidates, or infer their properties in the event of a discovery. In this paper, we analyze how the uncertainty in the dark matter speed distribution near the Sun is affected by baryonic feedback, halo-to-halo variance, and halo mass. To do so, we harness the statistical power of the new DREAMS Cold Dark Matter simulation suite, which comprises 1024 zoom-in Milky Way-mass halos with varied initial conditions as well as cosmological and astrophysical parameters. Applying a normalizing flows emulator to these simulations, we find that the uncertainty in the local DM speed distribution is dominated by halo-to-halo variance and, to a lesser extent, uncertainty in host halo mass. Uncertainties in supernova and black hole feedback (from the IllustrisTNG model in this case) are negligible in comparison. Using the DREAMS suite, we present a state-of-the-art prediction for the DM speed distribution in the Milky Way. Although the Standard Halo Model is contained within the uncertainty of this prediction, individual galaxies may have distributions that differ from it. Lastly, we apply our DREAMS results to the XENON1T experiment and demonstrate that the astrophysical uncertainties are comparable to the experimental ones, solidifying previous results in the literature obtained with a smaller sample of simulated Milky Way-mass halos.
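
For comparison, the Standard Halo Model referenced above is the truncated Maxwell-Boltzmann distribution

$$
f(v) \;\propto\; v^2 \exp\!\left(-\frac{v^2}{v_0^2}\right)\,\Theta(v_{\rm esc} - v),
$$

with $v_0$ near the local circular speed ($\approx 220$-$240$ km s$^{-1}$) and a cutoff at the Galactic escape speed ($\approx 500$-$600$ km s$^{-1}$).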

dark matter cosmological simulation halo-to-halo variance normalizing flows uncertainty quantification
Astrophysics Dec 3, 2025

Minuet: A Diffusion Autoencoder for Compact Semantic Compression of Multi-Band Galaxy Images

Alexander T. Gagliano, Yunyi Shen, V. A. Villar

The Vera C. Rubin Observatory is slated to observe nearly 20 billion galaxies during its decade-long Legacy Survey of Space and Time. The rich imaging data it collects will be an invaluable resource for probing galaxy evolution across cosmic time, characterizing the host galaxies of transient phenomena, and identifying novel populations of anomalous systems. While machine learning models have shown promise for extracting galaxy features from multi-band astronomical imaging, the large dimensionality of the learned latent space presents a challenge for mechanistic interpretability studies. In this work, we present Minuet, a low-dimensional diffusion autoencoder for multi-band galaxy imaging. Minuet is trained to reconstruct 72x72-pixel $grz$ image cutouts of 6M galaxies within $z<1$ from the Dark Energy Camera Legacy Survey using only five latent dimensions. By using a diffusion model conditioned on the transformer-based autoencoder's output for image reconstruction, we achieve semantically-meaningful latent representations of galaxy images while still allowing for high-fidelity, probabilistic reconstructions. We train a series of binary classifiers on Minuet's latent features to quantify their connection to morphological labels from Galaxy Zoo, and a conditional flow to produce posterior distributions of SED-derived redshifts, stellar masses, and star-formation rates. We further show the value of Minuet for nearest neighbor searches in the learned latent space. Minuet provides strong evidence for the low intrinsic dimensionality of galaxy imaging, and introduces a class of astrophysical models that produce highly compact representations for diverse science goals.

diffusion models autoencoders dimensionality reduction representation learning self-supervised learning
Astrophysics Dec 2, 2025

The DREAMS Project: Disentangling the Impact of Halo-to-Halo Variance and Baryonic Feedback on Milky Way Dark Matter Density Profiles

Alex M. Garcia, Jonah C. Rose, Paul Torrey et al.

Astrophysical searches for dark matter in the Milky Way require a reliable model for its density distribution, which in turn depends on the influence of baryonic feedback on the Galaxy. In this work, we utilize a new suite of Milky Way-mass halos from the DREAMS Project, simulated with Cold Dark Matter (CDM), to quantify the influence of baryon feedback and intrinsic halo-to-halo variance on dark matter density profiles. Our suite of 1024 halos varies over supernova and black hole feedback parameters from the IllustrisTNG model, as well as variations in two cosmological parameters. We find that Milky Way-mass dark matter density profiles in the IllustrisTNG model are largely insensitive to astrophysics and cosmology variations, with the dominant source of scatter instead arising from halo-to-halo variance. However, most of the (comparatively minor) feedback-driven variations come from the changes to supernova prescriptions. By comparing to dark matter-only simulations, we find that the strongest supernova wind energies are so effective at preventing galaxy formation that the halos are nearly entirely collisionless dark matter. Finally, regardless of physics variation, all the DREAMS halos are roughly consistent with a halo contracting adiabatically from the presence of baryons, unlike models that have bursty stellar feedback. This work represents a step toward assessing the robustness of Milky Way dark matter profiles, with direct implications for dark matter searches where systematic uncertainty in the density profile remains a major challenge.

dark matter cosmological simulation baryonic feedback halo-to-halo variance uncertainty quantification
Astrophysics Dec 1, 2025

The DREAMS Project: Disentangling the Impact of Halo-to-Halo Variance and Baryonic Feedback on Milky Way Satellite Galaxies

Jonah C. Rose, Mariangela Lisanti, Paul Torrey et al.

We analyze the properties of satellite galaxies around 1,024 Milky Way-mass hosts from the DREAMS Project, simulated within a $\Lambda$CDM cosmology. Utilizing the TNG galaxy-formation model, the DREAMS simulations incorporate both baryonic physics and cosmological uncertainties for a large sample of galaxies with diverse environments and formation histories. We investigate the relative impact of the physical uncertainty from the galaxy-formation model on predicted satellite properties using four metrics: the satellite stellar mass function, the radial distribution, the inner slope of the dark matter density profile, and the stellar half-light radius. We compare these predictions to observations from the SAGA Survey and to the DREAMS N-body simulations and find that uncertainties from baryonic physics modeling are subdominant to the scatter arising from halo-to-halo variance. Where baryonic modeling does affect satellites, the supernova wind energy has the largest effect on the satellite properties that we investigate. Specifically, increased supernova wind energy suppresses the stellar mass of satellites and results in more extended stellar half-light radii. The adopted wind speed has only a minor impact, and other astrophysical and cosmological parameters show no measurable effect. Our findings highlight the robustness of satellite properties against uncertainties in baryonic physics modeling.

cosmological simulation halo-to-halo variance baryonic feedback uncertainty quantification dark matter
Theoretical Physics Dec 1, 2025

Refining Heuristic Predictors of Fractional Chern Insulators using Machine Learning

Oriol Mayné i Comas, André Grossi Fonseca, Sachin Vaidya et al.

We develop an interpretable, data-driven framework to quantify how single-particle band geometry governs the stability of fractional Chern insulators (FCIs). Using large-scale exact diagonalization, we evaluate an FCI metric that yields a continuous spectral measure of FCI stability across parameter space. We then train Kolmogorov-Arnold networks (KANs) -- a recently developed interpretable neural architecture -- to regress this metric from two band-geometric descriptors: the trace violation $T$ and the Berry curvature fluctuations $\sigma_B$. Applied to spinless fermions at filling $\nu = 1/3$ in models on the checkerboard and kagome lattices, our approach yields compact analytical formulas that predict FCI stability with over $80\%$ accuracy in both regression and classification tasks, and remain reliable even in data-scarce regimes. The learned relations reveal model-dependent trends, clarifying the limits of Landau-level-mimicking heuristics. Our framework provides a general method for extracting simple, phenomenological "laws" that connect many-body phase stability to chosen physical descriptors, enabling rapid hypothesis formation and targeted design of quantum phases.

fractional chern insulators kolmogorov-arnold networks band geometry interpretability regression
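To make the "compact analytical formulas" concrete: the simplest version of regressing a stability metric on the two descriptors $T$ and $\sigma_B$ is an ordinary least-squares fit, evaluated for both regression and thresholded classification. The sketch below uses invented synthetic data; the paper instead trains KANs on exact-diagonalization results.

```python
import numpy as np

# Toy stand-in: regress a stability metric s on the trace violation T and
# the Berry curvature fluctuations sigma_B. The linear "true" relation
# below is invented purely for illustration.
rng = np.random.default_rng(1)
T = rng.uniform(0, 2, 500)
sigma_B = rng.uniform(0, 2, 500)
s = 1.0 - 0.4 * T - 0.3 * sigma_B + 0.05 * rng.normal(size=500)

# Least-squares fit of the compact form s ~ a + b*T + c*sigma_B.
X = np.column_stack([np.ones_like(T), T, sigma_B])
coeffs, *_ = np.linalg.lstsq(X, s, rcond=None)
pred = X @ coeffs

# Thresholded classification ("stable" if s > 0.5), mirroring the joint
# regression/classification evaluation described above.
accuracy = np.mean((pred > 0.5) == (s > 0.5))
print(coeffs, accuracy)
```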
Foundational AI Nov 27, 2025

Test-time scaling of diffusions with flow maps

Amirmojtaba Sabour, Michael S. Albergo, Carles Domingo-Enrich et al.

A common recipe for improving diffusion models at test time, so that samples score highly against a user-specified reward, is to introduce the gradient of the reward into the dynamics of the diffusion itself. This procedure is often ill-posed, as user-specified rewards are usually only well defined on the data distribution at the end of generation. While the common workaround is to use a denoiser to estimate what a sample would have been at the end of generation, we propose a simple alternative: working directly with a flow map. By exploiting a relationship between the flow map and the velocity field governing the instantaneous transport, we construct an algorithm, Flow Map Trajectory Tilting (FMTT), which provably performs better ascent on the reward than standard test-time methods involving the gradient of the reward. The approach can be used either to perform exact sampling via importance weighting or as a principled search that identifies local maximizers of the reward-tilted distribution. We demonstrate the efficacy of our approach against other look-ahead techniques, and show how the flow map enables engagement with complicated reward functions that make possible new forms of image editing, e.g. by interfacing with vision language models.

flow map trajectory tilting diffusion models test-time scaling flow matching reward optimization
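The core mechanic here, differentiating a reward through a flow map that lands on the data manifold, can be caricatured in a few lines of PyTorch. Everything below is a stand-in: `flow_map` would in practice be a trained network, and the single ascent step omits the importance weighting and search machinery of the paper.

```python
import torch

# Hypothetical flow map sending a noisy state x_t at time t directly to a
# clean sample; a placeholder function stands in for a trained model.
def flow_map(x_t, t):
    return x_t / (1.0 - t + 1e-3)

def reward(x):
    # User-specified reward, only meaningful on clean samples.
    return -(x - 2.0).pow(2).sum(dim=-1)

x_t = torch.randn(8, 2, requires_grad=True)
t = 0.5

# Because the flow map outputs clean samples, the reward gradient is well
# defined; naive guidance would differentiate the reward at x_t itself,
# where it may be meaningless.
r = reward(flow_map(x_t, t)).sum()
r.backward()
x_t_tilted = x_t.detach() + 0.1 * x_t.grad  # one tilting/ascent step
```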
Theoretical Physics Nov 20, 2025

Fermions and Supersymmetry in Neural Network Field Theories

Samuel Frank, James Halverson, Anindita Maiti et al.

We introduce fermionic neural network field theories via Grassmann-valued neural networks. Free theories are obtained by a generalization of the Central Limit Theorem to Grassmann variables. This enables the realization of the free Dirac spinor at infinite width and a four fermion interaction at finite width. Yukawa couplings are introduced by breaking the statistical independence of the output weights for the fermionic and bosonic fields. A large class of interacting supersymmetric quantum mechanics and field theory models are introduced by super-affine transformations on the input that realize a superspace formalism.

quantum field theory grassmann neural networks neural network field theory supersymmetric field theory stochastic processes
Astrophysics Nov 18, 2025

Large Language Model Driven Analysis of General Coordinates Network (GCN) Circulars

Vidushi Sharma, Ronit Agarwala, Judith L. Racusin et al.

The General Coordinates Network (GCN) is NASA's time-domain and multi-messenger alert system. GCN distributes two data products: automated ``Notices'' and human-generated ``Circulars,'' which report observations of high-energy and multi-messenger astronomical transients. The flexible, unstructured format of GCN Circulars, a corpus of more than 40,500 Circulars accumulated over three decades, makes it challenging to manually extract observational information such as redshift or observed wavebands. In this work, we employ large language models (LLMs) to facilitate the automated parsing of transient reports. We develop a neural topic modeling pipeline with open-source tools for the automatic clustering and summarization of astrophysical topics in the Circulars database. Using neural topic modeling and contrastive fine-tuning, we classify Circulars based on their observation wavebands and messengers. Additionally, we separate gravitational wave (GW) event clusters and their electromagnetic (EM) counterparts from the Circulars database. Finally, using the open-source Mistral model, we implement a system to automatically extract gamma-ray burst (GRB) redshift information from the Circulars archive, without the need for any training. Evaluation against the manually curated Neil Gehrels Swift Observatory GRB table shows that our simple system, with the help of prompt-tuning, output parsing, and retrieval augmented generation (RAG), achieves an accuracy of 97.2% for redshift-containing Circulars. Our neural-search-enhanced RAG pipeline accurately retrieved 96.8% of redshift Circulars from the manually curated database. Our study demonstrates the potential of LLMs to automate and enhance astronomical text mining, and provides a foundation for future advances in transient alert analysis.

astronomical text mining retrieval augmented generation clustering neural topic modeling classification
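For contrast with the LLM pipeline described above, the traditional alternative is brittle pattern matching. A regex-only baseline like the sketch below catches common phrasings but fails on the long tail of free-text variation that motivates the RAG approach (the example Circular text is invented):

```python
import re

# Deliberately simple baseline: pull a GRB redshift out of Circular-like
# free text with a single pattern. The paper's LLM+RAG system exists
# precisely because real Circulars use many phrasings a regex misses.
text = ("We obtained spectra of the optical afterglow and measure "
        "a redshift of z = 1.608 based on absorption features.")

match = re.search(r"redshift(?:\s+of)?\s*z?\s*[=~]?\s*(\d+\.\d+)", text,
                  flags=re.IGNORECASE)
if match:
    print(float(match.group(1)))  # -> 1.608
```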
Astrophysics Nov 13, 2025

Hydrogen-Poor Superluminous Supernovae in the Nebular Phase: Spectral Diversity Due to Ejecta Ionization as a Probe of the Power Source

Peter K. Blanchard, Edo Berger, Sebastian Gomez et al.

We present a large sample of 39 nebular-phase optical spectra of 25 hydrogen-poor superluminous supernovae (SLSNe-I) and jointly analyze them with previously published spectra of 12 events. We measure the properties of key emission features, namely those at 6300, 7300, and 7774 angstroms (associated with [O I], [Ca II]/[O II], and O I, respectively), and find that SLSNe exhibit much wider spectral diversity than normal SNe Ic, primarily in the line ratio $L_{7300}/L_{6300}$, which is highly sensitive to ejecta ionization. Some events exhibit weak [O I] and a clear [O II] contribution to the 7300 angstrom feature, enhancing the ratio, along with [O III] lines at 4363 and 5007 angstroms. Other SLSNe show weak or no lines of ionized oxygen. Moreover, we find that the population exhibits decreasing $L_{7300}/L_{6300}$ over time, while a few outliers instead display sustained high or increasing ratios for extended periods. The ratio $L_{7300}/L_{6300}$ is also correlated with the rise and decline times of the light curves, with slower events exhibiting higher ionization, the first robust connection between early light curve and late-time spectral properties, likely due to the magnetar's impact: slower-evolving SLSNe are generally powered by engines with longer spin-down timescales, which deposit more energy at later phases. Among the events with decreasing $L_{7300}/L_{6300}$, SLSNe with high ionization are on average powered by magnetars with higher thermalized spin-down power, a correlation that is most significant for events with $M_{\rm ej}\lesssim12$ M$_{\odot}$. The ionization in the outliers with increasing $L_{7300}/L_{6300}$ may be due to late CSM interaction. $L_{7300}/L_{6300}$ and its evolution are therefore key diagnostics of SLSN engines and progenitor mass loss.

ejecta ionization supernova classification magnetar spin-down nebular spectroscopy spectral methods
Foundational AI Nov 12, 2025

Learning to Validate Generative Models: a Goodness-of-Fit Approach

Pietro Cappelli, Gaia Grosso, Marco Letizia et al.

Generative models are increasingly central to scientific workflows, yet their systematic use and interpretation require a proper understanding of their limitations through rigorous validation. Classic approaches struggle with scalability, statistical power, or interpretability when applied to high-dimensional data, making it difficult to certify the reliability of these models in realistic scientific settings. Here, we propose the use of the New Physics Learning Machine (NPLM), a learning-based approach to goodness-of-fit testing inspired by the Neyman--Pearson construction, to test generative networks trained on high-dimensional scientific data. We demonstrate the performance of NPLM for validation in two benchmark cases: generative models trained on mixtures of Gaussian models with increasing dimensionality, and a public end-to-end model, known as FlowSim, developed to generate high-energy physics collision events. We show that the NPLM can serve as a powerful validation method while also providing a means to diagnose sub-optimally modeled regions of the data.

goodness-of-fit testing model validation likelihood ratio hypothesis testing generative models
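The logic of a learning-based goodness-of-fit test can be sketched independently of the NPLM machinery: train a classifier to separate generated samples from reference samples, and if it beats chance by more than a permutation null allows, the generator is misspecified. A minimal version with logistic regression (toy Gaussian data, not the paper's NPLM test statistic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(2000, 10))
generated = rng.normal(0.1, 1.0, size=(2000, 10))  # slightly mismodeled

X = np.vstack([reference, generated])
y = np.r_[np.zeros(2000), np.ones(2000)]

clf = LogisticRegression(max_iter=1000).fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])

# Permutation null: shuffle labels to estimate the AUC distribution under
# "generator matches reference", giving an approximate p-value.
null = []
for _ in range(100):
    y_perm = rng.permutation(y)
    clf_p = LogisticRegression(max_iter=1000).fit(X, y_perm)
    null.append(roc_auc_score(y_perm, clf_p.predict_proba(X)[:, 1]))
p_value = np.mean(np.array(null) >= auc)
print(auc, p_value)
```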
Foundational AI Nov 11, 2025

Distance by de-correlation: Computing distance with heterogeneous grid cells

Pritipriya Dasbehera, Akshunna S. Dogra, William T. Redman

Encoding the distance between locations in space is essential for accurate navigation. Grid cells, a functional class of neurons in medial entorhinal cortex, are believed to support this computation. However, existing theories of how populations of grid cells code distance rely on complex coding schemes, with assumptions that may not be met by anatomical constraints. Inspired by recent work finding grid cells to have small, but robust, heterogeneity in their grid properties, we hypothesize that distance coding can be achieved by a simple de-correlation of population activity. We develop a mathematical theory describing this de-correlation in one dimension, showing that its predictions are consistent with simulations of noisy grid cells. Our simulations highlight a non-intuitive prediction of such a distance-by-de-correlation framework: namely, that some farther distances are better encoded than some nearer distances. We find evidence of this "sweet spot" in previously published rodent behavioral experiments and demonstrate that a decoder which estimates distance from the de-correlation of populations of simulated noisy grid cells leads to a similar pattern of errors. Finally, by simulating noisy grid cells in two dimensions, we find that there exists a trade-off between the range of distances that can be encoded by de-correlation of population activity and the distinguishability of different distances, which is controlled by the amount of variability in grid properties. We show that the previously observed average amount of grid property variability strikes a balance between the two, enabling the encoding of distances up to several meters. Our work provides new insight into how grid cells can underlie the coding of distance, without the assumptions previously needed, and why grid cells may have small amounts of heterogeneity in their grid properties.

grid cell coding population vector correlation neural heterogeneity stochastic processes representation learning
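The de-correlation idea is easy to simulate: give a population of 1-D grid cells slightly heterogeneous periods, and the correlation between population activity vectors at two locations decays with their separation, so the decay curve itself encodes distance. A minimal sketch (parameters invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 200
periods = 0.5 * (1 + 0.05 * rng.normal(size=n_cells))  # ~5% heterogeneity
phases = rng.uniform(0, 2 * np.pi, n_cells)

def population(x):
    # Idealized 1-D grid-cell tuning: periodic activity in position x.
    return np.cos(2 * np.pi * x / periods + phases)

# Correlation of the population vector at x0 with vectors at increasing
# separations: it decays from 1, and inverting the decay curve yields a
# distance estimate.
x0 = 0.0
separations = np.linspace(0, 3, 50)
corr = [np.corrcoef(population(x0), population(x0 + d))[0, 1]
        for d in separations]
print(np.round(corr[:5], 3))
```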
Astrophysics Nov 6, 2025

Spectral Diversity in Type Ibn Supernovae and the Large Host Offset of SN2024acyl

Yize Dong, V. Ashley Villar, Anya Nugent et al.

In this paper, we first present observations of SN 2024acyl, a normal Type Ibn supernova with a large projected offset ($\sim$35 kpc) from its host galaxy. The low star-formation rate measured at the explosion site raises the possibility that the progenitor of SN 2024acyl may not have been a massive star. We then examine, more broadly, the spectral diversity of Type Ibn supernovae around 20--35 days after peak brightness and identify two distinct groups: Group I, which shows bluer rest-frame optical color and narrower He I emission lines; and Group II, which shows redder rest-frame optical color and broader He I lines. Group I also tends to show higher peak luminosities. The diversity we identify appears to be closely connected to the diversity observed around peak and to persist into late phases ($>80$ days after peak). Given its redder color and broader He I lines, we classify SN 2024acyl as belonging to Group II. Based on the current dataset, we find no clear connection between this spectral diversity and either the host environments of Type Ibn SNe or their pre-explosion activity. The observed diversity in Type Ibn SNe likely reflects differences in circumstellar material properties and/or explosion energetics. These differences could result from a range of progenitor properties, such as different helium star mass, orbital period and companion type if they are in binary systems, and may indicate fundamentally diverse progenitors. Whether a continuous distribution exists between the two groups remains to be determined and will require further data to explore.

supernova classification spectral diversity circumstellar material interaction stellar evolution host galaxy environment
Foundational AI Nov 5, 2025

Sparse, self-organizing ensembles of local kernels detect rare statistical anomalies

Gaia Grosso, Sai Sumedh R. Hindupur, Thomas Fel et al.

Modern artificial intelligence has revolutionized our ability to extract rich and versatile data representations across scientific disciplines. Yet, the statistical properties of these representations remain poorly controlled, causing misspecified anomaly detection (AD) methods to falter. Weak or rare signals can remain hidden within the apparent regularity of normal data, creating a gap in our ability to detect and interpret anomalies. We examine this gap and identify a set of structural desiderata for detection methods operating under minimal prior information: sparsity, to enforce parsimony; locality, to preserve geometric sensitivity; and competition, to promote efficient allocation of model capacity. These principles define a class of self-organizing local kernels that adaptively partition the representation space around regions of statistical imbalance. As an instantiation of these principles, we introduce SparKer, a sparse ensemble of Gaussian kernels trained within a semi-supervised Neyman--Pearson framework to locally model the likelihood ratio between a sample that may contain anomalies and a nominal, anomaly-free reference. We provide theoretical insights into the mechanisms that drive detection and self-organization in the proposed model, and demonstrate the effectiveness of this approach on realistic high-dimensional problems of scientific discovery, open-world novelty detection, intrusion detection, and generative-model validation. Our applications span both the natural- and computer-science domains. We demonstrate that ensembles containing only a handful of kernels can identify statistically significant anomalous locations within representation spaces of thousands of dimensions, underscoring the interpretability, efficiency, and scalability of the proposed approach.

anomaly detection kernel methods likelihood ratio sparse models self-organizing kernels
Theoretical Physics Oct 27, 2025

The Compressed 3D Lyman-Alpha Forest Bispectrum

Roger de Belsunce, James M. Sullivan, Patrick McDonald

Cosmological studies of the Lyman-Alpha (Lya) forest typically constrain parameters using two-point statistics. However, higher-order statistics, such as the three-point function (or its Fourier counterpart, the bispectrum), offer additional information and help break the degeneracy between the mean flux and the power spectrum amplitude, albeit at a significant computational cost. To address this, we extend an existing highly informative compression of the bispectrum, the skew spectra, to the Lya forest. We derive the tree-level bispectrum of Lya forest fluctuations in the framework of effective field theory (EFT) directly in redshift space and validate our methodology on synthetic Lya forest data. We measure the anisotropic cross-spectra between the transmitted flux fraction and all quadratic operators arising in the bispectrum, yielding a set of 26 skew spectra. Using idealized 3D Gaussian smoothing (R=10 Mpc/h), we find good agreement (at the 1-2 sigma level, based on the statistical errors of the mocks) with the theoretical tree-level bispectrum prediction for the monopole and quadrupole up to k <= 0.17 h/Mpc. To enable the cosmological analysis of Lya forest data from the currently observing Dark Energy Spectroscopic Instrument (DESI), where we cannot do 3D smoothing, we use line-of-sight smoothing and introduce a new statistic, the shifted skew spectra. These probe non-squeezed bispectrum triangles and avoid locally applying quadratic operators to the field by displacing one copy of the field in the radial direction. Using a fixed displacement of 40 Mpc/h (and line-of-sight smoothing of 10 Mpc/h) yields a similar agreement with the theory prediction. For the special case of correlating the squared (and displaced) field with the original one, we analytically forward model the window function, making this approach readily applicable to DESI data.

skew spectra lyman-alpha forest effective field theory spectral methods shifted skew spectra
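In one dimension, the basic skew-spectrum construction is just the cross-spectrum of the squared field with the field itself, which compresses one side of the bispectrum. A toy sketch with a weakly non-Gaussian field (the quadratic term sources the signal; a Gaussian field would give zero on average):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
g = rng.normal(size=n)
delta = g + 0.2 * (g**2 - 1.0)  # weakly non-Gaussian 1-D field

# Cross-spectrum of delta^2 with delta: a compressed bispectrum probe.
dk = np.fft.rfft(delta)
dk2 = np.fft.rfft(delta**2 - np.mean(delta**2))
skew_spectrum = (dk2 * np.conj(dk)).real / n

# Band-average into coarse k-bins, loosely analogous to binning the 26
# skew spectra measured in the paper.
bands = np.array_split(skew_spectrum[1:], 8)
print([float(np.mean(b)) for b in bands])
```

The shifted variant described above would replace `delta**2` with the product of the field and a radially displaced copy (e.g. via `np.roll`), avoiding locally applied quadratic operators.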
Foundational AI Oct 24, 2025

AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing

Samuel Bright-Thonney, Christina Reissel, Gaia Grosso et al.

Novelty detection in large scientific datasets faces two key challenges: the noisy and high-dimensional nature of experimental data, and the necessity of making statistically robust statements about any observed outliers. While there is a wealth of literature on anomaly detection via dimensionality reduction, most methods do not produce outputs compatible with quantifiable claims of scientific discovery. In this work we directly address these challenges, presenting the first step towards a unified pipeline for novelty detection adapted to the rigorous statistical demands of science. We introduce AutoSciDACT (Automated Scientific Discovery with Anomalous Contrastive Testing), a general-purpose pipeline for detecting novelty in scientific data. AutoSciDACT begins by creating expressive low-dimensional data representations using contrastive pre-training, leveraging the abundance of high-quality simulated data in many scientific domains alongside domain expertise that can guide principled data augmentation strategies. These compact embeddings then enable an extremely sensitive machine learning-based two-sample test using the New Physics Learning Machine (NPLM) framework, which identifies and statistically quantifies deviations in observed data relative to a reference distribution (null hypothesis). We perform experiments across a range of astronomical, physical, biological, image, and synthetic datasets, demonstrating strong sensitivity to small injections of anomalous data across all domains.

contrastive learning anomaly detection hypothesis testing novelty detection dimensionality reduction
Foundational AI Oct 24, 2025

On Uncertainty Calibration for Equivariant Functions

Edward Berman, Jacob Ginesin, Marco Pacini et al.

Data-sparse settings such as robotic manipulation, molecular physics, and galaxy morphology classification are some of the hardest domains for deep learning. For these problems, equivariant networks can help improve modeling across undersampled parts of the input space, and uncertainty estimation can guard against overconfidence. However, until now, the relationship between equivariance and model confidence, and more generally between equivariance and model calibration, has yet to be studied. Since traditional classification and regression error terms show up in the definitions of calibration error, it is natural to suspect that previous work can be used to help understand the relationship between equivariance and calibration error. In this work, we present a theory relating equivariance to uncertainty estimation. By proving lower and upper bounds on uncertainty calibration errors (ECE and ENCE) under various equivariance conditions, we elucidate the generalization limits of equivariant models and illustrate how symmetry mismatch can result in miscalibration in both classification and regression. We complement our theoretical framework with numerical experiments that clarify the relationship between equivariance and uncertainty using a variety of real and simulated datasets, and we comment on trends with symmetry mismatch, group size, and aleatoric and epistemic uncertainties.

calibration equivariant neural networks uncertainty quantification symmetry mismatch calibration error bounds
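For readers unfamiliar with the calibration errors being bounded above, ECE for a binary classifier is computed by binning predictions by confidence and comparing each bin's mean confidence to its empirical accuracy. A minimal sketch with synthetic, deliberately overconfident probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = rng.uniform(0, 1, 10_000)
labels = (rng.uniform(size=10_000) < p_true).astype(float)
p_model = np.clip(0.5 + 1.3 * (p_true - 0.5), 0, 1)  # overconfident

def ece(probs, labels, n_bins=10):
    # Weighted average of |confidence - accuracy| over confidence bins.
    probs = np.clip(probs, 0.0, 1.0 - 1e-12)
    edges = np.linspace(0, 1, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            total += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return total

print(ece(p_model, labels))  # nonzero: the model is miscalibrated
```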
Foundational AI Oct 23, 2025

Diffusion Autoencoders with Perceivers for Long, Irregular and Multimodal Astronomical Sequences

Yunyi Shen, Alexander Gagliano

Self-supervised learning has become a central strategy for representation learning, but the majority of architectures used for encoding data have only been validated on regularly-sampled inputs such as images, audio, and video. In many scientific domains, data instead arrive as long, irregular, and multimodal sequences. To extract semantic information from these data, we introduce the Diffusion Autoencoder with Perceivers (daep). daep tokenizes heterogeneous measurements, compresses them with a Perceiver encoder, and reconstructs them with a Perceiver-IO diffusion decoder, enabling scalable learning in diverse data settings. To benchmark the daep architecture, we adapt the masked autoencoder to a Perceiver encoder/decoder design, and establish a strong baseline (maep) in the same architectural family as daep. Across diverse spectroscopic and photometric astronomical datasets, daep achieves lower reconstruction errors, produces more discriminative latent spaces, and better preserves fine-scale structure than both VAE and maep baselines. These results establish daep as an effective framework for scientific domains where data arrive as irregular, heterogeneous sequences.

self-supervised learning diffusion models perceiver architectures irregular time series encoding autoencoders
Foundational AI Oct 22, 2025

FINDER: Feature Inference on Noisy Datasets using Eigenspace Residuals

Trajan Murphy, Akshunna S. Dogra, Hanfeng Gu et al.

``Noisy'' datasets (regimes with low signal-to-noise ratios, small sample sizes, faulty data collection, etc.) remain a key research frontier for classification methods, with both theoretical and practical implications. We introduce FINDER, a rigorous framework for analyzing generic classification problems, with tailored algorithms for noisy datasets. FINDER incorporates fundamental stochastic analysis ideas into the feature learning and inference stages to optimally account for the randomness inherent to all empirical datasets. We construct ``stochastic features'' by first viewing empirical datasets as realizations of an underlying random field (without assumptions on its exact distribution) and then mapping them to appropriate Hilbert spaces. The Kosambi-Karhunen-Loève expansion (KLE) breaks these stochastic features into computable irreducible components, which allow classification over noisy datasets via an eigen-decomposition: data from different classes reside in distinct regions, identified by analyzing the spectrum of the associated operators. We validate FINDER on several challenging, data-deficient scientific domains, producing state-of-the-art breakthroughs in: (i) Alzheimer's Disease stage classification, (ii) remote-sensing detection of deforestation. We end with a discussion of when FINDER is expected to outperform existing methods, its failure modes, and other limitations.

classification kosambi-karhunen-loève expansion spectral methods eigenvalue decomposition noisy dataset learning
Theoretical Physics Oct 21, 2025

An integrated neural wavefunction solver for spinful Fermi systems

Alexander Avdoshkin, Max Geier, Liang Fu

We present an approach to solving for the ground state of Fermi systems that contain spin or other discrete degrees of freedom in addition to continuous coordinates. The approach combines Markov chain Monte Carlo sampling for energy estimation, which we adapted to cover the extended configuration space, with a transformer-based wavefunction to represent fermionic states. This sampling is necessary when the Hamiltonian contains explicit spin dependence and, for spin-independent Hamiltonians, we find that the inclusion of spin updates leads to faster convergence to an antiferromagnetic ground state. A transformer with both continuous position and discrete spin as inputs achieves universal approximation to spinful generalized orbitals. We validate the method on a range of two-dimensional material problems: a two-dimensional electron gas with Rashba spin-orbit coupling, a noncollinear spin texture, and a quantum antiferromagnet in a honeycomb moiré potential.

neural variational monte carlo spinful fermionic ansatz transformers monte carlo methods quantum states
Foundational AI Oct 20, 2025

Tropical super Gromov-Witten invariants

Artan Sheshmani, Shing-Tung Yau, Benjamin Zhou

We show that super Gromov-Witten invariants can be defined and computed by methods of tropical geometry. When the target is a point, the super invariants are descendant invariants on the moduli space of curves, which can be computed tropically. When the target is a convex, toric variety $X$, we describe a procedure to compute the tropical Euler class of the SUSY normal bundle $\overline{N}_{n, \beta}$ on $\overline{\mathcal{M}}_{0,n}(X, \beta)$, assuming it is locally tropicalizable in the sense of [CG], [CGM]. Then, we define the tropical, genus-0, $n$-marked, super Gromov-Witten invariant of $X$, and compute an example. This gives a tropical interpretation of super Gromov-Witten invariants of convex, toric varieties.

super gromov-witten invariants tropical geometry susy normal bundle equivariant euler class moduli space of curves
Astrophysics Oct 20, 2025

Optimizing Kilonova Searches: A Case Study of the Type IIb SN 2025ulz in the Localization Volume of the Low-Significance Gravitational Wave Event S250818k

Noah Franz, Bhagya Subrayan, Charles D. Kilpatrick et al.

Kilonovae, the ultraviolet/optical/infrared counterparts to binary neutron star mergers, are an exceptionally rare class of transients. Optical follow-up campaigns are plagued by contaminating transients, which may mimic kilonovae, but do not receive sufficient observations to measure the full photometric evolution. In this work, we present an analysis of the multi-wavelength dataset of supernova (SN) 2025ulz, a proposed kilonova candidate following the low-significance detection of gravitational waves originating from the potential binary neutron star merger S250818k. Despite an early rapid decline in brightness, our multi-wavelength observations of SN 2025ulz reveal that it is a type IIb supernova. As part of this analysis, we demonstrate the capabilities of a novel quantitative scoring algorithm to determine the likelihood that a transient candidate is a kilonova, based primarily on its 3D location and light curve evolution. We also apply our scoring algorithm to other transient candidates in the localization volume of S250818k and find that, at all times after the discovery of SN 2025ulz, there are $\geq 4$ candidates with a score comparable to SN 2025ulz, indicating that the kilonova search may have benefited from the additional follow-up of other candidates. During future kilonova searches, this type of scoring algorithm will be useful to rule out contaminating transients in real time, optimizing the use of valuable telescope resources.

gravitational waves kilonova scoring supernova classification transient contamination signal detection
Astrophysics Oct 16, 2025

Hierarchical Simulation-Based Inference of Supernova Power Sources and their Physical Properties

Edgar P. Vidal, Alexander T. Gagliano, Carolina Cuesta-Lazaro

Time domain surveys such as the Vera C. Rubin Observatory are projected to annually discover millions of astronomical transients. This and complementary programs demand fast, automated methods to constrain the physical properties of the most interesting objects for spectroscopic follow-up. Traditional approaches to likelihood-based inference are computationally expensive and ignore the multi-component energy sources powering astrophysical phenomena. In this work, we present a hierarchical simulation-based inference model for multi-band light curves that 1) identifies the energy sources powering an event of interest, 2) infers the physical properties of each subclass, and 3) separates physical anomalies in the learned embedding space. Our architecture consists of a transformer-based light curve summarizer coupled to a flow-matching regression module and a categorical classifier for the physical components. We train and test our model on $\sim$150k synthetic light curves generated with $\texttt{MOSFiT}$. Our network achieves a 90% classification accuracy at identifying energy sources, yields well-calibrated posteriors for all active components, and detects rare anomalies such as tidal disruption events (TDEs) through the learned latent space. This work demonstrates a scalable joint framework for population studies of known transients and the discovery of novel populations in the era of Rubin.

simulation-based inference posterior estimation multi-component energy modeling flow matching supernova classification
Astrophysics Oct 15, 2025

Constraining Power of Wavelet vs. Power Spectrum Statistics for CMB Lensing and Weak Lensing with Learned Binning

Kyle Boone, Georgios Valogiannis, Marco Gatti et al.

We present forecasts for constraints on the matter density ($\Omega_m$) and the amplitude of matter density fluctuations at $8\,h^{-1}$ Mpc ($\sigma_8$) from CMB lensing convergence maps and galaxy weak lensing convergence maps. For CMB lensing convergence auto statistics, we compare the angular power spectra ($C_\ell$'s) to the wavelet scattering transform (WST) coefficients. For CMB lensing convergence $\times$ galaxy weak lensing convergence statistics, we compare the cross angular power spectra to wavelet phase harmonics (WPH). This work also serves as the first application of WST and WPH to these probes. For CMB lensing convergence, we find that WST and $C_\ell$'s yield similar constraints in forecasts for all surveys considered in this work. When CMB lensing convergence is crossed with galaxy weak lensing convergence projected from $\textit{Euclid}$ Data Release 2 (DR2), we find that WPH outperforms cross-$C_\ell$'s by factors between $2.2$ and $3.4$ for individual parameter constraints. To compare these different summary statistics, we develop a novel learned binning approach. This method compresses summary statistics while maintaining interpretability. We find this leads to improved constraints compared to more naive binning schemes for our wavelet-based statistics, but not for $C_\ell$'s. By learning the binning and measuring constraints on distinct data sets, our method is robust to overfitting by construction.

cosmic microwave background wavelet scattering transform wavelet phase harmonics learned binning cosmological simulation
Astrophysics Oct 7, 2025

Studying the gravitational-wave population without looking that FAR out

Noah E. Wolfe, Matthew Mould, Jack Heinzel et al.

From catalogs of gravitational-wave transients, the population-level properties of their sources and the formation channels of merging compact binaries can be constrained. However, astrophysical conclusions can be biased by misspecification or misestimation of the population likelihood. Despite detection thresholds on the false-alarm rate (FAR) or signal-to-noise ratio (SNR), the current catalog is likely contaminated by noise transients. Further, computing the population likelihood becomes less accurate as the catalog grows. Current methods to address these challenges often scale poorly with the number of events and potentially become infeasible for future catalogs. Here, we evaluate a simple remedy: increasing the significance threshold for including events in population analyses. To determine the efficacy of this approach, we analyze simulated catalogs of up to 1600 gravitational-wave signals from black-hole mergers using full Bayesian parameter estimation with current detector sensitivities. We show that the growth in statistical uncertainty about the black-hole population, as we analyze fewer events but with higher SNR, depends on the source parameters of interest. When the SNR threshold is raised from 11 to 15 -- reducing our catalog size by two-thirds -- we find that statistical uncertainties on the mass distribution grow by only a few tens of percent and constraints on the spin distribution are essentially unchanged; meanwhile, uncertainties on the high-redshift cosmic merger rate more than double. Simultaneously, numerical uncertainty in the estimate of the population likelihood more than halves, allowing us to ensure unbiased inference without additional computational expense. Our results demonstrate that focusing on higher-significance events is an effective way to facilitate robust astrophysical inference with growing gravitational-wave catalogs.

gravitational waves bayesian inference likelihood estimation uncertainty quantification posterior estimation
Foundational AI Oct 3, 2025

Topological Invariance and Breakdown in Learning

Yongyi Yang, Tomaso Poggio, Isaac Chuang et al.

We prove that for a broad class of permutation-equivariant learning rules (including SGD, Adam, and others), the training process induces a bi-Lipschitz mapping between neurons and strongly constrains the topology of the neuron distribution during training. This result reveals a qualitative difference between small and large learning rates $\eta$. With a learning rate below a topological critical point $\eta^*$, the training is constrained to preserve all topological structure of the neurons. In contrast, above $\eta^*$, the learning process allows for topological simplification, making the neuron manifold progressively coarser and thereby reducing the model's expressivity. Viewed in combination with the recent discovery of the edge of stability phenomenon, the learning dynamics of neural networks under gradient descent can be divided into two phases: first they undergo smooth optimization under topological constraints, and then they enter a second phase where they learn through drastic topological simplifications. A key feature of our theory is that it is independent of specific architectures or loss functions, enabling the universal application of topological methods to the study of deep learning.

topological invariance topological critical point symmetry preservation equivariant neural networks phase transitions
Experimental Physics Oct 2, 2025

Reducing Simulation Dependence in Neutrino Telescopes with Masked Point Transformers

Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles

Machine learning techniques in neutrino physics have traditionally relied on simulated data, which provides access to ground-truth labels. However, the accuracy of these simulations and the discrepancies between simulated and real data remain significant concerns, particularly for large-scale neutrino telescopes that operate in complex natural media. In recent years, self-supervised learning has emerged as a powerful paradigm for reducing dependence on labeled datasets. Here, we present the first self-supervised training pipeline for neutrino telescopes, leveraging point cloud transformers and masked autoencoders. By shifting the majority of training to real data, this approach minimizes reliance on simulations, thereby mitigating associated systematic uncertainties. This represents a fundamental departure from previous machine learning applications in neutrino telescopes, paving the way for substantial improvements in event reconstruction and classification.

self-supervised learning neutrino detection transformers autoencoders robustness
Foundational AI Oct 2, 2025

Matching the Optimal Denoiser in Point Cloud Diffusion with (Improved) Rotational Alignment

Ameya Daigavane, YuQing Xie, Bodhi P. Vani et al.

Diffusion models are a popular class of generative models trained to reverse a noising process starting from a target data distribution. Training a diffusion model consists of learning how to denoise noisy samples at different noise levels. When training diffusion models for point clouds such as molecules and proteins, there is often no canonical orientation that can be assigned. To capture this symmetry, the true data samples are often augmented by transforming them with random rotations sampled uniformly over $SO(3)$. The denoised predictions are then often rotationally aligned to the ground truth samples via the Kabsch-Umeyama algorithm before computing the loss. However, the effect of this alignment step has not been well studied. Here, we show that the optimal denoiser can be expressed in terms of a matrix Fisher distribution over $SO(3)$. Alignment corresponds to sampling the mode of this distribution, and turns out to be the zeroth-order approximation for small noise levels, explaining its effectiveness. We build on this perspective to derive better approximators to the optimal denoiser in the limit of small noise. Our experiments highlight that alignment is often a `good enough' approximation for the noise levels that matter most for training diffusion models.

diffusion models rotational alignment symmetry preservation matrix fisher distribution group theory
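The alignment step analyzed above is the classical Kabsch procedure: the rotation that best superimposes a predicted point cloud onto the ground truth in the least-squares sense, i.e. the mode of the matrix Fisher distribution. A self-contained NumPy version:

```python
import numpy as np

def kabsch(P, Q):
    """Proper rotation R (right-multiplied) minimizing ||P @ R - Q||."""
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    d = np.sign(np.linalg.det(U @ Vt))   # reflection correction
    return U @ np.diag([1.0, 1.0, d]) @ Vt

rng = np.random.default_rng(0)
Q = rng.normal(size=(50, 3))             # "ground truth" point cloud
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
P = Q @ R_true.T                         # rotated "prediction"

R = kabsch(P, Q)
print(np.allclose(P @ R, Q))             # True: alignment recovered
```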
Foundational AI Oct 1, 2025

To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking

Hannah Lawrence, Elyssa Hofgard, Vasco Portilheiro et al.

Symmetry-aware methods for machine learning, such as data augmentation and equivariant architectures, encourage correct model behavior on all transformations (e.g. rotations or permutations) of the original dataset. These methods can improve generalization and sample efficiency, under the assumption that the transformed datapoints are highly probable, or "important", under the test distribution. In this work, we develop a method for critically evaluating this assumption. In particular, we propose a metric to quantify the amount of anisotropy, or symmetry-breaking, in a dataset, via a two-sample neural classifier test that distinguishes between the original dataset and its randomly augmented equivalent. We validate our metric on synthetic datasets, and then use it to uncover surprisingly high degrees of alignment in several benchmark point cloud datasets. We show theoretically that distributional symmetry-breaking can actually prevent invariant methods from performing optimally even when the underlying labels are truly invariant, as we show for invariant ridge regression in the infinite feature limit. Empirically, we find that the implication for symmetry-aware methods is dataset-dependent: equivariant methods still impart benefits on some anisotropic datasets, but not others. Overall, these findings suggest that understanding equivariance -- both when it works, and why -- may require rethinking symmetry biases in the data.

distributional symmetry breaking equivariant neural networks data augmentation symmetry preservation group theory
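The proposed anisotropy metric can be sketched directly: train a classifier to distinguish a dataset from a randomly augmented copy of itself; accuracy near chance indicates the symmetry holds distributionally, while accuracy near one flags symmetry breaking. A 2-D toy with an obvious preferred axis (details invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Anisotropic data: strongly elongated along the x-axis.
original = rng.normal(size=(2000, 2)) * np.array([3.0, 0.3])

# Randomly rotated copy (the augmented "equivalent" dataset).
angles = rng.uniform(0, 2 * np.pi, 2000)
c, s = np.cos(angles), np.sin(angles)
rotated = np.stack([c * original[:, 0] - s * original[:, 1],
                    s * original[:, 0] + c * original[:, 1]], axis=1)

# Two-sample classifier test: accuracy >> 0.5 signals symmetry breaking.
X = np.vstack([original, rotated])
y = np.r_[np.zeros(2000), np.ones(2000)]
scores = cross_val_score(RandomForestClassifier(), X, y, cv=5)
print(scores.mean())
```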
Foundational AI Oct 1, 2025

A universal compression theory for lottery ticket hypothesis and neural scaling laws

Hong-Yi Wang, Di Luo, Tomaso Poggio et al.

When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller models and substantially less data. In this work, we provide a positive and constructive answer. We prove that a generic permutation-invariant function of $d$ objects can be asymptotically compressed into a function of $\operatorname{polylog} d$ objects with vanishing error, which is proved to be the optimal compression rate. This theorem yields two key implications: (Ia) a large neural network can be compressed to polylogarithmic width while preserving its learning dynamics; (Ib) a large dataset can be compressed to polylogarithmic size while leaving the loss landscape of the corresponding model unchanged. Implication (Ia) directly establishes a proof of the dynamical lottery ticket hypothesis, which states that any ordinary network can be strongly compressed such that the learning dynamics and result remain unchanged. (Ib) shows that a neural scaling law of the form $L\sim d^{-\alpha}$ can be boosted to an arbitrarily fast power law decay, and ultimately to $\exp(-\alpha' \sqrt[m]{d})$.

symmetry preservation lottery ticket hypothesis polylogarithmic compression neural scaling laws scalability
Foundational AI Sep 25, 2025

Implicit Augmentation from Distributional Symmetry in Turbulence Super-Resolution

Julia Balla, Jeremiah Bailey, Ali Backour et al.

The immense computational cost of simulating turbulence has motivated the use of machine learning approaches for super-resolving turbulent flows. A central challenge is ensuring that learned models respect physical symmetries, such as rotational equivariance. We show that standard convolutional neural networks (CNNs) can partially acquire this symmetry without explicit augmentation or specialized architectures, as turbulence itself provides implicit rotational augmentation in both time and space. Using 3D channel-flow subdomains with differing anisotropy, we find that models trained on more isotropic mid-plane data achieve lower equivariance error than those trained on boundary layer data, and that greater temporal or spatial sampling further reduces this error. We show a distinct scale-dependence of equivariance error that occurs regardless of dataset anisotropy that is consistent with Kolmogorov's local isotropy hypothesis. These results clarify when rotational symmetry must be explicitly incorporated into learning algorithms and when it can be obtained directly from turbulence, enabling more efficient and symmetry-aware super-resolution.

distributional symmetry equivariant neural networks symmetry preservation implicit equivariance learning superresolution
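Equivariance error of the kind measured in this study compares applying the symmetry before and after the network. For 90-degree rotations, which keep the pixel grid exact, the check fits in a few lines; the untrained CNN below is a stand-in for a trained super-resolution model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cnn = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 1, 3, padding=1))

x = torch.randn(4, 1, 32, 32)
rot = lambda t: torch.rot90(t, k=1, dims=(-2, -1))

# Relative equivariance error: ||f(rot(x)) - rot(f(x))|| / ||f(x)||.
with torch.no_grad():
    err = (cnn(rot(x)) - rot(cnn(x))).pow(2).mean().sqrt()
    scale = cnn(x).pow(2).mean().sqrt()
print((err / scale).item())
```

On real flow data, the study's claim is that this error is smaller for models trained on more isotropic subdomains, even without explicit augmentation.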
Foundational AI Sep 24, 2025

Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy

Manuel Perez-Carrasco, Maya Nasr, Sebastien Roche et al.

Effective cloud and cloud shadow detection is a critical prerequisite for accurate retrieval of concentrations of atmospheric methane (CH4) or other trace gases in hyperspectral remote sensing. This challenge is especially pertinent for MethaneSAT, a satellite mission launched in March 2024 to fill a significant data gap, in terms of resolution, precision, and swath, between coarse-resolution global mappers and fine-scale point-source imagers of methane, and for its airborne companion mission, MethaneAIR. MethaneSAT delivers hyperspectral data at an intermediate spatial resolution (approx. 100 x 400 m), whereas MethaneAIR provides even finer resolution (approx. 25 m), enabling the development of highly detailed concentration maps that allow quantification of both the sources and rates of emissions. In this study, we use machine learning methods to address the cloud and cloud shadow detection problem for sensors with these high spatial resolutions. Clouds and cloud shadows in remote sensing data need to be effectively screened out, as they bias methane retrievals in remote sensing imagery and impact the quantification of emissions. We deploy and evaluate conventional techniques, including Iterative Logistic Regression (ILR) and Multilayer Perceptron (MLP), alongside advanced deep learning architectures, namely U-Net and a Spectral Channel Attention Network (SCAN). Our results show that conventional methods struggle with spatial coherence and boundary definition, affecting the detection of clouds and cloud shadows. Deep learning models substantially improve detection quality: U-Net performs best in preserving spatial structure, while SCAN excels at capturing fine boundary details... Our data and code are publicly available at: https://doi.org/10.7910/DVN/IKLZOJ

convolutional networks hyperspectral segmentation attention mechanisms methane remote sensing ensemble methods
Theoretical Physics Sep 23, 2025

The Pareto Frontier of Resilient Jet Tagging

Rikab Gambhir, Matt LeBlanc, Yuanchen Zhou

Classifying hadronic jets using their constituents' kinematic information is a critical task in modern high-energy collider physics. Often, classifiers are designed by targeting the best performance using metrics such as accuracy, AUC, or rejection rates. However, the use of a single metric can lead to the use of architectures that are more model-dependent than competitive alternatives, leading to potential uncertainty and bias in analysis. We explore such trade-offs and demonstrate the consequences of using networks with high performance metrics but low resilience.

robustness pareto frontier jet physics classification collider physics
Astrophysics Sep 15, 2025

When IIb Ceases To Be: Bridging the Gap Between IIb and Short-plateau Supernovae

Joseph R. Farah, D. Andrew Howell, Daichi Hiramatsu et al.

Hydrogen-rich supernovae (SNe) span a range of hydrogen envelope masses at core collapse, producing diverse light curves from extended plateaus in Type II SNe to double-peaked Type IIb SNe. Recent hydrodynamic modeling predicts a continuous sequence of light-curve morphologies as hydrogen is removed, with short plateau SNe (plateau durations ~50--70 days) emerging as a transitional class. However, the observational boundary between IIb and short-plateau remains poorly defined, and thus far unobserved. We report on extensive photometric and spectroscopic follow-up of SN 2023wdd and SN 2022acrv, candidate transitional events on the low-mass end of the short-plateau class. Both exhibit weak, double-peaked light curves which we interpret as exceptionally short plateaus (10--20 days), and hybrid spectral features: persistent H$\alpha$ absorption with He I contamination, but without the helium dominance characteristic of IIb SNe. Using analytic shock-cooling models and numerical light curve fitting, we estimate hydrogen-rich envelope masses of ~0.6--0.8 $M_\odot$ -- significantly larger than canonical IIb values ($\lesssim0.1\,M_\odot$) but consistent with the ${\sim}0.9\,M_\odot$ threshold predicted for short-plateau behavior. Although the progenitor radii inferred from analytic and numerical methods differ by factors of 2--5, envelope mass estimates are consistent across approaches. Comparisons to well-studied IIb (SN 2016gkg, SN 2022hnt), short-plateau (SN 2023ufx, SN 2006ai, SN 2016egz, SN 2006Y), and II SNe (SN 2023ixf, SN 2013ej) suggest a monotonic relationship between hydrogen envelope mass and plateau length consistent with analytic and numerical expectations. These findings provide additional evidence for a continuous distribution of envelope stripping in hydrogen-rich core-collapse progenitors and place SN 2023wdd and SN 2022acrv along the IIb/short-plateau boundary.

supernova classification envelope stripping continuum light curve modeling stellar evolution shock-cooling emission
Foundational AI Sep 15, 2025

Towards non-commutative crepant resolutions of affine toric Gorenstein varieties

Aimeric Malter, Artan Sheshmani

In this paper we prove a common generalisation of results by Špenko-Van den Bergh and Iyama-Wemyss that can be used to generate non-commutative crepant resolutions (NCCRs) of some affine toric Gorenstein varieties. We use and generalise results by Novaković to study NCCRs for affine toric Gorenstein varieties associated to cones over polytopes with interior points. As a special case, we consider the case where the polytope is reflexive with $\le \dim P+2$ vertices, using results of Borisov and Hua to show the existence of NCCRs.

non-commutative resolutions toric geometry derived categories group theory symmetry preservation
Theoretical Physics Sep 12, 2025

Gradient-based search of quantum phases: discovering unconventional fractional Chern insulators

André Grossi Fonseca, Eric Wang, Sachin Vaidya et al.

The discovery and understanding of new quantum phases has time and again transformed both fundamental physics and technology, yet progress often relies on slow, intuition-based theoretical considerations or experimental serendipity. Here, we introduce a general gradient-based framework for targeted phase discovery. We define a differentiable function, dubbed "target-phase loss function", which encodes spectral fingerprints of a quantum state, thereby recasting phase search as a tractable optimization problem in Hamiltonian space. The method is broadly applicable to phases characterized by ground-state degeneracy and can be extended to a wide range of symmetry-broken and topological orders. As a demonstration, we apply it to spinless fermions on the kagome lattice and discover two distinctive fractional Chern insulators (FCIs), verified through detailed exact diagonalization: (i) at filling $\nu = 1/3$, a "non-ideal" Abelian FCI whose band geometry lies far beyond the Landau-level mimicry paradigm and all recent generalizations; and (ii) at $\nu = 1/2$, a non-Abelian FCI stabilized purely by finite-range two-body interactions. These results provide the first explicit realization of such types of FCIs and establish a versatile paradigm for systematic quantum-phase discovery.

loss function design fractional chern insulators hamiltonian systems spectral methods quantum states
Astrophysics Sep 10, 2025

Characterizing Supernova Host Galaxies with FrankenBlast: A Scalable Tool for Transient Host Galaxy Association, Photometry, and Stellar Population Modeling

Anya E. Nugent, V. Ashley Villar, Alex Gagliano et al.

We present FrankenBlast, a customized and improved version of the Blast web application. FrankenBlast associates transients with their host galaxies, performs host photometry, and runs an innovative SED fitting code to constrain host stellar population properties--all within minutes per object. We test FrankenBlast on 14,432 supernovae (SNe), roughly half of which are spectroscopically classified, and are able to constrain host properties for 9262 events. When contrasting the host stellar masses ($M_*$), specific star formation rates (sSFR), and host dust extinction ($A_V$) between spectroscopically and photometrically classified SNe Ia, Ib/c, II, and IIn, we determine that deviations in these distributions are primarily due to misclassified events contaminating the photometrically classified sample. We further show that the higher redshifts of the photometrically classified sample also force their $M_*$ and sSFR distributions to deviate from those of the spectroscopically classified sample, as these properties are redshift-dependent. We compare host properties between spectroscopically classified SN populations and determine whether they primarily trace $M_*$ or SFR. We find that all SN populations depend on both $M_*$ and SFR, with SNe II and IIn somewhat more SFR-dependent than SNe Ia and Ib/c, and SNe Ia more $M_*$-dependent than all other classes. We find the difference between the SNe Ib/c and II hosts the most intriguing and speculate that SNe Ib/c must be more dependent on higher $M_*$ and more evolved environments for the right conditions for progenitor formation. All data products and FrankenBlast are publicly available, along with a developing FrankenBlast version intended for Rubin Observatory science products.

supernova classification galaxy classification host galaxy sed fitting bayesian inference transient host association
Theoretical Physics Sep 5, 2025

Soliton Surfaces and the Geometry of Integrable Deformations of the $\mathbb{CP}^{N-1}$ Model

Christian Ferko, Michele Galli, Zejun Huang et al.

The $\mathbb{CP}^{N-1}$ model is an analytically tractable $2d$ quantum field theory which shares several properties with $4d$ Yang-Mills theory. By virtue of its classical integrability, this model also admits a family of integrable higher-spin auxiliary field deformations, including the $T \overline{T}$ deformation as a special case. We study the $\mathbb{CP}^{N-1}$ model and its deformations from a geometrical perspective, constructing their soliton surfaces and recasting physical properties of these theories as statements about surface geometry. We examine how the $T \overline{T}$ flow affects the unit constraint in the $\mathbb{CP}^{N-1}$ model and prove that any solution of this theory with vanishing energy-momentum tensor remains a solution under analytic stress tensor deformations -- an argument that extends to generic dimensions and instanton-like solutions in stress tensor flows including the non-analytic, $2d$, root-$T \overline{T}$ case and classes of higher-spin, Smirnov-Zamolodchikov-type, deformations. Finally, we give two geometric interpretations for general $T \overline{T}$-like deformations of symmetric space sigma models, showing that such flows can be viewed as coupling the undeformed theory to a unit-determinant field-dependent metric, or using a particular choice of moving frame on the soliton surface.

tt-bar deformation soliton surfaces quantum field theory classical integrability sigma model geometry
Foundational AI Sep 5, 2025

Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning

Jasmine Shone, Zhening Li, Shaden Alshammari et al.

The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode similarities between data points. However, a KL-based loss may be misaligned with the true objective, and properties of KL divergence such as asymmetry and unboundedness may create optimization challenges. We present Beyond I-Con, a framework that enables systematic discovery of novel loss functions by exploring alternative statistical divergences. Key findings: (1) on unsupervised clustering of DINO-ViT embeddings, we achieve state-of-the-art results by modifying the PMI algorithm to use total variation (TV) distance; (2) supervised contrastive learning with Euclidean distance as the feature space metric is improved by replacing the standard loss function with the Jensen-Shannon divergence (JSD); (3) on dimensionality reduction, we achieve superior qualitative results and better performance on downstream tasks than SNE by replacing KL with a bounded $f$-divergence. Our results highlight the importance of considering divergence choices in representation learning optimization.

representation learning loss function design f-divergence generalization contrastive learning dimensionality reduction
Theoretical Physics Sep 3, 2025

Attention is all you need to solve chiral superconductivity

Chun-Tse Li, Tzen Ong, Max Geier et al.

Recent advances in neural quantum states have shown that correlations between quantum particles can be efficiently captured by {\it attention} -- a foundation of modern neural architectures that enables neural networks to learn the relation between objects. In this work, we show that a general-purpose self-attention Fermi neural network is able to find chiral $p_x \pm i p_y$ superconductivity in an attractive Fermi gas by energy minimization, {\it without prior knowledge or bias towards pairing}. The superconducting state is identified from the optimized wavefunction by measuring various physical observables: the pair binding energy, the total angular momentum of the ground state, and off-diagonal long-range order in the two-body reduced density matrix. Our work paves the way for AI-driven discovery of unconventional and topological superconductivity in strongly correlated quantum materials.

attention mechanisms chiral superconductivity transformers neural quantum states quantum states
Foundational AI Sep 3, 2025

The Optimiser Hidden in Plain Sight: Training with the Loss Landscape's Induced Metric

Thomas R. Harvey

We present a class of novel optimisers for training neural networks that makes use of the Riemannian metric naturally induced when the loss landscape is embedded in higher-dimensional space. This is the same metric that underlies common visualisations of loss landscapes. By taking this geometric perspective literally and using the induced metric, we develop a new optimiser and compare it to existing methods, namely SGD, Adam, AdamW, and Muon, across a range of tasks and architectures. Empirically, we conclude that this new class of optimisers is highly effective in low-dimensional examples and provides slight improvement over state-of-the-art methods for training neural networks. These new optimisers have theoretically desirable properties. In particular, the effective learning rate is automatically decreased in regions of high curvature, acting as a smoothed-out form of gradient clipping. Similarly, one variant of these optimisers can also be viewed as inducing an effective scheduled learning rate, and decoupled weight decay is the natural choice from our geometric perspective. The basic method can be used to modify any existing preconditioning method. The new optimiser has a computational complexity comparable to that of Adam.
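To make the mechanism concrete: embedding the graph of the loss $L$ in $\mathbb{R}^{n+1}$ induces the metric $g = I + \nabla L \nabla L^{\top}$, and by the Sherman-Morrison formula $g^{-1}\nabla L = \nabla L / (1 + \|\nabla L\|^2)$, so the step length shrinks smoothly wherever the gradient is large. A minimal sketch of that update (an illustration of the idea, not the paper's full optimiser class):

```python
# Gradient step preconditioned by the induced metric g = I + gL gL^T.
# Sherman-Morrison gives g^{-1} grad = grad / (1 + ||grad||^2): a
# smoothed-out form of gradient clipping, as described in the abstract.
import torch

def induced_metric_step(params, loss_fn, lr=1e-2):
    loss = loss_fn(params)
    (grad,) = torch.autograd.grad(loss, params)
    scale = 1.0 / (1.0 + grad.pow(2).sum())   # automatic step shrinkage
    with torch.no_grad():
        params -= lr * scale * grad
    return loss.item()

# Toy usage: a steep quadratic bowl.
theta = torch.tensor([3.0, -2.0], requires_grad=True)
for _ in range(200):
    induced_metric_step(theta, lambda p: 50.0 * p.pow(2).sum())
```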

induced riemannian metric geometric deep learning loss landscape geometry curvature-adaptive steps loss function design
Foundational AI Aug 31, 2025

Any-Order Flexible Length Masked Diffusion

Jaeyeon Kim, Lee Cheuk-Kit, Carles Domingo-Enrich et al.

Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in an any-order, parallel fashion, enabling fast inference and strong performance on non-causal tasks. However, a crucial limitation is that they do not support token insertions and are thus limited to fixed-length generations. To this end, we introduce Flexible Masked Diffusion Models (FlexMDMs), a discrete diffusion paradigm that can simultaneously model sequences of flexible length while provably retaining MDMs' flexibility of any-order inference. Grounded in an extension of the stochastic interpolant framework, FlexMDMs generate sequences by inserting mask tokens and unmasking them. Empirically, we show that FlexMDMs match MDMs in perplexity while modeling length statistics with much higher fidelity. On a synthetic maze planning task, they achieve $\approx 60 \%$ higher success rate than MDM baselines. Finally, we show pretrained MDMs can easily be retrofitted into FlexMDMs: on 16 H100s, it takes only three days to fine-tune LLaDA-8B into a FlexMDM, achieving superior performance on math (GSM8K, $58\% \to 67\%$) and code infilling ($52\% \to 65\%$).

diffusion models stochastic processes masked diffusion variable-length generation flow matching
Experimental Physics Aug 24, 2025

GW-YOLO: Multi-transient segmentation in LIGO using computer vision

Siddharth Soni, Nikhil Mukund, Erik Katsavounidis

Time series data and their time-frequency representation from gravitational-wave interferometers present multiple opportunities for the use of artificial intelligence methods associated with signal and image processing. Closely connected with this is the real-time aspect associated with gravitational-wave interferometers and the astrophysical observations they perform; the discovery potential of these instruments can be significantly enhanced when data processing can be achieved in O(1s) timescales. In this work, we introduce a novel signal and noise identification tool based on the YOLO (You Only Look Once) object detection framework. For its application into gravitational waves, we will refer to it as GW-YOLO. This tool can provide scene identification capabilities and essential information regarding whether an observed transient is any combination of noise and signal. Additionally, it supplies detailed time-frequency coordinates of the detected objects in the form of pixel masks, an essential property that can be used to understand and characterize astrophysical sources, as well as instrumental noise. The simultaneous identification of noise and signal, combined with precise pixel-level localization, represents a significant advancement in gravitational-wave data analysis. Our approach yields a 50\% detection efficiency for binary black hole signals at a signal-to-noise ratio (SNR) of 15 when such signals overlap with transient noise artifacts. When noise artifacts overlap with binary neutron star signals, our algorithm attains 50\% detection efficiency at an SNR of 30. This presents the first quantitative assessment of the ability to detect astrophysical events overlapping with realistic instrument noise present in gravitational-wave interferometers.

gravitational waves convolutional networks signal detection instance segmentation anomaly detection
Foundational AI Aug 22, 2025

Training a Foundation Model for Materials on a Budget

Teddy Koker, Mit Kotak, Tess Smidt

Foundation models for materials modeling are advancing quickly, but their training remains expensive, often placing state-of-the-art methods out of reach for many research groups. We introduce Nequix, a compact E(3)-equivariant potential that pairs a simplified NequIP design with modern training practices, including equivariant root-mean-square layer normalization and the Muon optimizer, to retain accuracy while substantially reducing compute requirements. Nequix has 700K parameters and was trained in 100 A100 GPU-hours. On the Matbench-Discovery and MDR Phonon benchmarks, Nequix ranks third overall while requiring a 20 times lower training cost than most other methods, and it delivers two orders of magnitude faster inference speed than the current top-ranked model. We release model weights and a fully reproducible codebase at https://github.com/atomicarchitects/nequix.

equivariant neural networks interatomic potentials materials discovery scalability budget-conscious training
Foundational AI Aug 18, 2025

Efficient Constraint-Aware Flow Matching via Randomized Exploration

Zhengyan Huan, Jacob Boerma, Li-Ping Liu et al.

We consider the problem of generating samples via Flow Matching (FM) with an additional requirement that the generated samples must satisfy given constraints. We consider two scenarios, viz.: (a) when a differentiable distance function to the constraint set is given, and (b) when the constraint set is only available via queries to a membership oracle. For case (a), we propose a simple adaptation of the FM objective with an additional term that penalizes the distance between the constraint set and the generated samples. For case (b), we propose to employ randomization and learn a mean flow that is numerically shown to have a high likelihood of satisfying the constraints. This approach deviates significantly from existing works that require simple convex constraints, knowledge of a barrier function, or a reflection mechanism to constrain the probability flow. Furthermore, in the proposed setting we show that a two-stage approach, where both stages approximate the same original flow but with only the second stage probing the constraints via randomization, is more computationally efficient. Through several synthetic cases of constrained generation, we numerically show that the proposed approaches achieve significant gains in terms of constraint satisfaction while matching the target distributions. As a showcase for a practical oracle-based constraint, we show how our approach can be used for training an adversarial example generator, using queries to a hard-label black-box classifier. We conclude with several future research directions. Our code is available at https://github.com/ZhengyanHuan/FM-RE.
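For case (a), the adaptation can be written compactly: the usual conditional flow-matching regression plus a penalty on the distance from a predicted endpoint to the constraint set. The sketch below uses a linear interpolant and a crude one-step endpoint estimate; these specific choices, and the penalty weight, are illustrative stand-ins rather than the paper's exact construction:

```python
# Flow matching with an additive constraint-distance penalty (case (a)).
import torch

def constrained_fm_loss(v_theta, x0, x1, dist_fn, lam=1.0):
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1          # linear interpolant x_t
    target = x1 - x0                    # its (constant) velocity
    v = v_theta(xt, t)
    fm = (v - target).pow(2).sum(-1).mean()
    x1_hat = xt + (1 - t) * v           # crude one-step endpoint estimate
    return fm + lam * dist_fn(x1_hat).pow(2).mean()

# Toy usage: constraint set = unit ball, with a differentiable distance.
dist_fn = lambda x: torch.relu(x.norm(dim=-1) - 1.0)
mlp = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 2))
v_theta = lambda x, t: mlp(torch.cat([x, t], dim=-1))
x0, x1 = torch.randn(128, 2), 0.5 * torch.randn(128, 2)
constrained_fm_loss(v_theta, x0, x1, dist_fn).backward()
```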

flow matching constraint-aware generation randomized exploration membership oracle generative models
Theoretical Physics Aug 14, 2025

Observable Optimization for Precision Theory: Machine Learning Energy Correlators

Arindam Bhattacharya, Katherine Fraser, Matthew D. Schwartz

The practice of collider physics typically involves the marginalization of multi-dimensional collider data to uni-dimensional observables relevant for some physics task. In many cases, such as classification or anomaly detection, the observable can be arbitrarily complicated, such as the output of a neural network. However, for precision measurements, the observable must correspond to something computable systematically beyond the level of current simulation tools. In this work, we demonstrate that precision-theory-compatible observable space exploration can be systematized by using neural simulation-based inference techniques from machine learning. We illustrate this approach by exploring the space of marginalizations of the energy 3-point correlator to optimize sensitivity to the top quark mass. We first learn the energy-weighted probability density from simulation, then search in the space of marginalizations for an optimal triangle shape. Although simulations and machine learning are used in the process of observable optimization, the output is an observable definition which can then be computed to high precision and compared directly to data without any memory of the computations which produced it. We find that the optimal marginalization is isosceles triangles on the sphere with a side ratio approximately $1:1:\sqrt{2}$ (i.e. right triangles) within the set of marginalizations we consider.
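The object being marginalized can be made concrete with a toy version of the energy 3-point correlator: every particle triplet contributes weight $E_i E_j E_k$ at its triangle of pairwise opening angles, and a marginalization is any map from that triangle down to one dimension. A toy sketch, where the random "jet" and the binning choice are illustrative only:

```python
# Toy energy 3-point correlator, marginalized onto a triangle-shape
# variable (ratio of the two shorter sides); isosceles triangles sit
# near ratio 1. Energies/directions are random stand-ins for a jet.
import itertools
import numpy as np

rng = np.random.default_rng(0)
E = rng.exponential(1.0, size=30)
n = rng.normal(size=(30, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)   # unit direction vectors

def opening_angles(i, j, k):
    return sorted(np.arccos(np.clip(n[a] @ n[b], -1.0, 1.0))
                  for a, b in [(i, j), (j, k), (i, k)])

ratios, weights = [], []
for i, j, k in itertools.combinations(range(len(E)), 3):
    a, b, c = opening_angles(i, j, k)           # a <= b <= c
    ratios.append(a / b)
    weights.append(E[i] * E[j] * E[k])          # energy weighting

hist, edges = np.histogram(ratios, bins=20, range=(0, 1), weights=weights)
```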

simulation-based inference collider physics observable marginalization density estimation normalizing flows
Experimental Physics Aug 12, 2025

The Neutrino Kaleidoscope: Searches for Non-Standard Neutrino Oscillations at Neutrino Telescopes with a TeV Muon Accelerator Source

Nicholas W. Kamp, Gray Putnam

Muon accelerators, a potential technology for enabling $\mathcal{O}$(10 TeV) parton center of mass energy collisions, would also source an intense, collimated beam of neutrinos at TeV energies. The energy and size of this beam would be excellently matched as a source for existing and planned neutrino telescopes: gigaton-sized detectors of astrophysical neutrinos at and above TeV energies. In this paper, we introduce the technical considerations and scientific reach of pairing a muon accelerator source of neutrinos with a neutrino telescope detector, a combination we dub the "Neutrino Kaleidoscope". In particular, such a pairing would enable searches for non-standard oscillations of the beam neutrinos as they traverse the Earth between source and detector. These non-standard neutrino oscillations could be sourced by Lorentz invariance violation, which a neutrino kaleidoscope could probe up to the quantum gravity-motivated Planck scale. Such a search would also have a reach on sterile neutrinos orders of magnitude beyond existing terrestrial limits. Finally, we touch on some of the non-oscillation potential of a neutrino kaleidoscope.

neutrino detection new physics searches sterile neutrinos lorentz invariance violation collider physics
Foundational AI Aug 11, 2025

The DNA of nuclear models: How AI predicts nuclear masses

Kate A. Richardson, Sokratis Trifinopoulos, Mike Williams

Obtaining high-precision predictions of nuclear masses, or equivalently nuclear binding energies, $E_b$, remains an important goal in nuclear-physics research. Recently, many AI-based tools have shown promising results on this task, some achieving precision that surpasses the best physics models. However, the utility of these AI models remains in question given that predictions are only useful where measurements do not exist, which inherently requires extrapolation away from the training (and testing) samples. Since AI models are largely black boxes, the reliability of such an extrapolation is difficult to assess. We present an AI model that not only achieves cutting-edge precision for $E_b$, but does so in an interpretable manner. For example, we find that (and explain why) the most important dimensions of its internal representation form a double helix, where the analog of the hydrogen bonds in DNA here link the number of protons and neutrons found in the most stable nucleus of each isotopic chain. Furthermore, we show that the AI prediction of $E_b$ can be factorized and ordered hierarchically, with the most important terms corresponding to well-known symbolic models (such as the famous liquid drop). Remarkably, the improvement of the AI model over symbolic ones can almost entirely be attributed to an observation made by Jaffe in 1969 based on the structure of most known nuclear ground states. The end result is a fully interpretable data-driven model of nuclear masses based on physics deduced by AI.
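The symbolic baseline the abstract refers to is the classical liquid-drop (semi-empirical) mass formula, whose coefficients can be fit by ordinary least squares. A minimal sketch, using a handful of approximate measured binding energies and omitting the pairing term:

```python
# Liquid-drop baseline: E_b ~ a_V*A - a_S*A^(2/3)
#   - a_C*Z(Z-1)/A^(1/3) - a_A*(A-2Z)^2/A   (pairing term omitted).
import numpy as np

def liquid_drop_features(Z, A):
    return np.stack([A,
                     -A ** (2.0 / 3.0),
                     -Z * (Z - 1) / A ** (1.0 / 3.0),
                     -(A - 2 * Z) ** 2 / A], axis=1)

# Approximate measured binding energies (MeV) for a few nuclei.
Z = np.array([8.0, 20.0, 26.0, 50.0, 82.0])     # O, Ca, Fe, Sn, Pb
A = np.array([16.0, 40.0, 56.0, 120.0, 208.0])
Eb = np.array([127.6, 342.1, 492.3, 1020.5, 1636.4])

coeffs, *_ = np.linalg.lstsq(liquid_drop_features(Z, A), Eb, rcond=None)
print(dict(zip(["a_V", "a_S", "a_C", "a_A"], coeffs.round(2))))
```

The interpretable model described in the abstract can then be read as this formula plus a hierarchy of learned correction terms.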

representation learning interpretability nuclear binding energy multi-task learning liquid drop model
Astrophysics Aug 7, 2025

Detecting Model Misspecification in Cosmology with Scale-Dependent Normalizing Flows

Aizhan Akhmetzhanova, Carolina Cuesta-Lazaro, Siddharth Mishra-Sharma

Current and upcoming cosmological surveys will produce unprecedented amounts of high-dimensional data, which require complex high-fidelity forward simulations to accurately model both physical processes and systematic effects which describe the data generation process. However, validating whether our theoretical models accurately describe the observed datasets remains a fundamental challenge. An additional complexity to this task comes from choosing appropriate representations of the data which retain all the relevant cosmological information, while reducing the dimensionality of the original dataset. In this work we present a novel framework combining scale-dependent neural summary statistics with normalizing flows to detect model misspecification in cosmological simulations through Bayesian evidence estimation. By conditioning our neural network models for data compression and evidence estimation on the smoothing scale, we systematically identify where theoretical models break down in a data-driven manner. We demonstrate a first application of our approach using matter and gas density fields from three CAMELS simulation suites with different subgrid physics implementations.

normalizing flows out-of-distribution detection scale-dependent inference cosmological simulation model validation
Astrophysics Jul 30, 2025

Identification and photometric classification of extragalactic transients in the Vera C. Rubin Observatory's Data Preview 1

James Freeburn, Igor Andreoni, Kaylee M. de Soto et al.

The Vera C. Rubin Observatory will soon survey the southern sky, delivering a depth and sky coverage that is unprecedented in time domain astronomy. As part of commissioning, Data Preview 1 (DP1) has been released. It comprises an LSSTComCam observing campaign between November and December 2024 with multi-band imaging of seven fields, covering roughly 0.4 square degrees each, providing a first glimpse into the data products that will become available once the Legacy Survey of Space and Time begins. In this work, we search three fields for extragalactic transients. We identify eight new likely supernovae, and three known ones from a sample of 369,644 difference image analysis objects. Photometric classification using Superphot+ assigns sub-classes with >95% confidence to only one SN Ia and one SN II in this sample. Our findings are in agreement with supernova detection rate predictions of $15\pm4$ supernovae from simulations using simsurvey. The supernova detection rate in the data is possibly affected by the lack of suitable templates. Nevertheless, this work demonstrates the quality of the data products delivered in DP1 and indicates that the Rubin Observatory's Legacy Survey of Space and Time (LSST) is well placed to fulfill its discovery potential in time domain astronomy.

supernova classification difference image analysis classification signal detection photometric light curve fitting
Theoretical Physics Jul 23, 2025

Analytic Regression of Feynman Integrals from High-Precision Numerical Sampling

Oscar Barrera, Aurélien Dersy, Rabia Husain et al.

In mathematics or theoretical physics one is often interested in obtaining an exact analytic description of some data which can be produced, in principle, to arbitrary accuracy. For example, one might like to know the exact analytical form of a definite integral. Such problems are not well-suited to numerical symbolic regression, since typical numerical methods lead only to approximations. However, if one has some sense of the function space in which the analytic result should lie, it is possible to deduce the exact answer by judiciously sampling the data at a sufficient number of points with sufficient precision. We demonstrate how this can be done for the computation of Feynman integrals. We show that by combining high-precision numerical integration with analytic knowledge of the function space one can often deduce the exact answer using lattice reduction. A number of examples are given as well as an exploration of the trade-offs between number of datapoints, number of functional predicates, precision of the data, and compute. This method provides a bottom-up approach that neatly complements the top-down Landau-bootstrap approach of trying to constrain the exact answer using the analytic structure alone. Although we focus on the application to Feynman integrals, the techniques presented here are more general and could apply to a wide range of problems where an exact answer is needed and the function space is sufficiently well understood.
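The core deduction step is an integer-relation search: compute the quantity to many digits, guess a basis for the function space, and ask for integer coefficients relating them. A self-contained sketch with mpmath's PSLQ implementation standing in for the lattice-reduction step (the integral and basis here are textbook examples, not ones from the paper):

```python
# Recover an exact answer from high-precision numerics via PSLQ.
from mpmath import mp, mpf, pi, log, quad, pslq

mp.dps = 50   # 50-digit working precision makes the relation findable

# "Unknown" value: integral_0^1 log(1+x)/(1+x^2) dx  (= pi/8 * log 2).
val = quad(lambda x: log(1 + x) / (1 + x ** 2), [0, 1])

# Guess that the answer lies in span{pi*log 2, pi^2, (log 2)^2, 1}.
basis = [val, pi * log(2), pi ** 2, log(2) ** 2, mpf(1)]
print(pslq(basis, tol=mpf(10) ** -40))
# expect a relation like [-8, 1, 0, 0, 0], i.e. val = (1/8)*pi*log(2)
```

As the abstract notes, the trade-off is between the size of the guessed basis and the numerical precision required for the recovered relation to be trustworthy.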

scattering amplitudes regression quantum field theory high-precision numerical integration lattice reduction
Astrophysics Jul 22, 2025

Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data

Yunyi Shen, Alexander T. Gagliano

Time-domain astrophysics relies on heterogeneous and multi-modal data. Specialized models are often constructed to extract information from a single modality, but this approach ignores the wealth of cross-modality information that may be relevant for the tasks to which the model is applied. In this work, we propose a multi-modal, mixture-of-expert variational autoencoder to learn a joint embedding for supernova light curves and spectra. Our method, which is inspired by the Perceiver architecture, natively accommodates variable-length inputs and the irregular temporal sampling inherent to supernova light curves. We train our model on radiative transfer simulations and validate its performance on cross-modality reconstruction of supernova spectra and physical parameters from the simulation. Our model achieves superior performance in cross-modality generation to nearest-neighbor searches in a contrastively-trained latent space, showing its promise for constructing informative latent representations of multi-modal astronomical datasets.

variational autoencoders mixture of experts cross-modality generation perceiver architectures representation learning
Astrophysics Jul 17, 2025

reLAISS: A Python Package for Flexible Similarity Searches of Supernovae and Their Host Galaxies

E. Reynolds, A. Gagliano, V. A. Villar

Discovery rates of supernovae are expected to surpass one million events annually with the Vera C. Rubin Observatory. With unprecedented sample sizes of both common and rare transient types, photometric classification alone will be insufficient for finding one-in-a-million events and prioritizing the 1% of events for spectroscopic follow-up observations. Here, we present reLAISS, a modified framework for similarity searches of supernovae using extracted features of ZTF light curves and Pan-STARRS host galaxy photometry and built on the original LAISS framework. Unlike its predecessor, reLAISS couples interpretable light curve morphology features with extinction-corrected host-galaxy colors to probe both explosion physics and associated stellar populations simultaneously. The library allows users to customize the number of neighbors retrieved, the weight of host and light curve features, and the use of Monte Carlo simulations to ensure relevant matches when features are poorly constrained. We release reLAISS as a pip-installable package with an accompanying reference set of 20,000 features, and a set of tutorials that demonstrate the code's expanded functionality. All source code can be found at https://github.com/evan-reynolds/re-laiss.

transient similarity search supernova classification feature extraction anomaly detection monte carlo methods
Astrophysics Jul 16, 2025

CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching

Sidharth Kannan, Tian Qiu, Carolina Cuesta-Lazaro et al.

Generative machine learning models have been shown to learn low-dimensional representations of data that preserve the information required for downstream tasks. In this work, we demonstrate that flow matching based generative models can learn compact, semantically rich latent representations of field level cold dark matter (CDM) simulation data without supervision. Our model, CosmoFlow, learns representations 32x smaller than the raw field data, usable for field level reconstruction, synthetic data generation, and parameter inference. Our model also learns interpretable representations, in which different latent channels correspond to features at different cosmological scales.

flow matching representation learning cosmological simulation dimensionality reduction generative models
Foundational AI Jul 10, 2025

The $\mathcal{D}$-Geometric Hilbert Scheme -- Part I: Involutivity and Stability

Jacob Kryczka, Artan Sheshmani

We construct a moduli space of formally integrable and involutive ideal sheaves arising from systems of partial differential equations (PDEs) in the algebro-geometric setting, by introducing the $\mathcal{D}$-Hilbert and $\mathcal{D}$-Quot functors in the sense of Grothendieck and establishing their representability. Central to this construction is the notion of Spencer (semi-)stability, which presents an extension of classical stability conditions from gauge theory and complex geometry, and which provides the boundedness needed for our moduli problem. As an application, we show that for flat connections on compact Kähler manifolds, Spencer poly-stability of the associated PDE ideal is equivalent to the existence of a Hermitian-Yang-Mills metric. This result provides a refinement of the classical Donaldson-Uhlenbeck-Yau correspondence, and identifies Spencer cohomology and stability as a unifying framework for geometric PDEs.

spencer cohomology moduli of pde systems spencer stability group theory quantum field theory
Foundational AI Jul 1, 2025

Proof of a perfect platonic representation hypothesis

Liu Ziyin, Isaac Chuang

In this note, we elaborate on and explain in detail the proof given by Ziyin et al. (2025) of the "perfect" Platonic Representation Hypothesis (PRH) for the embedded deep linear network model (EDLN). We show that if trained with the stochastic gradient descent (SGD), two EDLNs with different widths and depths and trained on different data will become Perfectly Platonic, meaning that every possible pair of layers will learn the same representation up to a rotation. Because most of the global minima of the loss function are not Platonic, that SGD only finds the perfectly Platonic solution is rather extraordinary. The proof also suggests at least six ways the PRH can be broken. We also show that in the EDLN model, the emergence of the Platonic representations is due to the same reason as the emergence of progressive sharpening. This implies that these two seemingly unrelated phenomena in deep learning can, surprisingly, have a common cause. Overall, the theory and proof highlight the importance of understanding emergent "entropic forces" due to the irreversibility of SGD training and their role in representation learning. The goal of this note is to be instructive while avoiding jargon and lengthy technical details.

representation learning platonic representation hypothesis implicit regularization entropic forces stochastic processes
Astrophysics Jun 30, 2025

Modeling the Cosmological Lyman-$α$ Forest at the Field Level

Roger de Belsunce, Mikhail M. Ivanov, James M. Sullivan et al.

The distribution of absorption lines in the spectra of distant quasars, called the Lyman-$α$ (Ly-$α$) forest, is a unique probe of cosmology and the intergalactic medium at high redshifts and small scales. The statistical power of ongoing redshift surveys demands precise theoretical tools to model the Ly-$α$ forest. We address this challenge by developing an analytic, perturbative forward model to predict the Ly-$α$ forest at the field level for a given set of cosmological initial conditions. Our model shows a remarkable performance when compared with the Sherwood hydrodynamic simulations: it reproduces the flux distribution, the Ly-$α$ - dark matter halo cross-correlations, and the count-in-cell statistics at the percent level down to scales of a few Mpc. Our work provides crucial tools that bridge analytic modeling on large scales with simulations on small-scales, enabling field-level inference from Ly-$α$ forest data and simulation-based priors for cosmological analyses. This is especially timely for realizing the full scientific potential of the Ly-$α$ forest measurements by the Dark Energy Spectroscopic Instrument.

lyman-alpha forest effective field theory field-level inference perturbative bias expansion cosmological simulation
Foundational AI Jun 27, 2025

Exploration Behavior of Untrained Policies

Jacob Adamczyk

Exploration remains a fundamental challenge in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. In this work, we study how the architecture of deep neural policies implicitly shapes exploration before training. We theoretically and empirically demonstrate strategies for generating ballistic or diffusive trajectories from untrained policies in a toy model. Using the theory of infinite-width networks and a continuous-time limit, we show that untrained policies return correlated actions and result in non-trivial state-visitation distributions. We discuss the distributions of the corresponding trajectories for a standard architecture, revealing insights into inductive biases for tackling exploration. Our results establish a theoretical and experimental framework for using policy initialization as a design tool to understand exploration behavior in early training.
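A toy version of the ballistic-versus-diffusive dichotomy fits in a few lines: a fixed untrained network emits strongly correlated actions along a trajectory, so the mean-squared displacement grows roughly like $t^2$, while i.i.d. random actions give a random walk with MSD $\propto t$. The environment and architecture below are illustrative stand-ins, not the paper's setup:

```python
# MSD of untrained deterministic policies vs. white-noise actions in a
# 1d integrator environment x_{t+1} = x_t + a_t.
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy, T=200):
    x, xs = 0.0, []
    for _ in range(T):
        x += policy(x)
        xs.append(x)
    return np.array(xs)

def msd(make_policy, n=500, T=200):
    # Mean-squared displacement averaged over random initializations.
    return np.mean([rollout(make_policy(), T) ** 2 for _ in range(n)], axis=0)

def make_untrained_net():
    # One-hidden-layer tanh policy: a fixed random function of state,
    # so successive actions along a trajectory are highly correlated.
    W, b = rng.normal(size=32), rng.normal(size=32)
    w2 = rng.normal(size=32) / np.sqrt(32)
    return lambda x: float(np.tanh(W * x + b) @ w2)

msd_net = msd(make_untrained_net)              # grows ~ t^2 (ballistic)
msd_iid = msd(lambda: lambda x: rng.normal())  # grows ~ t (diffusive)
```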

reinforcement learning policy initialization stochastic processes infinite-width limit kernel methods
Theoretical Physics Jun 25, 2025

Holography with Null Boundaries

Christian Ferko, Savdeep Sethi

One of the key issues in holography is going beyond $\mathrm{AdS}$ and defining quantum gravity in spacetimes with a null boundary. Recent examples of this type involve linear dilaton asymptotics and are related to the $T \overline{T}$ deformation. We present a holographic correspondence derived from string theory, which is an example of a kind of celestial holography. The holographic definition is a spacetime non-commutative open string theory supported on D1-D5 branes together with fundamental strings. The gravity solutions interpolate between $\mathrm{AdS}_3$ metrics and six-dimensional metrics. Radiation can escape to null infinity, which makes both the encoding of quantum information in the boundary and the dynamics of black holes quite different from $\mathrm{AdS}$ spacetimes.

holography null boundary holography string theory non-commutative open string theory conformal field theory
Theoretical Physics Jun 19, 2025

Qubit thermodynamics: Entropy production from nonadiabatic driving

Pavel Zhelnin, Lucas Johns, Carlos A. Argüelles

Adiabaticity is a cornerstone of many promising approaches to quantum control, computing, and simulation. In practice, however, there is always a trade-off. Although the deleterious effects of noise can be diminished by running a control schedule more quickly, this benefit comes at the expense of nonadiabaticity. To put these two unwanted effects on the same theoretical footing, we analyze the nonadiabatic error in qubit control as a form of entropy production, examining the mechanism by which fine-grained information is effectively lost despite the dynamics being fundamentally unitary. A crucial issue here is the question of how to define equilibrium under a time-dependent Hamiltonian. Using the Landau--Zener protocol as a test case, we show that entropy increases nearly monotonically when equilibrium is defined with respect to the effective Hamiltonian in the optimal superadiabatic frame. We then consider single-passage Landau--Zener--Stückelberg--Majorana interferometry, in which the initial state of the qubit is arbitrary. Violations of the second law of thermodynamics are possible but require exquisite control to achieve deliberately.

nonadiabatic driving superadiabatic frame coarse-grained entropy quantum states hamiltonian systems
Foundational AI Jun 18, 2025

Geography of Landau-Ginzburg models and threefold syzygies

Yang He, Artan Sheshmani

We study the behavior of toric Landau-Ginzburg models under extremal contraction and the minimal model program. We also establish a relation between the moduli space of toric Landau-Ginzburg models and the geography of central models. We conjecture a correspondence between extremal contractions and the minimal model program on Fano varieties and explicitly written degenerations of their associated toric Landau-Ginzburg models. We prove the conjectures for smooth toric varieties, as well as general smooth Fano varieties in dimensions 2 and 3. As an application, we compute the elementary syzygies for smooth Fano threefolds.

landau-ginzburg mirror symmetry mori minimal model program sarkisov program fano variety geometry threefold syzygies
Foundational AI Jun 16, 2025

The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products

YuQing Xie, Ameya Daigavane, Mit Kotak et al.

$E(3)$-equivariant neural networks have demonstrated success across a wide range of 3D modelling tasks. A fundamental operation in these networks is the tensor product, which interacts two geometric features in an equivariant manner to create new features. Due to the high computational complexity of the tensor product, significant effort has been invested to optimize the runtime of this operation. For example, Luo et al. (2024) recently proposed the Gaunt tensor product (GTP) which promises a significant speedup. In this work, we provide a careful, systematic analysis of a number of tensor product operations. In particular, we emphasize that different tensor products are not performing the same operation. The reported speedups typically come at the cost of expressivity. We introduce measures of expressivity and interactability to characterize these differences. In addition, we realize that the original implementation of GTP can be greatly simplified by directly using a spherical grid at no cost in asymptotic runtime. This spherical grid approach is faster on our benchmarks and speeds up actual training of the MACE interatomic potential by 30%. Finally, we provide the first systematic microbenchmarks of the various tensor product operations. We find that the theoretical runtime guarantees can differ wildly from empirical performance, demonstrating the need for careful application-specific benchmarking. Code is available at https://github.com/atomicarchitects/PriceofFreedom.

equivariant neural networks group theory tensor product expressivity geometric deep learning symmetry preservation
Astrophysics Jun 6, 2025

A Detection of Helium in the Bright Superluminous Supernova SN 2024rmj

Harsh Kumar, Edo Berger, Peter K. Blanchard et al.

We present extensive ultraviolet (UV), optical, and near-infrared (NIR) photometric and spectroscopic observations of the nearby hydrogen-poor superluminous supernova (SLSN-I) SN 2024rmj at $z = 0.1189$. SN 2024rmj reached a peak absolute magnitude of $M_g \approx -21.9$, placing it at the luminous end of the SLSN-I distribution. The light curve exhibits a pronounced pre-peak bump ($\approx$ 60 d before the main peak) and a post-peak bump ($\approx$ 55 d after the main peak). The bulk of the light curve is otherwise well fit by a magnetar spin-down model, with typical values (spin: $\approx$ 2.1 ms; magnetic field: $\approx$ 6 $\times$ 10$^{13}$ G; ejecta mass: $\approx$ 12 M$_\odot$). The optical spectra exhibit characteristic SLSN-I features and evolution, but with a relatively high velocity of $\approx$ 8,000 km s$^{-1}$ post-peak. Most significantly, we find a clear detection of helium in the NIR spectra at He I $λ$1.083 $μ$m and $λ$2.058 $μ$m, blueshifted by $\approx$ 15,000 km s$^{-1}$ (13 d before peak) and $\approx$ 13,000 km s$^{-1}$ (40 d after peak), indicating that helium is confined to the outermost ejecta; based on these NIR detections, we also identify likely contribution from He I $λ$5876 Å in the optical spectra on a similar range of timescales. This represents the most definitive detection of helium in a bright SLSN-I to date, and indicates that progenitors with a thin helium layer can still explode as SLSNe.

signal detection supernova classification spectral methods magnetar spin-down stellar evolution
Foundational AI Jun 3, 2025

Generative Perception of Shape and Material from Differential Motion

Xinran Nicole Han, Ko Nishino, Todd Zickler

Perceiving the shape and material of an object from a single image is inherently ambiguous, especially when lighting is unknown and unconstrained. Despite this, humans can often disentangle shape and material, and when they are uncertain, they often move their head slightly or rotate the object to help resolve the ambiguities. Inspired by this behavior, we introduce a novel conditional denoising-diffusion model that generates samples of shape-and-material maps from a short video of an object undergoing differential motions. Our parameter-efficient architecture allows training directly in pixel-space, and it generates many disentangled attributes of an object simultaneously. Trained on a modest number of synthetic object-motion videos with supervision on shape and material, the model exhibits compelling emergent behavior: For static observations, it produces diverse, multimodal predictions of plausible shape-and-material maps that capture the inherent ambiguities; and when objects move, the distributions converge to more accurate explanations. The model also produces high-quality shape-and-material estimates for less ambiguous, real-world objects. By moving beyond single-view to continuous motion observations, and by using generative perception to capture visual ambiguities, our work suggests ways to improve visual reasoning in physically-embodied systems.

diffusion models generative perception generative models disentangled representations differential motion cues
Astrophysics Jun 2, 2025

A Wide Field Map of Ultra-Compact Dwarfs in the Coma Cluster

Richard T. Pomeroy, Juan P. Madrid, Conor R. O'Neill et al.

A dataset of 23,351 globular clusters (GCs) and ultra-compact dwarfs (UCDs) in the Coma cluster of galaxies was built using Hubble Space Telescope Advanced Camera for Surveys data. Based on the standard magnitude cut of $M_V \leq -11$, a total of 523 UCD candidates are found within this dataset of Compact Stellar Systems (CSS). From a color-magnitude diagram (CMD) analysis built using this catalog, we find a clear mass-magnitude relation extending marginally into the UCD parameter space. The luminosity function defined by this dataset shows an excess of sources at bright magnitudes, suggesting a bimodal formation scenario for UCDs. We estimate the number of UCDs with a different origin than GCs to be $N_{UCD} \geq 32 \pm 1$. We derive the total number of CSS within the core (1 Mpc) of Coma to be $N_{CSS} \approx 69,400 \pm 1400$. The radial distribution of UCDs in Coma shows that, like GCs, UCDs agglomerate around three giant ellipticals: NGC 4874, NGC 4889, and IC 4051. We find UCDs are more centrally concentrated around these three ellipticals than GCs. IC 4051 has a satellite population of UCDs similar to NGC 4874 and NGC 4889. We estimate that only ~14% of UCDs inhabit the intracluster space (ICUCD) between galaxies in the region, compared to ~24% for GCs (ICGC). We find red (metal-rich) UCDs are more likely located closer to a host galaxy, with blue (metal-poor) UCDs showing a greater dispersion and lower average density in the region.

compact stellar systems color-magnitude diagram galaxy classification tidal stripping intracluster population
Foundational AI Jun 2, 2025

A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models

YuQing Xie, Tess Smidt

Equivariant neural networks have proven to be effective for tasks with known underlying symmetries. However, optimizing equivariant networks can be tricky and best training practices are less established than for standard networks. In particular, recent works have found small training benefits from relaxing equivariance constraints. This raises the question: do equivariance constraints introduce fundamental obstacles to optimization? Or do they simply require different hyperparameter tuning? In this work, we investigate this question through a theoretical analysis of the loss landscape geometry. We focus on networks built using permutation representations, which we can view as a subset of unconstrained MLPs. Importantly, we show that the parameter symmetries of the unconstrained model have nontrivial effects on the loss landscape of the equivariant subspace and under certain conditions can provably prevent learning of the global minima. Further, we empirically demonstrate that in such cases, relaxing to an unconstrained MLP can sometimes solve the issue. Interestingly, the weights eventually found via relaxation correspond to a different choice of group representation in the hidden layer. From this, we draw 3 key takeaways. (1) By viewing the unconstrained version of an architecture, we can uncover hidden parameter symmetries which were broken by the choice of constraint enforcement. (2) Hidden symmetries give important insights on loss landscapes and can induce critical points and even minima. (3) Hidden-symmetry-induced minima can sometimes be escaped by constraint relaxation, and we observe the network jumps to a different choice of constraint enforcement. Effective equivariance relaxation may require rethinking the fixed choice of group representation in the hidden layers.

equivariant neural networks group theory loss landscape geometry parameter symmetry symmetry preservation
Astrophysics May 30, 2025

SPLASH: A Rapid Host-Based Supernova Classifier for Wide-Field Time-Domain Surveys

Adam Boesky, V. Ashley Villar, Alexander Gagliano et al.

The upcoming Legacy Survey of Space and Time (LSST) conducted by the Vera C. Rubin Observatory will detect millions of supernovae (SNe) and generate millions of nightly alerts, far outpacing available spectroscopic resources. Rapid, scalable photometric classification methods are therefore essential for identifying young SNe for follow-up and enabling large-scale population studies. We present SPLASH, a host-based classification pipeline that infers supernova classes using only host galaxy photometry. SPLASH first associates SNe with their hosts (yielding a redshift estimate), then infers host galaxy stellar mass and star formation rate using deep learning, and finally classifies SNe using a random forest trained on these inferred properties, along with host-SN angular separation and redshift. SPLASH achieves a binary (Type Ia vs. core-collapse) classification accuracy of $76\%$ and an F1-score of $69\%$, comparable to other state-of-the-art methods. By selecting only the most confident predictions, SPLASH can return highly pure subsets of all major SN types, making it well-suited for targeted follow-up. Its efficient design allows classification of $\sim 500$ SNe per second, making it ideal for next-generation surveys. Moreover, its intermediate inference step enables selection of transients by host environment, providing a tool not only for classification but also for probing the demographics of stellar death.
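The final classification stage is a standard random forest over a four-dimensional feature vector. A schematic with synthetic placeholder data (feature names follow the abstract; this is not the trained SPLASH model):

```python
# Random forest over inferred host properties + geometry, as in SPLASH.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(10.0, 1.0, n),     # log10 host stellar mass (toy)
    rng.normal(-10.0, 1.0, n),    # log10 star formation rate (toy)
    rng.exponential(1.0, n),      # host-SN angular separation
    rng.uniform(0.0, 0.5, n),     # redshift from host association
])
y = rng.integers(0, 2, n)         # 0 = core-collapse, 1 = Ia (toy labels)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Purity over completeness: keep only the most confident predictions,
# mirroring the abstract's strategy for targeted follow-up.
proba_ia = clf.predict_proba(X)[:, 1]
confident_ia = X[proba_ia > 0.9]
```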

supernova classification classification ensemble methods host-sn association galaxy classification
Experimental Physics May 30, 2025

Frequentist Uncertainties on Neural Density Ratios with wifi Ensembles

Sean Benevedes, Jesse Thaler

We introduce wifi ensembles as a novel framework to obtain asymptotic frequentist uncertainties on density ratios, with a particular focus on neural ratio estimation in the context of high-energy physics. When the density ratio of interest is a likelihood ratio conditioned on parameters, wifi ensembles can be used to perform simulation-based inference on those parameters. After training the basis functions $f_i(x)$, uncertainties on the weights $w_i$ can be straightforwardly propagated to the estimated parameters without requiring extraneous bootstraps. To demonstrate this approach, we present an application in quantum chromodynamics at the Large Hadron Collider, using wifi ensembles to estimate the likelihood ratio between generated quark and gluon jets. We use this learned likelihood ratio to estimate the quark fraction in a synthetic mixed quark/gluon sample, showing that the resultant uncertainties empirically satisfy the desired coverage properties.
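The downstream inference step described here is ordinary asymptotic frequentist statistics once a likelihood ratio is in hand. A minimal sketch, with synthetic stand-ins for the learned per-event ratios and plain observed Fisher information in place of the full wifi weight-propagation machinery:

```python
# Fit the quark fraction kappa of a mixed sample from per-event
# likelihood ratios r(x) = p_q(x)/p_g(x), with an asymptotic error bar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
r = rng.lognormal(0.2, 0.8, size=5000)     # toy learned ratios

def nll(kappa):
    # Mixture density relative to p_g: kappa * r(x) + (1 - kappa).
    return -np.sum(np.log(kappa * r + (1.0 - kappa)))

fit = minimize_scalar(nll, bounds=(1e-3, 1 - 1e-3), method="bounded")
k = fit.x
# Observed Fisher information = curvature of the NLL at the MLE.
info = np.sum(((r - 1.0) / (k * r + (1.0 - k))) ** 2)
print(f"kappa = {k:.3f} +/- {1.0 / np.sqrt(info):.3f}")
```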

likelihood ratio uncertainty quantification wifi ensembles ensemble methods simulation-based inference
Foundational AI May 26, 2025

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Fan Chen, Zeyu Jia, Alexander Rakhlin et al.

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\widetilde{O}({C_{\rm cov} H^3}/{ε^2})$ sample complexity, where $C_{\rm cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential separation for certain MDPs. For deterministic MDPs, we show how to eliminate the completeness assumption, dramatically simplifying the algorithm. We further extend our approach to preference-based feedback settings, proving that equivalent statistical efficiency can be achieved even under more limited information. Together, these results constitute a theoretical foundation for understanding the statistical properties of outcome-based reinforcement learning.

reinforcement learning outcome-based feedback credit assignment reward optimization coverability coefficient
Foundational AI May 24, 2025

How to build a consistency model: Learning flow maps via self-distillation

Nicholas M. Boffi, Michael S. Albergo, Eric Vanden-Eijnden

Flow-based generative models achieve state-of-the-art sample quality, but require the expensive solution of a differential equation at inference time. Flow map models, commonly known as consistency models, encompass many recent efforts to improve inference-time efficiency by learning the solution operator of this differential equation. Yet despite their promise, these models lack a unified description that clearly explains how to learn them efficiently in practice. Here, building on the methodology proposed in Boffi et al. (2024), we present a systematic algorithmic framework for directly learning the flow map associated with a flow or diffusion model. By exploiting a relationship between the velocity field underlying a continuous-time flow and the instantaneous rate of change of the flow map, we show how to convert any distillation scheme into a direct training algorithm via self-distillation, eliminating the need for pre-trained teachers. We introduce three algorithmic families based on different mathematical characterizations of the flow map: Eulerian, Lagrangian, and Progressive methods, which we show encompass and extend all known distillation and direct training schemes for consistency models. We find that the novel class of Lagrangian methods, which avoid both spatial derivatives and bootstrapping from small steps by design, achieve significantly more stable training and higher performance than more standard Eulerian and Progressive schemes. Our methodology unifies existing training schemes under a single common framework and reveals new design principles for accelerated generative modeling. Associated code is available at https://github.com/nmboffi/flow-maps.
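The relationship being exploited can be stated in one line: the flow map $X_{s,t}(x)$ must satisfy $\partial_t X_{s,t}(x) = v(X_{s,t}(x), t)$ with $X_{s,s}(x) = x$. A heavily simplified sketch of turning that identity into a self-distillation loss (network sizes, the time sampling, and the stop-gradient placement are all illustrative choices; see the paper and its linked code for the actual schemes):

```python
# Self-distillation of a flow map against its own velocity field.
import torch

vel = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 2))
fmap = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                           torch.nn.Linear(64, 2))

def v(x, t):
    return vel(torch.cat([x, t], dim=-1))

def X(x, s, t):
    # The (t - s) factor enforces the boundary condition X(x, s, s) = x.
    return x + (t - s) * fmap(torch.cat([x, s, t], dim=-1))

x = torch.randn(128, 2)
s = torch.rand(128, 1)
t = s + (1 - s) * torch.rand(128, 1)

# d/dt X(x, s, t) via a jacobian-vector product in the scalar t.
xt, dXdt = torch.autograd.functional.jvp(
    lambda tt: X(x, s, tt), t, torch.ones_like(t), create_graph=True)

# Match the map's time derivative to the velocity at the mapped point.
loss = (dXdt - v(xt, t).detach()).pow(2).sum(-1).mean()
loss.backward()
```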

flow matching flow map consistency models self-distillation generative models
Experimental Physics May 22, 2025

Fast Low Energy Reconstruction using Convolutional Neural Networks

IceCube Collaboration

IceCube is a Cherenkov detector instrumenting over a cubic kilometer of glacial ice deep under the surface of the South Pole. The DeepCore sub-detector lowers the detection energy threshold to a few GeV, enabling precise measurements of neutrino oscillation parameters with atmospheric neutrinos. The reconstruction of neutrino interactions inside the detector is essential in studying neutrino oscillations. It is particularly challenging to reconstruct sub-100 GeV events with the IceCube detectors due to the relatively sparse detection units and detection medium. Convolutional neural networks (CNNs) are broadly used in physics experiments for both classification and regression purposes. This paper discusses the CNNs developed and employed for the latest IceCube-DeepCore oscillation measurements. These CNNs estimate various properties of the detected neutrinos, such as their energy, direction of arrival, interaction vertex position, and flavor-related signature; they are also used for background classification.

convolutional networks event reconstruction neutrino detection neutrino oscillation reconstruction classification
Foundational AI May 21, 2025

On the creation of narrow AI: hierarchy and nonlocality of neural network skills

Eric J. Michaud, Asher Parker-Sartori, Max Tegmark

We study the problem of creating strong, yet narrow, AI systems. While recent AI progress has been driven by the training of large general-purpose foundation models, the creation of smaller models specialized for narrow domains could be valuable for both efficiency and safety. In this work, we explore two challenges involved in creating such systems, having to do with basic properties of how neural networks learn and structure their representations. The first challenge regards when it is possible to train narrow models from scratch. Through experiments on a synthetic task, we find that it is sometimes necessary to train networks on a wide distribution of data to learn certain narrow skills within that distribution. This effect arises when skills depend on each other hierarchically, and training on a broad distribution introduces a curriculum which substantially accelerates learning. The second challenge regards how to transfer particular skills from large general models into small specialized models. We find that model skills are often not perfectly localized to a particular set of prunable components. However, we find that methods based on pruning can still outperform distillation. We investigate the use of a regularization objective to align desired skills with prunable components while unlearning unnecessary skills.

curriculum learning model pruning representation learning skill nonlocality disentangled representations
Astrophysics May 19, 2025

Modeling Galaxy Surveys with Hybrid SBI

Gemma Zhang, Chirag Modi, Oliver H. E. Philcox

Simulation-based inference (SBI) has emerged as a powerful tool for extracting cosmological information from galaxy surveys deep into the non-linear regime. Despite its great promise, its application is limited by the computational cost of running simulations that can describe the increasingly-large cosmological datasets. Recent work proposed a hybrid SBI framework (HySBI), which combines SBI on small-scales with perturbation theory (PT) on large-scales, allowing information to be extracted from high-resolution observations without large-volume simulations. In this work, we lay out the HySBI framework for galaxy clustering, a key step towards its application to next-generation datasets. We study the choice of priors on the parameters for modeling galaxies in PT analysis and in simulation-based analyses, as well as investigate their cosmology dependence. By jointly modeling large- and small-scale statistics and their associated nuisance parameters, we show that HySBI can obtain 20\% and 60\% tighter constraints on $Ω_m$ and $σ_8$, respectively, compared to traditional PT analyses, thus demonstrating the efficacy of this approach to maximally extract information from upcoming spectroscopic datasets.

simulation-based inference hybrid sbi posterior estimation effective field theory cosmological simulation
Foundational AI May 19, 2025

On the normality of commuting scheme for general linear Lie algebra

Artan Sheshmani, Xiaopeng Xia, Beihui Yuan

The commuting scheme $\mathfrak{C}^{d}_{\mathfrak{g}}$ for a reductive Lie algebra $\mathfrak{g}$ over an algebraically closed field $\mathbb{K}$ is the subscheme of $\mathfrak{g}^{d}$ defined by quadratic equations, whose $\mathbb{K}$-valued points are $d$-tuples of commuting elements in $\mathfrak{g}$ over $\mathbb{K}$. There is a long-standing conjecture that the commuting scheme $\mathfrak{C}^{d}_{\mathfrak{g}}$ is reduced. Moreover, a higher-dimensional analog of the Chevalley restriction conjecture was proposed by Chen-Ngô. We show that the commuting scheme $\mathfrak{C}^{2}_{\mathfrak{gl}_{n}}$ is Cohen-Macaulay and normal. As a corollary, we prove a 2-dimensional Chevalley restriction theorem for the general linear group in positive characteristic.

commuting scheme cohen-macaulay normality group theory chevalley restriction theorem symmetry preservation
Foundational AI May 18, 2025

Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning

Liu Ziyin, Yizhou Xu, Isaac Chuang

With the rapid discovery of emergent phenomena in deep learning and large language models, understanding their cause has become an urgent need. Here, we propose a rigorous entropic-force theory for understanding the learning dynamics of neural networks trained with stochastic gradient descent (SGD) and its variants. Building on the theory of parameter symmetries and an entropic loss landscape, we show that representation learning is crucially governed by emergent entropic forces arising from stochasticity and discrete-time updates. These forces systematically break continuous parameter symmetries and preserve discrete ones, leading to a series of gradient balance phenomena that resemble the equipartition property of thermal systems. These phenomena, in turn, (a) explain the universal alignment of neural representations between AI models and lead to a proof of the Platonic Representation Hypothesis, and (b) reconcile the seemingly contradictory observations of sharpness- and flatness-seeking behavior of deep learning optimization. Our theory and experiments demonstrate that a combination of entropic forces and symmetry breaking is key to understanding emergent phenomena in deep learning.

entropic forces symmetry breaking representation learning loss function design stochastic processes
Foundational AI May 15, 2025

Neural Thermodynamic Laws for Large Language Model Training

Ziming Liu, Yizhou Liu, Jeff Gore et al.

Beyond neural scaling laws, little is known about the laws underlying large language models (LLMs). We introduce Neural Thermodynamic Laws (NTL) -- a new framework that offers fresh insights into LLM training dynamics. On the theoretical side, we demonstrate that key thermodynamic quantities (e.g., temperature, entropy, heat capacity, thermal conduction) and classical thermodynamic principles (e.g., the three laws of thermodynamics and the equipartition theorem) naturally emerge under river-valley loss landscape assumptions. On the practical side, this scientific perspective yields intuitive guidelines for designing learning rate schedules.

neural thermodynamic laws river-valley loss landscape learning rate schedules stochastic processes loss function design
Experimental Physics May 15, 2025

Symbolic Learning of Topological Bands in Photonic Crystals

Ali Ghorashi, Sachin Vaidya, Ziming Liu et al.

Topological photonic crystals (PhCs) that support disorder-resistant modes, protected degeneracies, and robust transport have recently been explored for applications in waveguiding, optical isolation, light trapping, and lasing. However, designing PhCs with prescribed topological properties remains challenging because of the highly nonlinear mapping from the continuous real-space design of PhCs to the discrete output space of band topology. Here, we introduce a machine learning approach to address this problem, employing Kolmogorov--Arnold networks (KANs) to predict and inversely design the band symmetries of two-dimensional PhCs with two-fold rotational (C2) symmetry. We show that a single-hidden-layer KAN, trained on a dataset of C2-symmetric unit cells, achieves high accuracy in classifying the topological classes of the lowest lying bands. We use the symbolic regression capabilities of KANs to extract algebraic formulas that express the topological classes directly in terms of the Fourier components of the dielectric function. These formulas not only retain the full predictive power of the network but also provide novel insights and enable deterministic inverse design. Using this approach, we generate photonic crystals with target topological bands, achieving high accuracy even for high-contrast, experimentally realizable structures beyond the training domain.

kolmogorov-arnold networks interpretability symbolic regression inverse problems topological photonic crystals
Foundational AI May 12, 2025

Symbolic Regression with Multimodal Large Language Models and Kolmogorov Arnold Networks

Thomas R. Harvey, Fabian Ruehle, Kit Fraser-Taliente et al.

We present a novel approach to symbolic regression using vision-capable large language models (LLMs) and the ideas behind Google DeepMind's Funsearch. The LLM is given a plot of a univariate function and tasked with proposing an ansatz for that function. The free parameters of the ansatz are fitted using standard numerical optimisers, and a collection of such ansätze make up the population of a genetic algorithm. Unlike other symbolic regression techniques, our method does not require the specification of a set of functions to be used in regression, but with appropriate prompt engineering, we can arbitrarily condition the generative step. By using Kolmogorov Arnold Networks (KANs), we demonstrate that "univariate is all you need" for symbolic regression, and extend this method to multivariate functions by learning the univariate function on each edge of a trained KAN. The combined expression is then simplified by further processing with a language model.
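The inner loop of the method is simple to state: each candidate ansatz has its free parameters fitted numerically, and the resulting error determines its fitness in the genetic population. A stripped-down sketch, with a hardcoded candidate list standing in for the LLM's proposals:

```python
# Fit-and-rank loop for symbolic regression ansatze. The candidates
# would come from a vision-capable LLM in the paper; here they are
# hardcoded for illustration.
import numpy as np
from scipy.optimize import curve_fit

x = np.linspace(0.1, 4.0, 200)
y = 2.0 * np.sin(1.2 * x) / x           # the "unknown" plotted function

candidates = {
    "a*sin(b*x)/x": lambda x, a, b: a * np.sin(b * x) / x,
    "a*exp(-b*x)":  lambda x, a, b: a * np.exp(-b * x),
    "a*x + b":      lambda x, a, b: a * x + b,
}

scores = {}
for name, f in candidates.items():
    try:
        p, _ = curve_fit(f, x, y, p0=[1.0, 1.0], maxfev=5000)
        scores[name] = float(np.mean((f(x, *p) - y) ** 2))
    except RuntimeError:                # fit failed to converge
        scores[name] = np.inf

best = min(scores, key=scores.get)      # survives to the next generation
print(best, scores[best])
```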

multimodal llm symbolic regression regression kolmogorov-arnold networks automated discovery interpretability
Experimental Physics May 7, 2025

AI-Driven Robotics for Optics

Shiekh Zia Uddin, Sachin Vaidya, Shrish Choudhary et al.

Optics is foundational to research in many areas of science and engineering, including nanophotonics, quantum information, materials science, biomedical imaging, and metrology. However, the design, assembly, and alignment of optical experiments remain predominantly manual, limiting throughput and reproducibility. Automating such experiments is challenging due to the strict, non-negotiable precision requirements and the diversity of optical configurations found in typical laboratories. Here, we introduce a platform that integrates generative artificial intelligence, computer vision, and robotics to automate free-space optical experiments. The platform translates user-defined goals into valid optical configurations, assembles them using a robotic arm, and performs micrometer-scale fine alignment using a robot-deployable tool. It then executes a range of automated measurements, including beam characterization, polarization mapping, and spectroscopy, with consistency surpassing that of human operators. This work demonstrates the first flexible, AI-driven automation platform for optics, offering a path towards remote operation, cloud labs, and high-throughput discovery in the optical sciences.

scientific workflows robotic optical assembly llm optical agent fine-tuning micrometer alignment tool
Astrophysics May 6, 2025

High-redshift Millennium and Astrid galaxies in effective field theory at the field level

James M. Sullivan, Carolina Cuesta-Lazaro, Mikhail M. Ivanov et al.

Effective Field Theory (EFT) modeling is expected to be a useful tool in the era of future higher-redshift galaxy surveys such as DESI-II and Spec-S5 due to its robust description of various large-scale structure tracers. However, large values of EFT bias parameters of higher-redshift galaxies could jeopardize the convergence of the perturbative expansion. In this paper we measure the bias parameters and other EFT coefficients from samples of two types of star-forming galaxies in the state-of-the-art MilleniumTNG and Astrid hydrodynamical simulations. Our measurements are based on the field-level EFT forward model that allows for precision EFT parameter measurements by virtue of cosmic variance cancellation. Specifically, we consider approximately representative samples of Lyman-break galaxies (LBGs) and Lyman-alpha emitters (LAEs) that are consistent with the observed (angular) clustering and number density of these galaxies at $z=3$. Reproducing the linear biases and number densities observed from existing LAE and LBG data, we find quadratic bias parameters that are roughly consistent with those predicted from the halo model coupled with a simple halo occupation distribution model. We also find non-perturbative velocity contributions (Fingers of God) of a similar size for LBGs to the familiar case of Luminous Red Galaxies. However, these contributions are quite small for LAEs despite their large satellite fraction values of up to $\sim 30\%$. Our results indicate that the effective momentum reach $k_{\rm{Max}}$ at $z=3$ for LAEs (LBGs) will be in the range $0.3-0.6 ~h\rm{Mpc}^{-1}$ ($0.2-0.8~h\rm{Mpc}^{-1}$), suggesting that EFT will perform well for high redshift galaxy clustering. This work provides the first step toward obtaining realistic simulation-based priors on EFT parameters for LAEs and LBGs.

effective field theory galaxy bias cosmological simulation halo occupation distribution bayesian inference
Astrophysics May 5, 2025

Variational diffusion transformers for conditional sampling of supernovae spectra

Yunyi Shen, Alexander T. Gagliano

Type Ia Supernovae (SNe Ia) have become the most precise distance indicators in astrophysics due to their incredible observational homogeneity. Increasing discovery rates, however, have revealed multiple sub-populations with spectroscopic properties that are both diverse and difficult to interpret using existing physical models. These peculiar events are hard to identify from sparsely sampled observations and can introduce systematics in cosmological analyses if not flagged early; they are also of broader importance for building a cohesive understanding of thermonuclear explosions. In this work, we introduce DiTSNe-Ia, a variational diffusion-based generative model conditioned on light curve observations and trained to reproduce the observed spectral diversity of SNe Ia. In experiments with realistic light curves and spectra from radiative transfer simulations, DiTSNe-Ia achieves significantly more accurate reconstructions than the widely used SALT3 templates across a broad range of observation phases (from 10 days before peak light to 30 days after it). DiTSNe-Ia yields a mean squared error of 0.108 across all phases-five times lower than SALT3's 0.508-and an after-peak error of just 0.0191, an order of magnitude smaller than SALT3's 0.305. Additionally, our model produces well-calibrated credible intervals with near-nominal coverage, particularly at post-peak phases. DiTSNe-Ia is a powerful tool for rapidly inferring the spectral properties of SNe Ia and other transient astrophysical phenomena for which a physical description does not yet exist.

diffusion models transformers score-based models posterior estimation attention mechanisms
Foundational AI May 5, 2025

Learning simple heuristic rules for classifying materials based on chemical composition

Andrew Ma, Marin Soljačić

In the past decade, there has been a significant interest in the use of machine learning approaches in materials science research. Conventional deep learning approaches that rely on complex, nonlinear models have become increasingly important in computational materials science due to their high predictive accuracy. In contrast to these approaches, we have shown in a recent work that a remarkably simple learned heuristic rule -- based on the concept of topogivity -- can classify whether a material is topological using only its chemical composition. In this paper, we go beyond the topology classification scenario by also studying the use of machine learning to develop simple heuristic rules for classifying whether a material is a metal based on chemical composition. Moreover, we present a framework for incorporating chemistry-informed inductive bias based on the structure of the periodic table. For both the topology classification and the metallicity classification tasks, we empirically characterize the performance of simple heuristic rules fit with and without chemistry-informed inductive bias across a wide range of training set sizes. We find evidence that incorporating chemistry-informed inductive bias can reduce the amount of training data required to reach a given level of test accuracy.

classification interpretability heuristic rule learning topogivity chemistry-informed inductive bias
Foundational AI May 4, 2025

Heterosynaptic Circuits Are Universal Gradient Machines

Liu Ziyin, Isaac Chuang, Tomaso Poggio

We propose a design principle for the learning circuits of the biological brain. The principle states that almost any dendritic weights updated via heterosynaptic plasticity can implement a generalized and efficient class of gradient-based meta-learning. The theory suggests that a broad class of biologically plausible learning algorithms, together with the standard machine learning optimizers, can be grounded in heterosynaptic circuit motifs. This principle suggests that the phenomenology of (anti-) Hebbian (HBP) and heterosynaptic plasticity (HSP) may emerge from the same underlying dynamics, thus providing a unifying explanation. It also suggests an alternative perspective of neuroplasticity, where HSP is promoted to the primary learning and memory mechanism, and HBP is an emergent byproduct. We present simulations that show that (a) HSP can explain the metaplasticity of neurons, (b) HSP can explain the flexibility of the biology circuits, and (c) gradient learning can arise quickly from simple evolutionary dynamics that do not compute any explicit gradient. While our primary focus is on biology, the principle also implies a new approach to designing AI training algorithms and physically learnable AI hardware. Conceptually, our result demonstrates that contrary to the common belief, gradient computation may be extremely easy and common in nature.

heterosynaptic plasticity meta-learning biologically plausible learning loss function design multi-task learning
Foundational AI Apr 25, 2025

Scaling Laws For Scalable Oversight

Joshua Engels, David D. Baek, Subhash Kantamneni et al.

Scalable oversight, the process by which weaker AI systems supervise stronger ones, has been proposed as a key strategy to control future superintelligent systems. However, it is still unclear how scalable oversight itself scales. To address this gap, we propose a framework that quantifies the probability of successful oversight as a function of the capabilities of the overseer and the system being overseen. Specifically, our framework models oversight as a game between capability-mismatched players; the players have oversight-specific Elo scores that are a piecewise-linear function of their general intelligence, with two plateaus corresponding to task incompetence and task saturation. We validate our framework with a modified version of the game Nim and then apply it to four oversight games: Mafia, Debate, Backdoor Code and Wargames. For each game, we find scaling laws that approximate how domain performance depends on general AI system capability. We then build on our findings in a theoretical study of Nested Scalable Oversight (NSO), a process in which trusted models oversee untrusted stronger models, which then become the trusted models in the next step. We identify conditions under which NSO succeeds and derive numerically (and in some cases analytically) the optimal number of oversight levels to maximize the probability of oversight success. We also apply our theory to our four oversight games, where we find that NSO success rates at a general Elo gap of 400 are 13.5% for Mafia, 51.7% for Debate, 10.0% for Backdoor Code, and 9.4% for Wargames; these rates decline further when overseeing stronger systems.

scalable oversight nested scalable oversight scalability double relu model oversight elo
Theoretical Physics Apr 24, 2025

Higher-Spin Currents and Flows in Auxiliary Field Sigma Models

Daniele Bielli, Christian Ferko, Michele Galli et al.

We study local, higher-spin conserved currents in integrable $2d$ sigma models that have been deformed via coupling to auxiliary fields. These currents generate integrability-preserving flows introduced by Smirnov and Zamolodchikov. For auxiliary field (AF) deformations of a free boson, we prove that local spin-$n$ currents exist for all $n$ and give recursion relations that characterize Smirnov-Zamolodchikov (SZ) flows driven by these currents. We then show how to construct spin-$2n$ currents in a unified class of auxiliary field sigma models with common structure -- including AF theories based on the principal chiral model (PCM), its non-Abelian T-dual, (bi-)Yang-Baxter deformations of the PCM, and symmetric space models -- for interaction functions of one variable, and describe SZ flows driven by any function of the stress tensor in these cases. Finally, we give perturbative solutions for spin-$3$ SZ flows in any member of our unified class of AF models with underlying $\mathfrak{su}(3)$ algebra. Part of our analysis shows that the class of AF deformations can be extended by allowing the interaction function to depend on a larger set of variables than has previously been considered.

auxiliary field deformations higher-spin currents conservation laws integrability flows quantum field theory
Foundational AI Apr 23, 2025

I-Con: A Unifying Framework for Representation Learning

Shaden Alshammari, John Hershey, Axel Feldmann et al.

As the field of representation learning grows, there has been a proliferation of different loss functions to solve different classes of problems. We introduce a single information-theoretic equation that generalizes a large collection of modern loss functions in machine learning. In particular, we introduce a framework that shows that several broad classes of machine learning methods are precisely minimizing an integrated KL divergence between two conditional distributions: the supervisory and learned representations. This viewpoint exposes a hidden information geometry underlying clustering, spectral methods, dimensionality reduction, contrastive learning, and supervised learning. This framework enables the development of new loss functions by combining successful techniques from across the literature. We not only present a wide array of proofs, connecting over 23 different approaches, but we also leverage these theoretical results to create state-of-the-art unsupervised image classifiers that achieve a +8% improvement over the prior state-of-the-art on unsupervised classification on ImageNet-1K. We also demonstrate that I-Con can be used to derive principled debiasing methods which improve contrastive representation learners.

representation learning contrastive learning kl divergence unification loss function design self-supervised learning
Foundational AI Apr 22, 2025

High-performance training and inference for deep equivariant interatomic potentials

Chuin Wei Tan, Marc L. Descoteaux, Mit Kotak et al.

Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.

equivariant neural networks interatomic potentials molecular dynamics scalability symmetry preservation
Astrophysics Apr 17, 2025

Darkness in the Crust: Searching for the truly "Dark" Subhalos with Paleo-detectors

Xiuyuan Zhang, Lina Necib, Denis Erkal

Low-mass dark matter (DM) subhalos are pivotal in understanding the small-scale structure of the universe, thereby offering a sensitive method to discriminate between different cosmological models. In this study, we estimate the local number density of cold DM subhalos in the solar neighborhood, and demonstrate that their sparse distribution makes their detection via direct detection experiments highly improbable. However, it is plausible to expect that an $\mathcal{O}(1)$ number of subhalos could be detected by Paleo-detectors, a proposed new technique to look for DM by reading out damage tracks left by past DM interactions in minerals, due to their extended exposure times. Hence, we explore how Paleo-detectors can serve as effective probes for the properties of low-mass subhalos, $\mathcal{O}(10^{-5}-10^8) M_{\odot}$. We find that Paleo-detectors might be able to constrain certain regions of the subhalo mass-concentration relation (for subhalo masses of $10-10^4 M_\odot$ if DM has a mass of $\sim5$GeV). This is a new and complementary type of study that seeks to combine information from the particle nature of DM to that of small scale structures.

dark matter paleo-detectors subhalo mass function mass-concentration relation cosmological simulation
Theoretical Physics Apr 17, 2025

Machine Learning Decoding of Circuit-Level Noise for Bivariate Bicycle Codes

John Blue, Harshil Avlani, Zhiyang He et al.

Fault-tolerant quantum computers will depend crucially on the performance of the classical decoding algorithm which takes in the results of measurements and outputs corrections to the errors inferred to have occurred. Machine learning models have shown great promise as decoders for the surface code; however, this promise has not yet been substantiated for the more challenging task of decoding quantum low-density parity-check (QLDPC) codes. In this paper, we present a recurrent, transformer-based neural network designed to decode circuit-level noise on Bivariate Bicycle (BB) codes, introduced recently by Bravyi et al (Nature 627, 778-782, 2024). For the $[[72,12,6]]$ BB code, at a physical error rate of $p=0.1\%$, our model achieves a logical error rate almost $5$ times lower than belief propagation with ordered statistics decoding (BP-OSD). Moreover, while BP-OSD has a wide distribution of runtimes with significant outliers, our model has a consistent runtime and is an order-of-magnitude faster than the worst-case times from a benchmark BP-OSD implementation. On the $[[144,12,12]]$ BB code, our model obtains worse logical error rates but maintains the speed advantage. These results demonstrate that machine learning decoders can out-perform conventional decoders on QLDPC codes, in regimes of current interest.

quantum computing transformers qldpc decoding attention mechanisms recurrent networks
Foundational AI Apr 16, 2025

Learning Topological Invariance

James Halverson, Fabian Ruehle

Two geometric spaces are in the same topological class if they are related by certain geometric deformations. We propose machine learning methods that automate learning of topological invariance and apply it in the context of knot theory, where two knots are equivalent if they are related by ambient space isotopy. Specifically, given only the knot and no information about its topological invariants, we employ contrastive and generative machine learning techniques to map different representatives of the same knot class to the same point in an embedding vector space. An auto-regressive decoder Transformer network can then generate new representatives from the same knot class. We also describe a student-teacher setup that we use to interpret which known knot invariants are learned by the neural networks to compute the embeddings, and observe a strong correlation with the Goeritz matrix in all setups that we tested. We also develop an approach to resolving the Jones Unknot Conjecture by exploring the vicinity of the embedding space of the Jones polynomial near the locus where the unknots cluster, which we use to generate braid words with simple Jones polynomials.

topological invariance learning contrastive learning embeddings representation learning transformers
Astrophysics Apr 9, 2025

Rapid inference and comparison of gravitational-wave population models with neural variational posteriors

Matthew Mould, Noah E. Wolfe, Salvatore Vitale

The LIGO-Virgo-KAGRA catalog has been analyzed with an abundance of different population models due to theoretical uncertainty in the formation of gravitational-wave sources. To expedite model exploration, we introduce an efficient and accurate variational Bayesian approach that learns the population posterior with a normalizing flow and serves as a drop-in replacement for existing samplers. With hardware acceleration, inference takes just seconds for the current set of black-hole mergers and readily scales to larger catalogs. The trained posteriors provide an arbitrary number of independent samples with exact probability densities, unlike established stochastic sampling algorithms, while requiring up to three orders of magnitude fewer likelihood evaluations and as few as $\mathcal{O}(10^3)$. Provided the posterior support is covered, discrepancies can be addressed with smoothed importance sampling, which quantifies a goodness-of-fit metric for the variational approximation while also estimating the evidence for Bayesian model selection. Neural variational inference thus enables interactive development, analysis, and comparison of population models, making it a useful tool for astrophysical interpretation of current and future gravitational-wave observations.

variational population inference normalizing flows bayesian inference gravitational waves posterior estimation
Astrophysics Apr 9, 2025

Bayesian Component Separation for DESI LAE Automated Spectroscopic Redshifts and Photometric Targeting

Ana Sofía M. Uzsoy, Andrew K. Saydjari, Arjun Dey et al.

Lyman Alpha Emitters (LAEs) are valuable high-redshift cosmological probes traditionally identified using specialized narrow-band photometric surveys. In ground-based spectroscopy, it can be difficult to distinguish the sharp LAE peak from residual sky emission lines using automated methods, leading to misclassified redshifts. We present a Bayesian spectral component separation technique to automatically determine spectroscopic redshifts for LAEs while marginalizing over sky residuals. We use visually inspected spectra of LAEs obtained using the Dark Energy Spectroscopic Instrument (DESI) to create a data-driven prior and can determine redshift by jointly inferring sky residual, LAE, and residual components for each individual spectrum. We demonstrate this method on 910 spectroscopically observed $z = 2-4$ DESI LAE candidate spectra and determine their redshifts with $>$90% accuracy when validated against visually inspected redshifts. Using the $Δχ^2$ value from our pipeline as a proxy for detection confidence, we then explore potential survey design choices and implications for targeting LAEs with medium-band photometry. This method allows for scalability and accuracy in determining redshifts from DESI spectra, and the results provide recommendations for LAE targeting in anticipation of future high-redshift spectroscopic surveys.

bayesian inference posterior estimation sky residual marginalization signal detection spectral methods
Astrophysics Apr 8, 2025

On Soft Clustering For Correlation Estimators

Edward Berman, Sneh Pandya, Jacqueline McCleary et al.

Properly estimating correlations between objects at different spatial scales necessitates $\mathcal{O}(n^2)$ distance calculations. For this reason, most widely adopted packages for estimating correlations use clustering algorithms to approximate local trends. However, methods for quantifying the error introduced by this clustering have been understudied. In response, we present an algorithm for estimating correlations that is probabilistic in the way that it clusters objects, enabling us to quantify the uncertainty caused by clustering simply through model inference. These soft clustering assignments enable correlation estimators that are theoretically differentiable with respect to their input catalogs. Thus, we also build a theoretical framework for differentiable correlation functions and describe their utility in comparison to existing surrogate models. Notably, we find that repeated normalization and distance function calls slow gradient calculations and that sparse Jacobians destabilize precision, pointing towards either approximate or surrogate methods as a necessary solution to exact gradients from correlation functions. To that end, we close with a discussion of surrogate models as proxies for correlation functions. We provide an example that demonstrates the efficacy of surrogate models to enable gradient-based optimization of astrophysical model parameters, successfully minimizing a correlation function output. Our numerical experiments cover science cases across cosmology, from point spread function (PSF) modeling efforts to gravitational simulations to galaxy intrinsic alignment (IA).

clustering uncertainty quantification soft clustering assignments surrogate modeling differentiable correlation functions
Theoretical Physics Apr 7, 2025

Quantum Mechanics and Neural Networks

Christian Ferko, James Halverson

We demonstrate that any Euclidean-time quantum mechanical theory may be represented as a neural network, ensured by the Kosambi-Karhunen-Loève theorem, mean-square path continuity, and finite two-point functions. The additional constraint of reflection positivity, which is related to unitarity, may be achieved by a number of mechanisms, such as imposing neural network parameter space splitting or the Markov property. Non-differentiability of the networks is related to the appearance of non-trivial commutators. Neural networks acting on Markov processes are no longer Markov, but still reflection positive, which facilitates the definition of deep neural network quantum systems. We illustrate these principles in several examples using numerical implementations, recovering classic quantum mechanical results such as Heisenberg uncertainty, non-trivial commutators, and the spectrum.

reflection positivity stochastic processes neural network field theory spectral methods quantum field theory
Astrophysics Apr 7, 2025

IAEmu: Learning Galaxy Intrinsic Alignment Correlations

Sneh Pandya, Yuanyuan Yang, Nicholas Van Alfen et al.

The intrinsic alignments (IA) of galaxies, a key contaminant in weak lensing analyses, arise from correlations in galaxy shapes driven by tidal interactions and galaxy formation processes. Accurate IA modeling is essential for robust cosmological inference, but current approaches rely on perturbative methods that break down on nonlinear scales or on expensive simulations. We introduce IAEmu, a neural network-based emulator that predicts the galaxy position-position ($ξ$), position-orientation ($ω$), and orientation-orientation ($η$) correlation functions and their uncertainties using mock catalogs based on the halo occupation distribution (HOD) framework. Compared to simulations, IAEmu achieves ~3% average error for $ξ$ and ~5% for $ω$, while capturing the stochasticity of $η$ without overfitting. The emulator provides both aleatoric and epistemic uncertainties, helping identify regions where predictions may be less reliable. We also demonstrate generalization to non-HOD alignment signals by fitting to IllustrisTNG hydrodynamical simulation data. As a fully differentiable neural network, IAEmu enables $\sim$10,000$\times$ speed-ups in mapping HOD parameters to correlation functions on GPUs, compared to CPU-based simulations. This acceleration facilitates inverse modeling via gradient-based sampling, making IAEmu a powerful surrogate model for galaxy bias and IA studies with direct applications to Stage IV weak lensing surveys.

surrogate modeling emulation galaxy intrinsic alignment uncertainty quantification halo occupation distribution
Foundational AI Apr 3, 2025

Do Two AI Scientists Agree?

Xinghong Fu, Ziming Liu, Max Tegmark

When two AI models are trained on the same scientific task, do they learn the same theory or two different theories? Throughout history of science, we have witnessed the rise and fall of theories driven by experimental validation or falsification: many theories may co-exist when experimental data is lacking, but the space of survived theories become more constrained with more experimental data becoming available. We show the same story is true for AI scientists. With increasingly more systems provided in training data, AI scientists tend to converge in the theories they learned, although sometimes they form distinct groups corresponding to different theories. To mechanistically interpret what theories AI scientists learn and quantify their agreement, we propose MASS, Hamiltonian-Lagrangian neural networks as AI Scientists, trained on standard problems in physics, aggregating training results across many seeds simulating the different configurations of AI scientists. Our findings suggests for AI scientists switch from learning a Hamiltonian theory in simple setups to a Lagrangian formulation when more complex systems are introduced. We also observe strong seed dependence of the training dynamics and final learned weights, controlling the rise and fall of relevant theories. We finally demonstrate that not only can our neural networks aid interpretability, it can also be applied to higher dimensional problems.

hamiltonian systems lagrangian methods ai scientist agreement interpretability physics-informed neural networks
Astrophysics Mar 27, 2025

The Type I Superluminous Supernova Catalogue II: Spectroscopic Evolution in the Photospheric Phase, Velocity Measurements, and Constraints on Diversity

Aysha Aamer, Matt Nicholl, Sebastian Gomez et al.

Hydrogen-poor superluminous supernovae (SLSNe) are among the most energetic explosions in the universe, reaching luminosities up to 100 times greater than those of normal supernovae. Detailed spectral analysis hold the potential to reveal their progenitors and underlying energy sources. This paper presents the largest compilation of SLSN photospheric spectra to date, encompassing data from ePESSTO+, the FLEET search and all published spectra up to December 2022. The dataset includes a total of 974 spectra of 234 SLSNe. By constructing average phase binned spectra, we find SLSNe initially exhibit high temperatures (10000 to 11000 K), with blue continua and weak lines. A rapid transformation follows, as temperatures drop to 5000 to 6000 K by 40 days post peak, leading to stronger P-Cygni features. These averages also suggest a fraction of SLSNe may contain some He at explosion. Variance within the dataset is slightly reduced when defining the phase of spectra relative to explosion, rather than peak, and normalising to the population's median e-folding time. Principal Component Analysis (PCA) supports this, requiring fewer components to explain the same level of variation when binning data by scaled days from explosion, suggesting a more homogeneous grouping. Using PCA and K-Means clustering, we identify outlying objects with unusual spectroscopic evolution and evidence for energy input from interaction, but find not support for groupings of two or more statistically significant subpopulations. We find Fe II λ5169 lines velocities closely track the radius implied from blackbody fits, indicating formation near the photosphere. We also confirm a correlation between velocity and velocity gradient, which can be explained if all SLSNe are in homologous expansion but with different scale velocities. This behaviour aligns with expectations for an internal powering mechanism.

supernova classification spectral time-series analysis dimensionality reduction magnetar central engine clustering
Theoretical Physics Mar 15, 2025

Extracting the distribution amplitude of pseudoscalar mesons using the HOPE method

S. -P. Alex Chang, William Detmold, Anthony V. Grebe et al.

The pseudoscalar meson light-cone distribution amplitudes (LCDAs) are essential non-perturbative inputs for a range of high-energy exclusive processes in quantum chromodynamics. In this proceedings, progress towards a determination of the low Mellin moments of the pion and kaon LCDAs by the HOPE Collaboration is reported.

hope method light-cone distribution amplitudes lattice qcd effective field theory mellin moments
Foundational AI Mar 14, 2025

Generative Modeling for Mathematical Discovery

Jordan S. Ellenberg, Cristofero S. Fraser-Taliente, Thomas R. Harvey et al.

We present a new implementation of the LLM-driven genetic algorithm {\it funsearch}, whose aim is to generate examples of interest to mathematicians and which has already had some success in problems in extremal combinatorics. Our implementation is designed to be useful in practice for working mathematicians; it does not require expertise in machine learning or access to high-performance computing resources. Applying {\it funsearch} to a new problem involves modifying a small segment of Python code and selecting a large language model (LLM) from one of many third-party providers. We benchmarked our implementation on three different problems, obtaining metrics that may inform applications of {\it funsearch} to new problems. Our results demonstrate that {\it funsearch} successfully learns in a variety of combinatorial and number-theoretic settings, and in some contexts learns principles that generalize beyond the problem originally trained on.

llm-driven genetic algorithm automated discovery program synthesis generative models combinatorial optimization
Theoretical Physics Mar 13, 2025

No-go theorem for environment-assisted invariance in non-unitary dynamics

Akira Sone, Akram Touil, Kenji Maeda et al.

We elucidate the requirements for quantum operations that achieve environment-assisted invariance (envariance), a symmetry of entanglement. While envariance has traditionally been studied within the framework of local unitary operations, we extend the analysis to consider non-unitary local operations. First, we investigate the conditions imposed on operators acting on pure bipartite entanglement to attain envariance. We show that the local operations must take a direct-sum form in their Kraus operator representations, establishing decoherence-free subspaces. Furthermore, we prove that this also holds for the multipartite scenario. As an immediate consequence, we demonstrate that environment-assisted shortcuts to adiabaticity cannot be achieved through non-unitary operations. In addition, we show that the static condition of the eternal black hole in AdS/CFT is violated when the CFTs are coupled to the external baths.

envariance quantum states entanglement decoherence-free subspaces open quantum systems
Foundational AI Mar 10, 2025

Denoising Hamiltonian Network for Physical Reasoning

Congyue Deng, Brandon Y. Feng, Cecilia Garraffo et al.

Machine learning frameworks for physical problems must capture and enforce physical constraints that preserve the structure of dynamical systems. Many existing approaches achieve this by integrating physical operators into neural networks. While these methods offer theoretical guarantees, they face two key limitations: (i) they primarily model local relations between adjacent time steps, overlooking longer-range or higher-level physical interactions, and (ii) they focus on forward simulation while neglecting broader physical reasoning tasks. We propose the Denoising Hamiltonian Network (DHN), a novel framework that generalizes Hamiltonian mechanics operators into more flexible neural operators. DHN captures non-local temporal relationships and mitigates numerical integration errors through a denoising mechanism. DHN also supports multi-system modeling with a global conditioning mechanism. We demonstrate its effectiveness and flexibility across three diverse physical reasoning tasks with distinct inputs and outputs.

hamiltonian systems neural operators denoising physical trajectories diffusion models conservation laws
Astrophysics Mar 7, 2025

Decoding the Galactic Twirl: The Downfall of Milky Way-mass Galaxies Rotation Curves in the FIRE Simulations

Xiaowei Ou, Lina Necib, Andrew Wetzel et al.

Recent measurements of the Milky Way rotation curve found a sharp decline at around $15$-$20$ kpc from the center of the Galaxy, suggesting that the Galactic dark matter halo is much less massive than predicted by other dynamical tracers. To address this tension, we study the validity of the assumptions made in calculating the Milky Way's rotation curve. To do so, we apply Jeans' equation, the current standard approach of measuring rotation curves, to three cosmological zoom-in simulations of Milky Way-like galaxies from the FIRE-2 Latte suite. Using synthetic Gaia surveys, we replicate the sample selection process and calculation employed in measuring the Milky Way rotation curve. We examine four failure modes of this calculation and find that the measured curves deviate from the true curve by $5$-$20\%$ rather than below $5\%$, as estimated by previous works. Interestingly, there is a large galaxy-to-galaxy variance, and different systematics dominate different galaxies. We rederive the Milky Way's dark matter density profile with the rotation curve while incorporating systematics from the simulations. The posterior distribution of the density profiles is consistent with a fiducial NFW profile when assuming a gNFW profile for dark matter. We find that the virial mass, $7.32^{+1.98}_{-1.53}\times10^{11}~M_{\odot}$, consistent with other probes of the Milky Way's mass. However, we recommend that the field moves away from relying solely on the rotation curve when studying the dark matter profile, and adopts methods that incorporate additional probes and/or do not heavily depend on assumptions described in this study.

dark matter jeans equation cosmological simulation rotation curve systematics uncertainty quantification
Foundational AI Mar 7, 2025

Diffusion Models for Cayley Graphs

Michael R. Douglas, Kit Fraser-Taliente

We review the problem of finding paths in Cayley graphs of groups and group actions, using the Rubik's cube as an example, and we list several more examples of significant mathematical interest. We then show how to formulate these problems in the framework of diffusion models. The exploration of the graph is carried out by the forward process, while finding the target nodes is done by the inverse backward process. This systematizes the discussion and suggests many generalizations. To improve exploration, we propose a ``reversed score'' ansatz which substantially improves over previous comparable algorithms.

diffusion models group theory cayley graph navigation reversed score ansatz score-based models
Foundational AI Mar 6, 2025

L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling

Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin et al.

We present a universal theoretical framework for understanding long-context language modeling based on a bipartite mutual information scaling law that we rigorously verify in natural language. We demonstrate that bipartite mutual information captures multi-token interactions distinct from and scaling independently of conventional two-point mutual information, and show that this provides a more complete characterization of the dependencies needed for accurately modeling long sequences. Leveraging this scaling law, we formulate the Long-context Language Modeling (L$^2$M) condition, which lower bounds the necessary scaling of a model's history state -- the latent variables responsible for storing past information -- for effective long-context modeling. We validate the framework and its predictions on transformer and state-space models. Our work provides a principled foundation to understand long-context modeling and to design more efficient architectures with stronger long-context capabilities, with potential applications beyond natural language.

bipartite mutual information scaling long-context language modeling history state capacity transformers scalability
Astrophysics Mar 5, 2025

Trial by FIRE: Probing the dark matter density profile of dwarf galaxies with GraphNPE

Tri Nguyen, Justin Read, Lina Necib et al.

The Dark Matter (DM) distribution in dwarf galaxies provides crucial insights into both structure formation and the particle nature of DM. GraphNPE (Graph Neural Posterior Estimator), first introduced in Nguyen et al. (2023), is a novel simulation-based inference framework that combines graph neural networks and normalizing flows to infer the DM density profile from line-of-sight stellar velocities. Here, we apply GraphNPE to satellite dwarf galaxies in the FIRE-2 Latte simulation suite of Milky Way-mass halos, testing it against both Cold and Self-Interacting DM scenarios. Our method demonstrates superior precision compared to conventional Jeans-based approaches, recovering DM density profiles to within the 95% confidence level even in systems with as few as 30 tracers. Moreover, we present the first evaluation of mass modeling methods in constraining two key parameters from realistic simulations: the peak circular velocity, $V_\mathrm{max}$, and the peak virial mass, $M_\mathrm{200m}^\mathrm{peak}$. Using only line-of-sight velocities, GraphNPE can reliably recover both $V_\mathrm{max}$ and $M_\mathrm{200m}^\mathrm{peak}$ within our quoted uncertainties, including those experiencing tidal effects ($\gtrsim$ 63% of systems are recovered with our 68% confidence intervals and $\gtrsim$ 92% within our 95% confidence intervals). The method achieves 10-20% accuracy in $V_\mathrm{max}$ recovery, while $M_\mathrm{200m}^\mathrm{peak}$ is recovered to 0.1-0.4 dex accuracy. This work establishes GraphNPE as a robust tool for inferring DM density profiles in dwarf galaxies, offering promising avenues for constraining DM models. The framework's potential extends beyond this study, as it can be adapted to non-spherical and disequilibrium models, showcasing the broader utility of simulation-based inference and graph-based learning in astrophysics.

simulation-based inference dark matter graph neural networks normalizing flows posterior estimation
Astrophysics Mar 4, 2025

A Deep, High-Angular Resolution 3D Dust Map of the Southern Galactic Plane

Catherine Zucker, Andrew K. Saydjari, Joshua S. Speagle et al.

We present a deep, high-angular resolution 3D dust map of the southern Galactic plane over $239^\circ < \ell < 6^\circ$ and $|b| < 10^\circ$ built on photometry from the DECaPS2 survey, in combination with photometry from VVV, 2MASS, and unWISE and parallaxes from Gaia DR3 where available. To construct the map, we first infer the distance, extinction, and stellar types of over 700 million stars using the brutus stellar inference framework with a set of theoretical MIST stellar models. Our resultant 3D dust map has an angular resolution of $1'$, roughly an order of magnitude finer than existing 3D dust maps and comparable to the angular resolution of the Herschel 2D dust emission maps. We detect complexes at the range of distances associated with the Sagittarius-Carina and Scutum-Centaurus arms in the fourth quadrant, as well as more distant structures out to a maximum reliable distance of $d \approx$ 10 kpc from the Sun. The map is sensitive up to a maximum extinction of roughly $A_V \approx 12$ mag. We publicly release both the stellar catalog and the 3D dust map, the latter of which can easily be queried via the Python package dustmaps. When combined with the existing Bayestar19 3D dust map of the northern sky, the DECaPS 3D dust map fills in the missing piece of the Galactic plane, enabling extinction corrections over the entire disk $|b| < 10^\circ$. Our map serves as a pathfinder for the future of 3D dust mapping in the era of LSST and Roman, targeting regimes accessible with deep optical and near-infrared photometry but often inaccessible with Gaia.

3d dust mapping bayesian inference interstellar extinction posterior estimation stellar evolution
Astrophysics Feb 26, 2025

Evidence for an Instability-Induced Binary Merger in the Double-Peaked, Helium-Rich Type IIn Supernova 2023zkd

A. Gagliano, V. A. Villar, T. Matsumoto et al.

We present ultraviolet to infrared observations of the extraordinary Type IIn supernova 2023zkd (SN 2023zkd). Photometrically, it exhibits persistent and luminous precursor emission spanning $\sim$4 years preceding discovery ($M_r\approx-15$ mag, 1,500~days in the observer frame), followed by a secondary stage of gradual brightening in its final year. Post-discovery, it exhibits two photometric peaks of comparable brightness ($M_r\lesssim-18.7$ mag and $M_r\approx-18.4$ mag, respectively) separated by 240 days. Spectroscopically, SN 2023zkd exhibits highly asymmetric and multi-component Balmer and He I profiles that we attribute to ejecta interaction with fast-moving ($1,\!000-2,\!000\;\mathrm{km}\;\mathrm{s}^{-1}$) He-rich polar material and slow-moving ($\sim$$400\;\mathrm{km}\;\mathrm{s}^{-1}$) equatorially-distributed H-rich material. He II features also appear during the second light curve peak and evolve rapidly. Shock-driven models fit to the multi-band photometry suggest that the event is powered by interaction with $\sim$$5-6\;M_{\odot}$ of CSM, with $2-3\;M_{\odot}$ associated with each light curve peak, expelled during mass-loss episodes $\sim$$3-4$ and $\sim$$1-2$ years prior to explosion. The observed precursor emission, combined with the extreme mass-loss rates required to power each light curve peak, favors either super-Eddington accretion onto a black hole or multiple long-lived eruptions from a massive star to luminosities that have not been previously observed. We consider multiple progenitor scenarios for SN 2023zkd, and find that the brightening optical precursor and inferred explosion properties are most consistent with a massive ($M_{\mathrm{ZAMS}}\geq30\;M_{\odot}$) and partially-stripped He star undergoing an instability-induced merger with a black hole companion.

circumstellar material interaction stellar evolution binary merger progenitor supernova classification precursor emission
Astrophysics Feb 24, 2025

Seeing the Outer Edge of the Infant Type Ia Supernova 2024epr in the Optical and Near Infrared

W. B. Hoogendam, D. O. Jones, C. Ashall et al.

We present optical-to-near-infrared (NIR) photometry and spectroscopy of the Type Ia supernova (SN Ia) 2024epr, including NIR spectra observed within two days of first light. The early-time optical spectra show strong, high-velocity Ca and Si features near rarely-observed velocities at $\sim$0.1$c$, and the NIR spectra show a C I "knee." Despite early-time, high-velocity features, SN 2024epr evolves into a normal SN Ia, albeit with stronger peak-light Ca absorption than other SNe Ia with the same light curve shape. Although we infer a normal decline rate, $Δm_{15}(B)=1.09\pm0.12$ mag, from the light-curve rise, SN 2024epr is a Branch "cool" object and has red early-time colors ($g-r\approx0.15$ mag at $-10$ days). The high velocities point to a density enhancement in the outer layers of the explosion, predicted by some models, but thick-shell He-detonation models do not match the smoothly rising light curve or apparent lack of He in our early-time NIR spectra. No current models (e.g., delayed detonation or thin He shell double detonation) appear to reproduce all observed properties, particularly the unusual early-time colors. Such constraints are only possible for SN 2024epr from the earliest optical and NIR observations, highlighting their importance for constraining SN Ia models. Finally, we identify several literature SNe Ia with intermediate mass elements at $\sim$30\,000 km s$^{-1}$ within days after the explosion that evolve into otherwise normal SNe Ia at peak light, suggesting the early-time spectra of SNe Ia may hide a broad diversity of observational characteristics.

supernova classification high-velocity ejecta early-phase spectroscopy thermonuclear detonation model validation
Experimental Physics Feb 21, 2025

Anomaly preserving contrastive neural embeddings for end-to-end model-independent searches at the LHC

Kyle Metzger, Lana Xu, Mia Sodini et al.

Anomaly detection - identifying deviations from Standard Model predictions - is a key challenge at the Large Hadron Collider due to the size and complexity of its datasets. This is typically addressed by transforming high-dimensional detector data into lower-dimensional, physically meaningful features. We tackle feature extraction for anomaly detection by learning powerful low-dimensional representations via contrastive neural embeddings. This approach preserves potential anomalies indicative of new physics and enables rare signal extraction using novel machine learning-based statistical methods for signal-independent hypothesis testing. We compare supervised and self-supervised contrastive learning methods, for both MLP- and Transformer-based neural embeddings, trained on the kinematic observables of physics objects in LHC collision events. The learned embeddings serve as input representations for signal-agnostic statistical detection methods in inclusive final states. We achieve significant improvement in discovery power for both rare new physics signals and rare Standard Model processes across diverse final states, demonstrating its applicability for efficiently searching for diverse signals simultaneously. We study the impact of architectural choices, contrastive loss formulations, supervision levels, and embedding dimensionality on anomaly detection performance. We show that the optimal representation for background classification does not always maximize sensitivity to new physics signals, revealing an inherent trade-off between background structure preservation and anomaly enhancement. We demonstrate that combining compression with domain knowledge for label encoding produces the most effective data representation for statistical discovery of anomalies.

contrastive learning anomaly detection representation learning embeddings new physics searches
Experimental Physics Feb 19, 2025

Isolating Unisolated Upsilons with Anomaly Detection in CMS Open Data

Rikab Gambhir, Radha Mastandrea, Benjamin Nachman et al.

We present the first study of anti-isolated Upsilon decays to two muons ($Υ\to μ^+ μ^-$) in proton-proton collisions at the Large Hadron Collider. Using a machine learning (ML)-based anomaly detection strategy, we "rediscover" the $Υ$ in 13 TeV CMS Open Data from 2016, despite overwhelming anti-isolated backgrounds. We elevate the signal significance to $6.4 σ$ using these methods, starting from $1.6 σ$ using the dimuon mass spectrum alone. Moreover, we demonstrate improved sensitivity from using an ML-based estimate of the multi-feature likelihood compared to traditional "cut-and-count" methods. Our work demonstrates that it is possible and practical to find real signals in experimental collider data using ML-based anomaly detection, and we distill a readily-accessible benchmark dataset from the CMS Open Data to facilitate future anomaly detection developments.

anomaly detection collider physics anti-isolated quarkonia density estimation likelihood ratio
Foundational AI Feb 17, 2025

On the Learnability of Knot Invariants: Representation, Predictability, and Neural Similarity

Audrey Lindsay, Fabian Ruehle

We analyze different aspects of neural network predictions of knot invariants. First, we investigate the impact of different knot representations on the prediction of invariants and find that braid representations work in general the best. Second, we study which knot invariants are easy to learn, with invariants derived from hyperbolic geometry and knot diagrams being very easy to learn, while invariants derived from topological or homological data are harder. Predicting the Arf invariant could not be learned for any representation. Third, we propose a cosine similarity score based on gradient saliency vectors, and a joint misclassification score to uncover similarities in neural networks trained to predict related topological invariants.

representation learning knot invariant learnability classification interpretability gradient saliency similarity
Foundational AI Feb 17, 2025

Interpretable Machine Learning for Kronecker Coefficients

Giorgi Butbaia, Kyu-Hwan Lee, Fabian Ruehle

We analyze the saliency of neural networks and employ interpretable machine learning models to predict whether the Kronecker coefficients of the symmetric group are zero or not. Our models use triples of partitions as input features, as well as b-loadings derived from the principal component of an embedding that captures the differences between partitions. Across all approaches, we achieve an accuracy of approximately 83% and derive explicit formulas for a decision function in terms of b-loadings. Additionally, we develop transformer-based models for prediction, achieving the highest reported accuracy of over 99%.

interpretability kronecker coefficients classification kolmogorov-arnold networks transformers
Foundational AI Feb 15, 2025

Bridging the Sim-to-Real Gap for Athletic Loco-Manipulation

Nolan Fey, Gabriel B. Margolis, Martin Peticco et al.

Achieving athletic loco-manipulation on robots requires moving beyond traditional tracking rewards - which simply guide the robot along a reference trajectory - to task rewards that drive truly dynamic, goal-oriented behaviors. Commands such as "throw the ball as far as you can" or "lift the weight as quickly as possible" compel the robot to exhibit the agility and power inherent in athletic performance. However, training solely with task rewards introduces two major challenges: these rewards are prone to exploitation (reward hacking), and the exploration process can lack sufficient direction. To address these issues, we propose a two-stage training pipeline. First, we introduce the Unsupervised Actuator Net (UAN), which leverages real-world data to bridge the sim-to-real gap for complex actuation mechanisms without requiring access to torque sensing. UAN mitigates reward hacking by ensuring that the learned behaviors remain robust and transferable. Second, we use a pre-training and fine-tuning strategy that leverages reference trajectories as initial hints to guide exploration. With these innovations, our robot athlete learns to lift, throw, and drag with remarkable fidelity from simulation to reality.

reinforcement learning sim-to-real transfer fine-tuning transfer learning actuator dynamics modeling
Foundational AI Feb 15, 2025

LEAPS: A discrete neural sampler via locally equivariant networks

Peter Holderrieth, Michael S. Albergo, Tommi Jaakkola

We propose "LEAPS", an algorithm to sample from discrete distributions known up to normalization by learning a rate matrix of a continuous-time Markov chain (CTMC). LEAPS can be seen as a continuous-time formulation of annealed importance sampling and sequential Monte Carlo methods, extended so that the variance of the importance weights is offset by the inclusion of the CTMC. To derive these importance weights, we introduce a set of Radon-Nikodym derivatives of CTMCs over their path measures. Because the computation of these weights is intractable with standard neural network parameterizations of rate matrices, we devise a new compact representation for rate matrices via what we call "locally equivariant" functions. To parameterize them, we introduce a family of locally equivariant multilayer perceptrons, attention layers, and convolutional networks, and provide an approach to make deep networks that preserve the local equivariance. This property allows us to propose a scalable training algorithm for the rate matrix such that the variance of the importance weights associated to the CTMC are minimal. We demonstrate the efficacy of LEAPS on problems in statistical physics.

locally equivariant networks discrete measure transport stochastic processes monte carlo methods equivariant neural networks
Foundational AI Feb 14, 2025

Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective

Zeyu Jia, Alexander Rakhlin, Tengyang Xie

As large language models have evolved, it has become crucial to distinguish between process supervision and outcome supervision -- two key reinforcement learning approaches to complex reasoning tasks. While process supervision offers intuitive advantages for long-term credit assignment, the precise relationship between these paradigms has remained an open question. Conventional wisdom suggests that outcome supervision is fundamentally more challenging due to the trajectory-level coverage problem, leading to significant investment in collecting fine-grained process supervision data. In this paper, we take steps towards resolving this debate. Our main theorem shows that, under standard data coverage assumptions, reinforcement learning through outcome supervision is no more statistically difficult than through process supervision, up to polynomial factors in horizon. At the core of this result lies the novel Change of Trajectory Measure Lemma -- a technical tool that bridges return-based trajectory measure and step-level distribution shift. Furthermore, for settings with access to a verifier or a rollout capability, we prove that any policy's advantage function can serve as an optimal process reward model, providing a direct connection between outcome and process supervision. These findings suggest that the empirically observed performance gap -- if any -- between outcome and process supervision likely stems from algorithmic limitations rather than inherent statistical difficulties, potentially transforming how we approach data collection and algorithm design for reinforcement learning.

reinforcement learning outcome vs process supervision reward optimization process reward modeling credit assignment
Theoretical Physics Feb 14, 2025

Machine learning the vanishing order of rational L-functions

Joanna Bieri, Giorgi Butbaia, Edgar Costa et al.

In this paper, we study the vanishing order of rational $L$-functions from a data scientific perspective. Each $L$-function is represented in our data by finitely many Dirichlet coefficients, the normalisation of which depends on the context. We observe murmuration-like patterns in averages across our dataset, find that PCA clusters rational $L$-functions by their vanishing order, and record that LDA and neural networks may accurately predict this quantity.
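
For readers who want to experiment, the workflow described here (PCA to expose clustering, LDA to predict the vanishing order) is straightforward to reproduce with scikit-learn. A minimal sketch on synthetic stand-ins for the Dirichlet-coefficient vectors (real inputs would come from a curated $L$-function dataset; the class means and labels below are purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: rows mimic truncated Dirichlet-coefficient vectors,
# labels mimic vanishing orders 0, 1, 2 (real data would replace this).
n_per_class, n_coeffs = 300, 100
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(n_per_class, n_coeffs))
               for mu in (0.0, 0.3, 0.6)])
y = np.repeat([0, 1, 2], n_per_class)

# Unsupervised view: do the classes separate in the leading PCA plane?
pca = PCA(n_components=2).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Supervised prediction of the vanishing order with LDA.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print("LDA test accuracy:", lda.score(X_te, y_te))
```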

l-function vanishing order classification arithmetic l-functions dimensionality reduction clustering
Theoretical Physics Feb 12, 2025

Mathematical Data Science

Michael R. Douglas, Kyu-Hwan Lee

Can machine learning help discover new mathematical structures? In this article we discuss an approach to doing this which one can call "mathematical data science". In this paradigm, one studies mathematical objects collectively rather than individually, by creating datasets and doing machine learning experiments and interpretations. After an overview, we present two case studies: murmurations in number theory and loadings of partitions related to Kronecker coefficients in representation theory and combinatorics.

mathematical data science automated discovery interpretability murmurations scientific workflows
Foundational AI Feb 10, 2025

Debiasing Guidance for Discrete Diffusion with Sequential Monte Carlo

Cheuk Kit Lee, Paul Jeha, Jes Frellsen et al.

Discrete diffusion models are a class of generative models that produce samples from an approximated data distribution within a discrete state space. Often, there is a need to target specific regions of the data distribution. Current guidance methods aim to sample from a distribution with mass proportional to $p_0(x_0) p(ζ|x_0)^α$ but fail to achieve this in practice. We introduce a Sequential Monte Carlo algorithm that generates unbiased samples from this target distribution, utilising the learnt unconditional and guided processes. We validate our approach on low-dimensional distributions, controlled image generation, and text generation. For text generation, our method provides strong control while maintaining low perplexity compared to guidance-based approaches.

diffusion models monte carlo methods discrete diffusion guidance stochastic processes importance weighting
Foundational AI Feb 7, 2025

Parameter Symmetry Potentially Unifies Deep Learning Theory

Liu Ziyin, Yizhou Xu, Tomaso Poggio et al.

The dynamics of learning in modern large AI systems is hierarchical, often characterized by abrupt, qualitative shifts akin to phase transitions observed in physical systems. While these phenomena hold promise for uncovering the mechanisms behind neural networks and language models, existing theories remain fragmented, addressing specific cases. In this position paper, we advocate for the crucial role of the research direction of parameter symmetries in unifying these fragmented theories. This position is founded on a centralizing hypothesis for this direction: parameter symmetry breaking and restoration are the unifying mechanisms underlying the hierarchical learning behavior of AI models. We synthesize prior observations and theories to argue that this direction of research could lead to a unified understanding of three distinct hierarchies in neural networks: learning dynamics, model complexity, and representation formation. By connecting these hierarchies, our position paper elevates symmetry -- a cornerstone of theoretical physics -- to become a potential fundamental principle in modern AI.

symmetry breaking parameter symmetry restoration phase transitions group theory hierarchical learning dynamics
Foundational AI Feb 3, 2025

Harmonic Loss Trains Interpretable AI Models

David D. Baek, Ziming Liu, Riya Tyagi et al.

In this paper, we introduce harmonic loss as an alternative supervisory signal for training neural networks and large language models (LLMs). Harmonic loss differs from standard cross-entropy loss by (a) replacing the usual SoftMax normalization with a scale-invariant HarMax function and (b) computing logits via Euclidean distance rather than a dot product. Harmonic loss enables improved interpretability and faster convergence, owing to its scale invariance and, by design, a finite convergence point that can be interpreted as a class center. We first validate the performance of harmonic models across algorithmic, vision, and language datasets. Through extensive experiments, we demonstrate that models trained with harmonic loss perform better than standard models by: (a) enhancing interpretability, (b) requiring less data for generalization, and (c) reducing grokking. Moreover, we compare a GPT-2 model trained with harmonic loss to the standard GPT-2, illustrating that the harmonic model develops more interpretable representations. Looking forward, we believe harmonic loss may become a valuable tool in domains with limited data availability or in high-stakes applications where interpretability and reliability are paramount, paving the way for more robust and efficient neural network models.
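
The two ingredients of the loss are concrete enough to sketch in a few lines. A minimal numpy version, assuming HarMax normalizes inverse distance powers with a harmonic exponent n (names and the toy values are illustrative, not the paper's notation):

```python
import numpy as np

def harmonic_loss(x, W, y, n=2.0, eps=1e-12):
    """Harmonic loss for one example.

    x : (d,) input representation
    W : (C, d) class-center weight vectors
    y : true class index
    n : harmonic exponent controlling HarMax sharpness (assumed form)
    """
    d = np.linalg.norm(W - x, axis=1) + eps   # distance logits, not dot products
    p = d**(-n) / np.sum(d**(-n))             # HarMax: scale-invariant, since a
                                              # common rescaling of d cancels
    return -np.log(p[y])

# The loss has a finite minimizer by design: it vanishes as x reaches
# its class center, so trained weights are interpretable as centers.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
print(harmonic_loss(np.array([0.95, 0.05]), W, y=0))  # small
print(harmonic_loss(np.array([0.5, 0.5]), W, y=0))    # log(2): maximal ambiguity
```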

loss function design interpretability harmonic loss representation learning embeddings
Astrophysics Feb 3, 2025

A Poisson Process AutoDecoder for X-ray Sources

Yanke Song, Victoria Ashley Villar, Juan Rafael Martinez-Galarza et al.

X-ray observing facilities, such as the Chandra X-ray Observatory and eROSITA, have detected millions of astronomical sources associated with high-energy phenomena. The arrival of photons as a function of time follows a Poisson process and can vary by orders of magnitude, presenting obstacles for common tasks such as source classification, physical property derivation, and anomaly detection. Previous work has either failed to directly capture the Poisson nature of the data or focused only on Poisson rate function reconstruction. In this work, we present Poisson Process AutoDecoder (PPAD). PPAD is a neural field decoder that maps fixed-length latent features to continuous Poisson rate functions across energy band and time via unsupervised learning. PPAD reconstructs the rate function and yields a representation at the same time. We demonstrate the efficacy of PPAD via reconstruction, regression, classification and anomaly detection experiments using the Chandra Source Catalog.
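
The training signal behind any rate-function model of this kind is the inhomogeneous-Poisson likelihood of the photon arrival times. A sketch of that negative log-likelihood, with the rate integral approximated on a grid (this is the objective only; PPAD's neural field decoder is omitted, and the toy rate is illustrative):

```python
import numpy as np

def poisson_nll(rate_fn, event_times, t_max, n_grid=1000):
    """Negative log-likelihood of an inhomogeneous Poisson process.

    rate_fn     : callable mapping times to a positive rate lambda(t)
    event_times : (N,) observed photon arrival times in [0, t_max]
    """
    grid = np.linspace(0.0, t_max, n_grid)
    integral = np.trapz(rate_fn(grid), grid)        # compensator term
    log_terms = np.sum(np.log(rate_fn(np.asarray(event_times))))
    return integral - log_terms

# Toy usage: a sinusoidally modulated X-ray source.
rate = lambda t: 5.0 + 3.0 * np.sin(2 * np.pi * t)
events = np.sort(np.random.default_rng(0).uniform(0.0, 10.0, size=60))
print(poisson_nll(rate, events, t_max=10.0))
```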

autoencoders representation learning stochastic processes neural field decoding poisson rate reconstruction
Foundational AI Feb 2, 2025

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni, Max Tegmark

Mathematical reasoning is an increasingly important indicator of large language model (LLM) capabilities, yet we lack understanding of how LLMs process even simple mathematical tasks. To address this, we reverse engineer how three mid-sized LLMs compute addition. We first discover that numbers are represented in these LLMs as a generalized helix, which is strongly causally implicated for the tasks of addition and subtraction, and is also causally relevant for integer division, multiplication, and modular arithmetic. We then propose that LLMs compute addition by manipulating this generalized helix using the "Clock" algorithm: to solve $a+b$, the helices for $a$ and $b$ are manipulated to produce the $a+b$ answer helix which is then read out to model logits. We model influential MLP outputs, attention head outputs, and even individual neuron preactivations with these helices and verify our understanding with causal interventions. By demonstrating that LLMs represent numbers on a helix and manipulate this helix to perform addition, we present the first representation-level explanation of an LLM's mathematical capability.
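
The "Clock" picture can be checked outside an LLM: place each integer on circles of several periods, add by composing rotations with the angle-addition identities, and decode by nearest helix. A toy numpy sketch (the period set here is illustrative; the paper's analysis concerns learned LLM representations):

```python
import numpy as np

PERIODS = (2, 5, 10, 100)  # illustrative clock periods

def helix(a):
    """Embed integer a as cos/sin pairs on circles of several periods."""
    ang = np.array([2 * np.pi * a / T for T in PERIODS])
    return np.concatenate([np.cos(ang), np.sin(ang)])

def clock_add(ha, hb):
    """Produce the helix of a+b from the helices of a and b, using
    cos(x+y) = cos x cos y - sin x sin y and the sine analogue."""
    k = len(PERIODS)
    ca, sa, cb, sb = ha[:k], ha[k:], hb[:k], hb[k:]
    return np.concatenate([ca * cb - sa * sb, sa * cb + ca * sb])

def read_out(h, candidates=range(100)):
    """Decode a helix by nearest candidate integer."""
    return min(candidates, key=lambda c: np.linalg.norm(helix(c) - h))

print(read_out(clock_add(helix(27), helix(48))))  # 75
```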

interpretability mechanistic interpretability helix number representation representation learning embeddings
Theoretical Physics Feb 1, 2025

Progress in Normalizing Flows for 4d Gauge Theories

Ryan Abbott, Denis Boyda, Daniel C. Hackett et al.

Normalizing flows have arisen as a tool to accelerate Monte Carlo sampling for lattice field theories. This work reviews recent progress in applying normalizing flows to 4-dimensional nonabelian gauge theories, focusing on two advancements: an architectural improvement referred to as learned active loops, and the application of correlated ensemble methods to QCD with $N_f=2$ dynamical fermions.

normalizing flows lattice qcd learned active loops lattice gauge theory monte carlo methods
Astrophysics Feb 1, 2025

A Fast Periodicity Detection Algorithm Sensitive to Arbitrary Waveforms

Douglas P. Finkbeiner, Thomas A. Prince, Samuel E. Whitebook

A reexamination of period finding algorithms is prompted by new large area astronomical sky surveys that can identify billions of individual sources having a thousand or more observations per source. This large increase in data necessitates fast and efficient period detection algorithms. In this paper, we provide an initial description of an algorithm that is being used for detection of periodic behavior in a sample of 1.5 billion objects using light curves generated from Zwicky Transient Facility (ZTF) data (Bellm et al. 2019; Masci et al. 2018). We call this algorithm "Fast Periodicity Weighting" (FPW), derived using a Gaussian Process (GP) formalism. A major advantage of the FPW algorithm for ZTF analysis is that it is agnostic to the details of the phase-folded waveform. Periodic sources in ZTF show a wide variety of waveforms, some quite complex, including eclipsing objects, sinusoidally varying objects also exhibiting eclipses, objects with cyclotron emission at various phases, and accreting objects with complex waveforms. We describe the FPW algorithm and its application to ZTF, and provide efficient code for both CPU and GPU.
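
The GP derivation is the paper's contribution; as a rough stand-in, an inverse-variance-weighted, phase-binned periodogram conveys the waveform-agnostic idea: fold at each trial period and score how much weighted power the binned phase profile captures. A simplified sketch (not the FPW statistic itself; the binning and toy signal are illustrative):

```python
import numpy as np

def fpw_like_power(t, y, yerr, periods, n_bins=16):
    """Waveform-agnostic phase-binned periodogram (a simplified stand-in
    for the GP-derived FPW statistic)."""
    w = 1.0 / yerr**2
    y = y - np.average(y, weights=w)         # remove the weighted mean
    power = np.empty(len(periods))
    for i, P in enumerate(periods):
        b = (((t / P) % 1.0) * n_bins).astype(int)
        num = np.bincount(b, weights=w * y, minlength=n_bins)
        den = np.bincount(b, weights=w, minlength=n_bins)
        good = den > 0
        # Weighted power captured by the binned phase profile.
        power[i] = np.sum(num[good]**2 / den[good])
    return power

# Toy usage: recover a non-sinusoidal (eclipse-like) signal at P = 1.37.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 300.0, 1000)
y = np.where((t / 1.37) % 1.0 < 0.1, -1.0, 0.0) + rng.normal(0.0, 0.2, t.size)
periods = np.linspace(1.0, 2.0, 5000)
power = fpw_like_power(t, y, np.full(t.size, 0.2), periods)
print(f"best period: {periods[np.argmax(power)]:.3f}")
```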

phase-folded periodogram signal detection bayesian inference scalability stochastic processes
Foundational AI Jan 31, 2025

Low-Rank Adapting Models for Sparse Autoencoders

Matthew Chen, Joshua Engels, Max Tegmark

Sparse autoencoders (SAEs) decompose language model representations into a sparse set of linear latent vectors. Recent works have improved SAEs using language model gradients, but these techniques require many expensive backward passes during training and still cause a significant increase in cross entropy loss when SAE reconstructions are inserted into the model. In this work, we improve on these limitations by taking a fundamentally different approach: we use low-rank adaptation (LoRA) to finetune the language model itself around a previously trained SAE. We analyze our method across SAE sparsity, SAE width, language model size, LoRA rank, and model layer on the Gemma Scope family of SAEs. In these settings, our method reduces the cross entropy loss gap by 30% to 55% when SAEs are inserted during the forward pass. We also find that compared to end-to-end (e2e) SAEs, our approach achieves the same downstream cross entropy loss 3$\times$ to 20$\times$ faster on Gemma and 2$\times$ to 10$\times$ faster on Llama. We further show that our technique improves downstream metrics and can adapt multiple SAEs at once without harming general language model capabilities. Our results demonstrate that improving model interpretability is not limited to post-hoc SAE training; Pareto improvements can also be achieved by directly optimizing the model itself.
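
For reference, the LoRA mechanics being applied: freeze the pretrained weight W and learn a low-rank correction BA. A shape-level numpy sketch of the adapted forward pass (the rank, scaling, and zero-init of B follow common LoRA practice rather than this paper's specific configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 8            # rank r << min(d_out, d_in)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable
B = np.zeros((d_out, r))               # trainable; zero init means the
                                       # adapter starts as a no-op

def lora_forward(x, alpha=16.0):
    """y = (W + (alpha / r) * B A) x, with only A and B updated in training."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True before any training
```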

sparse autoencoders low-rank adaptation interpretability mechanistic interpretability fine-tuning
Theoretical Physics Jan 28, 2025

QCD Theory meets Information Theory

Benoît Assi, Stefan Höche, Kyle Lee et al.

We present a novel technique to incorporate precision calculations from quantum chromodynamics into fully differential particle-level Monte-Carlo simulations. By minimizing an information-theoretic quantity subject to constraints, our reweighted Monte Carlo incorporates systematic uncertainties absent in individual Monte Carlo predictions, achieving consistency with the theory input in precision and its estimated systematic uncertainties. Our method can be applied to arbitrary observables known from precision calculations, including multiple observables simultaneously. It generates strictly positive weights, thus offering a clear path to statistically powerful and theoretically precise computations for current and future collider experiments. As a proof of concept, we apply our technique to event-shape observables at electron-positron colliders, leveraging existing precision calculations of thrust. Our analysis highlights the importance of logarithmic moments of event shapes, which have not been previously studied in the collider physics literature.
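
The constrained information-minimization step has a classical closed form: minimum-relative-entropy weights are exponential in the constrained observables, w_i ∝ exp(λ f(x_i)), with λ fixed by the constraints, which is also why the weights come out strictly positive. A one-observable toy sketch (the distribution, target value, and names are illustrative, not the paper's setup):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
obs = rng.exponential(1.0, size=100_000)   # toy per-event observable
target_mean = 1.1                          # precision value to be matched

def tilted_mean(lam):
    """Mean of obs under weights w_i ~ exp(lam * obs_i)."""
    w = np.exp(lam * obs)
    return np.sum(w * obs) / np.sum(w)

# Fix the Lagrange multiplier so the constraint holds exactly.
lam = brentq(lambda l: tilted_mean(l) - target_mean, 0.0, 0.9)

w = np.exp(lam * obs)
w /= w.sum()                               # strictly positive, normalized weights
print(f"lambda = {lam:.4f}, reweighted mean = {np.sum(w * obs):.4f}")
```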

information-theoretic reweighting monte carlo methods collider physics uncertainty quantification quantum field theory
Astrophysics Jan 28, 2025

Central densities of dark matter halos in FIRE-2 simulations of low-mass galaxies with cold dark matter and self-interacting dark matter

Maria C. Straight, Michael Boylan-Kolchin, James S. Bullock et al.

We investigate the central density structure of dark matter halos in cold dark matter (CDM) and self-interacting dark matter (SIDM) models using simulations that are part of the Feedback In Realistic Environments (FIRE) project. For simulated halos of dwarf galaxy scale ($M_{\rm halo}(z=0)\approx 10^{10}\,M_\odot$), we study the central structure in both dissipationless simulations and simulations with full FIRE-2 galaxy formation physics. As has been demonstrated extensively in recent years, both baryonic feedback and self-interactions can convert central cusps into cores, with the former process doing so in a manner that depends sensitively on stellar mass at fixed $M_{\rm halo}$. Whether the two processes (baryonic feedback and self-interactions) are distinguishable, however, remains an open question. Here we demonstrate that, compared to feedback-induced cores, SIDM-induced cores transition more quickly from the central region of constant density to the falling density at larger radial scales. This result holds true even when including identical galaxy formation modeling in SIDM simulations as is used in CDM simulations, since self-interactions dominate over galaxy formation physics in establishing the central structure of SIDM halos in this mass regime. The change in density profile slope as a function of radius therefore holds the potential to discriminate between self-interactions and galaxy formation physics as the driver of core formation in dwarf galaxies.

dark matter cosmological simulation self-interacting dark matter cusp-core transformation baryonic feedback
Astrophysics Jan 24, 2025

Theoretical Predictions for the Inner Dark Matter Distribution in the Milky Way Informed by Simulations

Abdelaziz Hussein, Lina Necib, Manoj Kaplinghat et al.

We build a theoretical range for the Milky Way's (MW) inner dark matter (DM) distribution informed by the FIRE-2, Auriga, VINTERGATAN-GM, and TNG50 simulation suites assuming the canonical cold dark matter (CDM) model. The DM density profiles in Auriga, VINTERGATAN-GM, and TNG50 can be approximately modeled using the adiabatic contraction prescription of Gnedin et al. 2004, while FIRE-2 has stronger baryonic feedback, leading to a departure from the adiabatic contraction model. The simulated halos that are adiabatically contracted are close to spherical (axis ratio $q \in [0.75-0.9]$ at $5^\circ$), whereas halos that experience strong baryonic feedback are oblate ($q \in [0.5-0.7]$). Using the adiabatic contraction and strong baryonic feedback models, along with the observed stellar distribution of the MW, the inner logarithmic density slope for CDM in the MW is predicted to range from $ -0.5$ to $-1.3$. The $J$-factor, which determines the DM-annihilation flux, averaged over a solid angle of $5^\circ$ ($10^\circ$) is predicted to span the range $0.8$-$30$ ($0.6$-$10$) $\times 10^{23} \rm{GeV}^2/\rm{cm}^5$. The $D$-factor, which determines the flux due to DM decay, is predicted to be in the range $0.6$-$2$ ($0.5-1$) $\times10^{23} \rm{GeV}/\rm{cm}^2$. GitHub: The results for this work can be found at https://github.com/abdelazizhussein/MW-Inner-DM-Profile.

dark matter cosmological simulation adiabatic contraction baryonic feedback simulation-based inference
Astrophysics Jan 24, 2025

A theoretical approach to density-split clustering

Mathilde Pinon, Arnaud de Mattia, Étienne Burtin et al.

We present an analytical model for density-split correlation functions, which probe galaxy clustering in different density environments. Specifically, we focus on the cross-correlation between density-split regions and the tracer density field. We show that these correlation functions can be expressed in terms of the two-point probability density function (PDF) of the density field. We derive analytical predictions using three levels of approximation for the two-point PDF: a bivariate Gaussian distribution, a bivariate shifted log-normal distribution, and a prediction based on the Large Deviation Theory (LDT) framework. For count-in-cell densities, obtained through spherical top-hat smoothing, one can leverage spherical collapse dynamics and LDT to predict the density two-point PDF in the large-separation regime relative to the smoothing radius. We validate our model against dark matter N-body simulations in real space, incorporating Poisson shot noise and galaxy bias. Our results show that the LDT prediction outperforms the log-normal approximation, and agrees with simulations on large scales within the cosmic variance of a typical DESI DR1 sample, despite relying on only one degree of freedom.

large deviation theory density estimation cosmological simulation spherical collapse dynamics likelihood estimation
Foundational AI Jan 23, 2025

SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks

Sneh Pandya, Purvik Patel, Brian D. Nord et al.

Modern neural networks (NNs) often do not generalize well in the presence of a "covariate shift"; that is, in situations where the training and test data distributions differ, but the conditional distribution of classification labels remains unchanged. In such cases, NN generalization can be reduced to a problem of learning more domain-invariant features. Domain adaptation (DA) methods include a range of techniques aimed at achieving this; however, these methods have struggled with the need for extensive hyperparameter tuning, which then incurs significant computational costs. In this work, we introduce SIDDA, an out-of-the-box DA training algorithm built upon the Sinkhorn divergence, which can achieve effective domain alignment with minimal hyperparameter tuning and computational overhead. We demonstrate the efficacy of our method on multiple simulated and real datasets of varying complexity, including simple shapes, handwritten digits, and real astronomical observations. SIDDA is compatible with a variety of NN architectures, and it works particularly well in improving classification accuracy and model calibration when paired with equivariant neural networks (ENNs). We find that SIDDA enhances the generalization capabilities of NNs, achieving up to a $\approx40\%$ improvement in classification accuracy on unlabeled target data. We also study the efficacy of DA on ENNs with respect to the varying group orders of the dihedral group $D_N$, and find that the model performance improves as the degree of equivariance increases. Finally, we find that SIDDA enhances model calibration on both source and target data--achieving over an order of magnitude improvement in the ECE and Brier score. SIDDA's versatility, combined with its automated approach to domain alignment, has the potential to advance multi-dataset studies by enabling the development of highly generalizable models.
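
A minimal numpy rendering of the alignment term: the entropic OT cost via Sinkhorn iterations and the debiased Sinkhorn divergence between source- and target-domain feature batches. This is a common textbook variant with illustrative hyperparameters; log-domain stabilization and SIDDA's dynamic scheduling are omitted:

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.1, n_iter=200):
    """Entropy-regularized OT cost between two uniform empirical measures
    (log-domain stabilization omitted for brevity)."""
    C = np.sum((x[:, None, :] - y[None, :, :])**2, axis=-1)  # squared distances
    K = np.exp(-C / eps)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):              # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]      # entropic transport plan
    return np.sum(P * C)

def sinkhorn_divergence(x, y, eps=0.1):
    """Debiased divergence: S = OT(x,y) - (OT(x,x) + OT(y,y)) / 2."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(128, 8))   # source-domain features
tgt = rng.normal(0.5, 1.0, size=(128, 8))   # shifted target-domain features
print(sinkhorn_divergence(src, tgt))        # > 0; shrinks as domains align
```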

equivariant neural networks optimal transport covariate shift adaptation transfer learning calibration
Foundational AI Jan 21, 2025

Physics of Skill Learning

Ziming Liu, Yizhou Liu, Eric J. Michaud et al.

We aim to understand the physics of skill learning, i.e., how skills are learned in neural networks during training. We start by observing the Domino effect, i.e., skills are learned sequentially, and notably, some skills kick off learning right after others complete learning, similar to the sequential fall of domino cards. To understand the Domino effect and relevant behaviors of skill learning, we take the physicist's approach of abstraction and simplification. We propose three models with varying complexities -- the Geometry model, the Resource model, and the Domino model, trading between reality and simplicity. The Domino effect can be reproduced in the Geometry model, whose resource interpretation inspires the Resource model, which can be further simplified to the Domino model. These models present different levels of abstraction and simplification; each is useful to study some aspects of skill learning. The Geometry model provides interesting insights into neural scaling laws and optimizers; the Resource model sheds light on the learning dynamics of compositional tasks; the Domino model reveals the benefits of modularity. These models are not only conceptually interesting (e.g., we show how Chinchilla scaling laws can emerge from the Geometry model) but also useful in practice, inspiring algorithmic development (e.g., simple algorithmic changes motivated by these toy models can speed up the training of deep learning models).

skill learning dynamics multi-task learning interpretability neural scaling laws scalability
Foundational AI Jan 15, 2025

Inferring Transition Dynamics from Value Functions

Jacob Adamczyk

In reinforcement learning, the value function is typically trained to solve the Bellman equation, which connects the current value to future values. This temporal dependency hints that the value function may contain implicit information about the environment's transition dynamics. By rearranging the Bellman equation, we show that a converged value function encodes a model of the underlying dynamics of the environment. We build on this insight to propose a simple method for inferring dynamics models directly from the value function, potentially mitigating the need for explicit model learning. Furthermore, we explore the challenges of next-state identifiability, discussing conditions under which the inferred dynamics model is well-defined. Our work provides a theoretical foundation for leveraging value functions in dynamics modeling and opens a new avenue for bridging model-free and model-based reinforcement learning.
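
In the deterministic tabular case the rearrangement is explicit: V(s) = r(s) + γV(s') gives V(s') = (V(s) - r(s))/γ, so the successor state can be identified by matching values whenever distinct states have distinct values. A toy sketch on a chain MDP (the construction is illustrative, not the paper's experiments):

```python
import numpy as np

gamma = 0.9
n = 6
r = np.linspace(1.0, 0.0, n)                       # per-state rewards (all distinct)
next_state = np.minimum(np.arange(n) + 1, n - 1)   # chain: s -> s+1, last absorbs

V = np.zeros(n)
for _ in range(500):                               # value iteration to convergence
    V = r + gamma * V[next_state]

def infer_next_state(s):
    """Invert the Bellman equation: find s' with V(s') = (V(s) - r(s)) / gamma.

    Well defined only when distinct states have distinct values, i.e. the
    next-state identifiability condition the paper discusses.
    """
    target = (V[s] - r[s]) / gamma
    return int(np.argmin(np.abs(V - target)))

print([infer_next_state(s) for s in range(n)])  # recovers s -> s+1 (last absorbs)
```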

reinforcement learning dynamics model inference inverse problems next-state identifiability reward optimization
Foundational AI Jan 15, 2025

Average-Reward Soft Actor-Critic

Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin et al.

The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years for its ability to solve temporally-extended problems without relying on discounting. Meanwhile, in the discounted setting, algorithms with entropy regularization have been developed, leading to improvements over deterministic methods. Despite the distinct benefits of these approaches, deep RL algorithms for the entropy-regularized average-reward objective have not been developed. While policy-gradient-based approaches have recently appeared in the average-reward literature, the corresponding actor-critic framework remains less explored. In this paper, we introduce an average-reward soft actor-critic algorithm to address these gaps in the field. We validate our method by comparing with existing average-reward algorithms on standard RL benchmarks, achieving superior performance for the average-reward criterion.

reinforcement learning average-reward mdp entropy regularization actor-critic framework reward optimization
Astrophysics Jan 14, 2025

An Updated Detection Pipeline for Precursor Emission in Type II Supernova 2020tlf

Wynn Jacobson-Galán, Sebastian Gonzalez, Shreyas Patel et al.

We present a new photometric pipeline for the detection of pre-supernova (pre-SN) emission in the Young Supernova Experiment (YSE) sky survey. The method described is applied to SN 2020tlf, a type II SN (SN II) with precursor emission in the last ~100 days before first light. We re-analyze the YSE griz-band light curves of SN 2020tlf and provide revised pre-explosion photometry that includes a robust list of confident detections and limiting magnitudes. Compared to the results of Jacobson-Galan et al. 2022a, this new analysis yields fewer total r/i/z-band pre-SN detections at phases > -100 days. Furthermore, we discourage the use of the blackbody modeling of the pre-explosion spectral energy distribution, the pre-SN bolometric light curve and the blackbody model parameters presented in Jacobson-Galan et al. 2022a. Nevertheless, binned photometry of SN 2020tlf confirms a consistent progenitor luminosity of ~10$^{40}$ erg s$^{-1}$ before explosion.

signal detection pre-supernova precursor emission scientific workflows artificial source injection difference image photometry
Experimental Physics Jan 14, 2025

Lake- and Surface-Based Detectors for Forward Neutrino Physics

Nicholas W. Kamp, Carlos A. Argüelles, Albrecht Karle et al.

We propose two medium-baseline, kiloton-scale neutrino experiments to study neutrinos from LHC proton-proton collisions: SINE, a surface-based scintillator panel detector observing muon neutrinos from the CMS interaction point, and UNDINE, a water Cherenkov detector submerged in Lake Geneva observing all-flavor neutrinos from LHCb. Using a Monte Carlo simulation, we estimate millions of neutrino interactions during the high-luminosity LHC era. We show that these datasets can constrain neutrino cross sections, charm production in $pp$ collisions, and strangeness enhancement as a solution to the cosmic-ray muon puzzle. SINE and UNDINE thus offer a cost-effective medium-baseline complement to the proposed short-baseline forward physics facility.

neutrino detection collider physics forward neutrino beams monte carlo methods detector simulation
Foundational AI Jan 13, 2025

Synthesis and Analysis of Data as Probability Measures with Entropy-Regularized Optimal Transport

Brendan Mallery, James M. Murphy, Shuchin Aeron

We consider synthesis and analysis of probability measures using the entropy-regularized Wasserstein-2 cost and its unbiased version, the Sinkhorn divergence. The synthesis problem consists of computing the barycenter, with respect to these costs, of reference measures given a set of coefficients belonging to the simplex. The analysis problem consists of finding the coefficients for the closest barycenter in the Wasserstein-2 distance to a given measure. Under the weakest assumptions on the measures thus far in the literature, we compute the derivative of the entropy-regularized Wasserstein-2 cost. We leverage this to establish a characterization of barycenters with respect to the entropy-regularized Wasserstein-2 cost as solutions that correspond to a fixed point of an average of the entropy-regularized displacement maps. This characterization yields a finite-dimensional, convex, quadratic program for solving the analysis problem when the measure being analyzed is a barycenter with respect to the entropy-regularized Wasserstein-2 cost. We show that these coefficients, as well as the value of the barycenter functional, can be estimated from samples with dimension-independent rates of convergence, and that barycentric coefficients are stable with respect to perturbations in the Wasserstein-2 metric. We employ the barycentric coefficients as features for classification of corrupted point cloud data, and show that compared to neural network baselines, our approach is more efficient in small training data regimes.

optimal transport wasserstein barycenters barycentric coordinates dimension-free convergence classification
Astrophysics Jan 7, 2025

Effects of galactic environment on size and dark matter content in low-mass galaxies

Francisco J. Mercado, Jorge Moreno, Robert Feldmann et al.

We utilize the cosmological volume simulation, FIREbox, to investigate how a galaxy's environment influences its size and dark matter content. Our study focuses on approximately 1,200 galaxies (886 central and 332 satellite halos) in the low-mass regime, with stellar masses between $10^6$ and $10^9$ $M_{\odot}$. We analyze the size-mass relation ($r_{50} - M_{\star}$), inner dark matter mass-stellar mass ($M^{50}_{\rm DM} - M_{\star}$) relation, and the halo mass-stellar mass ($M_{\rm halo} - M_{\star}$) relation. At fixed stellar mass, we find that galaxies experiencing stronger tidal influences, indicated by higher Perturbation Indices (PI $>$ 1), are generally larger and have lower masses relative to their counterparts with lower Perturbation Indices (PI $<$ 1). Applying a Random Forest regression model, we show that both the environment (PI) and halo mass ($M_{\rm halo}$) are significant predictors of a galaxy's relative size and dark matter content. Notably, because $M_{\rm halo}$ is also strongly affected by the environment, our findings indicate that environmental conditions not only influence galactic sizes and relative inner dark matter content directly, but also indirectly through their impact on halo mass. Our results highlight a critical interplay between environmental factors and halo mass in shaping galaxy properties, affirming the environment as a fundamental driver in galaxy formation and evolution.

cosmological simulation dark matter galaxy size-mass relation halo mass-stellar mass relation regression
Foundational AI Jan 6, 2025

Predicting band gap from chemical composition: A simple learned model for a material property with atypical statistics

Andrew Ma, Owen Dugan, Marin Soljačić

In solid-state materials science, substantial efforts have been devoted to the calculation and modeling of the electronic band gap. While a wide range of ab initio methods and machine learning algorithms have been created that can predict this quantity, the development of new computational approaches for studying the band gap remains an active area of research. Here we introduce a simple machine learning model for predicting the band gap using only the chemical composition of the crystalline material. To motivate the form of the model, we first analyze the empirical distribution of the band gap, which sheds new light on its atypical statistics. Specifically, our analysis enables us to frame band gap prediction as a task of modeling a mixed random variable, and we design our model accordingly. Our model formulation incorporates thematic ideas from chemical heuristic models for other material properties in a manner suited to the band gap modeling task. The model has exactly one parameter corresponding to each element, which is fit using data. To predict the band gap for a given material, the model computes a weighted average of the parameters associated with its constituent elements and then takes the maximum of this quantity and zero. The model provides heuristic chemical interpretability by intuitively capturing the associations between the band gap and individual chemical elements.
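
The prediction rule fits in a few lines: one learned parameter per element, a composition-weighted average, and rectification at zero, which produces the point mass of metallic (zero-gap) materials that makes the band gap a mixed random variable. A sketch with hypothetical parameter values (the real values are fit to band-gap data in the paper):

```python
# Hypothetical learned per-element parameters in eV (the paper fits one
# such parameter per element to data; these numbers are made up).
theta = {"Ga": 1.5, "As": 1.4, "Si": 1.1, "Cu": -3.0, "O": 4.0}

def predict_band_gap(composition):
    """composition: dict mapping element -> atomic fraction (sums to 1).

    E_gap = max(0, sum_e f_e * theta_e). Rectifying at zero lets the model
    capture the band gap's mixed statistics: a point mass of metals at
    exactly zero plus a continuous part for semiconductors and insulators.
    """
    weighted = sum(frac * theta[el] for el, frac in composition.items())
    return max(0.0, weighted)

print(predict_band_gap({"Ga": 0.5, "As": 0.5}))  # 1.45 (toy semiconductor)
print(predict_band_gap({"Cu": 1.0}))             # 0.0  (metal)
```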

mixed random variable modeling elemental parameter model regression interpretability materials discovery
Theoretical Physics Jan 3, 2025

Learning Fricke signs from Maass form Coefficients

Joanna Bieri, Giorgi Butbaia, Edgar Costa et al.

In this paper, we conduct a data-scientific investigation of Maass forms. We find that averaging the Fourier coefficients of Maass forms with the same Fricke sign reveals patterns analogous to the recently discovered "murmuration" phenomenon, and that these patterns become more pronounced when parity is incorporated as an additional feature. Approximately 43% of the forms in our dataset have an unknown Fricke sign. For the remaining forms, we employ Linear Discriminant Analysis (LDA) to machine learn their Fricke sign, achieving 96% (resp. 94%) accuracy for forms with even (resp. odd) parity. We apply the trained LDA model to forms with unknown Fricke signs to make predictions. The average values based on the predicted Fricke signs are computed and compared to those for forms with known signs to verify the reasonableness of the predictions. Additionally, a subset of these predictions is evaluated against heuristic guesses provided by Hejhal's algorithm, showing a match approximately 95% of the time. We also use neural networks to obtain results comparable to those from the LDA model.

maass forms classification murmuration phenomenon feature extraction semi-supervised learning
Astrophysics Jan 3, 2025

Exoplanet Detection via Differentiable Rendering

Brandon Y. Feng, Rodrigo Ferrer-Chávez, Aviad Levis et al.

Direct imaging of exoplanets is crucial for advancing our understanding of planetary systems beyond our solar system, but it faces significant challenges due to the high contrast between host stars and their planets. Wavefront aberrations introduce speckles in the telescope science images, which are patterns of diffracted starlight that can mimic the appearance of planets, complicating the detection of faint exoplanet signals. Traditional post-processing methods, operating primarily in the image intensity domain, do not integrate wavefront sensing data. These data, measured mainly for adaptive optics corrections, have been overlooked as a potential resource for post-processing, partly due to the challenge of the evolving nature of wavefront aberrations. In this paper, we present a differentiable rendering approach that leverages these wavefront sensing data to improve exoplanet detection. Our differentiable renderer models wave-based light propagation through a coronagraphic telescope system, allowing gradient-based optimization to significantly improve starlight subtraction and increase sensitivity to faint exoplanets. Simulation experiments based on the James Webb Space Telescope configuration demonstrate the effectiveness of our approach, achieving substantial improvements in contrast and planet detection limits. Our results showcase how the computational advancements enabled by differentiable rendering can revitalize previously underexploited wavefront data, opening new avenues for enhancing exoplanet imaging and characterization.

exoplanets inverse problems differentiable rendering wavefront aberration estimation signal detection
Experimental Physics Jan 3, 2025

Robust resonant anomaly detection with NPLM

Gaia Grosso, Debajyoti Sengupta, Tobias Golling et al.

In this study, we investigate the application of the New Physics Learning Machine (NPLM) algorithm as an alternative to the standard CWoLa method with Boosted Decision Trees (BDTs), particularly for scenarios with rare signal events. NPLM offers an end-to-end approach to anomaly detection and hypothesis testing by utilizing an in-sample evaluation of a binary classifier to estimate a log-density ratio, which can improve detection performance without prior assumptions on the signal model. We examine two approaches: (1) an end-to-end NPLM application in cases with reliable background modelling and (2) an NPLM-based classifier used for signal selection when accurate background modelling is unavailable, with subsequent performance enhancement through a hyper-test on multiple values of the selection threshold. Our findings show that NPLM-based methods outperform BDT-based approaches in detection performance, particularly in low signal injection scenarios, while significantly reducing epistemic variance due to hyperparameter choices. This work highlights the potential of NPLM for robust resonant anomaly detection in particle physics, setting a foundation for future methods that enhance sensitivity and consistency under signal variability.

anomaly detection hypothesis testing likelihood ratio new physics searches signal detection
Astrophysics Jan 3, 2025

Cosmological constraints from the Minkowski functionals of the BOSS CMASS galaxy sample

Wei Liu, Enrique Paillas, Carolina Cuesta-Lazaro et al.

For the first time, we develop a simulation-based model for the Minkowski functionals (MFs) of large-scale structure, which allows us to extract the full information available from the MFs (including both the Gaussian and non-Gaussian parts), and apply it to the BOSS DR12 CMASS galaxy sample. Our model is based on high-fidelity mock galaxy catalogs constructed from the AbacusSummit simulations using the halo occupation distribution (HOD) framework, which include the redshift-space distortions and Alcock-Paczynski distortions, incorporate survey realism, including survey geometry and veto masks, and account for angular plus radial selection effects. The cosmological and HOD parameter dependence of the MFs is captured with a neural network emulator trained from the galaxy mocks with various cosmological and HOD parameters. To benchmark the constraining power of the MFs, we also train an emulator for the galaxy 2-point correlation function (2PCF) using the same pipeline. Having validated our approach through successful parameter recovery tests on both internal and external mocks, including non-HOD forward models of the halo-galaxy connection, we apply our forward model to analyze the CMASS data in the redshift range $0.45<z<0.58$. We find the MFs provide stronger constraints on the cosmological parameters than the 2PCF. The combination of the two gives $ω_{\rm cdm}=0.1172^{+0.0020}_{-0.0023}$, $σ_8=0.783\pm 0.026$, and $n_s=0.966^{+0.019}_{-0.015}$, which are tighter by a factor of 2.0, 1.9, and 1.6 than the 2PCF alone. The derived constraint $fσ_8=0.453 \pm 0.016$ is also improved by a factor of 1.9, compared to the 2PCF, and agrees well with Planck 2018 predictions and other results from a series of studies in the literature.

minkowski functionals simulation-based inference emulation non-gaussian statistics surrogate modeling
Astrophysics Jan 2, 2025

ORACLE: A Real-Time, Hierarchical, Deep-Learning Photometric Classifier for the LSST

Ved G. Shah, Alex Gagliano, Konstantin Malanchev et al.

We present ORACLE, the first hierarchical deep-learning model for real-time, context-aware classification of transient and variable astrophysical phenomena. ORACLE is a recurrent neural network with Gated Recurrent Units (GRUs), and has been trained using a custom hierarchical cross-entropy loss function to provide high-confidence classifications along an observationally-driven taxonomy with as little as a single photometric observation. Contextual information for each object, including host galaxy photometric redshift, offset, ellipticity and brightness, is concatenated to the light curve embedding and used to make a final prediction. Training on $\sim$0.5M events from the Extended LSST Astronomical Time-Series Classification Challenge, we achieve a top-level (Transient vs Variable) macro-averaged precision of 0.96 using only 1 day of photometric observations after the first detection in addition to contextual information, for each event; this increases to $>$0.99 once 64 days of the light curve has been obtained, and 0.83 at 1024 days after first detection for 19-way classification (including supernova sub-types, active galactic nuclei, variable stars, microlensing events, and kilonovae). We also compare ORACLE with other state-of-the-art classifiers and report comparable performance for the 19-way classification task, in addition to delivering accurate top-level classifications much earlier. The code and model weights used in this work are publicly available at our associated GitHub repository (https://github.com/uiucsn/ELAsTiCC-Classification).
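
One simple way to realize a hierarchical cross-entropy of the kind described: aggregate leaf probabilities up the taxonomy and sum per-level cross-entropies. A sketch on a toy two-level taxonomy (ORACLE's actual taxonomy, level weighting, and loss details may differ):

```python
import numpy as np

# Toy two-level taxonomy: leaves 0-2 are Transients, leaves 3-4 are Variables.
PARENT = np.array([0, 0, 0, 1, 1])   # leaf index -> top-level class

def hierarchical_xent(leaf_probs, leaf_label, level_weights=(1.0, 1.0)):
    """Sum of per-level cross-entropies over a two-level taxonomy.

    leaf_probs : (n_leaves,) predicted leaf probabilities (sum to 1)
    leaf_label : true leaf class index
    """
    n_top = PARENT.max() + 1
    top_probs = np.array([leaf_probs[PARENT == c].sum() for c in range(n_top)])
    ce_top = -np.log(top_probs[PARENT[leaf_label]] + 1e-12)
    ce_leaf = -np.log(leaf_probs[leaf_label] + 1e-12)
    return level_weights[0] * ce_top + level_weights[1] * ce_leaf

p = np.array([0.50, 0.20, 0.10, 0.15, 0.05])
print(hierarchical_xent(p, leaf_label=0))  # confident at both levels -> small
```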

hierarchical classification recurrent networks classification loss function design photometric light curves
Astrophysics Jan 2, 2025

A Near-IR Search for Helium in the Superluminous Supernova SN 2024ahr

Harsh Kumar, Edo Berger, Peter K. Blanchard et al.

We present a detailed study of SN 2024ahr, a hydrogen-poor superluminous supernova (SLSN-I), for which we determine a redshift of $z=0.0861$. SN 2024ahr has a peak absolute magnitude of $M_g\approx M_r\approx -21$ mag, rest-frame rise and decline times (50$\%$ of peak) of about 40 and 80 days, respectively, and typical spectroscopic evolution in the optical band. Similarly, modeling of the UV/optical light curves with a magnetar spin-down engine leads to typical parameters: an initial spin period of $\approx 3.3$ ms, a magnetic field strength of $\approx 6\times 10^{13}$ G, and an ejecta mass of $\approx 9.5$ M$_\odot$. Due to its relatively low redshift we obtained a high signal-to-noise ratio near-IR spectrum about 43 rest-frame days post-peak to search for the presence of helium. We do not detect any significant feature at the location of the He I $\,λ2.058$ $μ$m feature, and place a conservative upper limit of $\sim 0.05$ M$_\odot$ on the mass of helium in the outer ejecta. We detect broad features of Mg I $\,λ1.575$ $μ$m and a blend of Co II $\,λ2.126$ $μ$m and Mg II, $λ2.136$ $μ$m, which are typical of Type Ic SNe, but with higher velocities. Examining the sample of SLSNe-I with NIR spectroscopy, we find that, unlike SN 2024ahr, these events are generally peculiar. This highlights the need for a large sample of prototypical SLSNe-I with NIR spectroscopy to constrain the fraction of progenitors with helium (Ib-like) and without helium (Ic-like) at the time of the explosion, and hence the evolutionary path(s) leading to the rare outcome of SLSNe-I.

supernova classification nir spectroscopy magnetar spin-down helium shell detection stellar evolution
Foundational AI Jan 2, 2025

Bootstrapped Reward Shaping

Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin et al.

In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, "potential-based reward shaping" (PBRS) has been proposed as a method of providing a more dense reward signal while leaving the optimal policy invariant. However, the required "potential function" must be carefully designed with task-dependent knowledge to not deter training performance. In this work, we propose a "bootstrapped" method of reward shaping, termed BSRS, in which the agent's current estimate of the state-value function acts as the potential function for PBRS. We provide convergence proofs for the tabular setting, give insights into training dynamics for deep RL, and show that the proposed method improves training speed in the Atari suite.
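
For a fixed potential Φ, PBRS adds F = γΦ(s') - Φ(s) to the reward without changing the optimal policy; BSRS takes Φ to be the agent's own current value estimate. A tabular sketch of the shaped update (hyperparameters and the value-sync step are illustrative, not the paper's exact algorithm):

```python
import numpy as np

def bsrs_q_update(Q, V, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step with bootstrapped reward shaping.

    For a fixed potential Phi, the shaping term gamma*Phi(s') - Phi(s)
    leaves the optimal policy invariant; BSRS sets Phi to the agent's
    current state-value estimate V.
    """
    r_shaped = r + gamma * V[s_next] - V[s]          # bootstrapped shaping
    td_target = r_shaped + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    V[s] = np.max(Q[s])                              # refresh the potential

# Toy usage on a 4-state, 2-action table.
Q, V = np.zeros((4, 2)), np.zeros(4)
bsrs_q_update(Q, V, s=0, a=1, r=1.0, s_next=1)
print(Q[0], V[0])
```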

reinforcement learning bootstrapped reward shaping reward optimization potential-based reward shaping sparse reward
Astrophysics Dec 27, 2024

A Neural Network-Based Search for Unmodeled Transients in LIGO-Virgo-KAGRA's Third Observing Run

Ryan Raikman, Eric A. Moreno, Katya Govorkova et al.

This paper presents the results of a Neural Network (NN)-based search for short-duration gravitational-wave transients in data from the third observing run of LIGO, Virgo, and KAGRA. The search targets unmodeled transients with durations of milliseconds to a few seconds in the 30-1500 Hz frequency band, without assumptions about the incoming signal direction, polarization, or morphology. Using the Gravitational Wave Anomalous Knowledge (GWAK) method, three compact binary coalescences (CBCs) identified by existing pipelines are successfully detected, along with a range of detector glitches. The algorithm constructs a low-dimensional embedded space to capture the physical features of signals, enabling the detection of CBCs, detector glitches, and unmodeled transients. This study demonstrates GWAK's ability to enhance gravitational-wave searches beyond the limits of existing pipelines, laying the groundwork for future detection strategies.

gravitational waves anomaly detection autoencoders signal detection semi-supervised learning
Foundational AI Dec 19, 2024

Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning

Simon Frieder, Jonas Bayer, Sam Looi et al.

The datasets and benchmarks commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibit several shortcomings and misdirections. These range from a restricted scope of mathematical complexity to limited fidelity in capturing aspects beyond the final, written proof (e.g. motivating the proof, or representing the thought processes leading to a proof). These issues are compounded by a dynamic reminiscent of Goodhart's law: as benchmark performance becomes the primary target for model development, the benchmarks themselves become less reliable indicators of genuine mathematical capability. We systematically explore these limitations and contend that enhancing the capabilities of large language models, or any forthcoming advancements in AI-based mathematical assistants (copilots or "thought partners"), necessitates a course correction both in the design of mathematical datasets and the evaluation criteria of the models' mathematical ability. In particular, it is necessary for benchmarks to move beyond the existing result-based datasets that map theorem statements directly to proofs, and instead focus on datasets that translate the richer facets of mathematical research practice into data that LLMs can learn from. This includes benchmarks that supervise the proving process and the proof discovery process itself, and we advocate for mathematical dataset developers to consider the concept of "motivated proof", introduced by G. Pólya in 1949, which can serve as a blueprint for datasets that offer a better proof learning signal, alleviating some of the mentioned limitations.

motivated proof mathematical benchmarking proof process supervision scientific workflows transformers
Experimental Physics Dec 9, 2024

Product Manifold Machine Learning for Physics

Nathaniel S. Woodward, Sang Eon Park, Gaia Grosso et al.

Physical data are representations of the fundamental laws governing the Universe, hiding complex compositional structures often well captured by hierarchical graphs. Hyperbolic spaces are endowed with a non-Euclidean geometry that naturally embeds those structures. To leverage the benefits of non-Euclidean geometries in representing natural data we develop machine learning on $\mathcal P \mathcal M$ spaces, Cartesian products of constant curvature Riemannian manifolds. As a use case we consider the classification of "jets", sprays of hadrons and other subatomic particles produced by the hadronization of quarks and gluons in collider experiments. We compare the performance of $\mathcal P \mathcal M$-MLP and $\mathcal P \mathcal M$-Transformer models across several possible representations. Our experiments show that $\mathcal P \mathcal M$ representations generally perform equal to or better than fully Euclidean models of similar size, with the most significant gains found for highly hierarchical jets and small models. We discover significant correlation between the degree of hierarchical structure at a per-jet level and classification performance with the $\mathcal P \mathcal M$-Transformer in top tagging benchmarks. This is a promising result highlighting a potential direction for further improving machine learning model performance through tailoring geometric representation at a per-sample level in hierarchical datasets. These results reinforce the view of geometric representation as a key parameter in maximizing both performance and efficiency of machine learning on natural data.
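
The basic geometric ingredient is the product metric: each factor manifold contributes its own geodesic distance, combined in quadrature. A sketch with one Poincaré-ball (curvature -1) factor and one Euclidean factor (the paper's actual choices of factors and curvatures may differ):

```python
import numpy as np

def poincare_dist(u, v):
    """Geodesic distance in the Poincare ball model (curvature -1)."""
    duv = np.sum((u - v)**2)
    denom = (1.0 - np.sum(u * u)) * (1.0 - np.sum(v * v))
    return np.arccosh(1.0 + 2.0 * duv / denom)

def product_manifold_dist(pa, pb):
    """Distance on a product manifold: factor geodesic distances combined
    in quadrature. Points are tuples (hyperbolic part, Euclidean part)."""
    (ha, ea), (hb, eb) = pa, pb
    d_h = poincare_dist(ha, hb)          # hyperbolic factor
    d_e = np.linalg.norm(ea - eb)        # Euclidean factor
    return np.sqrt(d_h**2 + d_e**2)

a = (np.array([0.1, 0.2]), np.array([1.0, 0.0]))
b = (np.array([0.6, -0.3]), np.array([0.0, 1.0]))
print(product_manifold_dist(a, b))
```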

product manifold spaces manifold learning geometric deep learning jet physics hyperbolic embeddings
Foundational AI Dec 3, 2024

Grayscale to Hyperspectral at Any Resolution Using a Phase-Only Lens

Dean Hazineh, Federico Capasso, Todd Zickler

We consider the problem of reconstructing a HxWx31 hyperspectral image from a HxW grayscale snapshot measurement that is captured using only a single diffractive optic and a filterless panchromatic photosensor. This problem is severely ill-posed, but we present the first model that produces high-quality results. We make efficient use of limited data by training a conditional denoising diffusion model that operates on small patches in a shift-invariant manner. During inference, we synchronize per-patch hyperspectral predictions using guidance derived from the optical point spread function. Surprisingly, our experiments reveal that patch sizes as small as the PSF's support achieve excellent results, and they show that local optical cues are sufficient to capture full spectral information. Moreover, by drawing multiple samples, our model provides per-pixel uncertainty estimates that strongly correlate with reconstruction error. Our work lays the foundation for a new class of high-resolution snapshot hyperspectral imagers that are compact and light-efficient.

diffusion models inverse problems psf-guided diffusion hyperspectral reconstruction uncertainty quantification
Astrophysics Dec 2, 2024

The Millennium and Astrid galaxies in effective field theory: comparison with galaxy-halo connection models at the field level

Mikhail M. Ivanov, Carolina Cuesta-Lazaro, Andrej Obuljen et al.

Cosmological analyses of redshift space clustering data are primarily based on using luminous "red" galaxies (LRGs) and "blue" emission line galaxies (ELGs) to trace underlying dark matter. Using the large high-fidelity high-resolution MillenniumTNG (MTNG) and Astrid simulations, we study these galaxies with the effective field theory (EFT)-based field level forward model. We confirm that both red and blue galaxies can be accurately modeled with EFT at the field level and their parameters match those of the phenomenological halo-based models. Specifically, we consider the state-of-the-art Halo Occupation Distribution (HOD) and High Mass Quenched (HMQ) models for the red and blue galaxies, respectively. Our results explicitly confirm the validity of the halo-based models on large scales beyond the two-point statistics. In addition, we validate the field-level HOD/HMQ-based priors for EFT full-shape analysis. We find that the local bias parameters of the ELGs are in tension with the predictions of the LRG-like HOD models and present a simple analytic argument explaining this phenomenology. We also confirm that ELGs exhibit weaker non-linear redshift-space distortions ("fingers-of-God"), suggesting that a significant fraction of their data should be perturbative. We find that the response of EFT parameters to galaxy selection is sensitive to assumptions about baryonic feedback, suggesting that a detailed understanding of feedback processes is necessary for robust predictions of EFT parameters. Finally, using neural density estimation based on paired HOD-EFT parameter samples, we obtain optimal HOD models that reproduce the clustering of Astrid and MTNG galaxies.

effective field theory cosmological simulation galaxy-halo connection models galaxy bias expansion density estimation
Astrophysics Dec 1, 2024

Probing primordial non-Gaussianity by reconstructing the initial conditions

Xinyi Chen, Nikhil Padmanabhan, Daniel J. Eisenstein

We propose to constrain the primordial (local-type) non-Gaussianity signal by first reconstructing the initial density field to remove the late time non-Gaussianities introduced by gravitational evolution. Our reconstruction algorithm combines perturbation theory on large scales with a convolutional neural network on small scales. We reconstruct the squared potential (that sources the non-Gaussian signal) out to $k=0.2\ h$/Mpc to an accuracy of 99.8%. We cross-correlate this squared potential field with the reconstructed density field and verify that this computationally inexpensive estimator has the same information content as the full matter bispectrum. As a proof of concept, our approach can yield up to a factor of three improvement in the $f_{\rm NL}$ constraints, although it does not yet include the complications of galaxy bias or imperfections in the reconstruction. These potential improvements make it a promising alternative to current approaches to constraining primordial non-Gaussianity.
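
The estimator pipeline is compact to state: solve the Poisson equation for the potential in Fourier space, square it in configuration space, and cross-correlate with the density. A schematic numpy/FFT sketch on a periodic toy box (the reconstruction network, physical normalizations, and k-binning are all omitted):

```python
import numpy as np

def squared_potential(delta, box_size=1.0):
    """Solve nabla^2 phi = delta on a periodic box in Fourier space,
    then return phi^2 in configuration space."""
    n = delta.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                        # guard the zero mode
    phi_k = -np.fft.fftn(delta) / k2
    phi_k[0, 0, 0] = 0.0                     # remove the mean
    phi = np.real(np.fft.ifftn(phi_k))
    return phi**2

def cross_power(a, b):
    """Per-mode cross power of two fields (k-binning omitted)."""
    return np.real(np.fft.fftn(a) * np.conj(np.fft.fftn(b)))

rng = np.random.default_rng(0)
delta = rng.normal(size=(32, 32, 32))        # stand-in for a reconstructed field
p_cross = cross_power(squared_potential(delta), delta)
print(p_cross.shape)                         # (32, 32, 32) per-mode cross power
```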

primordial non-gaussianity initial conditions reconstruction bispectrum estimation inverse problems convolutional networks
Experimental Physics Nov 15, 2024

SymbolFit: Automatic Parametric Modeling with Symbolic Regression

Ho Fung Tsoi, Dylan Rankin, Cecile Caillol et al.

We introduce SymbolFit, a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we develop a framework that automates and streamlines the process by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without requiring a predefined functional form because the functional form itself is treated as a trainable parameter, making the process far more efficient and effortless than traditional regression methods. We demonstrate the framework in high-energy physics experiments at the CERN Large Hadron Collider (LHC) using five real proton-proton collision datasets from new physics searches, including background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We show that our framework can flexibly and efficiently generate a wide range of candidate functions that fit a nontrivial distribution well using a simple fit configuration that varies only by random seed, and that the same fit configuration, which defines a vast function space, can also be applied to distributions of different shapes, whereas achieving a comparable result with traditional methods would have required extensive manual effort.

symbolic regression uncertainty quantification regression new physics searches background parametric modeling
Astrophysics Nov 11, 2024

Type IIn Supernovae. I. Uniform Light Curve Characterization and a Bimodality in the Radiated Energy Distribution

Daichi Hiramatsu, Edo Berger, Sebastian Gomez et al.

We present the largest uniform study to date of Type IIn supernovae (SNe IIn), focusing in this first paper on the multi-band optical light curves of $487$ SNe IIn. The sample, constructed from multiple surveys, extends to $z \approx 0.8$, with the majority of events at $z \lesssim 0.3$. We construct uniform multi-band and bolometric light curves using Gaussian process regression, and determine key observed properties in the rest-frame (e.g., peak luminosity, timescales, radiated energy). We find that SNe IIn span broad ranges in peak luminosity ($\sim 10^{42}-10^{44}$ erg s$^{-1}$) and timescales ($\sim 20-300$ days above 50% of peak luminosity), but the sample divides into two clear groups in the luminosity-timescale phase-space around the median peak luminosity ($\approx 10^{43}$ erg s$^{-1}$): faint-fast and luminous-slow groups. This leads to a strong bimodality in the radiated energy distribution, with peaks at $\sim 10^{49}$ and $\sim 2\times10^{50}$ erg, with the latter events having a characteristic timescale of $\sim 100$ days, and the former appearing to bifurcate into two branches with timescales of $\sim 40$ and $\sim 70$ days. Therefore, SNe IIn exhibit at least two dominant groupings, and perhaps three, which are likely reflective of different progenitor and/or circumstellar medium formation pathways. We do not find any obvious transition in SN IIn properties at the arbitrary cut-off of $\approx -20$ mag used for the designation "Type II Superluminous Supernovae", and we argue that this classification should be abandoned. The absence of SNe IIn with timescales of $\lesssim 14$ days defines the region occupied by fast transients with evidence for interaction with hydrogen-poor circumstellar medium.
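
The light-curve summaries follow mechanically once a smooth GP interpolation is in hand. A sketch with scikit-learn on a toy light curve, reading off the peak and the timescale above 50% of peak (kernel, data, and units are illustrative, not the paper's pipeline):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy light curve: fast rise, slow decline, noisy, unevenly sampled.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 200.0, 40))
flux = (1 - np.exp(-t / 10.0)) * np.exp(-((t - 60.0) / 80.0)**2) \
       + rng.normal(0.0, 0.03, t.size)

kernel = 1.0 * RBF(length_scale=30.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t[:, None], flux)

grid = np.linspace(0.0, 200.0, 2000)
mu = gp.predict(grid[:, None])
peak = mu.max()
above = grid[mu > 0.5 * peak]               # assumes a single-peaked curve
print(f"peak flux {peak:.2f}; time above 50% of peak: {above[-1] - above[0]:.1f} d")
```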

supernova classification bimodal energy distribution circumstellar interaction regression photometric survey analysis
Theoretical Physics Nov 8, 2024

A Twist on Heterotic Little String Duality

Hamza Ahmed, Paul-Konstantin Oehlmann, Fabian Ruehle

In this work, we significantly expand the web of T-dualities among heterotic NS5-brane theories with eight supercharges. This is achieved by introducing twists involving outer automorphisms of discrete gauge/flavor factors and tensor multiplet permutations along the compactification circle. We assemble field theory data that we propose as invariants across T-dual theories, comprising twisted Coulomb branch dimensions, higher group structures, and flavor symmetry ranks. Using this data, we establish a detailed field theory correspondence between singularities of the compactification space, the number of five-branes in the theory, and the flavor symmetry factors. The twisted theories are realized via M-theory compactifications on non-compact genus-one fibered Calabi-Yau threefolds without section. This approach allows us to prove duality of twisted and (un-)twisted theories by leveraging M/F-theory duality and identifying inequivalent torus fibrations in the same geometry. We construct several new 5D theories, including a novel type of CHL-like twisted theory where the two M9 branes are identified. Using their field theory invariants, we also construct their dual theories.

t-duality string theory calabi-yau compactification brane physics quantum field theory
Astrophysics Nov 7, 2024

Conversations and Deliberations: Non-Standard Cosmological Epochs and Expansion Histories

Brian Batell, Keith R. Dienes, Brooks Thomas et al.

This document summarizes the discussions which took place during the PITT-PACC Workshop entitled "Non-Standard Cosmological Epochs and Expansion Histories," held in Pittsburgh, Pennsylvania, Sept. 5-7, 2024. Much like the non-standard cosmological epochs that were the subject of these discussions, the format of this workshop was also non-standard. Rather than consisting of a series of talks from participants, with each person presenting their own work, this workshop was instead organized around free-form discussion blocks, with each centered on a different overall theme and guided by a different set of Discussion Leaders. This document is not intended to serve as a comprehensive review of these topics, but rather as an informal record of the discussions that took place during the workshop, in the hope that the content and free-flowing spirit of these discussions may inspire new ideas and research directions.

early matter domination gravitational waves cosmological stasis primordial black holes cosmic microwave background
Theoretical Physics Nov 5, 2024

Structure factor and topological bound of twisted bilayer semiconductors at fractional fillings

Timothy Zaklama, Di Luo, Liang Fu

The structure factor is a useful observable for probing charge density correlations in real materials, and its long-wavelength behavior, encapsulated by "quantum weight", has recently gained prominence in the study of quantum geometry and topological phases of matter. Here we employ the static structure factor, S(q), to explore the phase diagram of twisted transition metal dichalcogenides (TMDs), specifically tMoTe2, at filling factors n=1/3, 2/3 under varying displacement fields. Our results reveal a topological phase transition between a fractional Chern insulator (FCI) and a generalized Wigner crystal (GWC). This transition is marked by the appearance of Bragg peaks at charge-density-wave vectors and, simultaneously, by a large decrease of S(q) at small q, which lowers the interaction energy. We further calculate the quantum weight of various FCI states, verifying the universal topological bound. Our findings provide new insights into the phase diagram of twisted TMDs and establish a general framework for characterizing topological phases through structure factor analysis.
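
For orientation, the observable itself has a one-line definition, $S(q) = |\sum_j e^{i q \cdot r_j}|^2 / N$. The sketch below evaluates it for a toy classical 2D crystal, where Bragg peaks appear at reciprocal-lattice vectors; the paper instead computes S(q) from correlated many-body ground states of tMoTe2, which this toy does not attempt.

```python
import numpy as np

def static_structure_factor(positions, qvecs):
    """S(q) = |sum_j exp(i q . r_j)|^2 / N for N classical point particles."""
    phases = np.exp(1j * positions @ qvecs.T)   # (N, n_q) plane-wave factors
    rho_q = phases.sum(axis=0)                  # density Fourier modes
    return np.abs(rho_q) ** 2 / len(positions)

# Toy 2D crystal with unit spacing plus small positional jitter
rng = np.random.default_rng(1)
xy = np.stack(np.meshgrid(np.arange(10.0), np.arange(10.0)), -1).reshape(-1, 2)
xy += 0.05 * rng.normal(size=xy.shape)

q = np.array([[2 * np.pi, 0.0],     # reciprocal-lattice vector -> Bragg peak
              [1.0, 0.0]])          # generic wavevector -> small S(q)
print(static_structure_factor(xy, q))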

fractional chern insulator quantum weight phase transitions moiré quantum geometry quantum states
Foundational AI Nov 4, 2024

Adaptive Length Image Tokenization via Recurrent Allocation

Shivam Duggal, Phillip Isola, Antonio Torralba et al.

Current vision systems typically assign fixed-length representations to images, regardless of the information content. This contrasts with human intelligence - and even large language models - which allocate varying representational capacities based on entropy, context and familiarity. Inspired by this, we propose an approach to learn variable-length token representations for 2D images. Our encoder-decoder architecture recursively processes 2D image tokens, distilling them into 1D latent tokens over multiple iterations of recurrent rollouts. Each iteration refines the 2D tokens, updates the existing 1D latent tokens, and adaptively increases representational capacity by adding new tokens. This enables compression of images into a variable number of tokens, ranging from 32 to 256. We validate our tokenizer using reconstruction loss and FID metrics, demonstrating that token count aligns with image entropy, familiarity and downstream task requirements. Recurrent token processing with increasing representational capacity in each iteration shows signs of token specialization, revealing potential for object / part discovery.

representation learning adaptive token allocation recurrent networks autoencoders attention mechanisms
Foundational AI Nov 4, 2024

The $\mathcal{D}$-Geometric Hilbert Scheme -- Part II: Hilbert and Quot DG-Schemes

Jacob Kryczka, Artan Sheshmani

This is the second in a series of two papers developing a moduli-theoretic framework for differential ideal sheaves associated with formally integrable, involutive systems of algebraic partial differential equations (PDEs). Building on earlier work, which established the existence of moduli stacks for such systems with prescribed regularity and stability conditions, we now construct a derived enhancement of these moduli spaces. We prove the derived $\mathcal{D}$-Quot functor admits a global differential graded refinement representable by a suitable differential graded $\mathcal{D}$-manifold. We further analyze the finiteness, representability, and functoriality properties of these derived moduli spaces, establishing foundations for a derived deformation theory of algebraic differential equations.

derived d-geometry moduli of pdes spencer cohomology group theory symmetry preservation
Foundational AI Nov 4, 2024

DexHub and DART: Towards Internet Scale Robot Data Collection

Younghyo Park, Jagdeep Singh Bhatia, Lars Ankile et al.

The quest to build a generalist robotic system is impeded by the scarcity of diverse and high-quality data. While real-world data collection efforts exist, requirements for robot hardware, physical environment setups, and frequent resets significantly impede the scalability needed for modern learning frameworks. We introduce DART, a teleoperation platform designed for crowdsourcing that reimagines robotic data collection by leveraging cloud-based simulation and augmented reality (AR) to address many limitations of prior data collection efforts. Our user studies highlight that DART enables higher data collection throughput and lower physical fatigue compared to real-world teleoperation. We also demonstrate that policies trained using DART-collected datasets successfully transfer to reality and are robust to unseen visual disturbances. All data collected through DART is automatically stored in our cloud-hosted database, DexHub, which will be made publicly available upon curation, paving the path for DexHub to become an ever-growing data hub for robot learning. Videos are available at: https://dexhub.ai/project

crowdsourced teleoperation augmented reality interface scalability sim-to-real transfer robot demonstration dataset
Theoretical Physics Nov 4, 2024

Small-scale Hamiltonian optimization of interpolating operators for Lagrangian lattice quantum field theory

Artur Avkhadiev, Lena Funcke, Karl Jansen et al.

Lattice quantum field theory calculations may potentially combine the advantages of Hamiltonian formulations with the scalability and control of conventional Lagrangian frameworks. However, such hybrid approaches need to consider (1) the differences in renormalized coupling values between the two formulations, and (2) finite-volume and discretization effects when the Hamiltonian component of the calculation is characterized by a smaller volume or coarser lattice spacing than the Lagrangian component. This work investigates the role of both factors in the application of Hamiltonian-optimized interpolating operator constructions for the conventional Lagrangian framework. The numerical investigation is realized for the pseudoscalar meson in the Schwinger model, using tensor-network and Monte-Carlo calculations. It is demonstrated that tensor-network-optimized constructions are robust to both (1) and (2). In particular, accurate optimized constructions for the pseudoscalar meson can be obtained from calculations with a smaller number of Hamiltonian lattice sites, even when the meson mass itself receives significant finite-volume corrections. To the extent that these results generalize to theories with more complicated spectra, the method holds promise for near-term applications in large-scale calculations of lattice quantum field theory.

lattice gauge theory interpolating operator optimization quantum field theory tensor networks hamiltonian systems
Theoretical Physics Nov 1, 2024

Not So Flat Metrics

Kit Fraser-Taliente, Thomas R. Harvey, Manki Kim

In order to be in control of the $\alpha'$ derivative expansion, geometric string compactifications are understood in the context of a large volume approximation. In this letter, we consider the reduction of these higher derivative terms, and propose an improved estimate on the large volume approximation using numerical Calabi-Yau metrics obtained via machine learning methods. Further to this, we consider the $\alpha'^3$ corrections to numerical Calabi-Yau metrics in the context of IIB string theory. This correction represents one of several important contributions for realistic string compactifications -- alongside, for example, the backreaction of fluxes and local sources -- all of which have important consequences for string phenomenology. As a simple application of the corrected metric, we compute the change to the spectrum of the scalar Laplacian.

string theory calabi-yau metrics alpha-prime corrections effective field theory physics-informed neural networks
Experimental Physics Nov 1, 2024

A Lorentz-Equivariant Transformer for All of the LHC

Johann Brehmer, Víctor Bresó, Pim de Haan et al.

We show that the Lorentz-Equivariant Geometric Algebra Transformer (L-GATr) yields state-of-the-art performance for a wide range of machine learning tasks at the Large Hadron Collider. L-GATr represents data in a geometric algebra over space-time and is equivariant under Lorentz transformations. The underlying architecture is a versatile and scalable transformer, which is able to break symmetries if needed. We demonstrate the power of L-GATr for amplitude regression and jet classification, and then benchmark it as the first Lorentz-equivariant generative network. For all three LHC tasks, we find significant improvements over previous architectures.
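
The symmetry constraint underlying the architecture can be checked in a few lines: scalar outputs of a Lorentz-equivariant network must be unchanged by boosts. In the sketch below, f is a stand-in invariant head (a sum of Minkowski norms), not L-GATr itself; the momenta and rapidity are arbitrary.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])          # Minkowski metric, (+,-,-,-)

def boost_x(rapidity):
    """4x4 Lorentz boost along x acting on four-momenta (E, px, py, pz)."""
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = ch
    L[0, 1] = L[1, 0] = sh
    return L

def f(p):
    """Stand-in for a Lorentz-invariant scalar head: sum of invariant masses."""
    return np.einsum("ni,ij,nj->n", p, eta, p).sum()

p = np.array([[50.0, 10.0, 20.0, 5.0],          # toy particle four-momenta
              [30.0, -5.0, 8.0, 12.0]])
boosted = p @ boost_x(0.7).T

print(np.isclose(f(p), f(boosted)))   # True: the invariance an equivariant
                                      # architecture enforces by construction
```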

equivariant neural networks geometric deep learning transformers symmetry preservation spacetime geometric algebra
Foundational AI Oct 31, 2024

Linearized Wasserstein Barycenters: Synthesis, Analysis, Representational Capacity, and Applications

Matthew Werenski, Brendan Mallery, Shuchin Aeron et al.

We propose the linear barycentric coding model (LBCM) which utilizes the linear optimal transport (LOT) metric for analysis and synthesis of probability measures. We provide a closed-form solution to the variational problem characterizing the probability measures in the LBCM and establish equivalence of the LBCM to the set of 2-Wasserstein barycenters in the special case of compatible measures. Computational methods for synthesizing and analyzing measures in the LBCM are developed with finite sample guarantees. One of our main theoretical contributions is to identify an LBCM, expressed in terms of a simple family, which is sufficient to express all probability measures on the closed unit interval. We show that a natural analogous construction of an LBCM in 2 dimensions fails, and we leave it as an open problem to identify the proper extension in more than 1 dimension. We conclude by demonstrating the utility of LBCM for covariance estimation and data imputation.
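
In one dimension, where the paper's unit-interval result lives, 2-Wasserstein barycenters have a classical closed form: average the quantile functions. A minimal empirical sketch with illustrative Gaussian samples (this is the textbook construction, not the paper's LBCM synthesis and analysis algorithms):

```python
import numpy as np

def wasserstein_barycenter_1d(samples, weights):
    """W2 barycenter of 1D empirical measures with equal sample counts:
    sort each row (empirical quantile function), then average with weights."""
    return weights @ np.sort(samples, axis=1)

rng = np.random.default_rng(0)
mu0 = rng.normal(-2.0, 0.5, size=1000)      # toy measure 1
mu1 = rng.normal(3.0, 1.5, size=1000)       # toy measure 2
bary = wasserstein_barycenter_1d(np.stack([mu0, mu1]), np.array([0.5, 0.5]))
print(bary.mean(), bary.std())
```

For equal-weight Gaussians the barycenter's mean and standard deviation are the averages of the inputs', which the printed values (about 0.5 and 1.0) reproduce.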

optimal transport linearized wasserstein embedding barycentric coding representation learning dimensionality reduction
Astrophysics Oct 30, 2024

ASURA-FDPS-ML: Star-by-star Galaxy Simulations Accelerated by Surrogate Modeling for Supernova Feedback

Keiya Hirashima, Kana Moriwaki, Michiko S. Fujii et al.

We introduce new high-resolution galaxy simulations accelerated by a surrogate model that reduces the computation cost by approximately 75 percent. Massive stars with a Zero Age Main Sequence mass of more than about 10 $\mathrm{M_\odot}$ explode as core-collapse supernovae (CCSNe), which play a critical role in galaxy formation. The energy released by CCSNe is essential for regulating star formation and driving feedback processes in the interstellar medium (ISM). However, the short integration timesteps required for SNe feedback have presented significant bottlenecks in astrophysical simulations across various scales. Overcoming this challenge is crucial for enabling star-by-star galaxy simulations, which aim to capture the dynamics of individual stars and the expansion of inhomogeneous shells within the turbulent ISM. To address this, our new framework combines direct numerical simulations and surrogate modeling, including machine learning and Gibbs sampling. The star formation history and the time evolution of outflow rates in the galaxy match those obtained from resolved direct numerical simulations. Our new approach achieves high-resolution fidelity while reducing computational costs, effectively bridging the physical scale gap and enabling multi-scale simulations.

surrogate modeling cosmological simulation emulation stellar evolution multi-scale simulation
Foundational AI Oct 29, 2024

Multi-rigidity of Schubert classes in partial flag varieties

Yuxiang Liu, Artan Sheshmani, Shing-Tung Yau

In this paper, we study the multi-rigidity problem in rational homogeneous spaces. A Schubert class is called multi-rigid if every multiple of it can only be represented by a union of Schubert varieties. We prove the multi-rigidity of Schubert classes in rational homogeneous spaces. In particular, we characterize the multi-rigid Schubert classes in partial flag varieties of type A, B and D. Moreover, for a general rational homogeneous space $G/P$, we deduce the rigidity and multi-rigidity from the corresponding generalized Grassmannians (corresponding to maximal parabolics). When $G$ is semi-simple, we also deduce the rigidity and multi-rigidity from the simple cases.

schubert calculus flag variety geometry cohomology rigidity group theory symmetry preservation
Astrophysics Oct 28, 2024

Inferring the Morphology of the Galactic Center Excess with Gaussian Processes

Edward D. Ramirez, Yitian Sun, Matthew R. Buckley et al.

Descriptions of the Galactic Center using Fermi gamma-ray data have so far modeled the Galactic Center Excess (GCE) as a template with fixed spatial morphology or as a linear combination of such templates. Although these templates are informed by various physical expectations, the morphology of the excess is a priori unknown. For the first time, we describe the GCE using a flexible, non-parametric machine learning model -- the Gaussian process (GP). We assess our model's performance on synthetic data, demonstrating that the model can recover the templates used to generate the data. We then fit the Fermi data with our model in a single energy bin from 2-20 GeV (leaving a spectral GP analysis of the GCE for future work) using a variety of template models of diffuse gamma-ray emission to quantify our fits' systematic uncertainties associated with diffuse emission modeling. We interpret our best-fit GP in terms of GCE templates consisting of an NFW squared template and a bulge component to determine which bulge models can best describe the fitted GP and to what extent the best-fit GP is described better by an NFW squared template versus a bulge template. The best-fit GP contains morphological features that are typically not associated with traditional GCE studies. These include a localized bright source at around $(\ell,b) = (20^{\circ}, 0^{\circ})$ and a diagonal arm extending Northwest from the Galactic Center. In spite of these novel features, the fitted GP is explained best by a template-based model consisting of the bulge presented in Coleman et al. (2020) and a squared NFW component. Our results suggest that the physical interpretation of the GCE in terms of stellar bulge and NFW-like components is highly sensitive to the assumed morphologies, background models, and the region of the sky used for inference.

stochastic processes galactic center excess bayesian inference posterior estimation dark matter
Foundational AI Oct 28, 2024

Relative Monoidal Bondal-Orlov

Artan Sheshmani, Angel Toledo

In this article we study a relative monoidal version of the Bondal-Orlov reconstruction theorem. We establish a uniqueness result for tensor triangulated category structures $(\boxtimes,\mathbb{1})$ on the derived category $D^{b}(X)$ of a variety $X$ which is smooth projective and faithfully flat over a quasi-compact quasi-separated base scheme $S$ in the case where the fibers $X_{s}$ over any point $s\in S$ all have ample (anti-)canonical bundles. To do so we construct a stack $\Gamma$ of dg-bifunctors which parametrize the local homotopical behaviour of $\boxtimes$, and we study some of its properties around the derived categories of the fibers $X_{s}$.

tensor triangulated categories derived categories algebraic geometry stacks dg-enhancements group theory
Foundational AI Oct 27, 2024

A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing

Julia Balla, Siddharth Mishra-Sharma, Carolina Cuesta-Lazaro et al.

Efficiently processing structured point cloud data while preserving multiscale information is a key challenge across domains, from graphics to atomistic modeling. Using a curated dataset of simulated galaxy positions and properties, represented as point clouds, we benchmark the ability of graph neural networks to simultaneously capture local clustering environments and long-range correlations. Given the homogeneous and isotropic nature of the Universe, the data exhibits a high degree of symmetry. We therefore focus on evaluating the performance of Euclidean symmetry-preserving ($E(3)$-equivariant) graph neural networks, showing that they can outperform non-equivariant counterparts and domain-specific information extraction techniques in downstream performance as well as simulation efficiency. However, we find that current architectures fail to capture information from long-range correlations as effectively as domain-specific baselines, motivating future work on architectures better suited for extracting long-range information.
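
The symmetry in question is concrete: rotating and translating the galaxy point cloud leaves pair distances, and hence any E(3)-invariant prediction, unchanged. Below is a quick check of that property on toy positions, using pair distances as a stand-in for an equivariant network's invariant outputs.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pairwise_distances(pos):
    """E(3)-invariant summary: pair distances survive rotations/translations."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return d[np.triu_indices(len(pos), k=1)]

rng = np.random.default_rng(0)
galaxies = rng.uniform(0.0, 10.0, size=(200, 3))    # toy galaxy positions
g = Rotation.random(random_state=1).as_matrix()     # random SO(3) element
moved = galaxies @ g.T + np.array([1.0, -2.0, 0.5]) # rotate + translate

# Any scalar output of an E(3)-equivariant network should agree on both inputs
print(np.allclose(pairwise_distances(galaxies), pairwise_distances(moved)))
```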

equivariant neural networks graph neural networks symmetry preservation galaxy clustering benchmark cosmological simulation
Theoretical Physics Oct 23, 2024

Fermion Masses and Mixing in String-Inspired Models

Andrei Constantin, Kit Fraser-Taliente, Thomas R. Harvey et al.

We study a class of supersymmetric Froggatt-Nielsen (FN) models with multiple U(1) symmetries and Standard Model (SM) singlets inspired by heterotic string compactifications on Calabi-Yau threefolds. The string-theoretic origin imposes a particular charge pattern on the SM fields and FN singlets, dividing the latter into perturbative and non-perturbative types. Employing systematic and heuristic search strategies, such as genetic algorithms, we identify charge assignments and singlet VEVs that replicate the observed mass and mixing hierarchies in the quark sector, and subsequently refine the Yukawa matrix coefficients to accurately match the observed values for the Higgs VEV, the quark and charged lepton masses and the CKM matrix. This bottom-up approach complements top-down string constructions and our results demonstrate that string FN models possess a sufficiently rich structure to account for flavour physics. On the other hand, the limited number of distinct viable charge patterns identified here indicates that flavour physics imposes tight constraints on string theory models, adding new constraints on particle spectra that are essential for achieving a realistic phenomenology.
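
As a cartoon of the heuristic search, the sketch below runs a small genetic algorithm over integer FN charges so that powers of an expansion parameter reproduce a toy three-mass hierarchy; the target ratios, epsilon value, fitness function, and GA settings are all invented for illustration and are far simpler than the paper's quark-sector search.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 0.2                                        # assumed FN expansion parameter
TARGET = np.log10(np.array([1.0, 4e-3, 1e-5]))   # toy mass ratios to reproduce

def log_masses(charges):
    """Toy FN estimate: each Yukawa scales as EPS**charge."""
    return np.log10(EPS ** charges.astype(float))

def fitness(pop):
    return -np.abs(log_masses(pop) - TARGET).sum(axis=1)   # higher is better

pop = rng.integers(0, 10, size=(64, 3))          # population of integer charges
for gen in range(200):
    parents = pop[np.argsort(fitness(pop))[-16:]]            # keep the fittest
    children = parents[rng.integers(0, 16, size=48)].copy()  # clone ...
    mutate = rng.random(children.shape) < 0.3                # ... and mutate
    children[mutate] += rng.integers(-1, 2, size=mutate.sum())
    pop = np.vstack([parents, children])

best = pop[np.argmax(fitness(pop))]
print(best, log_masses(best))   # charges whose EPS-powers match the hierarchy
```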

froggatt-nielsen mechanism string theory effective field theory flavour hierarchy calabi-yau compactification
Astrophysics Oct 22, 2024

Blast: a Web Application for Characterizing the Host Galaxies of Astrophysical Transients

D. O. Jones, P. McGill, T. A. Manning et al.

Characterizing the host galaxies of astrophysical transients is important to many areas of astrophysics, including constraining the progenitor systems of core-collapse supernovae, correcting Type Ia supernova distances, and probabilistically classifying transients without photometric or spectroscopic data. Given the increasing transient discovery rate in the coming years, there is substantial utility in providing public, transparent, reproducible, and automatic characterization for large samples of transient host galaxies. Here we present Blast, a web application that ingests live streams of transient alerts, matches transients to their host galaxies, and performs photometry on coincident archival imaging data of the host galaxy. The photometry is then used to infer both global host-galaxy properties and galaxy properties within 2 kpc of the transient location by using the Prospector Bayesian inference framework, with an acceleration in evaluation speed achieved via simulation-based inference. Blast provides host-galaxy properties to users via a web browser or an application program interface. The software can be extended to support alternative photometric or SED-fitting algorithms, and can be scaled via an asynchronous worker queue across multiple compute nodes to handle the processing of large volumes of transient alerts for upcoming transient surveys. Blast has been ingesting newly discovered transients from the Transient Name Server since mid-2024, and has currently measured SED parameters for more than 6000 transients. The service is publicly available at https://blast.scimma.org/.

transient host matching sed fitting simulation-based inference bayesian inference posterior estimation
Foundational AI Oct 18, 2024

Decomposing The Dark Matter of Sparse Autoencoders

Joshua Engels, Logan Riggs, Max Tegmark

Sparse autoencoders (SAEs) are a promising technique for decomposing language model activations into interpretable linear features. However, current SAEs fall short of completely explaining model performance, resulting in "dark matter": unexplained variance in activations. This work investigates dark matter as an object of study in its own right. Surprisingly, we find that much of SAE dark matter -- about half of the error vector itself and >90% of its norm -- can be linearly predicted from the initial activation vector. Additionally, we find that the scaling behavior of SAE error norms at a per token level is remarkably predictable: larger SAEs mostly struggle to reconstruct the same contexts as smaller SAEs. We build on the linear representation hypothesis to propose models of activations that might lead to these observations. These insights imply that the part of the SAE error vector that cannot be linearly predicted ("nonlinear" error) might be fundamentally different from the linearly predictable component. To validate this hypothesis, we empirically analyze nonlinear SAE error and show that 1) it contains fewer not yet learned features, 2) SAEs trained on it are quantitatively worse, and 3) it is responsible for a proportional amount of the downstream increase in cross entropy loss when SAE activations are inserted into the model. Finally, we examine two methods to reduce nonlinear SAE error: inference time gradient pursuit, which leads to a very slight decrease in nonlinear error, and linear transformations from earlier layer SAE outputs, which leads to a larger reduction.
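
The central measurement here (how much of the SAE error vector is linearly predictable from the input activation) reduces to a regression. The sketch below runs that regression on synthetic stand-ins with a planted linear component; a real experiment would use residual-stream activations and a trained SAE.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4096, 256

# Stand-ins: activations x and an SAE error built to be partly linear in x
x = rng.normal(size=(n, d))
W = 0.1 * rng.normal(size=(d, d))
err = x @ W + 0.05 * rng.normal(size=(n, d))    # "dark matter" per token

# Ridge-regress err on x using half the data, evaluate on the held-out half
lam = 1e-2
A = np.linalg.solve(x[: n // 2].T @ x[: n // 2] + lam * np.eye(d),
                    x[: n // 2].T @ err[: n // 2])
pred = x[n // 2 :] @ A

frac = 1 - ((err[n // 2 :] - pred) ** 2).sum() / (err[n // 2 :] ** 2).sum()
print(f"fraction of SAE-error variance linearly predictable: {frac:.2f}")
```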

autoencoders sae dark matter sparse models interpretability linear representation hypothesis
Experimental Physics Oct 17, 2024

Learning Efficient Representations of Neutrino Telescope Events

Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles

Neutrino telescopes detect rare interactions of particles produced in some of the most extreme environments in the Universe. This is accomplished by instrumenting a cubic-kilometer scale volume of naturally occurring transparent medium with light sensors. Given their substantial size and the high frequency of background interactions, these telescopes amass an enormous quantity of large variance, high-dimensional data. These attributes create substantial challenges for analyzing and reconstructing interactions, particularly when utilizing machine learning (ML) techniques. In this paper, we present a novel approach, called om2vec, that employs transformer-based variational autoencoders to efficiently represent the detected photon arrival time distributions of neutrino telescope events by learning compact and descriptive latent representations. We demonstrate that these latent representations offer enhanced flexibility and improved computational efficiency, thereby facilitating downstream tasks in data analysis.

variational autoencoders representation learning photon arrival time encoding transformers neutrino detection
Theoretical Physics Oct 16, 2024

Hodge Theory for Entanglement Cohomology

Christian Ferko, Eashan Iyer, Kasra Mossayebi et al.

We explore and extend the application of homological algebra to describe quantum entanglement, initiated in arXiv:1901.02011, focusing on the Hodge-theoretic structure of entanglement cohomology in finite-dimensional quantum systems. We construct analogues of the Hodge star operator, inner product, codifferential, and Laplacian for entanglement $k$-forms. We also prove that such $k$-forms obey versions of the Hodge isomorphism theorem and Hodge decomposition, and that they exhibit Hodge duality. As a corollary, we conclude that the dimensions of the $k$-th and $(n-k)$-th cohomologies coincide for entanglement in $n$-partite pure states, which explains a symmetry property ("Poincare duality") of the associated Poincare polynomials.

entanglement entanglement cohomology hodge decomposition homological algebra poincaré duality
Astrophysics Oct 11, 2024

Auriga Streams II: orbital properties of tidally disrupting satellites of Milky Way-mass galaxies

Nora Shipp, Alexander H. Riley, Christine M. Simpson et al.

Galaxies like the Milky Way are surrounded by complex populations of satellites at all stages of tidal disruption. In this paper, we present a dynamical study of the disrupting satellite galaxies in the Auriga simulations that are orbiting 28 distinct Milky Way-mass hosts across three resolutions. We find that the satellite galaxy populations are highly disrupted. The majority of satellites that remain fully intact at present day were accreted recently without experiencing more than one pericentre ($n_{\rm peri} \lesssim 1$) and have large apocentres ($r_{\rm apo} \gtrsim 200$ kpc) and pericentres ($r_{\rm peri} \gtrsim 50$ kpc). The remaining satellites have experienced significant tidal disruption and, given full knowledge of the system, would be classified as stellar streams. We find stellar streams in Auriga across the range of pericentres and apocentres of the known Milky Way dwarf galaxy streams and, interestingly, overlapping significantly with the Milky Way intact satellite population. We find no significant change in satellite orbital distributions across resolution. However, we do see substantial halo-to-halo variance of $(r_\text{peri}, r_\text{apo})$ distributions across host galaxies, as well as a dependence of satellite orbits on host halo mass - systems disrupt at larger pericentres and apocentres in more massive hosts. Our results suggest that either cosmological simulations (including, but not limited to, Auriga) are disrupting satellites far too readily, or that the Milky Way's satellites are more disrupted than current imaging surveys have revealed. Future observing facilities and careful mock observations of these systems will be key to revealing the nature of this apparent discrepancy.

cosmological simulation tidal disruption stellar streams satellite orbital dynamics halo-to-halo variance
Foundational AI Oct 11, 2024

Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics

Josiah C. Kratz, Jacob Adamczyk

Many organisms and cell types, from bacteria to cancer cells, exhibit a remarkable ability to adapt to fluctuating environments. Additionally, cells can leverage a memory of past environments to better survive previously encountered stressors. From a control perspective, this adaptability poses significant challenges for driving cell populations toward extinction, an open problem of great clinical significance. In this work, we focus on drug dosing in cell populations exhibiting phenotypic plasticity. For specific dynamical models switching between resistant and susceptible states, exact solutions are known. However, when the underlying system parameters are unknown, and for complex memory-based systems, obtaining the optimal solution is currently intractable. To address this challenge, we apply reinforcement learning (RL) to identify informed dosing strategies to control cell populations evolving under novel non-Markovian dynamics. We find that model-free deep RL is able to recover exact solutions and control cell populations even in the presence of long-range temporal dynamics. To further test our approach in more realistic settings, we demonstrate robust RL-based control strategies in environments with measurement noise and dynamic memory strength.
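
A tabular cartoon of the control task: two subpopulations whose growth rates flip under a drug, an on/off dosing action, and Q-learning toward minimizing the total population. The dynamics, discretization, and reward are invented for illustration; the paper uses model-free deep RL on richer, memory-dependent dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(s, r, dose):
    """Toy switching dynamics: the drug kills susceptible cells (s) but
    selects for resistant cells (r), which decay when the drug is off."""
    s = s + 0.04 * s * (1 - dose) - 0.08 * s * dose
    r = r + 0.03 * r * dose - 0.02 * r * (1 - dose)
    return max(s, 0.0), max(r, 0.0)

def obs(s, r):
    return min(int(10 * (s + r)), 19)        # discretized total population

Q = np.zeros((20, 2))                        # states x actions {off, on}
for episode in range(2000):
    s, r = 0.5, 0.1
    for t in range(100):
        o = obs(s, r)
        a = rng.integers(2) if rng.random() < 0.1 else int(Q[o].argmax())
        s, r = step(s, r, a)
        reward = -(s + r)                    # drive the population down
        Q[o, a] += 0.1 * (reward + 0.95 * Q[obs(s, r)].max() - Q[o, a])

print(Q.argmax(axis=1))   # learned on/off dosing rule per population bin
```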

reinforcement learning non-markovian control phenotypic switching stochastic processes bang-bang control
Foundational AI Oct 10, 2024

Efficient Dictionary Learning with Switch Sparse Autoencoders

Anish Mudide, Joshua Engels, Eric J. Michaud et al.

Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in frontier models, it will be necessary to scale them up to very high width, posing a computational challenge. In this work, we introduce Switch Sparse Autoencoders, a novel SAE architecture aimed at reducing the compute cost of training SAEs. Inspired by sparse mixture of experts models, Switch SAEs route activation vectors between smaller "expert" SAEs, enabling SAEs to efficiently scale to many more features. We present experiments comparing Switch SAEs with other SAE architectures, and find that Switch SAEs deliver a substantial Pareto improvement in the reconstruction vs. sparsity frontier for a given fixed training compute budget. We also study the geometry of features across experts, analyze features duplicated across experts, and verify that Switch SAE features are as interpretable as features found by other SAE architectures.
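
The routing idea can be sketched in PyTorch: a linear router assigns each activation vector to a single expert SAE, so compute scales with the expert width rather than the total dictionary size. Dimensions below are assumed, and this hard top-1 sketch is not the paper's implementation; a trainable version would also weight expert outputs by router probabilities so the router itself receives gradients.

```python
import torch
import torch.nn as nn

class SwitchSAE(nn.Module):
    """Top-1 mixture-of-experts SAE sketch: route each activation vector to
    one small expert autoencoder instead of one monolithic wide SAE."""
    def __init__(self, d_model=512, d_expert=2048, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.enc = nn.ModuleList(nn.Linear(d_model, d_expert)
                                 for _ in range(n_experts))
        self.dec = nn.ModuleList(nn.Linear(d_expert, d_model)
                                 for _ in range(n_experts))

    def forward(self, x):                        # x: (batch, d_model)
        expert = self.router(x).argmax(dim=-1)   # hard top-1 routing
        recon = torch.zeros_like(x)
        for e in range(len(self.enc)):
            mask = expert == e
            if mask.any():
                feats = torch.relu(self.enc[e](x[mask]))  # sparse features
                recon[mask] = self.dec[e](feats)
        return recon

x = torch.randn(16, 512)
print(SwitchSAE()(x).shape)   # (16, 512); train on ||x - recon||^2 + sparsity
```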

autoencoders mixture of experts sparse models expert routing interpretability
Foundational AI Oct 10, 2024

The Geometry of Concepts: Sparse Autoencoder Feature Structure

Yuxiao Li, Eric J. Michaud, David D. Baek et al.

Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: 1) The "atomic" small-scale structure contains "crystals" whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man-woman-king-queen). We find that the quality of such parallelograms and associated function vectors improves greatly when projecting out global distractor directions such as word length, which is efficiently done with linear discriminant analysis. 2) The "brain" intermediate-scale structure has significant spatial modularity; for example, math and code features form a "lobe" akin to functional lobes seen in neural fMRI images. We quantify the spatial locality of these lobes with multiple metrics and find that clusters of co-occurring features, at coarse enough scale, also cluster together spatially far more than one would expect if feature geometry were random. 3) The "galaxy" scale large-scale structure of the feature point cloud is not isotropic, but instead has a power law of eigenvalues with steepest slope in middle layers. We also quantify how the clustering entropy depends on the layer.

autoencoders sparse models feature geometry representation learning interpretability
Theoretical Physics Oct 10, 2024

Dirac Traces and the Tutte Polynomial

Joshua Lin

Perturbative calculations involving fermion loops in quantum field theories require tracing over Dirac matrices. A simple way to regulate the divergences that generically appear in these calculations is dimensional regularisation, which has the consequence of replacing 4-dimensional Dirac matrices with d-dimensional counterparts for arbitrary complex values of d. In this work, a connection between traces of d-dimensional Dirac matrices and computations of the Tutte polynomial of associated graphs is proven. The time complexity of computing Dirac traces is analysed via this connection, and improvements to algorithms for computing Dirac traces are proposed.
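
The underlying object is the textbook recursion tr(γ^{μ1}...γ^{μn}) = Σ_k (-1)^k g^{μ1 μk} tr(rest), whose sum over pairings is exactly the combinatorial structure that gets related to graphs here. Below is a direct, exponential-time implementation for concrete index values, the naive baseline that improved algorithms would be measured against.

```python
import numpy as np

def dirac_trace(indices, g, tr_id=4.0):
    """tr(gamma^{m1} ... gamma^{mn}) for concrete index values, using the
    Clifford-algebra recursion: contract the first index against each later
    one with alternating sign; tr(1) = tr_id. Exponential time in n."""
    n = len(indices)
    if n % 2 == 1:
        return 0.0          # traces of odd products vanish
    if n == 0:
        return tr_id
    total = 0.0
    for k in range(1, n):   # contract index 0 with index k
        rest = indices[1:k] + indices[k + 1:]
        sign = (-1.0) ** (k + 1)
        total += sign * g[indices[0], indices[k]] * dirac_trace(rest, g, tr_id)
    return total

g = np.diag([1.0, -1.0, -1.0, -1.0])       # 4D Minkowski metric
print(dirac_trace((0, 1), g))              # tr(g^0 g^1) = 4 g^{01} = 0
print(dirac_trace((0, 1, 0, 1), g))        # 4 (2 g^{01} g^{01} - g^{00} g^{11}) = 4
```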

tutte polynomial quantum field theory deletion-contraction dimensional regularisation scattering amplitudes
Foundational AI Oct 10, 2024

Investigating Representation Universality: Case Study on Genealogical Representations

David D. Baek, Yuxiao Li, Max Tegmark

Motivated by interpretability and reliability, we investigate whether large language models (LLMs) deploy universal geometric structures to encode discrete, graph-structured knowledge. To this end, we present two complementary pieces of experimental evidence that might support the universality of graph representations. First, on an in-context genealogy Q&A task, we train a cone probe to isolate a tree-like subspace in residual stream activations and use activation patching to verify its causal effect in answering related questions. We validate our findings across five different models. Second, we conduct model stitching experiments across models of diverse architectures and parameter counts (OPT, Pythia, Mistral, and LLaMA, 410 million to 8 billion parameters), quantifying representational alignment via relative degradation in the next-token prediction loss. Generally, we conclude that the lack of ground truth representations of graphs makes it challenging to study how LLMs represent them. Ultimately, improving our understanding of LLM representations could facilitate the development of more interpretable, robust, and controllable AI systems.

representation learning embeddings cone embedding interpretability activation patching
Foundational AI Oct 8, 2024

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Tianyuan Zhang, Zhengfei Kuang, Haian Jin et al.

We propose RelitLRM, a Large Reconstruction Model (LRM) for generating high-quality Gaussian splatting representations of 3D objects under novel illuminations from sparse (4-8) posed images captured under unknown static lighting. Unlike prior inverse rendering methods requiring dense captures and slow optimization, often causing artifacts like incorrect highlights or shadow baking, RelitLRM adopts a feed-forward transformer-based model with a novel combination of a geometry reconstructor and a relightable appearance generator based on diffusion. The model is trained end-to-end on synthetic multi-view renderings of objects under varying known illuminations. This architectural design enables the model to effectively decompose geometry and appearance, resolve the ambiguity between material and lighting, and capture the multi-modal distribution of shadows and specularity in the relit appearance. We show our sparse-view feed-forward RelitLRM offers relighting results competitive with state-of-the-art dense-view optimization-based baselines while being significantly faster. Our project page is available at: https://relit-lrm.github.io/.

transformers diffusion models 3d gaussian splatting disentangled representations generative models
Foundational AI Oct 7, 2024

Transformers are Efficient Compilers, Provably

Xiyu Zhai, Runlong Zhou, Liao Zhang et al.

Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks, including programming language understanding and generation. In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective. To this end, we introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages. We show that if the input code sequence has a bounded depth in both the Abstract Syntax Tree (AST) and type inference (reasonable assumptions based on the clean code principle), then the number of parameters required by transformers depends only on the logarithm of the input sequence length to handle compilation tasks, such as AST construction, symbol resolution, and type analysis. A significant technical challenge stems from the fact that transformers operate at a low level, where each layer processes the input sequence as raw vectors without explicitly associating them with predefined structure or meaning. In contrast, high-level compiler tasks necessitate managing intricate relationships and structured program information. Our primary technical contribution is the development of a domain-specific language, Cybertron, which generates formal proofs of the transformer's expressive power, scaling to address compiler tasks. We further establish that recurrent neural networks (RNNs) require at least a linear number of parameters relative to the input sequence, leading to an exponential separation between transformers and RNNs. Finally, we empirically validate our theoretical results by comparing transformers and RNNs on compiler tasks within Mini-Husky.

transformers expressive power theory domain-specific language for proofs formal expressivity bounds attention mechanisms
Theoretical Physics Oct 7, 2024

SPECTER: Efficient Evaluation of the Spectral EMD

Rikab Gambhir, Andrew J. Larkoski, Jesse Thaler

The Energy Mover's Distance (EMD) has seen use in collider physics as a metric between events and as a geometric method of defining infrared and collinear safe observables. Recently, the Spectral Energy Mover's Distance (SEMD) has been proposed as a more analytically tractable alternative to the EMD. In this work, we obtain a closed-form expression for the Riemannian-like p = 2 SEMD metric between events, eliminating the need to numerically solve an optimal transport problem. Additionally, we show how the SEMD can be used to define event and jet shape observables by minimizing the distance between events and parameterized energy flows (similar to the EMD), and we obtain closed-form expressions for several of these observables. We also present the SPECTER framework, an efficient and highly parallelized implementation of the SEMD metric and SEMD-derived shape observables as an analogue of the previously-introduced SHAPER for EMD-based computations. We demonstrate that computing the SEMD with SPECTER can be up to a thousand times faster than computing the EMD with standard optimal transport libraries.

optimal transport collider physics spectral event metric jet physics spectral methods
Theoretical Physics Oct 4, 2024

Exploring gauge-fixing conditions with gradient-based optimization

William Detmold, Gurtej Kanwar, Yin Lin et al.

Lattice gauge fixing is required to compute gauge-variant quantities, for example those used in RI-MOM renormalization schemes or as objects of comparison for model calculations. Recently, gauge-variant quantities have also been found to be more amenable to signal-to-noise optimization using contour deformations. These applications motivate systematic parameterization and exploration of gauge-fixing schemes. This work introduces a differentiable parameterization of gauge fixing which is broad enough to cover Landau gauge, Coulomb gauge, and maximal tree gauges. The adjoint state method allows gradient-based optimization to select gauge-fixing schemes that minimize an arbitrary target loss function.
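
A toy instance of the idea for a 2D U(1) lattice: a gauge transformation shifts each link phase by a difference of site angles, and Landau gauge maximizes the sum of Re U over links, so automatic differentiation can do the fixing directly. The lattice size, optimizer, and plain-autograd approach are illustrative assumptions; the paper works with a general differentiable parameterization of gauge conditions and the adjoint state method.

```python
import torch

L = 8
torch.manual_seed(0)
phi = 2 * torch.pi * torch.rand(2, L, L)        # random U(1) link phases
theta = torch.zeros(L, L, requires_grad=True)   # gauge transformation angles

def gauge_transformed(theta):
    # U_mu(x) -> g(x) U_mu(x) g*(x+mu): phase shifts by theta(x) - theta(x+mu)
    return torch.stack([phi[0] + theta - torch.roll(theta, -1, dims=0),
                        phi[1] + theta - torch.roll(theta, -1, dims=1)])

opt = torch.optim.Adam([theta], lr=0.1)
for step in range(300):
    loss = -torch.cos(gauge_transformed(theta)).sum()   # Landau-gauge functional
    opt.zero_grad()
    loss.backward()
    opt.step()

mean_re_u = torch.cos(gauge_transformed(theta)).mean().item()
print(f"mean Re U after gauge fixing: {mean_re_u:.3f}")
```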

lattice gauge theory adjoint state method differentiable gauge fixing loss function design renormalization
Foundational AI Oct 3, 2024

Formation of Representations in Neural Networks

Liu Ziyin, Isaac Chuang, Tomer Galanti et al.

Understanding neural representations will help open the black box of neural networks and advance our scientific understanding of modern AI systems. However, how complex, structured, and transferable representations emerge in modern neural networks has remained a mystery. Building on previous results, we propose the Canonical Representation Hypothesis (CRH), which posits a set of six alignment relations to universally govern the formation of representations in most hidden layers of a neural network. Under the CRH, the latent representations (R), weights (W), and neuron gradients (G) become mutually aligned during training. This alignment implies that neural networks naturally learn compact representations, where neurons and weights are invariant to task-irrelevant transformations. We then show that the breaking of CRH leads to the emergence of reciprocal power-law relations between R, W, and G, which we refer to as the Polynomial Alignment Hypothesis (PAH). We present a minimal-assumption theory proving that the balance between gradient noise and regularization is crucial for the emergence of the canonical representation. The CRH and PAH lead to an exciting possibility of unifying key deep learning phenomena, including neural collapse and the neural feature ansatz, in a single framework.

representation learning canonical representation hypothesis polynomial alignment hypothesis interpretability feature extraction
Theoretical Physics Sep 27, 2024

${T\overline{T}}$-like Flows of Yang-Mills Theories

Christian Ferko, Jue Hou, Tommaso Morone et al.

We study ${T\overline{T}}$-like deformations of $d>2$ Yang-Mills theories. The standard ${T\overline{T}}$ flows lead to multi-trace Lagrangians, and the non-Abelian gauge structures make it challenging to find Lagrangians in a closed form. However, within the geometric approach to ${T\overline{T}}$, we obtain the closed-form solution to the metric flow and stress-energy tensor, and show that instanton solutions are undeformed. We also introduce new symmetrised single-trace ${T\overline{T}}$-like deformations, whose solutions in $d=4$ include the non-Abelian Born-Infeld Lagrangian proposed by Tseytlin in 1997.

tt-bar deformation quantum field theory lagrangian methods born-infeld theory group theory
Experimental Physics Sep 26, 2024

Optimal Quantum Purity Amplification

Zhaoyi Li, Honghao Fu, Takuya Isogawa et al.

Quantum purity amplification (QPA) provides a novel approach to counteracting the pervasive noise that degrades quantum states. We present the optimal QPA protocol for general quantum systems and global noise, resolving a two-decade open problem. Under strong depolarization, our protocol achieves an exponential reduction in sample complexity over the best-known methods. We provide an efficient implementation of the protocol based on generalized quantum phase estimation. Additionally, we introduce SWAPNET, a sparse and shallow circuit that enables QPA for near-term experiments. Simulations in both digital and analog quantum settings, along with experiments on superconducting quantum processors, confirm the protocol's robustness and practical utility. Our findings suggest that QPA could improve the performance of quantum information processing tasks, particularly in the context of Noisy Intermediate-Scale Quantum (NISQ) devices, where reducing the effect of noise with limited resources is critical.

quantum computing quantum states quantum purity amplification depolarizing noise sample complexity
Foundational AI Sep 24, 2024

Seeing Faces in Things: A Model and Dataset for Pareidolia

Mark Hamilton, Simon Stent, Vasha DuTell et al.

The human visual system is well-tuned to detect faces of all shapes and sizes. While this brings obvious survival advantages, such as a better chance of spotting unknown predators in the bush, it also leads to spurious face detections. "Face pareidolia" describes the perception of face-like structure among otherwise random stimuli: seeing faces in coffee stains or clouds in the sky. In this paper, we study face pareidolia from a computer vision perspective. We present an image dataset of "Faces in Things", consisting of five thousand web images with human-annotated pareidolic faces. Using this dataset, we examine the extent to which a state-of-the-art human face detector exhibits pareidolia, and find a significant behavioral gap between humans and machines. We find that the evolutionary need for humans to detect animal faces, as well as human faces, may explain some of this gap. Finally, we propose a simple statistical model of pareidolia in images. Through studies on human subjects and our pareidolic face detectors, we confirm a key prediction of our model regarding what image conditions are most likely to induce pareidolia. Dataset and Website: https://aka.ms/faces-in-things

face pareidolia human-machine perceptual gap convolutional networks fine-tuning transfer learning
Astrophysics Sep 20, 2024

StreamGen: Connecting Populations of Streams and Shells to Their Host Galaxies

Adriana Dropulic, Nora Shipp, Stacy Kim et al.

In this work, we study how the abundance and dynamics of populations of disrupting satellite galaxies change systematically as a function of host galaxy properties. We apply a theoretical model of the phase-mixing process to classify intact satellite galaxies, stellar stream-like and shell-like debris in ~1500 Milky Way-mass systems generated by a semi-analytic galaxy formation code, SatGen. In particular, we test the effect of host galaxy halo mass, disk mass, ratio of disk scale height to length, and stellar feedback model on disrupting satellite populations. We find that the counts of tidal debris are consistent across all host galaxy models, within a given host mass range, and that all models can have stream-like debris on low-energy orbits, consistent with those observed around the Milky Way. However, we find a preference for stream-like debris on lower-energy orbits in models with a thicker (lower-density) host disk or on higher-energy orbits in models with a more-massive host disk. Importantly, we observe significant halo-to-halo variance across all models. These results highlight the importance of simulating and observing large samples of Milky Way-mass galaxies and accounting for variations in host properties when using disrupting satellites in studies of near-field cosmology.

cosmological simulation stellar tidal streams galaxy classification phase-space mixing dark matter
Theoretical Physics Sep 20, 2024

A Field Guide to Event-Shape Observables Using Optimal Transport

Cari Cesarotti, Matt LeBlanc

We lay out the phenomenological behavior of event-shape observables evaluated by solving optimal transport problems between collider events and reference geometries -- which we name 'manifold distances' -- to provide guidance regarding their use in future studies. This discussion considers several choices related to the metric used to quantify these distances. We explore the differences between the various options, using a combination of analytical studies and simulated minimum-bias and multi-jet events. Making judicious choices when defining the metric and reference geometry can improve sensitivity to interesting signal features and reduce sensitivity to non-perturbative effects in QCD. The goal of this article is to provide a 'field guide' that can inform how choices made when defining a manifold distance can be tailored for the analysis at hand.
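
Below is a minimal instance of such a manifold distance, using the POT optimal-transport library to move a toy three-particle event onto a fixed discretized ring; the event, Euclidean ground metric, and ring radius are illustrative assumptions, and an actual event-shape observable would also minimize over the reference geometry's parameters.

```python
import numpy as np
import ot   # POT: Python Optimal Transport

# Toy event: three particles with energy fractions and (rapidity, phi) positions
energies = np.array([0.5, 0.3, 0.2])
points = np.array([[0.1, 0.0], [-0.2, 0.4], [0.3, -0.5]])

# Reference geometry: a ring of radius 0.4 discretized into 64 uniform points
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
ring = 0.4 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
ring_weights = np.full(64, 1.0 / 64)

M = ot.dist(points, ring, metric="euclidean")   # ground metric between sites
dist = ot.emd2(energies / energies.sum(), ring_weights, M)
print(f"manifold distance to the ring: {dist:.3f}")
```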

optimal transport manifold distances collider physics energy mover's distance jet physics
Theoretical Physics Sep 18, 2024

Conformal Fields from Neural Networks

James Halverson, Joydeep Naskar, Jiahua Tian

We use the embedding formalism to construct conformal fields in $D$ dimensions, by restricting Lorentz-invariant ensembles of homogeneous neural networks in $(D+2)$ dimensions to the projective null cone. Conformal correlators may be computed using the parameter space description of the neural network. Exact four-point correlators are computed in a number of examples, and we perform a 4D conformal block decomposition that elucidates the spectrum. In some examples the analysis is facilitated by recent approaches to Feynman integrals. Generalized free CFTs are constructed using the infinite-width Gaussian process limit of the neural network, enabling a realization of the free boson. The extension to deep networks constructs conformal fields at each subsequent layer, with recursion relations relating their conformal dimensions and four-point functions. Numerical approaches are discussed.

conformal field theory neural network field theory quantum field theory symmetry preservation conformal block decomposition
Astrophysics Sep 16, 2024

Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly

Mikhail M. Ivanov, Andrej Obuljen, Carolina Cuesta-Lazaro et al.

We explore full-shape analysis with simulation-based priors, which is the simplest approach to galaxy clustering data analysis that combines effective field theory (EFT) on large scales and numerical simulations on small scales. The core ingredient of our approach is the prior density of EFT parameters which we extract from a suite of 10500 galaxy simulations based on the halo occupation distribution (HOD) model. We measure the EFT parameters with the field-level forward model, which enables us to cancel cosmic variance. On the theory side, we develop a new efficient approach to calculate field-level transfer functions using time-sliced perturbation theory and the logarithmic fast Fourier transform. We find that the cosmology dependence of EFT parameters of galaxies is approximately degenerate with the HOD parameters, and hence it can be ignored for the purpose of prior generation. We use neural density estimation to model the measured distribution of EFT parameters. Our distribution model is then used as a prior in a reanalysis of the BOSS full-shape galaxy power spectrum data. Assuming the $\Lambda$CDM model, we find significant ($\approx 30\%$ and $\approx 60\%$) improvements for the matter density fraction and the mass fluctuation amplitude, which are constrained to $\Omega_{m} = 0.315 \pm 0.010$ and $\sigma_8 = 0.671 \pm 0.027$. The value of the Hubble constant does not change, $H_0 = 68.7 \pm 1.1$~km/s/Mpc. This reaffirms earlier reports of the structure growth tension from the BOSS data. Finally, we use the measured EFT parameters to constrain the galaxy-dark matter connection.

simulation-based inference effective field theory cosmological simulation density estimation bayesian inference
Astrophysics Sep 16, 2024

Unveiling the Diversity of Type IIn Supernovae via Systematic Light Curve Modeling

C. L. Ransome, V. A. Villar

Type IIn supernovae (SNe IIn) are a highly heterogeneous subclass of core-collapse supernovae, spectroscopically characterized by signatures of interaction with a dense circumstellar medium (CSM). Here we systematically model the light curves of 142 archival SNe IIn using MOSFiT (the Modular Open Source Fitter for Transients). We find that the observed and inferred properties of SNe IIn are diverse, but there are some trends. The typical SN CSM is dense ($\sim$10$^{-12}$ g cm$^{-3}$) with highly diverse CSM geometry, with a median CSM mass of $\sim$1 M$_\odot$. The ejecta are typically massive ($\gtrsim$10 M$_\odot$), suggesting massive progenitor systems. We find positive correlations between the CSM mass and the rise and fall times of SNe IIn. Furthermore, there are positive correlations between the rise and fall times and the $r$-band luminosity. We estimate the mass-loss rates of our sample (where spectroscopy is available) and find a high median mass-loss rate of $\sim$10$^{-2}$ M$_\odot$ yr$^{-1}$, with a range between 10$^{-4}$ and 1 M$_\odot$ yr$^{-1}$. These mass-loss rates are most similar to the mass loss from great eruptions of luminous blue variables, consistent with the direct progenitor detections in the literature. We also discuss the role that binary interactions may play, concluding that at least some of our SNe IIn may be from massive binary systems. Finally, we estimate a detection rate of 1.6$\times$10$^{5}$ yr$^{-1}$ in the upcoming Legacy Survey of Space and Time at the Vera C. Rubin Observatory.

supernova classification csm interaction modeling mass-loss rate inference stellar evolution bayesian inference
Theoretical Physics Sep 9, 2024

Auxiliary Field Deformations of (Semi-)Symmetric Space Sigma Models

Daniele Bielli, Christian Ferko, Liam Smith et al.

We generalize the auxiliary field deformations of the principal chiral model (PCM) introduced in arXiv:2405.05899 and arXiv:2407.16338 to sigma models whose target manifolds are symmetric or semi-symmetric spaces, including a Wess-Zumino term in the latter case. This gives rise to a new infinite family of classically integrable $\mathbb{Z}_2$ and $\mathbb{Z}_4$ coset models, which are of interest in applications of integrability to worldsheet string theory and holography. We demonstrate that every theory in this infinite class admits a zero-curvature representation for its equations of motion by exhibiting a Lax connection.

integrable deformations lax connection string theory wess-zumino term quantum field theory
Astrophysics Sep 4, 2024

How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds

Tri Nguyen, Francisco Villaescusa-Navarro, Siddharth Mishra-Sharma et al.

The connection between galaxies and their host dark matter (DM) halos is critical to our understanding of cosmology, galaxy formation, and DM physics. To maximize the return of upcoming cosmological surveys, we need an accurate way to model this complex relationship. Many techniques have been developed to model this connection, from Halo Occupation Distribution (HOD) models, to empirical and semi-analytic models, to hydrodynamic simulations. Hydrodynamic simulations can incorporate more detailed astrophysical processes but are computationally expensive; HODs, on the other hand, are computationally cheap but have limited accuracy. In this work, we present NeHOD, a generative framework based on a variational diffusion model and a Transformer, for painting galaxies/subhalos on top of DM with the accuracy of hydrodynamic simulations but at a computational cost similar to HOD. By modeling galaxies/subhalos as point clouds, instead of binning or voxelization, we can resolve small spatial scales down to the resolution of the simulations. For each halo, NeHOD predicts the positions, velocities, masses, and concentrations of its central and satellite galaxies. We train NeHOD on the TNG-Warm DM suite of the DREAMS project, which consists of 1024 high-resolution zoom-in hydrodynamic simulations of Milky Way-mass halos with varying warm DM mass and astrophysical parameters. We show that our model captures the complex relationships between subhalo properties as a function of the simulation parameters, including the mass functions, stellar-halo mass relations, concentration-mass relations, and spatial clustering. Our method can be used for a large variety of downstream applications, from galaxy clustering to strong lensing studies.

diffusion models dark matter emulation galaxy-halo connection generative models
Experimental Physics Sep 3, 2024

Double "acct": a distinct double-peaked supernova matching pulsational pair-instability models

C. R. Angus, S. E. Woosley, R. J. Foley et al.

We present multi-wavelength data of SN2020acct, a double-peaked stripped-envelope supernova (SN) in NGC2981 at ~150 Mpc. The two peaks are temporally distinct, with maxima separated by 58 rest-frame days, and a factor of 20 reduction in flux between. The first is luminous (M$_{r}$ = -18.00 $\pm$ 0.02 mag), blue (g - r = 0.27 $\pm$ 0.03 mag), and displays spectroscopic signatures of interaction with hydrogen-free circumstellar material. The second peak is fainter (M$_{r}$ = -17.29 $\pm$ 0.03 mag), and spectroscopically similar to an evolved stripped-envelope SNe, with strong blended forbidden [Ca II] and [O II] features. No other known double-peaked SN exhibits a light curve similar to that of SN 2020acct. We find the likelihood of two individual SNe occurring in the same star-forming region within that time to be highly improbable, while an implausibly fine-tuned configuration would be required to produce two SNe from a single binary system. We find that the peculiar properties of SN2020acct match models of pulsational pair instability (PPI), in which the initial peak is produced by collisions of shells of ejected material, shortly followed by a terminal explosion. Pulsations from a star with a 72 M$_{\odot}$ helium core provide an excellent match to the double-peaked light curve. The local galactic environment has a metallicity of 0.4 Z$_{\odot}$, a level where massive single stars are not expected to retain enough mass to encounter the PPI. However, late binary mergers or a low-metallicity pocket may allow the required core mass. We measure the rate of SN 2020acct-like events to be $<3.3\times10^{-8}$ Mpc$^{-3}$ yr$^{-1}$ at z = 0.07, or <0.1% of the total core-collapse SN rate.

pulsational pair instability stellar evolution light curve modeling supernova classification circumstellar interaction
Experimental Physics Sep 2, 2024

SN 2021foa: The "Flip-Flop" Type IIn / Ibn supernova

D. Farias, C. Gall, G. Narayan et al.

We present a comprehensive analysis of the photometric and spectroscopic evolution of SN~2021foa, unique among the class of transitional supernovae for repeatedly changing its spectroscopic appearance from hydrogen-to-helium-to-hydrogen-dominated (IIn-to-Ibn-to-IIn) within 50 days past peak brightness. The spectra exhibit multiple narrow ($\approx$ 300--600~km~s$^{-1}$) absorption lines of hydrogen, helium, calcium and iron together with broad helium emission lines with a full-width-at-half-maximum (FWHM) of $\sim 6000$~km~s$^{-1}$. For a steady wind mass-loss regime, light curve modeling results in an ejecta mass of $\sim 8$ M$_{\odot}$ and CSM mass below 1 M$_{\odot}$, and an ejecta velocity consistent with the FWHM of the broad helium lines. We obtain a mass-loss rate of $\approx 2$ M$_{\odot} {\rm yr}^{-1}$. This mass-loss rate is three orders of magnitude larger than derived for normal Type II SNe. We estimate that the bulk of the CSM of SN~2021foa must have been expelled within half a year, about 15 years ago. Our analysis suggests that SN~2021foa had a helium rich ejecta which swept up a dense shell of hydrogen rich CSM shortly after explosion. At about 60 days past peak brightness, the photosphere recedes through the dense ejecta-CSM region, occulting much of the red-shifted emission of the hydrogen and helium lines, which results in an observed blue-shift ($\sim -3000$~km~s$^{-1}$). Strong mass-loss activity prior to explosion, such as that seen in SN~2009ip-like objects and as precursor emission in SN~2021foa, is the likely origin of a complex, multiple-shell CSM close to the progenitor star.

supernova classification transitional supernovae circumstellar interaction pre-explosion mass loss stellar evolution
Astrophysics Aug 29, 2024

Maven: A Multimodal Foundation Model for Supernova Science

Gemma Zhang, Thomas Helfer, Alexander T. Gagliano et al.

A common setting in astronomy is the availability of a small number of high-quality observations, and larger amounts of either lower-quality observations or synthetic data from simplified models. Time-domain astrophysics is a canonical example of this imbalance, with the number of supernovae observed photometrically outpacing the number observed spectroscopically by multiple orders of magnitude. At the same time, no data-driven models exist to understand these photometric and spectroscopic observables in a common context. Contrastive learning objectives, which have grown in popularity for aligning distinct data modalities in a shared embedding space, provide a potential solution to extract information from these modalities. We present Maven, the first foundation model for supernova science. To construct Maven, we first pre-train our model to align photometry and spectroscopy from 0.5M synthetic supernovae using a contrastive objective. We then fine-tune the model on 4,702 observed supernovae from the Zwicky Transient Facility. Maven reaches state-of-the-art performance on both classification and redshift estimation, despite the embeddings not being explicitly optimized for these tasks. Through ablation studies, we show that pre-training with synthetic data improves overall performance. In the upcoming era of the Vera C. Rubin Observatory, Maven serves as a Rosetta Stone for leveraging large, unlabeled and multimodal time-domain datasets.
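The alignment objective described here can be sketched as a symmetric InfoNCE loss between photometry and spectroscopy embeddings of the same object. The embedding size, temperature, and random stand-ins for encoder outputs below are assumptions for illustration, not Maven's actual configuration.

```python
import torch
import torch.nn.functional as F

# Symmetric InfoNCE loss aligning photometry and spectroscopy
# embeddings of the same supernova: diagonal pairs are positives.
def contrastive_loss(photo_emb, spec_emb, temperature=0.07):
    photo = F.normalize(photo_emb, dim=-1)
    spec = F.normalize(spec_emb, dim=-1)
    logits = photo @ spec.t() / temperature        # (B, B) similarities
    targets = torch.arange(len(photo))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage with random stand-ins for encoder outputs (batch of 8).
loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```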

contrastive learning multimodal foundation model representation learning synthetic pre-training supernova classification
Foundational AI Aug 28, 2024

Remove Symmetries to Control Model Expressivity and Improve Optimization

Liu Ziyin, Yizhou Xu, Isaac Chuang

When symmetry is present in the loss function, the model is likely to be trapped in a low-capacity state that is sometimes known as a "collapse". Being trapped in these low-capacity states can be a major obstacle to training across many scenarios where deep learning technology is applied. We first prove two concrete mechanisms through which symmetries lead to reduced capacities and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. In settings where this type of entrapment is a particular concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. A remarkable merit of the proposed method is that it is model-agnostic and does not require any knowledge of the symmetry.

symmetry removal group theory symmetry breaking capacity collapse loss function design
Foundational AI Aug 27, 2024

Low-Budget Simulation-Based Inference with Bayesian Neural Networks

Arnaud Delaunoy, Maxence de la Brassinne Bonardeaux, Siddharth Mishra-Sharma et al.

Simulation-based inference methods have been shown to be inaccurate in the data-poor regime, when training simulations are limited or expensive. Under these circumstances, the inference network is particularly prone to overfitting, and using it without accounting for the computational uncertainty arising from the lack of identifiability of the network weights can lead to unreliable results. To address this issue, we propose using Bayesian neural networks in low-budget simulation-based inference, thereby explicitly accounting for the computational uncertainty of the posterior approximation. We design a family of Bayesian neural network priors that are tailored for inference and show that they lead to well-calibrated posteriors on tested benchmarks, even when as few as $O(10)$ simulations are available. This opens up the possibility of performing reliable simulation-based inference using very expensive simulators, as we demonstrate on a problem from the field of cosmology where single simulations are computationally expensive. We show that Bayesian neural networks produce informative and well-calibrated posterior estimates with only a few hundred simulations.

simulation-based inference bayesian inference uncertainty quantification calibration posterior estimation
Astrophysics Aug 26, 2024

LINX: A Fast, Differentiable, and Extensible Big Bang Nucleosynthesis Package

Cara Giovanetti, Mariangela Lisanti, Hongwan Liu et al.

We introduce LINX (Light Isotope Nucleosynthesis with JAX), a new differentiable public Big Bang Nucleosynthesis (BBN) code designed for fast parameter estimation. By leveraging JAX, LINX achieves both speed and differentiability, enabling the use of Bayesian inference, including gradient-based methods. We discuss the formalism used in LINX for rapid primordial elemental abundance predictions and give examples of how LINX can be used. When combined with differentiable Cosmic Microwave Background (CMB) power spectrum emulators, LINX can be used for joint CMB and BBN analyses without requiring extensive computational resources, including on personal hardware.
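The value of differentiability can be illustrated with a toy JAX calculation: integrate a schematic rate equation for a single "abundance" and differentiate the result with respect to a rescaled baryon-to-photon ratio. The rate law, units, and integrator below are invented for this sketch; LINX integrates the real BBN reaction network.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a differentiable BBN calculation: evolve one
# "abundance" Y under an invented rate law dY/dt = -k(eta) * Y**2 and
# differentiate the final value w.r.t. eta via autodiff.
def final_abundance(eta, y0=0.25, t_max=1.0, n_steps=1000):
    dt = t_max / n_steps
    k = 10.0 * eta                      # schematic eta-dependent rate
    def step(y, _):
        return y - dt * k * y ** 2, None
    y, _ = jax.lax.scan(step, y0, None, length=n_steps)
    return y

eta = 0.61                              # rescaled units (assumption)
print(final_abundance(eta))             # toy abundance prediction
print(jax.grad(final_abundance)(eta))   # sensitivity dY/deta, for free
```

Gradients like this are what make Hamiltonian Monte Carlo and other gradient-based Bayesian methods practical for BBN parameter estimation.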

primordial nucleosynthesis differentiable bbn bayesian inference cosmic microwave background monte carlo methods
Astrophysics Aug 26, 2024

Cosmological Parameter Estimation with a Joint-Likelihood Analysis of the Cosmic Microwave Background and Big Bang Nucleosynthesis

Cara Giovanetti, Mariangela Lisanti, Hongwan Liu et al.

We present the first joint-likelihood analysis of Big Bang Nucleosynthesis (BBN) and Cosmic Microwave Background (CMB) data. Bayesian inference is performed on the baryon abundance and the effective number of neutrino species, $N_{\rm eff}$, using a CMB Boltzmann solver in combination with LINX, a new flexible and efficient BBN code. We marginalize over Planck nuisance parameters and nuclear rates to find $N_{\rm{eff}} = 3.08_{-0.13}^{+0.13},\,2.94 _{-0.16}^{+0.16},$ or $2.98_{-0.13}^{+0.14}$, for three separate reaction networks. This framework enables robust testing of the Lambda Cold Dark Matter paradigm and its variants with CMB and BBN data.
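Schematically, a joint analysis sums the two log-likelihoods before sampling. The sketch below uses toy Gaussian stand-ins for the CMB and BBN terms and emcee as an off-the-shelf sampler; in the real analysis each evaluation would call a Boltzmann solver and a BBN code such as LINX, with nuisance parameters marginalized.

```python
import numpy as np
import emcee

# Schematic joint likelihood in (omega_b, N_eff). Both "data" terms are
# invented Gaussians standing in for real CMB and BBN likelihoods.
def log_like_cmb(omega_b, n_eff):
    return -0.5 * (((omega_b - 0.0224) / 0.0002) ** 2
                   + ((n_eff - 3.0) / 0.2) ** 2)

def log_like_bbn(omega_b, n_eff):
    # In reality: predicted D/H and Y_p abundances vs. measurements.
    return -0.5 * ((n_eff - 3.05) / 0.3) ** 2

def log_prob(theta):
    omega_b, n_eff = theta
    if not (0.01 < omega_b < 0.04 and 1.0 < n_eff < 5.0):
        return -np.inf  # flat prior box
    return log_like_cmb(omega_b, n_eff) + log_like_bbn(omega_b, n_eff)

ndim, nwalkers = 2, 16
p0 = np.column_stack([np.random.normal(0.0224, 1e-3, nwalkers),
                      np.random.normal(3.0, 0.1, nwalkers)])
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000)
print(sampler.get_chain(flat=True, discard=500).mean(axis=0))
```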

cosmic microwave background bayesian inference big bang nucleosynthesis joint likelihood analysis monte carlo methods
Experimental Physics Aug 23, 2024

Finding the Fuse: Prospects for the Detection and Characterization of Hydrogen-Rich Core-Collapse Supernova Precursor Emission with the LSST

A. Gagliano, E. Berger, V. A. Villar et al.

Enhanced emission in the months to years preceding explosion has been detected for several core-collapse supernovae (SNe). Though the physical mechanisms driving the emission remain hotly debated, the light curves of detected events show long-lived ($\geq$50 days), plateau-like behavior, suggesting hydrogen recombination may significantly contribute to the total energy budget. The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will provide a decade-long photometric baseline to search for this emission, both in binned pre-explosion observations after an SN is detected and in single-visit observations prior to the SN explosion. In anticipation of these searches, we simulate a range of eruptive precursor models to core-collapse SNe and forecast the discovery rates of these phenomena in LSST data. We find a detection rate of ~40-130 yr$^{-1}$ for SN IIP/IIL precursors and ~110 yr$^{-1}$ for SN IIn precursors in single-epoch photometry. Considering the first three years of observations with the effects of rolling and observing triplets included, this number grows to a total of 150-400 in binned photometry, with the highest number recovered when binning in 100-day bins for 2020tlf-like precursors and in 20-day bins for other recombination-driven models from the literature. We quantify the impact of using templates contaminated by residual light (from either long-lived or separate precursor emission) on these detection rates, and explore strategies for estimating baseline flux to mitigate these issues. Spectroscopic follow-up of the eruptions preceding core-collapse SNe and detected with LSST will offer important clues to the underlying drivers of terminal-stage mass loss in massive stars.

precursor emission circumstellar material signal detection stellar evolution monte carlo methods
Experimental Physics Aug 22, 2024

Multiple testing for signal-agnostic searches of new physics with machine learning

Gaia Grosso, Marco Letizia

In this work, we address the question of how to enhance signal-agnostic searches by leveraging multiple testing strategies. Specifically, we consider hypothesis tests relying on machine learning, where model selection can introduce a bias towards specific families of new physics signals. We show that it is beneficial to combine different tests, characterised by distinct choices of hyperparameters, and that performance comparable to the best available test is generally achieved while providing a more uniform response to various types of anomalies. Focusing on the New Physics Learning Machine, a methodology to perform a signal-agnostic likelihood-ratio test, we explore a number of approaches to multiple testing, such as combining p-values and aggregating test statistics.
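One simple aggregation strategy of the kind studied here is combining per-test p-values, for example with Fisher's method. The numbers below are illustrative only, and correlated tests on shared data would in practice need an empirically calibrated null distribution.

```python
import numpy as np
from scipy import stats

# p-values from tests of the same data run with different
# hyperparameter choices (illustrative numbers only).
p_values = np.array([0.04, 0.20, 0.008, 0.31])

stat, p_combined = stats.combine_pvalues(p_values, method='fisher')
print(p_combined)

# Caveat: Fisher's rule assumes independent tests. Tests sharing the
# same data are correlated, so the null distribution of the combined
# statistic is in practice calibrated empirically with toy datasets.
```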

hypothesis testing multiple testing likelihood ratio new physics searches anomaly detection
Foundational AI Aug 19, 2024

KAN 2.0: Kolmogorov-Arnold Networks Meet Science

Ziming Liu, Pingchuan Ma, Yixuan Wang et al.

A major challenge of AI + Science lies in their inherent incompatibility: today's AI is primarily based on connectionism, while science depends on symbolism. To bridge the two worlds, we propose a framework to seamlessly synergize Kolmogorov-Arnold Networks (KANs) and science. The framework highlights KANs' usage for three aspects of scientific discovery: identifying relevant features, revealing modular structures, and discovering symbolic formulas. The synergy is bidirectional: science to KAN (incorporating scientific knowledge into KANs), and KAN to science (extracting scientific insights from KANs). We highlight major new functionalities in the pykan package: (1) MultKAN: KANs with multiplication nodes. (2) kanpiler: a KAN compiler that compiles symbolic formulas into KANs. (3) tree converter: convert KANs (or any neural networks) to tree graphs. Based on these tools, we demonstrate KANs' capability to discover various types of physical laws, including conserved quantities, Lagrangians, symmetries, and constitutive laws.
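The core KAN idea, learnable univariate functions on network edges, can be illustrated without the pykan API. Below, one edge function is parameterized by radial-basis coefficients (pykan uses B-splines) and fitted by linear least squares; the basis, grid, and target function are assumptions of this sketch. Reading off the fitted curve is what makes such models amenable to symbolic interpretation.

```python
import numpy as np

# Toy version of a KAN edge: a learnable 1D function expressed in a
# fixed basis, fit here by linear least squares.
def rbf_features(x, centers, width=0.4):
    return np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)

x = np.linspace(-2, 2, 200)
y = np.sin(2 * x) + 0.1 * np.random.default_rng(0).normal(size=x.size)

centers = np.linspace(-2, 2, 12)   # grid points of the edge function
Phi = rbf_features(x, centers)
coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)

y_hat = Phi @ coeffs               # the learned edge function
print(np.max(np.abs(y_hat - np.sin(2 * x))))  # ~ the noise level
```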

kolmogorov-arnold networks interpretability symbolic regression conservation laws lagrangian methods
Experimental Physics Aug 16, 2024

Enhancing Events in Neutrino Telescopes through Deep Learning-Driven Super-Resolution

Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles

Recent discoveries by neutrino telescopes, such as the IceCube Neutrino Observatory, relied extensively on machine learning (ML) tools to infer physical quantities from the raw photon hits detected. Neutrino telescope reconstruction algorithms are limited by the sparse sampling of photons by the optical modules due to the relatively large spacing ($10-100\,{\rm m}$) between them. In this letter, we propose a novel technique that learns photon transport through the detector medium via deep learning-driven super-resolution of data events. These ``improved'' events can then be reconstructed using traditional or ML techniques, resulting in improved resolution. Our strategy arranges additional ``virtual'' optical modules within an existing detector geometry and trains a convolutional neural network to predict the hits on these virtual optical modules. We show that this technique improves the angular reconstruction of muons in a generic ice-based neutrino telescope. Our results readily extend to water-based neutrino telescopes and other event morphologies.

superresolution neutrino detection convolutional networks event reconstruction variational autoencoders
Astrophysics Aug 13, 2024

Improving Radial Velocities by Marginalizing over Stars and Sky: Achieving 30 m/s RV Precision for APOGEE in the Plate Era

Andrew K. Saydjari, Douglas P. Finkbeiner, Adam J. Wheeler et al.

The radial velocity catalog from the Apache Point Observatory Galactic Evolution Experiment (APOGEE) is unique in its simultaneously large volume and high precision as a result of its decade-long survey duration, multiplexing (600 fibers), and spectral resolution of $R \sim 22,500$. However, previous data reductions of APOGEE have not fully realized the potential radial velocity (RV) precision of the instrument. Here we present an RV catalog based on a new reduction of all 2.6 million visits of APOGEE DR17 and validate it against improved estimates for the theoretical RV performance. The core ideas of the new reduction are the simultaneous modeling of all components in the spectra, rather than a separate subtraction of point estimates for the sky, and a marginalization over stellar types, rather than a grid search for an optimum. We show that this catalog, when restricted to RVs measured with the same fiber, achieves noise-limited precision down to 30 m/s and delivers well-calibrated uncertainties. We also introduce a general method for calibrating fiber-to-fiber constant RV offsets and demonstrate its importance for high RV precision work in multi-fiber spectrographs. After calibration, we achieve 47 m/s RV precision on the combined catalog with RVs measured with different fibers. This degradation in precision relative to measurements with only a single fiber suggests that refining line spread function models should be a focus in SDSS-V to improve the fiber-unified RV catalog.
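The marginalization idea can be sketched in a few lines: instead of taking the single best template from a grid search, log-sum-exp the per-template RV likelihoods over stellar types. The toy chi-squared surface below is fabricated; the catalog's forward model (sky, continuum, line spread function) is far more detailed.

```python
import numpy as np
from scipy.special import logsumexp

# chi2[i, j]: goodness of fit of stellar template i at RV grid point j,
# assumed to come from an upstream spectral fit (toy values here).
rng = np.random.default_rng(1)
n_templates, n_rv = 50, 400
rv_grid = np.linspace(-50, 50, n_rv)                         # km/s
chi2 = (rv_grid[None, :] - rng.normal(0, 5, (n_templates, 1))) ** 2

log_like = -0.5 * chi2                       # per-template RV likelihood
log_post_rv = logsumexp(log_like, axis=0)    # marginalize stellar types
log_post_rv -= logsumexp(log_post_rv)        # normalize the posterior

rv_mean = np.sum(np.exp(log_post_rv) * rv_grid)
print(rv_mean)   # posterior-mean RV rather than a grid-search optimum
```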

bayesian inference spectral component separation calibration uncertainty quantification fiber systematics calibration
Theoretical Physics Aug 8, 2024

Learning the Simplicity of Scattering Amplitudes

Clifford Cheung, Aurélien Dersy, Matthew D. Schwartz

The simplification and reorganization of complex expressions lies at the core of scientific progress, particularly in theoretical high-energy physics. This work explores the application of machine learning to a particular facet of this challenge: the task of simplifying scattering amplitudes expressed in terms of spinor-helicity variables. We demonstrate that an encoder-decoder transformer architecture achieves impressive simplification capabilities for expressions composed of handfuls of terms. Lengthier expressions are implemented in an additional embedding network, trained using contrastive learning, which isolates subexpressions that are more likely to simplify. The resulting framework is capable of reducing expressions with hundreds of terms - a regular occurrence in quantum field theory calculations - to vastly simpler equivalent expressions. Starting from lengthy input expressions, our networks can generate the Parke-Taylor formula for five-point gluon scattering, as well as new compact expressions for five-point amplitudes involving scalars and gravitons. An interactive demonstration can be found at https://spinorhelicity.streamlit.app .

scattering amplitudes spinor-helicity formalism transformers symbolic expression simplification contrastive learning
Theoretical Physics Aug 8, 2024

Finite Temperature at Finite Places

An Huang, Christian Baadsgaard Jepsen

This paper studies AdS/CFT in its $p$-adic version (at the ``finite place'') in the setting where the bulk geometry is made up of the Tate curve, a discrete quotient of the Bruhat-Tits tree. Generalizing a classic result due to Zabrodin, the boundary dual of the free massive bulk theory is explicitly derived. Introducing perturbative interactions, the Witten diagrams for the two-point and three-point correlators are computed for generic scaling dimensions at one-loop and tree level respectively. The answers obtained demonstrate how $p$-adic AdS/CFT on the Tate curve provides a useful toy model for real CFTs at finite temperature.

conformal field theory p-adic ads/cft tate curve quantum field theory bruhat-tits tree
Astrophysics Aug 2, 2024

AT2023vto: An Exceptionally Luminous Helium Tidal Disruption Event from a Massive Star

Harsh Kumar, Edo Berger, Daichi Hiramatsu et al.

We present optical/UV observations and the spectroscopic classification of the transient AT2023vto as a tidal disruption event (TDE) at z = 0.4846. The spectrum is dominated by a broad He II $\lambda$4686 emission line, with a width of ~ $3.76 \times 10^4$ km/s and a blueshift of ~ $1.05 \times 10^4$ km/s, classifying it as a member of the TDE-He class. The light curve exhibits a long rise and decline timescale, with a large peak absolute magnitude of M$_g$ ~ -23.6, making it the most luminous of the classical optical TDEs (H, H+He, He) discovered to date by about 2 mag (and ~ 4 mag compared to the mean of the population). The light curve exhibits a persistent blue color of g - r ~ -0.4 mag throughout its evolution, similar to other TDEs, but distinct from supernovae. We identify the host galaxy of AT2023vto in archival Pan-STARRS images and find that the transient is located at the galaxy center, and that its inferred central black hole mass is ~ $10^7~M_{\odot}$. Modeling the light curves of AT2023vto, we find that it resulted from the disruption of a ~ 9 $M_{\odot}$ star by a ~$10^7~M_{\odot}$ supermassive black hole. The star mass is about 5 times larger than the highest star masses previously inferred in TDEs, and the black hole mass is at the high end of the distribution. AT2023vto is comparable in luminosity and timescale to some putative TDEs (with a blue featureless continuum), as well as to the mean of the recently identified population of ambiguous nuclear transients (ANTs), although the latter are spectroscopically distinct and tend to have longer timescales. ANTs have been speculated to arise from tidal disruptions of massive stars, perhaps in active galactic nuclei, and AT2023vto may represent a similar case but in a dormant black hole, thereby bridging the TDE and ANT populations. We anticipate that Rubin Observatory / LSST will uncover similar luminous TDEs to z ~ 3.

tidal disruption event stellar evolution accretion disk physics signal detection bayesian inference
Astrophysics Aug 1, 2024

On the Generality and Persistence of Cosmological Stasis

James Halverson, Sneh Pandya

Hierarchical decays of $N$ matter species to radiation may balance against Hubble expansion to yield stasis, a new phase of cosmological evolution with constant matter and radiation abundances. We analyze stasis with various machine learning techniques on the full $2N$-dimensional space of decay rates and abundances, which serve as inputs to the system of Boltzmann equations that governs the dynamics. We construct a differentiable Boltzmann solver to maximize the number of stasis $e$-folds $\mathcal{N}$. High-stasis configurations obtained by gradient ascent motivate log-uniform distributions on rates and abundances to accompany power-law distributions of previous works. We demonstrate that random configurations drawn from these families of distributions regularly exhibit many $e$-folds of stasis. We additionally use them as priors in a Bayesian analysis conditioned on stasis, using stochastic variational inference with normalizing flows to model the posterior. All three numerical analyses demonstrate the generality of stasis and point to a new model in which the rates and abundances are exponential in the species index. We show that the exponential model solves the exact stasis equations, is an attractor, and satisfies $\mathcal{N}\propto N$, exhibiting inflation-level $e$-folding with a relatively low number of species. This is contrasted with the $\mathcal{N}\propto \log(N)$ scaling of power-law models. Finally, we discuss implications for the emergent string conjecture and string axiverse.
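A heavily simplified, differentiable version of such a system is sketched below in JAX: N matter species decay into radiation under Hubble expansion, a smooth proxy counts e-folds spent near constant matter abundance, and jax.grad supplies the ascent direction. The integrator, stasis criterion, units, and initial conditions are all assumptions of this toy, not the paper's solver.

```python
import jax
import jax.numpy as jnp

# Toy Boltzmann system: matter species rho_m with decay rates Gamma_i
# feed radiation rho_r while the universe expands (reduced units).
def evolve(log_gammas, log_rhos, n_steps=4000, dt=1e-3):
    gammas = jnp.exp(log_gammas)

    def step(state, _):
        rho_m, rho_r = state
        H = jnp.sqrt((rho_m.sum() + rho_r) / 3.0)
        drho_m = -(3.0 * H + gammas) * rho_m
        drho_r = -4.0 * H * rho_r + (gammas * rho_m).sum()
        omega_m = rho_m.sum() / (rho_m.sum() + rho_r)
        return (rho_m + dt * drho_m, rho_r + dt * drho_r), (omega_m, H)

    init = (jnp.exp(log_rhos), jnp.asarray(1.0))
    _, (omega_m, H) = jax.lax.scan(step, init, None, length=n_steps)
    # Soft, differentiable proxy for e-folds spent near constant
    # matter abundance (a crude stand-in for the stasis objective):
    in_stasis = jnp.exp(-((omega_m - omega_m.mean()) / 0.02) ** 2)
    return jnp.sum(in_stasis * H * dt)

n = 8
log_gammas = jnp.linspace(0.0, -4.0, n)   # hierarchical decay rates
log_rhos = jnp.linspace(0.0, -2.0, n)
print(evolve(log_gammas, log_rhos))
print(jax.grad(evolve)(log_gammas, log_rhos))  # gradient-ascent direction
```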

cosmological stasis normalizing flows bayesian inference differentiable boltzmann solver posterior estimation
Theoretical Physics Aug 1, 2024

Attractors, Geodesics, and the Geometry of Moduli Spaces

Fabian Ruehle, Benjamin Sung

We connect recent conjectures and observations pertaining to geodesics, attractor flows, Laplacian eigenvalues and the geometry of moduli spaces by using that attractor flows are geodesics. For toroidal compactifications, attractor points are related to (degenerate) masses of the Laplacian on the target space, and also to the Laplacian on the moduli space. We also explore compactifications of M-Theory to $5$D on a Calabi-Yau threefold and argue that geodesics are unique in a special set of classes, providing further evidence for a recent conjecture by Raman and Vafa. Finally, we describe the role of the marked moduli space in $4$d $\mathcal{N} = 2$ compactifications. We study split attractor flows in an explicit example of the one-parameter family of quintics and discuss setups where flops to isomorphic Calabi-Yau manifolds exist.

attractor flows string theory calabi-yau moduli space swampland conjecture spectral methods
Theoretical Physics Jul 31, 2024

TASI Lectures on Physics for Machine Learning

Jim Halverson

These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and neural network / Gaussian process correspondence, and also more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field theoretic perspective familiar to theoretical physicists. I elaborate on connections between the two, including a neural network approach to field theory.

quantum field theory neural tangent kernel stochastic processes neural network field theory kernel methods
Astrophysics Jul 30, 2024

A Generative Modeling Approach to Reconstructing 21-cm Tomographic Data

Nashwan Sabti, Ram Reddy, Julian B. Muñoz et al.

Analyses of the cosmic 21-cm signal are hampered by astrophysical foregrounds that are far stronger than the signal itself. These foregrounds, typically confined to a wedge-shaped region in Fourier space, often necessitate the removal of a vast majority of modes, thereby degrading the quality of the data anisotropically. To address this challenge, we introduce a novel deep generative model based on stochastic interpolants to reconstruct the 21-cm data lost to wedge filtering. Our method leverages the non-Gaussian nature of the 21-cm signal to effectively map wedge-filtered 3D lightcones to samples from the conditional distribution of wedge-recovered lightcones. We demonstrate how our method is able to restore spatial information effectively, considering both varying cosmological initial conditions and astrophysics. Furthermore, we discuss a number of future avenues where this approach could be applied in analyses of the 21-cm signal, potentially offering new opportunities to improve our understanding of the Universe during the epochs of cosmic dawn and reionization.

generative models flow matching 21-cm foreground wedge stochastic processes posterior estimation
Foundational AI Jul 30, 2024

Relaxed Equivariant Graph Neural Networks

Elyssa Hofgard, Rui Wang, Robin Walters et al.

3D Euclidean symmetry equivariant neural networks have demonstrated notable success in modeling complex physical systems. We introduce a framework for relaxed $E(3)$ graph equivariant neural networks that can learn and represent symmetry breaking within continuous groups. Building on the existing e3nn framework, we propose the use of relaxed weights to allow for controlled symmetry breaking. We show empirically that these relaxed weights learn the correct amount of symmetry breaking.

equivariant neural networks relaxed equivariance group theory symmetry breaking relaxed weights
Foundational AI Jul 23, 2024

Automatic Environment Shaping is the Next Frontier in RL

Younghyo Park, Gabriel B. Margolis, Pulkit Agrawal

Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It's our position that algorithmic improvements in policy optimization and other ideas should be guided towards resolving the primary bottleneck of shaping the training environment, i.e., designing observations, actions, rewards and simulation dynamics. Most practitioners don't tune the RL algorithm itself, but rather the environment parameters, to obtain a desirable controller. We posit that scaling RL to diverse robotic tasks will only be achieved if the community focuses on automating environment shaping procedures.

reinforcement learning environment shaping sim-to-real transfer reward optimization automated discovery
Experimental Physics Jul 15, 2024

Moment Unfolding

Krish Desai, Benjamin Nachman, Jesse Thaler

Deconvolving ("unfolding'') detector distortions is a critical step in the comparison of cross section measurements with theoretical predictions in particle and nuclear physics. However, most existing approaches require histogram binning while many theoretical predictions are at the level of statistical moments. We develop a new approach to directly unfold distribution moments as a function of another observable without having to first discretize the data. Our Moment Unfolding technique uses machine learning and is inspired by Generative Adversarial Networks (GANs). We demonstrate the performance of this approach using jet substructure measurements in collider physics. With this illustrative example, we find that our Moment Unfolding protocol is more precise than bin-based approaches and is as or more precise than completely unbinned methods.

unfolding boltzmann reweighting moment estimation inverse problems generative adversarial networks
Astrophysics Jul 10, 2024

The Type I Superluminous Supernova Catalog I: Light Curve Properties, Models, and Catalog Description

Sebastian Gomez, Matt Nicholl, Edo Berger et al.

We present the most comprehensive catalog to date of Type I Superluminous Supernovae (SLSNe), a class of stripped envelope supernovae (SNe) characterized by exceptionally high luminosities. We have compiled a sample of 262 SLSNe reported through 2022 December 31. We verified the spectroscopic classification of each SLSN and collated an exhaustive data set of UV, optical and IR photometry from both publicly available data and our own FLEET observational follow-up program, totaling over 30,000 photometric detections. Using these data we derive observational parameters such as the peak absolute magnitudes, rise and decline timescales, as well as bolometric luminosities, temperature and photospheric radius evolution for all SLSNe. Additionally, we model all light curves using a hybrid model that includes contributions from both a magnetar central engine and the radioactive decay of $^{56}$Ni. We explore correlations among various physical and observational parameters, and recover the previously found relation between ejecta mass and magnetar spin, as well as the overall progenitor pre-explosion mass distribution with a peak at $\approx 6.5$ M$_\odot$. We find no significant redshift dependence for any parameter, and no evidence for distinct sub-types of SLSNe. We find that $< 3$\% of SLSNe are best fit with a significant contribution from radioactive decay $\gtrsim 50$\%, representing a set of relatively dim and slowly declining SNe. We provide several analytical tools designed to simulate typical SLSN light curves across a broad range of wavelengths and phases, enabling accurate K-corrections, bolometric scaling calculations, and inclusion of SLSNe in survey simulations or future comparison works. The complete catalog, including all of the photometry, models, and derived parameters, is made available as an open-source resource on GitHub.

supernova classification magnetar central engine bolometric light curves bayesian inference stellar evolution
Astrophysics Jul 4, 2024

Find the haystacks, then look for needles: The rate of strongly lensed transients in galaxy-galaxy strong gravitational lenses

Ana Sainz de Murieta, Thomas E. Collett, Mark R. Magee et al.

The time delay between appearances of multiple images of a gravitationally lensed supernova (glSN) is sensitive to the Hubble constant, $H_0$. As well as time delays, a lensed host galaxy is needed to enable precise inference of $H_0$. In this work we investigate the connection between discoverable lensed transients and their host galaxies. We find that LSST will discover 88 glSNe per year, of which $54\%$ will also have a strongly lensed host. The rates carry an approximately 30 percent uncertainty, driven primarily by the choice of unlensed SN population and by uncertainties in the redshift evolution of the deflector population, but the fraction of glSNe with a lensed host is consistently around a half. LSST will discover 20 glSNe per year in systems that could plausibly have been identified by Euclid as galaxy-galaxy lenses before the discovery of the glSN. Such systems have preferentially longer time delays and therefore are well suited for cosmography. We define a golden sample of glSNe Ia with time delays over 10 days, image separations greater than 0.8 arcseconds, and a multiply imaged host. For this golden sample, we find $91\%$ occur in systems that should already be discoverable as galaxy-galaxy lenses in Euclid. For cosmology with glSNe, monitoring Euclid lenses is a plausible alternative to searching the entire LSST alert stream.

gravitational lensing time delays strong gravitational lensing rates survey optimization cosmological simulation monte carlo methods
Foundational AI Jul 3, 2024

A multicategory jet image classification framework using deep neural network

Jairo Orozco Sandoval, Vidya Manian, Sudhir Malik

Jet point cloud images are high-dimensional data structures that need to be transformed to a separable feature space for machine learning algorithms to distinguish them with simple decision boundaries. In this article, the authors focus on jet category separability by particle and jet feature extraction, enabling more efficient training of a simple deep neural network and yielding a computationally efficient, interpretable model for jet classification. The methodology is tested with three to five categories of jets from the JetNet benchmark jet tagging dataset, achieving performance comparable to a particle flow network. This work demonstrates that high-dimensional datasets represented in separable latent spaces lead to simpler architectures for jet classification.

jet physics feature extraction classification convolutional networks point cloud classification
Experimental Physics Jul 1, 2024

Anomaly-aware summary statistic from data batches

Gaia Grosso

Signal-agnostic data exploration based on machine learning could unveil very subtle statistical deviations of collider data from the expected Standard Model of particle physics. The beneficial impact of a large training sample on machine learning solutions motivates the exploration of increasingly large and inclusive samples of acquired data with resource efficient computational methods. In this work we consider the New Physics Learning Machine (NPLM), a multivariate goodness-of-fit test built on the Neyman-Pearson maximum-likelihood-ratio construction, and we address the problem of testing large size samples under computational and storage resource constraints. We propose to perform parallel NPLM routines over batches of the data, and to combine them by locally aggregating over the data-to-reference density ratios learnt by each batch. The resulting data hypothesis defining the likelihood-ratio test is thus shared over the batches, and complies with the assumption that the expected rate of new physical processes is time invariant. We show that this method outperforms the simple sum of the independent tests run over the batches, and can recover, or even surpass, the sensitivity of the single test run over the full data. Besides the significant advantage for the offline application of NPLM to large size samples, the proposed approach offers new prospects toward the use of NPLM to construct anomaly-aware summary statistics in quasi-online data streaming scenarios.
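The aggregation step can be sketched as follows: each batch contributes a learned data-to-reference log-ratio, the batch models are locally aggregated into one shared alternative hypothesis, and a single extended likelihood-ratio statistic is evaluated on all data. The toy "trained models", the averaging rule, and the normalization convention below are assumptions of this sketch, not the paper's exact construction.

```python
import numpy as np

# Each batch b yields a learned log density ratio f_b(x); aggregate
# them into one shared alternative, consistent with a time-invariant
# signal rate. The f_b here are toy callables standing in for networks.
def shared_log_ratio(x, batch_models):
    return np.mean([f_b(x) for f_b in batch_models], axis=0)

def nplm_statistic(f, data, reference, w_ref):
    """Extended likelihood-ratio statistic (schematic normalization):
    t = 2 [ sum_data f(x) - w_ref * sum_ref (e^{f(x)} - 1) ]."""
    return 2.0 * (np.sum(f(data))
                  - w_ref * np.sum(np.exp(f(reference)) - 1.0))

# Toy: two batch models preferring a slight excess at x > 1.
batch_models = [lambda x: 0.10 * (x > 1.0), lambda x: 0.12 * (x > 1.0)]
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 2000)
reference = rng.normal(0, 1, 20000)
f = lambda x: shared_log_ratio(x, batch_models)
print(nplm_statistic(f, data, reference, w_ref=2000 / 20000))
```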

anomaly detection batch aggregation goodness-of-fit testing likelihood ratio hypothesis testing
Foundational AI Jun 26, 2024

Boosting Soft Q-Learning by Bounding

Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin et al.

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
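The mechanics can be illustrated with a tabular soft Q-learning update whose TD target is clipped into two-sided bounds derived from a prior value estimate. The bound construction below uses the generic Bellman-residual bound |Q* - Q_tilde| <= delta / (1 - gamma) as a stand-in for the paper's tighter bounds; all quantities are toy.

```python
import numpy as np

gamma, beta = 0.95, 5.0   # discount and inverse temperature

def soft_value(q_row):
    """Entropy-regularized state value: (1/beta) * log-sum-exp."""
    return np.log(np.sum(np.exp(beta * q_row))) / beta

def clipped_update(Q, s, a, r, s_next, lo, hi, lr=0.1):
    target = r + gamma * soft_value(Q[s_next])
    target = np.clip(target, lo[s, a], hi[s, a])   # bound the target
    Q[s, a] += lr * (target - Q[s, a])

# Toy usage: 4 states, 2 actions, bounds from a prior value estimate.
Q = np.zeros((4, 2))
Q_tilde = np.ones((4, 2))   # value estimate from past experience
delta = 0.5                 # assumed Bellman-residual bound
lo = Q_tilde - delta / (1 - gamma)
hi = Q_tilde + delta / (1 - gamma)
clipped_update(Q, s=0, a=1, r=1.0, s_next=2, lo=lo, hi=hi)
print(Q[0, 1])
```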

reinforcement learning value function bounding entropy-regularized rl soft q-learning reward optimization
Theoretical Physics Jun 25, 2024

Simulating moiré quantum matter with neural network

Di Luo, David D. Dai, Liang Fu

Moiré materials provide an ideal platform for exploring quantum phases of matter. However, solving the many-electron problem in moiré systems is challenging due to strong correlation effects. We introduce a powerful variational representation of quantum states, the many-body neural Bloch wavefunction, to solve many-electron problems in moiré materials accurately and efficiently. Applying our method to the semiconductor heterobilayer WSe2/WS2, we obtain a generalized Wigner crystal at filling factor n = 1/3, a Mott insulator at n = 1, and a correlated insulator with local magnetic moments and antiferromagnetic spin correlation at n = 2. Our neural network approach improves the simulation accuracy of strongly interacting moiré materials and paves the way for the discovery of new quantum phases with a variational learning principle in a unified framework.

moiré neural wavefunction quantum states quantum simulation correlated electron phases hamiltonian systems
Foundational AI Jun 24, 2024

Derived Moduli Spaces of Nonlinear PDEs II: Variational Tricomplex and BV Formalism

Jacob Kryczka, Artan Sheshmani, Shing-Tung Yau

This paper is the second in a series of works dedicated to studying non-linear partial differential equations via derived geometric methods. We study a natural derived enhancement of the de Rham complex of a non-linear PDE via algebro-geometric techniques and examine its consequences for the functional differential calculus on the space of solutions. Applications to the BV-formalism with and without boundary conditions are discussed.

derived algebraic geometry variational tricomplex bv formalism quantum field theory lagrangian methods
Experimental Physics Jun 15, 2024

Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber

MicroBooNE collaboration, P. Abratenko, O. Alterkait et al.

A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data constraining their production rates and kinematics. We present the first demonstration of tagging neutrino-induced neutrons in liquid argon time projection chambers using secondary protons emitted from neutron-argon interactions in the MicroBooNE detector. We describe the method developed to identify neutrino-induced neutrons and demonstrate its performance using neutrons produced in muon-neutrino charged current interactions. The method is validated using a small subset of MicroBooNE's total dataset. The selection yields a sample with $60\%$ of selected tracks corresponding to neutron-induced secondary protons.

neutron tagging neutrino detection event reconstruction neutrino-nucleus interactions missing energy reconstruction
Experimental Physics Jun 14, 2024

Improving neutrino energy estimation of charged-current interaction events with recurrent neural networks in MicroBooNE

MicroBooNE collaboration, P. Abratenko, O. Alterkait et al.

We present a deep learning-based method for estimating the neutrino energy of charged-current neutrino-argon interactions. We employ a recurrent neural network (RNN) architecture for neutrino energy estimation in the MicroBooNE experiment, utilizing liquid argon time projection chamber (LArTPC) detector technology. Traditional energy estimation approaches in LArTPCs, which largely rely on reconstructing and summing visible energies, often experience sizable biases and resolution smearing because of the complex nature of neutrino interactions and the detector response. Neutrino energy estimation can be improved by incorporating the kinematic information of the reconstructed final-state particles; utilizing this information, the deep learning-based approach shows improved resolution and reduced bias on the muon neutrino Monte Carlo simulation sample compared to the traditional approach. In order to address the common concern about the effectiveness of this method on experimental data, the RNN-based energy estimator is further examined and validated with dedicated data-simulation consistency tests using MicroBooNE data. We also assess its potential impact on a neutrino oscillation study after accounting for all statistical and systematic uncertainties and show that it enhances physics sensitivity. This method has good potential to improve the performance of other physics analyses.

recurrent networks neutrino energy reconstruction event reconstruction neutrino detection regression
Theoretical Physics Jun 13, 2024

QCD constraints on isospin-dense matter and the nuclear equation of state

Ryan Abbott, William Detmold, Marc Illa et al.

Understanding the behavior of dense hadronic matter is a central goal in nuclear physics as it governs the nature and dynamics of astrophysical objects such as supernovae and neutron stars. Because of the non-perturbative nature of quantum chromodynamics (QCD), little is known rigorously about hadronic matter in these extreme conditions. Here, lattice QCD calculations are used to compute thermodynamic quantities and the equation of state of QCD over a wide range of isospin chemical potentials with controlled systematic uncertainties. Agreement is seen with chiral perturbation theory when the chemical potential is small. Comparison to perturbative QCD at large chemical potential allows for an estimate of the gap in the superconducting phase, and this quantity is seen to agree with perturbative determinations. Since the partition function for an isospin chemical potential, $\mu_I$, bounds the partition function for a baryon chemical potential $\mu_B = 3\mu_I/2$, these calculations also provide rigorous non-perturbative QCD bounds on the symmetric nuclear matter equation of state over a wide range of baryon densities for the first time.

lattice qcd isospin chemical potential nuclear equation of state effective field theory quantum field theory
Foundational AI Jun 10, 2024

An Elliptic Kernel Unsupervised Autoencoder-Graph Convolutional Network Ensemble Model for Hyperspectral Unmixing

Estefania Alfaro-Mejia, Carlos J Delgado, Vidya Manian

Spectral unmixing is an important technique in remote sensing used to analyze hyperspectral images to identify endmembers and estimate abundance maps. Over the past few decades, the performance of techniques for endmember extraction and fractional abundance map estimation has significantly improved. This article presents an ensemble model workflow called Autoencoder Graph Ensemble Model (AEGEM) designed to extract endmembers and fractional abundance maps. An elliptical kernel is applied to measure spectral distances, generating the adjacency matrix within the elliptical neighborhood. This information is used to construct an elliptical graph, with centroids as senders and the remaining pixels within the geometry as receivers. The next step involves stacking abundance maps, senders, and receivers as inputs to a Graph Convolutional Network, which processes this input to refine abundance maps. Finally, an ensemble decision-making process determines the best abundance maps based on the root-mean-square-error metric. The proposed AEGEM is assessed with benchmark datasets such as Samson, Jasper, and Urban, outperforming results obtained by baseline algorithms. For the Samson dataset, AEGEM excels in three abundance maps: water, tree and soil, yielding values of 0.081, 0.158, and 0.182, respectively. For the Jasper dataset, results are improved for the tree and water endmembers with values of 0.035 and 0.060 in that order, as well as for the mean average of the spectral angle distance metric, 0.109. For the Urban dataset, AEGEM outperforms previous results for the abundance maps of roof and asphalt, achieving values of 0.135 and 0.240, respectively. Additionally, for the endmembers of grass and roof, AEGEM achieves values of 0.063 and 0.094.

hyperspectral unmixing autoencoders graph neural networks ensemble methods elliptical graph construction
Foundational AI Jun 9, 2024

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Mark Hamilton, Andrew Zisserman, John R. Hershey et al.

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the ``meaning'' of words and the ``location'' of sounds without explicit localization supervision. Furthermore, it automatically discovers and distinguishes between these two types of associations without supervision. We show that DenseAV's localization abilities arise from a new multi-head feature aggregation operator that directly compares dense image and audio representations for contrastive learning. In contrast, many other systems that learn ``global'' audio and video representations cannot localize words and sound. Finally, we contribute two new datasets to improve the evaluation of AV representations through speech and sound prompted semantic segmentation. On these and other datasets we show DenseAV dramatically outperforms the prior art on speech and sound prompted semantic segmentation. DenseAV outperforms the previous state-of-the-art, ImageBind, on cross-modal retrieval using fewer than half of the parameters. Project Page: \href{https://aka.ms/denseav}{https://aka.ms/denseav}

audio-visual grounding self-supervised learning contrastive learning dense cross-modal alignment representation learning
Theoretical Physics Jun 6, 2024

A Heterotic Kähler Gravity and the Distance Conjecture

Javier José Murgas Ibarra, Paul-Konstantin Oehlmann, Fabian Ruehle et al.

Deformations of the heterotic superpotential give rise to a topological holomorphic theory with similarities to both Kodaira-Spencer gravity and holomorphic Chern-Simons theory. Although the action is cubic, it is only quadratic in the complex structure deformations (the Beltrami differential). Treated separately, for large fluxes, or alternatively at large distances in the background complex structure moduli space, these fields can be integrated out to obtain a new field theory in the remaining fields, which describe the complexified hermitian and gauge degrees of freedom. We investigate properties of this new holomorphic theory, and in particular connections to the swampland distance conjecture in the context of heterotic string theory. In the process, we define a new type of symplectic cohomology theory, where the background complex structure Beltrami differential plays the role of the symplectic form.

string theory swampland distance conjecture effective field theory heterotic kähler gravity quantum field theory
Theoretical Physics Jun 6, 2024

Stochastic logic in biased coupled photonic probabilistic bits

Michael Horodynski, Charles Roques-Carmes, Yannick Salamin et al.

Optical computing often employs tailor-made hardware to implement specific algorithms, trading generality for improved performance in key aspects like speed and power efficiency. An important computing approach that is still missing its corresponding optical hardware is probabilistic computing, used e.g. for solving difficult combinatorial optimization problems. In this study, we propose an experimentally viable photonic approach to solve arbitrary probabilistic computing problems. Our method relies on the insight that coherent Ising machines composed of coupled and biased optical parametric oscillators can emulate stochastic logic. We demonstrate the feasibility of our approach by using numerical simulations equivalent to the full density matrix formulation of coupled optical parametric oscillators.

optical parametric oscillators probabilistic computing stochastic processes coherent ising machines hamiltonian systems
Foundational AI Jun 4, 2024

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

Owen Dugan, Donato Manuel Jimenez Beneto, Charlotte Loh et al.

Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. Language model systems often enable LLMs to generate code for arithmetic operations to achieve accurate calculations. However, this approach compromises speed and security, and fine-tuning risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in a single autoregressive step, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture that performs arithmetic. Our implementation using Llama 3 with OccamNet as a symbolic model (OccamLlama) achieves 100\% accuracy on single arithmetic operations ($+,-,\times,\div,\sin{},\cos{},\log{},\exp{},\sqrt{}$), outperforming GPT 4o with and without a code interpreter. Furthermore, OccamLlama outperforms GPT 4o with and without a code interpreter on average across a range of mathematical problem solving benchmarks, demonstrating that OccamLLMs can excel in arithmetic tasks, even surpassing much larger models. We will make our code public shortly.

single-step arithmetic neurosymbolic integration hidden state control interpretability sparse models
Experimental Physics Jun 3, 2024

Towards Universal Unfolding of Detector Effects in High-Energy Physics using Denoising Diffusion Probabilistic Models

Camila Pazos, Shuchin Aeron, Pierre-Hugues Beauchemin et al.

Correcting for detector effects in experimental data, particularly through unfolding, is critical for enabling precision measurements in high-energy physics. However, traditional unfolding methods face challenges in scalability, flexibility, and dependence on simulations. We introduce a novel approach to multidimensional object-wise unfolding using conditional Denoising Diffusion Probabilistic Models (cDDPM). Our method utilizes the cDDPM for a non-iterative, flexible posterior sampling approach, incorporating distribution moments as conditioning information, which exhibits a strong inductive bias that allows it to generalize to unseen physics processes without explicitly assuming the underlying distribution. Our results highlight the potential of this method as a step towards a "universal" unfolding tool that reduces dependence on truth-level assumptions, while enabling the unfolding of a wide range of measured distributions with improved adaptability and accuracy.

diffusion models unfolding posterior estimation inverse problems generative models
Foundational AI May 31, 2024

Sheaf stable pairs, Quot-schemes, and birational geometry

Caucher Birkar, Jia Jia, Artan Sheshmani

In this paper we build bridges between moduli theory of sheaf stable pairs on one hand and birational geometry on the other hand. We will in particular treat moduli of sheaf stable pairs on smooth projective curves in detail and present some calculations in low degrees. We will also outline problems in various directions.

moduli spaces birational geometry quot-schemes group theory string theory
Theoretical Physics May 31, 2024

QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

Zhuo Chen, Rumen Dangovski, Charlotte Loh et al.

We propose Quantum-informed Tensor Adaptation (QuanTA), a novel, easy-to-implement, fine-tuning method with no inference overhead for large-scale pre-trained language models. By leveraging quantum-inspired methods derived from quantum circuit structures, QuanTA enables efficient high-rank fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA), whose low-rank approximation may fail for complicated downstream tasks. Our approach is theoretically supported by the universality theorem and the rank representation theorem to achieve efficient high-rank adaptations. Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. Furthermore, QuanTA shows superior performance with fewer trainable parameters compared to other approaches and can be designed to integrate with existing fine-tuning algorithms for further improvement, providing a scalable and efficient solution for fine-tuning large language models and advancing state-of-the-art in natural language processing.

fine-tuning high-rank adaptation tensor networks parameter-efficient fine-tuning quantum computing
Foundational AI May 31, 2024

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

Tengyang Xie, Dylan J. Foster, Akshay Krishnamurthy et al.

Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to produce diverse, maximally informative responses. By allowing RLHF to confidently stray from the pre-trained model, online exploration offers the possibility of novel, potentially super-human capabilities, but its full potential as a paradigm for language model training has yet to be realized, owing to computational and statistical bottlenecks in directly adapting existing reinforcement learning techniques. We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO), which is simple and practical -- a one-line change to (online) Direct Preference Optimization (DPO; Rafailov et al., 2023) -- yet enjoys the strongest known provable guarantees and promising empirical performance. XPO augments the DPO objective with a novel and principled exploration bonus, empowering the algorithm to explore outside the support of the initial model and human feedback data. In theory, we show that XPO is provably sample-efficient and converges to a near-optimal language model policy under natural exploration conditions, irrespective of whether the initial model has good coverage. Our analysis, which builds on the observation that DPO implicitly performs a form of $Q^{\star}$-approximation (or, Bellman error minimization), combines previously disparate techniques from language modeling and theoretical reinforcement learning in a serendipitous fashion through the perspective of KL-regularized Markov decision processes. Empirically, we find that XPO is more sample-efficient than non-exploratory DPO variants in a preliminary evaluation.
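Schematically, the "one-line change" looks as follows: the usual DPO preference loss plus an exploration bonus. As an assumption of this sketch, the bonus is written as alpha times the policy log-probability of freshly sampled responses, which (when minimized) pushes probability mass away from what the policy already produces; the exact form and sign of the bonus follow the paper. All tensors below are toy placeholders for per-response summed log-probabilities.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
             beta=0.1):
    # Standard DPO: logistic loss on implicit reward margins.
    margins = beta * ((logp_chosen - ref_chosen)
                      - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margins).mean()

def xpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
             logp_sampled, alpha=0.01, beta=0.1):
    # One extra term relative to DPO: an exploration bonus (assumed
    # form) that discourages staying on the policy's current support.
    return (dpo_loss(logp_chosen, logp_rejected,
                     ref_chosen, ref_rejected, beta)
            + alpha * logp_sampled.mean())

B = 4  # toy batch of summed sequence log-probabilities
args = [torch.randn(B) for _ in range(5)]
print(xpo_loss(*args).item())
```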

reinforcement learning exploratory preference optimization kl-regularized mdp q*-approximation reward optimization
Theoretical Physics May 29, 2024

Harmonic $1$-forms on real loci of Calabi-Yau manifolds

Michael R. Douglas, Daniel Platt, Yidi Qi et al.

We numerically study whether there exist nowhere vanishing harmonic $1$-forms on the real locus of some carefully constructed examples of Calabi-Yau manifolds, which would then give rise to potentially new examples of $G_2$-manifolds and an explicit description of their metrics. We do this in two steps: first, we use a neural network to compute an approximate Calabi-Yau metric on each manifold. Second, we use another neural network to compute an approximately harmonic $1$-form with respect to the approximate metric, and then inspect the found solution. On two manifolds existence of a nowhere vanishing harmonic $1$-form can be ruled out using differential geometry. The real locus of a third manifold is diffeomorphic to $S^1 \times S^2$, and our numerics suggest that when the Calabi-Yau metric is close to a singular limit, then it admits a nowhere vanishing harmonic $1$-form. We explain how such an approximate solution could potentially be used in a numerically verified proof for the fact that our example manifold must admit a nowhere vanishing harmonic $1$-form.

calabi-yau metric learning g2-manifold construction physics-informed neural networks string theory monte carlo methods
Experimental Physics May 27, 2024

From Neurons to Neutrons: A Case Study in Interpretability

Ouail Kitouni, Niklas Nolte, Víctor Samuel Pérez-Díaz et al.

Mechanistic Interpretability (MI) promises a path toward fully understanding how neural networks make their predictions. Prior work demonstrates that even when trained to perform simple arithmetic, models can implement a variety of algorithms (sometimes concurrently) depending on initialization and hyperparameters. Does this mean neuron-level interpretability techniques have limited applicability? We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond simply making good predictions. Such representations can be understood through the mechanistic interpretability lens and provide insights that are surprisingly faithful to human-derived domain knowledge. This indicates that such approaches to interpretability can be useful for deriving a new understanding of a problem from models trained to solve it. As a case study, we extract nuclear physics concepts by studying models trained to reproduce nuclear data.

interpretability representation learning mechanistic interpretability dimensionality reduction disentangled representations
Foundational AI May 27, 2024

Survival of the Fittest Representation: A Case Study with Modular Addition

Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud et al.

When a neural network can learn multiple distinct algorithms to solve a task, how does it "choose" between them during training? To approach this question, we take inspiration from ecology: when multiple species coexist, they eventually reach an equilibrium where some survive while others die out. Analogously, we suggest that a neural network at initialization contains many solutions (representations and algorithms), which compete with each other under pressure from resource constraints, with the "fittest" ultimately prevailing. To investigate this Survival of the Fittest hypothesis, we conduct a case study on neural networks performing modular addition, and find that these networks' multiple circular representations at different Fourier frequencies undergo such competitive dynamics, with only a few circles surviving at the end. We find that the frequencies with high initial signals and gradients, the "fittest," are more likely to survive. By increasing the embedding dimension, we also observe more surviving frequencies. Inspired by the Lotka-Volterra equations describing the dynamics between species, we find that the dynamics of the circles can be nicely characterized by a set of linear differential equations. Our results with modular addition show that it is possible to decompose complicated representations into simpler components, along with their basic interactions, to offer insight on the training dynamics of representations.
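
The ecological analogy can be made concrete with a toy competitive Lotka-Volterra system. Note this is a sketch of the inspiration rather than the paper's fitted model (the paper finds that a set of linear differential equations characterizes the measured circle signals); the growth rates and competition matrix below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

# x_k: signal strength of the circular representation at frequency k.
# dx_k/dt = x_k * (r_k - sum_j a_kj x_j): growth minus shared competition.
r = np.array([1.0, 0.8, 0.6])       # "fitness" of each frequency
a = np.ones((3, 3))                  # all frequencies compete equally

sol = solve_ivp(lambda t, x: x * (r - a @ x), (0.0, 60.0),
                y0=np.array([0.05, 0.05, 0.05]))
print(sol.y[:, -1])  # the frequency with the largest r survives; others die out
```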

training dynamics representation learning representation competition interpretability embeddings
Foundational AI May 23, 2024

Improved Distribution Matching Distillation for Fast Image Synthesis

Tianwei Yin, Michaël Gharbi, Taesung Park et al.

Recent approaches have shown promise in distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss computed using a large set of noise-image pairs generated by the teacher with many steps of a deterministic sampler. This is costly for large-scale text-to-image synthesis and limits the student's quality, tying it too closely to the teacher's original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. First, we eliminate the regression loss and the need for expensive dataset construction. We show that the resulting instability is due to the fake critic not estimating the distribution of generated samples accurately and propose a two time-scale update rule as a remedy. Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images. This lets us train the student model on real data, mitigating the imperfect real score estimation from the teacher model, and enhancing quality. Lastly, we modify the training procedure to enable multi-step sampling. We identify and address the training-inference input mismatch problem in this setting, by simulating inference-time generator samples during training time. Taken together, our improvements set new benchmarks in one-step image generation, with FID scores of 1.28 on ImageNet-64x64 and 8.35 on zero-shot COCO 2014, surpassing the original teacher despite a 500$\times$ reduction in inference cost. Further, we show our approach can generate megapixel images by distilling SDXL, demonstrating exceptional visual quality among few-step methods.

distribution matching distillation diffusion models loss function design generative adversarial networks score-based models
Foundational AI May 23, 2024

Not All Language Model Features Are One-Dimensionally Linear

Joshua Engels, Eric J. Michaud, Isaac Liao et al.

Recent work has proposed that language models perform computation by manipulating one-dimensional representations of concepts ("features") in activation space. In contrast, we explore whether some language model representations may be inherently multi-dimensional. We begin by developing a rigorous definition of irreducible multi-dimensional features based on whether they can be decomposed into either independent or non-co-occurring lower-dimensional features. Motivated by these definitions, we design a scalable method that uses sparse autoencoders to automatically find multi-dimensional features in GPT-2 and Mistral 7B. These auto-discovered features include strikingly interpretable examples, e.g. circular features representing days of the week and months of the year. We identify tasks where these exact circles are used to solve computational problems involving modular arithmetic in days of the week and months of the year. Next, we provide evidence that these circular features are indeed the fundamental unit of computation in these tasks with intervention experiments on Mistral 7B and Llama 3 8B, and we examine the continuity of the days of the week feature in Mistral 7B. Overall, our work argues that understanding multi-dimensional features is necessary to mechanistically decompose some model behaviors.

multi-dimensional features representation learning circular representations autoencoders interpretability
Foundational AI May 23, 2024

Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics

Jonas Spinner, Victor Bresó, Pim de Haan et al.

Extracting scientific understanding from particle-physics experiments requires solving diverse learning problems with high precision and good data efficiency. We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. L-GATr represents high-energy data in a geometric algebra over four-dimensional space-time and is equivariant under Lorentz transformations, the symmetry group of relativistic kinematics. At the same time, the architecture is a Transformer, which makes it versatile and scalable to large systems. L-GATr is first demonstrated on regression and classification tasks from particle physics. We then construct the first Lorentz-equivariant generative model: a continuous normalizing flow based on an L-GATr network, trained with Riemannian flow matching. Across our experiments, L-GATr is on par with or outperforms strong domain-specific baselines.
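
The symmetry constraint at the heart of the architecture is easy to state numerically: Lorentz transformations preserve Minkowski inner products, so an equivariant network's pairwise invariants are unchanged under a boost. A minimal check (the boost parameter and four-vectors below are arbitrary):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric, signature (+,-,-,-)

def boost_x(beta):
    # Lorentz boost along the x-axis with velocity beta.
    g = 1.0 / np.sqrt(1.0 - beta ** 2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -g * beta
    return L

p = np.array([5.0, 1.0, 2.0, 0.5])        # (E, px, py, pz)
q = np.array([3.0, 0.2, -1.0, 1.5])
L = boost_x(0.6)
print(p @ eta @ q, (L @ p) @ eta @ (L @ q))   # equal up to roundoff
```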

equivariant neural networks geometric deep learning geometric algebra transformers flow matching
Foundational AI May 23, 2024

Multistable Shape from Shading Emerges from Patch Diffusion

Xinran Nicole Han, Todd Zickler, Ko Nishino

Models for inferring monocular shape of surfaces with diffuse reflection -- shape from shading -- ought to produce distributions of outputs, because there are fundamental mathematical ambiguities of both continuous (e.g., bas-relief) and discrete (e.g., convex/concave) types that are also experienced by humans. Yet, the outputs of current models are limited to point estimates or tight distributions around single modes, which prevent them from capturing these effects. We introduce a model that reconstructs a multimodal distribution of shapes from a single shading image, which aligns with the human experience of multistable perception. We train a small denoising diffusion process to generate surface normal fields from $16\times 16$ patches of synthetic images of everyday 3D objects. We deploy this model patch-wise at multiple scales, with guidance from inter-patch shape consistency constraints. Despite its relatively small parameter count and predominantly bottom-up structure, we show that multistable shape explanations emerge from this model for ambiguous test images that humans experience as being multistable. At the same time, the model produces veridical shape estimates for object-like images that include distinctive occluding contours and appear less ambiguous. This may inspire new architectures for stochastic 3D shape perception that are more efficient and better aligned with human experience.

diffusion models multistable perception shape from shading patch-based diffusion inverse problems
Foundational AI May 23, 2024

How Do Transformers "Do" Physics? Investigating the Simple Harmonic Oscillator

Subhash Kantamneni, Ziming Liu, Max Tegmark

How do transformers model physics? Do transformers model systems with interpretable analytical solutions, or do they create "alien physics" that are difficult for humans to decipher? We take a step in demystifying this larger puzzle by investigating the simple harmonic oscillator (SHO), $\ddot{x}+2\gamma\dot{x}+\omega_0^2x=0$, one of the most fundamental systems in physics. Our goal is to identify the methods transformers use to model the SHO, and to do so we hypothesize and evaluate possible methods by analyzing the encoding of these methods' intermediates. We develop four criteria for the use of a method within the simple testbed of linear regression, where our method is $y = wx$ and our intermediate is $w$: (1) Can the intermediate be predicted from hidden states? (2) Is the intermediate's encoding quality correlated with model performance? (3) Can the majority of variance in hidden states be explained by the intermediate? (4) Can we intervene on hidden states to produce predictable outcomes? Armed with these two correlational (1,2), weak causal (3) and strong causal (4) criteria, we determine that transformers use known numerical methods to model trajectories of the simple harmonic oscillator, specifically the matrix exponential method. Our analysis framework can conveniently extend to high-dimensional linear systems and nonlinear systems, which we hope will help reveal the "world model" hidden in transformers.
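
The matrix exponential method the paper identifies is compact enough to state directly; a hedged sketch with illustrative parameter values:

```python
import numpy as np
from scipy.linalg import expm

# Write the damped SHO x'' + 2*gamma*x' + w0^2 x = 0 in first-order form
# z' = A z with z = (x, v); the exact update is z(t+dt) = expm(A*dt) z(t).
gamma, w0, dt = 0.1, 2.0, 0.05
A = np.array([[0.0, 1.0],
              [-w0**2, -2.0 * gamma]])
step = expm(A * dt)                   # one-step propagator

z = np.array([1.0, 0.0])              # x(0) = 1, v(0) = 0
trajectory = [z]
for _ in range(200):
    z = step @ z
    trajectory.append(z)
trajectory = np.array(trajectory)     # columns: position, velocity
```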

transformers interpretability mechanistic interpretability in-context learning regression
Astrophysics May 21, 2024

Unsupervised Searches for Cosmological Parity Violation: Improving Detection Power with the Neural Field Scattering Transform

Matthew Craigie, Peter L. Taylor, Yuan-Sen Ting et al.

Recent studies using four-point correlations suggest a parity violation in the galaxy distribution, though the significance of these detections is sensitive to the choice of simulation used to model the noise properties of the galaxy distribution. In a recent paper, we introduce an unsupervised learning approach which offers an alternative method that avoids the dependence on mock catalogs, by learning parity violation directly from observational data. However, the Convolutional Neural Network (CNN) model utilized by our previous unsupervised approach struggles to extend to more realistic scenarios where data is limited. We propose a novel method, the Neural Field Scattering Transform (NFST), which enhances the Wavelet Scattering Transform (WST) technique by adding trainable filters, parameterized as a neural field. We first tune the NFST model to detect parity violation in a simplified dataset, then compare its performance against WST and CNN benchmarks across varied training set sizes. We find the NFST can detect parity violation with $4\times$ less data than the CNN and $32\times$ less than the WST. Furthermore, in cases with limited data the NFST can detect parity violation with up to $6\sigma$ confidence, where the WST and CNN fail to make any detection. We find that the added flexibility of the NFST, in particular its ability to learn asymmetric filters, together with the specific symmetries built into its architecture, contributes to its improved performance over the benchmark models. We further demonstrate that the NFST is readily interpretable, which is valuable for physical applications such as the detection of parity violation.

parity violation wavelet scattering transform signal detection neural field symmetry preservation
Experimental Physics May 20, 2024

Resonant Neutrino Flavor Conversion in the Atmosphere

Connor Sponsler, Matheus Hostert, Ivan Martinez-Soler et al.

Neutrinos produced in the atmosphere traverse a column density of air before being detected at neutrino observatories like IceCube or KM3NeT. In this work, we extend the neutrino flavor evolution in the nuSQuIDS code, accounting for the varying height of neutrino production and the variable air density in the atmosphere. These effects can lead to sizeable spectral distortions in standard neutrino oscillations and are crucial to accurately describe some new physics scenarios. As an example, we study a model of quasi-sterile neutrinos that induce resonant flavor conversions at neutrino energies of $\mathcal{O}(300)\text{ MeV}$ in matter densities of $1 \text{ g/cm}^3$. In atmospheric air densities, the same resonance is then realized at neutrino energies of $\mathcal{O}(300\text{--}700)$ GeV. We find that the new resonance can deplete the $\nu_\mu + \bar{\nu}_\mu$ flux at the IceCube Neutrino Observatory by as much as $10\%$ in the direction of the horizon.

msw resonance atmospheric neutrino oscillations neutrino detection quasi-sterile neutrinos new physics searches
Astrophysics May 8, 2024

Diffusion-HMC: Parameter Inference with Diffusion-model-driven Hamiltonian Monte Carlo

Nayantara Mudur, Carolina Cuesta-Lazaro, Douglas P. Finkbeiner

Diffusion generative models have excelled at diverse image generation and reconstruction tasks across fields. A less explored avenue is their application to discriminative tasks involving regression or classification problems. The cornerstone of modern cosmology is the ability to generate predictions for observed astrophysical fields from theory and constrain physical models from observations using these predictions. This work uses a single diffusion generative model to address these interlinked objectives -- as a surrogate model or emulator for cold dark matter density fields conditional on input cosmological parameters, and as a parameter inference model that solves the inverse problem of constraining the cosmological parameters of an input field. The model is able to emulate fields with summary statistics consistent with those of the simulated target distribution. We then leverage the approximate likelihood of the diffusion generative model to derive tight constraints on cosmology by using the Hamiltonian Monte Carlo method to sample the posterior on cosmological parameters for a given test image. Finally, we demonstrate that this parameter inference approach is more robust to small perturbations of noise to the field than baseline parameter inference networks.
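
A minimal sketch of the sampling loop, assuming a stand-in `log_prob` for the diffusion model's approximate log-likelihood of the observed field given parameters (in the real pipeline the gradient would come from automatic differentiation through the model); a Gaussian toy target keeps the example self-contained:

```python
import numpy as np

def log_prob(theta):
    return -0.5 * np.sum(theta ** 2)   # stand-in for the diffusion likelihood

def grad(theta):
    return -theta                       # its gradient (autodiff in practice)

def hmc_step(theta, step_size=0.1, n_leapfrog=20, rng=np.random):
    p = rng.standard_normal(theta.shape)            # resample momentum
    theta_new, p_new = theta.copy(), p.copy()
    p_new += 0.5 * step_size * grad(theta_new)      # leapfrog half-kick
    for _ in range(n_leapfrog - 1):
        theta_new += step_size * p_new
        p_new += step_size * grad(theta_new)
    theta_new += step_size * p_new
    p_new += 0.5 * step_size * grad(theta_new)      # final half-kick
    # Metropolis correction keeps the posterior exact for the given target.
    log_accept = (log_prob(theta_new) - 0.5 * p_new @ p_new
                  - log_prob(theta) + 0.5 * p @ p)
    return theta_new if np.log(rng.uniform()) < log_accept else theta

rng = np.random.default_rng(0)
theta, samples = np.zeros(2), []
for _ in range(500):
    theta = hmc_step(theta, rng=rng)
    samples.append(theta)
```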

diffusion models monte carlo methods posterior estimation score-based models bayesian inference
Foundational AI May 7, 2024

OptPDE: Discovering Novel Integrable Systems via AI-Human Collaboration

Subhash Kantamneni, Ziming Liu, Max Tegmark

Integrable partial differential equation (PDE) systems are of great interest in natural science, but are exceedingly rare and difficult to discover. To solve this, we introduce OptPDE, a first-of-its-kind machine learning approach that Optimizes PDEs' coefficients to maximize their number of conserved quantities, $n_{\rm CQ}$, and thus discover new integrable systems. We discover four families of integrable PDEs, one of which was previously known, and three of which have at least one conserved quantity but are new to the literature to the best of our knowledge. We investigate more deeply the properties of one of these novel PDE families, $u_t = (u_x+a^2u_{xxx})^3$. Our paper offers a promising schema of AI-human collaboration for integrable system discovery: machine learning generates interpretable hypotheses for possible integrable systems, which human scientists can verify and analyze, to truly close the discovery loop.

integrable system discovery conservation laws conserved quantity optimization automated discovery eigenvalue decomposition
Theoretical Physics May 6, 2024

Bounds and Dualities of Type II Little String Theories

Florent Baume, Paul-Konstantin Oehlmann, Fabian Ruehle

We explore the symmetry structure of Type II Little String Theories and their T-dualities. We construct these theories both from the bottom-up perspective starting with seed Superconformal Field Theories, and from the top-down using F-/M-theory. By exploiting anomaly inflow and unitarity of the LST worldsheet theory, we derive strong conditions on the possible 6D bulk theories and their flavor algebras. These constraints continue to apply if gravity is coupled to the theory. We also study the higher form symmetry structure of these theories and show how they get exchanged under T-duality. Finally, we comment on seemingly consistent bottom-up Little String Theories that cannot be constructed from the top-down approach.

little string theory string theory t-duality higher form symmetry conformal field theory
Astrophysics May 3, 2024

A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering Statistics

Beyond-2pt Collaboration, :, Elisabeth Krause et al.

The last few years have seen the emergence of a wide array of novel techniques for analyzing high-precision data from upcoming galaxy surveys, which aim to extend the statistical analysis of galaxy clustering data beyond the linear regime and the canonical two-point (2pt) statistics. We test and benchmark some of these new techniques in a community data challenge "Beyond-2pt", initiated during the Aspen 2022 Summer Program "Large-Scale Structure Cosmology beyond 2-Point Statistics," whose first round of results we present here. The challenge dataset consists of high-precision mock galaxy catalogs for clustering in real space, redshift space, and on a light cone. Participants in the challenge have developed end-to-end pipelines to analyze mock catalogs and extract unknown ("masked") cosmological parameters of the underlying $\Lambda$CDM models with their methods. The methods represented are density-split clustering, nearest neighbor statistics, BACCO power spectrum emulator, void statistics, LEFTfield field-level inference using effective field theory (EFT), and joint power spectrum and bispectrum analyses using both EFT and simulation-based inference. In this work, we review the results of the challenge, focusing on problems solved, lessons learned, and future research needed to perfect the emerging beyond-2pt approaches. The unbiased parameter recovery demonstrated in this challenge by multiple statistics and the associated modeling and inference frameworks supports the credibility of cosmology constraints from these methods. The challenge data set is publicly available and we welcome future submissions from methods that are not yet represented.

beyond-2pt statistics cosmological simulation simulation-based inference effective field theory bayesian inference
Experimental Physics May 3, 2024

Measurement of atmospheric neutrino oscillation parameters using convolutional neural networks with 9.3 years of data in IceCube DeepCore

IceCube Collaboration

The DeepCore sub-detector of the IceCube Neutrino Observatory provides access to neutrinos with energies above approximately 5 GeV. Data taken between 2012 and 2021 (3,387 days) are utilized for an atmospheric $\nu_\mu$ disappearance analysis that studied 150,257 neutrino-candidate events with reconstructed energies between 5 and 100 GeV. An advanced reconstruction based on a convolutional neural network is applied, providing increased signal efficiency and background suppression, resulting in a measurement with both significantly increased statistics compared to previous DeepCore oscillation results and high neutrino purity. For the normal neutrino mass ordering, the atmospheric neutrino oscillation parameters and their $1\sigma$ errors are measured to be $\Delta m^2_{32} = 2.40^{+0.05}_{-0.04} \times 10^{-3}\ \text{eV}^2$ and $\sin^2\theta_{23} = 0.54^{+0.04}_{-0.03}$. The results are the most precise to date using atmospheric neutrinos, and are compatible with measurements from other neutrino detectors including long-baseline accelerator experiments.
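
For orientation, the standard two-flavor vacuum survival probability that drives a $\nu_\mu$ disappearance measurement, evaluated at the reported central values (the analysis itself performs a full fit including matter effects and detector response):

```python
import numpy as np

# P(nu_mu -> nu_mu) = 1 - sin^2(2*theta_23) * sin^2(1.27 * dm2 * L / E),
# with dm2 in eV^2, L in km, E in GeV (the 1.27 absorbs the units).
def survival_prob(E_GeV, L_km, dm2=2.40e-3, sin2_theta23=0.54):
    sin2_2theta = 4.0 * sin2_theta23 * (1.0 - sin2_theta23)
    return 1.0 - sin2_2theta * np.sin(1.27 * dm2 * L_km / E_GeV) ** 2

# Upward-going atmospheric neutrinos cross roughly the Earth's diameter:
print(survival_prob(E_GeV=25.0, L_km=12742.0))
```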

neutrino detection neutrino oscillation parameters convolutional networks event reconstruction classification
Astrophysics May 2, 2024

Multi-filter UV to NIR Data-driven Light Curve Templates for Stripped Envelope Supernovae

Somayeh Khakpash, Federica B. Bianco, Maryam Modjaz et al.

While the spectroscopic classification scheme for stripped-envelope supernovae (SESNe) is clear, and we know that they originate from massive stars that lost some or all of their envelopes of hydrogen and helium, the photometric evolution of classes within this family is not fully characterized. Photometric surveys, like the Vera C. Rubin Legacy Survey of Space and Time, will discover tens of thousands of transients each night, and spectroscopic follow-up will be limited, prompting the need for photometric classification and inference based solely on photometry. We have generated 54 data-driven photometric templates for SESNe of subtypes IIb, Ib, Ic, Ic-bl, and Ibn in U/u, B, g, V, R/r, I/i, J, H, Ks, and Swift w2, m2, w1 bands using Gaussian Processes and a multi-survey dataset composed of all well-sampled open-access light curves (165 SESNe, 29531 data points) from the Open Supernova Catalog. We use our new templates to assess the photometric diversity of SESNe by comparing final per-band subtype templates with each other and with individual, unusual and prototypical SESNe. We find that SNe Ibn and Ic-bl exhibit a distinctly faster rise and decline compared to other subtypes. We also evaluate the behavior of SESNe in the PLAsTiCC and ELAsTiCC simulations of LSST light curves, highlighting differences that can bias photometric classification models trained on the simulated light curves. Finally, we investigate in detail the behavior of fast-evolving SESNe (including SNe Ibn) and the implications of the frequently observed presence of two peaks in their light curves.
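
A minimal sketch of building one such template with a Gaussian Process, on synthetic stand-in photometry; the kernel and hyperparameters below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Stand-in for stacked, normalized photometry of one subtype in one band:
# phase in days from peak, mag with photometric noise.
rng = np.random.default_rng(0)
phase = np.sort(rng.uniform(-20, 60, 80))[:, None]
mag = 0.03 * phase.ravel() - 2.0 * np.exp(-0.5 * (phase.ravel() / 8) ** 2)
mag += rng.normal(0, 0.1, phase.shape[0])

kernel = 1.0 * RBF(length_scale=10.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(phase, mag)

grid = np.linspace(-20, 60, 200)[:, None]
template, sigma = gp.predict(grid, return_std=True)   # mean + uncertainty band
```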

supernova classification multi-band light curve templates gaussian processes classification photometric survey bias
Foundational AI May 2, 2024

Learning Force Control for Legged Manipulation

Tifanny Portela, Gabriel B. Margolis, Yandong Ji et al.

Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.
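
The compliant behaviors mentioned above follow a standard impedance law; a hedged sketch with illustrative gains (the paper's policy learns to realize commanded forces without force sensing, rather than executing this law directly):

```python
import numpy as np

# Commanded end-effector force from a spring-damper law plus gravity
# compensation: F = K (x_des - x) - D v + g_comp.
def impedance_force(x, v, x_des, K, D, g_comp):
    return K @ (x_des - x) - D @ v + g_comp

K = np.diag([80.0, 80.0, 120.0])            # stiffness (N/m), illustrative
D = np.diag([12.0, 12.0, 18.0])             # damping (N s/m), illustrative
g_comp = np.array([0.0, 0.0, 9.81 * 1.5])   # cancel a hypothetical 1.5 kg payload

F = impedance_force(x=np.zeros(3), v=np.zeros(3),
                    x_des=np.array([0.1, 0.0, 0.2]), K=K, D=D, g_comp=g_comp)
```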

reinforcement learning contact force regulation whole-body loco-manipulation sim-to-real transfer reward optimization
Astrophysics May 1, 2024

Introducing the DREAMS Project: DaRk mattEr and Astrophysics with Machine learning and Simulations

Jonah C. Rose, Paul Torrey, Francisco Villaescusa-Navarro et al.

We introduce the DREAMS project, an innovative approach to understanding the astrophysical implications of alternative dark matter models and their effects on galaxy formation and evolution. The DREAMS project will ultimately comprise thousands of cosmological hydrodynamic simulations that simultaneously vary over dark matter physics, astrophysics, and cosmology in modeling a range of systems -- from galaxy clusters to ultra-faint satellites. Such extensive simulation suites can provide adequate training sets for machine-learning-based analyses. This paper introduces two new cosmological hydrodynamical suites of Warm Dark Matter, each comprised of 1024 simulations generated using the Arepo code. One suite consists of uniform-box simulations covering a $(25~h^{-1}~\mathrm{Mpc})^3$ volume, while the other consists of Milky Way zoom-ins with sufficient resolution to capture the properties of classical satellites. For each simulation, the Warm Dark Matter particle mass is varied along with the initial density field and several parameters controlling the strength of baryonic feedback within the IllustrisTNG model. We provide two examples, separately utilizing emulators and Convolutional Neural Networks, to demonstrate how such simulation suites can be used to disentangle the effects of dark matter and baryonic physics on galactic properties. The DREAMS project can be extended further to include different dark matter models, galaxy formation physics, and astrophysical targets. In this way, it will provide an unparalleled opportunity to characterize uncertainties on predictions for small-scale observables, leading to robust predictions for testing the particle physics nature of dark matter on these scales.

dark matter cosmological simulation warm dark matter models baryonic feedback emulation
Foundational AI Apr 30, 2024

KAN: Kolmogorov-Arnold Networks

Ziming Liu, Yixuan Wang, Sachin Vaidya et al.

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
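
A minimal sketch of the core architectural change, with a fixed Gaussian basis standing in for the B-splines used in practice:

```python
import torch

class KANLayer(torch.nn.Module):
    """KAN-style layer: each edge (i -> j) carries a learnable univariate
    function phi_ji, and each output sums its incoming edge functions.
    Real KANs parameterize the edges with splines; a Gaussian basis with
    learnable coefficients is used here for brevity."""
    def __init__(self, in_dim, out_dim, n_basis=8, x_range=2.0):
        super().__init__()
        self.centers = torch.linspace(-x_range, x_range, n_basis)
        self.width = 2.0 * x_range / (n_basis - 1)
        # One coefficient vector per edge: (out_dim, in_dim, n_basis).
        self.coef = torch.nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))

    def forward(self, x):                       # x: (batch, in_dim)
        z = (x[..., None] - self.centers) / self.width
        basis = torch.exp(-z ** 2)              # (batch, in_dim, n_basis)
        # phi_ji(x_i) = sum_b coef[j,i,b] * basis_b(x_i), then sum over i.
        return torch.einsum('bin,oin->bo', basis, self.coef)

model = torch.nn.Sequential(KANLayer(2, 5), KANLayer(5, 1))
y = model(torch.randn(16, 2))                   # (16, 1)
```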

kolmogorov-arnold networks interpretability spline activation functions automated discovery regression
Astrophysics Apr 29, 2024

SN 2024ggi in NGC 3621: Rising Ionization in a Nearby, CSM-Interacting Type II Supernova

W. V. Jacobson-Galán, K. W. Davis, C. D. Kilpatrick et al.

We present UV/optical/NIR observations and modeling of supernova (SN) 2024ggi, a type II supernova (SN II) located in NGC 3621 at 7.2 Mpc. Early-time ("flash") spectroscopy of SN 2024ggi within +0.8 days of discovery shows emission lines of H I, He I, C III, and N III with a narrow core and broad, symmetric wings (i.e., IIn-like) arising from the photoionized, optically-thick, unshocked circumstellar material (CSM) that surrounded the progenitor star at shock breakout. By the next spectral epoch at +1.5 days, SN 2024ggi showed a rise in ionization as emission lines of He II, C IV, N IV/V and O V became visible. This phenomenon is temporally consistent with a blueward shift in the UV/optical colors, both likely the result of shock breakout in an extended, dense CSM. The IIn-like features in SN 2024ggi persist on a timescale of $t_{\rm IIn} = 3.8 \pm 1.6$ days, at which time a reduction in CSM density allows the detection of Doppler-broadened features from the fastest SN material. SN 2024ggi has peak UV/optical absolute magnitudes of $M_{\rm w2} = -18.7$ mag and $M_{\rm g} = -18.1$ mag that are consistent with the known population of CSM-interacting SNe II. Comparison of SN 2024ggi with a grid of radiation hydrodynamics and non-local thermodynamic equilibrium (nLTE) radiative-transfer simulations suggests a progenitor mass-loss rate of $\dot{M} = 10^{-2}~\mathrm{M}_\odot~\mathrm{yr}^{-1}$ ($v_w = 50$ km/s), confined to a distance of $r < 5\times 10^{14}$ cm. Assuming a wind velocity of $v_w = 50$ km/s, the progenitor star underwent an enhanced mass-loss episode in the last ~3 years before explosion.

flash spectroscopy csm interaction shock breakout stellar evolution simulation-based inference
Theoretical Physics Apr 24, 2024

Position-space renormalization schemes for four-quark operators in HQET

Joshua Lin, William Detmold, Stefan Meinel

X-space schemes are gauge-invariant, regulator-independent renormalization schemes that are defined by requiring position-space correlation functions of gauge-invariant operators to be equal to their noninteracting values at particular kinematic points. These schemes can be used to nonperturbatively renormalize composite operators in Lattice Quantum Chromodynamics (LQCD), and by computing matching coefficients between the X-space scheme and $\overline{\mathrm{MS}}$ in the dimensionally-regulated continuum, matrix elements calculated with LQCD can be converted to $\overline{\mathrm{MS}}$-renormalized matrix elements. Using X-space schemes for Heavy Quark Effective Theory (HQET) operators has the additional benefit that appropriate ratios of position-space correlation functions cancel the power-divergent static-quark self-energy of Lattice HQET nonperturbatively. This work presents the $\mathcal{O}(\alpha_s)$ matching coefficients between X-space renormalized four-quark flavor-nonsinglet HQET operators relevant for the lifetimes of charm- and bottom-hadrons, and four-quark HQET operators relevant for mixing between neutral mesons containing a heavy quark, such as $B$-$\bar{B}$ mixing.

renormalization x-space renormalization lattice qcd effective field theory lattice gauge theory
Astrophysics Apr 21, 2024

Learning Galaxy Intrinsic Alignment Correlations

Sneh Pandya, Yuanyuan Yang, Nicholas Van Alfen et al.

The intrinsic alignments (IA) of galaxies, regarded as a contaminant in weak lensing analyses, represent the correlation of galaxy shapes due to gravitational tidal interactions and galaxy formation processes. As such, understanding IA is paramount for accurate cosmological inferences from weak lensing surveys; however, one limitation to our understanding and mitigation of IA is expensive simulation-based modeling. In this work, we present a deep learning approach to emulate galaxy position-position ($\xi$), position-orientation ($\omega$), and orientation-orientation ($\eta$) correlation function measurements and uncertainties from halo occupation distribution-based mock galaxy catalogs. We find strong Pearson correlation values with the model across all three correlation functions and further predict aleatoric uncertainties through a mean-variance estimation training procedure. $\xi(r)$ predictions are generally accurate to $\leq10\%$. Our model also successfully captures the underlying signal of the noisier correlations $\omega(r)$ and $\eta(r)$, although with a lower average accuracy. We find that the model performance is inhibited by the stochasticity of the data, and will benefit from correlations averaged over multiple data realizations. Our code will be made open source upon journal publication.
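
The mean-variance estimation procedure amounts to training with a Gaussian negative log-likelihood and a learned variance head; a minimal sketch with illustrative shapes:

```python
import torch

# The network outputs both a predicted correlation function and a
# log-variance; the Gaussian NLL lets the variance head learn the
# aleatoric uncertainty of the measurements.
def mve_loss(pred_mean, pred_logvar, target):
    inv_var = torch.exp(-pred_logvar)
    return 0.5 * (inv_var * (target - pred_mean) ** 2 + pred_logvar).mean()

pred_mean = torch.randn(32, 20)      # e.g. xi(r) in 20 radial bins
pred_logvar = torch.zeros(32, 20)    # second network head
target = torch.randn(32, 20)
loss = mve_loss(pred_mean, pred_logvar, target)
```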

intrinsic alignment emulation surrogate modeling uncertainty quantification halo occupation distribution
Foundational AI Apr 19, 2024

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.

Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.

video prior distillation diffusion models score-based models physics-informed neural networks inverse problems
Theoretical Physics Apr 18, 2024

Constraints on the finite volume two-nucleon spectrum at $m_\pi \approx 806$ MeV

William Detmold, Marc Illa, William I. Jay et al.

The low-energy finite-volume spectrum of the two-nucleon system at a quark mass corresponding to a pion mass of $m_\pi \approx 806$ MeV is studied with lattice quantum chromodynamics (LQCD) using variational methods. The interpolating-operator sets used in [Phys. Rev. D 107 (2023) 9, 094508] are extended by including a complete basis of local hexaquark operators, as well as plane-wave dibaryon operators built from products of both positive- and negative-parity nucleon operators. Results are presented for the isosinglet and isotriplet two-nucleon channels. In both channels, noticeably weaker variational bounds on the lowest few energy eigenvalues are obtained from operator sets which contain only hexaquark operators or operators constructed from the product of two negative-parity nucleons, while other operator sets produce low-energy variational bounds which are consistent within statistical uncertainties. The consequences of these studies for the LQCD understanding of the two-nucleon spectrum are investigated.
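
Numerically, the variational method reduces to a generalized eigenvalue problem on the correlation matrix $C_{ab}(t) = \langle O_a(t) O_b^\dagger(0) \rangle$; a toy sketch with known energies standing in for lattice data:

```python
import numpy as np
from scipy.linalg import eigh

def gevp_energies(C_t, C_t0, t, t0):
    # Solve C(t) v = lambda C(t0) v; the eigenvalues behave as
    # lambda_n = exp(-E_n (t - t0)) for the low-lying spectrum.
    lam = np.sort(eigh(C_t, C_t0, eigvals_only=True))[::-1]
    return -np.log(lam) / (t - t0)

rng = np.random.default_rng(1)
V = rng.standard_normal((4, 4))                 # operator overlaps (toy)
E_true = np.array([1.0, 1.4, 1.9, 2.5])
C = lambda t: V @ np.diag(np.exp(-E_true * t)) @ V.T
print(gevp_energies(C(5.0), C(2.0), t=5.0, t0=2.0))   # recovers E_true
```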

lattice qcd finite-volume spectroscopy variational operator basis eigenvalue decomposition hexaquark operators
Theoretical Physics Apr 17, 2024

Practical applications of machine-learned flows on gauge fields

Ryan Abbott, Michael S. Albergo, Denis Boyda et al.

Normalizing flows are machine-learned maps between different lattice theories which can be used as components in exact sampling and inference schemes. Ongoing work yields increasingly expressive flows on gauge fields, but it remains an open question how flows can improve lattice QCD at state-of-the-art scales. We discuss and demonstrate two applications of flows in replica exchange (parallel tempering) sampling, aimed at improving topological mixing, which are viable with iterative improvements upon presently available flows.
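
For reference, the vanilla replica-exchange swap test that the flow-assisted version improves upon (a learned flow transports configurations between ensembles before the test, raising acceptance; the couplings and actions below are illustrative):

```python
import numpy as np

# Metropolis swap between ensembles with couplings beta_i, beta_j and
# sampled actions S_i, S_j: accept with min(1, exp[(b_i - b_j)(S_i - S_j)]).
def swap_accepted(beta_i, beta_j, S_i, S_j, rng):
    log_acc = (beta_i - beta_j) * (S_i - S_j)
    return np.log(rng.uniform()) < log_acc

rng = np.random.default_rng(0)
print(swap_accepted(beta_i=5.5, beta_j=5.6, S_i=102.3, S_j=98.7, rng=rng))
```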

normalizing flows lattice qcd lattice gauge theory parallel tempering monte carlo methods
Theoretical Physics Apr 16, 2024

Multiscale Normalizing Flows for Gauge Theories

Ryan Abbott, Michael S. Albergo, Denis Boyda et al.

Scale separation is an important physical principle that has previously enabled algorithmic advances such as multigrid solvers. Previous work on normalizing flows has been able to utilize scale separation in the context of scalar field theories, but the principle has been largely unexploited in the context of gauge theories. This work gives an overview of a new method for generating gauge fields using hierarchical normalizing flow models. This method builds gauge fields from the outside in, allowing different parts of the model to focus on different scales of the problem. Numerical results are presented for $U(1)$ and $SU(3)$ gauge theories in 2, 3, and 4 spacetime dimensions.

normalizing flows lattice gauge theory multiscale hierarchy generative models monte carlo methods
Theoretical Physics Apr 16, 2024

TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision

Zhuo Chen, Jacob McCarran, Esteban Vizcaino et al.

Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities, though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the $\textit{Time-Evolving Natural Gradient (TENG)}$, generalizing time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG's effectiveness is further validated through its performance, surpassing current leading methods and achieving $\textit{machine precision}$ in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers' equation.
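
A linear-in-parameters toy of a TENG-Euler-style step for the heat equation $u_t = u_{xx}$: with $u(x;\theta) = \sum_k \theta_k \phi_k(x)$, the natural-gradient update reduces to a least-squares solve at collocation points. The basis, grid, and step size are illustrative assumptions, not the paper's setup:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 64)                    # collocation points
centers = np.linspace(0.0, 1.0, 12)              # Gaussian basis centers
s = 0.08
phi = np.exp(-0.5 * ((x[:, None] - centers) / s) ** 2)            # Jacobian J
phi_xx = phi * (((x[:, None] - centers) ** 2 / s ** 4) - 1.0 / s ** 2)

theta = np.exp(-0.5 * ((centers - 0.5) / 0.15) ** 2)  # initial bump profile
dt = 1e-4
for _ in range(100):
    rhs = phi_xx @ theta                  # u_xx at the collocation points
    theta_dot, *_ = np.linalg.lstsq(phi, rhs, rcond=None)  # J theta_dot = u_xx
    theta = theta + dt * theta_dot        # explicit (Euler) time step

u = phi @ theta                           # approximate u(x, t = 0.01)
```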

natural gradient optimization sequential-in-time optimization time-dependent variational principle neural operators loss function design
Theoretical Physics Apr 2, 2024

The Frozen Phase of Heterotic F-theory Duality

Paul-Konstantin Oehlmann, Fabian Ruehle, Benjamin Sung

We study the duality between the Spin$(32)/\mathbb{Z}_2$ heterotic string without vector structure and F-theory with frozen singularities. We give a complete description in theories with $6$d $\mathcal{N}=(1,0)$ supersymmetry and identify the duals of Spin$(32)/\mathbb{Z}_2$-instantons on ADE singularities without vector structure in the frozen phase of F-theory using an ansatz introduced by Bhardwaj, Morrison, Tachikawa, and Tomasiello. As a consequence, we obtain a strongly coupled description of orbifold phases of type I string theory without vector structure, substantially expanding the list of known examples of $6$d F-theory compactifications with frozen singularities. Supergravity theories can be fused from these instanton theories, in a way that commutes with switching off vector structure, which we use to propose new consistency checks via neutral hypermultiplet counting. Finally, we describe various Higgsings of this duality, and comment on constraints on higher form symmetries.

string theory frozen singularities heterotic-f-theory duality vector structure quantum field theory
Foundational AI Apr 1, 2024

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey et al.

The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, where an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models are fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.
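
The replace-versus-accumulate distinction fits in a few lines for the simplest possible setting, repeatedly fitting a one-dimensional Gaussian and sampling the next generation from the fit:

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.standard_normal(200)

replace, accumulate = real.copy(), real.copy()
for generation in range(100):
    # Replace: each generation is fit to (and replaces) the previous one.
    replace = rng.normal(replace.mean(), replace.std(), size=200)
    # Accumulate: fit to the real data plus every synthetic generation so far.
    synth = rng.normal(accumulate.mean(), accumulate.std(), size=200)
    accumulate = np.concatenate([accumulate, synth])

print(replace.std())      # random-walks away from 1 over generations (collapse)
print(accumulate.std())   # stays near 1 (bounded error)
```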

generative models model collapse synthetic data accumulation model-data feedback loops transformers
Astrophysics Apr 1, 2024

Anomaly Detection and Approximate Similarity Searches of Transients in Real-time Data Streams

P. D. Aleo, A. W. Engel, G. Narayan et al.

We present LAISS (Lightcurve Anomaly Identification and Similarity Search), an automated pipeline to detect anomalous astrophysical transients in real-time data streams. We deploy our anomaly detection model on the nightly ZTF Alert Stream via the ANTARES broker, identifying a manageable $\sim$1-5 candidates per night for expert vetting and coordinating follow-up observations. Our method leverages statistical light-curve and contextual host-galaxy features within a random forest classifier, tagging transients of rare classes (spectroscopic anomalies), of uncommon host-galaxy environments (contextual anomalies), and of peculiar or interaction-powered phenomena (behavioral anomalies). Moreover, we demonstrate the power of a low-latency ($\sim$ms) approximate similarity search method to find transient analogs with similar light-curve evolution and host-galaxy environments. We use analogs for data-driven discovery, characterization, (re-)classification, and imputation in retrospective and real-time searches. To date we have identified $\sim$50 previously known and previously missed rare transients from real-time and retrospective searches, including but not limited to: SLSNe, TDEs, SNe IIn, SNe IIb, SNe Ia-CSM, SNe Ia-91bg-like, SNe Ib, SNe Ic, SNe Ic-BL, and M31 novae. Lastly, we report the discovery of 325 total transients, all observed between 2018-2021 and absent from public catalogs ($\sim$1% of all ZTF Astronomical Transient reports to the Transient Name Server through 2021). These methods enable a systematic approach to finding the "needle in the haystack" in large-volume data streams. Because of its integration with the ANTARES broker, LAISS is built to detect exciting transients in Rubin data.

anomaly detection approximate similarity search ensemble methods supernova classification feature extraction
Foundational AI Apr 1, 2024

Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing

Ri-Zhao Qiu, Ge Yang, Weijia Zeng et al.

Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, that enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline, to illustrate the challenge and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language. Project website: https://feature-splatting.github.io/

representation learning 3d gaussian splatting feature extraction language-grounded scene editing embeddings
Foundational AI Mar 28, 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Samuel Marks, Can Rager, Eric J. Michaud et al.

We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on fine-grained units, sparse feature circuits are useful for downstream tasks: We introduce SHIFT, where we improve the generalization of a classifier by ablating features that a human judges to be task-irrelevant. Finally, we demonstrate an entirely unsupervised and scalable interpretability pipeline by discovering thousands of sparse feature circuits for automatically discovered model behaviors.

interpretability causal circuit discovery autoencoders sparse models feature attribution
Astrophysics Mar 27, 2024

A machine-learning pipeline for real-time detection of gravitational waves from compact binary coalescences

Ethan Marx, William Benoit, Alec Gunny et al.

The promise of multi-messenger astronomy relies on the rapid detection of gravitational waves at very low latencies ($\mathcal{O}(1\,\mathrm{s})$) in order to maximize the amount of time available for follow-up observations. In recent years, neural networks have demonstrated robust non-linear modeling capabilities and millisecond-scale inference at a comparatively small computational footprint, making them an attractive family of algorithms in this context. However, integration of these algorithms into the gravitational-wave astrophysics research ecosystem has proven non-trivial. Here, we present the first fully machine-learning-based pipeline for the detection of gravitational waves from compact binary coalescences (CBCs) running at low latency. We demonstrate this pipeline to have a fraction of the latency of traditional matched-filtering search pipelines while achieving state-of-the-art sensitivity to higher-mass stellar binary black holes.

gravitational waves convolutional networks signal detection low-latency gw pipeline inference-as-a-service
Foundational AI Mar 26, 2024

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

How is knowledge stored in an LLM's weights? We study this via layer pruning: if removing a certain layer does not affect model performance in common question-answering benchmarks, then the weights in that layer are not necessary for storing the knowledge needed to answer those questions. To find these unnecessary parameters, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage, we perform a small amount of finetuning. Surprisingly, with this method we find minimal degradation of performance until after a large fraction (up to half) of the layers are removed for some common open-weight models. From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge. For our study, we use parameter-efficient finetuning (PEFT) methods, specifically quantization and Low Rank Adapters (QLoRA), such that each of our experiments can be performed on a single 40GB A100 GPU.
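
A sketch of the block-selection heuristic: score each candidate block of $n$ consecutive layers by the angular distance between the representations entering and leaving it, and prune the most redundant block (random activations stand in for the model's hidden states here):

```python
import numpy as np

def angular_distance(a, b):
    # Mean angle between token representations, normalized to [0, 1].
    cos = np.sum(a * b, axis=-1) / (np.linalg.norm(a, axis=-1)
                                    * np.linalg.norm(b, axis=-1))
    return np.arccos(np.clip(cos, -1.0, 1.0)).mean() / np.pi

def best_block_to_prune(acts, n):
    # acts: (layers, tokens, hidden_dim); compare layer l with layer l + n.
    scores = [angular_distance(acts[l], acts[l + n])
              for l in range(acts.shape[0] - n)]
    return int(np.argmin(scores))   # most similar in/out -> most prunable

acts = np.random.default_rng(0).standard_normal((32, 128, 64))
print(best_block_to_prune(acts, n=8))
```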

layer pruning fine-tuning transformers knowledge localization representation learning
Astrophysics Mar 15, 2024

Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies with CAMELS

Victoria Ono, Core Francisco Park, Nayantara Mudur et al.

Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter components that cannot be directly observed. Galaxy formation simulations can be used to study the relationship between dark matter density fields and galaxy distributions. However, this relationship can be sensitive to assumptions in cosmology and astrophysical processes embedded in the galaxy formation models, which remain uncertain in many respects. In this work, we develop a diffusion generative model to reconstruct dark matter fields from galaxies. The diffusion model is trained on the CAMELS simulation suite that contains thousands of state-of-the-art galaxy formation simulations with varying cosmological parameters and sub-grid astrophysics. We demonstrate that the diffusion model can predict the unbiased posterior distribution of the underlying dark matter fields from the given stellar mass fields, while being able to marginalize over uncertainties in cosmological and astrophysical models. Interestingly, the model generalizes to simulation volumes approximately 500 times larger than those it was trained on, and across different galaxy formation models. Code for reproducing these results can be found at https://github.com/victoriaono/variational-diffusion-cdm

diffusion models posterior estimation dark matter cosmological simulation uncertainty quantification
Foundational AI Mar 15, 2024

FeatUp: A Model-Agnostic Framework for Features at Any Resolution

Stephanie Fu, Mark Hamilton, Laura Brandt et al.

Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution. Both approaches use a multi-view consistency loss with deep analogies to NeRFs. Our features retain their original semantics and can be swapped into existing applications to yield resolution and performance gains even without re-training. We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation.

feature extraction superresolution multi-view consistency representation learning loss function design
Theoretical Physics Mar 13, 2024

Moments of Clarity: Streamlining Latent Spaces in Machine Learning using Moment Pooling

Rikab Gambhir, Athis Osathapan, Jesse Thaler

Many machine learning applications involve learning a latent representation of data, which is often high-dimensional and difficult to directly interpret. In this work, we propose "Moment Pooling", a natural extension of Deep Sets networks which drastically decreases the latent space dimensionality of these networks while maintaining or even improving performance. Moment Pooling generalizes the summation in Deep Sets to arbitrary multivariate moments, which enables the model to achieve a much higher effective latent dimensionality for a fixed latent dimension. We demonstrate Moment Pooling on the collider physics task of quark/gluon jet classification by extending Energy Flow Networks (EFNs) to Moment EFNs. We find that Moment EFNs with latent dimensions as small as 1 perform similarly to ordinary EFNs with higher latent dimension. This small latent dimension allows for the internal representation to be directly visualized and interpreted, which in turn enables the learned internal jet representation to be extracted in closed form.
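
A minimal sketch of the pooling change relative to Deep Sets, truncated at second multivariate moments (the exact moment construction here is an illustrative reading of the idea):

```python
import torch

def moment_pool(phi, order=2):       # phi: (batch, particles, latent)
    first = phi.mean(dim=1)           # ordinary Deep Sets pooling
    moments = [first]
    if order >= 2:
        # Second multivariate moments <phi_i phi_j>, flattened per event.
        second = torch.einsum('bpi,bpj->bij', phi, phi) / phi.shape[1]
        moments.append(second.flatten(1))
    return torch.cat(moments, dim=1)

phi = torch.randn(8, 50, 3)           # e.g. latent dimension 3 per particle
pooled = moment_pool(phi)             # (8, 3 + 9): higher effective dimension
```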

moment pooling dimensionality reduction representation learning interpretability jet physics
Astrophysics Mar 13, 2024

PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models

Siddharth Mishra-Sharma, Yiding Song, Jesse Thaler

We present PAPERCLIP (Proposal Abstracts Provide an Effective Representation for Contrastive Language-Image Pre-training), a method which associates astronomical observations imaged by telescopes with natural language using a neural network model. The model is fine-tuned from a pre-trained Contrastive Language-Image Pre-training (CLIP) model using successful observing proposal abstracts and corresponding downstream observations, with the abstracts optionally summarized via guided generation using large language models (LLMs). Using observations from the Hubble Space Telescope (HST) as an example, we show that the fine-tuned model embodies a meaningful joint representation between observations and natural language through tests targeting image retrieval (i.e., finding the most relevant observations using natural language queries) and description retrieval (i.e., querying for astrophysical object classes and use cases most relevant to a given observation). Our study demonstrates the potential for using generalist foundation models rather than task-specific models for interacting with astronomical data by leveraging text as an interface.
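
The fine-tuning objective is the symmetric CLIP-style contrastive loss; a minimal sketch with random stand-ins for the image and text encoder outputs:

```python
import torch
import torch.nn.functional as F

# Match each observation embedding to its abstract embedding against
# in-batch negatives; matched pairs sit on the diagonal of the logits.
def clip_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature            # (batch, batch)
    labels = torch.arange(logits.shape[0])
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.T, labels))

loss = clip_loss(torch.randn(16, 512), torch.randn(16, 512))
```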

contrastive learning fine-tuning representation learning multi-modal foundation model embeddings
Astrophysics Mar 12, 2024

Superphot+: Realtime Fitting and Classification of Supernova Light Curves

Kaylee M. de Soto, Ashley Villar, Edo Berger et al.

Photometric classifications of supernova (SN) light curves have become necessary to utilize the full potential of large samples of observations obtained from wide-field photometric surveys, such as the Zwicky Transient Facility (ZTF) and the Vera C. Rubin Observatory. Here, we present a photometric classifier for SN light curves that does not rely on redshift information and still maintains comparable accuracy to redshift-dependent classifiers. Our new package, Superphot+, uses a parametric model to extract meaningful features from multiband SN light curves. We train a gradient-boosted machine with fit parameters from 6,061 ZTF SNe that pass data quality cuts and are spectroscopically classified as one of five classes: SN Ia, SN II, SN Ib/c, SN IIn, and SLSN-I. Without redshift information, our classifier yields a class-averaged F1-score of 0.61 +/- 0.02 and a total accuracy of 0.83 +/- 0.01. Including redshift information improves these metrics to 0.71 +/- 0.02 and 0.88 +/- 0.01, respectively. We assign new class probabilities to 3,558 ZTF transients that show SN-like characteristics (based on the ALeRCE Broker light curve and stamp classifiers), but lack spectroscopic classifications. Finally, we compare our predicted SN labels with those generated by the ALeRCE light curve classifier, finding that the two classifiers agree on photometric labels for 82 +/- 2% of light curves with spectroscopic labels and 72% of light curves without spectroscopic labels. Superphot+ is currently classifying ZTF SNe in real time via the ANTARES Broker, and is designed for simple adaptation to six-band Rubin light curves in the future.

supernova classification classification feature extraction real-time transient classification ensemble methods
Experimental Physics Mar 11, 2024

Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models

Philip Harris, Michael Kagan, Jeffrey Krupa et al.

Self-Supervised Learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and downstream tasks required. We propose RS3L ("Re-simulation-based self-supervised representation learning"), a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation for contrastive learning in the physical sciences, particularly, in fields that rely on stochastic simulators. By intervening in the middle of the simulation process and re-running simulation components downstream of the intervention, we generate multiple realizations of an event, thus producing a set of augmentations covering all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pre-training enables powerful performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.

self-supervised learning re-simulation augmentation contrastive learning data augmentation representation learning
Theoretical Physics Mar 11, 2024

On classical de Sitter solutions and parametric control

David Andriot, Fabian Ruehle

Finding string backgrounds with de Sitter spacetime, where all approximations and corrections are controlled, is an open problem. We revisit the search for de Sitter solutions in the classical regime for specific type IIB supergravity compactifications on group manifolds, an under-explored corner of the landscape that offers an interesting testing ground for swampland conjectures. While the supergravity de Sitter solutions we obtain numerically are ambiguous in terms of their classicality, we find an analytic scaling that makes four out of six compactification radii, as well as the overall volume, arbitrarily large. This potentially provides parametric control over corrections. If we could show that these solutions, or others to be found, are fully classical, they would constitute a counterexample to conjectures stating that asymptotic de Sitter solutions do not exist. We discuss this point in great detail.

de sitter solutions parametric control string theory flux compactification effective field theory
Theoretical Physics Mar 7, 2024

Photonic probabilistic machine learning using quantum vacuum noise

Seou Choi, Yannick Salamin, Charles Roques-Carmes et al.

Probabilistic machine learning utilizes controllable sources of randomness to encode uncertainty and enable statistical modeling. Harnessing the pure randomness of quantum vacuum noise, which stems from fluctuating electromagnetic fields, has shown promise for high speed and energy-efficient stochastic photonic elements. Nevertheless, photonic computing hardware which can control these stochastic elements to program probabilistic machine learning algorithms has been limited. Here, we implement a photonic probabilistic computer consisting of a controllable stochastic photonic element - a photonic probabilistic neuron (PPN). Our PPN is implemented in a bistable optical parametric oscillator (OPO) with vacuum-level injected bias fields. We then program a measurement-and-feedback loop for time-multiplexed PPNs with electronic processors (FPGA or GPU) to solve certain probabilistic machine learning tasks. We showcase probabilistic inference and image generation of MNIST-handwritten digits, which are representative examples of discriminative and generative models. In both implementations, quantum vacuum noise is used as a random seed to encode classification uncertainty or probabilistic generation of samples. In addition, we propose a path towards an all-optical probabilistic computing platform, with an estimated sampling rate of ~ 1 Gbps and energy consumption of ~ 5 fJ/MAC. Our work paves the way for scalable, ultrafast, and energy-efficient probabilistic machine learning hardware.

photonic probabilistic neuron optical parametric oscillator stochastic processes generative models bayesian inference
Theoretical Physics Mar 5, 2024

Operator Learning Renormalization Group

Xiu-Zhe Luo, Di Luo, Roger G. Melko

In this paper, we present a general framework for quantum many-body simulations called the operator learning renormalization group (OLRG). Inspired by machine learning perspectives, OLRG is a generalization of Wilson's numerical renormalization group and White's density matrix renormalization group, which recursively builds a simulatable system to approximate a target system of the same number of sites via operator maps. OLRG uses a loss function to minimize the error of a target property directly by learning the operator map in lieu of a state ansatz. This loss function is designed by a scaling consistency condition that also provides a provable bound for real-time evolution. We implement two versions of the operator maps for classical and quantum simulations. The former, which we call the Operator Matrix Map, can be implemented via neural networks on classical computers. The latter, which we call the Hamiltonian Expression Map, generates device pulse sequences to leverage the capabilities of quantum computing hardware. We illustrate the performance of both maps for calculating time-dependent quantities in the quantum Ising model Hamiltonian.

renormalization operator map learning quantum simulation scaling consistency condition loss function design
Foundational AI Mar 5, 2024

Quantum Many-Body Physics Calculations with Large Language Models

Haining Pan, Nayantara Mudur, Will Taranto et al.

Large language models (LLMs) have demonstrated an unprecedented ability to perform complex tasks in multiple domains, including mathematical and scientific reasoning. We demonstrate that with carefully designed prompts, LLMs can accurately carry out key calculations in research papers in theoretical physics. We focus on a broadly used approximation method in quantum physics: the Hartree-Fock method, requiring an analytic multi-step calculation deriving approximate Hamiltonian and corresponding self-consistency equations. To carry out the calculations using LLMs, we design multi-step prompt templates that break down the analytic calculation into standardized steps with placeholders for problem-specific information. We evaluate GPT-4's performance in executing the calculation for 15 research papers from the past decade, demonstrating that, with correction of intermediate steps, it can correctly derive the final Hartree-Fock Hamiltonian in 13 cases and makes minor errors in 2 cases. Aggregating across all research papers, we find an average score of 87.5 (out of 100) on the execution of individual calculation steps. Overall, the requisite skill for doing these calculations is at the graduate level in quantum condensed matter theory. We further use LLMs to mitigate the two primary bottlenecks in this evaluation process: (i) extracting information from papers to fill in templates and (ii) automatic scoring of the calculation steps, demonstrating good results in both cases. The strong performance is the first step for developing algorithms that automatically explore theoretical hypotheses at an unprecedented scale.

hartree-fock method prompt engineering hamiltonian systems transformers llm scientific reasoning
Experimental Physics Feb 29, 2024

New Pathways in Neutrino Physics via Quantum-Encoded Data Analysis

Jeffrey Lazar, Santiago Giner Olavarrieta, Giancarlo Gatti et al.

An ever-increasing amount of data is produced by particle detectors in their quest to unveil the laws of Nature. The large data rate requires the use of specialized triggers that promptly reduce it to a manageable level; however, in doing so, unexpected new phenomena may escape detection. Additionally, the large data rate is increasingly difficult to analyze effectively, which has led to a recent revolution in machine learning techniques. Here, we present a methodology based on recent quantum compression techniques that has the capacity to store exponentially more information than classically available methods. To demonstrate this, we encode the full neutrino telescope event information using parity observables in an IBM quantum processor using 8 qubits. Then we show that we can recover the information stored on the quantum computer with a fidelity of 84%. Finally, we illustrate the use of our protocol by performing a classification task that separates electron-neutrino events from muon-neutrino events in a neutrino telescope. This new capability would eventually allow us to solve the street light effect in particle physics, where we only record signatures of particles with which we are familiar.

quantum computing quantum states quantum data encoding neutrino detection parity observables
Foundational AI Feb 26, 2024

Renormalization Group flow, Optimal Transport and Diffusion-based Generative Model

Artan Sheshmani, Yi-Zhuang You, Baturalp Buyukates et al.

Diffusion-based generative models represent a forefront direction in generative AI research today. Recent studies in physics have suggested that the renormalization group (RG) can be conceptualized as a diffusion process. This insight motivates us to develop a novel diffusion-based generative model by reversing the momentum-space RG flow. We establish a framework that interprets RG flow as optimal transport gradient flow, which minimizes a functional analogous to the Kullback-Leibler divergence, thereby bridging statistical physics and information theory. Our model applies forward and reverse diffusion processes in Fourier space, exploiting the sparse representation of natural images in this domain to efficiently separate signal from noise and manage image features across scales. By introducing a scale-dependent noise schedule informed by a dispersion relation, the model optimizes denoising performance and image generation in Fourier space, taking advantage of the distinct separation of macro and microscale features. Experimental validations on standard datasets demonstrate the model's capability to generate high-quality images while significantly reducing training time compared to existing image-domain diffusion models. This approach not only enhances our understanding of the generative processes in images but also opens new pathways for research in generative AI, leveraging the convergence of theoretical physics, optimal transport, and machine learning principles.

diffusion models optimal transport renormalization generative models fourier-space diffusion
Experimental Physics Feb 21, 2024

Seeing Double: Calibrating Two Jets at Once

Rikab Gambhir, Benjamin Nachman

Jet energy calibration is an important aspect of many measurements and searches at the LHC. Currently, these calibrations are performed on a per-jet basis, i.e. agnostic to the properties of other jets in the same event. In this work, we propose taking advantage of the correlations induced by momentum conservation between jets in order to improve their jet energy calibration. By fitting the $p_T$ asymmetry of dijet events in simulation, while remaining agnostic to the $p_T$ spectra themselves, we are able to obtain correlation-improved maximum likelihood estimates. This approach is demonstrated with simulated jets from the CMS Detector, yielding a $3$-$5\%$ relative improvement in the jet energy resolution, corresponding to a quadrature improvement of approximately 35%.

jet physics calibration likelihood estimation collider physics conservation laws
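
To see the core idea on a toy example: momentum conservation means the corrected transverse momenta of a dijet pair should balance on average, so per-bin calibration factors can be fit by maximizing a Gaussian likelihood of the $p_T$ asymmetry around zero. Everything below (binning, noise model, numbers) is illustrative, not the paper's actual fit:

```python
import numpy as np
from scipy.optimize import minimize

def nll(c, pt1, pt2, bin1, bin2, sigma=0.1):
    # asymmetry of corrected jets; zero on average if the calibration is right
    a = (c[bin1] * pt1 - c[bin2] * pt2) / (c[bin1] * pt1 + c[bin2] * pt2)
    return 0.5 * np.sum((a / sigma) ** 2)

rng = np.random.default_rng(0)
true_resp = np.array([0.95, 1.05])             # detector response in two pT bins
pt_true = rng.uniform(100, 500, size=2000)     # dijets balanced at truth level
pt1 = true_resp[0] * pt_true * rng.normal(1, 0.1, 2000)
pt2 = true_resp[1] * pt_true * rng.normal(1, 0.1, 2000)
bin1, bin2 = np.zeros(2000, int), np.ones(2000, int)

res = minimize(nll, x0=[1.0, 1.0], args=(pt1, pt2, bin1, bin2))
# asymmetry fixes only the ratio c[0]/c[1] (~1.05/0.95); the overall scale is free
print(res.x[0] / res.x[1])
```
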
Theoretical Physics Feb 20, 2024

Rigor with Machine Learning from Field Theory to the Poincaré Conjecture

Sergei Gukov, James Halverson, Fabian Ruehle

Machine learning techniques are increasingly powerful, leading to many breakthroughs in the natural sciences, but they are often stochastic, error-prone, and blackbox. How, then, should they be utilized in fields such as theoretical physics and pure mathematics that place a premium on rigor and understanding? In this Perspective we discuss techniques for obtaining rigor in the natural sciences with machine learning. Non-rigorous methods may lead to rigorous results via conjecture generation or verification by reinforcement learning. We survey applications of these techniques-for-rigor ranging from string theory to the smooth $4$d Poincaré conjecture in low-dimensional topology. One can also imagine building direct bridges between machine learning theory and either mathematics or theoretical physics. As examples, we describe a new approach to field theory motivated by neural network theory, and a theory of Riemannian metric flows induced by neural network gradient descent, which encompasses Perelman's formulation of the Ricci flow that was utilized to resolve the $3$d Poincaré conjecture.

neural network field theory quantum field theory conjecture generation reinforcement learning ricci flow
Astrophysics Feb 20, 2024

Full-shape analysis with simulation-based priors: constraints on single field inflation from BOSS

Mikhail M. Ivanov, Carolina Cuesta-Lazaro, Siddharth Mishra-Sharma et al.

Perturbative, or effective field theory (EFT)-based, full-shape analyses of galaxy clustering data involve ``nuisance parameters'' to capture various observational effects such as the galaxy-dark matter connection (galaxy bias). We present an efficient approach to set informative physically motivated priors on these parameters. We extract these priors from simulated galaxy catalogs based on halo occupation distribution (HOD) models. First, we build a joint distribution between EFT galaxy bias and HOD parameters from a set of 10,500 HOD mock catalogs. We use the field level EFT technique that allows for cosmic variance cancellation, enabling a precision calibration of EFT parameters from computationally inexpensive small-volume simulations. Second, we use neural density estimators -- normalizing flows -- to model the marginal probability density of the EFT parameters, which can be used as a prior distribution in full shape analyses. As a first application, we use our HOD-based priors in a new analysis of galaxy power spectra and bispectra from the BOSS survey in the context of single field primordial non-Gaussianity. We find that our priors lead to a reduction of the posterior volume of bias parameters by an order of magnitude. We also find $f_{\rm NL}^{\rm equil} = 320\pm 300$ and $f_{\rm NL}^{\rm ortho} = 100\pm 130$ (at 68\% CL) in a combined two-template analysis, representing a $\approx 40\%$ improvement in constraints on single field primordial non-Gaussianity, equivalent to doubling the survey volume.

effective field theory primordial non-gaussianity normalizing flows simulation-based inference galaxy bias calibration
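
Once the density estimator is trained on mock catalogs, using it downstream is simple plumbing: its log-density is added to the data likelihood as a prior term. A sketch of that step, with a Gaussian KDE standing in for the normalizing flow, an assumed parameter file, and a hypothetical `eft_log_likelihood`:

```python
import numpy as np
from scipy.stats import gaussian_kde

# (n_mocks, n_bias) EFT bias parameters calibrated from HOD mocks; assumed file
bias_samples = np.load("hod_calibrated_eft_params.npy")
prior = gaussian_kde(bias_samples.T)   # stand-in for the trained normalizing flow

def log_posterior(cosmo, bias, data):
    lp = float(np.log(prior(bias) + 1e-300))           # simulation-based prior on bias
    return lp + eft_log_likelihood(data, cosmo, bias)  # hypothetical EFT likelihood
```
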
Theoretical Physics Feb 9, 2024

Real-time Dynamics of the Schwinger Model as an Open Quantum System with Neural Density Operators

Joshua Lin, Di Luo, Xiaojun Yao et al.

Ab-initio simulations of multiple heavy quarks propagating in a Quark-Gluon Plasma are computationally difficult to perform due to the large dimension of the space of density matrices. This work develops machine learning algorithms to overcome this difficulty by approximating exact quantum states with neural network parametrisations, specifically Neural Density Operators. As a proof of principle demonstration in a QCD-like theory, the approach is applied to solve the Lindblad master equation in the 1+1d lattice Schwinger Model as an open quantum system. Neural Density Operators enable the study of in-medium dynamics on large lattice volumes, where multiple-string interactions and their effects on string-breaking and recombination phenomena can be studied. Thermal properties of the system at equilibrium can also be probed with these methods by variationally constructing the steady state of the Lindblad master equation. Scaling of this approach with system size is studied, and numerical demonstrations on up to 32 spatial lattice sites and with up to 3 interacting strings are performed.

neural density operators lindblad dynamics open quantum systems lattice gauge theory quantum simulation
Foundational AI Feb 8, 2024

GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory

David D. Baek, Ziming Liu, Max Tegmark

We present GenEFT: an effective theory framework for shedding light on the statics and dynamics of neural network generalization, and illustrate it with graph learning examples. We first investigate the generalization phase transition as data size increases, comparing experimental results with information-theory-based approximations. We find generalization in a Goldilocks zone where the decoder is neither too weak nor too powerful. We then introduce an effective theory for the dynamics of representation learning, where latent-space representations are modeled as interacting particles (repons), and find that it explains our experimentally observed phase transition between generalization and overfitting as encoder and decoder learning rates are scanned. This highlights the power of physics-inspired effective theories for bridging the gap between theoretical predictions and practice in machine learning.

effective field theory phase transitions representation learning interacting repon theory autoencoders
Foundational AI Feb 7, 2024

Opening the AI black box: program synthesis via mechanistic interpretability

Eric J. Michaud, Isaac Liao, Vedang Lad et al.

We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy.

interpretability program synthesis mechanistic interpretability autoencoders recurrent networks
Foundational AI Feb 7, 2024

A Resource Model For Neural Scaling Law

Jinyeop Song, Ziming Liu, Max Tegmark et al.

Neural scaling laws characterize how model performance improves as the model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and can hence be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to subtasks). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to its allocated neurons. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask uniformly grow as models get larger, keeping the ratios of acquired resources constant. We hypothesize these findings to be generally true and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.

scalability neural scaling law resource allocation modular subnetworks sparse models
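
The two empirical findings combine into a one-line scaling law: if subtask $k$ receives a constant fraction $r_k$ of the $N$ available neurons and contributes loss $a_k / n_k$, the composite loss is $L(N) = (\sum_k a_k / r_k)/N$. A numerical sketch with made-up constants:

```python
import numpy as np

a = np.array([1.0, 4.0, 0.5])   # subtask difficulty constants (illustrative)
r = np.array([0.5, 0.3, 0.2])   # constant resource fractions across model sizes

def composite_loss(N):
    # finding (1): per-subtask loss ~ a_k / n_k, with n_k = r_k * N (finding (2))
    return np.sum(a / (r * N))

for N in [1e3, 1e4, 1e5]:
    print(int(N), composite_loss(N))   # loss falls as 1/N, a clean power law
```
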
Astrophysics Feb 6, 2024

LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology

Matthew Ho, Deaglan J. Bartlett, Nicolas Chartier et al.

This paper presents the Learning the Universe Implicit Likelihood Inference (LtU-ILI) pipeline, a codebase for rapid, user-friendly, and cutting-edge machine learning (ML) inference in astrophysics and cosmology. The pipeline includes software for implementing various neural architectures, training schemata, priors, and density estimators in a manner easily adaptable to any research workflow. It includes comprehensive validation metrics to assess posterior estimate coverage, enhancing the reliability of inferred results. Additionally, the pipeline is easily parallelizable and is designed for efficient exploration of modeling hyperparameters. To demonstrate its capabilities, we present real applications across a range of astrophysics and cosmology problems, such as: estimating galaxy cluster masses from X-ray photometry; inferring cosmology from matter power spectra and halo point clouds; characterizing progenitors in gravitational wave signals; capturing physical dust parameters from galaxy colors and luminosities; and establishing properties of semi-analytic models of galaxy formation. We also include exhaustive benchmarking and comparisons of all implemented methods as well as discussions about the challenges and pitfalls of ML inference in astronomical sciences. All code and examples are made publicly available at https://github.com/maho3/ltu-ili.

simulation-based inference posterior estimation density estimation bayesian inference normalizing flows
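
For readers who have not used such pipelines, the basic neural posterior estimation loop looks like the following, shown here with the standalone sbi package (representative of the engines this class of pipeline builds on) rather than the LtU-ILI API itself; the toy simulator is hypothetical:

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

prior = BoxUniform(low=torch.zeros(2), high=torch.ones(2))

def simulator(theta):                      # hypothetical toy simulator
    return theta + 0.05 * torch.randn_like(theta)

theta = prior.sample((2000,))
x = simulator(theta)

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)
samples = posterior.sample((1000,), x=torch.tensor([0.4, 0.6]))  # posterior draws
```
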
Astrophysics Feb 5, 2024

Equivariant Symmetry Breaking Sets

YuQing Xie, Tess Smidt

Equivariant neural networks (ENNs) have been shown to be extremely effective in applications involving underlying symmetries. By construction ENNs cannot produce lower symmetry outputs given a higher symmetry input. However, symmetry breaking occurs in many physical systems and we may obtain a less symmetric stable state from an initial highly symmetric one. Hence, it is imperative that we understand how to systematically break symmetry in ENNs. In this work, we propose a novel symmetry breaking framework that is fully equivariant and is the first which fully addresses spontaneous symmetry breaking. We emphasize that our approach is general and applicable to equivariance under any group. To achieve this, we introduce the idea of symmetry breaking sets (SBS). Rather than redesign existing networks, we design sets of symmetry breaking objects which we feed into our network based on the symmetry of our inputs and outputs. We show there is a natural way to define equivariance on these sets, which gives an additional constraint. Minimizing the size of these sets equates to data efficiency. We prove that minimizing these sets translates to a well studied group theory problem, and tabulate solutions to this problem for the point groups. Finally, we provide some examples of symmetry breaking to demonstrate how our approach works in practice. The code for these examples is available at \url{https://github.com/atomicarchitects/equivariant-SBS}.

equivariant neural networks symmetry breaking symmetry breaking sets group theory normalizer constraint
Astrophysics Jan 29, 2024

Substructure Detection in Realistic Strong Lensing Systems with Machine Learning

Arthur Tsang, Atınç Çağan Şengül, Cora Dvorkin

Tens of thousands of galaxy-galaxy strong lensing systems are expected to be discovered by the end of the decade. These will form a vast new dataset that can be used to probe subgalactic dark matter structures through its gravitational effects, which will in turn allow us to study the nature of dark matter at small length scales. This work shows how we can leverage machine learning to search through the data and identify which systems are most likely to contain dark matter substructure and thus can be studied in greater depth. We use a UNet, an image segmentation architecture, on a simulated strongly-lensed dataset with realistic sources (COSMOS galaxies), lenses (power-law elliptical profiles with multipoles and external shear), and noise. Our machine learning algorithm is able to quickly detect most substructure at high image resolution and subhalo concentration. At a false positive rate of $10\%$, we are able to identify systems with substructure at a true positive rate of $71\%$ for a subhalo mass range of $10^{9}\text{-}10^{9.5}\,M_\odot$. While recent detections are consistent with higher concentrations, we find that our algorithm fails at detecting subhalos with lower concentrations (expected from $\Lambda$CDM simulations).

dark matter gravitational lensing convolutional networks image segmentation classification
Foundational AI Jan 21, 2024

Rigid Schubert classes in partial flag varieties

Yuxiang Liu, Artan Sheshmani, Shing-Tung Yau

A Schubert class is called rigid if it can only be represented by Schubert varieties. The rigid Schubert classes have been classified in Grassmannians and orthogonal Grassmannians. In this paper, we study the rigidity problem in partial flag varieties (type A) and orthogonal partial flag varieties (type B and type D). In particular, we give numerical conditions that ensure a Schubert class is rigid.

schubert variety rigidity partial flag variety rational equivalence schubert calculus group theory
Theoretical Physics Jan 19, 2024

Applications of flow models to the generation of correlated lattice QCD ensembles

Ryan Abbott, Aleksandar Botev, Denis Boyda et al.

Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters. This work demonstrates how these correlations can be exploited for variance reduction in the computation of observables. Three different proof-of-concept applications are demonstrated using a novel residual flow architecture: continuum limits of gauge theories, the mass dependence of QCD observables, and hadronic matrix elements based on the Feynman-Hellmann approach. In all three cases, it is shown that statistical uncertainties are significantly reduced when machine-learned flows are incorporated as compared with the same calculations performed with uncorrelated ensembles or direct reweighting.

normalizing flows lattice qcd correlated ensembles lattice gauge theory generative models
Foundational AI Jan 18, 2024

SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression

Ho Fung Tsoi, Vladimir Loncar, Sridhara Dasu et al.

Compact symbolic expressions have been shown to be more efficient than neural network models in terms of resource consumption and inference speed when implemented on custom hardware such as FPGAs, while maintaining comparable accuracy~\cite{tsoi2023symbolic}. These capabilities are highly valuable in environments with stringent computational resource constraints, such as high-energy physics experiments at the CERN Large Hadron Collider. However, finding compact expressions for high-dimensional datasets remains challenging due to the inherent limitations of genetic programming, the search algorithm of most symbolic regression methods. Contrary to genetic programming, the neural network approach to symbolic regression offers scalability to high-dimensional inputs and leverages gradient methods for faster equation searching. Common ways of constraining expression complexity often involve multistage pruning with fine-tuning, which can result in significant performance loss. In this work, we propose $\tt{SymbolNet}$, a neural network approach to symbolic regression specifically designed as a model compression technique, aimed at enabling low-latency inference for high-dimensional inputs on custom hardware such as FPGAs. This framework allows dynamic pruning of model weights, input features, and mathematical operators in a single training process, where both training loss and expression complexity are optimized simultaneously. We introduce a sparsity regularization term for each pruning type, which can adaptively adjust its strength, leading to convergence at a target sparsity ratio. Unlike most existing symbolic regression methods that struggle with datasets containing more than $\mathcal{O}(10)$ inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs).

symbolic regression dynamic pruning sparse models model compression interpretability
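
The adaptive-strength regularization can be sketched in a few lines: measure the current sparsity, then nudge the L1 coefficient up or down depending on whether the model is denser or sparser than the target. This is a simplified single-penalty version of the per-pruning-type scheme described above:

```python
import torch

def adaptive_l1(params, lam, target_sparsity, eps=1e-3, rate=0.01):
    w = torch.cat([p.flatten() for p in params])
    current = (w.abs() < eps).float().mean()      # fraction of near-zero weights
    # strengthen the penalty while below the target sparsity, relax it above
    lam = lam * (1 + rate * torch.sign(target_sparsity - current))
    return lam * w.abs().sum(), lam

# inside a training step (sketch):
# penalty, lam = adaptive_l1(list(model.parameters()), lam, target_sparsity=0.9)
# loss = task_loss + penalty
```
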
Astrophysics Dec 27, 2023

A Physics-Informed Variational Autoencoder for Rapid Galaxy Inference and Anomaly Detection

Alexander Gagliano, V. Ashley Villar

The Vera C. Rubin Observatory is slated to observe nearly 20 billion galaxies during its decade-long Legacy Survey of Space and Time. The rich imaging data it collects will be an invaluable resource for probing galaxy evolution across cosmic time, characterizing the host galaxies of transient phenomena, and identifying novel populations of anomalous systems. To facilitate these studies, we introduce a convolutional variational autoencoder trained to estimate the redshift, stellar mass, and star-formation rates of galaxies from multi-band imaging data. We train and test our physics-informed CVAE on a spectroscopic sample of $\sim$26,000 galaxies within $z<1$ imaged through the Dark Energy Camera Legacy Survey. We show that our model can infer redshift and stellar mass more accurately than the latest image-based self-supervised learning approaches, and is >100x faster than more computationally-intensive SED-fitting techniques. Using a small sample of Green Pea and Red Spiral galaxies reported in the literature, we further demonstrate how this CVAE can be used to rapidly identify rare galaxy populations and interpret what makes them unique.

variational autoencoders physics-informed neural networks disentangled representations anomaly detection convolutional networks
Experimental Physics Dec 21, 2023

Applications of Lipschitz neural networks to the Run 3 LHCb trigger system

Blaise Delaney, Nicole Schulte, Gregory Ciezarek et al.

The operating conditions defining the current data taking campaign at the Large Hadron Collider, known as Run 3, present unparalleled challenges for the real-time data acquisition workflow of the LHCb experiment at CERN. To address the anticipated surge in luminosity and consequent event rate, the LHCb experiment is transitioning to a fully software-based trigger system. This evolution necessitated innovations in hardware configurations, software paradigms, and algorithmic design. A significant advancement is the integration of monotonic Lipschitz neural networks into the LHCb trigger system. These deep learning models offer certified robustness against detector instabilities, and the ability to encode domain-specific inductive biases. Such properties are crucial for the inclusive heavy-flavour triggers and, most notably, for the topological triggers designed to inclusively select $b$-hadron candidates by exploiting the unique kinematic and decay topologies of beauty decays. This paper describes the recent progress in integrating Lipschitz neural networks into the topological triggers, highlighting the resulting enhanced sensitivity to highly displaced multi-body candidates produced within the LHCb acceptance.

trigger systems lipschitz neural networks robustness classification monotonic constraints
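
The construction behind such models is compact enough to sketch: normalize each linear layer so the network is 1-Lipschitz, then add a $\lambda \cdot x$ term in the chosen features, whose slope dominates the bounded gradient and makes the output provably monotonic. This follows the general monotonic-Lipschitz recipe, not the LHCb implementation:

```python
import torch
import torch.nn as nn

class LipschitzLinear(nn.Linear):
    def forward(self, x):
        # rescale rows so the layer has operator norm <= 1 in the inf-norm sense
        scale = self.weight.abs().sum(dim=1, keepdim=True).clamp(min=1.0)
        return nn.functional.linear(x, self.weight / scale, self.bias)

class MonotonicLipschitzNet(nn.Module):
    def __init__(self, n_in, monotonic_mask, lam=1.0):
        super().__init__()
        self.f = nn.Sequential(LipschitzLinear(n_in, 64), nn.ReLU(),
                               LipschitzLinear(64, 1))
        self.register_buffer("mask", monotonic_mask.float())  # 1 = monotonic feature
        self.lam = lam                                        # must exceed Lip(f)

    def forward(self, x):
        # the residual slope lam beats the 1-Lipschitz f, so the output is
        # non-decreasing in every masked feature by construction
        return self.f(x) + self.lam * (x * self.mask).sum(dim=-1, keepdim=True)

# net = MonotonicLipschitzNet(4, torch.tensor([1, 0, 0, 1]))
```
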
Experimental Physics Dec 21, 2023

First search for dark-trident processes using the MicroBooNE detector

MicroBooNE collaboration, P. Abratenko, O. Alterkait et al.

We present a first search for dark-trident scattering in a neutrino beam using a data set corresponding to $7.2 \times 10^{20}$ protons on target taken with the MicroBooNE detector at Fermilab. Proton interactions in the neutrino target at the Main Injector produce $\pi^0$ and $\eta$ mesons, which could decay into dark-matter (DM) particles mediated via a dark photon $A^\prime$. A convolutional neural network is trained to identify interactions of the DM particles in the liquid-argon time projection chamber (LArTPC) exploiting its image-like reconstruction capability. In the absence of a DM signal, we provide limits at the $90\%$ confidence level on the squared kinematic mixing parameter $\varepsilon^2$ as a function of the dark-photon mass in the range $10\le M_{A^\prime}\le 400$ MeV. The limits cover previously unconstrained parameter space for the production of fermion or scalar DM particles $\chi$ for two benchmark models with mass ratios $M_\chi/M_{A^\prime}=0.6$ and $2$ and for dark fine-structure constants $0.1\le \alpha_D\le 1$.

dark matter dark-trident scattering new physics searches dark photon kinematic mixing
Astrophysics Dec 18, 2023

Inhomogeneous Energy Injection in the 21-cm Power Spectrum: Sensitivity to Dark Matter Decay

Yitian Sun, Joshua W. Foster, Hongwan Liu et al.

The 21-cm signal provides a novel avenue to measure the thermal state of the universe during cosmic dawn and reionization (redshifts $z\sim 5-30$), and thus to probe energy injection from decaying or annihilating dark matter (DM). These DM processes are inherently inhomogeneous: both decay and annihilation are density dependent, and furthermore the fraction of injected energy that is deposited at each point depends on the gas ionization and density, leading to further anisotropies in absorption and propagation. In this work, we develop a new framework for modeling the impact of spatially inhomogeneous energy injection and deposition during cosmic dawn, accounting for ionization and baryon density dependence, as well as the attenuation of propagating photons. We showcase how this first completely inhomogeneous treatment affects the predicted 21-cm power spectrum in the presence of exotic sources of energy injection, and forecast the constraints that upcoming HERA measurements of the 21-cm power spectrum will set on DM decays to photons and to electron/positron pairs. These projected constraints considerably surpass those derived from CMB and Lyman-$\alpha$ measurements, and for decays to electron/positron pairs they exceed all existing constraints in the sub-GeV mass range, reaching lifetimes of $\sim 10^{28}\,\mathrm{s}$. Our analysis demonstrates the unprecedented sensitivity of 21-cm cosmology to exotic sources of energy injection during the cosmic dark ages. Our code, $\mathtt{DM21cm}$, includes all these effects and is publicly available in an accompanying release.

dark matter 21-cm power spectrum inhomogeneous energy injection cosmological simulation reionization modeling
Astrophysics Dec 12, 2023

Cosmological Field Emulation and Parameter Inference with Diffusion Models

Nayantara Mudur, Carolina Cuesta-Lazaro, Douglas P. Finkbeiner

Cosmological simulations play a crucial role in elucidating the effect of physical parameters on the statistics of fields and on constraining parameters given information on density fields. We leverage diffusion generative models to address two tasks of importance to cosmology -- as an emulator for cold dark matter density fields conditional on input cosmological parameters $\Omega_m$ and $\sigma_8$, and as a parameter inference model that can return constraints on the cosmological parameters of an input field. We show that the model is able to generate fields with power spectra that are consistent with those of the simulated target distribution, and capture the subtle effect of each parameter on modulations in the power spectrum. We additionally explore their utility as parameter inference models and find that we can obtain tight constraints on cosmological parameters.

diffusion models cosmological simulation emulation posterior estimation dark matter
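
The conditional training objective behind such an emulator fits in a few lines: corrupt a simulated field with noise at a random diffusion step and train a network to predict that noise given the cosmological parameters. A standard DDPM-style sketch, with the denoiser architecture left abstract:

```python
import torch
import torch.nn.functional as F

def conditional_ddpm_loss(eps_model, fields, params, alphas_cumprod):
    """fields: (B, 1, H, W) density maps; params: (B, 2) = (Omega_m, sigma_8)."""
    t = torch.randint(0, len(alphas_cumprod), (fields.shape[0],))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(fields)
    noisy = a.sqrt() * fields + (1 - a).sqrt() * noise  # forward diffusion step
    pred = eps_model(noisy, t, params)                  # parameter-conditioned denoiser
    return F.mse_loss(pred, noise)
```
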
Foundational AI Dec 8, 2023

Derived Moduli Spaces of Nonlinear PDEs I: Singular Propagations

Jacob Kryczka, Artan Sheshmani, Shing-Tung Yau

We construct a sheaf theoretic and derived geometric machinery to study nonlinear partial differential equations and their singular supports. We establish a notion of derived microlocalization for solution spaces of non-linear equations and develop a formalism to pose and solve singular non-linear Cauchy problems globally. Using this approach we estimate the domains of propagation for the solutions of non-linear systems. It is achieved by exploiting the fact that one may greatly enrich and simplify the study of derived non-linear PDEs over a space $X$ by studying its derived linearization which is a module over the sheaf of functions on the $S^1$-equivariant derived loop stack $\mathcal{L}X$.

derived algebraic geometry nonlinear pde theory microlocal analysis derived loop stacks sheaf theory
Foundational AI Dec 5, 2023

Generating Interpretable Networks using Hypernetworks

Isaac Liao, Ziming Liu, Max Tegmark

An essential goal in mechanistic interpretability is to decode a network, i.e., to convert a neural network's raw weights to an interpretable algorithm. Given the difficulty of the decoding problem, progress has been made to understand the easier encoding problem, i.e., to convert an interpretable algorithm into network weights. Previous works focus on encoding existing algorithms into networks, which are interpretable by definition. However, focusing on encoding limits the possibility of discovering new algorithms that humans have never stumbled upon, but that are nevertheless interpretable. In this work, we explore the possibility of using hypernetworks to generate interpretable networks whose underlying algorithms are not yet known. The hypernetwork is carefully designed such that it can control network complexity, leading to a diverse family of interpretable algorithms ranked by their complexity. All of them are interpretable in hindsight, although some of them are less intuitive to humans, hence providing new insights regarding how to "think" like a neural network. For the task of computing L1 norms, hypernetworks find three algorithms: (a) the double-sided algorithm, (b) the convexity algorithm, (c) the pudding algorithm, although only the first algorithm was expected by the authors before experiments. We automatically classify these algorithms and analyze how these algorithmic phases develop during training, as well as how they are affected by complexity control. Furthermore, we show that a trained hypernetwork can correctly construct models for input dimensions not seen in training, demonstrating systematic generalization.

interpretability hypernetworks algorithmic phase discovery automated discovery phase transitions
Foundational AI Dec 5, 2023

Shifted symplectic structures on derived Quot-stacks II -- Derived Quot-schemes as dg manifolds

Dennis Borisov, Ludmil Katzarkov, Artan Sheshmani

It is proved that derived Quot-schemes, as defined by Ciocan-Fontanine and Kapranov, are represented by dg manifolds of finite type. This is the second part of a work aimed at analyzing shifted symplectic structures on moduli spaces of coherent sheaves on Calabi--Yau manifolds. The first part related dg manifolds to derived schemes as defined by Toën and Vezzosi.

derived algebraic geometry moduli spaces of sheaves dg lie algebras maurer-cartan equations homotopy limit
Theoretical Physics Nov 30, 2023

Anomaly Detection in Collider Physics via Factorized Observables

Eric M. Metodiev, Jesse Thaler, Raymond Wynne

To maximize the discovery potential of high-energy colliders, experimental searches should be sensitive to unforeseen new physics scenarios. This goal has motivated the use of machine learning for unsupervised anomaly detection. In this paper, we introduce a new anomaly detection strategy called FORCE: factorized observables for regressing conditional expectations. Our approach is based on the inductive bias of factorization, which is the idea that the physics governing different energy scales can be treated as approximately independent. Assuming factorization holds separately for signal and background processes, the appearance of non-trivial correlations between low- and high-energy observables is a robust indicator of new physics. Under the most restrictive form of factorization, a machine-learned model trained to identify such correlations will in fact converge to the optimal new physics classifier. We test FORCE on a benchmark anomaly detection task for the Large Hadron Collider involving collimated sprays of particles called jets. By teasing out correlations between the kinematics and substructure of jets, our method can reliably extract percent-level signal fractions. This strategy for uncovering new physics adds to the growing toolbox of anomaly detection methods for collider physics with a complementary set of assumptions.

anomaly detection factorization-based anomaly detection collider physics jet physics conditional expectation regression
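
A toy rendition of the factorization logic: if substructure and kinematics factorize for the background, the conditional expectation of a substructure observable given jet mass is flat, so any learned mass dependence localizes a signal. The data and names below are illustrative, not the FORCE implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
mass = rng.uniform(50, 300, 20000)          # high-energy (kinematic) observable
sub = rng.normal(0.5, 0.1, 20000)           # background substructure: mass-independent
signal = np.abs(mass - 150) < 10            # injected resonance correlates the two
sub[signal] += 0.2

model = GradientBoostingRegressor().fit(mass[:, None], sub)
residual = model.predict(mass[:, None]) - sub.mean()
# a bump in the learned conditional expectation flags the signal region (~150)
print(mass[np.argmax(residual)])
```
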
Foundational AI Nov 30, 2023

One-step Diffusion with Distribution Matching Distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang et al.

Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce that the one-step image generator matches the diffusion model at the distribution level, by minimizing an approximate KL divergence whose gradient can be expressed as the difference between two score functions, one of the target distribution and the other of the synthetic distribution being produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware.

diffusion models distribution matching distillation score-based models one-step generation fake score estimation
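
Written out, the gradient identity described above takes the schematic form

$$\nabla_\theta\, D_{\mathrm{KL}}\big(p_{\mathrm{fake}} \,\|\, p_{\mathrm{real}}\big) \;\simeq\; \mathbb{E}_{z}\Big[\big(s_{\mathrm{fake}}(G_\theta(z)) - s_{\mathrm{real}}(G_\theta(z))\big)\, \partial_\theta G_\theta(z)\Big], \qquad s_p(x) = \nabla_x \log p(x),$$

where $G_\theta$ is the one-step generator and both scores $s_{\mathrm{fake}}$ and $s_{\mathrm{real}}$ are themselves parameterized by diffusion models (a schematic reading of the abstract, not the paper's exact notation).
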
Astrophysics Nov 29, 2023

Learning an Effective Evolution Equation for Particle-Mesh Simulations Across Cosmologies

Nicolas Payot, Pablo Lemos, Laurence Perreault-Levasseur et al.

Particle-mesh simulations trade small-scale accuracy for speed compared to traditional, computationally expensive N-body codes in cosmological simulations. In this work, we show how a data-driven model could be used to learn an effective evolution equation for the particles, by correcting the errors of the particle-mesh potential incurred on small scales during simulations. We find that our learnt correction yields evolution equations that generalize well to new, unseen initial conditions and cosmologies. We further demonstrate that the resulting corrected maps can be used in a simulation-based inference framework to yield an unbiased inference of cosmological parameters. The model, a network implemented in Fourier space, is exclusively trained on the particle positions and velocities.

cosmological simulation particle-mesh correction simulation-based inference differentiable simulation equivariant neural networks
Astrophysics Nov 28, 2023

A point cloud approach to generative modeling for galaxy surveys at the field level

Carolina Cuesta-Lazaro, Siddharth Mishra-Sharma

We introduce a diffusion-based generative model to describe the distribution of galaxies in our Universe directly as a collection of points in 3-D space (coordinates) optionally with associated attributes (e.g., velocities and masses), without resorting to binning or voxelization. The custom diffusion model can be used both for emulation, reproducing essential summary statistics of the galaxy distribution, as well as inference, by computing the conditional likelihood of a galaxy field. We demonstrate a first application to massive dark matter haloes in the Quijote simulation suite. This approach can be extended to enable a comprehensive analysis of cosmological data, circumventing limitations inherent to summary statistic-based as well as neural simulation-based inference methods.

diffusion models generative models point cloud diffusion likelihood estimation field-level inference
Foundational AI Nov 27, 2023

Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for 3D Molecule Generation

Ameya Daigavane, Song Kim, Mario Geiger et al.

We present Symphony, an $E(3)$-equivariant autoregressive generative model for 3D molecular geometries that iteratively builds a molecule from molecular fragments. Existing autoregressive models such as G-SchNet and G-SphereNet for molecules utilize rotationally invariant features to respect the 3D symmetries of molecules. In contrast, Symphony uses message-passing with higher-degree $E(3)$-equivariant features. This allows a novel representation of probability distributions via spherical harmonic signals to efficiently model the 3D geometry of molecules. We show that Symphony is able to accurately generate small molecules from the QM9 dataset, outperforming existing autoregressive models and approaching the performance of diffusion models.

equivariant neural networks symmetry preservation spherical harmonic signals generative models geometric deep learning
Foundational AI Nov 20, 2023

Quantum Inception Score

Akira Sone, Akira Tanji, Naoki Yamamoto

Motivated by the great success of classical generative models in machine learning, enthusiastic exploration of their quantum version has recently started. To depart on this journey, it is important to develop a relevant metric to evaluate the quality of quantum generative models; in the classical case, one such example is the (classical) inception score (cIS). In this paper, as a natural extension of cIS, we propose the quantum inception score (qIS) for quantum generators. Importantly, qIS relates the quality to the Holevo information of the quantum channel that classifies a given dataset. In this context, we show several properties of qIS. First, qIS is greater than or equal to the corresponding cIS, which is defined through projection measurements on the system output. Second, the difference between qIS and cIS arises from the presence of quantum coherence, as characterized by the resource theory of asymmetry. Third, when a set of entangled generators is prepared, there exists a classifying process leading to the further enhancement of qIS. Fourth, we harness the quantum fluctuation theorem to characterize the physical limitation of qIS. Finally, we apply qIS to assess the quality of the one-dimensional spin chain model as a quantum generative model, with the quantum convolutional neural network as a quantum classifier, for the phase classification problem in the quantum many-body physics.

quantum inception score generative models holevo information quantum computing entanglement
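
For orientation, the classical inception score admits the exact rewriting

$$\mathrm{cIS} \;=\; \exp\!\Big(\mathbb{E}_{x\sim p(x)}\, D_{\mathrm{KL}}\big(p(y\,|\,x)\,\|\,p(y)\big)\Big) \;=\; \exp\big(I(X;Y)\big),$$

the exponentiated mutual information between generated samples and classifier labels. The abstract's statement can then be read schematically as qIS replacing $I(X;Y)$ with the Holevo information $\chi$ of the classifying channel; since accessible classical information is bounded by $\chi$, this reading is consistent with the first property, $\mathrm{qIS} \ge \mathrm{cIS}$ (an interpretive gloss, not the paper's formal definition).
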
Foundational AI Nov 15, 2023

Super Gromov-Witten Invariants via torus localization

Enno Keßler, Artan Sheshmani, Shing-Tung Yau

In this article we propose a definition of super Gromov-Witten invariants by postulating a torus localization property for the odd directions of the moduli spaces of super stable maps and super stable curves of genus zero. That is, we define super Gromov-Witten invariants as the integral over the pullback of homology classes along the evaluation maps divided by the equivariant Euler class of the normal bundle of the embedding of the moduli space of stable spin maps into the moduli space of super stable maps. This definition sidesteps the difficulties of defining a supergeometric intersection theory and works with classical intersection theory only. The properties of the normal bundles, known from the differential geometric construction of the moduli space of super stable maps, imply that super Gromov-Witten invariants satisfy a generalization of Kontsevich-Manin axioms and allow for the construction of a super small quantum cohomology ring. We describe a method to calculate super Gromov-Witten invariants of $\mathbb{P}^n$ of genus zero by a further geometric torus localization and give explicit numbers in degree one when dimension and number of marked points are small.

super gromov-witten invariants supergeometric moduli spaces torus localization quantum cohomology spin structures
Astrophysics Nov 14, 2023

Probabilistic reconstruction of Dark Matter fields from biased tracers using diffusion models

Core Francisco Park, Victoria Ono, Nayantara Mudur et al.

Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter components that cannot be directly observed. The relationship between dark matter density fields and galaxy distributions can be sensitive to assumptions in cosmology and astrophysical processes embedded in the galaxy formation models, that remain uncertain in many aspects. Based on state-of-the-art galaxy formation simulation suites with varied cosmological parameters and sub-grid astrophysics, we develop a diffusion generative model to predict the unbiased posterior distribution of the underlying dark matter fields from the given stellar mass fields, while being able to marginalize over the uncertainties in cosmology and galaxy formation.

diffusion models dark matter posterior estimation cosmological simulation uncertainty quantification
Theoretical Physics Nov 13, 2023

Safe but Incalculable: Energy-weighting is not all you need

Samuel Bright-Thonney, Benjamin Nachman, Jesse Thaler

Infrared and collinear (IRC) safety has long been used as a proxy for robustness when developing new jet substructure observables. This guiding philosophy has been carried into the deep learning era, where IRC-safe neural networks have been used for many jet studies. For graph-based neural networks, the most straightforward way to achieve IRC safety is to weight particle inputs by their energies. However, energy-weighting by itself does not guarantee that perturbative calculations of machine-learned observables will enjoy small non-perturbative corrections. In this paper, we demonstrate the sensitivity of IRC-safe networks to non-perturbative effects, by training an energy flow network (EFN) to maximize its sensitivity to hadronization. We then show how to construct Lipschitz Energy Flow Networks (L-EFNs), which are both IRC safe and relatively insensitive to non-perturbative corrections. We demonstrate the performance of L-EFNs on generated samples of quark and gluon jets, and showcase fascinating differences between the learned latent representations of EFNs and L-EFNs.

jet physics irc safety non-perturbative corrections lipschitz constraint optimal transport
Experimental Physics Nov 8, 2023

Two Watts is All You Need: Enabling In-Detector Real-Time Machine Learning for Neutrino Telescopes Via Edge Computing

Miaochen Jin, Yushi Hu, Carlos A. Argüelles

The use of machine learning techniques has significantly increased the physics discovery potential of neutrino telescopes. In the coming years, we expect upgrades of existing detectors and new telescopes with novel experimental hardware, yielding more statistics as well as more complicated data signals. This calls for an upgrade on the software side to handle the more complex data more efficiently. Specifically, we seek low-power and fast software methods to achieve real-time signal processing, where current machine learning methods are too expensive to be deployed in the resource-constrained regions where these experiments are located. We present a first proof-of-concept for enabling machine learning methods to be deployed in-detector for water/ice neutrino telescopes via quantization and deployment on Google Edge Tensor Processing Units (TPUs). We design a recursive neural network with a residual convolutional embedding, and adapt a quantization process to deploy the algorithm on a Google Edge TPU. This algorithm achieves reconstruction accuracy similar to traditional GPU-based machine learning solutions while requiring the same amount of power as CPU-based regression solutions, combining the high-accuracy and low-power advantages and enabling real-time in-detector machine learning in even the most power-restricted environments.

neutrino detection event reconstruction in-detector ml deployment model quantization edge tpu inference
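
The deployment path described above is, in outline, the standard TensorFlow Lite full-integer quantization flow: train a float model, calibrate activation ranges on representative data, and emit an int8 model that the Edge TPU compiler can consume. The model and calibration data below are dummies:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3),                    # e.g. reconstructed event quantities
])

def representative_data():                       # calibration set for quantization
    for _ in range(100):
        yield [np.random.rand(1, 64, 64, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8         # Edge TPU requires integer I/O
converter.inference_output_type = tf.int8
tflite_model = converter.convert()               # then compile with edgetpu_compiler
open("model_int8.tflite", "wb").write(tflite_model)
```
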
Theoretical Physics Nov 3, 2023

T-Duality and Flavor Symmetries in Little String Theories

Hamza Ahmed, Paul-Konstantin Oehlmann, Fabian Ruehle

We explore the T-duality web of 6D Heterotic Little String Theories, focusing on flavor algebra reducing deformations. A careful analysis of the full flavor algebra, including Abelian factors, shows that the flavor rank is preserved under T-duality. This suggests a new T-duality invariant in addition to the Coulomb branch dimension and the two-group structure constants. We also engineer Little String Theories with non-simply laced flavor algebras, whose appearance we attribute to certain discrete 3-form fluxes in M-theory. Geometrically, these theories are engineered in F-theory with non-Kähler favorable K3 fibers. This geometric origin leads us to propose that freezing fluxes are preserved across T-duality. Along the way, we discuss various exotic models, including two inequivalent $\text{Spin(32)}/\mathbb{Z}_2$ models that are dual to the same $\text{E}_8 \times \text{E}_8$ theory, and a family of self-T-dual models.

t-duality little string theories string theory group theory quantum field theory
Astrophysics Nov 3, 2023

Pairing-based graph neural network for simulating quantum materials

Di Luo, David D. Dai, Liang Fu

We develop a pairing-based graph neural network for simulating quantum many-body systems. Our architecture augments a BCS-type geminal wavefunction with a generalized pair amplitude parameterized by a graph neural network. Variational Monte Carlo with our neural network simultaneously provides an accurate, flexible, and scalable method for simulating many-electron systems. We apply this method to two-dimensional semiconductor electron-hole bilayers and obtain accurate results on a variety of interaction-induced phases, including the exciton Bose-Einstein condensate, electron-hole superconductor, and bilayer Wigner crystal. Our study demonstrates the potential of physically-motivated neural network wavefunctions for quantum materials simulations.

graph neural networks quantum simulation monte carlo methods geminal wavefunction quantum states
Astrophysics Nov 2, 2023

E(2) Equivariant Neural Networks for Robust Galaxy Morphology Classification

Sneh Pandya, Purvik Patel, Franc O et al.

We propose the use of group convolutional neural network architectures (GCNNs) equivariant to the 2D Euclidean group, $E(2)$, for the task of galaxy morphology classification by utilizing symmetries of the data present in galaxy images as an inductive bias in the architecture. We conduct robustness studies by introducing artificial perturbations via Poisson noise insertion and one-pixel adversarial attacks to simulate the effects of limited observational capabilities. We train, validate, and test GCNNs equivariant to discrete subgroups of $E(2)$ - the cyclic and dihedral groups of order $N$ - on the Galaxy10 DECals dataset and find that GCNNs achieve higher classification accuracy and are consistently more robust than their non-equivariant counterparts, with an architecture equivariant to the group $D_{16}$ achieving a $95.52 \pm 0.18\%$ test-set accuracy. We also find that the model loses $<6\%$ accuracy on a $50\%$-noise dataset and all GCNNs are less susceptible to one-pixel perturbations than an identically constructed CNN. Our code is publicly available at https://github.com/snehjp2/GCNNMorphology.

equivariant neural networks group theory galaxy classification geometric deep learning symmetry preservation
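
The basic ingredient of such GCNNs is a "lifting" convolution: each filter is applied in every rotation of the group, so rotating the input permutes, rather than corrupts, a group axis in the features. Below is a minimal C4 (90-degree rotations) version; a full E(2) architecture adds group convolutions in deeper layers, so this is only a sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class C4Lifting(nn.Module):
    def __init__(self, in_ch, out_ch, k=5):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, k, k))

    def forward(self, x):
        # apply every filter in all four 90-degree rotations
        rotated = [torch.rot90(self.weight, r, dims=(2, 3)) for r in range(4)]
        w = torch.cat(rotated, dim=0)                     # (4*out_ch, in_ch, k, k)
        y = F.conv2d(x, w, padding="same")
        return y.view(x.shape[0], 4, -1, *y.shape[-2:])   # explicit group axis

x = torch.randn(1, 3, 64, 64)
feat = C4Lifting(3, 8)(x)
invariant = feat.mean(dim=(1, 3, 4))  # pool over group + space -> rotation-invariant
```
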
Foundational AI Nov 2, 2023

Learning to See Physical Properties with Active Sensing Motor Policies

Gabriel B. Margolis, Xiang Fu, Yandong Ji et al.

Knowledge of terrain's physical properties inferred from color images can aid in making efficient robotic locomotion plans. However, unlike image classification, it is unintuitive for humans to label image patches with physical properties. Without labeled data, building a vision system that takes as input the observed terrain and predicts physical properties remains challenging. We present a method that overcomes this challenge by self-supervised labeling of images captured by robots during real-world traversal with physical property estimators trained in simulation. To ensure accurate labeling, we introduce Active Sensing Motor Policies (ASMP), which are trained to explore locomotion behaviors that increase the accuracy of estimating physical parameters. For instance, the quadruped robot learns to swipe its foot against the ground to estimate the friction coefficient accurately. We show that the visual system trained with a small amount of real-world traversal data accurately predicts physical parameters. The trained system is robust and works even with overhead images captured by a drone despite being trained on data collected by cameras attached to a quadruped robot walking on the ground.

self-supervised learning active terrain sensing reinforcement learning active learning proprioceptive state estimation
Theoretical Physics Oct 30, 2023

Metric Flows with Neural Networks

James Halverson, Fabian Ruehle

We develop a general theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincaré conjecture. We demonstrate that such fixed kernel regimes lead to poor learning of numerical Calabi-Yau metrics, as is expected since the associated neural networks do not learn features. Conversely, we demonstrate that well-learned numerical metrics at finite-width exhibit an evolving metric-NTK, associated with feature learning. Our theory of neural network metric flows therefore explains why neural networks are better at learning Calabi-Yau metrics than fixed kernel methods, such as the Ricci flow.

metric neural tangent kernel calabi-yau metric learning kernel methods string theory geometric deep learning
Astrophysics Oct 24, 2023

Precise Cosmological Constraints from BOSS Galaxy Clustering with a Simulation-Based Emulator of the Wavelet Scattering Transform

Georgios Valogiannis, Sihan Yuan, Cora Dvorkin

We perform a reanalysis of the BOSS CMASS DR12 galaxy dataset using a simulation-based emulator for the Wavelet Scattering Transform (WST) coefficients. Moving beyond our previous works, which laid the foundation for the first galaxy clustering application of this estimator, we construct a neural net-based emulator for the cosmological dependence of the WST coefficients and the 2-point correlation function multipoles, trained from the state-of-the-art suite of \textsc{AbacusSummit} simulations combined with a flexible Halo Occupation Distribution (HOD) galaxy model. In order to confirm the accuracy of our pipeline, we subject it to a series of thorough internal and external mock parameter recovery tests, before applying it to reanalyze the CMASS observations in the redshift range $0.46<z<0.57$. We find that a joint WST + 2-point correlation function likelihood analysis allows us to obtain marginalized 1$σ$ errors on the $Λ$CDM parameters that are tighter by a factor of $2.5-6$, compared to the 2-point correlation function, and by a factor of $1.4-2.5$ compared to the WST-only results. This corresponds to a competitive $0.9\%$, $2.3\%$ and $1\%$ level of determination for parameters $ω_c$, $σ_8$ $\&$ $n_s$, respectively, and also to a $0.7\%$ $\&$ $2.5 \%$ constraint on derived parameters h and $f(z)σ_8(z)$, in agreement with the \textit{Planck} 2018 results. Our results reaffirm the constraining power of the WST and highlight the exciting prospect of employing higher-order statistics in order to fully exploit the power of upcoming Stage-IV spectroscopic observations.

wavelet scattering transform simulation-based inference emulation bayesian inference feature extraction
Foundational AI Oct 11, 2023

Feature Learning and Generalization in Deep Networks with Orthogonal Weights

Hannah Day, Yonatan Kahn, Daniel A. Roberts

Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed in deep networks with depth comparable to width. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.

orthogonal initialization neural tangent kernel kernel methods feature extraction scalability
Foundational AI Oct 11, 2023

Growing Brains: Co-emergence of Anatomical and Functional Modularity in Recurrent Neural Networks

Ziming Liu, Mikail Khona, Ila R. Fiete et al.

Recurrent neural networks (RNNs) trained on compositional tasks can exhibit functional modularity, in which neurons can be clustered by activity similarity and participation in shared computational subtasks. Unlike brains, these RNNs do not exhibit anatomical modularity, in which functional clustering is correlated with strong recurrent coupling and spatial localization of functional clusters. Contrasting with functional modularity, which can be ephemerally dependent on the input, anatomically modular networks form a robust substrate for solving the same subtasks in the future. To examine whether it is possible to grow brain-like anatomical modularity, we apply a recent machine learning method, brain-inspired modular training (BIMT), to a network being trained to solve a set of compositional cognitive tasks. We find that functional and anatomical clustering emerge together, such that functionally similar neurons also become spatially localized and interconnected. Moreover, compared to standard $L_1$ or no regularization settings, the model exhibits superior performance by optimally balancing task performance and network sparsity. In addition to achieving brain-like organization in RNNs, our findings also suggest that BIMT holds promise for applications in neuromorphic computing and enhancing the interpretability of neural network architectures.

anatomical modularity brain-inspired modular training recurrent networks multi-task learning clustering
Experimental Physics Oct 11, 2023

Search for heavy neutral leptons in electron-positron and neutral-pion final states with the MicroBooNE detector

MicroBooNE collaboration, P. Abratenko, O. Alterkait et al.

We present the first search for heavy neutral leptons (HNL) decaying into $νe^+e^-$ or $νπ^0$ final states in a liquid-argon time projection chamber using data collected with the MicroBooNE detector. The data were recorded synchronously with the NuMI neutrino beam from Fermilab's Main Injector corresponding to a total exposure of $7.01 \times 10^{20}$ protons on target. We set upper limits at the $90\%$ confidence level on the mixing parameter $\lvert U_{μ4}\rvert^2$ in the mass ranges $10\le m_{\rm HNL}\le 150$ MeV for the $νe^+e^-$ channel and $150\le m_{\rm HNL}\le 245$ MeV for the $νπ^0$ channel, assuming $\lvert U_{e 4}\rvert^2 = \lvert U_{τ4}\rvert^2 = 0$. These limits represent the most stringent constraints in the mass range $35<m_{\rm HNL}<175$ MeV and the first constraints from a direct search for $νπ^0$ decays.

new physics searches heavy neutral leptons lepton mixing parameters liquid-argon tpc neutrino detection
Theoretical Physics Oct 11, 2023

Functional renormalization group for signal detection and stochastic ergodicity breaking

Harold Erbin, Riccardo Finotello, Bio Wahabou Kpera et al.

Signal detection is one of the main challenges of data science. As it often happens in data analysis, the signal in the data may be corrupted by noise. There is a wide range of techniques aimed at extracting the relevant degrees of freedom from data. However, some problems remain difficult. It is notably the case of signal detection in almost continuous spectra when the signal-to-noise ratio is small enough. This paper follows a recent bibliographic line which tackles this issue with field-theoretical methods. Previous analysis focused on equilibrium Boltzmann distributions for some effective field representing the degrees of freedom of data. It was possible to establish a relation between signal detection and $\mathbb{Z}_2$-symmetry breaking. In this paper, we consider a stochastic field framework inspiring by the so-called "Model A", and show that the ability to reach or not an equilibrium state is correlated with the shape of the dataset. In particular, studying the renormalization group of the model, we show that the weak ergodicity prescription is always broken for signals small enough, when the data distribution is close to the Marchenko-Pastur (MP) law. This, in particular, enables the definition of a detection threshold in the regime where the signal-to-noise ratio is small enough.

renormalization signal detection stochastic ergodicity breaking phase transitions stochastic processes
Foundational AI Oct 10, 2023

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

Samuel Marks, Max Tegmark

Large Language Models (LLMs) have impressive capabilities, but are prone to outputting falsehoods. Recent work has developed techniques for inferring whether a LLM is telling the truth by training probes on the LLM's internal activations. However, this line of work is controversial, with some authors pointing out failures of these probes to generalize in basic ways, among other conceptual issues. In this work, we use high-quality datasets of simple true/false statements to study in detail the structure of LLM representations of truth, drawing on three lines of evidence: 1. Visualizations of LLM true/false statement representations, which reveal clear linear structure. 2. Transfer experiments in which probes trained on one dataset generalize to different datasets. 3. Causal evidence obtained by surgically intervening in a LLM's forward pass, causing it to treat false statements as true and vice versa. Overall, we present evidence that at sufficient scale, LLMs linearly represent the truth or falsehood of factual statements. We also show that simple difference-in-mean probes generalize as well as other probing techniques while identifying directions which are more causally implicated in model outputs.

representation learning interpretability truth geometry linear truth probing transformers
Foundational AI Oct 9, 2023

Grokking as Compression: A Nonlinear Complexity Perspective

Ziming Liu, Ziqian Zhong, Max Tegmark

We attribute grokking, the phenomenon where generalization is much delayed after memorization, to compression. To do so, we define linear mapping number (LMN) to measure network complexity, which is a generalized version of linear region number for ReLU networks. LMN can nicely characterize neural network compression before generalization. Although the $L_2$ norm has been a popular choice for characterizing model complexity, we argue in favor of LMN for a number of reasons: (1) LMN can be naturally interpreted as information/computation, while $L_2$ cannot. (2) In the compression phase, LMN has linear relations with test losses, while $L_2$ is correlated with test losses in a complicated nonlinear way. (3) LMN also reveals an intriguing phenomenon of the XOR network switching between two generalization solutions, while $L_2$ does not. Besides explaining grokking, we argue that LMN is a promising candidate as the neural network version of the Kolmogorov complexity since it explicitly considers local or conditioned linear computations aligned with the nature of modern artificial neural networks.

linear mapping number grokking eigenvalue decomposition interpretability spectral methods
Foundational AI Oct 3, 2023

A Neural Scaling Law from Lottery Ticket Ensembling

Ziming Liu, Max Tegmark

Neural scaling laws (NSL) refer to the phenomenon where model performance improves with scale. Sharma & Kaplan analyzed NSL using approximation theory and predict that MSE losses decay as $N^{-α}$, $α=4/d$, where $N$ is the number of model parameters, and $d$ is the intrinsic input dimension. Although their theory works well for some cases (e.g., ReLU networks), we surprisingly find that a simple 1D problem $y=x^2$ manifests a different scaling law ($α=1$) from their predictions ($α=4$). We opened the neural networks and found that the new scaling law originates from lottery ticket ensembling: a wider network on average has more "lottery tickets", which are ensembled to reduce the variance of outputs. We support the ensembling mechanism by mechanistically interpreting single neural networks, as well as studying them statistically. We attribute the $N^{-1}$ scaling law to the "central limit theorem" of lottery tickets. Finally, we discuss its potential implications for large language models and statistical physics-type theories of learning.

neural scaling laws lottery ticket ensembling ensemble methods variance reduction interpretability
Foundational AI Oct 3, 2023

Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution

Rui Wang, Elyssa Hofgard, Han Gao et al.

Modeling symmetry breaking is essential for understanding the fundamental changes in the behaviors and properties of physical systems, from microscopic particle interactions to macroscopic phenomena like fluid dynamics and cosmic structures. Thus, identifying sources of asymmetry is an important tool for understanding physical systems. In this paper, we focus on learning asymmetries of data using relaxed group convolutions. We provide both theoretical and empirical evidence that this flexible convolution technique allows the model to maintain the highest level of equivariance that is consistent with data and discover the subtle symmetry-breaking factors in various physical systems. We employ various relaxed group convolution architectures to uncover various symmetry-breaking factors that are interpretable and physically meaningful in different physical systems, including the phase transition of crystal structure, the isotropy and homogeneity breaking in turbulent flow, and the time-reversal symmetry breaking in pendulum systems.

equivariant neural networks symmetry breaking relaxed group convolution group theory approximate equivariance
Astrophysics Sep 28, 2023

Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample

Enrique Paillas, Carolina Cuesta-Lazaro, Will J. Percival et al.

We present a clustering analysis of the BOSS DR12 CMASS galaxy sample, combining measurements of the galaxy two-point correlation function and density-split clustering down to a scale of $1\,h^{-1}{\rm Mpc}$. Our theoretical framework is based on emulators trained on high-fidelity mock galaxy catalogues that forward model the cosmological dependence of the clustering statistics within an extended-$Λ$CDM framework, including redshift-space and Alcock-Paczynski distortions. Our base-$Λ$CDM analysis finds $ω_{\rm cdm} = 0.1201\pm 0.0022$, $σ_8 = 0.792\pm 0.034$, and $n_s = 0.970\pm 0.018$, corresponding to $fσ_8 = 0.462\pm 0.020$ at $z \approx 0.525$, which is in agreement with Planck 2018 predictions and various clustering studies in the literature. We test single-parameter extensions to base-$Λ$CDM, varying the running of the spectral index, the dark energy equation of state, and the density of massless relic neutrinos, finding no compelling evidence for deviations from the base model. We model the galaxy-halo connection using a halo occupation distribution framework, finding signatures of environment-based assembly bias in the data. We validate our pipeline against mock catalogues that match the clustering and selection properties of CMASS, showing that we can recover unbiased cosmological constraints even with a volume 84 times larger than the one used in this study.

density-split clustering emulation simulation-based inference bayesian inference dark matter
Astrophysics Sep 28, 2023

SUNBIRD: A simulation-based model for full-shape density-split clustering

Carolina Cuesta-Lazaro, Enrique Paillas, Sihan Yuan et al.

Combining galaxy clustering information from regions of different environmental densities can help break cosmological parameter degeneracies and access non-Gaussian information from the density field that is not readily captured by the standard two-point correlation function (2PCF) analyses. However, modelling these density-dependent statistics down to the non-linear regime has so far remained challenging. We present a simulation-based model that is able to capture the cosmological dependence of the full shape of the density-split clustering (DSC) statistics down to intra-halo scales. Our models are based on neural-network emulators that are trained on high-fidelity mock galaxy catalogues within an extended-$Λ$CDM framework, incorporating the effects of redshift-space, Alcock-Paczynski distortions and models of the halo-galaxy connection. Our models reach sub-percent level accuracy down to $1\,h^{-1}{\rm Mpc}$ and are robust against different choices of galaxy-halo connection modelling. When combined with the galaxy 2PCF, DSC can tighten the constraints on $ω_{\rm cdm}$, $σ_8$, and $n_s$ by factors of 2.9, 1.9, and 2.1, respectively, compared to a 2PCF-only analysis. DSC additionally puts strong constraints on environment-based assembly bias parameters. Our code is made publicly available on Github.

density-split clustering simulation-based inference emulation cosmological simulation surrogate modeling
Experimental Physics Sep 27, 2023

Chained Quantile Morphing with Normalizing Flows

Samuel Bright-Thonney, Philip Harris, Patrick McCormack et al.

Accounting for inaccuracies in Monte Carlo simulations is a crucial step in any high energy physics analysis. It becomes especially important when training machine learning models, which can amplify simulation inaccuracies and introduce large discrepancies and systematic uncertainties when the model is applied to data. In this paper, we introduce a method to transform simulated events to better match data using normalizing flows, a class of deep learning-based density estimation models. Our proposal uses a technique called chained quantile morphing, which corrects a set of observables by iteratively shifting each entry according to a conditonal cumulative density function. We demonstrate the technique on a realistic particle physics dataset, and compare it to a neural network-based reweighting method. We also introduce a new contrastive learning technique to correct high dimensional particle-level inputs, which naively cannot be efficiently corrected with morphing strategies.

normalizing flows chained quantile morphing density estimation collider physics monte carlo methods
Experimental Physics Sep 20, 2023

GWAK: Gravitational-Wave Anomalous Knowledge with Recurrent Autoencoders

Ryan Raikman, Eric A. Moreno, Ekaterina Govorkova et al.

Matched-filtering detection techniques for gravitational-wave (GW) signals in ground-based interferometers rely on having well-modeled templates of the GW emission. Such techniques have been traditionally used in searches for compact binary coalescences (CBCs), and have been employed in all known GW detections so far. However, interesting science cases aside from compact mergers do not yet have accurate enough modeling to make matched filtering possible, including core-collapse supernovae and sources where stochasticity may be involved. Therefore the development of techniques to identify sources of these types is of significant interest. In this paper, we present a method of anomaly detection based on deep recurrent autoencoders to enhance the search region to unmodeled transients. We use a semi-supervised strategy that we name Gravitational Wave Anomalous Knowledge (GWAK). While the semi-supervised nature of the problem comes with a cost in terms of accuracy as compared to supervised techniques, there is a qualitative advantage in generalizing experimental sensitivity beyond pre-computed signal templates. We construct a low-dimensional embedded space using the GWAK method, capturing the physical signatures of distinct signals on each axis of the space. By introducing signal priors that capture some of the salient features of GW signals, we allow for the recovery of sensitivity even when an unmodeled anomaly is encountered. We show that regions of the GWAK space can identify CBCs, detector glitches and also a variety of unmodeled astrophysical sources.

anomaly detection autoencoders gravitational waves semi-supervised learning recurrent networks
Astrophysics Sep 17, 2023

Simulation-based Inference for Exoplanet Atmospheric Retrieval: Insights from winning the Ariel Data Challenge 2023 using Normalizing Flows

Mayeul Aubin, Carolina Cuesta-Lazaro, Ethan Tregidga et al.

Advancements in space telescopes have opened new avenues for gathering vast amounts of data on exoplanet atmosphere spectra. However, accurately extracting chemical and physical properties from these spectra poses significant challenges due to the non-linear nature of the underlying physics. This paper presents novel machine learning models developed by the AstroAI team for the Ariel Data Challenge 2023, where one of the models secured the top position among 293 competitors. Leveraging Normalizing Flows, our models predict the posterior probability distribution of atmospheric parameters under different atmospheric assumptions. Moreover, we introduce an alternative model that exhibits higher performance potential than the winning model, despite scoring lower in the challenge. These findings highlight the need to reevaluate the evaluation metric and prompt further exploration of more efficient and accurate approaches for exoplanet atmosphere spectra analysis. Finally, we present recommendations to enhance the challenge and models, providing valuable insights for future applications on real observational data. These advancements pave the way for more effective and timely analysis of exoplanet atmospheric properties, advancing our understanding of these distant worlds.

normalizing flows simulation-based inference posterior estimation exoplanets inverse problems
Theoretical Physics Sep 3, 2023

Advances in machine-learning-based sampling motivated by lattice quantum chromodynamics

Kyle Cranmer, Gurtej Kanwar, Sébastien Racanière et al.

Sampling from known probability distributions is a ubiquitous task in computational science, underlying calculations in domains from linguistics to biology and physics. Generative machine-learning (ML) models have emerged as a promising tool in this space, building on the success of this approach in applications such as image, text, and audio generation. Often, however, generative tasks in scientific domains have unique structures and features -- such as complex symmetries and the requirement of exactness guarantees -- that present both challenges and opportunities for ML. This Perspective outlines the advances in ML-based sampling motivated by lattice quantum field theory, in particular for the theory of quantum chromodynamics. Enabling calculations of the structure and interactions of matter from our most fundamental understanding of particle physics, lattice quantum chromodynamics is one of the main consumers of open-science supercomputing worldwide. The design of ML algorithms for this application faces profound challenges, including the necessity of scaling custom ML architectures to the largest supercomputers, but also promises immense benefits, and is spurring a wave of development in ML-based sampling more broadly. In lattice field theory, if this approach can realize its early promise it will be a transformative step towards first-principles physics calculations in particle, nuclear and condensed matter physics that are intractable with traditional approaches.

lattice qcd normalizing flows monte carlo methods equivariant neural networks symmetry preservation
Theoretical Physics Sep 1, 2023

Signal-to-noise improvement through neural network contour deformations for 3D $SU(2)$ lattice gauge theory

William Detmold, Gurtej Kanwar, Yin Lin et al.

Complex contour deformations of the path integral have been demonstrated to significantly improve the signal-to-noise ratio of observables in previous studies of two-dimensional gauge theories with open boundary conditions. In this work, new developments based on gauge fixing and a neural network definition of the deformation are introduced, which enable an effective application to theories in higher dimensions and with generic boundary conditions. Improvements of the signal-to-noise ratio by up to three orders of magnitude for Wilson loop measurements are shown in $SU(2)$ lattice gauge theory in three spacetime dimensions.

lattice gauge theory signal-to-noise improvement complex contour deformation wilson loops monte carlo methods
Astrophysics Aug 18, 2023

Data Compression and Inference in Cosmology with Self-Supervised Machine Learning

Aizhan Akhmetzhanova, Siddharth Mishra-Sharma, Cora Dvorkin

The influx of massive amounts of data from current and upcoming cosmological surveys necessitates compression schemes that can efficiently summarize the data with minimal loss of information. We introduce a method that leverages the paradigm of self-supervised machine learning in a novel manner to construct representative summaries of massive datasets using simulation-based augmentations. Deploying the method on hydrodynamical cosmological simulations, we show that it can deliver highly informative summaries, which can be used for a variety of downstream tasks, including precise and accurate parameter inference. We demonstrate how this paradigm can be used to construct summary representations that are insensitive to prescribed systematic effects, such as the influence of baryonic physics. Our results indicate that self-supervised machine learning techniques offer a promising new approach for compression of cosmological data as well its analysis.

self-supervised learning representation learning simulation-based augmentation simulation-based inference dimensionality reduction
Astrophysics Aug 18, 2023

Subhalo effective density slope measurements from HST strong lensing data with neural likelihood-ratio estimation

Gemma Zhang, Atınç Çağan Şengül, Cora Dvorkin

Examining the properties of subhalos with strong gravitational lensing images can shed light on the nature of dark matter. From upcoming large-scale surveys, we expect to discover orders of magnitude more strong lens systems that can be used for subhalo studies. To optimally extract information from a large number of strong lensing images, machine learning provides promising avenues for efficient analysis that is unachievable with traditional analysis methods, but application of machine learning techniques to real observations is still limited. We build upon previous work, which uses a neural likelihood-ratio estimator, to constrain the effective density slopes of subhalos and demonstrate the feasibility of this method on real strong lensing observations. To do this, we implement significant improvements to the forward simulation pipeline and undertake careful model evaluation using simulated images. Ultimately, we use our trained model to predict the effective subhalo density slope from combining a set of strong lensing images taken by the \textit{Hubble Space Telescope}. We found the subhalo slope measurement of this set of observations to be steeper than the slope predictions of cold dark matter subhalos. Our result adds to several previous works that also measured high subhalo slopes in observations. Although a possible explanation for this is that subhalos with steeper slopes are easier to detect due to selection effects and thus contribute to statistical bias, our result nevertheless points to the need for careful analysis of more strong lensing observations from future surveys.

dark matter simulation-based inference strong gravitational lensing likelihood ratio subhalo density slope
Theoretical Physics Aug 18, 2023

Reconstructing $S$-matrix Phases with Machine Learning

Aurélien Dersy, Matthew D. Schwartz, Alexander Zhiboedov

An important element of the $S$-matrix bootstrap program is the relationship between the modulus of an $S$-matrix element and its phase. Unitarity relates them by an integral equation. Even in the simplest case of elastic scattering, this integral equation cannot be solved analytically and numerical approaches are required. We apply modern machine learning techniques to studying the unitarity constraint. We find that for a given modulus, when a phase exists it can generally be reconstructed to good accuracy with machine learning. Moreover, the loss of the reconstruction algorithm provides a good proxy for whether a given modulus can be consistent with unitarity at all. In addition, we study the question of whether multiple phases can be consistent with a single modulus, finding novel phase-ambiguous solutions. In particular, we find a new phase-ambiguous solution which pushes the known limit on such solutions significantly beyond the previous bound.

scattering amplitudes phase ambiguity inverse problems s-matrix bootstrap quantum field theory
Theoretical Physics Aug 16, 2023

Gravitational action for a massive Majorana fermion in 2d quantum gravity

Corinne de Lacroix, Harold Erbin, Vincent Lahoche

We compute the gravitational action of a free massive Majorana fermion coupled to two-dimensional gravity on compact Riemann surfaces of arbitrary genus. The structure is similar to the case of the massive scalar. The small-mass expansion of the gravitational yields the Liouville action at zeroth order, and we can identify the Mabuchi action at first order. While the massive Majorana action is a conformal deformation of the massless Majorana CFT, we find an action different from the one given by the David-Distler-Kawai (DDK) ansatz.

majorana fermion quantum field theory liouville gravity conformal field theory spectral methods
Astrophysics Aug 14, 2023

An Extensive $\textit{Hubble Space Telescope}$ Study of the Offset and Host Light Distributions of Type I Superluminous Supernovae

Brian Hsu, Peter K. Blanchard, Edo Berger et al.

We present an extensive $\textit{Hubble Space Telescope}$ ($\textit{HST}$) rest-frame ultraviolet (UV) imaging study of the locations of Type I superluminous supernovae (SLSNe) within their host galaxies. The sample includes 65 SLSNe with detected host galaxies in the redshift range $z\approx 0.05-2$. Using precise astrometric matching with SN images, we determine the distributions of physical and host-normalized offsets relative to the host centers, as well as the fractional flux distribution relative to the underlying UV light distribution. We find that the host-normalized offsets of SLSNe roughly track an exponential disk profile, but exhibit an overabundance of sources with large offsets of $1.5-4$ times their host half-light radius. The SLSNe normalized offsets are systematically larger than those of long gamma-ray bursts (LGRBs), and even Type Ib/c and II SNe. Furthermore, we find that about 40\% of all SLSNe occur in the dimmest regions of their host galaxies (fractional flux of 0), in stark contrast to LGRBs and Type Ib/c and II SNe. We do not detect any significant trends in the locations of SLSNe as a function of redshift, or as a function of explosion and magnetar engine parameters inferred from modeling of their optical lights curves. The significant difference in SLSN locations compared to LGRBs (and normal core-collapse SNe) suggests that at least some of their progenitors follow a different evolutionary path. We speculate that SLSNe arise from massive runaway stars from disrupted binary systems, with velocities of $\sim 10^2$ km s$^{-1}$.

supernova classification runaway star progenitors fractional flux distribution stellar evolution magnetar engine
Experimental Physics Aug 9, 2023

FLORAH: A generative model for halo assembly histories

Tri Nguyen, Chirag Modi, L. Y. Aaron Yung et al.

The mass assembly history (MAH) of dark matter halos plays a crucial role in shaping the formation and evolution of galaxies. MAHs are used extensively in semi-analytic and empirical models of galaxy formation, yet current analytic methods to generate them are inaccurate and unable to capture their relationship with the halo internal structure and large-scale environment. This paper introduces FLORAH, a machine-learning framework for generating assembly histories of ensembles of dark matter halos. We train FLORAH on the assembly histories from the GUREFT and VSMDPL N-body simulations and demonstrate its ability to recover key properties such as the time evolution of mass and concentration. We obtain similar results for the galaxy stellar mass versus halo mass relation and its residuals when we run the Santa Cruz semi-analytic model on FLORAH-generated assembly histories and halo formation histories extracted from an N-body simulation. We further show that FLORAH also reproduces the dependence of clustering on properties other than mass (assembly bias), which is not captured by other analytic methods. By combining multiple networks trained on a suite of simulations with different redshift ranges and mass resolutions, we are able to construct accurate main progenitor branches (MPBs) with a wide dynamic mass range from $z=0$ up to an ultra-high redshift $z \approx 20$, currently far beyond that of a single N-body simulation. FLORAH is the first step towards a machine learning-based framework for planting full merger trees; this will enable the exploration of different galaxy formation scenarios with great computational efficiency at unprecedented accuracy.

generative models dark matter normalizing flows recurrent networks cosmological simulation
Experimental Physics Aug 7, 2023

First application of a liquid argon time projection chamber for the search for intranuclear neutron-antineutron transitions and annihilation in $^{40}$Ar using the MicroBooNE detector

MicroBooNE collaboration, P. Abratenko, O. Alterkait et al.

We present a novel methodology to search for intranuclear neutron-antineutron transition ($n\rightarrow\bar{n}$) followed by $\bar{n}$-nucleon annihilation within an $^{40}$Ar nucleus, using the MicroBooNE liquid argon time projection chamber (LArTPC) detector. A discovery of $n\rightarrow\bar{n}$ transition or a new best limit on the lifetime of this process would either constitute physics beyond the Standard Model or greatly constrain theories of baryogenesis, respectively. The approach presented in this paper makes use of deep learning methods to select $n\rightarrow\bar{n}$ events based on their unique features and differentiate them from cosmogenic backgrounds. The achieved signal and background efficiencies are (70.22$\pm$6.04)\% and (0.0020$\pm$0.0003)\%, respectively. A demonstration of a search is performed with a data set corresponding to an exposure of $3.32 \times10^{26}\,$neutron-years, and where the background rate is constrained through direct measurement, assuming the presence of a negligible signal. With this approach, no excess of events over the background prediction is observed, setting a demonstrative lower bound on the $n\rightarrow\bar{n}$ lifetime in $^{40}$Ar of $τ_{\textrm{m}} \gtrsim 1.1\times10^{26}\,$years, and on the free $n\rightarrow\bar{n}$ transition time of $τ_{\textrm{\nnbar}} \gtrsim 2.6\times10^{5}\,$s, each at the $90\%$ confidence level. This analysis represents a first-ever proof-of-principle demonstration of the ability to search for this rare process in LArTPCs with high efficiency and low background.

neutron-antineutron oscillation new physics searches lartpc imaging convolutional networks signal detection
Astrophysics Aug 2, 2023

A parsec-scale Galactic 3D dust map out to 1.25 kpc from the Sun

Gordian Edenhofer, Catherine Zucker, Philipp Frank et al.

High-resolution 3D maps of interstellar dust are critical for probing the underlying physics shaping the structure of the interstellar medium, and for foreground correction of astrophysical observations affected by dust. We aim to construct a new 3D map of the spatial distribution of interstellar dust extinction out to a distance of 1.25 kpc from the Sun. We leveraged distance and extinction estimates to 54 million nearby stars derived from the Gaia BP/RP spectra. Using the stellar distance and extinction information, we inferred the spatial distribution of dust extinction. We modeled the logarithmic dust extinction with a Gaussian process in a spherical coordinate system via iterative charted refinement and a correlation kernel inferred in previous work. In total, our posterior has over 661 million degrees of freedom. We probed the posterior distribution using the variational inference method MGVI. Our 3D dust map has an angular resolution of up to 14' (Nside = 256), and we achieve parsec-scale distance resolution, sampling the dust in 516 logarithmically spaced distance bins spanning 69 pc to 1250 pc. We generated 12 samples from the variational posterior of the 3D dust distribution and release the samples alongside the mean 3D dust map and its corresponding uncertainty. Our map resolves the internal structure of hundreds of molecular clouds in the solar neighborhood and will be broadly useful for studies of star formation, Galactic structure, and young stellar populations. It is available for download in a variety of coordinate systems online and can also be queried via the publicly available dustmaps Python package.

3d dust mapping bayesian inference variational inference posterior estimation kernel methods
Foundational AI Jul 27, 2023

Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

William Shen, Ge Yang, Alan Yu et al.

Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects.

distilled feature fields representation learning neural radiance fields few-shot manipulation embeddings
Theoretical Physics Jul 25, 2023

Score-based Diffusion Models for Generating Liquid Argon Time Projection Chamber Images

Zeviel Imani, Shuchin Aeron, Taritree Wongjirad

For the first time, we show high-fidelity generation of LArTPC-like data using a generative neural network. This demonstrates that methods developed for natural images do transfer to LArTPC-produced images, which, in contrast to natural images, are globally sparse but locally dense. We present the score-based diffusion method employed. We evaluate the fidelity of the generated images using several quality metrics, including modified measures used to evaluate natural images, comparisons between high-dimensional distributions, and comparisons relevant to LArTPC experiments.

score-based models diffusion models detector simulation generative models surrogate modeling
Foundational AI Jul 16, 2023

Polarization Multi-Image Synthesis with Birefringent Metasurfaces

Dean Hazineh, Soon Wei Daniel Lim, Qi Guo et al.

Optical metasurfaces composed of precisely engineered nanostructures have gained significant attention for their ability to manipulate light and implement distinct functionalities based on the properties of the incident field. Computational imaging systems have started harnessing this capability to produce sets of coded measurements that benefit certain tasks when paired with digital post-processing. Inspired by these works, we introduce a new system that uses a birefringent metasurface with a polarizer-mosaicked photosensor to capture four optically-coded measurements in a single exposure. We apply this system to the task of incoherent opto-electronic filtering, where digital spatial-filtering operations are replaced by simpler, per-pixel sums across the four polarization channels, independent of the spatial filter size. In contrast to previous work on incoherent opto-electronic filtering that can realize only one spatial filter, our approach can realize a continuous family of filters from a single capture, with filters being selected from the family by adjusting the post-capture digital summation weights. To find a metasurface that can realize a set of user-specified spatial filters, we introduce a form of gradient descent with a novel regularizer that encourages light efficiency and a high signal-to-noise ratio. We demonstrate several examples in simulation and with fabricated prototypes, including some with spatial filters that have prescribed variations with respect to depth and wavelength. Visit the Project Page at https://deanhazineh.github.io/publications/Multi_Image_Synthesis/MIS_Home.html

birefringent metasurface inverse problems opto-electronic filtering loss function design polarization-coded imaging
Theoretical Physics Jul 6, 2023

Neural Network Field Theories: Non-Gaussianity, Actions, and Locality

Mehmet Demirtas, James Halverson, Anindita Maiti et al.

Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, others, such as in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $φ^4$ theory is realized as an infinite-$N$ neural network field theory.

nn-ft correspondence quantum field theory connected correlators edgeworth expansion stochastic processes
Astrophysics Jul 6, 2023

From Discovery to the First Month of the Type II Supernova 2023ixf: High and Variable Mass Loss in the Final Year before Explosion

Daichi Hiramatsu, Daichi Tsuna, Edo Berger et al.

We present the discovery of the Type II supernova SN 2023ixf in M101 and follow-up photometric and spectroscopic observations, respectively, in the first month and week of its evolution. Our discovery was made within a day of estimated first light, and the following light curve is characterized by a rapid rise ($\approx5$ days) to a luminous peak ($M_V\approx-18.2$ mag) and plateau ($M_V\approx-17.6$ mag) extending to $30$ days with a fast decline rate of $\approx0.03$ mag day$^{-1}$. During the rising phase, $U-V$ color shows blueward evolution, followed by redward evolution in the plateau phase. Prominent flash features of hydrogen, helium, carbon, and nitrogen dominate the spectra up to $\approx5$ days after first light, with a transition to a higher ionization state in the first $\approx2$ days. Both the $U-V$ color and flash ionization states suggest a rise in the temperature, indicative of a delayed shock breakout inside dense circumstellar material (CSM). From the timescales of CSM interaction, we estimate its compact radial extent of $\sim(3-7)\times10^{14}$ cm. We then construct numerical light-curve models based on both continuous and eruptive mass-loss scenarios shortly before explosion. For the continuous mass-loss scenario, we infer a range of mass-loss history with $0.1-1.0\,M_\odot\,{\rm yr}^{-1}$ in the final $2-1$ yr before explosion, with a potentially decreasing mass loss of $0.01-0.1\,M_\odot\,{\rm yr}^{-1}$ in $\sim0.7-0.4$ yr toward the explosion. For the eruptive mass-loss scenario, we favor eruptions releasing $0.3-1\,M_\odot$ of the envelope at about a year before explosion, which result in CSM with mass and extent similar to the continuous scenario. We discuss the implications of the available multiwavelength constraints obtained thus far on the progenitor candidate and SN 2023ixf to our variable CSM models.

supernova classification circumstellar interaction stellar evolution shock breakout flash spectroscopy
Foundational AI Jun 30, 2023

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Ziqian Zhong, Ziming Liu, Max Tegmark et al.

Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.

interpretability mechanistic explanation algorithmic phase space phase transitions embeddings
Theoretical Physics Jun 21, 2023

Hierarchical Neural Simulation-Based Inference Over Event Ensembles

Lukas Heinrich, Siddharth Mishra-Sharma, Chris Pollard et al.

When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for frequentist and Bayesian dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via a hierarchical forward model. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to significantly tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics and cosmology.

simulation-based inference hierarchical forward model posterior estimation likelihood ratio bayesian inference
Experimental Physics Jun 16, 2023

Development of the Topological Trigger for LHCb Run 3

Nicole Schulte, Blaise Raheem Delaney, Niklas Nolte et al.

The data-taking conditions expected in Run 3 of the LHCb experiment at CERN are unprecedented and challenging for the software and computing systems. Despite that, the LHCb collaboration pioneers the use of a software-only trigger system to cope with the increased event rate efficiently. The beauty physics programme of LHCb is heavily reliant on topological triggers. These are devoted to selecting beauty-hadron candidates inclusively, based on the characteristic decay topology and kinematic properties expected from beauty decays. The following proceeding describes the current progress of the Run 3 implementation of the topological triggers using Lipschitz monotonic neural networks. This architecture offers robustness under varying detector conditions and sensitivity to long-lived candidates, improving the possibility of discovering New Physics at LHCb.

trigger systems lipschitz neural networks monotonic neural networks robustness new physics searches
Foundational AI Jun 16, 2023

Torus Actions on Moduli Spaces of Super Stable Maps of Genus Zero

Enno Keßler, Artan Sheshmani, Shing-Tung Yau

We construct smooth $\mathbb{C}^*$-actions on the moduli spaces of super $J$-holomorphic curves as well as super stable curves and super stable maps of genus zero and fixed tree type such that their reduced spaces are torus invariant. Furthermore, we give explicit descriptions of the normal bundles to the fixed loci in terms of spinor bundles and their sections. Main steps to the construction of the $\mathbb{C}^*$-action are the proof that the charts of the moduli space of super $J$-holomorphic curves obtained by the implicit function theorem yield a smooth split atlas and a detailed study of the superconformal automorphism group of $\mathbb{P}_{\mathbb{C}}^{1|1}$ and its action on component fields.

super moduli spaces supergeometry torus localization string theory spinor bundles
Experimental Physics Jun 9, 2023

NuCLR: Nuclear Co-Learned Representations

Ouail Kitouni, Niklas Nolte, Sokratis Trifinopoulos et al.

We introduce Nuclear Co-Learned Representations (NuCLR), a deep learning model that predicts various nuclear observables, including binding and decay energies, and nuclear charge radii. The model is trained using a multi-task approach with shared representations and obtains state-of-the-art performance, achieving levels of precision that are crucial for understanding fundamental phenomena in nuclear (astro)physics. We also report an intriguing finding that the learned representation of NuCLR exhibits the prominent emergence of crucial aspects of the nuclear shell model, namely the shell structure, including the well-known magic numbers, and the Pauli Exclusion Principle. This suggests that the model is capable of capturing the underlying physical principles and that our approach has the potential to offer valuable insights into nuclear theory.

representation learning nuclear shell emergence multi-task learning embeddings interpretability
Theoretical Physics Jun 6, 2023

Quantum Computation and Simulation using Fermion-Pair Registers

Xiangkai Sun, Di Luo, Soonwon Choi

We propose and analyze an approach to realize quantum computation and simulation using fermionic particles under quantum gas microscopes. Our work is inspired by a recent experimental demonstration of large-scale quantum registers, where tightly localized fermion pairs are used to encode qubits exhibiting long coherence time and robustness against laser intensity noise. We describe how to engineer the SWAP gate and high-fidelity controlled-phase gates by adjusting the fermion hopping as well as Feshbach interaction strengths. Combined with previously demonstrated single-qubit rotations, these gates establish the computational universality of the system. Furthermore, we show that 2D quantum Ising Hamiltonians with tunable transverse and longitudinal fields can be efficient simulated by modulating Feshbach interaction strengths. We present a sample-efficient protocol to characterize engineered gates and Hamiltonian dynamics based on an improved classical shadow process tomography that requires minimal experimental controls. Our work opens up new opportunities to harness existing ultracold quantum gases for quantum information sciences.

quantum computing quantum simulation fermion-pair qubits quantum states feshbach resonance
Foundational AI May 31, 2023

Discovering New Interpretable Conservation Laws as Sparse Invariants

Ziming Liu, Patrick Obin Sturm, Saketh Bharadwaj et al.

Discovering conservation laws for a given dynamical system is important but challenging. In a theorist setup (differential equations and basis functions are both known), we propose the Sparse Invariant Detector (SID), an algorithm that auto-discovers conservation laws from differential equations. Its algorithmic simplicity allows robustness and interpretability of the discovered conserved quantities. We show that SID is able to rediscover known and even discover new conservation laws in a variety of systems. For two examples in fluid mechanics and atmospheric chemistry, SID discovers 14 and 3 conserved quantities, respectively, where only 12 and 2 were previously known to domain experts.

conservation laws sparse invariant detection sparse models automated discovery eigenvalue decomposition
Foundational AI May 29, 2023

Learning Linear Groups in Neural Networks

Emmanouil Theodosis, Karim Helwani, Demba Ba

Employing equivariance in neural networks leads to greater parameter efficiency and improved generalization performance through the encoding of domain knowledge in the architecture; however, the majority of existing approaches require an a priori specification of the desired symmetries. We present a neural network architecture, Linear Group Networks (LGNs), for learning linear groups acting on the weight space of neural networks. Linear groups are desirable due to their inherent interpretability, as they can be represented as finite matrices. LGNs learn groups without any supervision or knowledge of the hidden symmetries in the data and the groups can be mapped to well known operations in machine learning. We use LGNs to learn groups on multiple datasets while considering different downstream tasks; we demonstrate that the linear group structure depends on both the data distribution and the considered task.

linear group learning equivariant neural networks group theory weight space symmetry symmetry preservation
Foundational AI May 23, 2023

Machine Learning for Quantum-Enhanced Gravitational-Wave Observatories

Chris Whittle, Ge Yang, Matthew Evans et al.

Machine learning has become an effective tool for processing the extensive data sets produced by large physics experiments. Gravitational-wave detectors are now listening to the universe with quantum-enhanced sensitivity, accomplished with the injection of squeezed vacuum states. Squeezed state preparation and injection is operationally complicated, as well as highly sensitive to environmental fluctuations and variations in the interferometer state. Achieving and maintaining optimal squeezing levels is a challenging problem and will require development of new techniques to reach the lofty targets set by design goals for future observing runs and next-generation detectors. We use machine learning techniques to predict the squeezing level during the third observing run of the Laser Interferometer Gravitational-Wave Observatory (LIGO) based on auxiliary data streams, and offer interpretations of our models to identify and quantify salient sources of squeezing degradation. The development of these techniques lays the groundwork for future efforts to optimize squeezed state injection in gravitational-wave detectors, with the goal of enabling closed-loop control of the squeezer subsystem by an agent based on machine learning.

gravitational waves quantum states squeezed state optimization interpretability surrogate modeling
Foundational AI May 22, 2023

Materialistic: Selecting Similar Materials in Images

Prafull Sharma, Julien Philip, Michaël Gharbi et al.

Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic segmentation (different woods or metal should not be selected together), we formulate the problem as a similarity-based grouping problem based on a user-provided image location. In particular, we propose to leverage the unsupervised DINO features coupled with a proposed Cross-Similarity module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, that we release. We show that our method generalizes well to real-world images. We carefully analyze our model's behavior on varying material properties and lighting. Additionally, we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.

cross-similarity feature weighting representation learning self-supervised learning feature extraction cross-image material similarity
Astrophysics May 18, 2023

Multiple Peaks and a Long Precursor in the Type IIn Supernova 2021qqp: An Energetic Explosion in a Complex Circumstellar Environment

Daichi Hiramatsu, Tatsuya Matsumoto, Edo Berger et al.

We present optical photometry and spectroscopy of the Type IIn supernova (SN) 2021qqp. Its unusual light curve is marked by a long precursor for $\approx300$ days, a rapid increase in brightness for $\approx60$ days, and then a sharp increase of $\approx1.6$ mag in only a few days to a first peak of $M_r \approx -19.5$ mag. The light curve then declines rapidly until it re-brightens to a second distinct peak of $M_r \approx -17.3$ mag centered at $\approx335$ days after the first peak. The spectra are dominated by Balmer lines with a complex morphology, including a narrow component with a width of $\approx 1300$ km s$^{-1}$ (first peak) and $\approx 2500$ km s$^{-1}$ (second peak) that we associate with the circumstellar medium (CSM) and a P Cygni component with an absorption velocity of $\approx 8500$ km s$^{-1}$ (first peak) and $\approx 5600$ km s$^{-1}$ (second peak) that we associate with the SN-CSM interaction shell. Using the luminosity and velocity evolution, we construct a flexible analytical model, finding two significant mass-loss episodes with peak mass loss rates of $\approx 10$ and $\approx 5\,M_{\odot}$ yr$^{-1}$ about $0.8$ and $2$ yr before explosion, respectively, with a total CSM mass of $\approx 2-4\,M_{\odot}$. We show that the most recent mass-loss episode could explain the precursor for the year preceding the explosion. The SN ejecta mass is constrained to be $\approx 5-30\,M_{\odot}$ for an explosion energy of $\approx (3-10)\times10^{51}$ erg. We discuss eruptive massive stars (luminous blue variable, pulsational pair instability) and an extreme stellar merger with a compact object as possible progenitor channels.

circumstellar interaction supernova classification mass-loss episodes multi-peak light curve stellar evolution
Theoretical Physics May 10, 2023

Constraint of pionless EFT using two-nucleon spectra from lattice QCD

William Detmold, Fernando Romero-López, Phiala E. Shanahan

Finite-volume pionless effective field theory (FVEFT$_{\pi\!\!/}$) at next-to-leading order (NLO) is used to analyze the two-nucleon lattice QCD spectrum of Ref.~\cite{Amarasinghe:2021lqa}, obtained at quark masses corresponding to a pion mass of approximately $800$ MeV. Specifically, the effective theory is formulated in finite volume, and variational sets of wave functions are optimized using differential programming. Using these wave functions projected to the appropriate finite-volume symmetry group, variational bounds from FVEFT$_{\pi\!\!/}$ are obtained for the ground state, as well as excited states. By comparison with the lattice QCD GEVP spectrum, different low energy constants (LECs) are constrained. Relativistic corrections are incorporated, allowing for the extraction of NLO LECs, as well as the leading $s$-$d$-wave mixing term in the deuteron channel.

lattice qcd effective field theory finite-volume eft hamiltonian systems variational wave functions
Experimental Physics May 6, 2023

Symbolic Regression on FPGAs for Fast Machine Learning Inference

Ho Fung Tsoi, Adrian Alan Pol, Vladimir Loncar et al.

The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to enhance physics sensitivity while still meeting data processing time constraints. In this contribution, we introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR). It searches the equation space to discover algebraic relations approximating a dataset. We use PySR (a software to uncover these expressions based on an evolutionary algorithm) and extend the functionality of hls4ml (a package for machine learning inference in FPGAs) to support PySR-generated expressions for resource-constrained production environments. Deep learning models often optimize only the top-level metric at a fixed network size, because the vast hyperparameter space makes an exhaustive neural architecture search impractical. Conversely, SR selects a set of models on the Pareto front, which allows for optimizing the performance-resource trade-off directly. By embedding symbolic forms, our implementation can dramatically reduce the computational resources needed to perform critical tasks. We validate our method on a physics benchmark: the multiclass classification of jets produced in simulated proton-proton collisions at the CERN Large Hadron Collider. We show that our approach can approximate a 3-layer neural network using an inference model that achieves up to a 13-fold decrease in execution time, down to 5 ns, while still preserving more than 90% approximation accuracy.
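
A minimal sketch of the symbolic-regression step using the PySR API (the hls4ml deployment step is omitted; the data and operator set here are illustrative):

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(500, 3)
y = 2.0 * np.cos(X[:, 0]) + X[:, 1] ** 2   # toy target relation

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
)
model.fit(X, y)
print(model.sympy())   # best expression on the accuracy-complexity Pareto front
```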

symbolic regression fpga inference jet physics trigger systems collider physics
Theoretical Physics May 5, 2023

A Spectral Metric for Collider Geometry

Andrew J. Larkoski, Jesse Thaler

By quantifying the distance between two collider events, one can triangulate a metric space and reframe collider data analysis as computational geometry. One popular geometric approach is to first represent events as an energy flow on an idealized celestial sphere and then define the metric in terms of optimal transport in two dimensions. In this paper, we advocate for representing events in terms of a spectral function that encodes pairwise particle angles and products of particle energies, which enables a metric distance defined in terms of one-dimensional optimal transport. This approach has the advantage of automatically incorporating obvious isometries of the data, like rotations about the colliding beam axis. It also facilitates first-principles calculations, since there are simple closed-form expressions for optimal transport in one dimension. Up to isometries and event sets of measure zero, the spectral representation is unique, so the metric on the space of spectral functions is a metric on the space of events. At lowest order in perturbation theory in electron-positron collisions, our metric is simply the summed squared invariant masses of the two event hemispheres. Going to higher orders, we present predictions for the distribution of metric distances between jets in fixed-order and resummed perturbation theory as well as in parton-shower generators. Finally, we speculate on whether the spectral approach could furnish a useful metric on the space of quantum field theories.
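
An illustrative computation in the same spirit: build a toy spectral representation from pairwise angles weighted by energy products, then compare two events with one-dimensional optimal transport (the paper's precise spectral function and normalization differ in detail):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)

def toy_event(n):
    p = rng.normal(size=(n, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)   # directions on the celestial sphere
    E = rng.exponential(size=n)
    E /= E.sum()                                    # normalized particle energies
    theta = np.arccos(np.clip(p @ p.T, -1.0, 1.0))  # pairwise angles
    i, j = np.triu_indices(n, k=1)
    return theta[i, j], E[i] * E[j]                 # spectral support and weights

x1, w1 = toy_event(30)
x2, w2 = toy_event(40)
print(wasserstein_distance(x1, x2, w1, w2))         # 1D optimal transport distance
```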

spectral emd optimal transport spectral methods collider physics jet physics
Foundational AI May 4, 2023

Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability

Ziming Liu, Eric Gan, Max Tegmark

We introduce Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. Inspired by brains, BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection. We demonstrate that BIMT discovers useful modular neural networks for many simple tasks, revealing compositional structures in symbolic formulas, interpretable decision boundaries and features for classification, and mathematical structure in algorithmic datasets. The ability to directly see modules with the naked eye can complement current mechanistic interpretability strategies such as probes, interventions or staring at all weights.
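
A minimal sketch of a BIMT-style penalty, assuming a simple 1D embedding of neurons and a stand-in task loss:

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 16)
# embed input and output neurons on a line; wiring cost grows with distance
x_in = torch.linspace(0, 1, 16)
x_out = torch.linspace(0, 1, 16)
dist = (x_out[:, None] - x_in[None, :]).abs()        # (out_features, in_features)

def bimt_penalty(weight, dist, lam=1e-3):
    """L1 penalty weighted by geometric connection length."""
    return lam * (weight.abs() * dist).sum()

x = torch.randn(32, 16)
task_loss = layer(x).pow(2).mean()                   # stand-in task objective
loss = task_loss + bimt_penalty(layer.weight, dist)
loss.backward()
```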

interpretability modular training loss function design locality regularization sparse models
Theoretical Physics May 3, 2023

Normalizing flows for lattice gauge theory in arbitrary space-time dimension

Ryan Abbott, Michael S. Albergo, Aleksandar Botev et al.

Applications of normalizing flows to the sampling of field configurations in lattice gauge theory have so far been explored almost exclusively in two space-time dimensions. We report new algorithmic developments of gauge-equivariant flow architectures facilitating the generalization to higher-dimensional lattice geometries. Specifically, we discuss masked autoregressive transformations with tractable and unbiased Jacobian determinants, a key ingredient for scalable and asymptotically exact flow-based sampling algorithms. For concreteness, results from a proof-of-principle application to SU(3) lattice gauge theory in four space-time dimensions are reported.
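
A generic sketch of a masked affine transformation with a triangular Jacobian, the kind of tractable building block the abstract refers to (gauge equivariance and the lattice structure are omitted):

```python
import torch
import torch.nn as nn

class MaskedAffine(nn.Module):
    """Affine map of the non-frozen sites conditioned on the frozen ones.
    The Jacobian is triangular, so log|det J| is just a sum of scales."""
    def __init__(self, n, mask):
        super().__init__()
        self.register_buffer("mask", mask)           # 1 = frozen, 0 = updated
        self.net = nn.Sequential(nn.Linear(n, 64), nn.Tanh(), nn.Linear(64, 2 * n))

    def forward(self, x):
        s, t = self.net(x * self.mask).chunk(2, dim=-1)
        s = torch.tanh(s) * (1 - self.mask)           # zero scale on frozen sites
        t = t * (1 - self.mask)
        y = x * torch.exp(s) + t                      # identity on frozen sites
        return y, s.sum(-1)                           # transformed x, log|det J|

flow = MaskedAffine(8, torch.tensor([1., 0.] * 4))
y, logdet = flow(torch.randn(32, 8))
```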

normalizing flows lattice gauge theory equivariant neural networks lattice qcd monte carlo methods
Foundational AI May 3, 2023

Dynamic Sparse Training with Structured Sparsity

Mike Lasby, Anna Golubeva, Utku Evci et al.

Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference. Although the resulting models are highly sparse and theoretically less computationally expensive, achieving speedups with unstructured sparsity on real-world hardware is challenging. In this work, we propose a sparse-to-sparse DST method, Structured RigL (SRigL), to learn a variant of fine-grained structured N:M sparsity by imposing a constant fan-in constraint. Using our empirical analysis of existing DST methods at high sparsity, we additionally employ a neuron ablation method which enables SRigL to achieve state-of-the-art sparse-to-sparse structured DST performance on a variety of Neural Network (NN) architectures. Using a 90% sparse linear layer, we demonstrate a real-world acceleration of 3.4x/2.5x on CPU for online inference and 1.7x/13.0x on GPU for inference with a batch size of 256 when compared to equivalent dense/unstructured (CSR) sparse layers, respectively.
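
A toy version of the constant fan-in constraint: keep the same number of largest-magnitude weights in every row of a dense layer (SRigL's dynamic mask updates and neuron ablation are not shown):

```python
import torch

def constant_fan_in_mask(weight, fan_in):
    """Keep the `fan_in` largest-magnitude weights in each row (output neuron)."""
    idx = weight.abs().topk(fan_in, dim=1).indices
    mask = torch.zeros_like(weight)
    mask.scatter_(1, idx, 1.0)
    return mask

W = torch.randn(64, 256)
mask = constant_fan_in_mask(W, fan_in=26)            # ~90% sparsity, uniform per neuron
W_sparse = W * mask
print(f"sparsity: {1 - mask.mean().item():.2f}")
```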

sparse models dynamic sparse training structured n:m sparsity neuron ablation scalability
Experimental Physics May 1, 2023

Pileup and Infrared Radiation Annihilation (PIRANHA): A Paradigm for Continuous Jet Grooming

Samuel Alipour-Fard, Patrick T. Komiske, Eric M. Metodiev et al.

Jet grooming is an important strategy for analyzing relativistic particle collisions in the presence of contaminating radiation. Most jet grooming techniques introduce hard cutoffs to remove soft radiation, leading to discontinuous behavior and associated experimental and theoretical challenges. In this paper, we introduce Pileup and Infrared Radiation Annihilation (PIRANHA), a paradigm for continuous jet grooming that overcomes the discontinuity and infrared sensitivity of hard-cutoff grooming procedures. We motivate PIRANHA from the perspective of optimal transport and the Energy Mover's Distance and review Apollonius Subtraction and Iterated Voronoi Subtraction as examples of PIRANHA-style grooming. We then introduce a new tree-based implementation of PIRANHA, Recursive Subtraction, with reduced computational costs. Finally, we demonstrate the performance of Recursive Subtraction in mitigating sensitivity to soft distortions from hadronization and detector effects, and additive contamination from pileup and the underlying event.

jet physics continuous jet grooming optimal transport collider physics energy mover's distance
Experimental Physics Apr 27, 2023

Prometheus: An Open-Source Neutrino Telescope Simulation

Jeffrey Lazar, Stephan Meighen-Berger, Christian Haack et al.

Neutrino telescopes are gigaton-scale neutrino detectors composed of individual light-detection units. Though constructed from simple building blocks, they have opened a new window to the Universe and are able to probe center-of-mass energies comparable to those of collider experiments. Prometheus is a new, open-source simulation tailored for this kind of detector. Our package, which is written in a combination of C++ and Python, provides a balance of ease of use and performance and allows the user to simulate a neutrino telescope with arbitrary geometry deployed in ice or water. Prometheus simulates the neutrino interactions in the volume surrounding the detector, computes the light yield of the hadronic shower and the outgoing lepton, propagates the photons in the medium, and records their arrival times and positions in user-defined regions. Finally, Prometheus events are serialized into a parquet file, a compact and interoperable format that allows prompt access to the events for further analysis.

detector simulation neutrino detection open-source neutrino simulator monte carlo methods photon propagation
Theoretical Physics Apr 18, 2023

Searching for ribbons with machine learning

Sergei Gukov, James Halverson, Ciprian Manolescu et al.

We apply Bayesian optimization and reinforcement learning to a problem in topology: the question of when a knot bounds a ribbon disk. This question is relevant in an approach to disproving the four-dimensional smooth Poincaré conjecture; using our programs, we rule out many potential counterexamples to the conjecture. We also show that the programs are successful in detecting many ribbon knots in the range of up to 70 crossings.

knot ribbon detection low-dimensional topology reinforcement learning bayesian inference poincaré conjecture search
Theoretical Physics Apr 7, 2023

Correlation function distributions for O(N) lattice field theories in the disordered phase

Cagin Yunus, William Detmold

Numerical computations in strongly-interacting quantum field theories are often performed using Monte-Carlo sampling methods. A key task in these calculations is to estimate the value of a given physical quantity from the distribution of stochastic samples that are generated using the Monte-Carlo method. Typically, the sample mean and sample variance are used to define the expectation values and uncertainties of computed quantities. However, the Monte-Carlo sample distribution contains more information than these basic properties and it is useful to investigate it more generally. In this work, the exact form of the probability distributions of two-point correlation functions at zero momentum in O(N) lattice field theories in the disordered phase and in infinite volume is determined. These distributions allow for a robust investigation of the efficacy of the Monte-Carlo sampling procedure and are shown also to allow for improved estimators of the target physical quantity to be constructed. The theoretical expectations are shown to agree with numerical calculations in the O(2) model.

correlation function pdf monte carlo methods quantum field theory lattice gauge theory stochastic processes
Foundational AI Apr 5, 2023

GenPhys: From Physical Processes to Generative Models

Ziming Liu, Di Luo, Yilun Xu et al.

Since diffusion models (DM) and the more recent Poisson flow generative models (PFGM) are inspired by physical processes, it is reasonable to ask: Can physical processes offer additional new generative models? We show that the answer is yes. We introduce a general family, Generative Models from Physical Processes (GenPhys), where we translate partial differential equations (PDEs) describing physical processes to generative models. We show that generative models can be constructed from s-generative PDEs (s for smooth). GenPhys subsumes the two existing generative models (DM and PFGM) and even gives rise to new families of generative models, e.g., "Yukawa Generative Models" inspired from weak interactions. On the other hand, some physical processes by default do not belong to the GenPhys family, e.g., the wave equation and the Schrödinger equation, but could be made into the GenPhys family with some modifications. Our goal with GenPhys is to explore and expand the design space of generative models.

generative models s-generative pdes diffusion models dispersion relations physics-informed neural networks
Theoretical Physics Apr 4, 2023

ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation

Zhuo Chen, Laker Newhouse, Eddie Chen et al.

Quantum many-body physics simulation has important impacts on understanding fundamental science and has applications to quantum materials design and quantum technology. However, due to the exponentially growing size of the Hilbert space with respect to the particle number, a direct simulation is intractable. While representing quantum states with tensor networks and neural networks are the two state-of-the-art methods for approximate simulations, each has its own limitations in terms of expressivity and inductive bias. To address these challenges, we develop a novel architecture, Autoregressive Neural TensorNet (ANTN), which bridges tensor networks and autoregressive neural networks. We show that Autoregressive Neural TensorNet parameterizes normalized wavefunctions, allows for exact sampling, generalizes the expressivity of tensor networks and autoregressive neural networks, and inherits a variety of symmetries from autoregressive neural networks. We demonstrate our approach on quantum state learning as well as finding the ground state of the challenging 2D $J_1$-$J_2$ Heisenberg model with different system sizes and coupling parameters, outperforming both tensor networks and autoregressive neural networks. Our work opens up new opportunities for quantum many-body physics simulation, quantum technology design, and generative modeling in artificial intelligence.

tensor networks quantum simulation autoregressive wavefunction quantum states neural network quantum states
Foundational AI Apr 3, 2023

Neural Volumetric Memory for Visual Locomotion Control

Ruihan Yang, Ge Yang, Xiaolong Wang

Legged robots have the potential to expand the reach of autonomy beyond paved roads. In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera. Due to the partial observability of the problem, the robot has to rely on past observations to infer the terrain currently beneath it. To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world. NVM aggregates feature volumes from multiple camera views by first bringing them back to the ego-centric frame of the robot. We test the learned visual-locomotion policy on a physical robot and show that our approach, which explicitly introduces geometric priors during training, outperforms more naïve methods. We also include ablation studies and show that the representations stored in the neural volumetric memory capture sufficient geometric information to reconstruct the scene. Our project page with videos is https://rchalyang.github.io/NVM .

neural volumetric memory equivariant neural networks geometric deep learning reinforcement learning representation learning
Foundational AI Apr 3, 2023

DribbleBot: Dynamic Legged Manipulation in the Wild

Yandong Ji, Gabriel B. Margolis, Pulkit Agrawal

DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in-the-wild). We adopt the paradigm of training policies in simulation using reinforcement learning and transferring them into the real world. We overcome critical challenges of accounting for variable ball motion dynamics on different terrains and perceiving the ball using body-mounted cameras under the constraints of onboard computing. Our results provide evidence that current quadruped platforms are well-suited for studying dynamic whole-body control problems involving simultaneous locomotion and manipulation directly from sensory observations.

reinforcement learning transfer learning legged locomotion whole-body control reward optimization
Theoretical Physics Mar 31, 2023

Level Crossings, Attractor Points and Complex Multiplication

Hamza Ahmed, Fabian Ruehle

We study the complex structure moduli dependence of the scalar Laplacian eigenmodes for one-parameter families of Calabi-Yau $n$-folds in $\mathbb{P}^{n+1}$. It was previously observed that some eigenmodes get lighter while others get heavier as a function of these moduli, which leads to eigenvalue crossing. We identify the cause for this behavior for the torus. We then show that at points in a sublocus of complex structure moduli space where Laplacian eigenmodes cross, the torus has complex multiplication. We speculate that the generalization to arbitrary Calabi-Yau manifolds could be that level crossing is related to rank one attractor points. To test this, we compute the eigenmodes numerically for the quartic K3 and the quintic threefold, and match crossings to CM and attractor points in these varieties. To quantify the error of our numerical methods, we also study the dependence of the numerical spectrum on the quality of the Calabi-Yau metric approximation, the number of points sampled from the Calabi-Yau variety, the truncation of the eigenbasis, and the distance from degeneration points in complex structure moduli space.
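
For the flat-torus case the spectrum is explicit and the level crossings can be seen directly. A small sketch, up to an overall normalization convention for the metric:

```python
import numpy as np

def torus_spectrum(tau, N=5):
    """Flat-torus Laplacian eigenvalues as a function of the complex
    structure tau, up to overall normalization:
    lambda_{m,n} ~ |m*tau - n|^2 / Im(tau)."""
    m, n = np.mgrid[-N:N + 1, -N:N + 1]
    lam = np.abs(m * tau - n) ** 2 / tau.imag
    return np.sort(lam.ravel())[1:9]        # drop the constant zero mode

for t in np.linspace(0.8, 1.4, 7):
    print(f"tau = {t:.2f}i :", np.round(torus_spectrum(1j * t), 3))
# degeneracies (crossings) appear at special values such as tau = i,
# which is a complex multiplication point of the torus
```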

eigenvalue decomposition complex multiplication attractor points string theory calabi-yau moduli
Foundational AI Mar 25, 2023

Noisy dynamical systems evolve error correcting codes and modularity

Trevor McCourt, Ila R. Fiete, Isaac L. Chuang

Noise is a ubiquitous feature of the physical world. As a result, the first prerequisite of life is fault tolerance: maintaining integrity of state despite external bombardment. Recent experimental advances have revealed that biological systems achieve fault tolerance by implementing mathematically intricate error-correcting codes and by organizing in a modular fashion that physically separates functionally distinct subsystems. These elaborate structures represent a vanishing volume in the massive genetic configuration space. How is it possible that the primitive process of evolution, by which all biological systems evolved, achieved such unusual results? In this work, through experiments in Boolean networks, we show that the simultaneous presence of error correction and modularity in biological systems is no coincidence. Rather, it is a typical co-occurrence in noisy dynamical systems undergoing evolution. From this, we deduce the principle of error correction enhanced evolvability: systems possessing error-correcting codes are more effectively improved by evolution than those without.

error-correcting codes modularity evolvability robustness boolean network evolution
Foundational AI Mar 23, 2023

The Quantization Model of Neural Scaling

Eric J. Michaud, Ziming Liu, Uzay Girit et al.

We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, where network knowledge and skills are "quantized" into discrete chunks ($\textbf{quanta}$). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model gradients, we automatically decompose model behavior into a diverse set of skills (quanta). We tentatively find that the frequency at which these quanta are used in the training distribution roughly follows a power law corresponding with the empirical scaling exponent for language models, a prediction of our theory.
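
A toy numerical check of the central claim: if quanta are used with power-law frequencies and a model learns the most frequent $n$ of them, the residual loss falls as a power law in $n$ (the exponent and cutoff here are illustrative):

```python
import numpy as np

alpha = 1.5                       # assumed power-law exponent of quanta use frequencies
k = np.arange(1, 1_000_001)
p = k ** -alpha
p /= p.sum()

# a model of "size" n has learned the n most frequently used quanta;
# its remaining loss is the total use frequency of the unlearned ones
for n in [10, 100, 1000, 10000]:
    print(n, p[n:].sum())         # falls roughly as n**(1 - alpha)
```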

scalability neural scaling laws knowledge quantization emergent capabilities interpretability
Theoretical Physics Mar 14, 2023

Artificial intelligence for artificial materials: moiré atom

Di Luo, Aidan P. Reddy, Trithep Devakul et al.

Moiré engineering in atomically thin van der Waals heterostructures creates artificial quantum materials with designer properties. We solve the many-body problem of interacting electrons confined to a moiré superlattice potential minimum (the moiré atom) using a 2D fermionic neural network. We show that strong Coulomb interactions in combination with the anisotropic moiré potential lead to striking "Wigner molecule" charge density distributions observable with scanning tunneling microscopy.

fermionic neural network moiré superlattice quantum states wigner molecule symmetry preservation
Theoretical Physics Mar 7, 2023

Exploring the CP-violating Dashen phase in the Schwinger model with tensor networks

Lena Funcke, Karl Jansen, Stefan Kühn

We numerically study the phase structure of the two-flavor Schwinger model with matrix product states, focusing on the (1+1)-dimensional analog of the CP-violating Dashen phase in QCD. We simulate the two-flavor Schwinger model around the point where the positive mass of one fermion flavor corresponds to the negative mass of the other fermion flavor, which is a sign-problem afflicted regime for conventional Monte Carlo techniques. Our results indicate that the model undergoes a CP-violating Dashen phase transition at this point, which manifests itself in abrupt changes of the average electric field and the analog of the pion condensate in the model. Studying the scaling of the bipartite entanglement entropy as a function of the volume, we find clear indications that this transition is not of first order.

phase transitions tensor networks matrix product states lattice gauge theory cp violation
Foundational AI Mar 4, 2023

Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries

Charlotte Loh, Seungwook Han, Shivchander Sudalairaj et al.

Deep ensembles (DE) have been successful in improving model performance by learning diverse members via the stochasticity of random initialization. While recent works have attempted to promote further diversity in DE via hyperparameters or regularizing loss functions, these methods primarily still rely on a stochastic approach to explore the hypothesis space. In this work, we present Multi-Symmetry Ensembles (MSE), a framework for constructing diverse ensembles by capturing the multiplicity of hypotheses along symmetry axes, which explore the hypothesis space beyond stochastic perturbations of model weights and hyperparameters. We leverage recent advances in contrastive representation learning to create models that separately capture opposing hypotheses of invariant and equivariant functional classes and present a simple ensembling approach to efficiently combine appropriate hypotheses for a given task. We show that MSE effectively captures the multiplicity of conflicting hypotheses that is often required in large, diverse datasets like ImageNet. As a result of their inherent diversity, MSE improves classification performance, uncertainty quantification, and generalization across a series of transfer tasks.

ensemble methods contrastive learning equivariant neural networks hypothesis space diversity symmetry preservation
Astrophysics Mar 2, 2023

Via Machinae 2.0: Full-Sky, Model-Agnostic Search for Stellar Streams in Gaia DR2

David Shih, Matthew R. Buckley, Lina Necib

We present an update to Via Machinae, an automated stellar stream-finding algorithm based on the deep learning anomaly detector ANODE. Via Machinae identifies stellar streams within Gaia, using only angular positions, proper motions, and photometry, without reference to a model of the Milky Way potential for orbit integration or stellar distances. This new version, Via Machinae 2.0, includes many improvements and refinements to nearly every step of the algorithm, which altogether result in more robust and visually distinct stream candidates than our original formulation. In this work, we also provide a quantitative estimate of the false positive rate of Via Machinae 2.0 by applying it to a simulated Gaia-mock catalog based on Galaxia, a smooth model of the Milky Way that does not contain substructure or stellar streams. Finally, we perform the first full-sky search for stellar streams with Via Machinae 2.0, identifying 102 streams at high significance within the Gaia Data Release 2, of which only 10 have been previously identified. While follow-up observations for further confirmation are required, taking into account the false positive rate presented in this work, we expect approximately 90 of these stream candidates to correspond to real stellar structures.

stellar streams anomaly detection normalizing flows model-agnostic stream search density estimation
Theoretical Physics Mar 1, 2023

Computational Mirror Symmetry

Mehmet Demirtas, Manki Kim, Liam McAllister et al.

We present an efficient algorithm for computing the prepotential in compactifications of type II string theory on mirror pairs of Calabi-Yau threefolds in toric varieties. Applying this method, we exhibit the first systematic computation of genus-zero Gopakumar-Vafa invariants in compact threefolds with many moduli, including examples with up to 491 vector multiplets.

calabi-yau geometry gopakumar-vafa invariants string theory picard-fuchs equations symmetry preservation
Theoretical Physics Feb 23, 2023

SHAPER: Can You Hear the Shape of a Jet?

Demba Ba, Akshunna S. Dogra, Rikab Gambhir et al.

The identification of interesting substructures within jets is an important tool for searching for new physics and probing the Standard Model at colliders. Many of these substructure tools have previously been shown to take the form of optimal transport problems, in particular the Energy Mover's Distance (EMD). In this work, we show that the EMD is in fact the natural structure for comparing collider events, which accounts for its recent success in understanding event and jet substructure. We then present a Shape Hunting Algorithm using Parameterized Energy Reconstruction (SHAPER), which is a general framework for defining and computing shape-based observables. SHAPER generalizes N-jettiness from point clusters to any extended, parametrizable shape. This is accomplished by efficiently minimizing the EMD between events and parameterized manifolds of energy flows representing idealized shapes, implemented using the dual-potential Sinkhorn approximation of the Wasserstein metric. We show how the geometric language of observables as manifolds can be used to define novel observables with built-in infrared-and-collinear safety. We demonstrate the efficacy of the SHAPER framework by performing empirical jet substructure studies using several examples of new shape-based observables.
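
A minimal SHAPER-flavored sketch using the POT library's Sinkhorn solver: scan a one-parameter family of the simplest shape (a single idealized "jet" point) and pick the parameter minimizing the regularized EMD to a toy event:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
# toy "event": particles at (rapidity, phi) with normalized energy weights
pts = rng.normal([0.5, 0.0], 0.3, size=(60, 2))
w = rng.exponential(size=60)
w /= w.sum()

def emd_to_point(center):
    """Regularized EMD between the event and a single idealized 'jet'
    at `center` (a one-point manifold, the simplest SHAPER-style shape)."""
    M = np.linalg.norm(pts - center, axis=1, keepdims=True) ** 2
    return ot.sinkhorn2(w, np.ones(1), M, reg=0.05)

centers = [np.array([x, 0.0]) for x in np.linspace(-1, 1, 21)]
best = min(centers, key=emd_to_point)
print("best jet axis:", best)
```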

optimal transport jet physics energy mover's distance irc safety collider physics
Theoretical Physics Feb 23, 2023

Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows

Owen Dugan, Peter Y. Lu, Rumen Dangovski et al.

Studying the dynamics of open quantum systems can enable breakthroughs both in fundamental physics and applications to quantum engineering and quantum computation. Since the density matrix $\rho$, which is the fundamental description for the dynamics of such systems, is high-dimensional, customized deep generative neural networks have been instrumental in modeling $\rho$. However, the complex-valued nature and normalization constraints of $\rho$, as well as its complicated dynamics, prohibit a seamless connection between open quantum systems and the recent advances in deep generative modeling. Here we lift that limitation by utilizing a reformulation of open quantum system dynamics to a partial differential equation (PDE) for a corresponding probability distribution $Q$, the Husimi Q function. Thus, we model the Q function seamlessly with off-the-shelf deep generative models such as normalizing flows. Additionally, we develop novel methods for learning normalizing flow evolution governed by high-dimensional PDEs based on the Euler method and the application of the time-dependent variational principle. We name the resulting approach $Q$-Flow and demonstrate the scalability and efficiency of Q-Flow on open quantum system simulations, including the dissipative harmonic oscillator and the dissipative bosonic model. Q-Flow is superior to conventional PDE solvers and state-of-the-art physics-informed neural network solvers, especially in high-dimensional systems.

normalizing flows husimi q function open quantum dynamics quantum simulation quantum states
Foundational AI Feb 8, 2023

PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

Yilun Xu, Ziming Liu, Yonglong Tian et al.

We introduce a new family of physics-inspired generative models termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). These models realize generative trajectories for $N$ dimensional data by embedding paths in $N{+}D$ dimensional space while still controlling the progression with a simple scalar norm of the $D$ additional variables. The new models reduce to PFGM when $D{=}1$ and to diffusion models when $D{\to}\infty$. The flexibility of choosing $D$ allows us to trade off robustness against rigidity as increasing $D$ results in more concentrated coupling between the data and the additional variable norms. We dispense with the biased large batch field targets used in PFGM and instead provide an unbiased perturbation-based objective similar to diffusion models. To explore different choices of $D$, we provide a direct alignment method for transferring well-tuned hyperparameters from diffusion models ($D{\to} \infty$) to any finite $D$ values. Our experiments show that models with finite $D$ can be superior to previous state-of-the-art diffusion models on CIFAR-10/FFHQ $64{\times}64$ datasets, with FID scores of $1.91/2.43$ when $D{=}2048/128$. In class-conditional setting, $D{=}2048$ yields current state-of-the-art FID of $1.74$ on CIFAR-10. In addition, we demonstrate that models with smaller $D$ exhibit improved robustness against modeling errors. Code is available at https://github.com/Newbeeer/pfgmpp

generative models poisson flow augmentation diffusion models dimension interpolation symmetry preservation
Theoretical Physics Feb 6, 2023

Geometry of contact: contact planning for multi-legged robots via spin models duality

Baxi Chong, Di Luo, Tianyu Wang et al.

Contact planning is crucial in locomoting systems. Specifically, appropriate contact planning can enable versatile behaviors (e.g., sidewinding in limbless locomotors) and facilitate speed-dependent gait transitions (e.g., walk-trot-gallop in quadrupedal locomotors). The challenges of contact planning include determining not only the sequence by which contact is made and broken between the locomotor and the environments, but also the sequence of internal shape changes (e.g., body bending and limb shoulder joint oscillation). Most state-of-the-art contact planning algorithms have focused on conventional robots (e.g., biped and quadruped) and conventional tasks (e.g., forward locomotion), and there is a lack of study on general contact planning in multi-legged robots. In this paper, we show that using the geometric mechanics framework, we can obtain the globally optimal contact sequence given the sequence of internal shape changes. Therefore, we simplify the contact planning problem to a graph optimization problem to identify the internal shape changes. Taking advantage of the spatio-temporal symmetry in locomotion, we map the graph optimization problem to special cases of spin models, which allows us to obtain the global optima in polynomial time. We apply our approach to develop new forward and sidewinding behaviors in a hexapod and a 12-legged centipede. We verify our predictions using numerical and robophysical models, and obtain novel and effective locomotion behaviors.

contact sequence optimization geometric mechanics geometric deep learning spin model duality group theory
Experimental Physics Jan 17, 2023

EPiC-GAN: Equivariant Point Cloud Generation for Particle Jets

Erik Buhmann, Gregor Kasieczka, Jesse Thaler

With the vast data-collecting capabilities of current and future high-energy collider experiments, there is an increasing demand for computationally efficient simulations. Generative machine learning models enable fast event generation, yet so far these approaches are largely constrained to fixed data structures and rigid detector geometries. In this paper, we introduce EPiC-GAN - equivariant point cloud generative adversarial network - which can produce point clouds of variable multiplicity. This flexible framework is based on deep sets and is well suited for simulating sprays of particles called jets. The generator and discriminator utilize multiple EPiC layers with an interpretable global latent vector. Crucially, the EPiC layers do not rely on pairwise information sharing between particles, which leads to a significant speed-up over graph- and transformer-based approaches with more complex relation diagrams. We demonstrate that EPiC-GAN scales well to large particle multiplicities and achieves high generation fidelity on benchmark jet generation tasks.
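
A schematic of an EPiC-style layer: points exchange information only through a pooled global vector, so the cost is linear in particle multiplicity (dimensions and nonlinearities here are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class EPiCLayer(nn.Module):
    """Point-cloud layer without pairwise messages: points talk only
    through a global aggregate, giving O(n) rather than O(n^2) cost."""
    def __init__(self, dim, gdim):
        super().__init__()
        self.to_global = nn.Linear(2 * dim + gdim, gdim)
        self.to_local = nn.Linear(dim + gdim, dim)

    def forward(self, x, g):          # x: (batch, n, dim), g: (batch, gdim)
        pooled = torch.cat([x.mean(1), x.sum(1), g], dim=-1)
        g = torch.relu(self.to_global(pooled))
        g_b = g[:, None].expand(-1, x.size(1), -1)
        x = torch.relu(self.to_local(torch.cat([x, g_b], dim=-1)))
        return x, g

layer = EPiCLayer(dim=16, gdim=8)
x, g = layer(torch.randn(4, 30, 16), torch.zeros(4, 8))
```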

generative adversarial networks equivariant neural networks jet physics point cloud generation detector simulation
Astrophysics Jan 16, 2023

First demonstration of neural sensing and control in a kilometer-scale gravitational wave observatory

Nikhil Mukund, James Lough, Aparna Bisht et al.

Suspended optics in gravitational wave (GW) observatories are susceptible to alignment perturbations, particularly slow drifts over time, due to variations in temperature and seismic levels. Such misalignments affect the coupling of the incident laser beam into the optical cavities, degrade both circulating power and optomechanical photon squeezing and thus decrease the astrophysical sensitivity to merging binaries. Traditional alignment techniques involve differential wavefront sensing using multiple quadrant photodiodes but are often restricted in bandwidth and are limited by the sensing noise. We present the first-ever successful implementation of neural network-based sensing and control at a gravitational wave observatory and demonstrate low-frequency control of the signal recycling mirror at the GEO 600 detector. Alignment information for three critical optics is simultaneously extracted from the interferometric dark port camera images via a CNN-LSTM network architecture and is then used for MIMO control using soft actor-critic-based deep reinforcement learning. Overall sensitivity improvement achieved using our scheme demonstrates deep learning's capabilities as a viable tool for real-time sensing and control for current and next-generation GW interferometers.

reinforcement learning optical cavity control gravitational waves convolutional networks recurrent networks
Experimental Physics Dec 20, 2022

Comparing Point Cloud Strategies for Collider Event Classification

Peter Onyisi, Delon Shen, Jesse Thaler

In this paper, we compare several event classification architectures defined on the point cloud representation of collider events. These approaches, which are based on the frameworks of deep sets and edge convolutions, circumvent many of the difficulties associated with traditional feature engineering. To benchmark our architectures against more traditional event classification strategies, we perform a case study involving Higgs boson decays to tau leptons. We find a 2.5 times increase in performance compared to a baseline ATLAS analysis with engineered features. Our point cloud architectures can be viewed as simplified versions of graph neural networks, where each particle in the event corresponds to a graph node. In our case study, we find the best balance of performance and computational cost for simple pairwise architectures, which are based on learned edge features.

collider physics classification point cloud event representation edge convolution graph neural networks
Astrophysics Dec 15, 2022

Non-parametric Lagrangian biasing from the insights of neural nets

Xiaohan Wu, Julian B. Munoz, Daniel J. Eisenstein

We present a Lagrangian model of galaxy clustering bias in which we train a neural net using the local properties of the smoothed initial density field to predict the late-time mass-weighted halo field. By fitting the mass-weighted halo field in the AbacusSummit simulations at z=0.5, we find that including three coarsely spaced smoothing scales gives the best recovery of the halo power spectrum. Adding more smoothing scales may lead to 2-5% underestimation of the large-scale power and can cause the neural net to overfit. We find that the fitted halo-to-mass ratio can be well described by two directions in the original high-dimensional feature space. Projecting the original features into these two principal components and re-training the neural net either reproduces the original training result, or outperforms it with a better match of the halo power spectrum. The elements of the principal components are unlikely to be assigned physical meanings, partly owing to the features being highly correlated between different smoothing scales. Our work illustrates a potential need to include multiple smoothing scales when studying galaxy bias, and this can be done easily with machine-learning methods that can take in a high-dimensional input feature space.
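
A compact sketch of the projection-and-retrain step with scikit-learn, using synthetic features in place of the smoothed density fields:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 9))       # stand-in: density features at 3 smoothing scales
y = X @ rng.normal(size=9) + 0.5 * X[:, 0] * X[:, 1]   # toy halo-to-mass ratio

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)                  # project onto the two principal directions
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300).fit(Z, y)
print("R^2 in the 2D feature space:", net.score(Z, y))
```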

lagrangian biasing dimensionality reduction multi-scale smoothing cosmological simulation feature extraction
Theoretical Physics Dec 14, 2022

Simulating 2+1D Lattice Quantum Electrodynamics at Finite Density with Neural Flow Wavefunctions

Zhuo Chen, Di Luo, Kaiwen Hu et al.

We present a neural flow wavefunction, Gauge-Fermion FlowNet, and use it to simulate 2+1D lattice compact quantum electrodynamics with finite density dynamical fermions. The gauge field is represented by a neural network which parameterizes a discretized flow-based transformation of the amplitude while the fermionic sign structure is represented by a neural net backflow. This approach directly represents the $U(1)$ degree of freedom without any truncation, obeys Gauss's law by construction, samples autoregressively avoiding any equilibration time, and variationally simulates Gauge-Fermion systems with sign problems accurately. In this model, we investigate confinement and string breaking phenomena in different fermion density and hopping regimes. We study the phase transition from the charge crystal phase to the vacuum phase at zero density, and observe the phase separation and the net charge penetration blocking effect under magnetic interaction at finite density. In addition, we investigate a magnetic phase transition due to the competition effect between the kinetic energy of fermions and the magnetic energy of the gauge field. With our method, we further note potential differences in the order of the phase transitions between a continuous $U(1)$ system and one with finite truncation. Our state-of-the-art neural network approach opens up new possibilities to study different gauge theories coupled to dynamical matter in higher dimensions.

lattice gauge theory normalizing flows quantum states phase transitions neural backflow
Astrophysics Dec 8, 2022

Stellar Reddening Based Extinction Maps for Cosmological Applications

Nayantara Mudur, Core Francisco Park, Douglas P Finkbeiner

Cosmological surveys must correct their observations for the reddening of extragalactic objects by Galactic dust. Existing dust maps, however, have been found to have spatial correlations with the large-scale structure of the Universe. Errors in extinction maps can propagate systematic biases into samples of dereddened extragalactic objects and into cosmological measurements such as correlation functions between foreground lenses and background objects and the primordial non-gaussianity parameter $f_{NL}$. Emission-based maps are contaminated by the cosmic infrared background, while maps inferred from stellar reddenings suffer from imperfect removal of quasars and galaxies from stellar catalogs. Thus, stellar-reddening based maps using catalogs without extragalactic objects offer a promising path to making dust maps with minimal correlations with large-scale structure. We present two high-latitude integrated extinction maps based on stellar reddenings, with point spread functions of full width at half maximum 6.1' and 15'. We employ a strict selection of catalog objects to filter out galaxies and quasars and measure the spatial correlation of our extinction maps with extragalactic structure. Our Galactic extinction maps have reduced spatial correlation with large scale structure relative to most existing stellar-reddening based and emission-based extinction maps.

dust extinction mapping large-scale structure contamination bayesian inference stellar catalog cleaning posterior estimation
Astrophysics Dec 7, 2022

Measuring the 8621 Å Diffuse Interstellar Band in Gaia DR3 RVS Spectra: Obtaining a Clean Catalog by Marginalizing over Stellar Types

Andrew K. Saydjari, Ana Sofía M. Uzsoy, Catherine Zucker et al.

Diffuse interstellar bands (DIBs) are broad absorption features associated with interstellar dust and can serve as chemical and kinematic tracers. Conventional measurements of DIBs in stellar spectra are complicated by residuals between observations and best-fit stellar models. To overcome this, we simultaneously model the spectrum as a combination of stellar, dust, and residual components, with full posteriors on the joint distribution of the components. This decomposition is obtained by modeling each component as a draw from a high-dimensional Gaussian distribution in the data-space (the observed spectrum) -- a method we call "Marginalized Analytic Data-space Gaussian Inference for Component Separation" (MADGICS). We use a data-driven prior for the stellar component, which avoids missing stellar features not well-modeled by synthetic spectra. This technique provides statistically rigorous uncertainties and detection thresholds, which are required to work in the low signal-to-noise regime that is commonplace for dusty lines of sight. We reprocess all public Gaia DR3 RVS spectra and present an improved 8621 Å DIB catalog, free of detectable stellar line contamination. We constrain the rest-frame wavelength to $8623.14 \pm 0.087$ Å (vacuum), find no significant evidence for DIBs in the Local Bubble from the $1/6^{\rm{th}}$ of RVS spectra that are public, and show unprecedented correlation with kinematic substructure in Galactic CO maps. We validate the catalog, its reported uncertainties, and biases using synthetic injection tests. We believe MADGICS provides a viable path forward for large-scale spectral line measurements in the presence of complex spectral contamination.
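
A minimal numpy sketch of the Gaussian component separation underlying MADGICS: if the observed spectrum is a sum of independent Gaussian components, each component's posterior mean and covariance follow from standard conditioning (the covariances here are toy RBF kernels, not the paper's data-driven priors):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

def rbf_cov(scale, amp):
    t = np.arange(n)
    return amp * np.exp(-0.5 * ((t[:, None] - t[None, :]) / scale) ** 2)

C_star = rbf_cov(3.0, 1.0)            # stellar component prior covariance
C_dib  = rbf_cov(8.0, 0.3)            # dust/DIB component
C_res  = 0.05 * np.eye(n)             # residual/noise component

# simulate an observed spectrum as a draw from the summed covariance
y = rng.multivariate_normal(np.zeros(n), C_star + C_dib + C_res)

C_tot_inv = np.linalg.inv(C_star + C_dib + C_res)
x_dib = C_dib @ C_tot_inv @ y                       # posterior mean of DIB component
cov_dib = C_dib - C_dib @ C_tot_inv @ C_dib         # posterior covariance
print(x_dib.std(), np.sqrt(np.diag(cov_dib)).mean())
```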

spectral component separation bayesian inference diffuse interstellar band mapping signal detection posterior estimation
Foundational AI Dec 6, 2022

Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior

Gabriel B Margolis, Pulkit Agrawal

Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/

multiplicity of behavior reinforcement learning robustness sim-to-real transfer multi-task learning
Theoretical Physics Dec 2, 2022

Yang-Mills glueball masses from spectral reconstruction

Jan M. Pawlowski, Coralie S. Schneider, Jonas Turnwald et al.

We compute masses of the two lightest glueballs from spectral reconstructions of timelike interaction channels of the four-gluon vertex in Landau gauge Yang-Mills theory. The Euclidean spacelike dressings of the vertex are calculated with the functional renormalisation group. For the spectral reconstruction of these Euclidean data, we employ Gaussian process regression. The glueball resonances can be identified straightforwardly and we obtain $m_{sc} = 1870(75)~$ MeV as well as $m_{ps} = 2700(120)~$ MeV, in accordance with functional bound state and lattice calculations.

quantum field theory spectral methods glueball spectroscopy renormalization inverse problems
Experimental Physics Dec 1, 2022

Variational Neural-Network Ansatz for Continuum Quantum Field Theory

John M. Martyn, Khadijeh Najafi, Di Luo

Physicists dating back to Feynman have lamented the difficulties of applying the variational principle to quantum field theories. In non-relativistic quantum field theories, the challenge is to parameterize and optimize over the infinitely many $n$-particle wave functions comprising the state's Fock space representation. Here we approach this problem by introducing neural-network quantum field states, a deep learning ansatz that enables application of the variational principle to non-relativistic quantum field theories in the continuum. Our ansatz uses the Deep Sets neural network architecture to simultaneously parameterize all of the $n$-particle wave functions comprising a quantum field state. We employ our ansatz to approximate ground states of various field theories, including an inhomogeneous system and a system with long-range interactions, thus demonstrating a powerful new tool for probing quantum field theories.

quantum field theory quantum states neural-network quantum field states monte carlo methods symmetry preservation
Experimental Physics Nov 25, 2022

Search for Higgs boson and observation of Z boson through their decay into a charm quark-antiquark pair in boosted topologies in proton-proton collisions at $\sqrt{s}$ = 13 TeV

CMS Collaboration

A search for the standard model (SM) Higgs boson (H) produced with transverse momentum greater than 450 GeV and decaying to a charm quark-antiquark ($\mathrm{c\bar{c}}$) pair is presented. The search is performed using proton-proton collision data collected at $\sqrt{s}$ = 13 TeV by the CMS experiment at the LHC, corresponding to an integrated luminosity of 138 fb$^{-1}$. Boosted H $\to$ $\mathrm{c\bar{c}}$ decay products are reconstructed as a single large-radius jet and identified using a deep neural network charm tagging technique. The method is validated by measuring the Z $\to$ $\mathrm{c\bar{c}}$ decay process, which is observed in association with jets at high $p_\mathrm{T}$ for the first time with a signal strength of $1.00^{+0.17}_{-0.14}$ (syst) $\pm$ 0.08 (theo) $\pm$ 0.06 (stat), defined as the ratio of the observed process rate to the standard model expectation. The observed (expected) upper limit on $\sigma(\mathrm{H})\,\mathcal{B}(\mathrm{H} \to \mathrm{c\bar{c}})$ is set at 47 (39) times the SM prediction at 95% confidence level.

collider physics jet physics charm jet tagging new physics searches standard model
Foundational AI Nov 24, 2022

Learning Integrable Dynamics with Action-Angle Networks

Ameya Daigavane, Arthur Kosmala, Miles Cranmer et al.

Machine learning has become increasingly popular for efficiently modelling the dynamics of complex physical systems, demonstrating a capability to learn effective models for dynamics which ignore redundant degrees of freedom. Learned simulators typically predict the evolution of the system in a step-by-step manner with numerical integration techniques. However, such models often suffer from instability over long roll-outs due to the accumulation of both estimation and integration error at each prediction step. Here, we propose an alternative construction for learned physical simulators that are inspired by the concept of action-angle coordinates from classical mechanics for describing integrable systems. We propose Action-Angle Networks, which learn a nonlinear transformation from input coordinates to the action-angle space, where evolution of the system is linear. Unlike traditional learned simulators, Action-Angle Networks do not employ any higher-order numerical integration methods, making them extremely efficient at modelling the dynamics of integrable physical systems.
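
A schematic Action-Angle Network in PyTorch: encode coordinates, advance the angle linearly with an action-dependent frequency, and decode, with no numerical integrator (the paper's actual ansatz uses invertible maps and circular angle variables; this is only the skeleton):

```python
import torch
import torch.nn as nn

class ActionAngleNet(nn.Module):
    """Map phase-space coordinates to (action, angle); evolution there is linear."""
    def __init__(self, dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 2 * dim))
        self.dec = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 2 * dim))
        self.freq = nn.Linear(dim, dim)       # angular frequency as a function of action

    def forward(self, qp, t):
        action, angle = self.enc(qp).chunk(2, dim=-1)
        angle = angle + self.freq(action) * t  # one linear step, however large t is
        return self.dec(torch.cat([action, angle], dim=-1))

net = ActionAngleNet(dim=1)
qp = torch.randn(8, 2)                         # (q, p) pairs
qp_future = net(qp, t=torch.tensor(10.0))
```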

action-angle coordinates hamiltonian systems integrable systems normalizing flows symmetry preservation
Astrophysics Nov 22, 2022

Can denoising diffusion probabilistic models generate realistic astrophysical fields?

Nayantara Mudur, Douglas P. Finkbeiner

Score-based generative models have emerged as alternatives to generative adversarial networks (GANs) and normalizing flows for tasks involving learning and sampling from complex image distributions. In this work we investigate the ability of these models to generate fields in two astrophysical contexts: dark matter mass density fields from cosmological simulations and images of interstellar dust. We examine the fidelity of the sampled cosmological fields relative to the true fields using three different metrics, and identify potential issues to address. We demonstrate a proof-of-concept application of the model trained on dust in denoising dust images. To our knowledge, this is the first application of this class of models to the interstellar medium.

diffusion models score-based models cosmological simulation generative models dark matter
Foundational AI Nov 21, 2022

Visual Dexterity: In-Hand Reorientation of Novel and Complex Object Shapes

Tao Chen, Megha Tippur, Siyang Wu et al.

In-hand object reorientation is necessary for performing many dexterous manipulation tasks, such as tool use in less structured environments that remain beyond the reach of current robots. Prior works built reorientation systems assuming one or many of the following: reorienting only specific objects with simple shapes, limited range of reorientation, slow or quasistatic manipulation, simulation-only results, the need for specialized and costly sensor suites, and other constraints which make the system infeasible for real-world deployment. We present a general object reorientation controller that does not make these assumptions. It uses readings from a single commodity depth camera to dynamically reorient complex and new object shapes by any rotation in real-time, with the median reorientation time being close to seven seconds. The controller is trained using reinforcement learning in simulation and evaluated in the real world on new object shapes not used for training, including the most challenging scenario of reorienting objects held in the air by a downward-facing hand that must counteract gravity during reorientation. Our hardware platform only uses open-source components that cost less than five thousand dollars. Although we demonstrate the ability to overcome assumptions in prior work, there is ample scope for improving absolute performance. For instance, the challenging duck-shaped object not used for training was dropped in 56 percent of the trials. When it was not dropped, our controller reoriented the object within 0.4 radians (23 degrees) 75 percent of the time. Videos are available at: https://taochenshh.github.io/projects/visual-dexterity.

dexterous manipulation reinforcement learning sim-to-real transfer transfer learning point cloud policy
Theoretical Physics Nov 16, 2022

Characterizing 4-string contact interaction using machine learning

Harold Erbin, Atakan Hilmi Fırat

The geometry of 4-string contact interaction of closed string field theory is characterized using machine learning. We obtain Strebel quadratic differentials on 4-punctured spheres as a neural network by performing unsupervised learning with a custom-built loss function. This allows us to solve for local coordinates and compute their associated mapping radii numerically. We also train a neural network distinguishing vertex from Feynman region. As a check, the 4-tachyon contact term in the tachyon potential is computed and a good agreement with the results in the literature is observed. We argue that our algorithm is manifestly independent of the number of punctures and scaling it to characterize the geometry of $n$-string contact interaction is feasible.

string theory strebel quadratic differentials physics-informed neural networks loss function design quantum field theory
Theoretical Physics Nov 14, 2022

Aspects of scaling and scalability for flow-based sampling of lattice QCD

Ryan Abbott, Michael S. Albergo, Aleksandar Botev et al.

Recent applications of machine-learned normalizing flows to sampling in lattice field theory suggest that such methods may be able to mitigate critical slowing down and topological freezing. However, these demonstrations have been at the scale of toy models, and it remains to be determined whether they can be applied to state-of-the-art lattice quantum chromodynamics calculations. Assessing the viability of sampling algorithms for lattice field theory at scale has traditionally been accomplished using simple cost scaling laws, but as we discuss in this work, their utility is limited for flow-based approaches. We conclude that flow-based approaches to sampling are better thought of as a broad family of algorithms with different scaling properties, and that scalability must be assessed experimentally.

normalizing flows lattice qcd scalability monte carlo methods critical slowing down
Astrophysics Nov 8, 2022

Limits on Simultaneous and Delayed Optical Emission from Well-localized Fast Radio Bursts

Daichi Hiramatsu, Edo Berger, Brian D. Metzger et al.

We present the largest compilation to date of optical observations during and following fast radio bursts (FRBs). The data set includes our dedicated simultaneous and follow-up observations, as well as serendipitous archival survey observations, for a sample of 15 well-localized FRBs: eight repeating and seven one-off sources. Our simultaneous (and nearly simultaneous with a $0.4$ s delay) optical observations of 13 (1) bursts from the repeating FRB 20220912A provide the deepest such limits to date for any extragalactic FRB, reaching a luminosity limit of $\nu L_\nu \lesssim 10^{42}$ erg s$^{-1}$ ($\lesssim 2\times10^{41}$ erg s$^{-1}$) with $15-400$ s exposures; an optical-flux-to-radio-fluence ratio of $f_{\rm opt}/F_{\rm radio}\lesssim 10^{-7}$ ms$^{-1}$ ($\lesssim 10^{-8}$ ms$^{-1}$); and a flux ratio of $f_{\rm opt}/f_{\rm radio}\lesssim 0.02$ to $\lesssim 2\times 10^{-5}$ ($\lesssim 10^{-6}$) on millisecond to second timescales. These simultaneous limits provide useful constraints in the context of FRB emission models, such as the pulsar magnetosphere and pulsar nebula models. Interpreting all available optical limits in the context of the synchrotron maser model, we find that they constrain the flare energies to $\lesssim 10^{43}-10^{49}$ erg (depending on the distances of the various repeating FRBs, with $\lesssim 10^{39}$ erg for the Galactic SGR 1935+2154). These limits are generally at least an order of magnitude larger than those inferred from the FRBs themselves, although in the case of FRB 20220912A our simultaneous and rapid follow-up observations severely restrict the model parameter space. We conclude by exploring the potential of future simultaneous and rapid-response observations with large optical telescopes.

fast radio bursts signal detection synchrotron maser emission magnetar outbursts hypothesis testing
Theoretical Physics Nov 6, 2022

Gauge Equivariant Neural Networks for 2+1D U(1) Gauge Theory Simulations in Hamiltonian Formulation

Di Luo, Shunyue Yuan, James Stokes et al.

Gauge theory plays a crucial role in many areas of science, including high energy physics, condensed matter physics and quantum information science. In quantum simulations of lattice gauge theory, an important step is to construct a wave function that obeys gauge symmetry. In this paper, we have developed gauge equivariant neural network wave function techniques for simulating continuous-variable quantum lattice gauge theories in the Hamiltonian formulation. We have applied the gauge equivariant neural network approach to find the ground state of 2+1-dimensional lattice gauge theory with U(1) gauge group using variational Monte Carlo. We have benchmarked our approach against the state-of-the-art complex Gaussian wave functions, demonstrating improved performance in the strong coupling regime and comparable results in the weak coupling regime.

equivariant neural networks lattice gauge theory symmetry preservation neural network quantum states quantum simulation
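
To make the gauge-symmetry idea concrete, here is a minimal sketch (not the authors' architecture) of how a wave-function ansatz can be made gauge invariant by construction on a 2D U(1) lattice: the network only ever sees plaquette angles, which are unchanged by gauge transformations. All names and sizes below are illustrative.

```python
# Illustrative sketch: gauge-invariant features for a U(1) lattice ansatz.
# Link angles theta[mu, x, y] live on edges; plaquette angles are invariant
# under gauge transformations, so any function of them is gauge invariant.
import numpy as np

def plaquette_angles(theta):
    """theta: array of shape (2, L, L) with link angles for mu = x, y."""
    tx, ty = theta
    # P(x,y) = theta_x(x,y) + theta_y(x+1,y) - theta_x(x,y+1) - theta_y(x,y)
    return tx + np.roll(ty, -1, axis=0) - np.roll(tx, -1, axis=1) - ty

def log_psi(theta, weights):
    """Toy variational ansatz: a linear map on cos/sin of plaquette angles."""
    p = plaquette_angles(theta).ravel()
    feats = np.concatenate([np.cos(p), np.sin(p)])
    return feats @ weights  # a realistic ansatz would use a deep network here

L = 4
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=(2, L, L))
w = rng.normal(size=2 * L * L)

# Gauge transformation: theta_mu(x) -> theta_mu(x) + alpha(x) - alpha(x + mu)
alpha = rng.uniform(0, 2 * np.pi, size=(L, L))
theta_g = theta.copy()
theta_g[0] += alpha - np.roll(alpha, -1, axis=0)
theta_g[1] += alpha - np.roll(alpha, -1, axis=1)
assert np.allclose(log_psi(theta, w), log_psi(theta_g, w))  # invariance check
```
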
Theoretical Physics Nov 2, 2022

QuACK: Accelerating Gradient-Based Quantum Optimization with Koopman Operator Learning

Di Luo, Jiayu Shen, Rumen Dangovski et al.

Quantum optimization, a key application of quantum computing, has traditionally been stymied by the linearly increasing complexity of gradient calculations with an increasing number of parameters. This work bridges the gap between Koopman operator theory, which has found utility in applications because it allows for a linear representation of nonlinear dynamical systems, and natural gradient methods in quantum optimization, leading to a significant acceleration of gradient-based quantum optimization. We present Quantum-circuit Alternating Controlled Koopman learning (QuACK), a novel framework that leverages an alternating algorithm for efficient prediction of gradient dynamics on quantum computers. We demonstrate QuACK's remarkable ability to accelerate gradient-based optimization across a range of applications in quantum optimization and machine learning. In fact, our empirical studies, spanning quantum chemistry, quantum condensed matter, quantum machine learning, and noisy environments, have shown accelerations of more than 200x speedup in the overparameterized regime, 10x speedup in the smooth regime, and 3x speedup in the non-smooth regime. With QuACK, we offer a robust advancement that harnesses the advantage of gradient-based quantum optimization for practical benefits.

koopman operator learning quantum computing variational quantum algorithms scalability quantum simulation
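
The general mechanism can be sketched in a few lines (our own toy, not the QuACK algorithm): record a short window of optimization dynamics, fit a linear Koopman-style operator by least squares as in dynamic mode decomposition, then extrapolate the trajectory without further gradient evaluations.

```python
# Minimal sketch of Koopman-accelerated optimization on a quadratic loss,
# where gradient descent is exactly linear and the fit is exact.
import numpy as np

def loss_grad(x):
    A = np.diag([1.0, 5.0])
    return A @ x  # gradient of the quadratic loss 0.5 x^T A x

# Record a short trajectory of plain gradient descent.
x = np.array([1.0, 1.0])
lr, traj = 0.05, [x.copy()]
for _ in range(20):
    x = x - lr * loss_grad(x)
    traj.append(x.copy())
T = np.stack(traj, axis=1)          # shape (dim, steps + 1)

# Fit x_{t+1} ~ K x_t by least squares and extrapolate 100 more steps.
X, Y = T[:, :-1], T[:, 1:]
K = Y @ np.linalg.pinv(X)
x_pred = T[:, -1]
for _ in range(100):
    x_pred = K @ x_pred             # no gradient evaluations needed here

print("extrapolated:", x_pred)      # approaches the optimum at the origin
```
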
Foundational AI Oct 30, 2022

A Solvable Model of Neural Scaling Laws

Alexander Maloney, Daniel A. Roberts, James Sully

Large language models with a huge number of parameters, when trained on a near internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource. To understand this better, we first identify the necessary properties allowing such scaling laws to arise and then propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology. By solving this model in the dual limit of large training set size and large number of parameters, we gain insight into (i) the statistical structure of datasets and tasks that lead to scaling laws, (ii) the way nonlinear feature maps, such as those provided by neural networks, enable scaling laws when trained on these datasets, (iii) the optimality of the equiparameterization scaling of training sets and parameters, and (iv) whether such scaling laws can break down and how they behave when they do. Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps and then translated into power-law scalings of the test loss and how the finite extent of the data's spectral power law causes the model's performance to plateau.

neural scaling laws random feature model spectral methods scalability equiparameterization
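
A toy numerical version of the setup the paper treats analytically (the generative details below are our assumptions, not the paper's model): data with a power-law feature spectrum, random-feature regression, and test loss measured as the number of features grows.

```python
# Toy scaling-law experiment: power-law data spectrum + random features.
import numpy as np

rng = np.random.default_rng(0)
D, n_train, n_test = 512, 2048, 2048
alpha = 1.5                                  # spectral power-law exponent
scales = np.arange(1, D + 1) ** (-alpha / 2)

def sample(n):
    x = rng.normal(size=(n, D)) * scales     # power-law covariance spectrum
    y = x @ np.ones(D)                       # simple linear target
    return x, y

xtr, ytr = sample(n_train)
xte, yte = sample(n_test)

for P in [16, 64, 256, 1024]:                # number of random features
    W = rng.normal(size=(D, P)) / np.sqrt(D)
    ftr, fte = np.tanh(xtr @ W), np.tanh(xte @ W)
    coef, *_ = np.linalg.lstsq(ftr, ytr, rcond=None)
    mse = np.mean((fte @ coef - yte) ** 2)
    print(f"P={P:5d}  test mse={mse:.4f}")   # falls roughly as a power law in P
```
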
Astrophysics Oct 28, 2022

Deep Learning Detection and Classification of Gravitational Waves from Neutron Star-Black Hole Mergers

Richard Qiu, Plamen Krastev, Kiranjyot Gill et al.

The Laser Interferometer Gravitational-Wave Observatory (LIGO) and Virgo Interferometer Collaborations have now detected all three classes of compact binary mergers: binary black hole (BBH), binary neutron star (BNS), and neutron star-black hole (NSBH). For coalescences involving neutron stars, the simultaneous observation of gravitational and electromagnetic radiation produced by an event has broader potential to enhance our understanding of these events, and also to probe the equation of state (EOS) of dense matter. However, electromagnetic follow-up to gravitational wave (GW) events requires rapid real-time detection and classification of GW signals, and conventional detection approaches are computationally prohibitive for the anticipated rate of detection of next-generation GW detectors. In this work, we present the first deep learning based results of classification of GW signals from NSBH mergers in \textit{real} LIGO data. We show for the first time that a deep neural network can successfully distinguish all three classes of compact binary mergers and separate them from detector noise. Specifically, we train a convolutional neural network (CNN) on $\sim 500,000$ data samples of real LIGO noise with injected BBH, BNS, and NSBH GW signals, and we show that our network has high sensitivity and accuracy. Most importantly, we successfully recover the two confirmed NSBH events to date (GW200105 and GW200115) and the two confirmed BNS mergers to date (GW170817 and GW190425), together with $\approx 90\%$ of all BBH candidate events from the third Gravitational Wave Transient Catalog, GWTC-3. These results are an important step towards low-latency real-time GW detection, enabling multi-messenger astronomy.

gravitational waves convolutional networks classification signal detection compact binary coalescence
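
For readers unfamiliar with the approach, here is a minimal PyTorch sketch of a 1D CNN classifier over whitened strain from two detectors; the layer sizes, sampling rate, and four-class setup are illustrative assumptions, not the paper's architecture.

```python
# Illustrative 1D CNN for multi-class GW signal classification.
import torch
import torch.nn as nn

class GWClassifier(nn.Module):
    def __init__(self, n_classes=4):  # e.g. noise, BBH, BNS, NSBH
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=16, stride=2), nn.ReLU(),  # 2 detectors
            nn.Conv1d(16, 32, kernel_size=8, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=8, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):              # x: (batch, 2, n_samples) whitened strain
        return self.head(self.features(x).squeeze(-1))

model = GWClassifier()
strain = torch.randn(8, 2, 4096)       # e.g. 1 s of data at 4096 Hz
logits = model(strain)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 4, (8,)))
print(logits.shape, loss.item())
```
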
Theoretical Physics Oct 27, 2022

Large-time correlation functions in bosonic lattice field theories

Cagin Yunus, William Detmold

Large-time correlation functions have a pivotal role in extracting particle masses from Euclidean lattice field theory calculations; however, little is known about the statistical properties of these quantities. In this work, the asymptotic form of the distributions of the correlation functions at vanishing momentum is determined for bosonic interacting lattice field theories with a unique gapped vacuum. It is demonstrated that the deviations from the asymptotic form at large Euclidean times can be utilized to determine the spectrum of the theory.

quantum field theory correlation function distributions signal-to-noise problem lattice qcd stochastic processes
Foundational AI Oct 24, 2022

Precision Machine Learning

Eric J. Michaud, Ziming Liu, Max Tegmark

We explore unique considerations involved in fitting ML models to data with very high precision, as is often required for science applications. We empirically compare various function approximation methods and study how they scale with increasing parameters and data. We find that neural networks can often outperform classical approximation methods on high-dimensional examples, by auto-discovering and exploiting modular structures therein. However, neural networks trained with common optimizers are less powerful for low-dimensional cases, which motivates us to study the unique properties of neural network loss landscapes and the corresponding optimization challenges that arise in the high precision regime. To address the optimization issue in low dimensions, we develop training tricks which enable us to train neural networks to extremely low loss, close to the limits allowed by numerical precision.

precision machine learning regression scalability loss function design modular structure discovery
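
A small sketch of the high-precision regime (our toy recipe, not the paper's exact training tricks): with float64 arithmetic and a quasi-Newton optimizer such as L-BFGS, a small network on a low-dimensional problem can be driven to losses far below what first-order optimizers typically reach.

```python
# Driving a small network toward the numerical-precision floor.
import torch

torch.set_default_dtype(torch.float64)   # float32 typically stalls near ~1e-7 MSE
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(3 * x)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.LBFGS(net.parameters(), max_iter=500,
                        tolerance_grad=0, tolerance_change=0)

def closure():
    opt.zero_grad()
    loss = torch.mean((net(x) - y) ** 2)
    loss.backward()
    return loss

for _ in range(10):
    loss = opt.step(closure)
print(f"final mse: {loss.item():.3e}")   # many orders below typical Adam losses
```
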
Astrophysics Oct 19, 2022

The First Two Years of FLEET: an Active Search for Superluminous Supernovae

Sebastian Gomez, Edo Berger, Peter K. Blanchard et al.

In November 2019 we began operating FLEET (Finding Luminous and Exotic Extragalactic Transients), a machine learning algorithm designed to photometrically identify Type I superluminous supernovae (SLSNe) in transient alert streams. Using FLEET, we spectroscopically classified 21 of the 50 SLSNe identified worldwide between November 2019 and January 2022. Based on our original algorithm, we anticipated that FLEET would achieve a purity of about 50\% for transients with a probability of being an SLSN, $p_{\rm SLSN}>0.5$; the true on-sky purity we obtained is closer to 80\%. Similarly, we anticipated FLEET could reach a completeness of about 30\%, and we indeed measure an upper limit on the completeness of $\approx 33$\%. Here, we present FLEET 2.0, an updated version of FLEET trained on 4,780 transients (almost 3 times more than in FLEET 1.0). FLEET 2.0 has a similar predicted purity to FLEET 1.0, but outperforms FLEET 1.0 in terms of completeness, which is now closer to $\approx 40$\% for transients with $p_{\rm SLSN}>0.5$. Additionally, we explore possible systematics that might arise from the use of FLEET for target selection. We find that the population of SLSNe recovered by FLEET is mostly indistinguishable from the overall SLSN population, in terms of physical and most observational parameters. We provide FLEET as an open source package on GitHub https://github.com/gmzsebastian/FLEET

supernova classification classification photometric transient classification active learning calibration
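
Purity and completeness at a score threshold, the two metrics quoted above, reduce to simple counts; the arrays below are made up for illustration.

```python
# Purity and completeness of a thresholded classifier (generic sketch).
import numpy as np

p_slsn = np.array([0.9, 0.7, 0.6, 0.4, 0.8, 0.3, 0.55])  # classifier scores
is_slsn = np.array([1, 1, 0, 1, 1, 0, 0], dtype=bool)     # spectroscopic truth

selected = p_slsn > 0.5
purity = (selected & is_slsn).sum() / selected.sum()       # TP / (TP + FP)
completeness = (selected & is_slsn).sum() / is_slsn.sum()  # TP / (TP + FN)
print(f"purity={purity:.2f} completeness={completeness:.2f}")
```
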
Astrophysics Oct 19, 2022

Identifying Tidal Disruption Events with an Expansion of the FLEET Machine Learning Algorithm

Sebastian Gomez, V. Ashley Villar, Edo Berger et al.

We present an expansion of FLEET, a machine learning algorithm optimized to select transients that are most likely to be tidal disruption events (TDEs). FLEET is based on a random forest algorithm trained on the light curves and host galaxy information of 4,779 spectroscopically classified transients. For transients with a probability of being a TDE, $p_{\rm TDE}>0.5$, we can successfully recover TDEs with a $\approx40$\% completeness and a $\approx30$\% purity when using the first 20 days of photometry, or a similar completeness and $\approx50$\% purity when including 40 days of photometry. We find that the most relevant features for differentiating TDEs from other transients are the normalized host separation, and the light curve $(g-r)$ color during peak. Additionally, we use FLEET to produce a list of the 39 most likely TDE candidates discovered by the Zwicky Transient Facility that remain currently unclassified. We explore the use of FLEET for the Legacy Survey of Space and Time on the Vera C. Rubin Observatory (\textit{Rubin}) and the \textit{Nancy Grace Roman Space Telescope} (\textit{Roman}). We simulate the \textit{Rubin} and \textit{Roman} survey strategies and estimate that $\sim 10^4$ TDEs could be discovered every year by \textit{Rubin}, and $\sim200$ TDEs per year by \textit{Roman}. Finally, we run FLEET on the TDEs in our \textit{Rubin} survey simulation and find that we can recover $\sim 30$\% of those at a redshift $z <0.5$ with $p_{\rm TDE}>0.5$. This translates to $\sim3,000$ TDEs per year that FLEET could uncover from \textit{Rubin}. FLEET is provided as an open-source package on GitHub https://github.com/gmzsebastian/FLEET

classification tidal disruption events ensemble methods feature extraction photometric transient classification
Theoretical Physics Oct 17, 2022

Symmetries of Calabi-Yau Prepotentials with Isomorphic Flops

Andre Lukas, Fabian Ruehle

Calabi-Yau threefolds with infinitely many flops to isomorphic manifolds have an extended Kahler cone made up from an infinite number of individual Kahler cones. These cones are related by reflection symmetries across flop walls. We study the implications of this cone structure for mirror symmetry, by considering the instanton part of the prepotential in Calabi-Yau threefolds. We show that such isomorphic flops across facets of the Kahler cone boundary give rise to symmetry groups isomorphic to Coxeter groups. In the dual Mori cone, non-flopping curve classes that are identified under these groups have the same Gopakumar-Vafa invariants. This leads to instanton prepotentials invariant under Coxeter groups, which we make manifest by introducing appropriate invariant functions. For some cases, these functions can be expressed in terms of theta functions whose appearance can be linked to an elliptic fibration structure of the Calabi-Yau manifold.

calabi-yau geometry group theory gopakumar-vafa invariants instanton prepotential symmetry breaking
Theoretical Physics Oct 16, 2022

Electric-Magnetic Duality in a Class of $G_2$-Compactifications of M-theory

James Halverson, Benjamin Sung, Jiahua Tian

We study electric-magnetic duality in compactifications of M-theory on twisted connected sum (TCS) $G_2$ manifolds via duality with F-theory. Specifically, we study the physics of the D3-branes in F-theory compactified on a Calabi-Yau fourfold $Y$, dual to a compactification of M-theory on a TCS $G_2$ manifold $X$. $\mathcal{N}=2$ supersymmetry is restored in an appropriate geometric limit. In that limit, we demonstrate that the dual of D3-branes probing seven-branes corresponds to the shrinking of certain surfaces and curves, yielding light particles that may carry both electric and magnetic charges. We provide evidence that the Minahan-Nemeschansky theories with $E_n$ flavor symmetry may be realized in this way. The $SL(2,\mathbb{Z})$ monodromy of the 3/7-brane system is dual to a Fourier-Mukai transform of the dual IIA/M-theory geometry in this limit, and we extrapolate this monodromy action to the global compactification. Away from the limit, the theory is broken to $\mathcal{N}=1$ supersymmetry by a D-term.

string theory g2 compactification electric-magnetic duality conformal field theory calabi-yau geometry
Theoretical Physics Oct 11, 2022

Learning to Optimize Quasi-Newton Methods

Isaac Liao, Rumen R. Dangovski, Jakob N. Foerster et al.

Fast gradient-based optimization algorithms have become increasingly essential for the computationally efficient training of machine learning models. One technique is to multiply the gradient by a preconditioner matrix to produce a step, but it is unclear what the best preconditioner matrix is. This paper introduces a novel machine learning optimizer called LODO, which tries to meta-learn the best preconditioner online during optimization. Specifically, our optimizer merges Learning to Optimize (L2O) techniques with quasi-Newton methods to learn preconditioners parameterized as neural networks; they are more flexible than preconditioners in other quasi-Newton methods. Unlike other L2O methods, LODO does not require any meta-training on a training task distribution, and instead learns to optimize on the fly while optimizing on the test task, adapting to the local characteristics of the loss landscape while traversing it. Theoretically, we show that our optimizer approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians. We experimentally verify that our algorithm can optimize in noisy settings, and show that simpler alternatives for representing the inverse Hessians worsen performance. Lastly, we use our optimizer to train a semi-realistic deep neural network with 95k parameters at speeds comparable to those of standard neural network optimizers.

quasi-newton methods learning to optimize hypergradient descent inverse problems surrogate modeling
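
A heavily simplified sketch of the idea (not LODO itself, which parameterizes the preconditioner as a neural network): meta-learn a diagonal preconditioner online by differentiating the post-step loss with respect to the preconditioner, with no meta-training distribution.

```python
# Online meta-learning of a diagonal preconditioner (toy illustration).
import torch

def loss_fn(x):
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)   # badly conditioned quadratic

x = torch.tensor([1.0, 1.0], requires_grad=True)
log_p = torch.full((2,), -5.0, requires_grad=True)  # log of diagonal preconditioner
meta_opt = torch.optim.Adam([log_p], lr=0.1)

for step in range(200):
    g = torch.autograd.grad(loss_fn(x), x)[0]
    # Meta-objective: loss after one preconditioned step, differentiated w.r.t. log_p.
    trial = x.detach() - torch.exp(log_p) * g.detach()
    meta_opt.zero_grad()
    loss_fn(trial).backward()
    meta_opt.step()
    # Take the actual step with the current preconditioner.
    with torch.no_grad():
        x -= torch.exp(log_p) * g

# The learned entries approach the inverse curvatures (1 and 1/100).
print("x =", x.detach(), "preconditioner =", torch.exp(log_p).detach())
```
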
Theoretical Physics Oct 10, 2022

On the Importance of Calibration in Semi-supervised Learning

Charlotte Loh, Rumen Dangovski, Shivchander Sudalairaj et al.

State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data by combining techniques of consistency regularization and pseudo-labeling. During pseudo-labeling, the model's predictions on unlabeled data are used for training and thus, model calibration is important in mitigating confirmation bias. Yet, many SOTA methods are optimized for model performance, with little focus directed to improve model calibration. In this work, we empirically demonstrate that model calibration is strongly correlated with model performance and propose to improve calibration via approximate Bayesian techniques. We introduce a family of new SSL models that optimizes for calibration and demonstrate their effectiveness across standard vision benchmarks of CIFAR-10, CIFAR-100 and ImageNet, giving up to 15.9% improvement in test accuracy. Furthermore, we also demonstrate their effectiveness in additional realistic and challenging problems, such as class-imbalanced datasets and in photonics science.

semi-supervised learning calibration pseudo-labeling uncertainty quantification bayesian inference
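
The standard diagnostic behind this line of work is the expected calibration error (ECE); a generic implementation follows, with synthetic overconfident predictions for illustration.

```python
# Expected calibration error: weighted average gap between confidence and
# accuracy across confidence bins.
import numpy as np

def ece(confidences, correct, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += mask.mean() * gap   # bin weight times its calibration gap
    return total

conf = np.random.default_rng(0).uniform(0.5, 1.0, size=1000)
correct = np.random.default_rng(1).uniform(size=1000) < conf * 0.9  # overconfident
print(f"ECE = {ece(conf, correct):.3f}")
```
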
Foundational AI Oct 3, 2022

Omnigrok: Grokking Beyond Algorithmic Data

Ziming Liu, Eric J. Michaud, Max Tegmark

Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks, identifying the mismatch between training and test losses as the cause for grokking. We refer to this as the "LU mechanism" because training and test losses (against model weight norm) typically resemble "L" and "U", respectively. This simple mechanism can nicely explain many aspects of grokking: data size dependence, weight decay dependence, the emergence of representations, etc. Guided by the intuitive picture, we are able to induce grokking on tasks involving images, language and molecules. In the reverse direction, we are able to eliminate grokking for algorithmic datasets. We attribute the dramatic nature of grokking for algorithmic datasets to representation learning.

grokking lu mechanism representation learning interpretability weight norm dynamics
Foundational AI Oct 2, 2022

AI-Assisted Discovery of Quantitative and Formal Models in Social Science

Julia Balla, Sihao Huang, Owen Dugan et al.

In social science, formal and quantitative models, such as ones describing economic growth and collective action, are used to formulate mechanistic explanations, provide predictions, and uncover questions about observed phenomena. Here, we demonstrate the use of a machine learning system to aid the discovery of symbolic models that capture nonlinear and dynamical relationships in social science datasets. By extending neuro-symbolic methods to find compact functions and differential equations in noisy and longitudinal data, we show that our system can be used to discover interpretable models from real-world data in economics and sociology. Augmenting existing workflows with symbolic regression can help uncover novel relationships and explore counterfactual models during the scientific process. We propose that this AI-assisted framework can bridge parametric and non-parametric models commonly employed in social science research by systematically exploring the space of nonlinear models and enabling fine-grained control over expressivity and interpretability.

symbolic regression neuro-symbolic integration interpretability automated discovery scientific workflows
Experimental Physics Sep 30, 2022

Finding NEEMo: Geometric Fitting using Neural Estimation of the Energy Mover's Distance

Ouail Kitouni, Niklas Nolte, Mike Williams

A novel neural architecture was recently developed that enforces an exact upper bound on the Lipschitz constant of the model by constraining the norm of its weights in a minimal way, resulting in higher expressiveness compared to other techniques. We present a new and interesting direction for this architecture: estimation of the Wasserstein metric (Earth Mover's Distance) in optimal transport by employing the Kantorovich-Rubinstein duality to enable its use in geometric fitting applications. Specifically, we focus on the field of high-energy particle physics, where it has been shown that a metric for the space of particle-collider events can be defined based on the Wasserstein metric, referred to as the Energy Mover's Distance (EMD). This metrization has the potential to revolutionize data-driven collider phenomenology. The work presented here represents a major step towards realizing this goal by providing a differentiable way of directly calculating the EMD. We show how the flexibility that our approach enables can be used to develop novel clustering algorithms.

optimal transport energy mover's distance lipschitz networks kantorovich-rubinstein duality jet physics
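
A rough sketch of Kantorovich-Rubinstein estimation of the Wasserstein-1 distance. The paper's architecture enforces an exact Lipschitz bound through minimal weight-norm constraints; here we substitute off-the-shelf spectral normalization, so treat this purely as an illustration of the duality.

```python
# Maximize E_P[f] - E_Q[f] over (approximately) 1-Lipschitz f.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def lip_linear(n_in, n_out):
    return spectral_norm(nn.Linear(n_in, n_out))   # spectral norm fixed to 1

# ReLU is 1-Lipschitz, so the whole network is at most 1-Lipschitz.
f = nn.Sequential(lip_linear(1, 64), nn.ReLU(),
                  lip_linear(64, 64), nn.ReLU(),
                  lip_linear(64, 1))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

p = torch.randn(4096, 1)           # P = N(0, 1)
q = torch.randn(4096, 1) + 2.0     # Q = N(2, 1); true W1(P, Q) = 2

for step in range(2000):
    opt.zero_grad()
    dual = f(p).mean() - f(q).mean()
    (-dual).backward()             # gradient ascent on the dual objective
    opt.step()

print("estimated EMD:", dual.item())  # approaches the true distance from below
```
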
Foundational AI Sep 22, 2022

Poisson Flow Generative Models

Yilun Xu, Ziming Liu, Max Tegmark et al.

We propose a new "Poisson flow" generative model (PFGM) that maps a uniform distribution on a high-dimensional hemisphere into any data distribution. We interpret the data points as electrical charges on the $z=0$ hyperplane in a space augmented with an additional dimension $z$, generating a high-dimensional electric field (the gradient of the solution to the Poisson equation). We prove that if these charges flow upward along electric field lines, their initial distribution in the $z=0$ plane transforms into a distribution on the hemisphere of radius $r$ that becomes uniform in the $r \to\infty$ limit. To learn the bijective transformation, we estimate the normalized field in the augmented space. For sampling, we devise a backward ODE that is anchored by the physically meaningful additional dimension: the samples hit the unaugmented data manifold when $z$ reaches zero. Experimentally, PFGM achieves current state-of-the-art performance among the normalizing flow models on CIFAR-10, with an Inception score of $9.68$ and a FID score of $2.35$. It also performs on par with the state-of-the-art SDE approaches while offering $10\times$ to $20\times$ acceleration on image generation tasks. Additionally, PFGM appears more tolerant of estimation errors on a weaker network architecture and robust to the step size in the Euler method. The code is available at https://github.com/Newbeeer/poisson_flow .

generative models poisson field normalizing flows augmented dimension sampling physics-informed neural networks
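
The geometric construction can be played with directly (a sketch of the physics picture, not the trained model): treat data points as charges on the $z=0$ plane, compute the augmented-space Coulomb field, and integrate the backward ODE from large $z$ down toward the data.

```python
# Poisson field of data "charges" and a crude backward-ODE integration.
import numpy as np

def poisson_field(x, data, eps=1e-6):
    """E(x) ~ sum_i (x - y_i) / ||x - y_i||^d for charges y_i in d dimensions."""
    diffs = x - data                                # (N, d)
    norms = np.linalg.norm(diffs, axis=1, keepdims=True) + eps
    return (diffs / norms ** data.shape[1]).sum(axis=0) / len(data)

rng = np.random.default_rng(0)
data2d = rng.normal(size=(500, 2))                  # 2-d data on the z = 0 plane
data = np.hstack([data2d, np.zeros((500, 1))])      # augment with z = 0

# Follow -E from far away; z decreases monotonically toward the data plane.
x = np.array([5.0, 5.0, 8.0])
while x[2] > 1e-2:
    e = poisson_field(x, data)
    x = x - 0.05 * e / (abs(e[2]) + 1e-9)           # step scaled so z drops by 0.05
print("sample lands near the data at:", x)
```
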
Theoretical Physics Aug 31, 2022

Discovering Conservation Laws using Optimal Transport and Manifold Learning

Peter Y. Lu, Rumen Dangovski, Marin Soljačić

Conservation laws are key theoretical and practical tools for understanding, characterizing, and modeling nonlinear dynamical systems. However, for many complex systems, the corresponding conserved quantities are difficult to identify, making it hard to analyze their dynamics and build stable predictive models. Current approaches for discovering conservation laws often depend on detailed dynamical information or rely on black box parametric deep learning methods. We instead reformulate this task as a manifold learning problem and propose a non-parametric approach for discovering conserved quantities. We test this new approach on a variety of physical systems and demonstrate that our method is able to both identify the number of conserved quantities and extract their values. Using tools from optimal transport theory and manifold learning, our proposed method provides a direct geometric approach to identifying conservation laws that is both robust and interpretable without requiring an explicit model of the system nor accurate time information.

manifold learning optimal transport conservation laws phase space isosurface embedding dimensionality reduction
Astrophysics Aug 29, 2022

Inferring subhalo effective density slopes from strong lensing observations with neural likelihood-ratio estimation

Gemma Zhang, Siddharth Mishra-Sharma, Cora Dvorkin

Strong gravitational lensing has emerged as a promising approach for probing dark matter models on sub-galactic scales. Recent work has proposed the subhalo effective density slope as a more reliable observable than the commonly used subhalo mass function. The subhalo effective density slope is a measurement independent of assumptions about the underlying density profile and can be inferred for individual subhalos through traditional sampling methods. To go beyond individual subhalo measurements, we leverage recent advances in machine learning and introduce a neural likelihood-ratio estimator to infer an effective density slope for populations of subhalos. We demonstrate that our method is capable of harnessing the statistical power of multiple subhalos (within and across multiple images) to distinguish between characteristics of different subhalo populations. The computational efficiency warranted by the neural likelihood-ratio estimator over traditional sampling enables statistical studies of dark matter perturbers and is particularly useful as we expect an influx of strong lensing systems from upcoming surveys.

dark matter likelihood ratio subhalo effective density slope simulation-based inference gravitational lensing
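
The workhorse behind neural likelihood-ratio estimation is the classifier trick: a probabilistic classifier trained to separate samples from two hypotheses recovers their likelihood ratio as $s(x)/(1-s(x))$. A generic one-dimensional sketch (our toy, not the paper's lensing pipeline):

```python
# Likelihood-ratio estimation via a binary classifier.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=(20000, 1))       # samples under hypothesis A
x1 = rng.normal(0.5, 1.0, size=(20000, 1))       # samples under hypothesis B
X = np.vstack([x0, x1])
y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300).fit(X, y)

x = np.array([[1.0]])
s = clf.predict_proba(x)[0, 1]
r_est = s / (1 - s)
# Exact ratio for these two unit Gaussians: exp(0.5 * x - 0.125).
r_true = np.exp(0.5 * 1.0 - 0.125)
print(f"estimated ratio {r_est:.2f} vs true {r_true:.2f}")
```
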
Astrophysics Aug 26, 2022

Uncovering dark matter density profiles in dwarf galaxies with graph neural networks

Tri Nguyen, Siddharth Mishra-Sharma, Reuel Williams et al.

Dwarf galaxies are small, dark matter-dominated galaxies, some of which are embedded within the Milky Way. Their lack of baryonic matter (e.g., stars and gas) makes them perfect test beds for probing the properties of dark matter -- understanding the spatial dark matter distribution in these systems can be used to constrain microphysical dark matter interactions that influence the formation and evolution of structures in our Universe. We introduce a new method that leverages simulation-based inference and graph-based machine learning in order to infer the dark matter density profiles of dwarf galaxies from observable kinematics of stars gravitationally bound to these systems. Our approach aims to address some of the limitations of established methods based on dynamical Jeans modeling. We show that this novel method can place stronger constraints on dark matter profiles and, consequently, has the potential to weigh in on some of the ongoing puzzles associated with the small-scale structure of dark matter halos, such as the core-cusp discrepancy.

graph neural networks simulation-based inference dark matter posterior estimation density estimation
Foundational AI Aug 18, 2022

Stable Object Reorientation using Contact Plane Registration

Richard Li, Carlos Esteves, Ameesh Makadia et al.

We present a system for accurately predicting stable orientations for diverse rigid objects. We propose to overcome the critical issue of modelling multimodality in the space of rotations by using a conditional generative model to accurately classify contact surfaces. Our system is capable of operating from noisy and partially-observed pointcloud observations captured by real world depth cameras. Our method substantially outperforms the current state-of-the-art systems on a simulated stacking task requiring highly accurate rotations, and demonstrates strong sim2real zero-shot transfer results across a variety of unseen objects on a real world reorientation task. Project website: \url{https://richardrl.github.io/stable-reorientation/}

contact plane registration variational autoencoders generative models rotation multimodality geometric deep learning
Experimental Physics Aug 10, 2022

Neural Embedding: Learning the Embedding of the Manifold of Physics Data

Sang Eon Park, Philip Harris, Bryan Ostdiek

In this paper, we present a method of embedding physics data manifolds with metric structure into lower dimensional spaces with simpler metrics, such as Euclidean and Hyperbolic spaces. We then demonstrate that it can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. With the notion of volume in Euclidean spaces, we provide for the first time a viable solution to quantifying the true search capability of model agnostic search algorithms in collider physics (i.e. anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from information in complex high dimensional datasets.

embeddings manifold learning dimensionality reduction anomaly detection optimal transport
Theoretical Physics Aug 8, 2022

Confinement in non-Abelian lattice gauge theory via persistent homology

Daniel Spitz, Julian M. Urban, Jan M. Pawlowski

We investigate the structure of confining and deconfining phases in SU(2) lattice gauge theory via persistent homology, which gives us access to the topology of a hierarchy of combinatorial objects constructed from given data. Specifically, we use filtrations by traced Polyakov loops, topological densities, holonomy Lie algebra fields, as well as electric and magnetic fields. This allows for a comprehensive picture of confinement. In particular, topological densities form spatial lumps which show signatures of the classical probability distribution of instanton-dyons. Signatures of well-separated dyons located at random positions are encoded in holonomy Lie algebra fields, following the semi-classical temperature dependence of the instanton appearance probability. Debye screening discriminating between electric and magnetic fields is visible in persistent homology and pronounced at large gauge coupling. All employed constructions are gauge-invariant without a priori assumptions on the configurations under study. This work showcases the versatility of persistent homology for statistical and quantum physics studies, barely explored to date.

persistent homology lattice gauge theory phase transitions instanton-dyons topology-based observables
Theoretical Physics Aug 7, 2022

Sampling QCD field configurations with gauge-equivariant flow models

Ryan Abbott, Michael S. Albergo, Aleksandar Botev et al.

Machine learning methods based on normalizing flows have been shown to address important challenges, such as critical slowing-down and topological freezing, in the sampling of gauge field configurations in simple lattice field theories. A critical question is whether this success will translate to studies of QCD. This Proceedings presents a status update on advances in this area. In particular, it is illustrated how recently developed algorithmic components may be combined to construct flow-based sampling algorithms for QCD in four dimensions. The prospects and challenges for future use of this approach in at-scale applications are summarized.

normalizing flows lattice qcd lattice gauge theory equivariant neural networks symmetry preservation
Astrophysics Aug 1, 2022

Robust Clustering of the Local Milky Way Stellar Kinematic Substructures with Gaia eDR3

Xiaowei Ou, Lina Necib, Anna Frebel

We apply the clustering algorithm HDBSCAN to the Gaia Early Data Release 3 astrometry, combined with the Gaia Data Release 2 radial velocity measurements, of almost 5.5 million stars to identify the local stellar kinematic substructures in the solar neighborhood. Understanding these structures helps build a more complete picture of the formation of the Milky Way, as well as an empirical phase space distribution of dark matter that would inform detection experiments. The main goal of this study is to provide a list of the most stable clusters, by taking into account the measurement uncertainties and studying the stability of the clustering results. We apply the clustering algorithm in two spaces, in velocity space in order to study recently accreted structures, and in action-angle space to find phase-mixed structures. We find 23 (6) robust clusters in velocity space (action-angle space) that are consistently not associated with noise. They are attributed to the known structures: the Gaia Sausage-Enceladus, the Helmi Stream, and globular cluster NGC 3201 are found in both spaces, while NGC 104 and the thick disk (Sequoia) are identified in velocity space (action-angle space). We discuss the kinematic properties of these structures and study whether many of the small clusters belong to a similar larger cluster based on their chemical abundances. Although we do not identify any new structures, we find that the HDBSCAN member selection of already known structures is unstable when the input stellar kinematics are resampled within their uncertainties. We therefore present the most stable subset of local kinematic structures, which are consistently identified by the clustering algorithm, and emphasize the need to take into account error propagation during both the manual and automated identification of stellar structures, both for existing ones as well as future discoveries. (abridged)

clustering uncertainty quantification stellar streams robustness dark matter
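
A toy version of the stability test the paper emphasizes (the data and thresholds below are invented): resample each star's kinematics within its uncertainties, re-run HDBSCAN, and keep only the points that are consistently assigned to a cluster.

```python
# Clustering stability under measurement-error resampling.
import numpy as np
import hdbscan

rng = np.random.default_rng(0)
# Toy "stars": two kinematic clumps plus a smooth background, with errors.
v = np.vstack([rng.normal((0, 0, 0), 5, (200, 3)),
               rng.normal((120, 40, -30), 5, (100, 3)),
               rng.uniform(-200, 200, (300, 3))])
v_err = rng.uniform(2, 20, size=v.shape)            # per-star uncertainties

n_resample, in_cluster = 50, np.zeros(len(v))
for _ in range(n_resample):
    v_sampled = rng.normal(v, v_err)                # perturb within errors
    labels = hdbscan.HDBSCAN(min_cluster_size=25).fit_predict(v_sampled)
    in_cluster += labels >= 0                       # label -1 means noise

stability = in_cluster / n_resample
print(f"{(stability > 0.8).sum()} of {len(v)} stars are stably clustered")
```
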
Astrophysics Jul 26, 2022

Characterizing the Expected Behavior of Non-Poissonian Template Fitting

Luis Gabriel C. Bariuan, Tracy R. Slatyer

We have performed a systematic study of the statistical behavior of non-Poissonian template fitting (NPTF), a method designed to analyze and characterize unresolved point sources in general counts datasets. In this paper, we focus on the properties and characteristics of the Fermi-LAT gamma-ray data set. In particular, we have simulated and analyzed gamma-ray sky maps under varying conditions of exposure, angular resolution, pixel size, energy window, event selection, and source brightness. We describe how these conditions affect the sensitivity of NPTF to the presence of point sources, for inner-galaxy studies of point sources within the Galactic Center excess, and for the simplified case of isotropic emission. We do not find opportunities for major gains in sensitivity from varying these choices, within the range available with current Fermi-LAT data. We provide an analytic estimate of the NPTF sensitivity to point sources for the case of isotropic emission and perfect angular resolution, and find good agreement with our numerical results for that case.

non-poissonian statistics signal detection likelihood ratio source count function point spread function
Astrophysics Jul 25, 2022

Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks

Christopher J. Shallue, Daniel J. Eisenstein

We present a method to reconstruct the initial linear-regime matter density field from the late-time non-linearly evolved density field in which we channel the output of standard first-order reconstruction to a convolutional neural network (CNN). Our method shows dramatic improvement over the reconstruction of either component alone. We show why CNNs are not well-suited for reconstructing the initial density directly from the late-time density: CNNs are local models, but the relationship between initial and late-time density is not local. Our method leverages standard reconstruction as a preprocessing step, which inverts bulk gravitational flows sourced over very large scales, transforming the residual reconstruction problem from long-range to local and making it ideally suited for a CNN. We develop additional techniques to account for redshift distortions, which warp the density fields measured by galaxy surveys. Our method improves the range of scales of high-fidelity reconstruction by a factor of 2 in wavenumber above standard reconstruction, corresponding to a factor of 8 increase in the number of well-reconstructed modes. In addition, our method almost completely eliminates the anisotropy caused by redshift distortions. As galaxy surveys continue to map the Universe in increasingly greater detail, our results demonstrate the opportunity offered by CNNs to untangle the non-linear clustering at intermediate scales more accurately than ever before.

convolutional networks inverse problems density field reconstruction baryon acoustic oscillations cosmological simulation
Foundational AI Jul 19, 2022

Bounding generalization error with input compression: An empirical study with infinite-width networks

Angus Galloway, Anna Golubeva, Mahmoud Salem et al.

Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles to reduce reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.

input compression bound mutual information estimation kernel methods uncertainty quantification representation learning
Theoretical Physics Jul 18, 2022

Gauge-equivariant flow models for sampling in lattice field theories with pseudofermions

Ryan Abbott, Michael S. Albergo, Denis Boyda et al.

This work presents gauge-equivariant architectures for flow-based sampling in fermionic lattice field theories using pseudofermions as stochastic estimators for the fermionic determinant. This is the default approach in state-of-the-art lattice field theory calculations, making this development critical to the practical application of flow models to theories such as QCD. Methods by which flow-based sampling approaches can be improved via standard techniques such as even/odd preconditioning and the Hasenbusch factorization are also outlined. Numerical demonstrations in two-dimensional U(1) and SU(3) gauge theories with $N_f=2$ flavors of fermions are provided.

normalizing flows lattice gauge theory equivariant neural networks pseudofermion sampling lattice qcd
Astrophysics Jul 13, 2022

Modeling early-universe energy injection with Dense Neural Networks

Yitian Sun, Tracy R. Slatyer

We show that Dense Neural Networks can be used to accurately model the cooling of high-energy particles in the early universe, in the context of the public code package DarkHistory. DarkHistory self-consistently computes the temperature and ionization history of the early universe in the presence of exotic energy injections, such as might arise from the annihilation or decay of dark matter. The original version of DarkHistory uses large pre-computed transfer function tables to evolve photon and electron spectra in redshift steps, which require a significant amount of memory and storage space. We present a light version of DarkHistory that makes use of simple Dense Neural Networks to store and interpolate the transfer functions, which performs well on small computers without heavy memory or storage usage. This method anticipates future expansion with additional parametric dependence in the transfer functions without requiring exponentially larger data tables.

surrogate modeling transfer function interpolation dark matter emulation scalability
Foundational AI Jul 1, 2022

Deep Learning and Symbolic Regression for Discovering Parametric Equations

Michael Zhang, Samuel Kim, Peter Y. Lu et al.

Symbolic regression is a machine learning technique that can learn the governing formulas of data and thus has the potential to transform scientific discovery. However, symbolic regression is still limited in the complexity and dimensionality of the systems that it can analyze. Deep learning on the other hand has transformed machine learning in its ability to analyze extremely complex and high-dimensional datasets. We propose a neural network architecture to extend symbolic regression to parametric systems where some coefficients may vary but the structure of the underlying governing equation remains constant. We demonstrate our method on various analytic expressions, ODEs, and PDEs with varying coefficients and show that it extrapolates well outside of the training domain. The neural network-based architecture can also integrate with other deep learning architectures so that it can analyze high-dimensional data while being trained end-to-end. To this end we integrate our architecture with convolutional neural networks to analyze 1D images of varying spring systems.

symbolic regression parametric equation learning interpretability sparse models automated discovery
Astrophysics Jun 29, 2022

Strong Lensing Source Reconstruction Using Continuous Neural Fields

Siddharth Mishra-Sharma, Ge Yang

From the nature of dark matter to the rate of expansion of our Universe, observations of distant galaxies distorted through strong gravitational lensing have the potential to answer some of the major open questions in astrophysics. Modeling galaxy-galaxy strong lensing observations presents a number of challenges as the exact configuration of both the background source and foreground lens galaxy is unknown. A timely call, prompted by a number of upcoming surveys anticipating high-resolution lensing images, demands methods that can efficiently model lenses at their full complexity. In this work, we introduce a method that uses continuous neural fields to non-parametrically reconstruct the complex morphology of a source galaxy while simultaneously inferring a distribution over foreground lens galaxy configurations. We demonstrate the efficacy of our method through experiments on simulated data targeting high-resolution lensing images similar to those anticipated in near-future astrophysical surveys.

continuous neural fields gravitational lensing inverse problems posterior estimation bayesian inference
Astrophysics Jun 23, 2022

The Dark Energy Camera Plane Survey 2 (DECaPS2): More Sky, Less Bias, and Better Uncertainties

A. K. Saydjari, E. F. Schlafly, D. Lang et al.

Deep optical and near-infrared imaging of the entire Galactic plane is essential for understanding our Galaxy's stars, gas, and dust. The second data release of the DECam Plane Survey (DECaPS2) extends the five-band optical and near-infrared survey of the southern Galactic plane to cover $6.5\%$ of the sky, |b| < 10° and 6° > l > -124°, complementary to coverage by Pan-STARRS1. Typical single-exposure effective depths, including crowding effects and other complications, are 23.5, 22.6, 22.1, 21.6, and 20.8 mag in $g$, $r$, $i$, $z$, and $Y$ bands, respectively, with around 1 arcsecond seeing. The survey comprises 3.32 billion objects built from 34 billion detections in 21.4 thousand exposures, totaling 260 hours open shutter time on the Dark Energy Camera (DECam) at Cerro Tololo. The data reduction pipeline features several improvements, including the addition of synthetic source injection tests to validate photometric solutions across the entire survey footprint. A convenient functional form for the detection bias in the faint limit was derived and leveraged to characterize the photometric pipeline performance. A new post-processing technique was applied to every detection to de-bias and improve uncertainty estimates of the flux in the presence of structured backgrounds, specifically targeting nebulosity. The images and source catalogs are publicly available at http://decaps.skymaps.info/.

photometric crowded-field deblending uncertainty quantification structured background estimation calibration model validation
Astrophysics Jun 15, 2022

A Stimulating Explanation of the Extragalactic Radio Background

Andrea Caputo, Hongwan Liu, Siddharth Mishra-Sharma et al.

Despite an intense theoretical and experimental effort over the past decade, observations of the extragalactic radio background at multiple frequencies below 10 GHz are not understood in terms of known radio sources, and may represent a sign of new physics. In this Letter we identify a new class of dark sector models with feebly interacting particles, where dark photons oscillate into ordinary photons that contribute to the radio background. Our scenario can explain both the magnitude and the spectral index of the radio background, while being consistent with other cosmological and astrophysical constraints. These models predict new relativistic degrees of freedom and spectral distortions of the cosmic microwave background, which could be detected in the next generation of experiments.

dark photon oscillation stimulated decay dark sector model dark matter new physics searches
Foundational AI Jun 9, 2022

Overcoming the Spectral Bias of Neural Value Approximation

Ge Yang, Anurag Ajay, Pulkit Agrawal

Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones. In this work, we re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural tangent kernel. With just a single line-change, our approach, Fourier feature networks (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. Faster convergence and better off-policy stability also make it possible to remove the target network without suffering catastrophic divergences, which further reduces TD(0)'s estimation bias on a few tasks.

reinforcement learning fourier feature networks kernel methods spectral methods neural tangent kernel
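
The "single line-change" can be sketched as follows (the sizes and frequency scale are illustrative assumptions, not the paper's settings): prepend a fixed random Fourier feature encoding to the value network.

```python
# Random Fourier features in front of a value MLP.
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, in_dim, n_features=256, scale=10.0):
        super().__init__()
        # Fixed random projection; not trained.
        self.register_buffer("B", torch.randn(in_dim, n_features) * scale)

    def forward(self, x):
        proj = 2 * torch.pi * x @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

state_dim = 17
q_net = nn.Sequential(
    FourierFeatures(state_dim),          # the one-line change
    nn.Linear(512, 256), nn.ReLU(),      # 512 = 2 * n_features
    nn.Linear(256, 1),
)
print(q_net(torch.randn(8, state_dim)).shape)   # torch.Size([8, 1])
```
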
Theoretical Physics Jun 8, 2022

Simplifying Polylogarithms with Machine Learning

Aurélien Dersy, Matthew D. Schwartz, Xiaoyuan Zhang

Polylogarithmic functions, such as the logarithm or dilogarithm, satisfy a number of algebraic identities. For the logarithm, all the identities follow from the product rule. For the dilogarithm and higher-weight classical polylogarithms, the identities can involve five functions or more. In many calculations relevant to particle physics, complicated combinations of polylogarithms often arise from Feynman integrals. Although the initial expressions resulting from the integration usually simplify, it is often difficult to know which identities to apply and in what order. To address this bottleneck, we explore to what extent machine learning methods can help. We consider both a reinforcement learning approach, where the identities are analogous to moves in a game, and a transformer network approach, where the problem is viewed analogously to a language-translation task. While both methods are effective, the transformer network appears more powerful and holds promise for practical use in symbolic manipulation tasks in mathematical physics.

transformers polylogarithm identities reinforcement learning symbolic computation scattering amplitudes
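
The kind of identity such models must learn to chain can be checked numerically; for example, Euler's reflection formula for the dilogarithm, verified here with mpmath:

```python
# Numerical check of Li2(x) + Li2(1 - x) = pi^2/6 - log(x) log(1 - x).
from mpmath import mp, polylog, log, pi

mp.dps = 30                       # 30 digits of working precision
x = mp.mpf("0.3")
lhs = polylog(2, x) + polylog(2, 1 - x)
rhs = pi ** 2 / 6 - log(x) * log(1 - x)
print(lhs - rhs)                  # ~1e-30: the identity holds to precision
```
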
Astrophysics May 24, 2022

Revealing the Milky Way's Most Recent Major Merger with a Gaia EDR3 Catalog of Machine-Learned Line-of-Sight Velocities

Adriana Dropulic, Hongwan Liu, Bryan Ostdiek et al.

Machine learning can play a powerful role in inferring missing line-of-sight velocities from astrometry in surveys such as Gaia. In this paper, we apply a neural network to Gaia Early Data Release 3 (EDR3) and obtain line-of-sight velocities and associated uncertainties for ~92 million stars. The network, which takes as input a star's parallax, angular coordinates, and proper motions, is trained and validated on ~6.4 million stars in Gaia with complete phase-space information. The network's uncertainty on its velocity prediction is a key aspect of its design; by properly convolving these uncertainties with the inferred velocities, we obtain accurate stellar kinematic distributions. As a first science application, we use the new network-completed catalog to identify candidate stars that belong to the Milky Way's most recent major merger, Gaia-Sausage-Enceladus (GSE). We present the kinematic, energy, angular momentum, and spatial distributions of the ~450,000 GSE candidates in this sample, and also study the chemical abundances of those with cross matches to GALAH and APOGEE. The network's predictive power will only continue to improve with future Gaia data releases as the training set of stars with complete phase-space information grows. This work provides a first demonstration of how to use machine learning to exploit high-dimensional correlations on data to infer line-of-sight velocities, and offers a template for how to train, validate and apply such a neural network when complete observational data is not available.

uncertainty quantification stellar phase-space inference regression galactic archaeology high-dimensional correlation learning
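
The generic recipe for predicting a quantity together with its uncertainty is heteroscedastic regression with a Gaussian negative log-likelihood; a minimal sketch follows (architecture and inputs are illustrative, not the paper's network).

```python
# A network that outputs a velocity and a per-star uncertainty.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    def __init__(self, in_dim=5):   # e.g. parallax, 2 angles, 2 proper motions
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 64), nn.ReLU())
        self.mu = nn.Linear(64, 1)
        self.log_var = nn.Linear(64, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_var(h)

def gaussian_nll(mu, log_var, target):
    # NLL of N(mu, exp(log_var)); penalizes both errors and miscalibration.
    return (0.5 * (log_var + (target - mu) ** 2 / log_var.exp())).mean()

net = VelocityNet()
x, v_los = torch.randn(128, 5), torch.randn(128, 1)
mu, log_var = net(x)
loss = gaussian_nll(mu, log_var, v_los)
loss.backward()
print(loss.item())
```
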
Theoretical Physics May 20, 2022

Degeneracy Engineering for Classical and Quantum Annealing: A Case Study of Sparse Linear Regression in Collider Physics

Eric R. Anschuetz, Lena Funcke, Patrick T. Komiske et al.

Classical and quantum annealing are computing paradigms that have been proposed to solve a wide range of optimization problems. In this paper, we aim to enhance the performance of annealing algorithms by introducing the technique of degeneracy engineering, through which the relative degeneracy of the ground state is increased by modifying a subset of terms in the objective Hamiltonian. We illustrate this novel approach by applying it to the example of $\ell_0$-norm regularization for sparse linear regression, which is in general an NP-hard optimization problem. Specifically, we show how to cast $\ell_0$-norm regularization as a quadratic unconstrained binary optimization (QUBO) problem, suitable for implementation on annealing platforms. As a case study, we apply this QUBO formulation to energy flow polynomials in high-energy collider physics, finding that degeneracy engineering substantially improves the annealing performance. Our results motivate the application of degeneracy engineering to a variety of regularized optimization problems.

degeneracy engineering sparse models qubo encoding quantum computing regression
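
How $\ell_0$-regularized regression becomes a QUBO can be seen in a minimal binary-coefficient version (the paper additionally encodes real-valued coefficients in binary and then engineers the ground-state degeneracy); brute force stands in for the annealer here.

```python
# l0-regularized regression with binary coefficients as a QUBO.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_samples, n_feat, lam = 50, 6, 0.5
X = rng.normal(size=(n_samples, n_feat))
a_true = np.array([1, 0, 1, 0, 0, 0])
y = X @ a_true + 0.1 * rng.normal(size=n_samples)

# ||y - X a||^2 + lam * ||a||_0 with a in {0,1}^n equals a^T Q a + const:
G, h = X.T @ X, X.T @ y
Q = G.copy()
np.fill_diagonal(Q, np.diag(G) - 2 * h + lam)   # uses a_i^2 = a_i

# Exhaustive minimization over 2^n bitstrings stands in for the annealer.
best = min(product([0, 1], repeat=n_feat),
           key=lambda a: np.array(a) @ Q @ np.array(a))
print("recovered support:", best)                # should match a_true
```
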
Foundational AI May 20, 2022

Towards Understanding Grokking: An Effective Theory of Representation Learning

Ziming Liu, Ouail Kitouni, Niklas Nolte et al.

We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. We find on transformers the grokking phase stays closer to the memorization phase (compared to the comprehension phase), leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.

representation learning grokking effective field theory phase transitions embeddings
Theoretical Physics May 13, 2022

Power Counting Energy Flow Polynomials

Pedro Cal, Jesse Thaler, Wouter J. Waalewijn

Power counting is a systematic strategy for organizing collider observables and their associated theoretical calculations. In this paper, we use power counting to characterize a class of jet substructure observables called energy flow polynomials (EFPs). EFPs provide an overcomplete linear basis for infrared-and-collinear safe jet observables, but it is known that in practice, a small subset of EFPs is often sufficient for specific jet analysis tasks. By applying power counting arguments, we obtain linear relationships between EFPs that hold for quark and gluon jets to a specific order in the power counting. We test these relations in the parton shower generator Pythia, finding excellent agreement. Power counting allows us to truncate the basis of EFPs without affecting performance, which we corroborate through a study of quark-gluon tagging and regression.

jet physics energy flow polynomials power counting collider physics effective field theory
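
The simplest nontrivial EFP is the two-point energy correlator $\sum_{i,j} z_i z_j \theta_{ij}$; a hand-coded example on a random toy jet (the paper's power-counting relations involve higher-order EFPs):

```python
# Two-point energy flow polynomial on a toy jet.
import numpy as np

rng = np.random.default_rng(0)
n = 30                                     # jet constituents
pt = rng.exponential(size=n)
eta = rng.normal(scale=0.4, size=n)
phi = rng.normal(scale=0.4, size=n)

z = pt / pt.sum()                          # energy (momentum) fractions
d_eta = eta[:, None] - eta[None, :]
d_phi = phi[:, None] - phi[None, :]
theta = np.sqrt(d_eta ** 2 + d_phi ** 2)   # pairwise rapidity-azimuth distances

efp_2pt = np.einsum("i,j,ij->", z, z, theta)
print(f"two-point EFP = {efp_2pt:.4f}")
```
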
Experimental Physics May 10, 2022

Bias and Priors in Machine Learning Calibrations for High Energy Physics

Rikab Gambhir, Benjamin Nachman, Jesse Thaler

Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of examples used during training, an issue known as prior dependence. This is an undesirable property of a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine learning-based calibration strategies. We demonstrate how some recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.

calibration prior dependence jet physics collider physics gaussian ansatz
Theoretical Physics May 9, 2022

Disentangling Quarks and Gluons with CMS Open Data

Patrick T. Komiske, Serhii Kryhin, Jesse Thaler

We study quark and gluon jets separately using public collider data from the CMS experiment. Our analysis is based on 2.3/fb of proton-proton collisions at 7 TeV, collected at the Large Hadron Collider in 2011. We define two non-overlapping samples via a pseudorapidity cut -- central jets with $|\eta| < 0.65$ and forward jets with $|\eta| > 0.65$ -- and employ jet topic modeling to extract individual distributions for the maximally separable categories. Under certain assumptions, such as sample independence and mutual irreducibility, these categories correspond to "quark" and "gluon" jets, as given by a recently proposed operational definition. We consider a number of different methods for extracting reducibility factors from the central and forward datasets, from which the fractions of quark jets in each sample can be determined. The greatest stability and robustness to statistical uncertainties is achieved by a novel method based on parametrizing the endpoints of a receiver operating characteristic (ROC) curve. To mitigate detector effects, which would otherwise induce unphysical differences between central and forward jets, we use the OmniFold method to perform central value unfolding. As a demonstration of the power of this method, we extract the intrinsic dimensionality of the quark and gluon jet samples, which exhibit Casimir scaling, as expected from the strongly-ordered limit. To our knowledge, this work is the first application of full phase space unfolding to real collider data, and one of the first applications of topic modeling to extract separate quark and gluon distributions at the LHC.

jet physics jet topic modeling collider physics unfolding roc curve fit
Experimental Physics May 6, 2022

Learning Uncertainties the Frequentist Way: Calibration and Correlation in High Energy Physics

Rikab Gambhir, Benjamin Nachman, Jesse Thaler

Calibration is a common experimental physics problem, whose goal is to infer the value and uncertainty of an unobservable quantity Z given a measured quantity X. Additionally, one would like to quantify the extent to which X and Z are correlated. In this paper, we present a machine learning framework for performing frequentist maximum likelihood inference with Gaussian uncertainty estimation, which also quantifies the mutual information between the unobservable and measured quantities. This framework uses the Donsker-Varadhan representation of the Kullback-Leibler divergence -- parametrized with a novel Gaussian Ansatz -- to enable a simultaneous extraction of the maximum likelihood values, uncertainties, and mutual information in a single training. We demonstrate our framework by extracting jet energy corrections and resolution factors from a simulation of the CMS detector at the Large Hadron Collider. By leveraging the high-dimensional feature space inside jets, we improve upon the nominal CMS jet resolution by upwards of 15%.

calibration uncertainty quantification gaussian ansatz mutual information likelihood ratio
Foundational AI May 5, 2022

Rapid Locomotion via Reinforcement Learning

Gabriel B Margolis, Ge Yang, Kartik Paigwar et al.

Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer leveraged from prior work. Videos of the robot's behaviors are available at: https://agility.csail.mit.edu/

reinforcement learning adaptive curriculum transfer learning multi-task learning domain randomization
Theoretical Physics May 2, 2022

Infinite Variance in Monte Carlo Sampling of Lattice Field Theories

Cagin Yunus, William Detmold

In Monte Carlo calculations of expectation values in lattice quantum field theories, the stochastic variance of the sampling procedure that is used defines the precision of the calculation for a fixed number of samples. If the variance of an estimator of a particular quantity is formally infinite, or in practice very large compared to the square of the mean, then that quantity can not be reliably estimated using the given sampling procedure. There are multiple scenarios in which this occurs, including in Lattice Quantum Chromodynamics, and a particularly simple example is given by the Gross-Neveu model where Monte Carlo calculations involve the introduction of auxiliary bosonic variables through a Hubbard-Stratonovich (HS) transformation. Here, it is shown that the variances of HS estimators for classes of operators involving fermion fields are divergent in this model and an even simpler zero-dimensional analogue. To correctly estimate these observables, two alternative sampling methods are proposed and numerically investigated.

monte carlo methods infinite variance sampling quantum field theory hubbard-stratonovich transformation lattice qcd
Astrophysics Apr 28, 2022

Going Beyond the Galaxy Power Spectrum: an Analysis of BOSS Data with Wavelet Scattering Transforms

Georgios Valogiannis, Cora Dvorkin

We perform the first application of the wavelet scattering transform (WST) to actual galaxy observations, through a WST analysis of the BOSS DR12 CMASS dataset. We included the effects of redshift-space anisotropy, non-trivial survey geometry, systematic weights, and the Alcock-Paczynski distortion effect, following the commonly adopted steps for the power spectrum analysis. In order to capture the cosmological dependence of the WST, we use galaxy mocks obtained from the state-of-the-art ABACUSSUMMIT simulations, tuned to match the anisotropic correlation function of the BOSS CMASS sample in the redshift range $0.46<z<0.60$. Using our model for the WST coefficients, as well as for the first 2 multipoles of the galaxy power spectrum, that we use as reference, we perform a likelihood analysis of the CMASS data. We obtain the posterior probability distributions of 4 cosmological parameters, $\{ω_b,ω_c,n_s,σ_8\}$, as well as the Hubble constant, derived from a fixed value of the angular size of the sound horizon at last scattering measured by the Planck satellite, all of which are marginalized over the 7 nuisance parameters of the Halo Occupation Distribution model. The WST is found to deliver a substantial improvement in the values of the predicted $1σ$ errors compared to the regular power spectrum, which are tighter by a factor of $3-5$ in the case of flat and uninformative priors and by a factor of $3-8$, when a Big Bang Nucleosynthesis prior is applied on the value of $ω_b$. Our results are investigative and subject to certain approximations, which we discuss in the text.

wavelet scattering transform simulation-based inference posterior estimation bayesian inference cosmological simulation
Foundational AI Apr 21, 2022

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo et al.

We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, where the edited sentence is obtained by stochastically masking out the original sentence and then sampling from a masked language model. We show that DiffSCE is an instance of equivariant contrastive learning (Dangovski et al., 2021), which generalizes contrastive learning and learns representations that are insensitive to certain types of augmentations and sensitive to other "harmful" types of augmentations. Our experiments show that DiffCSE achieves state-of-the-art results among unsupervised sentence representation learning methods, outperforming unsupervised SimCSE by 2.3 absolute points on semantic textual similarity tasks.

contrastive learning representation learning embeddings self-supervised learning equivariant neural networks
Astrophysics Apr 20, 2022

Photometrically-Classified Superluminous Supernovae from the Pan-STARRS1 Medium Deep Survey: A Case Study for Science with Machine Learning-Based Classification

Brian Hsu, Griffin Hosseinzadeh, V. Ashley Villar et al.

With the upcoming Vera C.~Rubin Observatory Legacy Survey of Space and Time (LSST), it is expected that only $\sim 0.1\%$ of all transients will be classified spectroscopically. To conduct studies of rare transients, such as Type I superluminous supernovae (SLSNe), we must instead rely on photometric classification. In this vein, here we carry out a pilot study of SLSNe from the Pan-STARRS1 Medium-Deep Survey (PS1-MDS) classified photometrically with our SuperRAENN and Superphot algorithms. We first construct a sub-sample of the photometric sample using a list of simple selection metrics designed to minimize contamination and ensure sufficient data quality for modeling. We then fit the multi-band light curves with a magnetar spin-down model using the Modular Open-Source Fitter for Transients (MOSFiT). Comparing the magnetar engine and ejecta parameter distributions of the photometric sample to those of the PS1-MDS spectroscopic sample and a larger literature spectroscopic sample, we find that these samples are overall consistent, but that the photometric sample extends to slower spins and lower ejecta masses, which correspond to lower luminosity events, as expected for photometric selection. While our PS1-MDS photometric sample is still smaller than the overall SLSN spectroscopic sample, our methodology paves the way to an orders-of-magnitude increase in the SLSN sample in the LSST era through photometric selection and study.

supernova classification classification photometric light curve fitting autoencoders magnetar central engine
Astrophysics Apr 18, 2022

Luminous Supernovae: Unveiling a Population Between Superluminous and Normal Core-collapse Supernovae

Sebastian Gomez, Edo Berger, Matt Nicholl et al.

Stripped-envelope core-collapse supernovae can be divided into two broad classes: the common Type Ib/c supernovae (SNe Ib/c), powered by the radioactive decay of $^{56}$Ni, and the rare superluminous supernovae (SLSNe), most likely powered by the spin-down of a magnetar central engine. Up to now, the intermediate regime between these two populations has remained mostly unexplored. Here, we present a comprehensive study of 40 \textit{luminous supernovae} (LSNe), SNe with peak magnitudes of $M_r = -19$ to $-20$ mag, bound by SLSNe on the bright end and by SNe Ib/c on the dim end. Spectroscopically, LSNe appear to form a continuum between Type Ic SNe and SLSNe. Given their intermediate nature, we model the light curves of all LSNe using a combined magnetar plus radioactive decay model and find that they are indeed intermediate, not only in terms of their peak luminosity and spectra, but also in their rise times, power sources, and physical parameters. We sub-classify LSNe into distinct groups that are either as fast-evolving as SNe Ib/c or as slow-evolving as SLSNe, and appear to be either radioactively or magnetar powered, respectively. Our findings indicate that LSNe are powered by either an over-abundant production of $^{56}$Ni or by weak magnetar engines, and may serve as the missing link between the two populations.

supernova classification light curve modeling magnetar central engine radioactive decay power bayesian inference
Astrophysics Apr 11, 2022

Quantification of high dimensional non-Gaussianities and its implication to Fisher analysis in cosmology

Core Francisco Park, Erwan Allys, Francisco Villaescusa-Navarro et al.

It is well known that the power spectrum is not able to fully characterize the statistical properties of non-Gaussian density fields. Recently, many different statistics have been proposed to extract information from non-Gaussian cosmological fields that perform better than the power spectrum. The Fisher matrix formalism is commonly used to quantify the accuracy with which a given statistic can constrain the value of the cosmological parameters. However, these calculations typically rely on the assumption that the likelihood of the considered statistic follows a multivariate Gaussian distribution. In this work we follow Sellentin & Heavens (2017) and use two different statistical tests to identify non-Gaussianities in different statistics such as the power spectrum, bispectrum, marked power spectrum, and wavelet scatering transform (WST). We remove the non-Gaussian components of the different statistics and perform Fisher matrix calculations with the \textit{Gaussianized} statistics using Quijote simulations. We show that constraints on the parameters can change by a factor of $\sim 2$ in some cases. We show with simple examples how statistics that do not follow a multivariate Gaussian distribution can achieve artificially tight bounds on the cosmological parameters when using the Fisher matrix formalism. We think that the non-Gaussian tests used in this work represent a powerful tool to quantify the robustness of Fisher matrix calculations and their underlying assumptions. We release the code used to compute the power spectra, bispectra, and WST that can be run on both CPUs and GPUs.

fisher information matrix hypothesis testing likelihood estimation uncertainty quantification cosmological simulation
Experimental Physics Apr 5, 2022

Towards Designing and Exploiting Generative Networks for Neutrino Physics Experiments using Liquid Argon Time Projection Chambers

Paul Lutkus, Taritree Wongjirad, Shuchin Aeron

In this paper, we show that a hybrid approach to generative modeling via combining the decoder from an autoencoder together with an explicit generative model for the latent space is a promising method for producing images of particle trajectories in a liquid argon time projection chamber (LArTPC). LArTPCs are a type of particle physics detector used by several current and future experiments focused on studies of the neutrino. We implement a Vector-Quantized Variational Autoencoder (VQ-VAE) and PixelCNN which produces images with LArTPC-like features and introduce a method to evaluate the quality of the images using a semantic segmentation that identifies important physics-based features.

generative models variational autoencoders detector simulation neutrino detection convolutional networks
Foundational AI Apr 5, 2022

Pareto-optimal clustering with the primal deterministic information bottleneck

Andrew K. Tan, Max Tegmark, Isaac L. Chuang

At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.

clustering information bottleneck pareto frontier mapping lossy compression representation learning
Foundational AI Mar 23, 2022

AI Poincaré 2.0: Machine Learning Conservation Laws from Differential Equations

Ziming Liu, Varun Madhavan, Max Tegmark

We present a machine learning algorithm that discovers conservation laws from differential equations, both numerically (parametrized as neural networks) and symbolically, ensuring their functional independence (a non-linear generalization of linear independence). Our independence module can be viewed as a nonlinear generalization of singular value decomposition. Our method can readily handle inductive biases for conservation laws. We validate it with examples including the 3-body problem, the KdV equation and nonlinear Schrödinger equation.

conservation laws functional independence differential rank hamiltonian systems manifold learning
Theoretical Physics Mar 17, 2022

Theoretical tools for neutrino scattering: interplay between lattice QCD, EFTs, nuclear physics, phenomenology, and neutrino event generators

L. Alvarez Ruso, A. M. Ankowski, S. Bacca et al.

Maximizing the discovery potential of increasingly precise neutrino experiments will require an improved theoretical understanding of neutrino-nucleus cross sections over a wide range of energies. Low-energy interactions are needed to reconstruct the energies of astrophysical neutrinos from supernovae bursts and search for new physics using increasingly precise measurement of coherent elastic neutrino scattering. Higher-energy interactions involve a variety of reaction mechanisms including quasi-elastic scattering, resonance production, and deep inelastic scattering that must all be included to reliably predict cross sections for energies relevant to DUNE and other accelerator neutrino experiments. This white paper discusses the theoretical status, challenges, required resources, and path forward for achieving precise predictions of neutrino-nucleus scattering and emphasizes the need for a coordinated theoretical effort involved lattice QCD, nuclear effective theories, phenomenological models of the transition region, and event generators.

neutrino-nucleus cross sections lattice qcd effective field theory nuclear many-body theory coherent elastic neutrino scattering
Foundational AI Mar 16, 2022

Unsupervised Semantic Segmentation by Distilling Feature Correspondences

Mark Hamilton, Zhoutong Zhang, Bharath Hariharan et al.

Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation. To solve this task, algorithms must produce features for every pixel that are both semantically meaningful and compact enough to form distinct clusters. Unlike previous works which achieve this with a single end-to-end framework, we propose to separate feature learning from cluster compactification. Empirically, we show that current unsupervised feature learning frameworks already generate dense features whose correlations are semantically consistent. This observation motivates us to design STEGO ($\textbf{S}$elf-supervised $\textbf{T}$ransformer with $\textbf{E}$nergy-based $\textbf{G}$raph $\textbf{O}$ptimization), a novel framework that distills unsupervised features into high-quality discrete semantic labels. At the core of STEGO is a novel contrastive loss function that encourages features to form compact clusters while preserving their relationships across the corpora. STEGO yields a significant improvement over the prior state of the art, on both the CocoStuff ($\textbf{+14 mIoU}$) and Cityscapes ($\textbf{+9 mIoU}$) semantic segmentation challenges.

unsupervised semantic segmentation self-supervised learning contrastive learning feature distillation clustering
Foundational AI Mar 15, 2022

Categorical Representation Learning and RG flow operators for algorithmic classifiers

Artan Sheshmani, Yizhuang You, Wenbo Fu et al.

Following the earlier formalism of the categorical representation learning (arXiv:2103.14770) by the first two authors, we discuss the construction of the "RG-flow based categorifier". Borrowing ideas from theory of renormalization group flows (RG) in quantum field theory, holographic duality, and hyperbolic geometry, and mixing them with neural ODE's, we construct a new algorithmic natural language processing (NLP) architecture, called the RG-flow categorifier or for short the RG categorifier, which is capable of data classification and generation in all layers. We apply our algorithmic platform to biomedical data sets and show its performance in the field of sequence-to-function mapping. In particular we apply the RG categorifier to particular genomic sequences of flu viruses and show how our technology is capable of extracting the information from given genomic sequences, find their hidden symmetries and dominant features, classify them and use the trained data to make stochastic prediction of new plausible generated sequences associated with new set of viruses which could avoid the human immune system. The content of the current article is part of the recent US patent application submitted by first two authors (U.S. Patent Application No.: 63/313.504).

Theoretical Physics Mar 2, 2022

Creating Simple, Interpretable Anomaly Detectors for New Physics in Jet Substructure

Layne Bradshaw, Spencer Chang, Bryan Ostdiek

Anomaly detection with convolutional autoencoders is a popular method to search for new physics in a model-agnostic manner. These techniques are powerful, but they are still a "black box," since we do not know what high-level physical observables determine how anomalous an event is. To address this, we adapt a recently proposed technique by Faucett et al., which maps out the physical observables learned by a neural network classifier, to the case of anomaly detection. We propose two different strategies that use a small number of high-level observables to mimic the decisions made by the autoencoder on background events, one designed to directly learn the output of the autoencoder, and the other designed to learn the difference between the autoencoder's outputs on a pair of events. Despite the underlying differences in their approach, we find that both strategies have similar ordering performance as the autoencoder and independently use the same six high-level observables. From there, we compare the performance of these networks as anomaly detectors. We find that both strategies perform similarly to the autoencoder across a variety of signals, giving a nontrivial demonstration that learning to order background events transfers to ordering a variety of signal events.

anomaly detection autoencoders interpretability jet physics new physics searches
Theoretical Physics Mar 2, 2022

Flow-based density of states for complex actions

Jan M. Pawlowski, Julian M. Urban

Emerging sampling algorithms based on normalizing flows have the potential to solve ergodicity problems in lattice calculations. Furthermore, it has been noted that flows can be used to compute thermodynamic quantities which are difficult to access with traditional methods. This suggests that they are also applicable to the density-of-states approach to complex action problems. In particular, flow-based sampling may be used to compute the density directly, in contradistinction to the conventional strategy of reconstructing it via measuring and integrating the derivative of its logarithm. By circumventing this procedure, the accumulation of errors from the numerical integration is avoided completely and the overall normalization factor can be determined explicitly. In this proof-of-principle study, we demonstrate our method in the context of two-component scalar field theory where the $O(2)$ symmetry is explicitly broken by an imaginary external field. First, we concentrate on the zero-dimensional case which can be solved exactly. We show that with our method, the Lee-Yang zeroes of the associated partition function can be successfully located. Subsequently, we confirm that the flow-based approach correctly reproduces the density computed with conventional methods in one- and two-dimensional models.

normalizing flows density of states density estimation sign problem monte carlo methods
Foundational AI Feb 25, 2022

Fault-Tolerant Neural Networks from Biological Error Correction Codes

Alexander Zlokapa, Andrew K. Tan, John M. Martyn et al.

It has been an open question in deep learning if fault-tolerant computation is possible: can arbitrarily reliable computation be achieved using only unreliable neurons? In the grid cells of the mammalian cortex, analog error correction codes have been observed to protect states against neural spiking noise, but their role in information processing is unclear. Here, we use these biological error correction codes to develop a universal fault-tolerant neural network that achieves reliable computation if the faultiness of each neuron lies below a sharp threshold; remarkably, we find that noisy biological neurons fall below this threshold. The discovery of a phase transition from faulty to fault-tolerant neural computation suggests a mechanism for reliable computation in the cortex and opens a path towards understanding noisy analog systems relevant to artificial intelligence and neuromorphic computing.

robustness fault-tolerant neural computation phase transitions grid code error correction analog fault tolerance
Theoretical Physics Feb 23, 2022

Flow-based sampling in the lattice Schwinger model at criticality

Michael S. Albergo, Denis Boyda, Kyle Cranmer et al.

Recent results suggest that flow-based algorithms may provide efficient sampling of field distributions for lattice field theory applications, such as studies of quantum chromodynamics and the Schwinger model. In this work, we provide a numerical demonstration of robust flow-based sampling in the Schwinger model at the critical value of the fermion mass. In contrast, at the same parameters, conventional methods fail to sample all parts of configuration space, leading to severely underestimated uncertainties.

normalizing flows lattice gauge theory topological freezing monte carlo methods equivariant neural networks
Theoretical Physics Feb 15, 2022

Identifying equivalent Calabi--Yau topologies: A discrete challenge from math and physics for machine learning

Vishnu Jejjala, Washington Taylor, Andrew Turner

We review briefly the characteristic topological data of Calabi--Yau threefolds and focus on the question of when two threefolds are equivalent through related topological data. This provides an interesting test case for machine learning methodology in discrete mathematics problems motivated by physics.

calabi-yau topology topological equivalence string theory triple intersection numbers classification
Astrophysics Feb 10, 2022

Topogivity: A Machine-Learned Chemical Rule for Discovering Topological Materials

Andrew Ma, Yang Zhang, Thomas Christensen et al.

Topological materials present unconventional electronic properties that make them attractive for both basic science and next-generation technological applications. The majority of currently known topological materials have been discovered using methods that involve symmetry-based analysis of the quantum wavefunction. Here we use machine learning to develop a simple-to-use heuristic chemical rule that diagnoses with a high accuracy whether a material is topological using only its chemical formula. This heuristic rule is based on a notion that we term topogivity, a machine-learned numerical value for each element that loosely captures its tendency to form topological materials. We next implement a high-throughput procedure for discovering topological materials based on the heuristic topogivity-rule prediction followed by ab initio validation. This way, we discover new topological materials that are not diagnosable using symmetry indicators, including several that may be promising for experimental observation.

materials discovery classification interpretability automated discovery topological invariants
Theoretical Physics Feb 7, 2022

Finite-Volume Pionless Effective Field Theory for Few-Nucleon Systems with Differentiable Programming

Xiangkai Sun, William Detmold, Di Luo et al.

Finite-volume pionless effective field theory provides an efficient framework for the extrapolation of nuclear spectra and matrix elements calculated at finite volume in lattice QCD to infinite volume, and to nuclei with larger atomic number. In this work, it is demonstrated how this framework may be implemented via a set of correlated Gaussian wavefunctions optimised using differentiable programming and via solution of a generalised eigenvalue problem. This approach is shown to be significantly more efficient than a stochastic implementation of the variational method based on the same form of correlated Gaussian wavefunctions, yielding comparably accurate representations of the ground-state wavefunctions with an order of magnitude fewer terms. The efficiency of representation allows such calculations to be extended to larger systems than in previous work. The method is demonstrated through calculations of the binding energies of nuclei with atomic number $A\in\{2,3,4\}$ in finite volume, matched to lattice QCD calculations at quark masses corresponding to $m_π=806$ MeV, and infinite-volume effective field theory calculations of $A\in\{2,3,4,5,6\}$ systems based on this matching.

effective field theory variational wavefunction optimization finite-volume eft differentiable programming lattice qcd
Astrophysics Jan 18, 2022

Photometry on Structured Backgrounds: Local Pixelwise Infilling by Regression

Andrew K. Saydjari, Douglas P. Finkbeiner

Photometric pipelines struggle to estimate both the flux and flux uncertainty for stars in the presence of structured backgrounds such as filaments or clouds. However, it is exactly stars in these complex regions that are critical to understanding star formation and the structure of the interstellar medium. We develop a method, similar to Gaussian process regression, which we term local pixelwise infilling (LPI). Using a local covariance estimate, we predict the background behind each star and the uncertainty on that prediction in order to improve estimates of flux and flux uncertainty. We show the validity of our model on synthetic data and real dust fields. We further demonstrate that the method is stable even in the crowded field limit. While we focus on optical-IR photometry, this method is not restricted to those wavelengths. We apply this technique to the 34 billion detections in the second data release of the Dark Energy Camera Plane Survey (DECaPS2). In addition to removing many $>3σ$ outliers and improving uncertainty estimates by a factor of $\sim 2-3$ on nebulous fields, we also show that our method is well-behaved on uncrowded fields. The entirely post-processing nature of our implementation of LPI photometry allows it to easily improve the flux and flux uncertainty estimates of past as well as future surveys.

photometric infilling regression uncertainty quantification structured background subtraction kernel methods
Foundational AI Jan 11, 2022

Cracking the Quantum Scaling Limit with Machine Learned Electron Densities

Joshua A. Rackers, Lucas Tecot, Mario Geiger et al.

A long-standing goal of science is to accurately solve the Schrödinger equation for large molecular systems. The poor scaling of current quantum chemistry algorithms on classical computers imposes an effective limit of about a few dozen atoms for which we can calculate molecular electronic structure. We present a machine learning (ML) method to break through this scaling limit and make quantum chemistry calculations of very large systems possible. We show that Euclidean Neural Networks can be trained to predict the electron density with high fidelity from limited data. Learning the electron density allows us to train a machine learning model on small systems and make accurate predictions on large ones. We show that this ML electron density model can break through the quantum scaling limit and calculate the electron density of systems of thousands of atoms with quantum accuracy.

equivariant neural networks electron density prediction quantum scaling limit geometric deep learning scalability
Astrophysics Jan 10, 2022

Constraining the Time of Gravitational Wave Emission from Core-Collapse Supernovae

Kiranjyot Gill, Griffin Hosseinzadeh, Edo Berger et al.

The advent of sensitive gravitational wave (GW) detectors, coupled with wide-field, high cadence optical time-domain surveys, raises the possibility of the first joint GW-electromagnetic (EM) detections of core-collapse supernovae (CCSNe). For targeted searches of GWs from CCSNe optical observations can be used to increase the sensitivity of the search by restricting the relevant time interval, defined here as the GW search window (GSW). The extent of the GSW is a critical factor in determining the achievable false alarm probability (FAP) for a triggered CCSN search. The ability to constrain the GSW from optical observations depends on how early a CCSN is detected, as well as the ability to model the early optical emission. Here we present several approaches to constrain the GSW, ranging in complexity from model-independent analytical fits of the early light curve, model-dependent fits of the rising or entire light curve, and a new data-driven approach using existing well-sampled CCSN light curves from {\it Kepler} and the Transiting Exoplanet Survey Satellite (TESS). We use these approaches to determine the time of core-collapse and its associated uncertainty (i.e., the GSW). We apply our methods to two Type II SNe that occurred during LIGO/Virgo Observing Run 3: SN\,2019fcn and SN\,2019ejj (both in the same galaxy at $d=15.7$ Mpc). Our approach shortens the duration of the GSW and improves the robustness of the GSW compared to techniques used in past GW CCSN searches.

gravitational waves light curve fitting shock breakout timing uncertainty quantification supernova classification
Foundational AI Dec 15, 2021

Invariance Through Latent Alignment

Takuma Yoneda, Ge Yang, Matthew R. Walter et al.

A robot's deployment environment often involves perceptual changes that differ from what it has experienced during training. Standard practices such as data augmentation attempt to bridge this gap by augmenting source images in an effort to extend the support of the training distribution to better cover what the agent might experience at test time. In many cases, however, it is impossible to know test-time distribution-shift a priori, making these schemes infeasible. In this paper, we introduce a general approach, called Invariance Through Latent Alignment (ILA), that improves the test-time performance of a visuomotor control policy in deployment environments with unknown perceptual variations. ILA performs unsupervised adaptation at deployment-time by matching the distribution of latent features on the target domain to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of challenging adaptation scenarios, including changes in lighting conditions, the content in the scene, and camera poses. We present results on calibrated control benchmarks in simulation -- the distractor control suite -- and a physical robot under a sim-to-real setup.

representation learning latent distribution matching robustness reinforcement learning transfer learning
Astrophysics Dec 10, 2021

Impact of Massive Binary Star and Cosmic Evolution on Gravitational Wave Observations II: Double Compact Object Rates and Properties

Floor S. Broekgaarden, Edo Berger, Simon Stevenson et al.

Making the most of the rapidly increasing population of gravitational-wave detections of black hole (BH) and neutron star (NS) mergers requires comparing observations with population synthesis predictions. In this work we investigate the combined impact from the key uncertainties in population synthesis modelling of the isolated binary evolution channel: the physical processes in massive binary-star evolution and the star formation history as a function of metallicity, $Z$, and redshift $z, \mathcal{S}(Z,z)$. Considering these uncertainties we create 560 different publicly available model realizations and calculate the rate and distribution characteristics of detectable BHBH, BHNS, and NSNS mergers. We find that our stellar evolution and $\mathcal{S}(Z,z)$ variations can impact the predicted intrinsic and detectable merger rates by factors $10^2$-$10^4$. We find that BHBH rates are dominantly impacted by $\mathcal{S}(Z,z)$ variations, NSNS rates by stellar evolution variations and BHNS rates by both. We then consider the combined impact from all uncertainties considered in this work on the detectable mass distribution shapes (chirp mass, individual masses and mass ratio). We find that the BHNS mass distributions are predominantly impacted by massive binary-star evolution changes. For BHBH and NSNS we find that both uncertainties are important. We also find that the shape of the delay time and birth metallicity distributions are typically dominated by the choice of $\mathcal{S}(Z,z)$ for BHBH, BHNS and NSNS. We identify several examples of robust features in the mass distributions predicted by all 560 models, such that we expect more than 95% of BHBH detections to contain a BH $\gtrsim 8\,\rm{M}_{\odot}$ and have mass ratios $\lesssim 4$. Our work demonstrates that it is essential to consider a wide range of allowed models to study double compact object merger rates and properties.

gravitational waves stellar evolution binary population synthesis compact object mergers uncertainty quantification
Theoretical Physics Dec 10, 2021

SymmetryGAN: Symmetry Discovery with Deep Learning

Krish Desai, Benjamin Nachman, Jesse Thaler

What are the symmetries of a dataset? Whereas the symmetries of an individual data element can be characterized by its invariance under various transformations, the symmetries of an ensemble of data elements are ambiguous due to Jacobian factors introduced while changing coordinates. In this paper, we provide a rigorous statistical definition of the symmetries of a dataset, which involves inertial reference densities, in analogy to inertial frames in classical mechanics. We then propose SymmetryGAN as a novel and powerful approach to automatically discover symmetries using a deep learning method based on generative adversarial networks (GANs). When applied to Gaussian examples, SymmetryGAN shows excellent empirical performance, in agreement with expectations from the analytic loss landscape. SymmetryGAN is then applied to simulated dijet events from the Large Hadron Collider (LHC) to demonstrate the potential utility of this method in high energy collider physics applications. Going beyond symmetry discovery, we consider procedures to infer the underlying symmetry group from empirical data.

generative adversarial networks symmetry discovery group theory inertial reference density automated discovery
Foundational AI Dec 9, 2021

Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation

Anthony Simeonov, Yilun Du, Andrea Tagliasacchi et al.

We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging) via category-level descriptors. We employ this representation for object manipulation, where given a task demonstration, we want to repeat the same task on a new object instance from the same category. We propose to achieve this objective by searching (via optimization) for the pose whose descriptor matches that observed in the demonstration. NDFs are conveniently trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints. Further, NDFs are SE(3)-equivariant, guaranteeing performance that generalizes across all possible 3D object translations and rotations. We demonstrate learning of manipulation tasks from few (5-10) demonstrations both in simulation and on a real robot. Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors. Project website: https://yilundu.github.io/ndf/.

neural descriptor fields equivariant neural networks geometric deep learning pose descriptor fields representation learning
Theoretical Physics Dec 8, 2021

Building Quantum Field Theories Out of Neurons

James Halverson

An approach to field theory is studied in which fields are comprised of $N$ constituent random neurons. Gaussian theories arise in the infinite-$N$ limit when neurons are independently distributed, via the Central Limit Theorem, while interactions arise due to finite-$N$ effects or non-independently distributed neurons. Euclidean-invariant ensembles of neurons are engineered, with tunable two-point function, yielding families of Euclidean-invariant field theories. Some Gaussian, Euclidean invariant theories are reflection positive, which allows for analytic continuation to a Lorentz-invariant quantum field theory. Examples are presented that yield dual theories at infinite-$N$, but have different symmetries at finite-$N$. Landscapes of classical field configurations are determined by local maxima of parameter distributions. Predictions arise from mixed field-neuron correlators. Near-Gaussianity is exhibited at large-$N$, potentially explaining a feature of field theories in Nature.

quantum field theory neural network qft reflection positivity stochastic processes large-n duality
Theoretical Physics Dec 8, 2021

PQ Axiverse

Mehmet Demirtas, Naomi Gendler, Cody Long et al.

We show that the strong CP problem is solved in a large class of compactifications of string theory. The Peccei-Quinn mechanism solves the strong CP problem if the CP-breaking effects of the ultraviolet completion of gravity and of QCD are small compared to the CP-preserving axion potential generated by low-energy QCD instantons. We characterize both classes of effects. To understand quantum gravitational effects, we consider an ensemble of flux compactifications of type IIB string theory on orientifolds of Calabi-Yau hypersurfaces in the geometric regime, taking a simple model of QCD on D7-branes. We show that the D-brane instanton contribution to the neutron electric dipole moment falls exponentially in $N^4$, with $N$ the number of axions. In particular, this contribution is negligible in all models in our ensemble with $N>17$. We interpret this result as a consequence of large $N$ effects in the geometry that create hierarchies in instanton actions and also suppress the ultraviolet cutoff. We also compute the CP breaking due to high-energy instantons in QCD. In the absence of vectorlike pairs, we find contributions to the neutron electric dipole moment that are not excluded, but that could be accessible to future experiments if the scale of supersymmetry breaking is sufficiently low. The existence of vectorlike pairs can lead to a larger dipole moment. Finally, we show that a significant fraction of models are allowed by standard cosmological and astrophysical constraints.

peccei-quinn mechanism string theory axion quality problem d-brane instantons symmetry breaking
Foundational AI Dec 7, 2021

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields

Dor Verbin, Peter Hedman, Ben Mildenhall et al.

Neural Radiance Fields (NeRF) is a popular view synthesis technique that represents a scene as a continuous volumetric function, parameterized by multilayer perceptrons that provide the volume density and view-dependent emitted radiance at each location. While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF's parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. We show that together with a regularizer on normal vectors, our model significantly improves the realism and accuracy of specular reflections. Furthermore, we show that our model's internal representation of outgoing radiance is interpretable and useful for scene editing.

neural radiance fields reflected radiance parameterization view-dependent appearance disentangled representations loss function design
Theoretical Physics Dec 4, 2021

Machine Learning in Nuclear Physics

Amber Boehnlein, Markus Diefenthaler, Cristiano Fanelli et al.

Advances in machine learning methods provide tools that have broad applicability in scientific research. These techniques are being applied across the diversity of nuclear physics research topics, leading to advances that will facilitate scientific discoveries and societal applications. This Review gives a snapshot of nuclear physics research which has been transformed by machine learning techniques.

bayesian inference uncertainty quantification surrogate modeling lattice qcd monte carlo methods
Astrophysics Dec 1, 2021

Substructure Detection Reanalyzed: Dark Perturber shown to be a Line-of-Sight Halo

Atınç Çağan Şengül, Cora Dvorkin, Bryan Ostdiek et al.

Observations of structure at sub-galactic scales are crucial for probing the properties of dark matter, which is the dominant source of gravity in the universe. It will become increasingly important for future surveys to distinguish between line-of-sight halos and subhalos to avoid wrong inferences on the nature of dark matter. We reanalyze a sub-galactic structure (in lens JVAS B1938+666) that has been previously found using the gravitational imaging technique in galaxy-galaxy lensing systems. This structure has been assumed to be a satellite in the halo of the main lens galaxy. We fit the redshift of the perturber of the system as a free parameter, using the multi-plane thin-lens approximation, and find that the redshift of the perturber is $z_\mathrm{int} = 1.42\substack{+0.10 \\ -0.15}$ (with a main lens redshift of $z=0.881$). Our analysis indicates that this structure is more massive than the previous result by an order of magnitude. This constitutes the first dark perturber shown to be a line-of-sight halo with a gravitational lensing method.

gravitational lensing dark matter multi-plane lensing inverse problems bayesian inference
Theoretical Physics Dec 1, 2021

Infinite Neural Network Quantum States: Entanglement and Training Dynamics

Di Luo, James Halverson

We study infinite limits of neural network quantum states ($\infty$-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of Renyi entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. A general framework is developed for studying the gradient descent dynamics of neural network quantum states (NNQS), using a quantum state neural tangent kernel (QS-NTK). For $\infty$-NNQS the training dynamics is simplified, since the QS-NTK becomes deterministic and constant. An analytic solution is derived for quantum state supervised learning, which allows an $\infty$-NNQS to recover any target wavefunction. Numerical experiments on finite and infinite NNQS in the transverse field Ising model and Fermi Hubbard model demonstrate excellent agreement with theory. $\infty$-NNQS opens up new opportunities for studying entanglement and training dynamics in other physics applications, such as in finding ground states.

quantum states neural network quantum states entanglement kernel methods quantum state neural tangent kernel
Experimental Physics Nov 30, 2021

Robust and Provably Monotonic Networks

Ouail Kitouni, Niklas Nolte, Mike Williams

The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple monotonic residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain knowledge dictates such dependence. Examples can be found in algorithmic fairness requirements or, as presented here, in the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider. Our normalization is minimally constraining and allows the underlying architecture to maintain higher expressiveness compared to other techniques which aim to either control the Lipschitz constant of the model or ensure its monotonicity. We show how the algorithm was used to train a powerful, robust, and interpretable discriminator for heavy-flavor-quark decays, which has been adopted for use as the primary data-selection algorithm in the LHCb real-time data-processing system in the current LHC data-taking period known as Run 3. In addition, our algorithm has also achieved state-of-the-art performance on benchmarks in medicine, finance, and other applications.

lipschitz-constrained networks robustness monotonic residual connection classification trigger systems
Theoretical Physics Nov 22, 2021

Quantum reservoir computing using arrays of Rydberg atoms

Rodrigo Araiza Bravo, Khadijeh Najafi, Xun Gao et al.

Quantum computing promises to provide machine learning with computational advantages. However, noisy intermediate-scale quantum (NISQ) devices pose engineering challenges to realizing quantum machine learning (QML) advantages. Recently, a series of QML computational models inspired by the noise-tolerant dynamics on the brain have emerged as a means to circumvent the hardware limitations of NISQ devices. In this article, we introduce a quantum version of a recurrent neural network (RNN), a well-known model for neural circuits in the brain. Our quantum RNN (qRNN) makes use of the natural Hamiltonian dynamics of an ensemble of interacting spin-1/2 particles as a means for computation. In the limit where the Hamiltonian is diagonal, the qRNN recovers the dynamics of the classical version. Beyond this limit, we observe that the quantum dynamics of the qRNN provide it quantum computational features that can aid it in computation. To this end, we study a qRNN based on arrays of Rydberg atoms, and show that the qRNN is indeed capable of replicating the learning of several cognitive tasks such as multitasking, decision making, and long-term memory by taking advantage of several key features of this platform such as interatomic species interactions, and quantum many-body scars.

quantum reservoir computing recurrent networks hamiltonian systems rydberg atom arrays quantum computing
Astrophysics Nov 19, 2021

New limits on light dark matter - proton cross section from the cosmic large-scale structure

Keir K. Rogers, Cora Dvorkin, Hiranya V. Peiris

We set the strongest limits to-date on the velocity-independent dark matter (DM) - proton cross section $σ$ for DM masses $m = 10\,\mathrm{keV}$ to $100\,\mathrm{GeV}$, using large-scale structure traced by the Lyman-alpha forest: e.g., a 95% lower limit $σ< 6 \times 10^{-30}\,\mathrm{cm}^2$, for $m = 100\,\mathrm{keV}$. Our results complement direct detection, which has limited sensitivity to sub-GeV DM. We use an emulator of cosmological simulations, combined with data from the smallest cosmological scales used to-date, to model and search for the imprint of primordial DM-proton collisions. Cosmological bounds are improved by up to a factor of 25.

dark matter lyman-alpha forest constraints cosmological simulation emulation matter power spectrum suppression
Foundational AI Oct 28, 2021

Equivariant Contrastive Learning

Rumen Dangovski, Li Jing, Charlotte Loh et al.

In state-of-the-art self-supervised learning (SSL) pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge. In fact, the property of invariance is a trivial instance of a broader class called equivariance, which can be intuitively understood as the property that representations transform according to the way the inputs transform. Here, we show that rather than using only invariance, pre-training that encourages non-trivial equivariance to some transformations, while maintaining invariance to other transformations, can be used to improve the semantic quality of representations. Specifically, we extend popular SSL methods to a more general framework which we name Equivariant Self-Supervised Learning (E-SSL). In E-SSL, a simple additional pre-training objective encourages equivariance by predicting the transformations applied to the input. We demonstrate E-SSL's effectiveness empirically on several popular computer vision benchmarks, e.g. improving SimCLR to 72.5% linear probe accuracy on ImageNet. Furthermore, we demonstrate usefulness of E-SSL for applications beyond computer vision; in particular, we show its utility on regression problems in photonics science. Our code, datasets and pre-trained models are available at https://github.com/rdangovs/essl to aid further research in E-SSL.

self-supervised learning contrastive learning representation learning equivariant neural networks equivariant pretext task
Theoretical Physics Oct 15, 2021

Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Charlotte Loh, Thomas Christensen, Rumen Dangovski et al.

Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labelled data needed to train the model; this poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Here, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three ``inexpensive'' and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: 1)~abundant unlabeled data, 2)~prior knowledge of symmetries or invariances and 3)~surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrodinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.

contrastive learning self-supervised learning surrogate modeling data-scarce learning transfer learning
Experimental Physics Oct 13, 2021

Challenges for Unsupervised Anomaly Detection in Particle Physics

Katherine Fraser, Samuel Homiller, Rashmish K. Mishra et al.

Anomaly detection relies on designing a score to determine whether a particular event is uncharacteristic of a given background distribution. One way to define a score is to use autoencoders, which rely on the ability to reconstruct certain types of data (background) but not others (signals). In this paper, we study some challenges associated with variational autoencoders, such as the dependence on hyperparameters and the metric used, in the context of anomalous signal (top and $W$) jets in a QCD background. We find that the hyperparameter choices strongly affect the network performance and that the optimal parameters for one signal are non-optimal for another. In exploring the networks, we uncover a connection between the latent space of a variational autoencoder trained using mean-squared-error and the optimal transport distances within the dataset. We then show that optimal transport distances to representative events in the background dataset can be used directly for anomaly detection, with performance comparable to the autoencoders. Whether using autoencoders or optimal transport distances for anomaly detection, we find that the choices that best represent the background are not necessarily best for signal identification. These challenges with unsupervised anomaly detection bolster the case for additional exploration of semi-supervised or alternative approaches.

anomaly detection variational autoencoders optimal transport autoencoders jet physics
Astrophysics Oct 13, 2021

A neural simulation-based inference approach for characterizing the Galactic Center $γ$-ray excess

Siddharth Mishra-Sharma, Kyle Cranmer

The nature of the Fermi gamma-ray Galactic Center Excess (GCE) has remained a persistent mystery for over a decade. Although the excess is broadly compatible with emission expected due to dark matter annihilation, an explanation in terms of a population of unresolved astrophysical point sources e.g., millisecond pulsars, remains viable. The effort to uncover the origin of the GCE is hampered in particular by an incomplete understanding of diffuse emission of Galactic origin. This can lead to spurious features that make it difficult to robustly differentiate smooth emission, as expected for a dark matter origin, from more "clumpy" emission expected for a population of relatively bright, unresolved point sources. We use recent advancements in the field of simulation-based inference, in particular density estimation techniques using normalizing flows, in order to characterize the contribution of modeled components, including unresolved point source populations, to the GCE. Compared to traditional techniques based on the statistical distribution of photon counts, our machine learning-based method is able to utilize more of the information contained in a given model of the Galactic Center emission, and in particular can perform posterior parameter estimation while accounting for pixel-to-pixel spatial correlations in the gamma-ray map. This makes the method demonstrably more resilient to certain forms of model misspecification. On application to Fermi data, the method generically attributes a smaller fraction of the GCE flux to unresolved point sources when compared to traditional approaches. We nevertheless infer such a contribution to make up a non-negligible fraction of the GCE across all analysis variations considered, with at least $38^{+9}_{-19}\%$ of the excess attributed to unresolved point sources in our baseline analysis.

simulation-based inference normalizing flows density estimation posterior estimation bayesian inference
Foundational AI Oct 10, 2021

Mixture Model Auto-Encoders: Deep Clustering through Dictionary Learning

Alexander Lin, Andrew H. Song, Demba Ba

State-of-the-art approaches for clustering high-dimensional data utilize deep auto-encoder architectures. Many of these networks require a large number of parameters and suffer from a lack of interpretability, due to the black-box nature of the auto-encoders. We introduce Mixture Model Auto-Encoders (MixMate), a novel architecture that clusters data by performing inference on a generative model. Derived from the perspective of sparse dictionary learning and mixture models, MixMate comprises several auto-encoders, each tasked with reconstructing data in a distinct cluster, while enforcing sparsity in the latent space. Through experiments on various image datasets, we show that MixMate achieves competitive performance compared to state-of-the-art deep clustering algorithms, while using orders of magnitude fewer parameters.

autoencoders clustering sparse models generative models bayesian inference
Theoretical Physics Oct 7, 2021

Towards Quantum Simulations in Particle Physics and Beyond on Noisy Intermediate-Scale Quantum Devices

Lena Funcke, Tobias Hartung, Karl Jansen et al.

We review two algorithmic advances that bring us closer to reliable quantum simulations of model systems in high energy physics and beyond on noisy intermediate-scale quantum (NISQ) devices. The first method is the dimensional expressivity analysis of quantum circuits, which allows for constructing minimal but maximally expressive quantum circuits. The second method is an efficient mitigation of readout errors on quantum devices. Both methods can lead to significant improvements in quantum simulations, e.g., when variational quantum eigensolvers are used.

quantum simulation quantum computing variational quantum eigensolver readout error mitigation dimensional expressivity analysis
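
Of the two methods reviewed, readout-error mitigation is the easier to illustrate classically: calibration circuits estimate a response matrix $A_{ij} = P(\text{read } i \mid \text{prepared } j)$, and observed outcome frequencies are corrected by solving $A\,p_{\text{true}} = p_{\text{meas}}$. A minimal single-qubit sketch with made-up calibration numbers (the paper's scheme is more efficient than full matrix inversion for many qubits):

```python
import numpy as np

# Illustrative single-qubit response matrix: A[i, j] = P(measure i | prepared j),
# estimated from calibration runs that prepare |0> and |1>.
A = np.array([[0.97, 0.05],
              [0.03, 0.95]])

p_measured = np.array([0.62, 0.38])           # observed outcome frequencies
p_mitigated = np.linalg.solve(A, p_measured)  # invert the readout response

# Clip tiny negative entries from statistical noise and renormalize.
p_mitigated = np.clip(p_mitigated, 0.0, None)
p_mitigated /= p_mitigated.sum()
print(p_mitigated)
```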
Theoretical Physics Oct 7, 2021

Pruning a restricted Boltzmann machine for quantum state reconstruction

Anna Golubeva, Roger G. Melko

Restricted Boltzmann machines (RBMs) have proven to be a powerful tool for learning quantum wavefunction representations from qubit projective measurement data. Since the number of classical parameters needed to encode a quantum wavefunction scales rapidly with the number of qubits, the ability to learn efficient representations is of critical importance. In this paper we study magnitude-based pruning as a way to compress the wavefunction representation in an RBM, focusing on RBMs trained on data from the transverse-field Ising model in one dimension. We find that pruning can reduce the total number of RBM weights, but the threshold at which the reconstruction accuracy starts to degrade varies significantly depending on the phase of the model. In a gapped region of the phase diagram, the RBM admits pruning over half of the weights while still accurately reproducing relevant physical observables. At the quantum critical point however, even a small amount of pruning can lead to significant loss of accuracy in the physical properties of the reconstructed quantum state. Our results highlight the importance of tracking all relevant observables as their sensitivity varies strongly with pruning. Finally, we find that sparse RBMs are trainable and discuss how a successful sparsity pattern can be created without pruning.

quantum states quantum state reconstruction restricted boltzmann machine sparse models magnitude-based pruning
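
Magnitude-based pruning itself is a one-line operation; the paper's substance is in what happens to physical observables afterwards. A sketch, with a random matrix standing in for trained RBM weights:

```python
import numpy as np

def magnitude_prune(W, fraction):
    """Zero out the `fraction` of weights with smallest |W| (global threshold)."""
    thresh = np.quantile(np.abs(W), fraction)
    mask = np.abs(W) >= thresh
    return W * mask, mask

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 64))            # visible x hidden RBM weights (toy)
W_pruned, mask = magnitude_prune(W, 0.5)
print(f"kept {mask.mean():.0%} of weights")
```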
Foundational AI Oct 7, 2021

Observation of enhanced free-electron radiation from photonic flatband resonances

Yi Yang, Charles Roques-Carmes, Steven E. Kooi et al.

Flatbands emerge from a myriad of structures such as Landau levels, Lieb and Kagome lattices, linegraphs, and more recently moire superlattices. They enable unique properties including slow light in photonics, correlated phases in electronics, and supercollimation in both systems. Despite these intense parallel efforts, flatbands have never been shown to affect the core light-matter interaction between electrons and photons, which is limited by a dimensionality mismatch. Here, we reveal that a photonic flatband can overcome this mismatch between localized electrons and extended photons and thus remarkably boost their light-matter interaction. We design flatband resonances in a silicon-on-insulator photonic crystal slab to control and enhance the radiation emission from free electrons by tuning their trajectory and velocity. In particular, we record a 100-fold radiation enhancement from the conventional diffraction-enabled Smith-Purcell radiation, and show the potential of our approach to achieve $10^6$-fold enhancements and beyond. The enhancement also enables us to perform polarization shaping of free electron radiation from multiple flatbands and demonstrate an approach to measure photonic bands via angle-resolved electron-beam measurements. Our results suggest flatbands as ideal test beds for strong light-electron interaction in various systems, with particular relevance for efficient and compact free-electron light sources and accelerators.

photonic flatband resonances free-electron radiation enhancement smith-purcell radiation photonic crystal slab momentum mismatch
Theoretical Physics Oct 6, 2021

Classical Shadows for Quantum Process Tomography on Near-term Quantum Computers

Ryan Levy, Di Luo, Bryan K. Clark

Quantum process tomography is a powerful tool for understanding quantum channels and characterizing properties of quantum devices. Inspired by recent advances using classical shadows in quantum state tomography [H.-Y. Huang, R. Kueng, and J. Preskill, Nat. Phys. 16, 1050 (2020).], we have developed ShadowQPT, a classical shadow method for quantum process tomography. We introduce two related formulations with and without ancilla qubits. ShadowQPT stochastically reconstructs the Choi matrix of the device allowing for an a-posteri classical evaluation of the device on arbitrary inputs with respect to arbitrary outputs. Using shadows we then show how to compute overlaps, generate all $k$-weight reduced processes, and perform reconstruction via Hamiltonian learning. These latter two tasks are efficient for large systems as the number of quantum measurements needed scales only logarithmically with the number of qubits. A number of additional approximations and improvements are developed including the use of a pair-factorized Clifford shadow and a series of post-processing techniques which significantly enhance the accuracy for recovering the quantum channel. We have implemented ShadowQPT using both Pauli and Clifford measurements on the IonQ trapped ion quantum computer for quantum processes up to $n=4$ qubits and achieved good performance.

quantum process tomography classical shadow tomography choi matrix reconstruction quantum computing quantum states
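
A hedged sketch of the state-tomography primitive from Huang, Kueng, and Preskill that ShadowQPT generalizes to processes: measure a state in random Pauli bases, invert the measurement channel snapshot by snapshot, and average. Single qubit only; the target state and shot count are illustrative, and the Choi-matrix extension follows the same pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [X, Y, Z]

# Unknown state to learn: |+><+| (so <X> = 1, <Z> = 0).
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

def snapshot():
    """One classical-shadow snapshot from a random Pauli-basis measurement."""
    P = paulis[rng.integers(3)]
    evals, evecs = np.linalg.eigh(P)                  # eigenvectors of the basis
    probs = np.clip([np.real(v.conj() @ rho @ v) for v in evecs.T], 0, None)
    k = rng.choice(2, p=probs / probs.sum())          # simulate the outcome
    v = evecs[:, k]
    return 3 * np.outer(v, v.conj()) - I2             # inverted measurement channel

rho_est = np.mean([snapshot() for _ in range(20000)], axis=0)
print("<X> ~", np.real(np.trace(X @ rho_est)))        # ~ 1
print("<Z> ~", np.real(np.trace(Z @ rho_est)))        # ~ 0
```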
Astrophysics Oct 4, 2021

Inferring dark matter substructure with astrometric lensing beyond the power spectrum

Siddharth Mishra-Sharma

Astrometry -- the precise measurement of positions and motions of celestial objects -- has emerged as a promising avenue for characterizing the dark matter population in our Galaxy. By leveraging recent advances in simulation-based inference and neural network architectures, we introduce a novel method to search for global dark matter-induced gravitational lensing signatures in astrometric datasets. Our method, based on neural likelihood-ratio estimation, shows significantly enhanced sensitivity to a cold dark matter population and more favorable scaling with measurement noise compared to existing approaches based on two-point correlation statistics. We demonstrate the real-world viability of our method by showing it to be robust to non-trivial modeled as well as unmodeled noise features expected in astrometric measurements. This establishes machine learning as a powerful tool for characterizing dark matter using astrometric data.

dark matter simulation-based inference likelihood ratio astrometric lensing convolutional networks
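
Neural likelihood-ratio estimation reduces to a classification trick: train a classifier to separate samples simulated under two hypotheses, then convert its output $s(x) = p(\text{alt} \mid x)$ into a ratio via $r(x) = s/(1-s)$, assuming balanced classes. A sketch on toy Gaussian stand-ins for the astrometric maps (the paper uses convolutional architectures on sky maps):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for data simulated under two hypotheses
# (e.g. with and without a dark matter substructure signal).
x_null = rng.normal(0.0, 1.0, size=(5000, 8))
x_alt  = rng.normal(0.2, 1.0, size=(5000, 8))

X = np.vstack([x_null, x_alt])
y = np.concatenate([np.zeros(5000), np.ones(5000)])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300).fit(X, y)

def log_likelihood_ratio(x):
    """log p(x | alt) / p(x | null) from the classifier score s = p(alt | x)."""
    s = clf.predict_proba(np.atleast_2d(x))[:, 1]
    return np.log(s / (1 - s))

print(log_likelihood_ratio(rng.normal(0.2, 1.0, size=8)))
```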
Foundational AI Sep 28, 2021

Physics-Augmented Learning: A New Paradigm Beyond Physics-Informed Learning

Ziming Liu, Yunyue Chen, Yuanqi Du et al.

Integrating physical inductive biases into machine learning can improve model generalizability. We generalize the successful paradigm of physics-informed learning (PIL) into a more general framework that also includes what we term physics-augmented learning (PAL). PIL and PAL complement each other by handling discriminative and generative properties, respectively. In numerical experiments, we show that PAL performs well on examples where PIL is inapplicable or inefficient.

physics-augmented learning physics-informed neural networks generative physics properties lagrangian methods loss function design
Experimental Physics Sep 27, 2021

Presenting Unbinned Differential Cross Section Results

Miguel Arratia, Anja Butter, Mario Campanelli et al.

Machine learning tools have empowered a qualitatively new way to perform differential cross section measurements whereby the data are unbinned, possibly in many dimensions. Unbinned measurements can enable, improve, or at least simplify comparisons between experiments and with theoretical predictions. Furthermore, many-dimensional measurements can be used to define observables after the measurement instead of before. There is currently no community standard for publishing unbinned data, and essentially no measurements of this type are public yet; however, unbinned measurements are expected in the near future given recent methodological advances. The purpose of this paper is to propose a scheme for presenting and using unbinned results, which can hopefully form the basis for a community standard to allow for integration into analysis workflows. This is foreseen to be the start of an evolving community dialogue that can accommodate future developments in this rapidly advancing field.

unfolding unbinned unfolding density estimation simulation-based inference uncertainty quantification
Theoretical Physics Sep 3, 2021

Deep Set Auto Encoders for Anomaly Detection in Particle Physics

Bryan Ostdiek

There is an increased interest in model-agnostic search strategies for physics beyond the standard model at the Large Hadron Collider. We introduce a Deep Set Variational Autoencoder and present results on the Dark Machines Anomaly Score Challenge. We find that the method attains the best anomaly detection ability when there is no decoding step for the network, and the anomaly score is based solely on the representation within the encoded latent space. This method was one of the top-performing models in the Dark Machines Challenge, for both the open and the blinded data sets.

anomaly detection variational autoencoders deep set networks new physics searches collider physics
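
The deep-set construction is a per-particle network followed by a permutation-invariant sum and a second network; the entry's best-performing variant scores anomalies directly in the latent space without decoding. A minimal sketch with illustrative sizes (not the paper's configuration), where a latent norm stands in for the actual anomaly score:

```python
import torch
import torch.nn as nn

class DeepSetEncoder(nn.Module):
    """Permutation-invariant encoder: per-particle MLP, sum-pool, then an MLP."""
    def __init__(self, in_dim=4, hidden=64, latent=8):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, latent))

    def forward(self, particles):                 # (batch, n_particles, in_dim)
        pooled = self.phi(particles).sum(dim=1)   # sum over the set dimension
        return self.rho(pooled)

enc = DeepSetEncoder()
event = torch.randn(2, 30, 4)      # 2 toy events, 30 particles, 4 features each
z = enc(event)
score = z.norm(dim=1)              # latent-space score, a stand-in for the paper's
print(score)
```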
Foundational AI Aug 31, 2021

Machine-Learning media bias

Samantha D'Alonzo, Max Tegmark

We present an automated method for measuring media bias. Inferring which newspaper published a given article, based only on the frequencies with which it uses different phrases, leads to a conditional probability distribution whose analysis lets us automatically map newspapers and phrases into a bias space. By analyzing roughly a million articles from roughly a hundred newspapers for bias in dozens of news topics, our method maps newspapers into a two-dimensional bias landscape that agrees well with previous bias classifications based on human judgement. One dimension can be interpreted as traditional left-right bias, the other as establishment bias. This means that although news bias is inherently political, its measurement need not be.

dimensionality reduction spectral methods phrase bias representation learning media bias landscape
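
The dimensionality-reduction skeleton of the method can be sketched in a few lines: form conditional phrase frequencies per newspaper and take the leading singular directions as the bias space. The Poisson counts here are synthetic stand-ins, and the paper's statistical treatment of the conditional distribution is more careful than plain SVD on centered frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy phrase-count matrix: counts[n, p] = times newspaper n used phrase p.
counts = rng.poisson(5.0, size=(12, 200)).astype(float)

# Conditional phrase frequencies per newspaper, P(phrase | newspaper).
freqs = counts / counts.sum(axis=1, keepdims=True)

# Center and keep the top-2 singular directions as a 2D "bias space".
centered = freqs - freqs.mean(axis=0, keepdims=True)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
coords = U[:, :2] * S[:2]          # one 2D point per newspaper
print(coords)
```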
Foundational AI Aug 30, 2021

What You Can Learn by Staring at a Blank Wall

Prafull Sharma, Miika Aittala, Yoav Y. Schechner et al.

We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two moving people, or the activity of a person in the hidden scene. We train two convolutional neural networks using data collected from 20 different scenes, and achieve an accuracy of $\approx94\%$ for both tasks in unseen test environments and real-time online settings. Unlike other passive non-line-of-sight methods, the technique does not rely on known occluders or controllable light sources, and generalizes to unknown rooms with no re-calibration. We analyze the generalization and robustness of our method with both real and synthetic data, and study the effect of the scene parameters on the signal quality.

non-line-of-sight imaging convolutional networks classification signal detection indirect illumination model
Astrophysics Aug 27, 2021

Hardware-accelerated Inference for Real-Time Gravitational-Wave Astronomy

Alec Gunny, Dylan Rankin, Jeffrey Krupa et al.

The field of transient astronomy has seen a revolution with the first gravitational-wave detections and the arrival of multi-messenger observations they enabled. Transformed by the first detection of binary black hole and binary neutron star mergers, computational demands in gravitational-wave astronomy are expected to grow by at least a factor of two over the next five years as the global network of kilometer-scale interferometers is brought to design sensitivity. With the increase in detector sensitivity, real-time delivery of gravitational-wave alerts will become increasingly important as an enabler of multi-messenger followup. In this work, we report a novel implementation and deployment of deep learning inference for real-time gravitational-wave data denoising and astrophysical source identification. This is accomplished using a generic Inference-as-a-Service model that is capable of adapting to the future needs of gravitational-wave data analysis. Our implementation allows seamless incorporation of hardware accelerators and also enables the use of commercial or private (dedicated) as-a-service computing. Based on our results, we propose a paradigm shift in low-latency and offline computing in gravitational-wave astronomy. Such a shift can address key challenges in peak usage, scalability, and reliability, and provide a data analysis platform particularly optimized for deep learning applications. The achieved sub-millisecond scale latency will also be relevant for any machine learning-based real-time control systems that may be invoked in the operation of near-future and next generation ground-based laser interferometers, as well as the front-end collection, distribution and processing of data from such instruments.

gravitational waves inference-as-a-service convolutional networks signal detection hardware acceleration
Astrophysics Aug 17, 2021

Towards an Optimal Estimation of Cosmological Parameters with the Wavelet Scattering Transform

Georgios Valogiannis, Cora Dvorkin

Optimal extraction of the non-Gaussian information encoded in the Large-Scale Structure (LSS) of the universe lies at the forefront of modern precision cosmology. We propose achieving this task through the use of the Wavelet Scattering Transform (WST), which subjects an input field to a layer of non-linear transformations that are sensitive to non-Gaussianity in spatial density distributions through a generated set of WST coefficients. In order to assess its applicability in the context of LSS surveys, we apply the WST on the 3D overdensity field obtained by the Quijote simulations, out of which we extract the Fisher information in 6 cosmological parameters. It is subsequently found to deliver a large improvement in the marginalized errors on all parameters, ranging from $1.2\times$ to $4\times$ tighter than the corresponding ones obtained from the regular 3D cold dark matter + baryon power spectrum, as well as a $50\%$ improvement over the neutrino mass constraint given by the marked power spectrum. Through this first application on 3D cosmological fields, we demonstrate the great promise held by this novel statistic and set the stage for its future application to actual galaxy observations.

wavelet scattering transform non-gaussian statistics cosmological simulation feature extraction bayesian inference
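
A toy 1D analogue conveys the WST construction: convolve with wavelets at several scales, take the modulus (the non-linearity that captures non-Gaussian information), and average. The Morlet-like wavelet and scales below are illustrative; the paper applies the full 3D transform to simulated overdensity fields.

```python
import numpy as np

def morlet(n, scale):
    """Complex Morlet-like wavelet of width `scale` on n samples."""
    t = np.arange(-n // 2, n // 2)
    return np.exp(1j * 5 * t / scale) * np.exp(-0.5 * (t / scale) ** 2)

def scattering_coeffs(x, scales=(2, 4, 8, 16)):
    """First-order scattering: modulus of wavelet convolutions, then averaging."""
    out = []
    for s in scales:
        u = np.abs(np.convolve(x, morlet(len(x), s), mode="same"))
        out.append(u.mean())           # low-pass step (here: a global average)
    return np.array(out)

rng = np.random.default_rng(0)
gaussian_field = rng.normal(size=1024)
non_gaussian = gaussian_field + 0.5 * gaussian_field ** 2  # inject non-Gaussianity
print(scattering_coeffs(gaussian_field))
print(scattering_coeffs(non_gaussian))                      # coefficients shift
```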
Theoretical Physics Aug 4, 2021

Deep multi-task mining Calabi-Yau four-folds

Harold Erbin, Riccardo Finotello, Robin Schneider et al.

We continue earlier efforts in computing the dimensions of tangent space cohomologies of Calabi-Yau manifolds using deep learning. In this paper, we consider the dataset of all Calabi-Yau four-folds constructed as complete intersections in products of projective spaces. Employing neural networks inspired by state-of-the-art computer vision architectures, we improve earlier benchmarks and demonstrate that all four non-trivial Hodge numbers can be learned at the same time using a multi-task architecture. With 30% (80%) training ratio, we reach an accuracy of 100% for $h^{(1,1)}$ and 97% for $h^{(2,1)}$ (100% for both), 81% (96%) for $h^{(3,1)}$, and 49% (83%) for $h^{(2,2)}$. Assuming that the Euler number is known, as it is easy to compute, and taking into account the linear constraint arising from index computations, we get 100% total accuracy.

multi-task learning calabi-yau manifolds hodge number prediction string theory convolutional networks
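
The multi-task idea is a shared trunk feeding one head per Hodge number, trained jointly. A minimal dense-layer sketch (the paper uses vision-inspired convolutional blocks on CICY configuration matrices; all sizes here are illustrative):

```python
import torch
import torch.nn as nn

class MultiTaskHodgeNet(nn.Module):
    """Shared trunk with one regression head per Hodge number, so all four
    non-trivial targets are learned at the same time."""
    def __init__(self, in_dim=180, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({h: nn.Linear(hidden, 1)
                                    for h in ["h11", "h21", "h31", "h22"]})

    def forward(self, x):
        z = self.trunk(x)
        return {name: head(z).squeeze(-1) for name, head in self.heads.items()}

net = MultiTaskHodgeNet()
batch = torch.randn(32, 180)   # toy stand-in for flattened configuration matrices
preds = net(batch)
# Training would sum per-head losses, e.g. the MSEs over the four targets.
print({k: v.shape for k, v in preds.items()})
```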
Theoretical Physics Aug 3, 2021

Nonperturbative renormalization for the neural network-QFT correspondence

Harold Erbin, Vincent Lahoche, Dine Ousmane Samary

In a recent work arXiv:2008.08601, Halverson, Maiti and Stoner proposed a description of neural networks in terms of a Wilsonian effective field theory. The infinite-width limit is mapped to a free field theory, while finite $N$ corrections are taken into account by interactions (non-Gaussian terms in the action). In this paper, we study two related aspects of this correspondence. First, we comment on the concepts of locality and power-counting in this context. Indeed, these usual space-time notions may not hold for neural networks (since inputs can be arbitrary); however, the renormalization group provides natural notions of locality and scaling. Moreover, we comment on several subtleties, for example, that data components may not have a permutation symmetry: in that case, we argue that random tensor field theories could provide a natural generalization. Second, we improve the perturbative Wilsonian renormalization from arXiv:2008.08601 by providing an analysis in terms of the nonperturbative renormalization group using the Wetterich-Morris equation. An important difference with the usual nonperturbative RG analysis is that only the effective (IR) 2-point function is known, which requires setting up the problem with care. Our aim is to provide a useful formalism to investigate neural network behavior beyond the large-width limit (i.e., far from the Gaussian limit) in a nonperturbative fashion. A major result of our analysis is that changing the standard deviation of the neural network weight distribution can be interpreted as a renormalization flow in the space of networks. We focus on translation-invariant kernels and provide preliminary numerical results.

renormalization nn-qft correspondence quantum field theory functional renormalization group effective field theory
Foundational AI Jul 22, 2021

Discovering Sparse Interpretable Dynamics from Partial Observations

Peter Y. Lu, Joan Ariño, Marin Soljačić

Identifying the governing equations of a nonlinear dynamical system is key to both understanding the physical features of the system and constructing an accurate model of the dynamics that generalizes well beyond the available data. We propose a machine learning framework for discovering these governing equations using only partial observations, combining an encoder for state reconstruction with a sparse symbolic model. Our tests show that this method can successfully reconstruct the full system state and identify the underlying dynamics for a variety of ODE and PDE systems.

sparse models symbolic regression interpretability system identification autoencoders
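
The sparse symbolic half of the framework follows a familiar pattern: build a library of candidate terms and let a sparse regression select the governing ones. This sketch omits the paper's encoder for partial observations and assumes derivatives are available; the dynamics and library are toy choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data from dx/dt = -x + x^3, observed with a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.5, 1.5, size=400)
dxdt = -x + x ** 3 + 0.01 * rng.normal(size=x.size)

# Library of candidate terms; sparse regression picks out the true ones.
library = np.column_stack([x, x ** 2, x ** 3])
names = ["x", "x^2", "x^3"]

model = Lasso(alpha=1e-3).fit(library, dxdt)
for name, coef in zip(names, model.coef_):
    print(f"{name:4s} {coef:+.3f}")     # expect roughly -1, 0, +1
```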
Experimental Physics Jul 19, 2021

Neural Conditional Reweighting

Benjamin Nachman, Jesse Thaler

There is a growing use of neural network classifiers as unbinned, high-dimensional (and variable-dimensional) reweighting functions. To date, the focus has been on marginal reweighting, where a subset of features are used for reweighting while all other features are integrated over. There are some situations, though, where it is preferable to condition on auxiliary features instead of marginalizing over them. In this paper, we introduce neural conditional reweighting, which extends neural marginal reweighting to the conditional case. This approach is particularly relevant in high-energy physics experiments for reweighting detector effects conditioned on particle-level truth information. We leverage a custom loss function that not only allows us to achieve neural conditional reweighting through a single training procedure, but also yields sensible interpolation even in the presence of phase space holes. As a specific example, we apply neural conditional reweighting to the energy response of high-energy jets, which could be used to improve the modeling of physics objects in parametrized fast simulation packages.

conditional reweighting likelihood ratio simulation-based inference detector simulation loss function design
Theoretical Physics Jul 1, 2021

Flow-based sampling for multimodal and extended-mode distributions in lattice field theory

Daniel C. Hackett, Chung-Chun Hsieh, Sahil Pontula et al.

Recent results have demonstrated that samplers constructed with flow-based generative models are a promising new approach for configuration generation in lattice field theory. In this paper, we present a set of training- and architecture-based methods to construct flow models for targets with multiple separated modes (i.e.~vacua) as well as targets with extended/continuous modes. We demonstrate the application of these methods to modeling two-dimensional real and complex scalar field theories in their symmetry-broken phases. In this context we investigate different flow-based sampling algorithms, including a composite sampling algorithm where flow-based proposals are occasionally augmented by applying updates using traditional algorithms like HMC.

normalizing flows monte carlo methods lattice qcd mode collapse quantum field theory
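
The composite sampler mentioned at the end of this entry has a simple core: draw independence-Metropolis proposals from the flow and accept with the usual ratio, which keeps sampling asymptotically exact even when modes are well separated. In this standalone sketch an explicit Gaussian mixture stands in for a trained flow, and the bimodal 1D target is a toy for a symmetry-broken theory.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Bimodal toy target (unnormalized): two well-separated modes."""
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

def proposal_sample_and_logq(n):
    """Stand-in for a trained flow: an explicit two-component Gaussian mixture."""
    comp = rng.integers(2, size=n)
    x = rng.normal(np.where(comp == 0, -3.0, 3.0), 1.0)
    logq = (np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)
            - np.log(2.0) - 0.5 * np.log(2 * np.pi))
    return x, logq

# Independence Metropolis with flow proposals: accept/reject restores exactness.
xs, logqs = proposal_sample_and_logq(5000)
chain, logq_cur = [xs[0]], logqs[0]
for x_new, logq_new in zip(xs[1:], logqs[1:]):
    log_alpha = (log_target(x_new) - log_target(chain[-1])) - (logq_new - logq_cur)
    if np.log(rng.uniform()) < log_alpha:
        chain.append(x_new); logq_cur = logq_new
    else:
        chain.append(chain[-1])
chain = np.array(chain)
print("mode occupancies:", np.mean(chain > 0), np.mean(chain < 0))
```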
Foundational AI Jun 29, 2021

Learning Task Informed Abstractions

Xiang Fu, Ge Yang, Pulkit Agrawal et al.

Current model-based reinforcement learning methods struggle when operating from complex visual scenes due to their inability to prioritize task-relevant features. To mitigate this problem, we propose learning Task Informed Abstractions (TIA) that explicitly separates reward-correlated visual features from distractors. For learning TIA, we introduce the formalism of Task Informed MDP (TiMDP) that is realized by training two models that learn visual features via cooperative reconstruction, but one model is adversarially dissociated from the reward signal. Empirical evaluation shows that TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks where natural and unconstrained visual distractions pose a formidable challenge.

reinforcement learning representation learning task-informed mdp distractor separation disentangled representations
Theoretical Physics Jun 18, 2021

Single electrons on solid neon as a solid-state qubit platform

Xianjing Zhou, Gerwin Koolstra, Xufeng Zhang et al.

Progress toward the realization of quantum computers requires persistent advances in their constituent building blocks - qubits. Novel qubit platforms that simultaneously embody long coherence, fast operation, and large scalability offer compelling advantages in the construction of quantum computers and many other quantum information systems. Electrons, ubiquitous elementary particles of nonzero charge, spin, and mass, have commonly been perceived as paradigmatic local quantum information carriers. Despite superior controllability and configurability, their practical performance as qubits via either motional or spin states depends critically on their material environment. Here we report our experimental realization of a new qubit platform based upon isolated single electrons trapped on an ultraclean solid neon surface in vacuum. By integrating an electron trap in a circuit quantum electrodynamics architecture, we achieve strong coupling between the motional states of a single electron and a single microwave photon in an on-chip superconducting resonator. Qubit gate operations and dispersive readout are implemented to measure the energy relaxation time $T_1$ of $15~\mu$s and phase coherence time $T_2$ over $200~$ns. These results indicate that the electron-on-solid-neon qubit already performs near the state of the art as a charge qubit.

quantum computing circuit quantum electrodynamics qubit coherence quantum states charge qubit
Foundational AI Jun 18, 2021

The Principles of Deep Learning Theory

Daniel A. Roberts, Sho Yaida, Boris Hanin

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

effective field theory depth-to-width ratio renormalization neural tangent kernel representation learning
Foundational AI Jun 14, 2021

Toward Automatic Interpretation of 3D Plots

Laura E. Brandt, William T. Freeman

This paper explores the challenge of teaching a machine how to reverse-engineer the grid-marked surfaces used to represent data in 3D surface plots of two-variable functions. These are common in scientific and economic publications; and humans can often interpret them with ease, quickly gleaning general shape and curvature information from the simple collection of curves. While machines have no such visual intuition, they do have the potential to accurately extract the more detailed quantitative data that guided the surface's construction. We approach this problem by synthesizing a new dataset of 3D grid-marked surfaces (SurfaceGrid) and training a deep neural net to estimate their shape. Our algorithm successfully recovers shape information from synthetic 3D surface plots that have had axes and shading information removed, been rendered with a variety of grid types, and viewed from a range of viewpoints.

3d surface reconstruction convolutional networks inverse problems shape-from-contour synthetic dataset generation
Theoretical Physics Jun 10, 2021

Flow-based sampling for fermionic lattice field theories

Michael S. Albergo, Gurtej Kanwar, Sébastien Racanière et al.

Algorithms based on normalizing flows are emerging as promising machine learning approaches to sampling complicated probability distributions in a way that can be made asymptotically exact. In the context of lattice field theory, proof-of-principle studies have demonstrated the effectiveness of this approach for scalar theories, gauge theories, and statistical systems. This work develops approaches that enable flow-based sampling of theories with dynamical fermions, which is necessary for the technique to be applied to lattice field theory studies of the Standard Model of particle physics and many condensed matter systems. As a practical demonstration, these methods are applied to the sampling of field configurations for a two-dimensional theory of massless staggered fermions coupled to a scalar field via a Yukawa interaction.

normalizing flows pseudofermion method generative models lattice qcd quantum field theory
Foundational AI Jun 4, 2021

Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering

Vincent Sitzmann, Semon Rezchikov, William T. Freeman et al.

Inferring representations of 3D scenes from 2D observations is a fundamental problem of computer graphics, computer vision, and artificial intelligence. Emerging 3D-structured neural scene representations are a promising approach to 3D scene understanding. In this work, we propose a novel neural scene representation, Light Field Networks or LFNs, which represent both geometry and appearance of the underlying 3D scene in a 360-degree, four-dimensional light field parameterized via a neural implicit representation. Rendering a ray from an LFN requires only a single network evaluation, as opposed to hundreds of evaluations per ray for ray-marching or volumetric-based renderers in 3D-structured neural scene representations. In the setting of simple scenes, we leverage meta-learning to learn a prior over LFNs that enables multi-view consistent light field reconstruction from as little as a single image observation. This results in dramatic reductions in time and memory complexity, and enables real-time rendering. The cost of storing a 360-degree light field via an LFN is two orders of magnitude lower than conventional methods such as the Lumigraph. Utilizing the analytical differentiability of neural implicit representations and a novel parameterization of light space, we further demonstrate the extraction of sparse depth maps from LFNs.

light field networks representation learning neural operators meta-learning prior inverse problems
Theoretical Physics Jun 1, 2021

Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators

Anindita Maiti, Keegan Stoner, James Halverson

Parameter-space and function-space provide two different duality frames in which to study neural networks. We demonstrate that symmetries of network densities may be determined via dual computations of network correlation functions, even when the density is unknown and the network is not equivariant. Symmetry-via-duality relies on invariance properties of the correlation functions, which stem from the choice of network parameter distributions. Input and output symmetries of neural network densities are determined, which recover known Gaussian process results in the infinite width limit. The mechanism may also be utilized to determine symmetries during training, when parameters are correlated, as well as symmetries of the Neural Tangent Kernel. We demonstrate that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.

symmetry preservation parameter-function duality quantum field theory network correlation functions group theory
Experimental Physics May 28, 2021

The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider

T. Aarrestad, M. van Beekveld, M. Bona et al.

We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of more than 1 billion simulated LHC events corresponding to $10~\rm{fb}^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.

anomaly detection new physics searches model-agnostic search collider physics variational autoencoders
Foundational AI May 21, 2021

Covariance-Free Sparse Bayesian Learning

Alexander Lin, Andrew H. Song, Berkin Bilgic et al.

Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.

bayesian inference covariance-free em sparse models uncertainty quantification posterior estimation
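
The covariance-free trick can be sketched in isolation: the posterior statistics SBL needs reduce to diagonal entries of a matrix inverse, which can be estimated with Rademacher probes and conjugate-gradient solves instead of an explicit inverse. Here a random SPD matrix stands in for the SBL posterior precision, and the preconditioning and EM loop of the paper are omitted.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n = 500
M = rng.normal(size=(n, n)) / np.sqrt(n)
A = M @ M.T + np.eye(n)      # illustrative SPD stand-in for a posterior precision

def estimate_diag_of_inverse(A, n_probes=30):
    """Hutchinson-style estimator: diag(A^{-1}) ~ mean of z * (A^{-1} z) over
    Rademacher probes z, with each solve done matrix-free by CG."""
    n = A.shape[0]
    acc = np.zeros(n)
    op = LinearOperator((n, n), matvec=lambda v: A @ v)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        x, _ = cg(op, z)
        acc += z * x
    return acc / n_probes

est = estimate_diag_of_inverse(A)
true = np.diag(np.linalg.inv(A))
print("mean abs error:", np.abs(est - true).mean())
```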
Theoretical Physics May 20, 2021

Preserving New Physics while Simultaneously Unfolding All Observables

Patrick Komiske, W. Patrick McCormack, Benjamin Nachman

Direct searches for new particles at colliders have traditionally been factorized into model proposals by theorists and model testing by experimentalists. With the recent advent of machine learning methods that allow for the simultaneous unfolding of all observables in a given phase space region, there is a new opportunity to blur these traditional boundaries by performing searches on unfolded data. This could facilitate a research program where data are explored in their natural high dimensionality with as little model bias as possible. We study how the information about physics beyond the Standard Model is preserved by full phase space unfolding using an important physics target at the Large Hadron Collider (LHC): exotic Higgs boson decays involving hadronic final states. We find that if the signal cross section is high enough, information about the new physics is visible in the unfolded data. We show that in some cases, quantifiably all of the information about the new physics is encoded in the unfolded data. Finally, we show that there are still many cases when the unfolding does not work fully or precisely, such as when the signal cross section is small. This study will serve as an important benchmark for enhancing unfolding methods for the LHC and beyond.

unfolding new physics searches collider physics bsm unfolding fidelity classification
Experimental Physics May 4, 2021

A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

Giuseppe Di Guglielmo, Farah Fahim, Christian Herwig et al.

Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the CMS experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the neural network weights, a unique data compression algorithm can be deployed for each sensor in different detector regions and for changing detector or collider conditions. To meet area, performance, and power constraints, we perform a quantization-aware training to create an optimized neural network hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework, and was processed through synthesis and physical layout flows based on an LP CMOS 65 nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates, and reports a total area of 3.6 mm$^2$ and a power consumption of 95 mW. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation tolerant on-detector ASIC implementation of a neural network that has been designed for particle physics applications.

autoencoders on-detector inference calorimetry trigger systems quantization-aware training
Foundational AI Apr 23, 2021

Deep Learning for Bayesian Optimization of Scientific Problems with High-Dimensional Structure

Samuel Kim, Peter Y. Lu, Charlotte Loh et al.

Bayesian optimization (BO) is a popular paradigm for global optimization of expensive black-box functions, but there are many domains where the function is not completely a black-box. The data may have some known structure (e.g. symmetries) and/or the data generation process may be a composite process that yields useful intermediate or auxiliary information in addition to the value of the optimization objective. However, surrogate models traditionally employed in BO, such as Gaussian Processes (GPs), scale poorly with dataset size and do not easily accommodate known structure. Instead, we use Bayesian neural networks, a class of scalable and flexible surrogate models with inductive biases, to extend BO to complex, structured problems with high dimensionality. We demonstrate BO on a number of realistic problems in physics and chemistry, including topology optimization of photonic crystal materials using convolutional neural networks, and chemical property optimization of molecules using graph neural networks. On these complex tasks, we show that neural networks often outperform GPs as surrogate models for BO in terms of both sampling efficiency and computational cost.

bayesian optimization surrogate modeling bayesian inference uncertainty quantification active learning
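
A scaled-down sketch of neural-surrogate BO: a small deep ensemble supplies a mean and a disagreement-based uncertainty, and a lower-confidence-bound acquisition picks the next query. The 1D objective, ensemble size, and acquisition are illustrative; the paper uses Bayesian neural networks with structured inputs such as CNNs and GNNs.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def objective(x):                       # expensive black-box stand-in
    return np.sin(3 * x) + 0.5 * x ** 2

X = rng.uniform(-2, 2, size=(5, 1))     # small initial design
y = objective(X).ravel()
grid = np.linspace(-2, 2, 400).reshape(-1, 1)

for step in range(10):
    # Deep-ensemble surrogate: member disagreement ~ epistemic uncertainty.
    ensemble = [MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                             random_state=i).fit(X, y) for i in range(5)]
    preds = np.stack([m.predict(grid) for m in ensemble])
    mu, sigma = preds.mean(axis=0), preds.std(axis=0)
    x_next = grid[np.argmin(mu - 2.0 * sigma)]    # lower-confidence-bound step
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best observed:", X[np.argmin(y)], y.min())
```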
Astrophysics Apr 9, 2021

A Compound Poisson Generator approach to Point-Source Inference in Astrophysics

Gabriel H. Collin, Nicholas L. Rodd, Tyler Erjavec et al.

The identification and description of point sources is one of the oldest problems in astronomy; yet, even today the correct statistical treatment for point sources remains one of the field's hardest problems. For dim or crowded sources, likelihood based inference methods are required to estimate the uncertainty on the characteristics of the source population. In this work, a new parametric likelihood is constructed for this problem using Compound Poisson Generator (CPG) functionals which incorporate instrumental effects from first principles. We demonstrate that the CPG approach exhibits a number of advantages over Non-Poissonian Template Fitting (NPTF) - an existing method - in a series of test scenarios in the context of X-ray astronomy. These demonstrations show that the effect of the point-spread function, effective area, and choice of point-source spatial distribution cannot, generally, be factorised as they are in NPTF, while the new CPG construction is validated in these scenarios. Separately, an examination of the diffuse-flux emission limit is used to show that most simple choices of priors on the standard parameterisation of the population model can result in unexpected biases: when a model comprising both a point-source population and diffuse component is applied to this limit, nearly all observed flux will be assigned to either the population or to the diffuse component. A new parametrisation is presented for these priors which properly estimates the uncertainties in this limit. In this choice of priors, CPG correctly identifies that the fraction of flux assigned to the population model cannot be constrained by the data.

likelihood estimation compound poisson likelihood bayesian inference posterior estimation uncertainty quantification
Foundational AI Mar 31, 2021

Why is AI hard and Physics simple?

Daniel A. Roberts

We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theoretical physicists to work on AI as physicists. As a first step in that direction, we discuss an upcoming book on the principles of deep learning theory that attempts to realize this approach.

sparse models no-free-lunch theorem generalization interpretability deep learning theory
Astrophysics Mar 25, 2021

Machine Learning the 6th Dimension: Stellar Radial Velocities from 5D Phase-Space Correlations

Adriana Dropulic, Bryan Ostdiek, Laura J. Chang et al.

The Gaia satellite will observe the positions and velocities of over a billion Milky Way stars. In the early data releases, the majority of observed stars do not have complete 6D phase-space information. In this Letter, we demonstrate the ability to infer the missing line-of-sight velocities until more spectroscopic observations become available. We utilize a novel neural network architecture that, after being trained on a subset of data with complete phase-space information, takes in a star's 5D astrometry (angular coordinates, proper motions, and parallax) and outputs a predicted line-of-sight velocity with an associated uncertainty. Working with a mock Gaia catalog, we show that the network can successfully recover the distributions and correlations of each velocity component for stars that fall within ~5 kpc of the Sun. We also demonstrate that the network can accurately reconstruct the velocity distribution of a kinematic substructure in the stellar halo that is spatially uniform, even when it comprises a small fraction of the total star count.

stellar phase-space inference regression uncertainty quantification inverse problems density estimation
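
The "prediction with an associated uncertainty" part of this entry is commonly implemented as heteroscedastic regression: the network outputs a mean and a log standard deviation and is trained with the Gaussian negative log-likelihood. A sketch on synthetic 5D inputs (architecture and data are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Maps 5D astrometry to a predicted line-of-sight velocity and its
    per-star uncertainty."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(5, 64), nn.ReLU(),
                                  nn.Linear(64, 64), nn.ReLU(),
                                  nn.Linear(64, 2))   # outputs (mu, log_sigma)

    def forward(self, x):
        mu, log_sigma = self.body(x).unbind(dim=-1)
        return mu, log_sigma

def gaussian_nll(mu, log_sigma, v_true):
    """Negative log-likelihood of a Gaussian with learned per-star variance."""
    return (0.5 * ((v_true - mu) / log_sigma.exp()) ** 2 + log_sigma).mean()

net = VelocityNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(256, 5)        # toy stand-in for (coords, proper motions, parallax)
v = x[:, :2].sum(dim=1) + 0.1 * torch.randn(256)
for _ in range(200):
    mu, log_sigma = net(x)
    loss = gaussian_nll(mu, log_sigma, v)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```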
Theoretical Physics Mar 22, 2021

Modern Machine Learning and Particle Physics

Matthew D. Schwartz

Over the past five years, modern machine learning has been quietly revolutionizing particle physics. Old methodology is being outdated and entirely new ways of thinking about data are becoming commonplace. This article will review some aspects of the natural synergy between modern machine learning and particle physics, focusing on applications at the Large Hadron Collider. A sampling of examples is given, from signal/background discrimination tasks using supervised learning to direct data-driven approaches. Some comments on persistent challenges and possible future directions for the field are included at the end.

collider physics classification jet physics anomaly detection likelihood ratio
Foundational AI Mar 16, 2021

Deep learning: a statistical viewpoint

Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.

benign overfitting overparametrization implicit regularization kernel methods regression
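
One of the survey's central objects, the minimum-norm interpolant in the linear regime, is easy to exhibit: with more features than samples, gradient descent from zero converges to the pseudoinverse solution, which fits the training data exactly. A toy instance (dimensions and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                  # overparametrized: more features than samples
X = rng.normal(size=(n, d))
w_true = np.zeros(d); w_true[:5] = 1.0
y = X @ w_true + 0.1 * rng.normal(size=n)

# Minimum-norm interpolant, computable in closed form via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y
print("train error:", np.mean((X @ w_min_norm - y) ** 2))   # ~0: interpolation

X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_true
print("test error:", np.mean((X_test @ w_min_norm - y_test) ** 2))
```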
Experimental Physics Mar 9, 2021

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Farah Fahim, Benjamin Hawks, Christian Herwig et al.

Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.

scientific workflows fpga firmware synthesis quantization-aware training edge inference trigger systems
Astrophysics Mar 3, 2021

The Luminous and Double-Peaked Type Ic Supernova 2019stc: Evidence for Multiple Energy Sources

Sebastian Gomez, Edo Berger, Griffin Hosseinzadeh et al.

We present optical photometry and spectroscopy of SN 2019stc (=ZTF19acbonaa), an unusual Type Ic supernova (SN Ic) at a redshift of $z=0.117$. SN 2019stc exhibits a broad double-peaked light curve, with the first peak having an absolute magnitude of $M_r=-20.0$ mag, and the second peak, about 80 rest-frame days later, $M_r=-19.2$ mag. The total radiated energy is large, $E_{\rm rad}\approx 2.5\times 10^{50}$ erg. Despite its large luminosity, approaching those of Type I superluminous supernovae (SLSNe), SN 2019stc exhibits a typical SN Ic spectrum, bridging the gap between SLSNe and SNe Ic. The spectra indicate the presence of Fe-peak elements, but modeling of the first light curve peak with radioactive heating alone leads to an unusually high nickel mass fraction of $f_{\rm Ni}\approx 31\%$ ($M_{\rm Ni}\approx 3.2$ M$_\odot$). Instead, if we model the first peak with a combined magnetar spin-down and radioactive heating model we find a better match with $M_{\rm ej}\approx 4$ M$_\odot$, a magnetar spin period of $P_{\rm spin}\approx 7.2$ ms and magnetic field of $B\approx 10^{14}$ G, and $f_{\rm Ni}\lesssim 0.2$ (consistent with SNe Ic). The prominent second peak cannot be naturally accommodated with radioactive heating or magnetar spin-down, but instead can be explained as circumstellar interaction with $\approx 0.7$ $M_\odot$ of hydrogen-free material located $\approx 400$ AU from the progenitor. Including the remnant mass leads to a CO core mass prior to explosion of $\approx 6.5$ M$_\odot$. The host galaxy has a metallicity of $\approx 0.26$ Z$_\odot$, low for SNe Ic but consistent with SLSNe. Overall, we find that SN 2019stc is a transition object between normal SNe Ic and SLSNe.

supernova classification magnetar spin-down multi-peak light curves circumstellar interaction stellar evolution
Theoretical Physics Mar 3, 2021

Real-time lattice gauge theory actions: unitarity, convergence, and path integral contour deformations

Gurtej Kanwar, Michael L. Wagman

The Wilson action for Euclidean lattice gauge theory defines a positive-definite transfer matrix that corresponds to a unitary lattice gauge theory time-evolution operator if analytically continued to real time. Hoshina, Fujii, and Kikukawa (HFK) recently pointed out that applying the Wilson action discretization to continuum real-time gauge theory does not lead to this, or any other, unitary theory and proposed an alternate real-time lattice gauge theory action that does result in a unitary real-time transfer matrix. The character expansion defining the HFK action is divergent, and in this work we apply a path integral contour deformation to obtain a convergent representation for U(1) HFK path integrals suitable for numerical Monte Carlo calculations. We also introduce a class of real-time lattice gauge theory actions based on analytic continuation of the Euclidean heat-kernel action. Similar divergent sums are involved in defining these actions, but for one action in this class this divergence takes a particularly simple form, allowing construction of a path integral contour deformation that provides absolutely convergent representations for U(1) and SU(N) real-time lattice gauge theory path integrals. We perform proof-of-principle Monte Carlo calculations of real-time U(1) and SU(3) lattice gauge theory and verify that exact results for unitary time evolution of static quark-antiquark pairs in (1 + 1)D are reproduced.

lattice gauge theory path integral contour deformation monte carlo methods sign problem real-time transfer matrix
Foundational AI Feb 24, 2021

On the Minimal Error of Empirical Risk Minimization

Gil Kur, Alexander Rakhlin

We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression, both in the random and the fixed design settings. Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data. In the fixed design setting, we show that the error is governed by the global complexity of the entire class. In contrast, in random design, ERM may only adapt to simpler models if the local neighborhoods around the regression function are nearly as complex as the class itself, a somewhat counter-intuitive conclusion. We provide sharp lower bounds for performance of ERM for both Donsker and non-Donsker classes. We also discuss our results through the lens of recent studies on interpolation in overparameterized models.

empirical risk minimization regression local complexity minimax lower bounds overparameterized models
Theoretical Physics Feb 16, 2021

Topological Obstructions to Autoencoding

Joshua Batson, C. Grace Haaf, Yonatan Kahn et al.

Autoencoders have been proposed as a powerful tool for model-independent anomaly detection in high-energy physics. The operating principle is that events which do not belong to the space of training data will be reconstructed poorly, thus flagging them as anomalies. We point out that in a variety of examples of interest, the connection between large reconstruction error and anomalies is not so clear. In particular, for data sets with nontrivial topology, there will always be points that erroneously seem anomalous due to global issues. Conversely, neural networks typically have an inductive bias or prior to locally interpolate such that undersampled or rare events may be reconstructed with small error, despite actually being the desired anomalies. Taken together, these facts are in tension with the simple picture of the autoencoder as an anomaly detector. Using a series of illustrative low-dimensional examples, we show explicitly how the intrinsic and extrinsic topology of the dataset affects the behavior of an autoencoder and how this topology is manifested in the latent space representation during training. We ground this analysis in the discussion of a mock "bump hunt" in which the autoencoder fails to identify an anomalous "signal" for reasons tied to the intrinsic topology of $n$-particle phase space.

autoencoders anomaly detection topological obstructions manifold learning phase space topology
Astrophysics Feb 13, 2021

On the convergence of group-sparse autoencoders

Emmanouil Theodosis, Bahareh Tolooshams, Pranay Tankala et al.

Recent approaches in the theoretical analysis of model-based deep learning architectures have studied the convergence of gradient descent in shallow ReLU networks that arise from generative models whose hidden layers are sparse. Motivated by the success of architectures that impose structured forms of sparsity, we introduce and study a group-sparse autoencoder that accounts for a variety of generative models, and utilizes a group-sparse ReLU activation function to force the non-zero units at a given layer to occur in blocks. For clustering models, inputs that result in the same group of active units belong to the same cluster. We proceed to analyze the gradient dynamics of a shallow instance of the proposed autoencoder, trained with data adhering to a group-sparse generative model. In this setting, we theoretically prove the convergence of the network parameters to a neighborhood of the generating matrix. We validate our model through numerical analysis and highlight the superior performance of networks with a group-sparse ReLU compared to networks that utilize traditional ReLUs, both in sparse coding and in parameter recovery tasks. We also provide real data experiments to corroborate the simulated results, and emphasize the clustering capabilities of structured sparsity models.

autoencoders sparse models group-sparse relu convergence analysis representation learning
Theoretical Physics Feb 8, 2021

Few-nucleon matrix elements in pionless effective field theory in a finite volume

W. Detmold, P. E. Shanahan

Pionless effective field theory in a finite volume (FVEFT$_{\pi\!/}$) is investigated as a framework for the analysis of multi-nucleon spectra and matrix elements calculated in lattice QCD (LQCD). By combining FVEFT$_{\pi\!/}$ with the stochastic variational method, the spectra of nuclei with atomic number $A\in\{2,3\}$ are matched to existing finite-volume LQCD calculations at heavier-than-physical quark masses corresponding to a pion mass $m_\pi=806$ MeV, thereby enabling infinite-volume binding energies to be determined using infinite-volume variational calculations. Based on the variational wavefunctions that are constructed in this approach, the finite-volume matrix elements of various local operators are computed in FVEFT$_{\pi\!/}$ and matched to LQCD calculations of the corresponding QCD operators in the same volume, thereby determining the relevant one and two-body EFT counterterms and enabling an extrapolation of the LQCD matrix elements to infinite volume. As examples, the scalar, tensor, and axial matrix elements are considered, as well as the magnetic moments and the isovector longitudinal momentum fraction.

finite-volume pionless eft effective field theory nuclear matrix elements lattice qcd finite-volume extrapolation
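
For intuition on the variational step, here is a one-Gaussian caricature of the stochastic variational method: minimize the energy of a Gaussian trial wavefunction in a single attractive Gaussian well. All parameter values are illustrative; the full SVM stochastically grows a basis of many correlated Gaussians and diagonalizes the Hamiltonian in that basis.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# One-Gaussian caricature of the (stochastic) variational method:
# trial wavefunction psi(r) ~ exp(-r^2 / (2 a^2)) for reduced mass mu
# in an attractive Gaussian well V(r) = -V0 exp(-r^2 / R^2).
hbar2_over_mu = 82.9   # MeV fm^2, hbar^2/mu for mu = m_N/2 (two nucleons)
V0, R = 60.0, 2.0      # MeV, fm -- illustrative well depth and range

def energy(a):
    kinetic = 3 * hbar2_over_mu / (4 * a**2)          # <T> for a Gaussian
    potential = -V0 * (R**2 / (R**2 + a**2)) ** 1.5   # <V> in closed form
    return kinetic + potential

res = minimize_scalar(energy, bounds=(0.1, 20.0), method="bounded")
print(f"best width a = {res.x:.2f} fm, variational energy = {res.fun:.2f} MeV")
```
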
Theoretical Physics Jan 29, 2021

Path integral contour deformations for observables in $SU(N)$ gauge theory

William Detmold, Gurtej Kanwar, Henry Lamm et al.

Path integral contour deformations have been shown to mitigate sign and signal-to-noise problems associated with phase fluctuations in lattice field theories. We define a family of contour deformations applicable to $SU(N)$ lattice gauge theory that can reduce sign and signal-to-noise problems associated with complex actions and complex observables. For observables, these contours can be used to define deformed observables with identical expectation value but different variance. As a proof-of-principle, we apply machine learning techniques to optimize the deformed observables associated with Wilson loops in two-dimensional $SU(2)$ and $SU(3)$ gauge theory. We study loops consisting of up to 64 plaquettes and achieve variance reduction of up to 4 orders of magnitude.

path integral contour deformations lattice gauge theory sign and signal-to-noise problems variance reduction monte carlo methods
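
A one-dimensional toy (not the paper's $SU(N)$ construction) shows the mechanism: for an oscillatory Gaussian average, a vertical contour shift leaves the expectation value unchanged by analyticity but changes the estimator's variance, and the optimal shift drives it to zero.

```python
import numpy as np

# Toy contour deformation: estimate <exp(i lam x)> over x ~ N(0,1),
# exact value exp(-lam^2/2). Shifting the contour x -> x + i*c leaves
# the integral unchanged (entire integrand with Gaussian decay) but
# changes the variance; c = lam gives a zero-variance observable.
rng = np.random.default_rng(0)
lam = 4.0
x = rng.standard_normal(100_000)

def deformed_observable(x, c):
    # Reweighted integrand on the shifted contour, sampled with the
    # original Gaussian weight: exp(c^2/2 - lam*c) * exp(i (lam - c) x).
    return np.exp(c**2 / 2 - lam * c) * np.exp(1j * (lam - c) * x)

for c in [0.0, 2.0, lam]:
    obs = deformed_observable(x, c)
    print(f"c={c:3.1f}: mean={obs.mean().real:+.5f}, std={obs.real.std():.2e}")
print(f"exact: {np.exp(-lam**2 / 2):+.5f}")
```
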
Experimental Physics Jan 20, 2021

The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics

Gregor Kasieczka, Benjamin Nachman, David Shih et al.

A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging that aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets that may or may not contain an unknown anomaly. This paper reviews the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.

anomaly detection new physics searches collider physics model-agnostic search jet physics
Theoretical Physics Jan 20, 2021

Introduction to Normalizing Flows for Lattice Field Theory

Michael S. Albergo, Denis Boyda, Daniel C. Hackett et al.

This notebook tutorial demonstrates a method for sampling Boltzmann distributions of lattice field theories using a class of machine learning models known as normalizing flows. The ideas and approaches proposed in arXiv:1904.12072, arXiv:2002.02428, and arXiv:2003.06413 are reviewed and a concrete implementation of the framework is presented. We apply this framework to a lattice scalar field theory and to U(1) gauge theory, explicitly encoding gauge symmetries in the flow-based approach to the latter. This presentation is intended to be interactive and working with the attached Jupyter notebook is recommended.

normalizing flows lattice gauge theory equivariant neural networks monte carlo methods symmetry preservation
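
A compact sketch of the framework's central object, an affine-coupling flow trained by reverse KL against a scalar $\phi^4$ lattice action, is below. Masks, network sizes, couplings, and training length are illustrative; the tutorial's Jupyter notebook is the authoritative implementation.

```python
import math
import torch, torch.nn as nn

L = 8  # L x L lattice; all hyperparameters here are illustrative
checker = ((torch.arange(L)[:, None] + torch.arange(L)[None, :]) % 2).float().view(1, L * L)

def phi4_action(phi, m2=-4.0, lam=8.0):
    # S = sum_x [ sum_mu (phi(x+mu) - phi(x))^2 + m2 phi(x)^2 + lam phi(x)^4 ]
    p = phi.view(-1, L, L)
    kin = sum(((torch.roll(p, -1, dims=d) - p) ** 2).sum(dim=(1, 2)) for d in (1, 2))
    return kin + (m2 * p**2 + lam * p**4).sum(dim=(1, 2))

class AffineCoupling(nn.Module):
    """Update the unmasked sites conditioned on the frozen (masked) sites."""
    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask)
        self.net = nn.Sequential(nn.Linear(L * L, 128), nn.Tanh(),
                                 nn.Linear(128, 2 * L * L))

    def forward(self, z):
        frozen = z * self.mask
        s, t = self.net(frozen).chunk(2, dim=1)
        s = torch.tanh(s) * (1 - self.mask)      # bounded log-scale, zero on frozen sites
        t = t * (1 - self.mask)
        phi = frozen + (1 - self.mask) * z * torch.exp(s) + t
        return phi, s.sum(dim=1)                 # sample and log|det J|

flow = nn.ModuleList([AffineCoupling(checker if i % 2 == 0 else 1 - checker)
                      for i in range(4)])
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)

for step in range(200):                          # real trainings run far longer
    z = torch.randn(64, L * L)                   # base distribution: unit Gaussian
    logq = (-0.5 * z**2).sum(dim=1) - 0.5 * L * L * math.log(2 * math.pi)
    phi = z
    for layer in flow:
        phi, logdet = layer(phi)
        logq = logq - logdet
    loss = (logq + phi4_action(phi)).mean()      # reverse KL up to a constant
    opt.zero_grad(); loss.backward(); opt.step()
```
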
Experimental Physics Jan 18, 2021

E Pluribus Unum Ex Machina: Learning from Many Collider Events at Once

Benjamin Nachman, Jesse Thaler

There have been a number of recent proposals to enhance the performance of machine learning strategies for collider physics by combining many distinct events into a single ensemble feature. To evaluate the efficacy of these proposals, we study the connection between single-event classifiers and multi-event classifiers under the assumption that collider events are independent and identically distributed (IID). We show how one can build optimal multi-event classifiers from single-event classifiers, and we also show how to construct multi-event classifiers such that they produce optimal single-event classifiers. This is illustrated for a Gaussian example as well as for classification tasks relevant for searches and measurements at the Large Hadron Collider. We extend our discussion to regression tasks by showing how they can be phrased in terms of parametrized classifiers. Empirically, we find that training a single-event (per-instance) classifier is more effective than training a multi-event (per-ensemble) classifier, at least for the cases we studied, and we relate this fact to properties of the loss function gradient in the two cases. While we did not identify a clear benefit from using multi-event classifiers in the collider context, we speculate on the potential value of these methods in cases involving only approximate independence, as relevant for jet substructure studies.

collider physics classification likelihood ratio per-ensemble learning ensemble methods
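
The IID construction is simple enough to state in a few lines: a calibrated per-event classifier output $h \approx p_s/(p_s+p_b)$ (balanced classes assumed) converts to a per-event likelihood ratio $h/(1-h)$, and the optimal per-ensemble statistic is the product of these ratios. A minimal sketch:

```python
import numpy as np

def ensemble_log_likelihood_ratio(h):
    """Combine calibrated per-event classifier outputs h_i ~ p_s/(p_s+p_b)
    into the optimal per-ensemble statistic under the IID assumption:
    log LR(ensemble) = sum_i log[h_i / (1 - h_i)]."""
    h = np.clip(h, 1e-7, 1 - 1e-7)        # guard against saturated outputs
    return np.sum(np.log(h) - np.log1p(-h), axis=-1)

# Example: two ensembles of 10 events each; higher score = more signal-like.
rng = np.random.default_rng(1)
bkg_like = rng.uniform(0.2, 0.5, size=(1, 10))
sig_like = rng.uniform(0.5, 0.8, size=(1, 10))
print(ensemble_log_likelihood_ratio(bkg_like),
      ensemble_log_likelihood_ratio(sig_like))
```
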
Experimental Physics Jan 13, 2021

Fast convolutional neural networks on FPGAs with hls4ml

Thea Aarrestad, Vladimir Loncar, Nicolò Ghielmetti et al.

We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of $5\,μ$s using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.

convolutional networks fpga inference model compression trigger systems quantization-aware training
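
For orientation, the typical hls4ml conversion flow looks roughly as follows. The tiny Keras model is a stand-in (the paper's benchmarks are SVHN-scale CNNs), function names follow hls4ml's documented interface but may vary across versions, and the FPGA part number and output directory are illustrative.

```python
import hls4ml
from tensorflow import keras

# Stand-in Keras CNN; in practice this would be a pruned and/or
# quantization-aware-trained model as discussed in the paper.
model = keras.Sequential([
    keras.layers.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(4, 3, activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

# Typical hls4ml flow (API as in recent hls4ml releases; details can vary).
config = hls4ml.utils.config_from_keras_model(model, granularity="name")
config["Model"]["ReuseFactor"] = 1           # fully parallel: latency vs. resources

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config,
    output_dir="hls_prj",                    # illustrative project directory
    part="xcku115-flvb2104-2-i",             # illustrative FPGA part
)
hls_model.compile()                          # build the C-simulation library
# y_hls = hls_model.predict(x_test)          # bit-accurate emulation vs. Keras
# hls_model.build()                          # run HLS synthesis (needs Vivado)
```
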
Astrophysics Dec 24, 2020

Detection and Parameter Estimation of Gravitational Waves from Binary Neutron-Star Mergers in Real LIGO Data using Deep Learning

Plamen G. Krastev, Kiranjyot Gill, V. Ashley Villar et al.

One of the key challenges of real-time detection and parameter estimation of gravitational waves from compact binary mergers is the computational cost of conventional matched-filtering and Bayesian inference approaches. In particular, the application of these methods to the full signal parameter space available to the gravitational-wave detectors, and/or real-time parameter estimation is computationally prohibitive. On the other hand, rapid detection and inference are critical for prompt follow-up of the electromagnetic and astro-particle counterparts accompanying important transients, such as binary neutron-star and black-hole neutron-star mergers. Training deep neural networks to identify specific signals and learn a computationally efficient representation of the mapping between gravitational-wave signals and their parameters allows both detection and inference to be done quickly and reliably, with high sensitivity and accuracy. In this work we apply a deep-learning approach to rapidly identify and characterize transient gravitational-wave signals from binary neutron-star mergers in real LIGO data. We show for the first time that artificial neural networks can promptly detect and characterize binary neutron star gravitational-wave signals in real LIGO data, and distinguish them from noise and signals from coalescing black-hole binaries. We illustrate this key result by demonstrating that our deep-learning framework correctly classifies all gravitational-wave events from the Gravitational-Wave Transient Catalog, GWTC-1 [Phys. Rev. X 9 (2019), 031040]. These results emphasize the importance of using realistic gravitational-wave detector data in machine learning approaches, and represent a step towards achieving real-time detection and inference of gravitational waves.

gravitational waves convolutional networks signal detection classification real-time gravitational-wave inference
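
As a hedged illustration of the general approach (not the paper's architecture), a 1D convolutional classifier acting on whitened strain segments might look like the following; the input length, implied sampling rate, and class set are assumptions.

```python
import torch, torch.nn as nn

# Generic 1D CNN for classifying whitened strain segments (illustrative):
# input (batch, 1, 8192) ~ 1 s at 8 kHz, logits for {noise, BBH, BNS}.
class StrainClassifier(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=16, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=8, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.head(self.features(x).squeeze(-1))

net = StrainClassifier()
logits = net(torch.randn(2, 1, 8192))       # two dummy strain segments
print(logits.shape)                         # torch.Size([2, 3])
```
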
Foundational AI Nov 27, 2020

Field of Junctions: Extracting Boundary Structure at Low SNR

Dor Verbin, Todd Zickler

We introduce a bottom-up model for simultaneously finding many boundary elements in an image, including contours, corners and junctions. The model explains boundary shape in each small patch using a 'generalized M-junction' comprising M angles and a freely-moving vertex. Images are analyzed using non-convex optimization to cooperatively find M+2 junction values at every location, with spatial consistency being enforced by a novel regularizer that reduces curvature while preserving corners and junctions. The resulting 'field of junctions' is simultaneously a contour detector, corner/junction detector, and boundary-aware smoothing of regional appearance. Notably, its unified analysis of contours, corners, junctions and uniform regions allows it to succeed at high noise levels, where other methods for segmentation and boundary detection fail.

boundary detection junction modeling robustness loss function design curvature regularization
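
One building block can be sketched directly: a generalized M-junction with a freely-moving vertex and M boundary angles partitions a patch into wedges, and replacing each wedge by its mean value gives the boundary-aware smoothing. The full method jointly optimizes vertex, angles, and wedge values per patch under the spatial-consistency regularizer; the helper names below are hypothetical.

```python
import numpy as np

def wedge_labels(h, w, vertex, angles):
    """Partition an h x w patch into the M wedges of a generalized
    M-junction: each pixel gets the index of the angular sector (between
    consecutive boundary angles) containing its direction from the vertex."""
    ys, xs = np.mgrid[0:h, 0:w]
    theta = np.arctan2(ys - vertex[0], xs - vertex[1]) % (2 * np.pi)
    angles = np.sort(np.asarray(angles) % (2 * np.pi))
    # wedge k spans [angles[k-1], angles[k]) cyclically
    return np.searchsorted(angles, theta, side="right") % len(angles)

def smooth_patch(patch, vertex, angles):
    """Boundary-aware smoothing: replace each wedge by its mean value."""
    lab = wedge_labels(*patch.shape, vertex, angles)
    out = np.zeros_like(patch, dtype=float)
    for k in range(len(angles)):
        sel = lab == k
        if sel.any():
            out[sel] = patch[sel].mean()
    return out

# Example: a 3-junction (corner of three regions) in an 11 x 11 patch.
patch = np.random.rand(11, 11)
print(smooth_patch(patch, vertex=(5.0, 5.0), angles=[0.3, 2.0, 4.5]))
```
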
Experimental Physics Nov 6, 2020

Quasi Anomalous Knowledge: Searching for new physics with embedded knowledge

Sang Eon Park, Dylan Rankin, Silviu-Marian Udrescu et al.

Discoveries of new phenomena often involve a dedicated search for a hypothetical physics signature. Recently, novel deep learning techniques have emerged for anomaly detection in the absence of a signal prior. However, by ignoring signal priors, the sensitivity of these approaches is significantly reduced. We present a new strategy dubbed Quasi Anomalous Knowledge (QUAK), whereby we introduce alternative signal priors that capture some of the salient features of new physics signatures, allowing for the recovery of sensitivity even when the alternative signal is incorrect. This approach can be applied to a broad range of physics models and neural network architectures. In this paper, we apply QUAK to anomaly detection of new physics events at the CERN Large Hadron Collider utilizing variational autoencoders with normalizing flow.

anomaly detection new physics searches signal priors variational autoencoders normalizing flows
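
The core move is to score each event under several learned densities (background plus approximate signal priors) and treat the resulting vector of losses as a low-dimensional "QUAK space". The sketch below substitutes kernel density estimators for the paper's variational autoencoders with normalizing flows; all data and bandwidths are illustrative.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Stand-in density estimators, one per prior: background plus one
# approximate signal prior (the paper trains VAEs with normalizing flows).
bkg_train = rng.normal(0.0, 1.0, size=(5000, 4))
sig_prior_train = rng.normal(2.5, 0.7, size=(5000, 4))
densities = [KernelDensity(bandwidth=0.4).fit(d)
             for d in (bkg_train, sig_prior_train)]

def quak_space(x):
    """Map events to (loss under prior 1, loss under prior 2, ...):
    each coordinate is a negative log-likelihood."""
    return np.stack([-kde.score_samples(x) for kde in densities], axis=1)

# Anomalies sit far from the background axis of this space even if the
# alternative signal prior is only approximately right.
events = np.vstack([rng.normal(0.0, 1.0, (3, 4)),    # background-like
                    rng.normal(3.0, 0.5, (3, 4))])   # anomaly-like
print(quak_space(events).round(1))
```
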
Theoretical Physics Oct 28, 2020

Learning to Unknot

Sergei Gukov, James Halverson, Fabian Ruehle et al.

We introduce natural language processing into the study of knot theory, as made natural by the braid word representation of knots. We study the UNKNOT problem of determining whether or not a given knot is the unknot. After describing an algorithm to randomly generate $N$-crossing braids and their knot closures and discussing the induced prior on the distribution of knots, we apply binary classification to the UNKNOT decision problem. We find that the Reformer and shared-QK Transformer network architectures outperform fully-connected networks, though all perform well. Perhaps surprisingly, we find that accuracy increases with the length of the braid word, and that the networks learn a direct correlation between the confidence of their predictions and the degree of the Jones polynomial. Finally, we utilize reinforcement learning (RL) to find sequences of Markov moves and braid relations that simplify knots and can identify unknots by explicitly giving the sequence of unknotting actions. Trust region policy optimization (TRPO) performs consistently well for a wide range of crossing numbers and thoroughly outperforms other RL algorithms and random walkers. Studying these actions, we find that braid relations are more useful in simplifying to the unknot than one of the Markov moves.

knot theory braid word representation reinforcement learning transformers classification
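
The data representation is easy to sketch: a braid word is a sequence of signed generator indices, and the simplest simplification move is free cancellation of adjacent inverse pairs (the RL agent additionally uses Markov moves and braid relations). Helper names below are hypothetical.

```python
import random

def random_braid(n_strands, length, seed=None):
    """A braid word on n_strands as signed generator indices:
    +i / -i  <->  sigma_i / sigma_i^{-1}, with i in 1..n_strands-1."""
    rng = random.Random(seed)
    return [rng.choice(range(1, n_strands)) * rng.choice((1, -1))
            for _ in range(length)]

def free_reduce(word):
    """Cancel adjacent sigma_i sigma_i^{-1} pairs -- the most basic
    simplification available when searching for unknotting sequences."""
    out = []
    for g in word:
        if out and out[-1] == -g:
            out.pop()
        else:
            out.append(g)
    return out

word = random_braid(n_strands=4, length=12, seed=7)
print(word, "->", free_reduce(word))
```
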
Experimental Physics Oct 22, 2020

Mapping Machine-Learned Physics into a Human-Readable Space

Taylor Faucett, Jesse Thaler, Daniel Whiteson

We present a technique for translating a black-box machine-learned classifier operating on a high-dimensional input space into a small set of human-interpretable observables that can be combined to make the same classification decisions. We iteratively select these observables from a large space of high-level discriminants by finding those with the highest decision similarity relative to the black box, quantified via a metric we introduce that evaluates the relative ordering of pairs of inputs. Successive iterations focus only on the subset of input pairs that are misordered by the current set of observables. This method enables simplification of the machine-learning strategy, interpretation of the results in terms of well-understood physical concepts, validation of the physical model, and the potential for new insights into the nature of the problem itself. As a demonstration, we apply our approach to the benchmark task of jet classification in collider physics, where a convolutional neural network acting on calorimeter jet images outperforms a set of six well-known jet substructure observables. Our method maps the convolutional neural network into a set of observables called energy flow polynomials, and it closes the performance gap by identifying a class of observables with an interesting physical interpretation that has been previously overlooked in the jet substructure literature.

interpretability decision ordering metric jet physics convolutional networks energy flow polynomials
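
The pairwise metric is straightforward to implement: sample pairs of inputs and measure how often two classifiers order them the same way. The sketch below samples arbitrary pairs for brevity, whereas the paper restricts attention to signal-background pairs.

```python
import numpy as np

def decision_ordering(f, g, n_pairs=100_000, seed=0):
    """Fraction of input pairs that two classifiers order the same way.
    f, g: 1D arrays of scores for the same events (only the relative
    ordering matters, not the calibrated values)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(f), n_pairs)
    j = rng.integers(0, len(f), n_pairs)
    keep = (f[i] != f[j]) & (g[i] != g[j])               # ignore ties
    same = np.sign(f[i] - f[j]) == np.sign(g[i] - g[j])
    return same[keep].mean()

# Monotonically related scores order all pairs identically:
s = np.random.default_rng(1).random(10_000)
print(decision_ordering(s, s**3))        # -> 1.0 (same ordering)
print(decision_ordering(s, 1 - s))       # -> 0.0 (reversed ordering)
```
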
Experimental Physics Oct 19, 2020

Enhancing searches for resonances with machine learning and moment decomposition

Ouail Kitouni, Benjamin Nachman, Constantin Weisser et al.

A key challenge in searches for resonant new physics is that classifiers trained to enhance potential signals must not induce localized structures. Such structures could result in a false signal when the background is estimated from data using sideband methods. A variety of techniques have been developed to construct classifiers which are independent of the resonant feature (often a mass). Such strategies are sufficient to avoid localized structures, but are not necessary. We develop a new set of tools using a novel moment loss function (Moment Decomposition or MoDe) which relax the assumption of independence without creating structures in the background. By allowing classifiers to be more flexible, we enhance the sensitivity to new physics without compromising the fidelity of the background estimation.

moment decomposition loss function design new physics searches background sculpting classification
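
A hedged caricature of the idea (not the paper's exact loss): map the resonant feature to $[-1,1]$, project the classifier output onto Legendre polynomials, and penalize moments above the allowed order, so the output may trend smoothly with mass but cannot sculpt localized structure.

```python
import torch

def mode_style_penalty(score, mass, max_order=1, n_extra=3):
    """Simplified moment-decomposition-style penalty: allow the classifier
    output to depend on the resonant feature up to Legendre order
    `max_order`, and penalize projections onto the next `n_extra` orders
    (bump-like structure). A caricature of MoDe, not the paper's loss."""
    lo, hi = mass.min(), mass.max()
    u = 2 * (mass - lo) / (hi - lo) - 1          # map mass to [-1, 1]
    # Legendre polynomials via the Bonnet recursion:
    # (n+1) P_{n+1} = (2n+1) u P_n - n P_{n-1}
    P = [torch.ones_like(u), u]
    for n in range(1, max_order + n_extra):
        P.append(((2 * n + 1) * u * P[-1] - n * P[-2]) / (n + 1))
    return sum(torch.mean(score * p) ** 2 for p in P[max_order + 1:])

# Usage inside a training step (net, x, mass, bce, lam assumed defined):
# loss = bce + lam * mode_style_penalty(net(x).squeeze(), mass)
```
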
Theoretical Physics Oct 1, 2020

Elliptic stable envelopes and hypertoric loop spaces

Michael McBreen, Artan Sheshmani, Shing-Tung Yau

This paper relates the elliptic stable envelopes of a hypertoric variety $X$ with the K-theoretic stable envelopes of the loop hypertoric space, $\widetilde{\mathcal{L}}X$. It thus points to a possible categorification of elliptic stable envelopes.

elliptic stable envelopes hypertoric geometry k-theoretic stable envelopes loop space categorification
Foundational AI Apr 9, 2020

Twisted Quasimaps and Symplectic Duality for Hypertoric Spaces

Michael McBreen, Artan Sheshmani, Shing-Tung Yau

We study moduli spaces of twisted quasimaps to a hypertoric variety $X$, arising as the Higgs branch of an abelian supersymmetric gauge theory in three dimensions. These parametrise general quiver representations whose building blocks are maps between rank one sheaves on $\mathbb{P}^1$, subject to a stability condition, associated to the quiver, involving both the sheaves and the maps. We show that the singular cohomology of these moduli spaces is naturally identified with the Ext group of a pair of holonomic modules over the 'quantized loop space' of $X$, which we view as a Higgs branch for a related theory with infinitely many matter fields. We construct the Coulomb branch of this theory, and find that it is a periodic analogue of the Coulomb branch associated to $X$. Using the formalism of symplectic duality, we derive an expression for the generating function of twisted quasimap invariants in terms of the character of a certain tilting module on the periodic Coulomb branch. We give a closed formula for this generating function when $X$ arises as the abelianisation of the $N$-step flag quiver.

symplectic duality hypertoric varieties quiver representations quantum field theory tilting modules
Experimental Physics Sep 9, 2019

Search for low mass vector resonances decaying into quark-antiquark pairs in proton-proton collisions at $\sqrt{s} =$ 13 TeV

CMS Collaboration

A search for low mass narrow vector resonances decaying into quark-antiquark pairs is presented. The analysis is based on data collected in 2017 with the CMS detector at the LHC in proton-proton collisions at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 41.1 fb$^{-1}$. The results of this analysis are combined with those of an earlier analysis based on data collected at the same collision energy in 2016, corresponding to 35.9 fb$^{-1}$. Signal candidates, recoiling against initial-state radiation, are identified as energetic, large-radius jets with two-pronged substructure. The invariant jet mass spectrum is probed for a potential narrow peaking signal over a smoothly falling background. No evidence for such resonances is observed within the mass range of 50-450 GeV. Upper limits at the 95% confidence level are set on the coupling of narrow resonances to quarks, as a function of the resonance mass. For masses between 50 and 300 GeV these are the most sensitive limits to date. This analysis extends the earlier search to a mass range of 300-450 GeV, which is probed for the first time with jet substructure techniques.

collider physics jet physics jet substructure new physics searches signal detection
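
The statistical core of such a search can be illustrated with a toy bump hunt: fit a smoothly falling background to a binned mass spectrum and compare against a background-plus-narrow-Gaussian hypothesis. Functional forms, binning, and yields below are illustrative, not the CMS analysis.

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy bump hunt: falling background plus an optional narrow Gaussian
# peak in a binned mass spectrum (all numbers illustrative).
rng = np.random.default_rng(0)
edges = np.linspace(50, 450, 81)
m = 0.5 * (edges[:-1] + edges[1:])
truth = 2e4 * np.exp(-m / 80.0) + 150 * np.exp(-0.5 * ((m - 220) / 10) ** 2)
counts = rng.poisson(truth)

def bkg(m, a, b):
    return a * np.exp(-m / b)

def bkg_plus_bump(m, a, b, s, m0, w):
    return bkg(m, a, b) + s * np.exp(-0.5 * ((m - m0) / w) ** 2)

p_bkg, _ = curve_fit(bkg, m, counts, p0=(2e4, 80), sigma=np.sqrt(counts + 1))
p_sb, _ = curve_fit(bkg_plus_bump, m, counts,
                    p0=(*p_bkg, 100, 220, 10), sigma=np.sqrt(counts + 1))
print(f"fitted bump: {p_sb[2]:.0f} events/bin at m = {p_sb[3]:.0f} GeV")
```
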
Theoretical Physics Aug 1, 2019

Strictification and gluing of Lagrangian distributions on derived schemes with shifted symplectic forms

Dennis Borisov, Ludmil Katzarkov, Artan Sheshmani et al.

A strictification result is proved for isotropic distributions on derived schemes equipped with negatively shifted homotopically closed $2$-forms. It is shown that any derived scheme over $\mathbb{C}$ equipped with a $-2$-shifted symplectic structure, and having a Hausdorff space of classical points, admits a globally defined Lagrangian distribution as a dg $\mathbb{C}^{\infty}$-manifold.

shifted symplectic structures derived algebraic geometry lagrangian methods moduli spaces of sheaves group theory
Foundational AI May 27, 2019

AI Feynman: a Physics-Inspired Method for Symbolic Regression

Silviu-Marian Udrescu, Max Tegmark

A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult test set, we improve the state of the art success rate from 15% to 90%.

regression symmetry preservation separability detection physics-informed neural networks automated discovery
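
One of the physics-inspired simplifications is easy to demonstrate: a multiplicative separability test, $f(x,y)\,f(x_0,y_0) = f(x,y_0)\,f(x_0,y)$ whenever $f(x,y) = g(x)\,h(y)$. In AI Feynman the test is applied to a neural-network fit of the data; the sketch below uses toy closed-form functions instead.

```python
import numpy as np

def mult_separability_residual(f, x, y, x0=1.0, y0=1.0):
    """If f(x, y) = g(x) h(y), then f(x, y) f(x0, y0) == f(x, y0) f(x0, y)
    for any anchor point (x0, y0); report the worst relative violation."""
    lhs = f(x, y) * f(x0, y0)
    rhs = f(x, y0) * f(x0, y)
    return np.max(np.abs(lhs - rhs) / (np.abs(lhs) + 1e-12))

rng = np.random.default_rng(0)
x, y = rng.uniform(0.5, 2.0, 1000), rng.uniform(0.5, 2.0, 1000)
print(mult_separability_residual(lambda x, y: x**2 * np.sin(y), x, y))  # ~0: separable
print(mult_separability_residual(lambda x, y: x**2 + np.sin(y), x, y))  # O(1): not
```
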