Note: This Indico page is used for handling abstract submission. To register for the conference please go to the EuCAIFCon24 registration website.
Conference start: Tuesday, 30 April, 9:00 am
Conference end: Friday, 3 May, 1:00 pm
The first “European AI for Fundamental Physics Conference” (EuCAIFCon) will be held in Amsterdam, from 30 April to 3 May 2024. The aim of this event is to provide a platform for establishing new connections between AI activities across various branches of fundamental physics, by bringing together researchers that face similar challenges and/or use similar AI solutions. The conference will be organized “horizontally”: sessions are centered on specific AI methods and themes, while being cross-disciplinary regarding the scientific questions.
EuCAIFCon is organized by EuCAIF, and hosted by the University of Amsterdam, Nikhef and Radboud University. EuCAIF is a new European initiative for advancing the use of Artificial Intelligence (AI) in Fundamental Physics. Members are working on particle physics, astroparticle physics, nuclear physics, gravitational wave physics, cosmology, theoretical physics as well as simulation and computational infrastructure.
Following EuCAIF activities
If you would like to follow the activities of EuCAIF, please join the following e-group: eucaif-info@cern.ch
How? If you would like to apply for membership of a CERN e-group, visit http://cern.ch/egroups and search for the e-group (e.g. eucaif-info) you would like to join.
Important dates
4 December 2023 | Abstract submission open |
26 January 2024 | Extended abstract submission deadline (23:59 CET) (previously 15 Jan) |
7 February 2024 | Anticipated abstract acceptance notification (postponed from 31 Jan 2024) |
16 January 2024 | Registration open (postponed from 4 Jan 2024) |
29 February 2024 | Early bird registration closed |
14 April 2024 | Registration closed (please contact us if you just missed the deadline) |
30 April - 3 May 2024 | Conference |
Note: We are happy to also advertise the AHEAD GW meeting on Monday, 29 April, which is organised adjacent to EuCAIFCon24.
We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.
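As a rough illustration of the implicit-differentiation idea described above, the toy sketch below fits a one-dimensional "vertex" as the minimiser of a weighted least-squares objective and obtains the gradient of the fitted position with respect to each input measurement from the optimality condition, rather than by differentiating through the solver iterations. The objective, the function names and the numbers are assumptions chosen for illustration; this is not the algorithm presented in the abstract.

```python
# Toy 1D "vertex fit": v* = argmin_v sum_i w_i * (v - a_i)**2.
# All names and numbers are hypothetical and only illustrate the technique.


def fit_vertex(positions, weights, steps=200, lr=0.1):
    """Minimise the weighted least-squares objective by gradient descent."""
    v = 0.0
    total = sum(weights)
    for _ in range(steps):
        grad = sum(2.0 * w * (v - a) for a, w in zip(positions, weights))
        v -= lr * grad / total  # normalised step keeps the iteration stable
    return v


def implicit_gradient(weights):
    """dv*/da_i from the optimality condition F(v, a) = dL/dv = 0.

    Implicit function theorem: dv*/da_i = -(dF/da_i) / (dF/dv),
    with dF/dv = 2 * sum(w) and dF/da_i = -2 * w_i for this objective.
    """
    d_f_dv = 2.0 * sum(weights)
    return [-(-2.0 * w) / d_f_dv for w in weights]


def finite_difference_gradient(positions, weights, eps=1e-6):
    """Numerical cross-check: perturb each input and refit."""
    base = fit_vertex(positions, weights)
    grads = []
    for i in range(len(positions)):
        shifted = list(positions)
        shifted[i] += eps
        grads.append((fit_vertex(shifted, weights) - base) / eps)
    return grads


positions = [0.9, 1.1, 1.4]
weights = [1.0, 2.0, 0.5]
analytic = implicit_gradient(weights)
numeric = finite_difference_gradient(positions, weights)
```

The analytic gradient never touches the solver loop, which is what allows such a fit to sit inside a larger trainable model: only the solution and the local derivatives of the optimality condition are needed.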
Detected Gravitational Waves are goldmines of information on the compact binary emitting systems. Usually MCMC techniques infer parameter values in a 15-dimensional parameter space accurately, but they are very slow. On the other hand, Physics-Informed Neural Networks (PINNs) are a rapidly emerging branch of Supervised Machine Learning, devoted precisely to solving physical problems. This talk will discuss how PINNs can help to speed up the inference process and how this new ML approach can be applied in current (LIGO-Virgo-KAGRA) and future Gravitational Wave experiments (e.g. Einstein Telescope).
Generative models, particularly normalizing flows, have recently been proposed to speed up lattice field theory sample generation. We have explored the role symmetry considerations and ML concepts like transfer learning may have, by applying novel continuous normalizing flows to a scalar field theory. Beyond that, interesting connections exist between renormalization group theory and generative models, as pointed out in recent papers, which should be further explored.
Nested sampling has become an important tool for inference in astronomical data analysis. However, it is often computationally expensive to run. This poses a challenge for certain applications, such as gravitational-wave inference. To address this, we previously introduced nessai, a nested sampling algorithm that incorporates normalizing flows to accelerate gravitational-wave inference by up to a factor of four compared to our baseline. However, we showed that it was limited by the underlying nested sampling algorithm.
In this talk, we present an improved version of nessai, called i-nessai, that addresses the main bottlenecks. To achieve this, we design a modified nested sampling algorithm, based on importance nested sampling, that is tailored specifically to normalizing flows. We demonstrate that this approach eliminates the aforementioned bottlenecks and is an order of magnitude faster than our baseline.
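For readers unfamiliar with the underlying method, the sketch below shows a minimal standard nested sampling loop on a toy problem: a uniform prior on [-5, 5] and a unit Gaussian likelihood, for which the evidence is very close to 0.1. It is not nessai or i-nessai; the function names, the toy model and the settings are assumptions. New live points are drawn by plain rejection sampling from the prior, which is the step that becomes expensive as the likelihood constraint tightens.

```python
import math
import random


def log_likelihood(x):
    """Unit Gaussian log-likelihood (toy model, chosen for illustration)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def nested_sampling(n_live=200, n_iter=1500, seed=0):
    """Estimate the evidence Z for a uniform prior on [-5, 5]."""
    rng = random.Random(seed)
    live = [rng.uniform(-5.0, 5.0) for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]
    evidence = 0.0
    prior_volume = 1.0
    for i in range(1, n_iter + 1):
        # Remove the live point with the lowest likelihood.
        worst = min(range(n_live), key=lambda k: live_logl[k])
        logl_min = live_logl[worst]
        # Expected shrinkage of the enclosed prior volume per iteration.
        new_volume = math.exp(-i / n_live)
        evidence += math.exp(logl_min) * (prior_volume - new_volume)
        prior_volume = new_volume
        # Draw a replacement from the prior subject to L > L_min.
        while True:
            candidate = rng.uniform(-5.0, 5.0)
            if log_likelihood(candidate) > logl_min:
                break
        live[worst] = candidate
        live_logl[worst] = log_likelihood(candidate)
    # Add the contribution of the remaining live points.
    evidence += prior_volume * sum(math.exp(l) for l in live_logl) / n_live
    return evidence


evidence = nested_sampling()
```

In this toy the acceptance rate of the rejection step equals the remaining prior volume, so it falls exponentially with the iteration count; a proposal that draws directly from the constrained region avoids that cost.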
This presentation will highlight the impactful role of machine learning (ML) in high energy nuclear physics, particularly in studying QCD matter under extreme conditions. The presentation will focus on three key applications: analyzing heavy ion collisions, reconstructing neutron star Equation of State (EoS), and advancing lattice field theory studies.
In heavy ion collisions, ML techniques are crucial for deciphering complex data patterns, offering deeper insights into QCD matter properties. For neutron star EoS reconstruction, ML assists in reversing the TOV equation, revealing the nature of dense matter in these stars. In lattice field theory, ML transforms computational approaches, especially in inverse problem solving and generative tasks, fostering new understanding and efficiencies.
The talk will discuss the blend of data-driven and physics-informed approaches in ML, addressing challenges and future directions in this interdisciplinary field. The aim is to showcase ML as a vital tool for progressing high energy nuclear physics research.
Recently, machine learning has become a popular tool in lattice field theory. Here I will report on some applications of (lattice) field theory methods to further understand ML, illustrated using the Restricted Boltzmann Machine and stochastic quantisation as simple examples.
We propose a quantum version of a generative diffusion model. In this algorithm, artificial neural networks are replaced with parameterized quantum circuits, in order to directly generate quantum states. We present both a full quantum and a latent quantum version of the algorithm; we also present a conditioned version of these models. The models' performances have been evaluated using quantitative metrics complemented by qualitative assessments. An implementation of a simplified version of the algorithm has been executed on real NISQ quantum hardware.
Traditionally, machine-learning methods have mostly focused on making predictions without providing explicit probability distributions. However, predicting probability distributions is important because it conveys the model’s level of confidence and the range of potential outcomes. Unlike point estimates, which offer a single value, probability distributions offer a range of potential outcomes and their likelihoods. Mixture density networks (MDNs) enable capturing complex and multi-modal distributions. Unfortunately, the modeling process encounters challenges in N-dimensional spaces due to the introduction of covariance parameters.
Normalizing flows (NF) are generative models that learn to transform a simple probability distribution into a more complex one through a series of invertible mappings. Conditional normalizing flows (cNF) are an extension of NF that incorporate conditioning variables, which act as inputs and guide the generation of the probability distribution in a context-specific manner. While the conventional use of NF and cNF primarily focuses on generating data samples, an emerging trend employs them as a probabilistic framework for regression problems. Therefore, cNF have emerged as a valuable option for addressing probability and covariance estimation challenges in N-dimensional spaces.
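The change-of-variables rule behind normalizing flows can be sketched with the simplest possible invertible mapping, a one-dimensional affine transform of a standard normal variable. In the hypothetical conditional version below, the shift and scale are hard-coded functions of a context value; in an actual cNF they would be produced by a neural network. All names and parameter choices are assumptions for illustration.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, shift, log_scale):
    """log p(x) for x = shift + exp(log_scale) * z with z ~ N(0, 1).

    Change of variables: log p(x) = log p_z(z(x)) + log |dz/dx|.
    """
    z = (x - shift) * math.exp(-log_scale)  # inverse mapping x -> z
    log_det_inverse = -log_scale            # log |dz/dx|
    return standard_normal_logpdf(z) + log_det_inverse


def conditional_flow_logpdf(x, context):
    """Hypothetical conditioner: shift and scale depend on the context."""
    shift = 2.0 * context
    log_scale = 0.1 * context
    return affine_flow_logpdf(x, shift, log_scale)
```

Because the density is available in closed form, such a model can be trained for regression by maximising `conditional_flow_logpdf` over observed pairs of target and context, which yields a full predictive distribution rather than a point estimate.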
Commonly, in any observational field, the data reduction pre-processing and the prediction of physical quantities like the redshift are two independent steps of the data analysis. In the case of astronomy, the information from the observed images is compressed into photometric catalogs, which contain the light captured by the telescope using different photometric filters. Unfortunately, this approach results in the loss of valuable details from the image, which can significantly impact the accuracy of subsequent inferences. Moreover, errors introduced at each stage of the image processing propagate in subsequent estimations, compromising the quality of derived quantities like the photometric redshift.
In this project, we are developing a cNF to predict the photometry and the photometric redshift of a galaxy directly from its image observations. The NF learns the transformations from a simple N-dimensional Gaussian distribution to the joint probability distribution of the photometry and the photometric redshift conditioned on the observed images. By applying the cNF directly to the images, we can bypass the data pre-processing step and reduce the probability of introducing errors. This allows the network to learn the mapping between the images and the photometry and photometric redshift directly without the need for intermediate steps and data compression that introduce noise or biases. Furthermore, the cNF can use the full information content of the images to make predictions.
Moreover, by predicting both the photometry and the photometric redshift simultaneously, the cNF benefits from multi-task learning. This approach involves training a model to perform multiple related tasks simultaneously, allowing it to learn shared representations that benefit all tasks. Our unified model enables the extraction of common data representations, which enhance the overall performance by exploiting the synergies between the photometry and redshift.
The integration of data reduction with the prediction of derived quantities holds significant relevance across all observational fields. Using a cNF architecture enables the entire process to be addressed from a probabilistic approach. Moreover, our end-to-end approach avoids the need for data compression, preserving the integrity of information. This accelerates the transformation of raw observations into scientific measurements, minimizing the risk of errors introduced during manual processing.
Off-shell effects in large LHC backgrounds are crucial for precision predictions and, at the same time, challenging to simulate. We show how a generative diffusion network learns off-shell kinematics given the much simpler on-shell process. It generates off-shell configurations fast and precisely, while reproducing even challenging on-shell features.
New radio telescopes, such as the SKA, will revolutionise our understanding of the Universe. They can detect the faintest distant galaxies and provide high-resolution observations of nearby galaxies. This allows for detailed statistical studies and insights into the formation and evolution of galaxies across cosmic time. These telescopes also play a crucial role in unravelling the physical processes of the early Universe before galaxies even formed. By tracing the hydrogen 21cm emission line in particular, we can see into the Dark Ages, an epoch of the Universe dominated by neutral hydrogen and depleted of light.
In this presentation, we explore the challenges posed by new radio surveys, particularly those arising from high-resolution images of millions of galaxies with increasingly complex radio structures. These galaxies often require visual inspection, but the sheer volume of data from these surveys exceeds the capacity for manual analysis. We are using machine learning algorithms and neural networks to more efficiently process data from the LoTSS survey conducted by the LOFAR telescope, a SKA pathfinder. We are also looking ahead and preparing for future challenges. In order to measure the large-scale distribution of neutral hydrogen in the universe, we are using simulated cosmological volumes of the 21cm line emission and exploring the novel line intensity mapping technique, which promises to be the next generation of cosmological and astrophysical probe. We are using diffusion networks to create samples of 21-cm maps. We will present our preliminary results and offer a glimpse into the potential of the synergy between these advanced techniques.
We investigate the possibility to apply quantum machine learning techniques for data analysis, with particular regard to an interesting use-case in high-energy physics. We propose an anomaly detection algorithm based on a parametrized quantum circuit. This algorithm was trained on a classical computer and tested with simulations as well as on real quantum hardware. Tests on NISQ devices were performed with IBM quantum computers. For the execution on quantum hardware, specific hardware-driven adaptations were devised and implemented. The quantum anomaly detection algorithm was able to detect simple anomalies such as different characters in handwritten digits as well as more complex structures such as anomalous patterns in the particle detectors produced by the decay products of long-lived particles produced at a collider experiment. For the high-energy physics application, the performance was estimated in simulation only, as the quantum circuit was not simple enough to be executed on the available quantum hardware platform. This work demonstrates that it is possible to perform anomaly detection with quantum algorithms; however, as an amplitude encoding of classical data is required for the task, due to the noise level in the available quantum hardware platform, the current implementation cannot outperform classic anomaly detection algorithms based on deep neural networks.
Estimating unknown parameters of open quantum systems is an important task that is common to many branches of quantum technologies, from metrology to computing. When open quantum systems are monitored and a signal is continuously acquired, this signal can be used to efficiently extract information about the interactions in the system. Previous works have demonstrated a Bayesian framework for the inference of the parameters of a model Hamiltonian for the monitored system, where the posterior distribution over the unknown parameters can be obtained from the data signal. While this Bayesian framework is optimal in the sense of information retrieval, it can be numerically expensive and it relies on the modeling assumptions entering the definition of the likelihood function. In this paper, we introduce a fast and reliable inference method based on artificial neural networks. We compare this new parameter inference method with the Bayesian framework with extensive numerical experiments on a two-level quantum system. The precision of artificial neural networks is comparable to Bayesian posterior estimation, while being computationally much cheaper in inference (after a training phase).
Reference: A preprint of this work is available at this arxiv link
We currently find ourselves in the era of noisy intermediate-scale quantum (NISQ) computing, where quantum computing applications are limited yet promising. In this work I will overview two algorithms for computing the ground state and dynamics of the transverse field Ising model as a testbed for more complex models. The Variational Quantum Eigensolver (VQE) algorithm leverages quantum circuits to offload the task of exploring an exponentially large Hilbert space to a quantum system (that naturally lives in this space). Conversely, I will show how a classical algorithm, Variational Monte Carlo (VMC), can achieve similar results by modeling the wavefunction as a Restricted Boltzmann Machine (RBM), without the need for quantum computing resources. To conclude, further work will be presented to explore, benchmark and leverage both quantum and classical machine-learned representations of quantum states.
Tracking charged particles in high-energy physics experiments is one of the most computationally demanding steps in the data analysis pipeline.
As we approach the High Luminosity LHC era, with an estimated increase in the number of proton-proton interactions per beam collision by a factor of 3-5 (from 50 to 140-200 primary interactions per collision on average), particle tracking will become even more problematic due to the massive increase in the volume of data to be analysed.
Currently the problem is being tackled using various algorithmic approaches. The best classical algorithms are local and scale worse than quadratically with the number of particle hits in the detector layers. Promising results are coming from global approaches. In particular, among these, we are investigating the possibility of using machine learning techniques in combination with quantum computing.
In our work we represent charged particle tracks as a graph data structure, and we train a hybrid graph neural network composed of classical and quantum layers. We report recent results on the use of these technologies, with emphasis on the computational aspects involved in the code development within different programming frameworks, such as JAX, PennyLane and IBM Qiskit.
We give an outlook on the expected performance in terms of accuracy and efficiency, and also characterise the role of GPUs as computational accelerators in the emulation of quantum computing resources.
Tensor Networks (TNs) are a computational paradigm used for representing quantum many-body systems. Recent works show how TNs can be applied to perform Machine Learning (ML) tasks, yielding comparable results to standard supervised learning techniques. In particular, [1] leveraged Tree Tensor Networks (TTNs) to achieve the classification of particle flavor state in the context of High Energy Physics.
In this work, we want to analyze the use of TTNs in high-frequency real-time applications like online trigger systems. Indeed, TTN-based algorithms can be deployed in online trigger boards by exploiting low-latency hardware like FPGAs. Besides, FPGAs are known to be suitable for inherently concurrent tasks like matrix multiplications. When implementing biologically inspired neural networks on FPGAs, the goal is to keep the design as small as possible to cope with the resource limitations. Pruning is the primary technique adopted for removing parameters that do not substantially contribute to the performance of the ML task. On the other hand, quantum features of the TTN like quantum correlations or entanglement entropy can be mapped to properties of the network that can help in the pruning process by identifying unnecessary features or nodes [2]. This makes TTNs good candidates for efficient hardware implementation.
We will show different implementations of a TTN classifier on FPGA capable of performing inference on a classical dataset used for ML benchmarking. A preparatory analysis of bond dimensions, feature ordering, and weight quantization, done in the training phase, will lead to the choice of the TTN architecture. The generated TTNs will be deployed on a hardware accelerator. Using an FPGA integrated in a server, we will completely offload the inference of the TTN. Finally, a projection of the needed resources for the hardware implementation of a classifier for the application in HEP will be provided by comparing how different degrees of parallelism obtained in hardware affect physical resources and latency.
[1] Timo Felser et al. “Quantum-inspired machine learning on high-energy physics data”. In: npj Quantum Information (2021).
[2] Yoav Levine et al. “Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design”. In: International Conference on Learning Representations (2018)
The Large-Sized Telescope (LST) is one of three telescope types being built as part of the Cherenkov Telescope Array Observatory (CTAO) to cover the lower energy range between 20 GeV and 200 GeV. The Large-Sized Telescope prototype (LST-1), installed at the La Palma Observatory Roque de Los Muchachos, is currently being commissioned and has successfully taken data since November 2019. The construction of three more LSTs at La Palma is underway. A next generation camera that can be used in future LSTs is being designed. One of the main challenges for the advanced camera is the 1 GHz sampling rate baseline that produces 72 Tbps of data. After filtering events resulting from random coincidences of background light sources (night sky background, star light, sensor noise), the data rate must be brought down to 24 Gbps, corresponding to an event rate of about 30 kHz. At this stage, a software stereo trigger featuring deep learning inference running on a high-speed FPGA will lower the final event rate to < 10 kHz.
To achieve such a large reduction, several trigger levels are being designed and will be implemented in FPGA. The final trigger stage is a real-time deep learning algorithm currently under development. In this talk, we will focus on porting this algorithm to FPGAs using two different approaches: the Intel AI Suite and the hls4ml packages. Then we compare the performance obtained on FPGAs against running it on GPUs.
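The data rates quoted above imply a few derived figures that are easy to check. The short calculation below uses only the numbers stated in the abstract; the final output rate rests on the additional assumption, made here purely for illustration, that the event size is unchanged by the last trigger stage.

```python
# Figures taken from the abstract above.
raw_rate_bps = 72e12            # 72 Tbps at the 1 GHz sampling baseline
filtered_rate_bps = 24e9        # 24 Gbps after background filtering
filtered_event_rate_hz = 30e3   # about 30 kHz at that stage
final_event_rate_hz = 10e3      # upper bound after the software stereo trigger

# Reduction achieved by the hardware trigger levels.
reduction_factor = raw_rate_bps / filtered_rate_bps

# Average event size implied by 24 Gbps at 30 kHz.
bits_per_event = filtered_rate_bps / filtered_event_rate_hz
bytes_per_event = bits_per_event / 8.0

# Assumption (not stated in the abstract): event size stays constant
# through the final stage, so the output rate scales with the event rate.
final_rate_bps = bits_per_event * final_event_rate_hz
```

Under these numbers the hardware stages reduce the data rate by a factor of 3000, each event carries roughly 100 kB, and the final stream stays below about 8 Gbps.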
Resource utilization plays a crucial role in the successful implementation of fast real-time inference for deep neural networks on the latest generation of hardware accelerators (FPGAs, SoCs, ACAPs, GPUs). To fulfil the needs of the triggers that are in development for the upgraded LHC detectors, we have developed a multi-stage compression approach based on conventional compression strategies (pruning and quantization) to reduce the memory footprint of the model, and on knowledge transfer techniques, which are crucial to streamline the DNNs, simplifying the synthesis phase in the FPGA firmware and improving explainability. We present the developed methodologies and the results of the implementation in a working engineering pipeline used as a pre-processing stage to high level synthesis tools. We show how it is possible to build ultra-light deep neural networks in practice, by applying the method to a realistic HEP use-case: a toy simulation of one of the triggers planned for the HL-LHC. Moreover, we explored an array of xAI methods based on different approaches and tested their capabilities in the HEP use-case. As a result, we obtained an array of potentially easy-to-understand and human-readable explanations of the models’ predictions, describing the strengths and drawbacks of each in this particular scenario and providing an interesting atlas on the convergent application of multiple xAI algorithms in a realistic context.
Track finding in high-density environments is a key challenge for experiments at modern accelerators. In this presentation we describe the performance obtained running machine learning models for a typical Muon High Level Trigger at LHC experiments. These models are designed for hit position reconstruction and track pattern recognition with a tracking detector, and are run on commercially available Xilinx FPGA cards: Alveo U50, Alveo U250, and Versal VCK5000. We compare the inference times obtained on a CPU, on a GPU and on the FPGA cards. These tests are done using TensorFlow libraries as well as the TensorRT framework, and software frameworks for AI-based application acceleration. The inference times obtained are compared to the needs of present and future experiments at LHC.
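The two conventional compression steps named above, pruning and quantization, can be sketched on a plain list of weights. The example below shows magnitude pruning followed by symmetric uniform quantization; it is a generic illustration with hypothetical function names and values, not the multi-stage pipeline of the abstract, and it leaves out the knowledge transfer step.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights, n_bits):
    """Round weights to a symmetric signed grid with n_bits, then rescale."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (n_bits - 1) - 1   # e.g. 7 positive levels for 4 bits
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


# Hypothetical layer weights, for illustration only.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
compressed = quantize_uniform(pruned, n_bits=4)
```

Pruned weights remain exactly zero after quantization, so both the sparsity and the reduced bit width can be exploited when the network is mapped to hardware.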
In the realm of high-energy physics, the advent of machine learning has revolutionized data analysis, especially in managing the vast volumes of data produced by particle detectors.
Facing the challenge of analyzing unlabelled, high-volume detector data, advanced machine learning solutions become indispensable.
Our research introduces a machine learning approach that effectively bridges the gap between simulated training data and real-world detector data.
Anchored in domain adaptation principles, our technique uniquely leverages both simulated data (with known signal/background distinctions) and real-world data, thereby enhancing model accuracy and applicability.
Central to our methodology is the use of a low-memory, high-performance stochastic binary neural network.
This network is specifically designed for implementation on Field-Programmable Gate Arrays (FPGAs), which offers the dual advantages of high-speed data processing and adaptability, essential for real-time physics data analysis.
Our results not only demonstrate the theoretical robustness of our model but also its practical efficacy, highlighted by significant improvements in accuracy and throughput in a high-energy physics case study -- Flavours of Physics: Finding $\tau \to \mu \mu \mu$ [1].
The FPGA implementation underscores our model's potential in delivering real-time, efficient data processing solutions in physics research, paving the way for new advancements in the field.
[1] Kaggle. Flavours of Physics: Finding $\tau \to \mu \mu \mu$. https://www.kaggle.com/c/flavours-of-physics/overview
Abstract: Large-scale physics experiments generating high data rates impose significant demands on the data acquisition system (DAQ). The Deep Underground Neutrino Experiment (DUNE) is a next-generation experiment for neutrino science at the Fermi National Accelerator Laboratory in Batavia, Illinois. It will consist of a massive detector operating continually for over a decade, resulting in several TB/s of data. This data must be processed in real time to detect rare events of interest and stored for further offline processing to prevent the need for otherwise extensive and expensive offline data processing. Accordingly, we designed convolutional neural networks (CNNs) capable of detecting these rare events with 90.69% efficiency and rejecting background noise with an efficiency of 99.80%, thereby demonstrating the viability of CNN-based algorithms for this use case. Deployment of such machine learning models on hardware has been made easy with modern tools like hls4ml and HLS. The deployment of this model on a Xilinx Alveo U250 accelerator card has shown promising performance while meeting resource budget and latency targets by a large margin. This illustrates the practicality of deploying real-time AI on FPGAs for this application, with the potential for expanding the model to achieve classification of a broader set of event topologies with higher precision.
Recent years have shown that more and more tasks can be effectively aided by AI. Often supervised learning methods, which are based on labelled data, lead to excellent results. Artificial neural networks trained on this data can make accurate predictions, even for cases not explicitly covered by the training data, potentially leading to a more optimal solution for a problem. This comes at the cost of generating a large dataset for the training, which often becomes the bottleneck of this method. However, with the radically decreasing simulation time needed to perform Finite Element Method simulations of vector fields, such as magnetic fields, it now becomes feasible to generate vast datasets within a reasonable amount of time. This development now allows engineers to use supervised learning techniques to aid them in the initial design phase of magnets. We introduce a method for optimising the design parameters of magnets using Deep Neural Networks and showcase it with an example.
The lack of new physics discoveries at the LHC calls for an effort to go beyond model-driven analyses. In this talk I will present the New Physics Learning Machine, a methodology powered by machine learning to perform a signal-agnostic and multivariate likelihood ratio test (arXiv:2305.14137). I will focus on an implementation based on kernel methods, which is efficient and scalable while maintaining high flexibility (arXiv:2204.02317). I will present recent results on model selection and multiple testing for improved chance of detection, as well as applications to model-independent searches of new physics, online data quality monitoring (arXiv:2303.05413), and the evaluation of simulators and generative models.
The early inspiral from stellar-mass black hole binaries can emit milli-Hertz gravitational wave signals, making them detectable sources for space-borne gravitational wave missions like TianQin. However, the traditional matched filtering technique poses a significant challenge for analyzing these kinds of signals, as it requires an impractically high number of templates ranging from 10^31 to 10^40.
Our proposed search strategy comprises two key elements: firstly, we employ incremental principal component analysis (IPCA) to reduce the dimensionality of simulated signals. Subsequently, we analyze the data using convolutional neural networks (CNN).
The trained IPCA model demonstrates high compression efficiency, achieving a cumulative variance ratio of 95.6% when applied to 10^6 simulated sBBH signals.
To evaluate the CNN detection model, we generate a receiver operating characteristic curve using test signals with varying signal-to-noise ratios. At a false alarm rate of 0.01, the corresponding true alarm rate for signals with a signal-to-noise ratio of 50 is 87.2%.
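Reading a true alarm rate off a receiver operating characteristic curve at a fixed false alarm rate amounts to choosing a score threshold on noise-only data and counting the signals above it. The sketch below does this for lists of detector scores; the function name and the example scores are hypothetical and are not the values from the abstract.

```python
def true_alarm_rate_at_far(noise_scores, signal_scores, false_alarm_rate):
    """Return (true alarm rate, threshold) at the requested false alarm rate.

    The threshold is set so that at most `false_alarm_rate` of the
    noise-only scores lie strictly above it.
    """
    ranked = sorted(noise_scores, reverse=True)
    # Number of noise samples allowed above the threshold; the small
    # offset guards against floating-point rounding just below an integer.
    n_allowed = int(false_alarm_rate * len(ranked) + 1e-9)
    threshold = ranked[n_allowed]
    detected = sum(1 for s in signal_scores if s > threshold)
    return detected / len(signal_scores), threshold


# Hypothetical classifier outputs, for illustration only.
noise_scores = [i / 100 for i in range(100)]
signal_scores = [0.5, 0.985, 0.99, 1.0]
rate, threshold = true_alarm_rate_at_far(noise_scores, signal_scores, 0.01)
```

Repeating the calculation for test sets at different signal-to-noise ratios gives one operating point per ratio, which is how a figure such as the one quoted above would typically be reported.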
Due to poor observational constraints on the low-mass end of the subhalo mass function, the detection of dark matter (DM) subhalos lacking a visible counterpart on sub-galactic scales would provide valuable information about the nature of DM. Novel indirect probes for DM substructure within the Milky Way (MW) are stellar wakes, which are perturbations of the stellar medium induced by DM subhalos and which encode information about the mass properties of the perturber. The dramatic increase in high-precision observations from current and future stellar surveys of our Galaxy (e.g. Gaia satellite) encourages the gravitational detection of these low-mass subhalos using deep learning techniques. As these methods have already been effective in unravelling the stellar substructure of the MW, we now employ them on MW-like galaxy simulations to explore the Galactic dark substructure. Motivated by the above, our work estimates the feasibility of using supervised and unsupervised deep learning methods on simulations and synthetic Gaia observations to detect disturbances in the stellar phase-space induced by orbiting DM subhalos. Furthermore, we expand on the above with findings from an ongoing study of stellar wakes in windtunnel-like N-body simulations, where we investigate the detectability of the wakes as a function of Galactocentric radius and subhalo mass.
As a new era of gravitational wave detections rapidly unfolds, having accurate models for their signals becomes increasingly important.
The best models for gravitational waves are the fully-fledged simulations of General Relativity, although their daunting cost makes them prohibitive for data analysis. To alleviate this, the community has developed a variety of approximate models which, upon calibration from the detailed simulations, are accurate and fast to evaluate.
This program requires the exploration of a large and complex parameter space with expensive simulations. We will argue that Active Learning, a data-driven strategy to explore parameter space with costly experiments, is particularly relevant in this scenario, reducing computational cost, time and human bias.
This talk will be partly based on https://arxiv.org/abs/2311.11311
The formation mechanism of supermassive black holes is yet unknown, despite their presence in nearly every galaxy, including the Milky Way. As stellar evolution predicts that stars cannot collapse to black holes $\gtrsim 50 - 130\, \text{M}_{\odot}$ due to pair-instability, plausible formation mechanisms include the hierarchical mergers of intermediate-mass black holes (IMBHs). The direct observation of IMBH populations would not only strengthen the possible evolutionary link between stellar and supermassive black holes, but unveil the details of the pair-instability mechanism and elucidate their influence in galaxy formation. Conclusive observation of IMBHs remained elusive until the detection of gravitational-wave (GW) signal GW190521, which lies with high confidence in the mass gap predicted by the pair-instability mechanism. Despite falling in the sensitivity band of current GW detectors, IMBH searches are challenging due to their similarity to transient bursts of detector noise, known as glitches.
In this work, we enhance a matched filtering algorithm using Machine Learning. In particular, we employ a multi-layer perceptron network that targets IMBHs, distinguishing them from glitches in real single-detector data from the third observing run. Our algorithm successfully recovers over $90\%$ of simulated IMBH signals in O3a and over $70\%$ in O3b. Furthermore, we detect GW190521, GW190403_051519, GW190426_190642 and GW190909_11414 with high confidence.
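As a rough illustration of the matched filtering step that the network builds on, the sketch below slides a template over a data stream and records the normalised correlation at each offset. It assumes unit-variance white noise (real detector data require whitening with the measured noise spectrum), and the chirp-like template, the function name and the noise-free toy data are hypothetical choices made only to keep the example deterministic.

```python
import math


def matched_filter_snr(data, template):
    """Sliding matched-filter output, assuming unit-variance white noise."""
    # Normalise by the template norm so the output is in units of the noise sigma.
    norm = math.sqrt(sum(t * t for t in template))
    n, m = len(data), len(template)
    return [
        sum(data[k + i] * template[i] for i in range(m)) / norm
        for k in range(n - m + 1)
    ]


# Hypothetical chirp-like template (frequency increasing with time).
template = [math.sin(0.05 * i * i) for i in range(20)]

# Toy data stream: zeros plus the template injected at offset 30.
data = [0.0] * 100
for i, t in enumerate(template):
    data[30 + i] += t

snr = matched_filter_snr(data, template)
peak = max(range(len(snr)), key=snr.__getitem__)
```

In a search pipeline, peaks of this time series above a threshold become triggers; a classifier such as the multi-layer perceptron described above could then be applied to separate astrophysical candidates from glitches.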
We introduce an innovative approach to combinatorial optimization problems through Physics-Informed Graph Neural Networks (GNNs). We combine the structural advantages of GNNs with physics-based algorithms, enhancing solution accuracy and computational efficiency. With respect to the available literature, we were able to design and train a deep graph neural network model able to solve the graph colouring problem in an unsupervised way. Our method shows promising results, demonstrating the potential of merging domain-specific knowledge with machine learning, and opening possible pathways in computational optimization problems of interest in both theoretical and experimental fundamental physics.
The ongoing search for physics beyond the Standard Model imposes a growing demand for highly sensitive anomaly detection methods. Various approaches to anomaly detection exist, and prominent techniques include semi-supervised and unsupervised training of neural networks. While semi-supervised approaches often require sophisticated methods for precise background estimation, unsupervised methods can have sub-optimal signal sensitivity if they do not use specific information on the new signals. We propose an innovative hybrid approach leveraging unsupervised learning for detailed data-driven background estimation of a signal-sensitive region in conjunction with a semi-supervised classification technique for optimized signal sensitivity. The background estimation uses two simultaneously trained and decorrelated autoencoders with an auxiliary network, which enables detailed background estimation via likelihood ratio estimation. The classification technique uses the Classification WithOut LAbels (CWOLA) method on the estimated distributions. We will present the new method, show its performance on the LHCO2020 dataset, and embed the new approach in the landscape of existing anomaly detection methods.
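The reasoning behind the CWoLa step can be summarised in one line. Assume two mixed samples $M_1$ and $M_2$ with signal fractions $f_1 > f_2$ (these symbols are introduced here for illustration and do not appear in the abstract), each a mixture of the same signal density $p_S$ and background density $p_B$:

$$
p_{M_k}(x) = f_k\, p_S(x) + (1 - f_k)\, p_B(x), \qquad
\frac{p_{M_1}(x)}{p_{M_2}(x)} = \frac{f_1\, L(x) + (1 - f_1)}{f_2\, L(x) + (1 - f_2)}, \qquad
L(x) = \frac{p_S(x)}{p_B(x)} .
$$

The derivative of the mixed-sample ratio with respect to $L$ is $(f_1 - f_2) / \left(f_2 L + 1 - f_2\right)^2 > 0$, so a classifier trained to separate $M_1$ from $M_2$ orders events in the same way as the optimal signal-versus-background classifier, without needing event-level labels.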
The High Luminosity upgrade for the Large Hadron Collider (HL-LHC) is due to come online in 2029. This will result in an unprecedented throughput of collision event data. Identifying and analysing meaningful signals within this information poses a formidable challenge in the search for new physics. The demand for automatic tools capable of physically-aware and data-driven inference, which can scale to meet the needs of HL-LHC, has never been higher.
In response to this, our research explores a data-driven approach, leveraging machine learning on Monte Carlo (MC) truth data. Event generators, including MadGraph5 and Pythia8, are used to simulate collision events. Low-level momentum information of the detector-level particles is extracted from these events, to form point cloud representations. These point clouds are clustered to form "jets", a low-energy representation of heavy particles produced during the high-energy collisions at the Large Hadron Collider. Traditional generalised-kT algorithms cluster particles by successively merging particle pairs based on their proximity and energy using a parameterised "distance function". In contrast, our approach assumes that more meaningful pairwise relationships exist between cluster constituents. These relationships are modelled using an Infrared and Collinear (IRC) safe Graph Neural Network (GNN). Training our model directly on low-level data, with MC truth supervision, enables it to act as a bridge from the theoretical principles embedded in event generators, to analyse experimental data. Additionally, it eliminates the need for the customary jet grooming and pruning techniques typically following generalised-kT. We explore our IRC safe GNN's performance to reconstruct boosted Higgs and top-quark signals, and contrast this with traditional techniques.
Our existing body of work focuses on the analysis of boosted Higgs reconstructions. Results indicate our model offers improved mass peak reconstruction, in terms of width and location, when compared with generalised-kT algorithms. The application of our architecture to the more complex colour structure of events containing top-quarks is under active investigation.
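For reference, the parameterised "distance function" of the generalised-kT family mentioned above can be sketched as follows. The formulas are the standard ones, $d_{ij} = \min(p_{T,i}^{2p}, p_{T,j}^{2p})\,\Delta R_{ij}^2 / R^2$ and $d_{iB} = p_{T,i}^{2p}$; the tuple layout `(pt, rapidity, phi)`, the function names and the default radius are assumptions made for this example.

```python
import math


def delta_r2(a, b):
    """Squared rapidity-azimuth separation of two particles (pt, y, phi)."""
    dphi = abs(a[2] - b[2]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi  # wrap the azimuthal difference into [0, pi]
    return (a[1] - b[1]) ** 2 + dphi ** 2


def pair_distance(a, b, p=-1, radius=0.8):
    """Generalised-kT pairwise distance d_ij.

    p = 1 gives kT, p = 0 Cambridge/Aachen, p = -1 anti-kT.
    """
    return min(a[0] ** (2 * p), b[0] ** (2 * p)) * delta_r2(a, b) / radius ** 2


def beam_distance(a, p=-1):
    """Generalised-kT beam distance d_iB."""
    return a[0] ** (2 * p)


hard = (100.0, 0.0, 0.0)
soft = (50.0, 0.3, 0.4)
d_kt = pair_distance(hard, soft, p=1)
d_antikt = pair_distance(hard, soft, p=-1)
```

Sequential recombination repeatedly merges the pair with the smallest $d_{ij}$, or promotes a particle to a jet when its $d_{iB}$ is smallest. A learned pairwise relationship, as in the GNN approach described above, would take the place of this fixed functional form.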
The Data-Directed paradigm (DDP) represents an innovative approach to efficiently investigate new physics across diverse spectra, which are in the presence of smoothly falling Standard Model (SM) backgrounds. Diverging from the conventional analysis employed in collider particle physics, DDP eliminates the necessity for a simulated or functionally derived background estimate. Instead, it directly forecasts statistical significance by utilizing a convolutional neural network trained to regress log-likelihood-based significance. This novel methodology enables the identification of mass bumps directly from the data, circumventing the need for background estimation and saving significant analysis time.
By employing a trained network to detect mass bumps in the data, the DDP approach holds the potential to significantly enhance the discovery reach by exploring numerous uncharted regions. The efficiency of this method has been demonstrated through its successful identification of various beyond standard model particles in simulation data. A detailed presentation of the methodology and recent advancements will be provided.
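To make the regression target concrete, the sketch below computes a log-likelihood-ratio significance per bin. It assumes the common single-bin Poisson form $Z = \sqrt{2\,[\,n \ln(n/b) - (n - b)\,]}$ for an observed count $n$ and expected background $b$; the abstract does not state which exact definition is used, so this choice, the function name and the toy histogram are illustrative only.

```python
import math


def poisson_significance(n_obs, b_exp):
    """Log-likelihood-ratio significance of an excess in a single Poisson bin.

    Returns 0 for deficits, since only upward fluctuations count as a bump.
    """
    if n_obs <= b_exp:
        return 0.0
    q0 = 2.0 * (n_obs * math.log(n_obs / b_exp) - (n_obs - b_exp))
    return math.sqrt(q0)


# Toy smoothly falling background with an excess in the third bin.
expected = [400.0, 200.0, 100.0, 50.0]
observed = [395, 204, 120, 49]
significances = [poisson_significance(n, b) for n, b in zip(observed, expected)]
```

In a conventional analysis `expected` comes from simulation or a functional fit; in the data-directed approach a network trained on labelled examples predicts the significance directly from the observed spectrum.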
Markov chain Monte Carlo (MCMC) simulation is a very powerful approach to tackle a large variety of problems across computational science. The recent advances in machine learning techniques have provided new ideas in the domain of Monte Carlo simulations. The ability of artificial neural networks to model a very wide class of probability distributions through the Variational Autoregressive Network (VAN) approach allows, for instance, the free energy of statistical systems to be approximated.
The study focuses on a lattice model with local $Z_2$ group symmetry, specifically chosen for its simplicity and the presence of local gauge symmetry. In lattice gauge theories, $Z_2$ represents link degrees of freedom, taking discrete values of +1 or -1 for different gauge field states. Autoregressive neural networks are employed for $Z_2$ gauge systems to capture dependencies among lattice sites, facilitating the generation of realistic configurations. The joint probability of spins $p(s)$, described by Boltzmann's distribution, enables exploration of the system's behavior under diverse conditions.
In the VAN approach, the probability is written as a product of conditional probabilities of consecutive spins using Bayes' rule. The goal of the training is to find parameters, $\theta$, such that the probability distribution modeled by the neural network, $q_{\theta}(\bar{s})$, resembles $p(\bar{s})$, where $q_{\theta}(\bar{s})$ is the sampling probability distribution and $\bar{s}$ is the final configuration. To this end, the Kullback–Leibler (KL) divergence is used as the loss function that measures the closeness between these two distributions. Training of the network employs the policy gradient approach from reinforcement learning, which provides an unbiased estimate of the gradient with respect to the variational parameters. The emphasis of the study lies in exploring the applicability of Neural Markov Chain Monte Carlo (NMCMC) simulations using the Variational Autoregressive Network (VAN) to systems with local gauge symmetry.
[1] D. Wu, L. Wang, and P. Zhang, Phys. Rev. Lett., vol. 122, p. 080602, Mar. 2019.
[2] P. Bialas, P. Korcyl, and T. Stebel, Phys. Rev. E, vol. 107, no. 1, p. 015303, 2023.
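The autoregressive factorisation and the KL-based loss described in this abstract can be illustrated on a much simpler system than the $Z_2$ gauge model. The sketch below assumes an open one-dimensional Ising chain and a one-parameter autoregressive model in which each spin is conditioned only on its predecessor; it enumerates all configurations exactly rather than sampling, and it evaluates the loss instead of training with policy gradients. All names and the model form are hypothetical.

```python
import math
from itertools import product


def energy(spins, coupling=1.0):
    """Nearest-neighbour Ising energy of an open chain."""
    return -coupling * sum(a * b for a, b in zip(spins, spins[1:]))


def log_q(spins, w):
    """Autoregressive log-probability: q(s) = q(s_1) * prod_i q(s_i | s_{i-1})."""
    logp = math.log(0.5)  # the first spin is drawn uniformly
    for prev, cur in zip(spins, spins[1:]):
        p_up = 1.0 / (1.0 + math.exp(-2.0 * w * prev))
        logp += math.log(p_up if cur == 1 else 1.0 - p_up)
    return logp


def variational_free_energy(w, n_spins, beta):
    """F_q = sum_s q(s) [E(s) + ln q(s) / beta] = F + KL(q || p) / beta."""
    total = 0.0
    for spins in product((-1, 1), repeat=n_spins):
        lq = log_q(spins, w)
        total += math.exp(lq) * (energy(spins) + lq / beta)
    return total


def exact_free_energy(n_spins, beta):
    """F = -ln(Z) / beta by brute-force enumeration."""
    z = sum(math.exp(-beta * energy(s)) for s in product((-1, 1), repeat=n_spins))
    return -math.log(z) / beta


beta = 0.7
f_exact = exact_free_energy(4, beta)
f_untrained = variational_free_energy(0.0, 4, beta)  # uniform q
f_optimal = variational_free_energy(beta, 4, beta)   # w = beta reproduces p here
```

Because $F_q - F$ equals the KL divergence divided by $\beta$, minimising the variational free energy drives $q_\theta$ towards the Boltzmann distribution. For this toy chain the model family contains the exact distribution, which will not generally hold for a gauge theory.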
KM3NeT is a research infrastructure housing two underwater Cherenkov telescopes located in the Mediterranean Sea. It consists of two configurations which are currently under construction: ARCA with 230 detection units corresponding to 1 cubic kilometre of instrumented water volume and ORCA with 115 detection units corresponding to a mass of 7 Mton. The ARCA (Astroparticle Research with Cosmics in the Abyss) detector aims at studying neutrinos with energies in the TeV-PeV range coming from distant astrophysical sources, while the ORCA (Oscillation Research with Cosmics in the Abyss) detector is optimised for atmospheric neutrino oscillation studies at energies of a few GeV. Artificial intelligence is increasingly used in KM3NeT for data processing and analysis, aiming to provide a better performance on event reconstruction and significantly faster inference times compared to traditional reconstruction techniques. Classical machine learning algorithms, mainly decision trees for event-type classification, have been in use since the beginning of the project. These have been followed by deep learning algorithms such as Convolutional Neural Networks (CNNs) and recently Graph Neural Networks (GNNs), which have been successfully employed for event classification and neutrino property regression tasks. In this talk, the artificial intelligence techniques used in KM3NeT, the advances in the various physics analyses as well as the impact on the physics reach of KM3NeT detectors will be presented.
Particle physics detectors introduce distortions in the observed data due to their finite resolution and other experimental factors; the task of correcting for these effects is known as unfolding. While traditional unfolding methods are restricted to binned distributions of a single observable, recently proposed ML-based methods enable unbinned, high-dimensional unfolding over the entire phase space. In this talk I will introduce some popular methods and present recent work where we compare their strengths and weaknesses as well as benchmark their performance. We find that they work well and are ready for wide-spread use within experiments.
Statistical anomaly detection empowered by AI is a subject of growing interest in high-energy physics and astrophysics. AI provides a multidimensional and highly automatized solution to enable signal-agnostic data validation, and new physics searches.
The unsupervised nature of the anomaly detection task combined with the highly complex nature of the LHC and astrophysical data give rise to a set of yet unaddressed challenges for AI.
A particular challenge is the choice of an optimized and tuned AI model architecture that is highly expressive, interpretable and incorporates physics knowledge.
Under the assumption that the anomalous effects are mild perturbations of the nominal data distribution, sparse models represent an ideal family of candidates for an anomalous classifier. We build a sparse model based on kernel methods to construct a local representation of an anomaly score in weakly supervised problems. We apply dictionary learning techniques to optimize the kernels’ location over input data, inducing the model’s attention towards anomalies-enriched regions. The resulting models are simple, expressive, and at the same time interpretable. They offer a direct handle to model experimental resolution constraints, and quantify the full magnitude of both the statistical and systematic significance of an anomaly score.
Projects such as the imminent Vera C. Rubin Observatory are critical tools for understanding cosmological questions like the nature of dark energy. By observing huge numbers of galaxies, they enable us to map the large scale structure of the Universe. To do this, however, we need reliable ways of estimating galaxy redshifts from only photometry. I will present an overview of our pop-cosmos forward modelling framework for photometric galaxy survey data, a novel approach which connects photometric redshift inference to a physical picture of galaxy evolution. Within pop-cosmos, we model galaxies as draws from a population prior distribution over redshift, mass, dust properties, metallicity, and star formation history. These properties are mapped to photometry using an emulator for stellar population synthesis (speculator/photulator), followed by the application of a learned model for a survey’s noise properties. Application of selection cuts enables the generation of mock galaxy catalogues. This naturally enables us to use simulation-based inference to solve the inverse problem of calibrating the population-level prior on physical parameters from a deep photometric galaxy survey. The resulting model can then be used to derive accurate redshift distributions for upcoming photometric surveys, for instance for facilitating weak lensing and clustering science. We use a diffusion model as a flexible population-level prior, and optimise its parameters by minimising the Wasserstein distance between forward-simulated photometry and the real survey data. I will show applications of this framework to COSMOS data, and will demonstrate how we are able to extract the redshift distribution, and make inference about galaxy physics, from our learned population prior. I will also discuss validation approaches applicable to simulation based fitting approaches.
Quantifying tension between different experimental efforts aiming to constrain the same physical models is essential for validating our understanding of the Universe. A commonly used metric of tension is the ratio, R, of the joint Bayesian evidence to the product of individual evidences for two experimental datasets under some common model. R can be interpreted as a measure of our relative confidence in a dataset given knowledge of another. The statistic has been widely adopted by the community as an appropriately Bayesian way of quantifying tensions; however, it has a non-trivial dependence on the prior that is not always accounted for properly. We propose using Neural Ratio Estimators (NREs) to calibrate the prior dependence of the R statistic. We show that the output of an NRE corresponds to the tension statistic between two datasets if the network is trained on simulations of both experiments' observables. Such an NRE can then be used to derive the distribution of all possible values of R, within one's model prior, for observations from the two experiments that are not in tension. The observed R for the real datasets, derived using Nested Sampling, can then be compared with this distribution to give a prior-dependent determination of how in tension the experiments truly are. We demonstrate the method with a toy example from 21-cm Cosmology and discuss possible applications to Planck and BAO data.
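Written out, the tension statistic defined in words above reads as follows, with $D_A$ and $D_B$ denoting the two datasets and $\mathcal{Z}$ the Bayesian evidence (notation chosen here for illustration):

$$
R = \frac{\mathcal{Z}_{AB}}{\mathcal{Z}_A\, \mathcal{Z}_B}
  = \frac{P(D_A, D_B)}{P(D_A)\, P(D_B)}
  = \frac{P(D_A \mid D_B)}{P(D_A)} .
$$

Values $R > 1$ indicate that one dataset raises confidence in the other, while $R < 1$ signals tension. The middle expression is a ratio of a joint density to a product of marginals, which is the quantity a ratio estimator approximates when it is trained to distinguish jointly simulated pairs $(D_A, D_B)$ from pairs in which one dataset has been shuffled.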
In this talk we propose a physics-based AI framework for precise radiometer calibration in global 21-cm cosmology. These experiments aim to study the formation of the first stars and galaxies by detecting the faint 21-cm radio emission from neutral hydrogen. The global, or sky-averaged, signal is predicted to be five orders of magnitude dimmer than the foregrounds. Therefore, detection of the signal requires precise calibration of the instrument receiver, which non-trivially amplifies the signals detected by the antenna. Current analytic methods appear insufficient, causing a major bottleneck in all such experiments. Unlike other methods, our receiver calibration approach is expected to be agnostic to in-field variations in temperature and environment. For the first time, we propose the use of an encoder-decoder neural network framework for calibration of global 21-cm cosmology experiments.
Simulation-based inference is undergoing a renaissance in statistics and machine learning. With several packages implementing the state-of-the-art in expressive AI [mackelab/sbi] [undark-lab/swyft], it is now being effectively applied to a wide range of problems in the physical sciences, biology, and beyond.
Given the rapid pace of AI, there is little expectation that the implementations of the future will resemble these current first generation neural networks. This talk will present a new framework for simulation-based inference, linear simulation-based inference (lsbi), which abstracts the core principles of SBI from the specific details of machine learning, implementing a plug-and-play framework of linear and mixture models.
lsbi has several use-cases:
An evolving code-driven PyPI/conda research package is available at:
https://github.com/handley-lab/lsbi
PolySwyft is an implementation of a sequential simulation-based nested sampler by merging two algorithms that are commonly used for Bayesian inference: PolyChord and swyft. PolySwyft uses the NRE functionality of swyft and generates a new joint training dataset with PolyChord to iteratively estimate more accurate posterior distributions. PolySwyft can be terminated using pre-defined rounds similar to swyft or be executed in an automated mode using a KL-divergence termination criterion between current posterior estimates. We demonstrate the capabilities of PolySwyft on multimodal toy problems where the ground truth posterior is known.
The Galactic centre serves as a laboratory for fundamental physics, particularly in the context of indirect dark matter searches. This study explores the potential of the James Webb Space Telescope to shed light on self-annihilating, sub-GeV dark matter candidates by examining their influence on exoplanet overheating and providing sensitivity estimates via probabilistic programming languages.
Neural network emulators are frequently used to speed up the computation of physical models in physics. However, they generally include only a few input parameters due to the difficulty of generating a dense enough grid of training data pairs in high-dimensional parameter spaces. This becomes particularly apparent for cases where they replace physical models that take a long time to compute. We utilize an active learning technique called query by dropout committee to achieve a performance comparable to training data generated on a grid, but with fewer required training examples. We also find that the emulator generalizes better compared to grid-based training: we are able to suppress poor performance which occurs in particular areas of parameter space in grid-based training. Using these methods, we train an emulator on a numerical model of the accretion flow and emission of an accreting disk around a supermassive black hole in order to infer the physical properties of the black hole. Our neural network emulator can approximate the physical simulator to 1% precision or better and achieves a $10^4$-fold speedup over the original model.
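The acquisition step of a query-by-committee scheme can be sketched in a few lines. In query by dropout committee the members would be differently dropout-masked copies of one network; in the sketch below three hand-written functions stand in for them, and the names `select_query`, `committee` and `candidates` are hypothetical.

```python
from statistics import pvariance


def select_query(candidates, committee):
    """Return the candidate input on which the committee disagrees the most."""

    def disagreement(x):
        # Variance of the members' predictions at this input.
        return pvariance([member(x) for member in committee])

    return max(candidates, key=disagreement)


# Stand-in committee: members agree near x = 0 and diverge for larger x.
committee = [
    lambda x: x,
    lambda x: x + 0.1 * x * x,
    lambda x: x - 0.1 * x * x,
]
candidates = [0.0, 1.0, 2.0, 3.0]
next_point = select_query(candidates, committee)
```

The expensive simulator is then run only at `next_point`, the result is added to the training set, and the loop repeats. This concentrates simulations where the emulator is least certain rather than spreading them over a uniform grid.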
Mutual information is one of the basic information-theoretic measures of correlations between different subsystems. It may carry interesting physical information about the phase of the system. It is notoriously difficult to estimate as it involves sums over all possible system and subsystem states. In this talk, I describe a direct approach to estimate the bipartite mutual information using generative neural networks. Our method is based on Monte Carlo sampling. I demonstrate it on the Ising model using autoregressive neural networks and on the $\phi^4$ scalar field theory using conditional normalizing flows. Our approach allows studying arbitrary geometries of subsystems. I discuss the validity of the expected area law which governs the scaling of the mutual information with the volume for both systems.
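For a system small enough to enumerate, the definition $I(A{:}B) = \sum_{a,b} p(a,b) \ln \frac{p(a,b)}{p(a)\,p(b)}$ can be evaluated directly, which makes clear what the neural estimators have to approximate for large lattices. The sketch below is such a brute-force evaluation for a toy two-spin system; the function names are hypothetical and no sampling or neural network is involved.

```python
import math


def mutual_information(joint):
    """Mutual information in nats; joint[a][b] is the joint probability p(a, b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log(p / (p_a[i] * p_b[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0.0
    )


def two_spin_joint(beta):
    """Boltzmann distribution of two coupled Ising spins, indexed by (s1, s2)."""
    aligned = math.exp(beta)
    opposed = math.exp(-beta)
    z = 2.0 * (aligned + opposed)
    return [[aligned / z, opposed / z], [opposed / z, aligned / z]]


mi_weak = mutual_information(two_spin_joint(0.1))
mi_strong = mutual_information(two_spin_joint(2.0))
```

The sums over all joint states are what becomes intractable for realistic subsystems, since their number grows exponentially with the subsystem size.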
The efficient simulation of particle propagation and interaction within the detectors of the Large Hadron Collider (LHC) is of primary importance for precision measurements and new physics searches. The most computationally expensive step of the simulation pipeline is the generation of calorimeter showers, which will become ever more costly and high-dimensional as the LHC moves into its high luminosity era. Modern generative networks show great promise to become vastly faster calorimeter shower emulator alternatives, as shown by a number of architectures proposed in the context of the Fast Calorimeter Simulation Challenge 2022. Among them, Normalizing Flows (NFs) appear to be a particularly precise option. However, the bijective nature of NFs hampers their scalability. We thus propose a two-step approach for calorimeter shower simulation: first we learn a lower-dimensional manifold structure with an auto-encoder, and then perform density estimation on this manifold with an NF. Our approach relies on the notion that the seemingly high-dimensional data of high energy physics experiments actually has a much lower intrinsic dimensionality. In machine learning, this is known as the manifold hypothesis, which states that high-dimensional data is supported on low-dimensional manifolds. By reducing the dimensionality of the data we enable faster training and generation of high-dimensional data.
Simulation is the crucial connection between particle physics theory and experiment. Our ability to simulate particle collisions based on first principles allows us to analyze and understand the vast amount of data of the Large Hadron Collider (LHC) experiments. This, however, comes at a cost: a lot of computational resources are needed to simulate all necessary interactions to the required precision. Among these simulations, that of the interactions of the particles with the detector material, especially the calorimeters, which measure the particles' energies, is the most expensive one.
In recent years, surrogate models based on deep generative models have shown great results for fast and faithful alternatives. Given the constraints that such surrogates have to fulfill on timing and precision, models based on normalizing flows have the best potential to become such alternatives. In my talk, I will review recent progress in normalizing-flow-based alternatives for calorimeter simulation and how they compare in terms of quality and timing.
Theory predictions for the LHC require precise numerical phase-space integration and generation of unweighted events. We combine machine-learned multi-channel weights with a normalizing flow for importance sampling to improve classical methods for numerical integration. By integrating buffered training for potentially expensive integrands, VEGAS initialization, symmetry-aware channels, and stratified training, we elevate the performance in both efficiency and accuracy. We empirically validate these enhancements through rigorous tests on diverse LHC processes, including VBS and W+jets.
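The two quantities such a sampler is judged on, the integral estimate and the unweighting efficiency, can be illustrated with a one-dimensional toy. The sketch below assumes a fixed analytic proposal density in place of a trained normalizing flow and uses a single channel; the integrand, the proposal and all function names are hypothetical.

```python
import math
import random


def importance_sample(integrand, sample_proposal, proposal_pdf, n, rng):
    """Return (integral estimate, its standard error, unweighting efficiency)."""
    weights = []
    for _ in range(n):
        x = sample_proposal(rng)
        weights.append(integrand(x) / proposal_pdf(x))
    mean = sum(weights) / n
    variance = sum((w - mean) ** 2 for w in weights) / (n - 1)
    # Fraction of events kept by accept-reject unweighting against the max weight.
    efficiency = mean / max(weights)
    return mean, math.sqrt(variance / n), efficiency


def integrand(x):
    return 3.0 * x * x  # integrates to 1 on [0, 1]


def proposal_pdf(x):
    return 2.0 * x  # normalised density on (0, 1]


def sample_proposal(rng):
    # Inverse-CDF sampling of 2x; 1 - random() lies in (0, 1], avoiding x = 0.
    return math.sqrt(1.0 - rng.random())


estimate, error, efficiency = importance_sample(
    integrand, sample_proposal, proposal_pdf, 20000, random.Random(0)
)
```

The closer the proposal follows the integrand, the flatter the weights become, which lowers the error and raises the unweighting efficiency. A learned sampler aims to provide exactly that adaptation.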
Traditional physics simulations are fundamental in the field of particle physics. Common simulation tools like Geant4 are very precise, but comparatively slow. Generative machine learning can be used to speed up such simulations.
Calorimeter data can be represented either as images or as point clouds, i.e. permutation-invariant lists of measurements.
We advance the generative models for calorimeter showers on three frontiers:
(1) increasing the number of conditional features for precise energy- and angle-wise generation with the bounded information bottleneck auto-encoder (BIB-AE),
(2) improving generation fidelity using a normalizing flow model, dubbed "Layer-to-Layer-Flows" (L2LFlows),
(3) developing a diffusion model for geometry-independent calorimeter point cloud scalable to $\mathcal{O}(1000)$ points, called CaloClouds, and distilling it into a consistency model for fast single-shot sampling.
The simulation of calorimeter showers is computationally intensive, leading to the development of generative models as substitutes. We propose a framework for designing generative models for calorimeter showers that combines the strengths of voxel and point cloud approaches to improve both accuracy and computational efficiency. Our approach employs a pyramid-shaped design, where the base of the pyramid encompasses all calorimeter cells. Each subsequent level corresponds to a pre-defined clustering of cells from the previous level, which aggregates their energy. The pyramid culminates in a single cell that contains the total energy of the shower. Within this hierarchical framework, each model learns to calculate the energy of the hit cells at the current level and determines which cells are hit on the lower level. Importantly, each model only focuses on the 'hit' cells at its level. The final models solely determine the energy of individual hit cells. To accommodate differences in the hit cell cardinality across levels, we introduce two new Set Normalizing Flows which utilize Set Transformers and Deep Sets. Moreover, we propose a newly designed dequantization technique tailored for learning boolean values. We validate the framework on multiple datasets, including CaloChallenge.
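The data structure behind the pyramid can be sketched independently of the generative models. The example below assumes a one-dimensional row of cells and a fixed grouping of neighbouring cells at each level; the real pre-defined clustering would follow the detector geometry, and the function names are hypothetical.

```python
def build_pyramid(cell_energies, group_size=2):
    """Aggregate cell energies level by level until a single cell remains.

    levels[0] is the full-resolution calorimeter; levels[-1] holds the total energy.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    levels = [list(cell_energies)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(
            [sum(prev[i:i + group_size]) for i in range(0, len(prev), group_size)]
        )
    return levels


def hit_mask(level):
    """Boolean flags marking which cells of a level carry energy."""
    return [energy > 0.0 for energy in level]


cells = [0.0, 0.0, 1.5, 0.0, 0.0, 2.0, 0.5, 0.0]
levels = build_pyramid(cells)
masks = [hit_mask(level) for level in levels]
```

Generation would run in the opposite direction: starting from the total energy, a model at each level only has to decide how the energy of a hit cell splits among its children and which of them are hit, so empty regions are never processed.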
The diffusion model has demonstrated promising results in image generation, recently becoming mainstream and representing a notable advancement for many generative modeling tasks. Prior applications of the diffusion model for both fast event and detector simulation in high energy physics have shown exceptional performance, providing a viable solution to generate sufficient statistics within a constrained computational budget in preparation for the High Luminosity LHC. However, many of these applications suffer from slow generation with a large number of sampling steps and face challenges in finding the optimal balance between sample quality and speed. The study focuses on the latest benchmark developments in the most efficient ODE/SDE-based samplers, schedulers, and fast convergence training techniques. We test on the public CaloChallenge and JetNet datasets with the designs implemented on the existing architecture; the performance of the generated classes surpasses that of previous models across various evaluation metrics, while achieving significant speedup.
Sampling techniques are a stalwart of reliable inference in the physical sciences, with the nested sampling paradigm emerging in the last decade as a ubiquitous tool for model fitting and comparison. Parallel developments in the field of generative machine learning have enabled advances in many applications of sampling methods in scientific inference pipelines.
This work explores the synergy of the latest developments in diffusion models and nested sampling. I will review the challenges of precise model comparison in high dimension, and explore how score based generative models can provide a solution. This work builds towards a public code that can apply out of the box to many established hard problems in fundamental physics, as well as providing potential to extend precise inference to problems that are intractable with classical methods. I will motivate some potential applications at the frontiers of inference that can be unlocked with these methods.
Quantum entanglement, a fundamental concept for understanding physics at atomic and subatomic scales, is explored in this presentation. We introduce a novel technique for computing quantum entanglement (Rényi) entropy, grounded on the replica trick and leveraging the abilities of generative neural networks for accurate partition function calculations. The approach is demonstrated on the 1-dimensional quantum Ising model, employing autoregressive networks and Neural Importance Sampling (NIS) for unbiased estimation. Numerical results are presented for the Rényi entropy and entropic c-function, illustrating the efficacy of the proposed methodology. This work contributes to the evolving landscape of quantum physics and underlines the transformative potential of generative neural networks in exploring entanglement phenomena.
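For orientation, the standard definitions behind the replica-trick calculation are given below, with $\rho_A$ the reduced density matrix of subsystem $A$ and $n$ the Rényi index (notation introduced here for illustration):

$$
S_n(A) = \frac{1}{1 - n} \ln \mathrm{Tr}\, \rho_A^{\,n},
\qquad
\mathrm{Tr}\, \rho_A^{\,n} = \frac{Z_n(A)}{Z^{\,n}} ,
$$

where $Z$ is the partition function of the original system and $Z_n(A)$ that of $n$ replicas glued together along the region $A$. Computing the entropy therefore reduces to estimating a ratio of partition functions, which is the quantity that generative networks with importance sampling can estimate without bias.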
The Pierre Auger Observatory, located in the Argentinian Pampa, is the world's largest cosmic-ray experiment. It offers the most precise measurements of cosmic particles at ultra-high energies by measuring their induced air showers. The centerpiece of the Observatory is the surface detector (SD) consisting of over 1,660 water-Cherenkov detectors that cover an area of 3,000 km$^2$ and measure the arrival time distribution of shower particles at the ground. Due to its hybrid design, the SD array is overlooked by 27 fluorescence telescopes, enabling independent cosmic-ray measurements and the absolute calibration of the SD.
Traditionally, the analysis of the SD data is based on a few observables, such as the integrated signals of the signal traces or the arrival times measured at the different stations.
With the advent of deep learning, unique potential for improved reconstructions has emerged, since the time-resolved signals can be exploited with unprecedented detail.
In this contribution, we will summarize the successful efforts of the Pierre Auger Collaboration in developing novel deep-learning-based strategies to improve event reconstruction and data analysis to shed new light on the composition and origin of cosmic rays. Furthermore, we will discuss employed strategies to improve the robustness of machine-learning models and methods to reduce the often-existing domain gap between training data and measured data taken under real operating conditions and estimate remaining systematic uncertainties.
Machine learning, especially Deep Learning, has become a valuable tool for researchers in High Energy Physics (HEP) to process and analyse their data. Popular Python-based machine learning libraries, such as Keras and PyTorch, offer good solutions for training deep learning models in both CPU and GPU environments. However, they do not always provide a good solution for inference. They may only support their own models, often provide only a Python API, or be constrained by heavy dependencies.
To solve this problem, we have developed a tool called SOFIE, within the ROOT/TMVA project. SOFIE takes externally trained deep learning models in ONNX format or Keras and PyTorch native formats and generates C++ code that can be easily included and invoked for fast inference of the model. The code has minimal dependencies and can be easily integrated into the data processing and analysis workflows of the HEP experiments.
We will present the latest developments in SOFIE, which include the support for parsing Graph Neural Networks trained with the Python Graph Net library, as well as the support for ONNX operators needed for transformer models.
We will also show how SOFIE can be used to generate code for accelerators, such as GPU using SYCL, in order to achieve optimal performance in large model evaluations. We will present benchmarks and comparisons with other existing inference tools, such as ONNXRunTime, using deep learning models used by the LHC experiments.
Successfully and accurately inferring the properties of compact binary mergers observed by facilities including Virgo and LIGO requires accurate and fast waveform models. Direct calculation from general relativity is not currently feasible, and approximations that are used to produce tractable models necessarily induce errors.
Using Gaussian process regression (GPR), we have developed a technique to quantify the systematic errors in an approximate waveform model. We have incorporated our model into a parameter estimation pipeline, which allows the waveform’s uncertainty to be included in the posterior probability distribution over the astrophysical properties inferred for a GW signal’s source.
We present early results of parameter estimation using different GPR-backed waveform models which incorporate waveform uncertainty. These techniques will be vital to performing accurate parameter estimation in the high signal-to-noise regimes anticipated in third generation detectors such as the Einstein Telescope.
The next-generation ground-based gamma-ray observatory, the Cherenkov Telescope Array Observatory (CTAO), will consist of two arrays of tens of imaging atmospheric Cherenkov telescopes (IACTs) to be built in the Northern and Southern Hemispheres, aiming to improve the sensitivity of current-generation instruments by a factor of five to ten. Three different sizes of IACTs are proposed to cover an energy range from 20 GeV to more than 300 TeV. This contribution focuses on the analysis scheme of the Large-Sized Telescope (LST), which is in charge of the reconstruction of the lower energy gamma rays of tens of GeV. The Large-Sized Telescope prototype (LST-1) of CTAO is in the final stage of its commissioning phase collecting a significant amount of observational data.
The working principle of IACTs consists of the observation of extensive air showers (EASs) initiated by the interaction of very-high-energy (VHE) gamma rays and cosmic rays with the atmosphere. Cherenkov photons induced by a given EAS are recorded in fast-imaging cameras containing the spatial and temporal development of the EAS together with the calorimetric information. The properties of the originating VHE particle (type, energy and incoming direction) can be inferred from those recordings by reconstructing the full event using machine learning techniques. We explore a novel full-event reconstruction technique based on deep convolutional neural networks (CNNs) applied to calibrated waveforms of the IACT camera pixels using CTLearn. CTLearn is a package that includes modules for loading and manipulating IACT data and for running deep learning models, using pixel-wise camera data as input.
In-beam gamma-ray spectroscopy, particularly with high-velocity recoil nuclei, necessitates precise Doppler correction. The Advanced GAmma Tracking Array (AGATA) represents a groundbreaking development in gamma-ray spectrometers, boasting the ability to track gamma-rays within the detector. This capability leads to exceptional position resolution which ensures optimal Doppler corrections.
AGATA's design features high-purity germanium crystals, with each crystal divided electrically into 36 segments for enhanced detection accuracy. The core of AGATA's position resolution lies in the Pulse Shape Analysis (PSA) algorithm, responsible for pinpointing gamma-ray interaction locations. This algorithm functions by matching observed signals with a pre-established database of signals. However, the current model of relying solely on simulated signals for the PSA database presents limitations. In contrast, utilizing experimental data for building the PSA database promises significant improvements in accuracy and efficiency.
The experimental data are acquired by scanning the crystal using collimated gamma-ray sources. Using what is known as the Strasbourg Scanning Table, the crystal is scanned both horizontally and vertically; the gathered signals are then matched using the Pulse Shape Coincidence Scan (PSCS) algorithm to be assigned to a unique 3D position. The PSCS is notably time-intensive, requiring several days to analyse entire datasets.
In this work, we propose a new algorithm to replace the PSCS, based on machine learning techniques. Specifically, we employed Long Short-Term Memory (LSTM) networks, renowned for their robustness and their ability to decipher time series. The loss function has been adapted to incorporate Strasbourg’s scanning table specificities. The processing time of the signals was brought down to only about an hour using this model. Different metrics were used to compare our new results to the PSCS reference, indicating a greater consistency and accuracy.
This contribution presents the ERC-funded project NuRadioOpt, which aims to substantially increase the detection rate of ultra-high-energy (UHE) cosmic neutrinos for large in-ice radio arrays such as the Radio Neutrino Observatory Greenland (RNO-G, under construction) and the envisioned IceCube-Gen2 project. These detectors consist of autonomous compact detector stations with very limited power (~20W) and bandwidth (10kB/s) that record the signals from multiple antennas at approx. 1 Gsamples/s. I will present neural networks replacing the threshold-based trigger foreseen for future detectors that increase the detection rate of UHE neutrinos by up to a factor of two at negligible additional cost. As the expected detection rates are low, a neural-network-based trigger will substantially enhance the science capabilities of UHE neutrino detectors, e.g., IceCube-Gen2 will be able to measure the neutrino-nucleon cross-section at EeV energies with more than 2x smaller uncertainty. I will report on an efficient FPGA implementation to run the neural networks under strict power constraints and show lab tests demonstrating the performance under realistic conditions. I will briefly report on a new DAQ system, currently under development, using recent advances in fast, low-power ADCs to run the algorithms in real-time and give an outlook of how neuromorphic computing could further increase power efficiency.
We present the preparation, deployment, and testing of an autoencoder trained for unbiased detection of new physics signatures in the CMS experiment Global Trigger (GT) test crate FPGAs during LHC Run 3. The GT makes the final decision whether to read out or discard the data from each LHC collision, which occur at a rate of 40 MHz, within a 50 ns latency. The neural network makes a prediction for each event within these constraints, which can be used to select anomalous events for further analysis. The implementation occupies a small percentage of the resources of the system's Virtex 7 FPGA in order to function in parallel with the existing logic. The GT test crate is a copy of the main GT system that receives the same input data but whose output is not used to trigger the readout of CMS; it thus provides a platform for thorough testing of new trigger algorithms on live data without interrupting data taking. We describe the methodology to achieve ultra-low-latency anomaly detection, and present the integration of the DNN into the GT test crate, as well as the monitoring, testing, and validation of the algorithm during proton collisions.
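The principle of autoencoder-based anomaly detection can be sketched in a few lines: an event is encoded, decoded, and flagged when the reconstruction error is large. The example below is a deliberately tiny stand-in, assuming a tied-weight linear autoencoder with fixed (untrained) weights `W`, two input features and an arbitrary threshold; none of these choices come from the abstract, and the real trigger network is a quantised DNN running in firmware.

```python
import math

# Hypothetical tied-weight linear autoencoder with a one-dimensional latent space.
# In practice the weights would be learned from ordinary collision events.
W = [1 / math.sqrt(2), 1 / math.sqrt(2)]

def encode(x):
    """Project the input onto the latent direction."""
    return sum(w * xi for w, xi in zip(W, x))

def decode(z):
    """Map the latent value back to the input space."""
    return [z * w for w in W]

def anomaly_score(x):
    """Mean squared reconstruction error, used as the anomaly score."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def is_anomalous(x, threshold=0.1):
    """Flag events whose score exceeds an assumed threshold."""
    return anomaly_score(x) > threshold

typical_event = [2.0, 2.0]    # lies along the direction the autoencoder represents
unusual_event = [1.0, -1.0]   # orthogonal to it, so it is reconstructed poorly
```

Because the score needs no signal hypothesis, the selection is model-agnostic; the threshold then sets the accepted trigger rate.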
"Data deluge" refers to the situation where the sheer volume of new data generated overwhelms the capacity of institutions to manage it and researchers to use it[1]. Data Deluge is becoming a common problem in industry and big science facilities like the synchrotron laboratory MAX IV and the Large Hadron Collider at CERN[2].
As a novel solution to this problem, a small cross-disciplinary collaboration of researchers has developed a machine learning-based data compression tool called "Baler". Developed as an open-source project[3] and delivered as an easy-to-use pip package[4], the machine learning-based technique of Baler allows researchers to derive lossy compression algorithms tailored to their data sets[5]. This compression method yields substantial data reduction and can compress scientific data to 1% of its original size.
With recent successes, Baler performed compression and decompression of data on field-programmable gate arrays (FPGAs). This "real-time" compression enables data to be compressed at high rates and transferred in greater amounts over small bandwidths, allowing Baler to extend its reach into the field of bandwidth compression.
This contribution will give an overview of the Baler software tool and present results from particle physics, X-ray ptychography, computational fluid dynamics, and telecommunications.
[1] https://www.hpe.com/us/en/what-is/data-deluge.html
[2] https://cerncourier.com/a/time-to-adapt-for-big-data/
[3] https://github.com/baler-collaboration/baler
[4] https://pypi.org/project/baler-compressor/
[5] https://arxiv.org/abs/2305.02283
The Phase-II upgrade of the LHC will increase its instantaneous luminosity by a factor of 5-7 leading to the HL-LHC. The ATLAS Liquid Argon (LAr) calorimeter measures the energy of particles produced in LHC collisions. In order to enhance the ATLAS physics discovery potential in the blurred environment created by the pileup, it is crucial to have an excellent energy resolution and an accurate detection of the energy-deposit time.
The energy computation is currently done using optimal filtering algorithms that assume a nominal pulse shape of the electronic signal. Up to 200 simultaneous proton-proton collisions are expected at the HL-LHC, which leads to a high rate of overlapping signals in a given calorimeter channel. This results in a significant degradation of the energy resolution, especially for small time gaps between two consecutive pulses. We developed several neural network (NN) architectures showing significant performance improvements with respect to the filtering algorithms. These NNs are capable of recovering the degraded performance in the small-time-gap region by using the information from past events.
The energy computation is performed in real time using dedicated electronic boards based on FPGAs. FPGAs are chosen for their capacity to treat large amounts of data (O(1Tb/s) per FPGA) with low latency (O(1000ns)). The back-end electronic boards for the Phase-II upgrade of the LAr calorimeter will use the next high-end generation of Intel FPGAs with increased processing power. This is a unique opportunity to develop more complex algorithms on these boards. Several hundred channels should be treated by each FPGA, and thus several hundred NNs should run on one FPGA. The energy computation should be done at a fixed latency of the order of 100 ns. The main challenge is to meet these stringent requirements in the firmware implementation.
Special effort was dedicated to minimising the number of computational operations while optimising the NN architectures. Each internal operation of the NNs is optimised during the firmware implementation. This includes the implementation of complex mathematical functions in lookup tables (LUTs), quantisation of arithmetic operations using fixed-point representations and rounding, and optimisation of the usage of FPGA logic elements. The firmware implementation results are compared to software and the resolution due to firmware approximations was found to be around 1%.
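The two firmware techniques named above, fixed-point quantisation with rounding and lookup-table evaluation of a nonlinear function, can be emulated in software as follows. This is a minimal sketch: the word length, the number of fractional bits, the table size and range, and the choice of the sigmoid as the tabulated function are assumptions for illustration, not the parameters of the actual LAr firmware.

```python
import math

FRAC_BITS = 8    # assumed number of fractional bits
TOTAL_BITS = 16  # assumed signed word length
SCALE = 1 << FRAC_BITS

def to_fixed(value):
    """Quantise a float to a signed fixed-point integer with rounding and saturation."""
    q = int(round(value * SCALE))
    lo, hi = -(1 << (TOTAL_BITS - 1)), (1 << (TOTAL_BITS - 1)) - 1
    return max(lo, min(hi, q))

def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / SCALE

def fixed_mul(a, b):
    """Multiply two fixed-point numbers and round the product back to FRAC_BITS."""
    return (a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS

# Lookup table for the sigmoid on [-8, 8), sampled at the centre of each bin.
LUT_MIN, LUT_MAX, LUT_SIZE = -8.0, 8.0, 256
STEP = (LUT_MAX - LUT_MIN) / LUT_SIZE
SIGMOID_LUT = [
    to_fixed(1.0 / (1.0 + math.exp(-(LUT_MIN + (i + 0.5) * STEP))))
    for i in range(LUT_SIZE)
]

def sigmoid_lut(q):
    """Approximate the sigmoid of a fixed-point input by table lookup."""
    index = int((to_float(q) - LUT_MIN) / STEP)
    index = max(0, min(LUT_SIZE - 1, index))  # clamp inputs outside the table range
    return SIGMOID_LUT[index]

product = fixed_mul(to_fixed(1.5), to_fixed(2.0))
approx = to_float(sigmoid_lut(to_fixed(0.7)))
exact = 1.0 / (1.0 + math.exp(-0.7))
```

Comparing `approx` with `exact` over the expected input range is the software analogue of the firmware-versus-software comparison described in the abstract; the table size and bit widths trade resource usage against that discrepancy.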
NN algorithms based on CNN, RNN, and LSTM architectures will be presented. The improvement of the energy resolution compared to the legacy filter algorithms will be discussed. The results of firmware implementation in VHDL and Quartus HLS will be presented. The implementation results on Stratix 10 and Agilex Intel FPGAs, including the resource usage, latency, and operation frequency, will be reported. Optimised implementations in VHDL are shown to fit the stringent requirements of the LASP firmware specifications. Additionally, a test of one NN implementation on a Stratix 10 Intel development kit will be presented.
The expected increase in the recorded dataset for future upgrades of the main experiments at the Large Hadron Collider (LHC) at CERN, including the LHCb detector, while having a limited bandwidth, comes with computational challenges that classic computing struggles to solve. Emerging technologies such as Quantum Computing (QC), which exploits the principles of superposition and interference, have great potential to play a major role in solving these issues.
Significant progress has been made in the field of QC applied to particle physics, laying the ground for applications closer to a realistic scenario, especially for the reconstruction of charged-particle trajectories within experimental setups like LHCb. This is one of the biggest computational challenges for such an experiment, as it must be performed at a high rate of $10^{10}$ tracks per second while maintaining a very high reconstruction efficiency.
In this talk, the application of two of the most well-known QC algorithms to track reconstruction at one of the main LHC experiments, LHCb, will be presented: the Harrow-Hassidim-Lloyd (HHL) algorithm for solving linear systems of equations and the Quantum Approximate Optimization Algorithm (QAOA), specialized in combinatorial problems. Results of running the algorithms on increasingly complex simulated events will be shown, including actual LHCb simulated samples. Finally, ongoing and future work to make these algorithms run efficiently on QC hardware will be discussed.
We have been studying the use of deep neural networks (DNNs) to identify and locate primary vertices (PVs) in proton-proton collisions at the LHC. Previously reported results demonstrate that a hybrid architecture, using a fully connected network (FCN) as the first stage and a convolutional neural network (CNN) as the second stage provides better efficiency than the default heuristic algorithm for the same low false positive rate. The input features are individual track parameters and the output is a list of PV positions in the beam direction.
More recently, we have studied how replacing the hybrid architecture with a Graph Neural Network (GNN) can improve the predictions of PV positions directly from track parameters, and also enable track-to-vertex associations. The latter opens the way to additional predictions of secondary vertex (SV) positions and SV-to-PV associations. For the first time, we report the results of these preliminary studies, and discuss the advantages and disadvantages of using GNNs compared to our hybrid FCN+CNN architecture.
This work continues our prior study (doi:10.1093/mnras/stac3797), in which data from a single detector of the Einstein Telescope (ET) were evaluated for the detection of binary black holes (BBHs) using deep learning (DL). Here we explore the detection efficiency of BBHs using data combined from all three proposed detectors of ET, with five different lower frequency cutoffs ($F_{low}$): 5 Hz, 10 Hz, 15 Hz, 20 Hz and 30 Hz, and the same SNR ranges used previously: 4-5, 5-6, 6-7, 7-8 and >8. Using the ResNet model (which had the best overall performance on single-detector data), the detection accuracy improved from $60\%$, $60.5\%$, $84.5\%$, $94.5\%$ and $98.5\%$ to $78.5\%$, $84\%$, $99.5\%$, $100\%$ and $100\%$ for sources with SNR of 4-5, 5-6, 6-7, 7-8 and >8 respectively. The results show a great improvement in accuracy for the lower SNR ranges 4-5, 5-6 and 6-7, by $18.5\%$, $24.5\%$ and $13\%$ respectively, and by $5.5\%$ and $1.5\%$ for the higher SNR ranges 7-8 and >8 respectively. In a qualitative evaluation, the ResNet model was able to detect sources at 86.601 Gpc, with an averaged SNR of 3.9 (averaged over the three detectors) and a chirp mass of 13.632, at 5 Hz. It was also shown that the combined data of the three detectors are suitable for near-real-time detection, which can be significantly improved using a more powerful setup.
Machine learning (ML) plays a significant role in data mining at high-energy physics experiments. An overview of ML applications at the ATLAS experiment will be shown, with highlights in searches for physics beyond the Standard Model using anomaly detection and active learning. Additionally, advances in object reconstruction and improvements in simulation using ML will be shown.
In particle collider experiments, such as the ATLAS and CMS experiments at CERN, high-energy particles collide and shatter into a plethora of charged particles traversing a silicon detector and leaving energy deposits, or hits, on the detector modules. The reconstruction of charged-particle trajectories (tracks) from these hits, an integral part in any physics program at the Large Hadron Collider (LHC), ranks among the most demanding tasks in offline computing, and, due to an increased level of pileup, faces steep challenges in computational resources and complexity in the upcoming High Luminosity phase (HL-LHC). Track pattern recognition algorithms based on Graph Neural Networks (GNNs) have been demonstrated as a promising approach to these problems [1,2,3,4]. In this contribution, we present the first machine learning pipeline developed for track reconstruction in silicon detectors. Motivated by the ATLAS ITk, we propose to apply this AI algorithm at an early stage in the processing chain, on every recorded event using raw data from the tracking detector. We discuss machine learning techniques employed in various stages of our pipeline, including building graphs from detector outputs, graph filtering, edge classification with GNNs, and graph segmentation to yield tracks. We address the unique memory and time constraints associated with running a deep-learning algorithm at a low-level data processing stage, and how we meet these requirements with our model design. The pipeline's physics and computational performance will be demonstrated, along with optimisations that reduce computational cost without affecting physics performance. We also describe the challenges to deployment in the HL-LHC and our steps toward a seamless integration into existing analysis software at CERN, highlighting our commitment to advancing AI-based track reconstruction for high-energy physics.
References:
[1] Biscarat, Catherine et al. “Towards a realistic track reconstruction algorithm based on graph neural networks for the HL-LHC”. In: EPJ Web Conf. 251 (2021), p. 03047. doi: 10.1051/epjconf/202125103047. URL: https://doi.org/10.1051/epjconf/202125103047.
[2] Xiangyang Ju et al. “Performance of a geometric deep learning pipeline for HL-LHC particle tracking”. In: The European Physical Journal C 81.10 (Oct. 2021). issn: 1434-6052. doi: 10.1140/epjc/s10052-021-09675-8. URL: http://dx.doi.org/10.1140/epjc/s10052-021-09675-8.
[3] Sylvain Caillou et al. Physics Performance of the ATLAS GNN4ITk Track Reconstruction Chain. Tech. rep. Geneva: CERN, 2023. URL: https://cds.cern.ch/record/2871986.
[4] Heberth Torres. “Physics Performance of the ATLAS GNN4ITk Track Reconstruction Chain”. In: (2023). URL: https://cds.cern.ch/record/2876457.
Research on Universe and Matter (ErUM) at major infrastructures such as CERN or large observatories, jointly conducted with university groups, is an important driver for the digital transformation. In Germany, about 20,000 scientists are working on ErUM-related sciences and can benefit from current methods of artificial intelligence. The central networking and transfer office ErUM-Data-Hub provides support by designing, organizing and running schools and workshops for young and expert scientists in the areas of big data, machine learning, sustainable computing and many more. We present the current achievements of the ErUM-Data-Hub in the German ErUM community since its implementation in early 2022.
Isotopic composition measurements of singly-charged Cosmic Rays (CR) provide essential insights into CR transport in the Galaxy. The Alpha Magnetic Spectrometer (AMS-02) can identify singly-charged isotopes up to about 10 GeV/n. However, their identification presents challenges due to the small abundance of CR deuterons compared to the proton background. In particular, a high accuracy for the velocity measured by the Ring Imaging Cherenkov Detector (RICH) is needed to achieve a good isotopic mass separation over a wide range of energies.
The velocity measurement with the RICH is particularly challenging for Z=1 isotopes due to the low number of photons produced in the Cherenkov rings. This faint signal is easily disrupted by noisy hits, leading to a misreconstruction of the particle's ring. Hence, an efficient background reduction process is needed to ensure the quality of the reconstructed Cherenkov rings and provide a correct measurement of the particle's velocity. Machine Learning methods, particularly Boosted Decision Trees, are well suited for this task, but their performance relies on the choice of the features needed for their training phase. While physics-driven feature selection methods based on the knowledge of the detector are often used, Machine Learning algorithms for automated feature selection can provide a helpful alternative that optimises the classification method's performance. We compare five algorithms for selecting the feature samples for RICH background reduction, achieving the best results with the Random Forest method. We also test its performance against the physics-driven selection method, obtaining better results.
The interTwin project, funded by the European Commission, is at the forefront of leveraging 'Digital Twins' across various scientific domains, with a particular emphasis on physics and earth observation. One of the most advanced use-cases of interTwin is event generation for particle detector simulation at CERN. interTwin enables particle detector simulations to leverage AI methodologies on resources ranging from the cloud to high-performance computing (HPC) systems by using itwinai, the AI workflow and method lifecycle module of interTwin.
The itwinai module, a comprehensive solution for AI workflow and method lifecycle developed collaboratively by CERN and the Jülich Supercomputing Centre (JSC), serves as the cornerstone for researchers, data scientists, and software engineers engaged in developing, training, and maintaining AI-based methods for scientific applications, such as particle event generation. Its role is advancing interdisciplinary scientific research through the synthesis of learning and computing paradigms. This framework stands as a testament to the commitment of the interTwin project towards co-designing and implementing an interdisciplinary Digital Twin Engine. Its main functionalities and contributions are:
Distributed Training: itwinai offers a streamlined approach to distributing existing code across multiple GPUs and nodes, automating the training workflow. Leveraging industry-standard backends, including PyTorch Distributed Data Parallel (DDP), TensorFlow distributed strategies, and Horovod, it provides researchers with a robust foundation for efficient and scalable distributed training. The successful deployment and testing of itwinai on JSC's HDFML cluster underscore its practical applicability in real-world scenarios.
Hyperparameter Optimization: One of the core functionalities of itwinai is its hyperparameter optimization, which plays a crucial role in enhancing model accuracy. By intelligently exploring hyperparameter spaces, itwinai eliminates the need for manual parameter tuning. The functionality, powered by Ray Tune, contributes significantly to the development of more robust and accurate scientific models.
Model Registry: A key aspect of itwinai is its provision of a robust model registry. This feature allows researchers to log and store models along with associated performance metrics, thereby enabling comprehensive analyses in a convenient manner. The backend, leveraging MLflow, ensures seamless model management, enhancing collaboration and reproducibility.
In line with the theme of the first European AI for Fundamental Physics Conference of connecting AI activities across various branches of fundamental physics, interTwin and its use-cases empowered by itwinai are positioned at the intersection of AI, computation and fundamental physics. itwinai emerges as a catalyst for the synthesis of physics-based simulations with novel machine learning and AI-based methods. This interconnectedness is essential for addressing grand challenges, such as developing novel materials for building new computing platforms for AI for physics.
In conclusion, itwinai is a valuable and versatile resource, empowering researchers and scientists to embark on collaborative and innovative scientific research endeavors across diverse domains. The integration of physics-based digital twins and AI frameworks broadens possibilities for exploration and discovery through itwinai’s user-friendly interface and powerful functionalities.
Nested sampling is a tool for posterior estimation and model comparison across a wide variety of cross-disciplinary fields, and is used in Simulation Based Inference and AI emulation. This talk explores the performance and accuracy gains to be made in high dimensional nested sampling by rescuing the discarded likelihood evaluations available in present nested sampling runs, and is thus useful to all users of nested sampling.
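For readers unfamiliar with the algorithm, the sketch below shows plain nested sampling on a toy one-dimensional problem, and counts the likelihood evaluations that a standard run performs but then throws away. It is not the method of the talk: the Gaussian likelihood, the uniform prior on [-5, 5], the rejection-sampling replacement step and all function names are assumptions chosen to keep the example short.

```python
import math
import random

def likelihood(x):
    """Unit Gaussian likelihood (toy example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def nested_sampling(n_live=200, n_iter=1000, lo=-5.0, hi=5.0, seed=0):
    """Estimate the evidence Z for a uniform prior on [lo, hi]."""
    rng = random.Random(seed)
    live = [rng.uniform(lo, hi) for _ in range(n_live)]
    live_L = [likelihood(x) for x in live]
    evidence = 0.0
    x_prev = 1.0              # prior volume enclosed by the current contour
    n_evaluations = n_live    # total likelihood calls, including rejected draws
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_L.__getitem__)
        x_curr = math.exp(-i / n_live)   # expected prior-volume compression
        evidence += live_L[worst] * (x_prev - x_curr)
        x_prev = x_curr
        # Replace the worst point with a prior draw of higher likelihood.
        # Draws that fail the constraint are discarded in this plain version.
        threshold = live_L[worst]
        while True:
            candidate = rng.uniform(lo, hi)
            cand_L = likelihood(candidate)
            n_evaluations += 1
            if cand_L > threshold:
                break
        live[worst], live_L[worst] = candidate, cand_L
    # Add the contribution of the remaining live points.
    evidence += x_prev * sum(live_L) / n_live
    return evidence, n_evaluations

Z, n_eval = nested_sampling()
# The analytic evidence for this toy problem is very close to 0.1.
```

Only `n_live + n_iter` of the `n_eval` likelihood calls enter the evidence estimate here; the remainder are the kind of discarded evaluations that the talk proposes to reuse.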
Phenomenological analyses in beyond the Standard Model (BSM) theories assess the viability of BSM models by testing them against current experimental data, aiming to explain new physics signals. However, these analyses face significant challenges. The parameter spaces of BSM models are commonly large and high-dimensional. The regions capable of accommodating a combination of experimental results are often sparse and potentially disconnected. Moreover, the numerical evaluation of each configuration is computationally expensive.
To address these challenges, our work introduces a batched Multi-Objective Constrained Active Search approach. Physical observables and statistical tests, such as particle masses and $\chi^2$-tests from experimental data respectively, are defined as the objectives with pre-defined constraints. We use probabilistic models as surrogates for the objectives to enhance sample efficiency, and a volume-based active sampling strategy that uses the surrogates to effectively characterise and populate satisfactory regions within the parameter space of BSM models.
We employ the algorithm with the B-L SSM model to accommodate an observed signal at $\sim 95$ GeV in neutral scalar searches in the $h \rightarrow \gamma \gamma$ channel. We found that the algorithm efficiently identifies satisfactory regions in the parameter space of the B-L SSM model, improving on previous studies of this model. We conclude by outlining future directions for this research, and sharing the developed tools for the community's use.
The GeV gamma-ray sky, as observed by the Fermi Large Area Telescope (Fermi LAT), harbours a plethora of localised point-like sources. At high latitudes ($|b| >30^{\circ}$), most of these sources are of extragalactic origin. The source-count distribution as a function of their flux, $\mathrm{d}N/\mathrm{d}S$, is a well-established quantity to summarise this population. We employ sequential simulation-based inference using the truncated marginal neural ratio estimation (TMNRE) algorithm on 15 years of Fermi LAT data to infer the $\mathrm{d}N/\mathrm{d}S$ distribution between 1 and 10 GeV in this part of the sky. While our approach allows us to cross-validate existing results in the literature, we demonstrate that we can go further than mere parameter inference. We derive a source catalogue of detected sources at high latitudes in terms of position and flux obtained from a self-consistently determined detection threshold based on the LAT's instrument response functions and utilised gamma-ray background models.
Type Ia supernovae (SNae Ia) are instrumental in constraining cosmological parameters, particularly dark energy. State-of-the-art likelihood-based analyses scale poorly to future large datasets, are limited to simplified probabilistic descriptions of e.g. peculiar velocities, photometric redshift uncertainties, instrumental noise, and selection effects, and must explicitly sample a high-dimensional latent posterior to infer the few parameters of interest, which makes them inefficient.
I present a holistic simulation-based approach to SN Ia cosmology that addresses these issues. I demonstrate cosmological inference from 100 000 mock SNae Ia, as expected from future surveys like LSST, using truncated marginal neural ratio estimation and a method that uses the approximate posteriors to construct regions with guaranteed frequentist coverage. Using an improved simulator and neural network, I also perform parameter estimation and principled Bayesian model comparison from real light curve data, examining the interplay of dust extinction and magnitude differences related to host stellar mass. Lastly, I discuss a simulation-based treatment of selection effects and non-Ia contamination, wherein the simulator produces varying-size data sets processed by a set-based neural network. In the future, these components will be combined in a grand unified cosmological analysis of type Ia supernovae.
The Euclid space telescope will measure the shapes and redshifts of billions of galaxies, probing the growth of cosmic structures with unprecedented precision. However, the increased quality of these data also means a significant increase in the number of nuisance parameters, making cosmological inference a very challenging task. In this talk, I discuss the first application of Marginal Neural Ratio Estimation (MNRE), a recent approach in so-called simulation-based inference, to Euclid primary observables, like cosmic shear and galaxy-clustering spectra. Using expected Euclid experimental noise, I show how it is possible to recover the posterior distribution for the cosmological parameters using an order of magnitude fewer simulations than conventional likelihood-based methods. This result supports the view that MNRE is a powerful framework to analyse Euclid data (and potentially any cosmic data), allowing one to extend the model complexity beyond what is currently achievable with standard methods.
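The idea behind neural ratio estimation can be stated compactly. In the standard formulation (summarised here from the general simulation-based inference literature, not from this abstract), a binary classifier is trained to distinguish jointly drawn pairs $(x, \theta) \sim p(x, \theta)$ from independently drawn pairs $(x, \theta) \sim p(x)\,p(\theta)$, with equal class weights. The optimal classifier output $d^*$ then yields the likelihood-to-evidence ratio $r$ and, from it, the posterior:

```latex
d^*(x, \theta) = \frac{p(x, \theta)}{p(x, \theta) + p(x)\,p(\theta)}, \qquad
r(x, \theta) \equiv \frac{p(x \mid \theta)}{p(x)} = \frac{d^*(x, \theta)}{1 - d^*(x, \theta)}, \qquad
p(\theta \mid x) = r(x, \theta)\, p(\theta).
```

In the marginal variant, $\theta$ stands for only the parameters of interest; the nuisance parameters are varied in the simulations but never passed to the classifier, so they are marginalised implicitly.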
In some sense, the detection of a stochastic gravitational wave background (SGWB) is one of the most subtle GW analysis challenges facing the community in the next-generation detector era. For example, at an experiment such as LISA, to extract the SGWB contributions, we must simultaneously: detect and analyse thousands of highly overlapping sources including massive binary black hole mergers and galactic binaries; constrain and characterise the instrumental noise (which will not be known fully pre-flight and may be non-stationary); and finally separate the SGWB components that might be astrophysical or cosmological in origin. In this talk, I will discuss the application of simulation-based inference techniques, implemented in the code saqqara, to this analysis problem, focussing on the ability of SBI to strike a balance between the potentially conflicting goals of precision, scalability, and computational cost.
COSMOPOWER is a state-of-the-art Machine Learning framework adopted by all major Large-Scale Structure (LSS) and Cosmic Microwave Background (CMB) international collaborations for acceleration of their cosmological inference pipelines. It achieves orders-of-magnitude acceleration by replacing the expensive computation of cosmological power spectra, traditionally performed with a Boltzmann code, with neural emulators.
I will present recent additions to COSMOPOWER which render it into a fully-differentiable library for cosmological inference. I will demonstrate how it is possible to use its differentiable emulators to enable scalable and efficient statistical inference by means of hierarchical modelling and simulation-based inference. Leveraging the benefits of automatic differentiation, XLA optimisation, and the ability to run on GPUs and TPUs, COSMOPOWER allows for efficient sampling through gradient-based methods and significantly enhances the performance of neural density estimation for simulation-based inference, augmenting it with the simulator gradients. Additionally, I will show how COSMOPOWER allows the user to create end-to-end pipelines that achieve unparalleled accuracy in the final cosmological constraints, due to their ability to efficiently sample an unprecedentedly large number of nuisance parameters for the modelling of systematic effects.
With new astronomical surveys, we are entering a data-driven era in cosmology. Modern machine learning methods are up for the task to optimally learn the Universe from low to high redshift. In 3D, tomography of the large-scale structure (LSS) via the 21cm line of hydrogen targeted by the SKA (Square Kilometre Array) can both teach about properties of sources and gaseous media between, while producing data rates of TB/s. In this talk I first showcase the use of networks that are tailored to directly infer fundamental properties from such tomographic data. I compare network models and highlight how a comparably simple 3D architecture (3D-21cmPIE-Net) that mirrors the data structure performs best. I present well-interpretable gradient-based saliency and discuss robustness against foregrounds and systematics via transfer learning. I complement these findings with a discussion of lower redshift results for the recent SKA Data Challenge, where hydrogen sources were to be detected and characterised in a TB cube. I will highlight my team’s lessons-learned; our networks performed especially well when asked to characterise flux and size of sources over several orders in signal-to-noise. Finally, moving from 3D to 1D, for the classification infrastructure group of the new ESO workhorse 4MOST (4-metre Multi-Object Spectroscopic Telescope), I detail the official object classification pipeline layer set to efficiently group the ~40,000 spectra per night (40 million in total) the instrument will collect.
This talk presents a novel approach to dark matter direct detection using anomaly-aware machine learning techniques in the DARWIN next-generation dark matter direct detection experiment. I will introduce a semi-unsupervised deep learning pipeline that falls under the umbrella of generalized Simulation-Based Inference (SBI), an approach that allows one to effectively learn likelihoods straight from simulated data, without the need for complex functional dependence on systematics or nuisance parameters. I also present an inference procedure to detect non-background physics utilizing an anomaly function derived from the loss functions of the semi-unsupervised architecture. The pipeline's performance is evaluated using pseudo-data sets in a sensitivity forecasting task, and the results suggest that it offers improved sensitivity over traditional methods.
PolyChord was originally advertised encouraging users to experiment with their own clustering algorithms. Identifying clusters of nested sampling live points is critical for PolyChord to perform nested sampling correctly. We have updated the Python interface of PolyChordLite to allow straightforward substitution of different clustering methods.
Recent reconstructions of the primordial matter power spectrum $\mathcal P_\mathcal R(k)$ with a flex-knot revealed that the K-Nearest-Neighbours algorithm used by PolyChord cannot reliably detect the two posterior modes caused by cosmic variance and detector resolution. After exploring a number of different algorithms, we have found the X-means algorithm to be a reliable substitute for the power spectrum reconstruction.
This work prompted the development of additions to the post-processing tool anesthetic, allowing posterior modes corresponding to different live point clusters to be analysed and plotted independently.
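As a rough illustration of what a substitutable clustering method does, the sketch below maps a set of live-point positions to one integer label per point. It is not PolyChord's or PolyChordLite's actual interface: the function name `kmeans_labels` and its signature are hypothetical, and plain k-means with a fixed number of clusters stands in for X-means, which additionally chooses the number of clusters itself.

```python
import math
import random


def kmeans_labels(points, k, n_iter=50):
    """Assign each point (a sequence of floats) to one of k clusters.

    Plain Lloyd's k-means with deterministic farthest-point initialisation.
    Returns a list of integer labels, one per point.
    """
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centres chosen so far.
    centres = [list(points[0])]
    while len(centres) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every point.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy "live points" drawn from two well-separated posterior modes.
rng = random.Random(0)
mode_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(50)]
mode_b = [(rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)) for _ in range(50)]
labels = kmeans_labels(mode_a + mode_b, k=2)
```

With labels of this kind, each mode can be treated as a separate set of live points, which is the property the abstract identifies as critical.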
This study explores the inference of BSM models and their parameters from kinematic distributions of collider signals through an n-channel 1D-Convolutional Neural Network (n1D-CNN). Our approach enables simultaneous inference from distributions of any fixed number of observables. As our training data are computationally expensive simulations, we also introduce a novel data augmentation technique that fully utilizes generated data. This involves adapting our architecture to include auxiliary information as additional inputs, allowing inference from any signal regions using the same trained network. To illustrate our approach, we apply the method to mono-X signals for inferring parameters of dark matter models.
Sensitivity forecasts inform the design of experiments and the direction of theoretical efforts. To arrive at representative results, Bayesian forecasts should marginalize their conclusions over uncertain parameters and noise realizations rather than picking fiducial values. However, this is typically computationally infeasible with current methods for forecasts of an experiment’s ability to distinguish between competing models. We thus propose a novel simulation-based methodology utilizing neural Bayes ratio estimators capable of providing expedient and rigorous Bayesian model comparison forecasts without relying on restrictive assumptions.
Flavour-tagging, the identification of jets originating from b and c quarks, is a critical component of the physics programme of the ATLAS experiment. Current flavour-tagging algorithms rely on the outputs of “low level” taggers, which are a mixture of manually optimised, physically informed algorithms and machine learning models. A new approach instead uses a single machine learning model which is trained end-to-end and does not require inputs from existing low level taggers, leading to reduced overall complexity and enhanced performance. The model uses a Graph Neural Network/Transformer architecture to combine information from a variable number of tracks within a jet in order to simultaneously predict the flavour of the jet, the partitioning of tracks in the jet into vertices, and information about the physical origin of the tracks. The auxiliary training tasks are shown to improve performance, whilst also providing insight into the physics of the jet and increasing the explainability of the model. This approach compares favourably with existing state-of-the-art methods, in particular in the challenging high transverse momentum environment, and for b- vs c-jet discrimination, leading to improved c-tagging.
Please note that this is an ATLAS talk. If selected, a speaker will be appointed later on.
Our primary objective is to achieve a pioneering measurement of the challenging $gg\rightarrow ZH$ process in Large Hadron Collider (LHC) data to extract new physics contributions in the context of the Standard Model Effective Field Theory (SMEFT) framework. By leveraging the power of the multi-head attention mechanism within Transformer encoders, we developed an innovative approach to efficiently capture long-range dependencies and contextual information in sequences of particle-collision-event final-state objects. This new technique enhances our ability to extract SMEFT parameters that are not well constrained by other measurements and deepens our understanding of fundamental interactions within the Higgs-boson sector. This presentation showcases the versatility of Transformer networks beyond their original domain and presents new opportunities for advanced data-driven physics research at the LHC.
Generative networks are promising tools for fast event generation for the LHC, yet struggle to meet the required precision when scaling up to particles in the final state. We employ the flexibility of autoregressive transformers to tackle this challenge, focusing on Z and top quark pair production with additional jets. We demonstrate the use of classifiers in combination with the autoregressive transformer to further improve the precision of the generated distributions.
Particle track reconstruction is a fundamental aspect of experimental analysis in high-energy particle physics. Conventional methodologies for track reconstruction are suboptimal in terms of efficiency in anticipation of the High Luminosity phase of the Large Hadron Collider. This has motivated researchers to explore the latest developments in deep learning for their scalability and potential enhanced inference efficiency.
We assess the feasibility of three Transformer-inspired model architectures for hit clustering and classification. The first model uses an encoder-decoder architecture to reconstruct a track auto-regressively, given the coordinates of the first few hits. The second model employs an encoder-only architecture as a classifier, using predefined labels for each track. The third model, also utilising an encoder-only configuration, regresses track parameters, and subsequently assigns clusters in the track parameter space to individual tracks.
We discuss preliminary studies on a simplified dataset, showing high success rates for all models under consideration, alongside our latest results using the TrackML dataset from the 2018 Kaggle challenge. Additionally, we present our journey in the adaptation of models and training strategies, addressing the tradeoffs among training efficiency, accuracy, and the optimisation of sequence lengths within the memory constraints of the hardware at our disposal.
Supervised learning has been used successfully for jet classification and to predict a range of jet properties, such as mass and energy. Each model learns to encode jet features, resulting in a representation that is tailored to its specific task. But could the common elements underlying such tasks be combined in a single foundation model to extract features generically? To address this question, we explore self-supervised learning (SSL), inspired by its applications in the domains of computer vision and natural language processing. Besides offering a simpler and more resource-effective route when learning multiple tasks, SSL can be trained on unlabeled data, e.g. large sets of collision data. We demonstrate that a jet representation obtained through SSL can be readily fine-tuned for downstream tasks of jet kinematics prediction and jet classification. Compared to existing studies in this direction, we use a realistic full-coverage calorimeter simulation, leading to results that more faithfully reflect the prospects at real collider experiments.
The Advanced Virgo interferometer is a complex machine constantly monitored by a vast array of sensors, producing the auxiliary channels datastream. Many analytical tools aid in the task of navigating the information contained in the $\sim 10^5$ channels, but the limitations of linear algorithms can hinder their ability to correctly assess the health of the instrument. In this work we propose to exploit the non-linearity and the flexibility of Transformer algorithms to build an unsupervised tool capable of detecting anomalies in the auxiliary channels. The algorithm was able to flag periods of anomalous behaviour, perform real-time inference and give a quantitative measure of the health of each channel. This will help operators in quickly detecting previously hard-to-diagnose problems that arise in the instrument.
In 2015, the first gravitational wave from a binary black hole merger was detected and since then, LIGO-Virgo-KAGRA have observed many binary black hole mergers. However, identifying these cosmic events is computationally expensive. Therefore, fast data analysis will be essential in order to make future gravitational-wave observations a success. Template banks are used to identify potential gravitational-wave events but can take weeks to generate. In this research, machine learning is used to accelerate template bank generation by replacing direct computation of the ‘match’ with a multilayer perceptron (LearningMatch). The model is able to predict the ‘match’ to 1% accuracy and is 3 orders of magnitude faster than current methods. Once the trained model is integrated into the template bank generation algorithm (TemplateGeNN), a template bank can be generated in hours rather than weeks!
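To make the quantity being learned concrete, the sketch below computes a simplified ‘match’ between two templates directly: a normalised inner product maximised over relative time shifts. It rests on strong simplifying assumptions that are not taken from the abstract (a flat noise spectrum, real time-domain templates, cyclic shifts, no maximisation over phase), and `toy_template` is a hypothetical stand-in waveform rather than a binary black hole signal.

```python
import math


def inner(a, b):
    """White-noise inner product of two equally sampled real time series."""
    return sum(x * y for x, y in zip(a, b))


def match(h1, h2):
    """Normalised overlap of h1 and h2, maximised over cyclic time shifts.

    Simplifying assumptions: flat (white) noise spectrum and no
    maximisation over phase, unlike a full gravitational-wave match.
    """
    norm = math.sqrt(inner(h1, h1) * inner(h2, h2))
    best = max(inner(h1, h2[s:] + h2[:s]) for s in range(len(h2)))
    return best / norm


def toy_template(freq, n=256):
    """Hypothetical stand-in waveform: a Gaussian-windowed sinusoid."""
    return [
        math.exp(-((i - n / 2) / 30.0) ** 2) * math.sin(2 * math.pi * freq * i / n)
        for i in range(n)
    ]


same = match(toy_template(8.0), toy_template(8.0))        # identical templates
different = match(toy_template(8.0), toy_template(12.0))  # mismatched templates
```

Placing a template bank requires a very large number of such evaluations, which is why a fast learned surrogate for this function can shorten bank generation.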
In the LHCb experiment, during Run 2, more than 90% of the computing resources available to the Collaboration were used for detector simulation. The detector and trigger upgrades introduced for Run 3 allow the collection of larger datasets that, in turn, will require larger simulated samples. Despite the use of a variety of fast simulation options, the demands for simulations will far exceed the pledged resources.
To face upcoming and future requests for simulated samples, we propose Lamarr, a novel framework implementing a flash-simulation paradigm via parametric functions and deep generative models.
Integrated within the general LHCb Simulation software framework, Lamarr provides analysis-level variables, taking as input particles from physics generators and parameterizing the detector response and the reconstruction algorithms. Lamarr consists of a pipeline of machine-learning-based modules that, for selected sets of particles, introduce reconstruction errors or infer high-level quantities via (non-)parametric functions.
Good agreement is observed by comparing key reconstructed quantities obtained with Lamarr against those from the existing detailed Geant4-based simulation. A reduction of at least two orders of magnitude in the computational cost for the detector modeling phase of the LHCb simulation is expected when adopting Lamarr.
With metallic-magnetic calorimeters (MMCs) - like the maXs-detector series developed within this collaboration - promising new tools for high-precision X-ray spectroscopy applications have become available. Because of their unique working principles, MMCs combine several advantages over conventional energy- and wavelength-dispersive photon detectors. They can reach spectral resolving powers of up to $E / \Delta E \approx 6000$ (at $60\, \text{keV}$) [1] - comparable to crystal spectrometers. At the same time, they cover a broad spectral range of typically $1-100\, \text{keV}$, similar to semiconductor detectors. Combined with their excellent linearity [2] and a sufficiently fast rise time - e.g., for coincidence measurement schemes as shown in [3] - they are particularly well suited for fundamental physics research in atomic physics. However, because of their high sensitivity, external sources of noise like fluctuating magnetic fields or physical vibrations lead to measurement artifacts like temperature-dependent sensitivity drifts or the occurrence of satellite peaks (see for example [5]). Thus a shift from traditional analog to digital signal processing is necessary to exploit the detector's full potential.
During several successful benchmark experiments [3-6] a comprehensive signal analysis software framework was developed. However, setting up the detectors and analyzing their complex behavior involves a multitude of numerical values and hardware settings to be optimized in the process. This also requires several manual steps, which become increasingly difficult to manage with a growing number of pixels per detector. Therefore, the usage of artificial intelligence to help with the simplification of the process and a possible improvement of the results is planned for future investigation. Starting with a simple peak characterization for a more precise identification of false-positive trigger events, up to more demanding tasks like an auto-tuning procedure to optimize the various settings of the SQUID read-out per pixel, MMC operation gives rise to a wealth of opportunities to utilize novel AI technologies. In this work we will present our first steps and future plans regarding potential synergies between our quantum sensor technologies and AI-based algorithms for fundamental atomic physics research.
[1] J. Geist, Ph.D. Thesis, Ruprecht-Karls-Universität Heidelberg, Germany (2020)
[2] C. Pies et al., J. Low Temp. Phys. 167 (2012) 269–279
[3] P. Pfäfflein et al., Physica Scripta 97 (2022) 0114005
[4] M.O. Herdrich et al., X-Ray Spectrometry 49 (2020) 184–187
[5] M.O. Herdrich et al., Atoms 11 (2023) 13
[6] M.O. Herdrich et al., Eur. Phys. J. D 77 (2023) 125
Jet tagging is a crucial classification task in high energy physics. Recently the performance of jet tagging has been significantly improved by the application of deep learning techniques. In this talk, we introduce a new architecture for jet tagging: the particle dual attention transformer (P-DAT). This novel transformer architecture stands out by concurrently capturing both global and local information, while maintaining computational efficiency. Regarding the self attention mechanism, we have extended the established attention mechanism between particles to encompass the attention mechanism between particle features. The particle attention module computes particle level interactions across all the particles, while the channel attention module computes attention scores between particle features, which naturally captures jet level interactions by taking all particles into account. These two kinds of attention mechanisms can complement each other. Furthermore, we incorporate both the pairwise particle interactions and the pairwise jet feature interactions in the attention mechanism. We demonstrate the effectiveness of the P-DAT architecture in classic top tagging and quark–gluon discrimination tasks, achieving competitive performance compared to other benchmark strategies.
The matrix element method is the LHC inference method of choice for limited statistics. We present a dedicated machine learning framework, based on efficient phase-space integration, a learned acceptance and transfer function. It is based on a choice of INN and diffusion networks, and a transformer to solve jet combinatorics. We showcase this setup for the CP-phase of the top Yukawa coupling in associated Higgs and single-top production.
Uncertainty quantification (UQ) is crucial for reliable predictions in inverse problems, where the model parameters are inferred from limited and noisy data. Monte Carlo methods offer a powerful approach to quantifying uncertainty in inverse problems, but their effectiveness hinges on the accuracy of the input data. This talk explores the robustness of an inverse problem methodology that utilises Monte Carlo methods for uncertainty estimation in conjunction with a dense neural network to model the Parton Distribution Functions (PDFs), i.e. the functions that parametrise the momentum distribution of the elementary components of protons.
We employ a closure testing methodology to assess the faithfulness of the estimated uncertainties and evaluate the robustness of our fitting procedure under erroneous uncertainty estimates in the input data. Our results demonstrate the effectiveness of our methodology in handling inaccurate input uncertainty and highlight its potential for robust UQ in inverse problems.
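The Monte Carlo replica idea and the closure test can be sketched on a toy problem. In the example below a one-parameter straight-line model stands in for the dense neural network, and the known slope plays the role of the underlying law from which closure-test data are generated; all names and numbers are illustrative assumptions, not the authors' setup.

```python
import random
import statistics


def fit_slope(xs, ys, sigmas):
    """Weighted least-squares slope for the model y = a * x."""
    num = sum(x * y / s**2 for x, y, s in zip(xs, ys, sigmas))
    den = sum(x * x / s**2 for x, s in zip(xs, sigmas))
    return num / den


def replica_fits(xs, ys, sigmas, n_replicas, rng):
    """Fit one model per pseudo-data replica drawn around the measured ys."""
    fits = []
    for _ in range(n_replicas):
        replica = [rng.gauss(y, s) for y, s in zip(ys, sigmas)]
        fits.append(fit_slope(xs, replica, sigmas))
    return fits


rng = random.Random(1)
true_slope = 2.0                      # known law used in the closure test
xs = [0.5 * i for i in range(1, 21)]
sigmas = [0.3] * len(xs)              # quoted input uncertainties
data = [rng.gauss(true_slope * x, s) for x, s in zip(xs, sigmas)]

fits = replica_fits(xs, data, sigmas, n_replicas=500, rng=rng)
central = statistics.mean(fits)       # central value of the fit
uncertainty = statistics.stdev(fits)  # spread of the replica ensemble
```

Because the truth is known, one can check whether `central` lies within a few `uncertainty` of `true_slope`. Repeating the exercise with `sigmas` deliberately misquoted is one way to probe robustness against erroneous input uncertainties.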
The problem of comparing two high-dimensional samples to test the null hypothesis that they are drawn from the same distribution is a fundamental question in statistical hypothesis testing. This study presents a comprehensive comparison of various non-parametric two-sample tests, specifically focusing on their statistical power in high-dimensional settings. The tests are built from univariate tests and are selected for their computational efficiency, as they all possess closed-form expressions as functions of the marginal empirical distributions. We use toy mixture-of-Gaussians models with dimensions ranging from 5 to 100 to evaluate the performance of different test-statistics: the mean of 1D Kolmogorov-Smirnov (KS) test-statistics, the sliced KS test-statistic, and the sliced-Wasserstein distance. We also add to the comparison two recently proposed multivariate two-sample tests, namely the Fréchet and kernel physics distances, and compare all test-statistics against a likelihood ratio test, which serves as the gold standard due to the Neyman-Pearson lemma. All tests are implemented in Python using TensorFlow2 and made available on GitHub (https://github.com/NF4HEP/GenerativeModelsMetrics). This allows us to leverage hardware acceleration for efficient computation of the test-statistic distribution under the null hypothesis on Graphics Processing Units. Our findings reveal that while the likelihood ratio test-statistic remains the most powerful, certain non-parametric tests exhibit competitive performance in specific high-dimensional scenarios. This study provides valuable insights for practitioners in selecting the most appropriate two-sample test for evaluating generative models, thereby contributing to the broader field of model evaluation and statistical hypothesis testing.
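The simplest of the statistics listed above, the mean of 1D KS test-statistics over the marginals, can be sketched in a few lines. This is a pure-Python illustration written for this note, not the TensorFlow2 implementation from the repository; the helper names and the toy Gaussian samples are assumptions.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def mean_ks(sample_a, sample_b):
    """Average of the 1D KS statistics over all marginal distributions."""
    dims = range(len(sample_a[0]))
    stats = [
        ks_statistic([p[k] for p in sample_a], [p[k] for p in sample_b])
        for k in dims
    ]
    return sum(stats) / len(stats)


rng = random.Random(0)


def gaussian_sample(mean, n, dim):
    """Toy data: n points with independent unit-variance Gaussian coordinates."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


reference = gaussian_sample(0.0, 500, 5)
same = mean_ks(reference, gaussian_sample(0.0, 500, 5))     # null hypothesis true
shifted = mean_ks(reference, gaussian_sample(0.5, 500, 5))  # means shifted by 0.5
```

Since the statistic only looks at marginals, it is cheap to evaluate but blind to differences that appear purely in the correlations between dimensions, which motivates the sliced variants mentioned in the abstract.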
Particle physics experiments entail the collection of large data samples of complex information. In order to produce and detect low probability processes of interest (signal), a huge number of particle collisions must be carried out. This type of experiment produces huge sets of observations, most of which are of no interest (background). For this reason, a mechanism able to differentiate rare signals buried in immense backgrounds is required. The use of Machine Learning algorithms for this task makes it possible to efficiently process huge amounts of complex data, automate the classification of event categories and produce signal-enriched filtered datasets more suitable for subsequent physics study. Although the classification of large imbalanced datasets has been undertaken in the past, the generation of predictions with their corresponding uncertainties is quite infrequent. In particle physics, as well as in other scientific domains, point estimates are considered an incomplete answer if uncertainties are not presented. As a benchmark, we present a real case study where we compare three methods that estimate the uncertainty of Machine Learning algorithm predictions in the identification of the production and decay of top-antitop quark pairs in collisions of protons at the Large Hadron Collider at CERN. Datasets of detailed simulations of the signal and background processes produced by the CMS experiment are used. Three different techniques that provide a way to quantify prediction uncertainties for classification algorithms are proposed and evaluated: dropout training in deep neural networks as approximate Bayesian inference, variance estimation across an ensemble of trained deep neural networks, and Probabilistic Random Forest. All of them exhibit an excellent discrimination power with a model uncertainty measure that turns out to be small, showing that the predictions are precise and robust.
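Two of the three techniques share the same final step: several predictions per event, from ensemble members or from repeated forward passes with dropout kept active, are summarised by their mean and spread. A minimal sketch of that step follows; the function name and the numbers are made up for illustration and are not results from the study.

```python
import statistics


def ensemble_summary(predictions):
    """Combine per-model signal probabilities for each event.

    predictions[m][i] is the probability that model m (an ensemble member,
    or one stochastic forward pass with dropout kept active) assigns to
    event i. Returns (means, standard deviations), one entry per event.
    """
    per_event = list(zip(*predictions))
    means = [statistics.mean(p) for p in per_event]
    spreads = [statistics.stdev(p) for p in per_event]
    return means, spreads


# Hypothetical outputs of three models for four events.
predictions = [
    [0.91, 0.08, 0.55, 0.97],
    [0.93, 0.05, 0.40, 0.96],
    [0.89, 0.07, 0.65, 0.98],
]
means, spreads = ensemble_summary(predictions)
```

In this toy input the third event has both a mean near 0.5 and the largest spread, the kind of event for which a point estimate alone would be least informative.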
Adversarial deep learning techniques are based on changing input distributions (adversaries), with the goal of causing false classifications when input to a deep neural network classifier. Adversaries aim to maximize the output error while only exerting minimal perturbations to the input data. Moreover, various techniques to defend against such attacks have been developed in the past. While rooted in AI Safety, adversarial deep learning offers a range of techniques that could potentially enhance high-energy physics deep learning models. Additionally, it might provide new opportunities to gather insights into the systematic uncertainties of deep neural networks. While adversarial deep learning has triggered immense interest in recent years in all kinds of fields, its possible applications in the context of high-energy physics (HEP) have not yet been studied in detail.
In this work, we employ adversarial deep learning techniques on multiple neural networks from within the high-energy physics domain, all reconstructed using publicly available data from the CERN Open Data portal to ensure reproducibility. Through the utilization of adversarial attacks and defense techniques, we not only assess the robustness of these networks but additionally aim for the construction of HEP networks portraying larger robustness and better generalization capabilities.
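One widely known attack of this kind is the fast gradient sign method (FGSM), which shifts every input feature by a small amount in the direction that increases the loss. The abstract does not state which attacks are used, so FGSM is chosen here purely as an illustration, and a logistic-regression classifier with hypothetical weights stands in for a deep network.

```python
import math


def predict(weights, bias, x):
    """Signal probability from a logistic-regression classifier."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(weights, bias, x, label, epsilon):
    """Fast-gradient-sign perturbation of x that increases the loss.

    For the cross-entropy loss of logistic regression the gradient with
    respect to the input is (p - label) * weights.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, sign)]


weights, bias = [2.0, -1.0, 0.5], 0.1    # hypothetical trained parameters
x = [0.4, -0.3, 0.2]                     # an event with true label 1
p_clean = predict(weights, bias, x)
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.3)
p_adv = predict(weights, bias, x_adv)
```

Each feature moves by at most `epsilon`, yet the predicted signal probability drops. Adversarial training, one common defence, adds such perturbed events to the training set.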
The quantum-chromodynamic substructure of hadrons at the smallest scales relies critically on the accurate interpretation of abundant experimental data generated by large-scale infrastructures such as the Large Hadron Collider. Comparing a multitude of measured cross sections with the latest higher-order theory predictions, we probe the validity of the standard model of particles with unparalleled precision. This relies upon a thorough understanding of the quantum-chromodynamic substructure of hadrons, encoded by their parton distribution functions (PDFs). Given that PDFs cannot be computed from first principles, we need to directly constrain them from observations.
In the NNPDF approach, this constraint procedure leverages neural networks to produce an accurate and reliable fit of proton PDFs. One of the primary challenges is choosing the optimal network architecture that achieves a good accuracy and generalizability to unseen data in the same kinematic region. In this context, we introduce a sophisticated strategy for hyperparameter optimization, which is based on novel metrics that take into account the entire distribution of a Monte Carlo ensemble of fitted PDFs. This procedure becomes feasible by training thousands of neural networks in parallel using GPUs. We compare various hyper-optimization loss functions and explore their impact on the determination of the proton PDFs and the estimated fit uncertainty. Our approach also holds relevance for similar applications of supervised scientific machine learning, where the robust identification of hyperparameters poses similar challenges.
In recent years, disparities have emerged within the context of the concordance model regarding the estimated value of the Hubble constant H0 [1907.10625] using Cosmic Microwave Background (CMB) and Supernovae data (commonly referred to as the Hubble tension), the clustering σ8 [1610.04606] using CMB and weak lensing, and the curvature ΩK [1908.09139, 1911.02087] using CMB and lensing/BAO, and between CMB datasets. The study of these discrepancies between different observed datasets, which are predicted to be in agreement theoretically by a cosmological model, is called tension quantification.
We approach this problem by producing a re-usable library of machine learning emulators across a grid of cosmological models through detecting cosmological tensions between datasets from the DiRAC allocation (DP192). This library will be released at this conference as part of the package unimpeded (https://github.com/handley-lab/unimpeded) and serve as an analogous grid to the Planck Legacy Archive (PLA), but machine learning enhanced and expanded to enable not only parameter estimation (currently available with the MCMC chains on PLA), but also cosmological model comparison and tension quantification. These are implemented with piecewise normalising flows [2305.02930] as part of the package margarine [2205.12841], though alternative density estimation methods can be used. The combination of nested sampling and density estimation allows us to obtain the same posterior distributions as one would have found from a full nested sampling run over all nuisance parameters, but many orders of magnitude faster. This allows users to use the existing results of cosmological analyses without the need to re-run on supercomputers.
One of the most important challenges in High Energy Physics today is to find rare new physics signals among an abundance of Standard Model proton-proton collisions, also known as anomaly detection. Deep Learning (DL) based techniques for this anomaly detection problem are increasing in popularity [1]. One such DL technique is the Deep SVDD model [2], which shows great results when applied to the anomaly detection problem in particle physics [3]. These Deep SVDD models are relatively computationally inexpensive, which is promising for real-time implementation in particle detectors such as the ATLAS detector at the LHC. For the ATLAS detector specifically, the incoming event rate is 40 MHz and is brought down to a final collection rate of 300 Hz using the three-stage trigger system. Therefore, any model deployed within this trigger system, especially the first level trigger, requires a high throughput.
The progress of these DL techniques could be further improved by utilizing special neuromorphic hardware, i.e. hardware specifically designed to accelerate machine learning tasks. A promising candidate is the Analog In-Memory Computing (AIMC) platform, such as the HERMES core [4]. In this hardware the Matrix Vector Multiplication (MVM), a prominent part of inference in machine learning tasks, is done in-memory. This mitigates the von Neumann bottleneck, giving access to potentially faster and more energy-efficient computations.
In this work we investigate an implementation of Deep SVDD models on an AIMC platform for unsupervised new physics detection at 40 MHz, using a dataset specifically generated for unsupervised new physics detection at 40 MHz at the LHC [5]. First, we investigated the energy consumption and throughput of these Deep SVDD models on CPUs and GPUs, which we then compared to the estimated performance of an AIMC platform [4]. We predict that the state-of-the-art AIMC is up to 1000x more energy efficient than CPUs and GPUs at a throughput that is 10x faster than CPUs and GPUs [6]. This suggests high potential for faster and more sustainable anomaly detection in fundamental physics and beyond.
[1] G. Kasieczka et al., The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics, Rep. Prog. Phys. 84, 124201 (2021), doi:10.1088/1361-6633/ac36b9.
[2] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E. & Kloft, M.. (2018). Deep One-Class Classification. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4393-4402 Available from https://proceedings.mlr.press/v80/ruff18a.html.
[3] S. Caron, L. Hendriks, and R. Verheyen, “Rare and Different: Anomaly Scores from a combination of likelihood and out-of-distribution models to detect new physics at the LHC,” SciPost Phys. 12, 77 (2022).
[4] R. Khaddam-Aljameh, M. Stanisavljevic, J. Fornt Mas, G. Karunaratne, M. Brändli, F. Liu, A. Singh, S. M. Müller, U. Egger, A. Petropoulos, T. Antonakopoulos, K. Brew, S. Choi, I. Ok, F. L. Lie, N. Saulnier, V. Chan, I. Ahsan, V. Narayanan, S. R. Nandakumar, M. Le Gallo, P. A. Francese, A. Sebastian, and E. Eleftheriou, “HERMES-core—a 1.59-TOPS/mm2 PCM on 14-nm CMOS in-memory compute core using 300-ps/LSB linearized CCO-based ADCs,” IEEE Journal of Solid-State Circuits 57, 1027–1038 (2022).
[5] E. Govorkova, E. Puljak, T. Aarrestad, M. Pierini, K. A. Woźniak, and J. Ngadiuba, “LHC physics dataset for unsupervised New Physics detection at 40 MHz,” Scientific Data 9, 1–7 (2022).
[6] Dominique J. Kösters, Bryan A. Kortman, Irem Boybat, Elena Ferro, Sagar Dolas, Roberto Ruiz de Austri, Johan Kwisthout, Hans Hilgenkamp, Theo Rasing, Heike Riel, Abu Sebastian, Sascha Caron, Johan H. Mentink; Benchmarking energy consumption and latency for neuromorphic computing in condensed matter and particle physics. APL Mach. Learn. 1 March 2023; 1 (1): 016101. https://doi.org/10.1063/5.0116699
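The Deep SVDD anomaly score of [2] discussed in the abstract above is the squared distance between a network's output and a fixed centre in the output space, and inference consists mostly of matrix-vector products. The sketch below uses a tiny bias-free network with hypothetical, untrained weights; layer sizes and inputs are illustrative assumptions only.

```python
def matvec(matrix, vector):
    """Matrix-vector product, the operation an AIMC core performs in memory."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def embed(layers, x):
    """Bias-free feed-forward network: ReLU between layers, linear output."""
    for matrix in layers[:-1]:
        x = relu(matvec(matrix, x))
    return matvec(layers[-1], x)


def anomaly_score(layers, centre, x):
    """Deep SVDD score: squared distance of the embedding from the centre."""
    z = embed(layers, x)
    return sum((zi - ci) ** 2 for zi, ci in zip(z, centre))


# Hypothetical weights; in practice they are trained so that background
# events map close to the centre.
layers = [
    [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],   # 3 inputs -> 2 hidden units
    [[1.0, -1.0], [0.5, 0.5]],              # 2 hidden units -> 2 outputs
]
background = [[1.0, 0.9, 1.1], [0.9, 1.0, 1.0], [1.1, 1.1, 0.9]]
embeddings = [embed(layers, x) for x in background]
centre = [sum(col) / len(col) for col in zip(*embeddings)]

typical = anomaly_score(layers, centre, [1.0, 1.0, 1.0])
unusual = anomaly_score(layers, centre, [5.0, -3.0, 4.0])
```

Events with a large score are flagged as anomalous. Because the cost per event is dominated by the `matvec` calls, hardware that performs them in memory directly targets the throughput and energy figures quoted above.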
The Compressed Baryonic Matter (CBM) experiment, located at the Facility for Antiproton and Ion Research (FAIR) accelerator complex in Darmstadt, Germany, aims to study the phase diagram of strongly interacting matter in the realm of high net baryon densities and moderate temperatures. The SIS-100 accelerator ring at FAIR produces accelerated beams up to energies of about 30 GeV for protons, and 12A GeV for heavy ions. The identification of muon pairs produced via the decay of vector mesons has been identified as one of the most significant physics observables for characterizing the hot and dense matter created in the collisions. The Muon Chamber (MuCh) detector system is being built to identify the muon pairs in a background mostly populated by muons from weak decays of pions and kaons produced in the collisions.
We will report our present simulation results on the physics performance for the reconstruction of low mass vector mesons in central Au+Au collisions at the beam energy 8A GeV using various machine learning models and compare the results with the traditional dimuon analysis. We treated the decay of low-mass vector mesons into dimuons as our signal simulated using the PLUTO event generator and incorporated into background events generated by the UrQMD event generator passing through the MuCh detector system. The performance of the different machine learning algorithms to improve the reconstruction efficiency (ϵ) without compromising the Signal-to-Background ratio (S/B) will be reported. A comparison with the conventional dimuon analysis software will also be presented.
Machine Learning (ML) techniques have been employed by the high energy physics (HEP) community since the early 80s to deal with a broad spectrum of problems. This work explores the prospects of using Deep Learning techniques to estimate elliptic flow (v2) in heavy-ion collisions at the RHIC and LHC energies. A novel method is developed to process the input observables from track-level information. The proposed DNN model is trained with Pb-Pb collisions at √sNN = 5.02 TeV minimum bias events simulated with the AMPT model. The predictions from the ML technique are compared to both simulation and experiment. The Deep Learning model seems to preserve the centrality and energy dependence of v2 for the LHC and RHIC energies. The DNN model is also quite successful in predicting the pT dependence of v2. When subjected to event simulation with additional noise, the proposed DNN model still retains its robustness and prediction accuracy to a reasonable extent.
Refs.: Phys.Rev.D 107 (2023) 9, 094001, Phys.Rev.D 105 (2022) 11, 114022
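For orientation, the target quantity of the abstract above has a simple conventional definition: v2 is the average of cos(2(φ − Ψ)) over the particles of an event, with φ the azimuthal angle of a particle and Ψ the symmetry-plane angle. The toy below checks this on angles sampled with a known v2. It is not the DNN method of the abstract, and it assumes Ψ is known, which holds in simulation but not directly in experiment.

```python
import math
import random


def elliptic_flow(phis, psi):
    """v2 as the average of cos(2 (phi - psi)) over the particles."""
    return sum(math.cos(2.0 * (phi - psi)) for phi in phis) / len(phis)


def sample_phi(v2, psi, rng):
    """Draw an angle from dN/dphi proportional to 1 + 2 v2 cos(2 (phi - psi))."""
    while True:  # accept-reject against the flat envelope 1 + 2 v2
        phi = rng.uniform(0.0, 2.0 * math.pi)
        height = 1.0 + 2.0 * v2 * math.cos(2.0 * (phi - psi))
        if rng.uniform(0.0, 1.0 + 2.0 * v2) < height:
            return phi


rng = random.Random(42)
psi = 0.7                 # symmetry-plane angle, known here by construction
phis = [sample_phi(0.1, psi, rng) for _ in range(20000)]
estimate = elliptic_flow(phis, psi)   # should be close to the input 0.1
```

The statistical error of this average shrinks with the square root of the number of particles, which limits the per-event precision in low-multiplicity collisions.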
One of the main challenges in astronomical imaging is getting as much signal as possible with as little noise as possible. The better the signal, the more confident one can be that the science done with the images is sound. However, increasing the signal-to-noise ratio on the detector is hard and expensive. Therefore, a lot of research is focused on improving post-processing techniques to extract as much information from images as possible.
We introduce BGRem, a tool to remove the background noise from optical astronomical images. It leverages a state-of-the-art diffusion model combined with an attention U-net architecture, ensuring precise and reliable background removal with little effect on the sources. It has also been shown to improve the performance of Source Extractor's source localisation feature when used as a pre-processing step.
The Dark Matter Particle Explorer (DAMPE) is the largest calorimeter-based space-borne experiment. Since its launch in December 2015, DAMPE has detected electrons, positrons and gamma rays from a few GeV to 10 TeV, as well as protons and heavier nuclei from 10 GeV to 100 TeV. The study of galactic and extragalactic gamma-ray sources and diffuse emissions, as well as the search for dark-matter signatures in the gamma-ray flux, are the main objectives of the DAMPE mission. In this contribution we present a convolutional neural network (CNN) model developed for gamma-ray identification with the DAMPE calorimeter. It is shown that this method significantly outperforms all the existing algorithms, both in gamma-ray efficiency and proton rejection. Good agreement between simulation and real data is demonstrated.
The Cherenkov Telescope Array (CTA) is entering its production phase and the upcoming data will drastically improve the point source sensitivity compared to previous imaging atmospheric Cherenkov telescopes. The Galactic Plane Survey (GPS), proposed as one of the Key Science Projects for CTA observation, will focus on the observation of the inner galactic region ($|b|<6^\circ$).
Here we discuss recent results from extending our Deep Learning (DL)-based pipeline, AutoSource-ID (see the talk by F. Stoppa et al. at this conference), to classify gamma-ray sources with different extensions and predict their fluxes with uncertainties. This pipeline was initially developed for detecting point sources at optical and gamma-ray wavelengths and tested using MeerLicht (optical) and Fermi-LAT (gamma-ray) data. For the CTA data, we use Gammapy for simulation, which already includes the updated instrument response functions. We test the classification and flux prediction capability for simulated gamma-ray sources lying in the inner galactic region (specifically $0^∘
The Ring Imaging Cherenkov (RICH) detector is integral to the CBM experiment's electron identification process, aiming to distinguish electrons and suppress pions in the study of dielectronic decay channels of vector mesons. This study is crucial for exploring the phase diagram of strongly interacting matter under conditions of high net baryon densities and moderate temperatures, as encountered at FAIR energies. A critical challenge for the RICH detector is the efficient identification of ring-like structures from the numerous hits on its photodetector plane. To address this, we introduce a novel ring recognition algorithm leveraging convolutional neural networks (CNNs).
Our approach involves the deployment of a segmentation-like model named Ring Center Net (RCNet). This model is adept at identifying the centers of Cherenkov rings. Subsequently, we apply a modified, parameter-reduced standard chi-square circle fitting method to accurately determine the complete ring parameters. RCNet has been rigorously tested on a custom dataset designed to emulate the hit patterns expected on the RICH detector’s photodetector plane, with a ring density surpassing that anticipated in CBM experiments. Notably, our method operates independently of prior track information, enhancing its suitability for real-time triggering applications.
One of the standout features of RCNet is its convolutional structure, offering exceptional adaptability to various input sizes. This flexibility is achieved through strategic padding, allowing the model to process input from photodetector planes with diverse geometries. Consequently, RCNet can fully utilize the data from hits on the RICH detector's photodetector plane. Moreover, the model's pixel classification approach in the center finding algorithm minimizes restrictions on the number of rings that can be processed in a single input. Empirical tests demonstrate that RCNet achieves an impressive 94% efficiency in detecting rings with over 14 hits within a 128x128 image containing up to 30 rings.
The report details the development and testing of RCNet, emphasizing its potential to significantly enhance electron identification in high-energy and heavy-ion physics experiments.
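As a rough illustration of the second stage, the sketch below fits a ring radius once a center is already known. It assumes that "parameter-reduced" means the center is fixed to the network prediction, so that minimising $\chi^2 = \sum_i (d_i - R)^2$ over $R$ alone gives $R$ as the mean hit distance $d_i$ from the center. The function name and the toy hits are hypothetical; this is not the RCNet code.

```python
import math

def fit_radius(hits, center):
    """Least-squares radius for a fixed center.

    Minimising chi2 = sum_i (d_i - R)^2 over R gives R = mean(d_i),
    where d_i is the distance of hit i from the center.
    """
    cx, cy = center
    dists = [math.hypot(x - cx, y - cy) for x, y in hits]
    radius = sum(dists) / len(dists)
    chi2 = sum((d - radius) ** 2 for d in dists)
    return radius, chi2

# Toy ring (assumption): 16 hits on a circle of radius 5 around pixel (64, 64).
center = (64.0, 64.0)
hits = [
    (64.0 + 5.0 * math.cos(2.0 * math.pi * k / 16),
     64.0 + 5.0 * math.sin(2.0 * math.pi * k / 16))
    for k in range(16)
]
radius, chi2 = fit_radius(hits, center)
```

A large `chi2` relative to the number of hits could serve as a simple flag for fake or overlapping ring candidates.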
This research introduces a physics-driven graph neural network (GNN) [1] tailored for the identification and reconstruction of $\Lambda$ hyperons in the WASA-FRS [2] experiment. The reconstructed $\Lambda$ hyperons serve as calibration processes, essential for the primary objective of the experiment, namely to detect hypertritons. This GNN builds upon machine learning algorithms successfully developed by the High Energy Nuclear Physics Laboratory (HENP) at RIKEN, Japan, and it has been shown to significantly enhance the tracking and event selection performance specific to $\Lambda$ hyperon studies. Generally, it leverages off-vertex decay tracks of long-lived (weakly decaying) particles, providing a competitive alternative to, or benchmark for, traditional Kalman filter approaches. Furthermore, the performance of the GNN can be validated on complementary channels recently studied with HADES [3] at GSI, Germany, such as $p+p \rightarrow \Lambda + K^{0}_{S} + p + \pi^+$ [4]. Ultimately, the success of this model enables the use of GNNs across diverse experiments and physics analyses.
[1] H. Ekawa et al., Eur. Phys. J. A 59 103 (2023).
[2] T.R. Saito et al., Nature Reviews Physics 3, 803 (2021).
[3] G. Agakichiev et al., Eur. Phys. J. A 41 243–277 (2009).
[4] J. Adamczewski-Musch et al., Eur. Phys. J. A 57, 138 (2021).
The upcoming silicon-based sampling calorimeters, such as the high-granularity calorimeter of the CMS experiment, will have unprecedented granularity in both the lateral and longitudinal dimensions. We expect these calorimeters to greatly benefit from machine learning-based reconstruction techniques. Using the novel idea of interpreting the multiple sampling layers of calorimeters in the $\eta$ -- $\phi$ plane as colors in an RGB image, a convolutional neural network-based object detection framework, You Only Look Once (YOLO), was used for particle reconstruction in a fast (~1 ms on an NVIDIA RTX 4090) and efficient manner. This study covers the excellent performance of the model in reconstructing particles, e.g., muons and electrons/photons, and their direction in the $\eta$ -- $\phi$ plane, with excellent pileup rejection at 200 pileup interactions. The presentation also covers the future prospects of energy reconstruction with minimal modifications.
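The image encoding can be sketched as follows, under stated assumptions: hits are given as hypothetical `(eta, phi, layer, energy)` tuples, the sampling layers are split into three consecutive depth groups, and each group fills one color channel of an $\eta$ -- $\phi$ grid. The grouping rule, binning and names are illustrative choices, not the configuration used in the study.

```python
import math

def hits_to_rgb(hits, n_layers, eta_range, n_eta, n_phi):
    """Bin calorimeter hits into an eta-phi image with three depth channels.

    Returns a nested list image[eta_bin][phi_bin][channel] of summed energies.
    """
    eta_lo, eta_hi = eta_range
    image = [[[0.0, 0.0, 0.0] for _ in range(n_phi)] for _ in range(n_eta)]
    for eta, phi, layer, energy in hits:
        # Skip hits outside the eta acceptance or with an invalid layer index.
        if not (eta_lo <= eta < eta_hi) or not (0 <= layer < n_layers):
            continue
        i = min(int((eta - eta_lo) / (eta_hi - eta_lo) * n_eta), n_eta - 1)
        # Wrap phi into [0, 2 pi) before binning.
        wrapped = (phi + math.pi) % (2.0 * math.pi)
        j = min(int(wrapped / (2.0 * math.pi) * n_phi), n_phi - 1)
        # Front, middle and back thirds of the layers map to channels 0, 1, 2.
        channel = layer * 3 // n_layers
        image[i][j][channel] += energy
    return image

# Toy hits (assumption): the last one lies outside the eta range and is dropped.
toy_hits = [
    (1.6, 0.1, 0, 2.0),
    (1.6, 0.1, 5, 1.0),
    (2.9, -3.0, 3, 0.5),
    (3.5, 0.0, 0, 9.0),
]
image = hits_to_rgb(toy_hits, n_layers=6, eta_range=(1.5, 3.0), n_eta=3, n_phi=4)
```

The resulting array has the same layout as an ordinary color image, which is what allows an off-the-shelf object detector to be applied to it.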
Recent experiments with high-energy heavy ion beams challenge the current understanding of light hypernuclei (sub-atomic nuclei exhibiting strangeness), particularly the hypertriton [1,2,3,4,5,6,7,8]. This perplexing situation, known as the "hypertriton puzzle," is the focal point of our European-Japanese collaboration between CSIC – Spain, GSI-FAIR – Germany and RIKEN – Japan within the Super-FRS Experiment Collaboration and the Emulsion-ML collaboration. Employing deep learning techniques, our groundbreaking research addresses this puzzle through two distinct experiments [9]. The first involves studying hypernuclear states using heavy ion beams, where a Graph Neural Network aids in track finding, crucial for the ongoing data analysis [10]. Simultaneously, the second experiment focuses on identifying hypernuclei in nuclear emulsions irradiated by kaon beams, utilizing Mask R-CNN for object detection [11]. These efforts aim to achieve world-leading precision in measuring hypertriton lifetime and binding energy.
In the first experiment, the WASA-FRS experiment of Super-FRS Experiment Collaboration, conducted with heavy ion beams at 2 GeV/u on a fixed carbon target at GSI-FAIR, the WASA detector system and Fragment separator FRS were employed during the first quarter of 2022 [9]. The Graph Neural Network model plays a pivotal role in overcoming the challenges associated with induced reactions, particularly in the track finding procedure, given the large combinatorial background in the forward direction. Preliminary analyses with the Graph Neural Network model that we have developed demonstrate its high efficacy in finding track candidates and in estimating particle momentum and charge, providing promising insights into the hypertriton observation [10].
In the second experiment, we, the Emulsion-ML collaboration, lead the AI-based search for and identification of hypernuclei in nuclear emulsions irradiated by kaon beams at the J-PARC E07 experiment [12]. To efficiently analyze a substantial amount of image data, we introduced a Mask R-CNN model for object detection [11]. To overcome the scarcity of training data for rare hypernuclear events, a generative adversarial network (GAN) model was developed using Geant4 simulation data and real background images of nuclear emulsions [11]. The ongoing analysis has already led to the unique identification of events associated with hypertriton decay, and the measurement of the hypertriton binding energy is in progress [9].
This contribution will present a comprehensive approach to addressing the hypertriton puzzle via deep learning methods, offering significant contributions to the field of hypernuclei research.
[1] The STAR Collaboration, Science 328, 58 (2010).
[2] C. Rappold, et al., Nucl. Phys. A 913, 170 (2013).
[3] J. Adam, et al., for ALICE collaboration, Phys. Lett. B 754, 360 (2016).
[4] L. Adamczyk, et al., Phys. Rev. C 97, 054909 (2018).
[5] J. Chen, et al., Phys. Rep. 760, 1 (2018).
[6] S. Acharya, et al., Phys. Lett. B 797, 134905 (2019).
[7] J. Adam, et al., Nat. Phys. 16, 409 (2020).
[8] S. Acharya et al. (ALICE Collaboration) Phys. Rev. Lett. 131, 102302 (2023).
[9] T.R. Saito et al., Nature Reviews Physics 3, 803 (2021).
[10] H. Ekawa et al., Eur. Phys. J. A 59 103 (2023).
[11] A. Kasagi et al. Nucl. Instrum. Meth. A 1056 168663 (2023).
[12] H. Ekawa, et al., Prog. Theor. Exp. Phys. 2019, 021D02 (2019).
Analyses in HEP experiments often rely on large MC simulated datasets. These datasets are usually produced with full-simulation approaches based on Geant4, or exploiting parametric “fast” simulations introducing approximations and reducing the computational cost.
In the present work, we discuss a prototype of a fast simulation framework that we call “FlashSim” targeting analysis level data tiers (namely, CMS NanoAOD). This prototype is based on Machine Learning, in particular the Normalizing Flows generative model.
We present the physics results achieved with this prototype, currently simulating several physics objects collections, in terms of: 1) accuracy of object properties, 2) correlations among pairs of observables, 3) comparisons of analysis level derived quantities and discriminators between full-simulation and flash-simulation of the very same events. The speed-up obtained with such an approach is of several orders of magnitude compared to classical approaches. Because of this, when using FlashSim, the simulation bottleneck is represented by the “generator” (e.g. Pythia) step. We further investigated “oversampling” techniques, reusing the generator information of the same event by passing it multiple times through the detector simulation, in order to understand the increase in statistical precision that could be ultimately achieved. The results achieved with the current prototype show a higher physics accuracy and a lower computing cost compared to other fast simulation approaches such as CMS standard FastSim and Delphes-based simulations.
Presented is a novel method for analyzing particle identification (PID) by incorporating machine learning techniques, applied to a physics case within the fixed-target program at the LHCb experiment at CERN. Typically, a PID classifier is constructed by integrating responses from specialized subdetectors, utilizing diverse techniques to ensure redundancy and broad kinematic coverage. The efficiency of PID selections varies with several experimental observables, such as particle momentum, collision geometry, and experimental conditions. To accurately model the PID classifier distribution and address simulation imperfections, extensive calibration samples from data reconstruction and selection are essential but not always available.
In this proposed approach the PID classifier is modeled using a Gaussian Mixture Model by combining the well-established maximum-likelihood technique with state-of-the-art machine learning libraries and methods. The model parameters are determined by Multi-Layer Perceptrons, which are fed with relevant experimental features. This ensures that the PID classifier's non-trivial dependencies are learned. The presented approach has been demonstrated on a proof-of-principle physics case to match or improve detailed simulations, especially when limited calibration data is available. It is applicable to a wide range of cases involving experimental observables dependent on numerous experimental features. For the LHCb experiment's fixed-target program, this approach serves to mitigate the dominant experimental uncertainties.
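The core of such a model is a mixture likelihood whose parameters depend on the experimental features. The sketch below shows the negative log-likelihood of a one-dimensional Gaussian mixture; `toy_parameter_net` is a hypothetical closed-form stand-in for the Multi-Layer Perceptrons, used only to keep the example self-contained.

```python
import math

def softmax(logits):
    """Turn unconstrained values into mixture weights that sum to one."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gmm_log_pdf(x, weights, means, sigmas):
    """Log-density of a 1D Gaussian mixture, using log-sum-exp for stability."""
    terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - mu) / s) ** 2
        for w, mu, s in zip(weights, means, sigmas)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def toy_parameter_net(feature):
    """Hypothetical stand-in for the MLPs: maps one feature to GMM parameters."""
    weights = softmax([0.5 * feature, -0.5 * feature])
    means = [feature, -feature]
    sigmas = [math.exp(0.1 * feature), 1.0]  # exp keeps the widths positive
    return weights, means, sigmas

def negative_log_likelihood(samples):
    """Sum of -log p(x | feature) over (x, feature) pairs; the training loss."""
    return -sum(gmm_log_pdf(x, *toy_parameter_net(f)) for x, f in samples)

# Toy calibration sample (assumption): (classifier value, feature) pairs.
nll = negative_log_likelihood([(0.9, 1.0), (-1.2, 1.0), (0.2, 0.5)])
```

In a real training loop the stand-in would be replaced by a trainable network and `nll` minimised with a gradient-based optimiser.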
Within the Compact Muon Solenoid (CMS) Collaboration, various Deep Neural Networks (DNNs) and Machine Learning (ML) approaches have been employed to investigate the production of a new massive particle that decays into Higgs boson pairs (HH), which in turn decay into a pair of b-quarks and a pair of tau leptons, and to discriminate the HH signal from the backgrounds.
However, these models are often complex and considered black boxes, which makes it challenging to interpret how the task was performed and complicates the data analysis review process.
This work therefore aimed to provide a better understanding of how the model works by validating an established Explainable Artificial Intelligence (AI) technique, SHapley Additive exPlanations (SHAP), with the goal of more interpretable, trustworthy models and predictions.
A data pre-processing pipeline was established to select the most important features to be used as input to the model. First, highly correlated features were removed. Second, a feature selection based on SHAP values was performed in repeated 5-fold cross-validation by means of the Recursive Feature Elimination (RFE) algorithm. Finally, the hyperparameters of a gradient boosting algorithm (XGBoost), trained on the SHAP-selected features, were fine-tuned. This XGBoost model was then used to perform the classification task described at the beginning of this abstract. The selected features were later compared with those obtained from a Principal Component Analysis (PCA) performed on the original entire dataset (prior to any pre-processing steps). PCA achieves dimensionality reduction and, therefore, can be thought of as a clustering method to visualize separation of the observations belonging to the two classes (i.e., signal, background) along the principal components. Therefore, the linear combination of the features along the principal components enabled the validation and interpretation of SHAP.
The results obtained with SHAP and PCA agreed on the importance of some of the features used in the classification task. The combination of the two techniques confirmed the reliability of SHAP as an established tool, and also showed the potential of the High Energy Physics (HEP) domain as a new technical validation ground, thanks to its high-quality tabular data and well-understood underlying causal theory, which might also be exploited in other fields.
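The first pre-processing step, removing highly correlated inputs, can be sketched in a few lines. The greedy rule, the 0.95 threshold and the feature names below are assumptions for illustration; the abstract does not state which criterion was used, and the later SHAP, RFE and XGBoost steps are not reproduced here.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(columns, threshold=0.95):
    """Greedily keep a feature only if it is not strongly correlated
    with any feature that has already been kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy table (assumption): feature_b is almost a rescaled copy of feature_a.
columns = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0, 10.1],
    "feature_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(columns)
```

Because the rule is greedy, the outcome depends on column order, so in practice one might order the features by prior importance first.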
Blazars are among the most powerful extragalactic sources, emitting across the entire electromagnetic spectrum, from radio to very high energy gamma-ray bands. As powerful sources of non-thermal radiation, blazars are frequently monitored using various telescopes, leading to the accumulation of substantial multi-wavelength data over different time periods. Over the years, the complexity of emission models has also increased dramatically. This complexity hinders parameter exploration and makes data interpretation through model fitting challenging. I will present a pioneering effort in employing a Convolutional Neural Network (CNN) for the efficient modeling of blazar emission. Trained on a set of lepton-hadronic emission models computed with the kinetic code SOPRANO, in which the interactions of the initial and all secondary particles are considered, the resulting CNN can accurately model the radiative signatures of electron/proton interactions in relativistic jets. This CNN-based approach significantly reduces computational time, thereby enabling real-time fitting to multi-wavelength (photons) and multi-messenger (neutrinos) datasets. This approach allows self-consistent modeling of blazar emission, which holds the potential for deepening our understanding of their physics. I will also present the novel Markarian Multiwavelength Datacenter (www.mmdc.am), where all the models are publicly available, allowing everyone to perform state-of-the-art, self-consistent analyses of multi-wavelength and multi-messenger data from blazar observations.
A major task in particle physics is the measurement of rare signal processes. These measurements are highly dependent on the classification accuracy of these events in relation to the huge background of other Standard Model processes. Reducing the background by a few tens of percent with the same signal efficiency can already increase the sensitivity considerably.
This study demonstrates the importance of adding physical information (and inductive biases) to event classification architectures. In addition to the information previously proposed for jet tagging, we add pairwise measures of the energy-dependent particle-particle interaction strengths as predicted by the Feynman rules of the Standard Model (SM). Our work incorporates this information into different methods for classifying events, in particular Boosted Decision Trees, Transformer Architectures (Particle Transformer) and Graph Neural Networks (Particle Net). We find that the integration of physical information into the attention matrix (transformers) or edges (graphs) notably improves background rejection by $10\%$ to $30\%$ over baseline models (Particle Net), with about $10\%$ of this improvement directly attributable to what we call the SM interaction matrix.
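One common way to inject pairwise information into a transformer is to add it to the attention logits before the softmax. The sketch below shows that mechanism for a single attention head; the bias values are made up, and how the SM interaction matrix is actually computed and inserted in the study is not specified here.

```python
import math

def attention_weights(queries, keys, bias):
    """Row-wise softmax of q_i . k_j / sqrt(d) + bias[i][j].

    bias[i][j] plays the role of a pairwise physics prior between
    particles i and j; -inf removes a pair from the attention entirely.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias[i][j]
            for j, k in enumerate(keys)
        ]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy inputs (assumption): zero queries and keys, so only the bias matters.
queries = [[0.0, 0.0]] * 3
keys = [[0.0, 0.0]] * 3
ninf = float("-inf")
bias = [
    [0.0, math.log(3.0), ninf],  # particle 0 favours 1 and never attends to 2
    [0.0, 0.0, 0.0],
    [ninf, 0.0, 0.0],
]
weights = attention_weights(queries, keys, bias)
```

In this toy case the first row evaluates to weights of 0.25, 0.75 and 0, showing how a pairwise prior alone can reshape which particles attend to each other.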
The Alpha Magnetic Spectrometer-02 (AMS-02) experiment is a magnetic spectrometer on the International Space Station (ISS) that can measure the flux of particles from cosmic sources in a rigidity window ranging from GV to a few TV and up to at least nickel (charge Z=28). High-precision measurements of fluxes of rare nuclei, such as Sc, Ti, and Mn, provide unique constraints on models of cosmic-ray propagation in the galaxy. These measurements are challenging because of the low abundances of such high-mass nuclei compared to their neighbors.
Manually optimized standard selection criteria have been shown to work well for low-mass nuclei, but they are static, whereas the variables often have non-linear correlations. They also include only very few variables. Machine learning (ML) algorithms learn from data by analyzing examples, and the selection applied through them is non-linear. Various ML algorithms such as MLPs, CNNs, transformers, and XGBoost were tried, and XGBoost showed the best performance in terms of accuracy and speed. To build an ML model that analyzes nuclei from Li (charge Z=3) to Ni (charge Z=28), Monte Carlo (MC) simulations are used for training the ML algorithm. The training data contain every nuclear species in equal abundance. After training and checking the ML model for overfitting and underfitting, it is applied to an MC sample with natural abundances. Tests have been performed on the rare medium-mass F nuclei. The ML model shows better purity than the standard selection for F at the same signal efficiency, and it suppresses the background much better than the standard selections of the AMS-02 experiment. ML algorithms are often termed black boxes; therefore, the Shapley method is used to understand the approximate behavior of the model. The method shows that the algorithm makes decisions according to the underlying physics.
We report progress in using transformer models to generate particle theory Lagrangians. By treating Lagrangians as complex, rule-based constructs similar to linguistic expressions, we employ transformer architectures, proven in language processing tasks, to model and predict Lagrangians. A dedicated dataset, which includes the Standard Model and a variety of its scalar and fermionic extensions, was used to train our transformer model from the ground up. We hope the resulting model will demonstrate initial capabilities reminiscent of large language models, including pattern recognition and the generation of consistent physical theories. The ultimate goal of this initiative is to establish an AI system capable of formulating theoretical explanations for experimental observations, a significant step towards integrating artificial intelligence into the iterative process of theoretical physics.
One of the biggest obstacles for machine learning algorithms that predict amplitudes from phase space points is the scaling with the number of interacting particles. The more particles there are in a given process, the more challenging it is for the model to provide accurate predictions for the matrix elements. We present a deep learning framework that is built to reduce the impact of this issue, based on the implementation of permutation invariance and Lorentz equivariance within the network architecture. We demonstrate how the use of both of these symmetries grants the model the necessary structure to reproduce LO and NLO amplitude distributions in a competent way for processes with multiple QCD jets in the final state. Additionally, we use Bayesian networks as a main ingredient for all studied amplitude surrogates. That way, we can perform a Bayesian analysis to understand and optimize the uncertainty on model predictions.
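The two symmetries can be illustrated on the input side with a much simpler construction than an equivariant network: pairwise Minkowski products are unchanged by Lorentz transformations, and sorting them removes any dependence on particle ordering. The sketch below checks both properties numerically; the momenta and function names are assumptions, and this is not the architecture presented in the talk.

```python
import math

def minkowski_dot(p, q):
    """Minkowski product with metric (+, -, -, -) for p = (E, px, py, pz)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]

def boost_z(p, beta):
    """Lorentz boost along the z axis with velocity beta (in units of c)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    e, px, py, pz = p
    return (gamma * (e + beta * pz), px, py, gamma * (pz + beta * e))

def invariant_features(momenta):
    """Sorted pairwise Minkowski products: Lorentz- and permutation-invariant."""
    n = len(momenta)
    feats = [
        minkowski_dot(momenta[i], momenta[j])
        for i in range(n) for j in range(i + 1, n)
    ]
    return sorted(feats)

# Toy phase-space point (assumption): three four-momenta in arbitrary units.
event = [(10.0, 1.0, 2.0, 3.0), (8.0, -2.0, 1.0, 4.0), (6.0, 0.5, -3.0, -2.0)]
reference = invariant_features(event)
boosted = invariant_features([boost_z(p, 0.6) for p in event])
shuffled = invariant_features([event[2], event[0], event[1]])
```

All three feature lists agree up to floating-point rounding, which is the property that a symmetry-aware surrogate builds in by design instead of having to learn it from data.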
This study investigates the adaptation of leading classifiers, such as Transformers and Convolutional Graph Neural Networks, as anomaly detectors using different training techniques. The focus lies on their application to proton-proton collisions simulated by the DarkMachines collaboration, where the aim is to detect exotic signatures as anomalies.
Adaptations of these architectures, named Particle Transformer and ParticleNet, have proven to be state of the art for the jet tagging task. An event-level approach is studied in this project, where the kinematic information of various physical objects is provided: jets, b-tagged jets, leptons, and photons.
Our main interest is how to turn classifiers into anomaly detectors. To this end, we have investigated three different strategies. The first is the Deep Support Vector Data Description (SVDD) technique, in which the input information is encoded in a latent space of lower dimension and the loss function is computed as the distance to a centre point. The second is the DROCC technique (Deep Robust One-Class Classification), which assumes that background points lie on a locally linear, low-dimensional manifold and implements a modified distance metric as an adversarial loss function in order to identify the anomalies. The third technique consists of generating a modified background sample in which a certain amount of noise is introduced. By performing supervised learning to classify the normal and the noisy background events, the trained algorithm is able to evaluate whether any event deviates from a "normal" collision.
The purpose is to investigate the anomaly detection capabilities of these architectures in comparison to more established techniques described in the Unsupervised Challenge Paper of the DarkMachines initiative. The outcome of this comparative analysis should provide valuable insights into the application and future developments of unsupervised learning techniques in high-energy physics research, specifically how to turn optimal classifiers into good anomaly detection algorithms.
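The first strategy can be sketched compactly. In Deep SVDD the anomaly score of an event is the squared distance of its latent embedding from a fixed centre, and the training loss is the mean of that score over background events. The example below skips the encoder and works on hypothetical two-dimensional embeddings directly; taking the centre as the mean background embedding is one common choice and is assumed here.

```python
def fit_centre(embeddings):
    """Centre of the hypersphere, taken as the mean background embedding."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]

def svdd_score(embedding, centre):
    """Anomaly score: squared Euclidean distance to the centre."""
    return sum((a - b) ** 2 for a, b in zip(embedding, centre))

def svdd_loss(embeddings, centre):
    """Training objective: mean score over background-only embeddings."""
    return sum(svdd_score(e, centre) for e in embeddings) / len(embeddings)

# Toy latent vectors (assumption): background clusters near the origin.
background = [[0.1, -0.1], [-0.1, 0.1], [0.1, 0.1], [-0.1, -0.1]]
centre = fit_centre(background)
loss = svdd_loss(background, centre)
candidate_score = svdd_score([2.0, 1.0], centre)
```

An event is flagged as anomalous when its score exceeds a threshold chosen from the background score distribution, for example a fixed background efficiency.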
Id | Title | Presenters |
2 | Anomaly aware machine learning for dark matter direct detection at DARWIN | Andre Scaffidi |
10 | Einstein Telescope: binary black holes gravitational wave signals detection from three detectors combined data using deep learning | Wathela Alhassan |
16 | Costless Performance Gains in Nested Sampling for Applications to AI and Gravitational Waves | Metha Prathaban |
17 | Real-Time Detection of Low-Energy Events with 2DCNN on FPGA's for the DUNE Data Selection System | Akshay Malige |
22 | Deep Learning for Cosmic-Ray Observatories | Jonas Glombitza |
26 | Quantum Probabilistic Diffusion Models | Andrea Cacioppo |
27 | Long-Lived Particles Anomaly Detection with Parametrized Quantum Circuits | Simone Bordoni |
28 | Model compression and simplification pipelines for fast and explainable deep neural network inference in FPGAs in HEP | Graziella Russo |
29 | Leveraging Physics-Informed Graph Neural Networks for Enhanced Combinatorial Optimization | Lorenzo Colantonio |
33 | Simulation of Z2 model using Variational Autoregressive Network (VAN). | Vaibhav Chahar |
36 | Machine learning for radiometer calibration in global 21cm cosmology | Samuel Alan Kossoff Leeney |
38 | Accelerating the search for mass bumps using the Data-Directed Paradigm | Fannie Bilodeau |
39 | Efficient Parameter Space Exploration in BSM Theories with Batched Multi-Objective Constraint Active Search | Mauricio A. Diaz |
42 | Feature selection techniques for CR isotope identification with the AMS-02 experiment in space | Marta Borchiellini |
46 | Artificial Intelligence techniques in KM3NeT | Evangelia Drakopoulou |
50 | Quantum Computing for Track Reconstruction at LHCb | Miriam Lucio Martinez |
59 | The Calorimeter Pyramid: Rethinking the design of generative calorimeter shower models | Simon Schnake |
63 | Full-event reconstruction using CNN-based models on calibrated waveforms for the Large-Sized Telescope prototype of the Cherenkov Telescope Array | Iaroslava Bezshyiko |
64 | Embedded Neural Networks on FPGAs for Real-Time Computation of the Energy Deposited in the ATLAS Liquid Argon Calorimeter | Raphael Bertrand |
65 | Kicking it Off(-shell) with Direct Diffusion | Sofia Palacios Schweitzer |
66 | CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds | Humberto Reyes-Gonzalez |
68 | Optimizing bayesian inference in cosmology with Marginal Neural Ratio Estimation | Guillermo Franco Abellan |
71 | Extracting Dark Matter Halo Parameters with Overheated Exoplanets | María Benito |
72 | Estimating classical mutual information for spin systems and field theories using generative neural networks | Piotr Korcyl |
76 | Choose Your Diffusion: Efficient and flexible way to accelerate the diffusion model dynamics in fast physics simulation | Cheng Jiang |
87 | Searching for Dark Matter Subhalos in Astronomical Data using Deep Learning | Sven Põder |
91 | Diffusion meets Nested Sampling | David Yallup |
96 | Flow-based generative models for particle calorimeter simulation | Claudius Krause |
100 | Simulation-Based Supernova Ia Cosmology | Konstantin Karchev |
101 | Gaussian processes for managing model uncertainty in gravitational wave analyses | Daniel Williams |
102 | Emulation by committee: faster AGN fitting | Benjamin Ricketts |
103 | Adaptive Machine Learning on FPGAs: Bridging Simulated and Real-World Data in High-Energy Physics | Marius Köppel |
104 | Generative models and lattice field theory | Mathis Gerdes |
108 | Normalizing flows for jointly predicting photometry and photometric redshifts | Laura Cabayol Garcia |
113 | Exploring the Universe with Radio Astronomy and AI | Lara Alegre |
114 | interTwin - an interdisciplinary Digital Twin Engine for Science | Kalliopi Tsolaki |
115 | Quantum and classical methods for ground state optimisation in quantum many-body problems | Thomas Spriggs |
116 | ML-based Unfolding Techniques for High Energy Physics | Nathan Huetsch |
120 | Clustering Considerations for Nested Sampling | Adam Ormondroyd |
124 | Enhancing Robustness: BSM Parameter Inference with n1D-CNN and Novel Data Augmentation | Yong Sheng Koay |
128 | Boosted object reconstruction with Monte-Carlo truth supervised Graph Neural Networks | Jacan Chaplais |
129 | Importance nested sampling with normalizing flows for gravitational-wave inference | Michael Williams |
130 | Searching gravitational wave from stellar-mass binary black hole early inspiral | Xue-Ting Zhang |
131 | Stochastic Gravitational Wave Background Analysis with SBI | James Alvey |
136 | Hybrid quantum graph neural networks for particle tracking in high energy physics | Matteo Argenton |
147 | Characterizing the Fermi-LAT high-latitude sky with simulation-based inference | Christopher Eckner |
148 | lsbi: linear simulation based inference | Will Handley |
149 | Magnet Design Optimisation with Supervised Deep Neural Networks | Florian Stummer |
150 | PolySwyft: a sequential simulation-based nested sampler | Kilian Scheutwinkel |
158 | Studies on track finding algorithms based on machine learning with GPU and FPGA | Maria Carnesale |
169 | Learning new physics with a (kernel) machine | Marco Letizia |
173 | A Hybrid Approach to Anomaly Detection in Particle Physics | Dennis Noll |
180 | Realtime Anomaly Detection with the CMS Level-1 Global Trigger Test Crate | Sioni Summers |
185 | Building sparse kernel methods via dictionary learning. Expressive, regularized and interpretable models for statistical anomaly detection | Gaia Grosso |
188 | Parameter estimation from quantum-jump data using neural networks | Enrico Rinaldi |
190 | Physics-Informed Neural Networks for Gravitational Waves | Matteo Scialpi |
203 | Advances in developing deep neural networks for finding primary vertices in proton-proton collisions at the LHC | Simon Akar |
209 | Differentiable Vertex Fitting for Jet Flavour Tagging | Rachel Smith |
210 | Fast Inference of Deep Learning Models with SOFIE | Lorenzo Moneta |
212 | Machine Learning-based Data Compression | Per Alexander Ekman |
219 | Machine Learning applications at the ATLAS experiment | Judita Mamuzic |
A particularly interesting application of autoencoders (AE) for High Energy Physics is their use as anomaly detection (AD) algorithms to perform a signal-agnostic search for new physics. This is achieved by training AEs on standard model physics and tagging potential new physics events as anomalies. The use of an AE as an AD algorithm relies on the assumption that the network reconstructs examples it was trained on better than ones drawn from a different probability distribution, i.e. anomalies. Using the search for non-resonant production of semivisible jets as a benchmark, we demonstrate the tendency of AEs to generalize beyond the dataset they are trained on, which hinders their performance. We show how normalized AEs, specifically designed to suppress this effect, give a sizable boost in performance. We further propose a different loss function and the use of the Wasserstein distance as a metric to reach optimal performance in a fully signal-agnostic way.
Source catalogs contain many sources of unknown physical nature. In particular, about one third of the sources in Fermi-LAT gamma-ray catalogs have no known multi-wavelength counterpart. Some of the gamma-ray sources may be visible only in gamma rays, such as distant pulsars with radio jets not pointing at the observer. Machine learning algorithms provide a tool to perform a probabilistic classification of unassociated sources, which can provide information about their nature. In particular, such probabilistic classification can be used for population studies that include not only associated but also unassociated sources. In this presentation we will illustrate the application of probabilistic classification of sources to the problem of understanding the excess of GeV gamma rays near the center of the Milky Way galaxy.
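A minimal sketch of how class probabilities can enter a population study: associated sources contribute a full count to their known class, while each unassociated source contributes its class probabilities, giving the expected number of sources per class. The class names and probabilities below are made up for illustration, and the actual analysis may weight or bin sources differently.

```python
def expected_class_counts(associated_labels, unassociated_probs):
    """Expected number of sources per class.

    associated_labels: known class label for each associated source.
    unassociated_probs: one dict per unassociated source, mapping class
    label to the classifier probability for that class.
    """
    counts = {}
    for label in associated_labels:
        counts[label] = counts.get(label, 0.0) + 1.0
    for probs in unassociated_probs:
        for label, p in probs.items():
            counts[label] = counts.get(label, 0.0) + p
    return counts

# Toy catalog (assumption): three associated and two unassociated sources.
associated = ["pulsar", "blazar", "blazar"]
unassociated = [
    {"pulsar": 0.8, "blazar": 0.2},
    {"pulsar": 0.3, "blazar": 0.7},
]
counts = expected_class_counts(associated, unassociated)
```

Because each probability vector sums to one, the expected counts always add up to the total number of sources in the catalog.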
In high-energy physics (HEP), neural-network (NN) based algorithms have found many applications, such as quark-flavor identification of jets in experiments like the Compact Muon Solenoid (CMS) at the Large Hadron Collider (LHC) at CERN. Unfortunately, complete training pipelines often encounter application-specific obstacles, such as the processing of many large files in HEP data formats like ROOT, the provisioning of data to the model, and a correct evaluation of performance.
We have developed a framework called "b-hive" that combines state-of-the-art tools for HEP data processing and training in a Python-based ecosystem. The framework uses common Python packages like law, Coffea and PyTorch, bundled in a conda environment to keep the setup uncomplicated. Different subtasks like dataset conversion, training, and evaluation are implemented inside the workflow management system "law", making the reproduction of trainings through built-in versioning and parametrization straightforward.
The framework is designed in a modular structure so that single components can be exchanged and used through parameters, making b-hive suited not only for production tasks but also for network development and optimization. Further, fundamental HEP requirements such as the configuration of different physics processes, event-level information, and kinematic cuts can be specified and steered in a single configuration without touching the code itself.
Semivisible jets are a novel signature arising in Hidden Valley (HV) extensions of the SM with a confining interaction [1]. Originating from a double shower and hadronization process and containing undetectable dark bound states, semivisible jets are expected to have a substantially different radiation pattern compared to SM jets.
Unsupervised machine learning allows one to learn the showering pattern of SM jets from data and successfully tag semivisible jets without relying on assumptions about the showering dynamics of the HV interaction [2]. Lund trees [3] are a natural representation of hadronic jets, encoding the full showering history. We show how a graph autoencoder can successfully learn the Lund tree structure of SM jets and tag semivisible jets as anomalies. We furthermore propose a novel training workflow that extends the normalized autoencoder architecture [4][5] to graph networks, making it possible to suppress out-of-distribution reconstruction in a fully signal-agnostic fashion by constraining the low-reconstruction-error phase space to match the support of the training data.
I will present an explainable deep learning framework for extracting new knowledge about the underlying physics of cosmological structure formation. I will focus on an application to dark matter halos, which form the building blocks of the cosmic large-scale structure and wherein galaxy formation takes place. The goal is to use an interpretable neural network to generate a compressed, “latent” representation of the data, which encodes all the relevant information about the final output of interest; the latent representation can then be interpreted using mutual information. I will show how such networks can be used to model final emergent properties of dark matter halos, such as their density profiles, and connect them to the physics that determines those properties. The results illustrate the potential for machine-assisted scientific discovery in cosmological structure formation and beyond.
Weakly supervised machine learning has emerged as a powerful tool in particle physics, enabling the classification of data without relying on extensive labeled examples. This approach holds immense potential for the identification of exotic objects in the gamma-ray sky, particularly those arising from dark matter annihilation. In this contribution, we present our methodology for exploring this potential using the most recent catalog of gamma-ray sources observed by the Fermi-Large Area Telescope. We compare supervised and unsupervised classification techniques to analyze the gamma-ray spectra of sources, aiming to identify objects of unknown astrophysical origin without prior knowledge of their nature. By employing weakly supervised learning, we seek to generalize towards more model-independent searches for exotic sources. Our results demonstrate the effectiveness of both supervised and unsupervised approaches in identifying dark-matter-like objects, while also highlighting limitations on less well-defined problems. This work paves the way for the systematic use of weakly supervised machine learning in the quest for new physics beyond the Standard Model using gamma-ray sources.
The Bayesian evidence can be used to compare and select models based on observed data. However, calculating the evidence can be computationally expensive and sometimes analytically intractable. I present a novel method for rapid computation of the Bayesian evidence based on normalizing flows that rely only on the existence of a set of independent and identically distributed samples extracted from a target posterior distribution. The proposed method has wide applicability and can be employed on the results of Markov-chain Monte Carlo sampling, simulation-based inference, or any other sampled distribution for which we have an estimate of the (unnormalized) posterior probability density. The method is shown to produce fast yet similar evidence estimation in comparison to typical sampling techniques such as Nested Sampling. Finally, I present its application in the context of gravitational-wave data analysis.
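The idea of estimating the evidence from posterior samples can be illustrated with the identity Z = p~(θ) / p(θ | d), where p~ is the unnormalized posterior density: any normalized density q fitted to the samples gives the estimate log Z ≈ log p~(θ) − log q(θ). The sketch below is a toy under stated assumptions, not the method of the abstract: a simple Gaussian fit stands in for the normalizing flow, and the one-dimensional target with a known evidence (`LOG_Z_TRUE`) is hypothetical.

```python
import math
import random
import statistics


def log_gauss(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


# Hypothetical toy target: unnormalized posterior = Z_true * Normal(1.0, 0.5)
LOG_Z_TRUE = math.log(3.7)


def log_unnorm_posterior(theta):
    return LOG_Z_TRUE + log_gauss(theta, 1.0, 0.5)


def estimate_log_evidence(samples, log_unnorm):
    """Fit a normalized density q to the samples and average log p~ - log q.

    A Gaussian fit is used here as a stand-in for a normalizing flow.
    Returns the mean estimate and its scatter across samples.
    """
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    log_ratios = [log_unnorm(t) - log_gauss(t, mu, sigma) for t in samples]
    return statistics.fmean(log_ratios), statistics.stdev(log_ratios)


rng = random.Random(0)
samples = [rng.gauss(1.0, 0.5) for _ in range(5000)]  # stand-in for MCMC samples
log_z, scatter = estimate_log_evidence(samples, log_unnorm_posterior)
```

If q matched the posterior exactly, the log-ratio would be the same for every sample, so the scatter is a rough diagnostic of how well the density estimator has been fitted.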
Machine learning, in its conventional form, has often been criticised for being a black box, providing outputs without a clear rationale. To obtain more interpretable results we can make use of symbolic regression (SR) which, as opposed to traditional regression techniques, goes beyond curve-fitting and attempts to determine the underlying mathematical equations that best describe the data. In this talk we will explore how SR can be used to infer closed form analytic expressions that can be exploited to improve the accuracy of phenomenological analysis at the LHC in the context of electroweak precision observables, such as W and Z production.
We present a pipeline to infer the equation of state of neutron stars from observations based on deep neural networks. In particular, using the standard (deterministic), as well as Bayesian (probabilistic) deep networks, we explore how one can infer the interior speed of sound of the star given a set of mock observations of total stellar mass, stellar radius and tidal deformability. We discuss in detail the construction of our simulated data set of stellar observables starting from the solution of the post-Newtonian gravitational equations, as well as the relevant architectures for the deep networks, along with their performance and accuracy. We further explain how our pipeline is capable of detecting a possible QCD phase transition in the stellar core associated with the so-called cosmological constant. Our results show that deep networks offer a promising tool towards solving the inverse problem of neutron stars, and the accurate inference of their interior from future stellar observations.
Whilst gravitational waves from compact binary signals are well modelled, other transient signals do not necessarily have a clearly defined waveform. Searches for these kinds of signals are often un-modelled and so do not say much about the system that produced the gravitational wave. Having a method that can extract some information on the structure and dynamics of the system could be crucial in quickly finding or creating a model for that system. Here we use a normalising flow to reconstruct the masses and dynamics of the system that produced a given gravitational waveform.
Strong gravitational lensing has become one of the most important tools for investigating the nature of dark matter (DM). With a technique called gravitational imaging, the number and mass of dark matter subhaloes can be measured in strong lenses, constraining the underlying DM model.
Gravitational imaging, however, is an expensive method and requires adaptation in astronomy's current "big data" era. This is due to a stage of the analysis called sensitivity mapping. Here, the observation is analysed to find the smallest detectable subhalo in each pixel. This information can be used to turn a set of subhalo detections and non-detections into an inference on the dark matter model.
In this talk, we cover our previously introduced machine learning method for estimating sensitivity and its results. For example, we produced tens of thousands of sensitivity maps for simulated Euclid strong lenses (1). Our method was able to detect substructures with mass $M>10^{8.8}M_\odot$. This allowed us to forecast the number of substructure detections available in Euclid, that is, $\sim 2500$ in a cold dark matter universe.
More recently, we used the method to examine a critical systematic in substructure detection, the angular structure of the lens galaxy (2). We used an ensemble of models trained with different amounts of lens galaxy angular complexity in the training data, based on realistic HST strong lens images. We found that small perturbations beyond elliptical symmetry, typical in elliptical galaxy isophotes, were highly degenerate with dark matter substructure. The introduction of this complexity to the model reduces the area on the sky where a substructure can be detected by a factor $\sim3$.
In both cases, our work required large numbers of sensitivity maps, which would not have been possible without the acceleration of the machine learning method. We finally discuss the application of our method to data from ground-based (Keck AO) and space-based (Euclid) telescopes, and other prospects for the future.
References
1. O'Riordan C. M., Despali, G., Vegetti, S., Moliné, Á., Lovell, M., MNRAS (2023)
2. O'Riordan C. M., Vegetti, S., MNRAS (2023)
Methods for training jet taggers directly on real data are well motivated due to both the ambiguity of parton labels and the potential for mismodelled jet substructure in Monte Carlo. This talk presents a study of weakly-supervised learning applied to Z+jet and dijet events in CMS Open Data. In order to measure the discrimination power in real data, we consider three different estimates of the quark/gluon mixture fractions. These fractions are then used to train TopicFlow: a deep generative model that disentangles quark and gluon distributions from mixed datasets. We discuss the use of TopicFlow both as a generative classifier and as a way to overcome limited statistics.
Timepix4 is a hybrid pixel detector readout ASIC developed by the Medipix4 Collaboration at CERN. It consists of a matrix of about 230 k pixels, each equipped with an amplifier, a discriminator and a time-to-digital converter with 195 ps bin size that allows both the time-of-arrival and the time-over-threshold of the hits to be measured. Due to its characteristics, it can be exploited in a wide range of fields, such as fundamental physics, medical imaging or dosimetry, and can be coupled to different kinds of sensors.
Timepix4 can produce up to 160 Gbps of output data, so, regardless of its application, a strong software counterpart is needed for fast and efficient data processing. Beyond the data acquisition and control software, we developed a fast convolutional neural network to recognize and label different types of radiation that interact with a silicon pixel sensor bump-bonded to the Timepix4. For the time being, this neural network is trained by exposing the detector to natural radioactivity, and it classifies the deposited energy pattern of each track as an electron, muon, photon or alpha particle.
The network has four different convolutional layers, with different kernel sizes, that work in parallel. Depending on the shape and number of pixels of the clusters, one or more layers can detect different features in the track. A maximum layer then merges the four outputs, so that the relevant features captured by the different layers are combined and a final dense layer can correctly classify the input.
The neural network was implemented in Python3, using the TensorFlow v2 module, and it ran on an Intel i7-3770 CPU @ 3.40GHz. The training was carried out on a 2000-track dataset for 150 epochs, taking a total time of 10 minutes. After the 150-epoch training, the network shows an accuracy of more than 95% and a loss < 0.2, and it can analyse more than 1600 tracks per second.
Given the short training time, this network can be adapted and re-trained on other track datasets, depending on the need: for example, it can be adapted for different radiation fields or for a different sensor coupled to the Timepix4, such as other semiconductors or microchannel plates.
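The parallel-branch idea described above (several convolutions with different kernel sizes applied to the same cluster image, merged by an element-wise maximum) can be sketched in plain Python. This is an illustrative toy, not the authors' TensorFlow model: the kernel sizes (1, 3, 5, 7), the fixed averaging weights and the 5x5 input are all assumptions, and a trained network would learn its weights and add a dense classification layer on top.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation that keeps the input shape (odd kernel sizes)."""
    rows, cols = len(image), len(image[0])
    size = len(kernel)
    half = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    rr, cc = r + i - half, c + j - half
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def parallel_conv_max(image, kernels):
    """Apply every kernel to the same input and merge with an element-wise maximum."""
    maps = [conv2d_same(image, k) for k in kernels]
    rows, cols = len(image), len(image[0])
    return [[max(m[r][c] for m in maps) for c in range(cols)] for r in range(rows)]


def box_kernel(size):
    """Hypothetical averaging kernel; a trained network would learn these weights."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


# Toy 5x5 cluster with a single hit pixel in the centre
cluster = [[0.0] * 5 for _ in range(5)]
cluster[2][2] = 9.0
kernels = [box_kernel(s) for s in (1, 3, 5, 7)]
merged = parallel_conv_max(cluster, kernels)
```

In this toy, the small kernel dominates at the hit pixel while the larger kernels dominate further away, which mirrors the statement that different branches respond to clusters of different shape and size.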
In the field of nuclear physics, multi-neutron detection plays a critical role in revealing specific nuclear properties (e.g. the structure of light exotic nuclei or four-neutron resonance states). However, one neutron can interact several times in different bars of a neutron detector array, since it will likely pass through the detectors without losing all its energy. This phenomenon is commonly called CrossTalk. Effectively distinguishing CrossTalk from real two-neutron events is therefore an essential problem to address. The conventional approach for eliminating CrossTalk events has been well established in previous work [1, 2]; it relies on the physical relations between signal pairs in detectors, such as their separation in time and space, the causality condition and the light output. However, many real two-neutron events are eliminated at the same time, which results in a significant decrease in neutron detection efficiency, particularly at small relative energies.
In this study, we will illustrate the application of the XGBoost algorithm, based on ensemble learning and tree models, to remove the CrossTalk events, using the experimental and simulation data on $^{11}$Li (p,pn)$^{8}$Li+2$n$ of the SAMURAI18 experiment at RIKEN. The XGBoost method not only enhances the efficiency of two-neutron detection but also exhibits superior interpretability compared to deep learning. Furthermore, it is well suited to handling structured data without requiring normalization of the feature inputs. Potential future applications of this method include increasing the detection efficiency for multiple neutrons (triple neutrons, quadruple neutrons, etc.) and addressing diverse particle identification tasks.
[1] T. Nakamura, Y. Kondo, Nucl. Instrum. Methods Phys. Res., Sect. B 376, 156 (2016) https://doi.org/10.1016/j.nimb.2016.01.003
[2] Y. Kondo, T. Tomai, and T. Nakamura, Nucl. Instrum. Methods Phys. Res., Sect. B 376, 156 (2016) https://doi.org/10.1016/j.nimb.2019.05.068
The application of modern Machine Learning (ML) techniques for anomaly detection in collider physics is a very active and prolific field, with use cases that include the exploration of physics beyond the Standard Model and the detection of faults in the experimental setup. Our primary focus is on data-quality monitoring. Within large experimental collaborations, this anomaly detection task usually relies on a large pool of rotating non-expert shifters. Their goal is to identify detector-related issues that would render data unusable for future physics analysis. The partial automation of these tasks presents an opportunity to improve data-collection efficiency and reduce the need for associated person power.
Challenges intensify in scenarios of rapidly changing experimental conditions, such as during the commissioning of a new detector. In this case, for an automated anomaly detection system to be useful it would have to be continuously retrained in an efficient manner. Additionally, optimization for factors beyond data-collection efficiency, such as minimizing unnecessary human interventions, introduces complexities in defining an adequate loss function. To address these challenges, we propose the application of Reinforcement Learning (RL) techniques with human feedback to the task of data-quality monitoring.
This contribution describes a simplified simulated setup designed to study the automation of data-quality monitoring in two regimes: “online” and “offline”. The "online" regime addresses real-time detection of issues in the detector during data collection, emphasizing a rapid intervention to enhance future data-collection efficiency. On the other hand, the "offline" regime centers on the classification of previously collected data as either usable or unusable.
This work aims to exploit RL algorithms within these regimes, demonstrating progress in simulations using both multi-agent and single-agent RL techniques. We present the performance obtained with different policies and outline future research directions.
The Hubble function entirely characterizes a given Friedmann-Robertson-Walker spacetime as a consequence of homogeneity and isotropy on cosmological scales. In conjunction with the gravitational field equation, it can be related to the densities of the cosmological fluids and their respective equations of state. Type Ia supernovae allow one to constrain the evolution of the luminosity distance, which is related to the Hubble function through a differential equation. Physics-informed neural networks (PINNs) can emulate this dynamical system, allowing fast predictions of luminosity distances for a given set of densities and equations of state. PINNs open the possibility of a parameter-free reconstruction of the Hubble function based on supernova Ia observations. This estimate of the Hubble function is then used to reconstruct the dark energy equation of state. Since cosmic inference and reconstruction of the Hubble function with associated errors require uncertainty estimates on the network output, we investigate the impact of a heteroscedastic loss on the PINN setup.
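For orientation, the relation between the luminosity distance and the Hubble function can be written, in a spatially flat universe, as d/dz [d_L / (1+z)] = c / H(z). The sketch below integrates this relation directly with the trapezoid rule; it is the kind of forward calculation a PINN would emulate, not the PINN itself. The fluid content (matter plus one dark energy component with constant equation of state `w`), the default parameter values and all function names are assumptions made for illustration.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble(z, h0=70.0, omega_m=0.3, w=-1.0):
    """H(z) in km/s/Mpc for a flat universe with matter and constant-w dark energy."""
    omega_de = 1.0 - omega_m
    return h0 * math.sqrt(
        omega_m * (1.0 + z) ** 3 + omega_de * (1.0 + z) ** (3.0 * (1.0 + w))
    )


def luminosity_distance(z, n_steps=2000, **cosmo):
    """d_L in Mpc from d/dz [d_L / (1+z)] = c / H(z), using the trapezoid rule."""
    dz = z / n_steps
    integral = 0.0
    for i in range(n_steps):
        z0, z1 = i * dz, (i + 1) * dz
        integral += 0.5 * dz * (1.0 / hubble(z0, **cosmo) + 1.0 / hubble(z1, **cosmo))
    return (1.0 + z) * C_KM_S * integral


# Distances for two hypothetical dark energy equations of state
d_lcdm = luminosity_distance(1.0, omega_m=0.3, w=-1.0)
d_w08 = luminosity_distance(1.0, omega_m=0.3, w=-0.8)
```

A matter-only universe (`omega_m=1.0`) has the closed form d_L = (2c/H0)(1+z)(1 − 1/√(1+z)), which provides a simple check of the numerical integration.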
Accelerator-based experiments in particle physics and medical experiments in neuroscience generate petabytes of data. Well-defined questions can be answered by intensive computing analysis; however, new correlations may remain hidden in the huge sea of data. Physics- and neuroscience-informed AI/ML can help to discover new connections, seamlessly integrating data and theoretical models, even in partially understood, uncertain and high-dimensional contexts. In recent years, both scientific disciplines have been explored and investigated intensively at the Wigner RCP. The methodological cross-fertilization will be reported, focusing especially on applications of neural-network-based regression methods to data sets from particle physics and neuroscience.
Weakly supervised methods have emerged as a powerful tool for anomaly detection at the LHC. While these methods have shown remarkable performance on specific signatures, their application in an even more model-agnostic manner requires using higher dimensional feature spaces compared to the first publications on this topic. We present two directions towards more model agnosticity, either by including more hand-crafted high-level features or by using low-level features like four-momenta. Although both directions are challenging in the weakly supervised setup, we present powerful classification architectures which can obtain the significance enhancement necessary for a potential discovery of new physics.
Dark energy has ushered in a golden age of astronomical galaxy surveys, allowing for the meticulous mapping of galaxy distributions to constrain models of dark energy and dark matter. The majority of these surveys rely on measuring galaxy redshifts through a limited set of observations in broad optical bands. While determining redshift is theoretically a straightforward machine learning problem, the practical challenge lies in the constraint of a small selected training sample, coming from spectroscopic surveys.
This challenge is particularly pronounced in the context of the PAUS survey, a survey with 40 narrow bands, which seeks to enhance the precision of photometric redshifts by an order of magnitude. In this study, we present a novel approach to address this limitation by leveraging transfer learning from simulations. While simulations increase the overall number of galaxies and the range of galaxy types, the challenge is a calibration mismatch between observed and simulated data. Different training methods are developed to address this issue.
Additionally, the Euclid satellite is engaged in an extensive sky survey, covering a significantly larger area than the relatively small sub-area covered by PAUS. The weak lensing measurements require tight control of the photo-z bias. Multi-task learning is a technique that improves neural network training by simultaneously solving multiple tasks; it is a general approach that can be applied to a wide range of problems. Employing multi-task learning and predicting both the Euclid photo-z and the corresponding PAUS observations, we have developed an innovative method to enhance Euclid photo-z accuracy. This method improves the results, but how it works is not entirely intuitive. Here, we delve into the underlying mechanisms responsible for these improvements.
Physics-Informed Neural Networks (PINNs) have gained significant attention in the field of deep learning since their inception in the scientific literature, owing to their ability to tackle physical scenarios. These networks optimize neural architectures by incorporating inductive biases derived from knowledge of physics. To embed the underlying physics, a suitable loss function is defined, encompassing the necessary physical constraints. PINNs have proven versatile in comprehending and resolving diverse physical systems, resulting in their growing popularity in machine learning research and their direct application in various scientific domains. However, to accurately represent and solve systems of differential equations with discontinuous solutions, modifications to the fundamental algorithms of PINNs are necessary.
An approach called Gradient-Annihilated Physics-Informed Neural Networks (GA-PINNs) is presented for solving partial differential equations with discontinuous solutions. GA-PINNs use a modified loss function and weighting function to ignore high gradients in physical variables. The method demonstrates excellent performance in solving Riemann problems in special relativistic hydrodynamics. Possible extensions to ultrarelativistic scenarios are considered through the addition of a term to the loss function that forces the network to treat the discontinuities as Dirac deltas in the space of gradients. The results obtained by GA-PINNs accurately describe the propagation speeds of discontinuities and outperform a baseline PINN algorithm. Moreover, GA-PINNs avoid the costly recovery of primitive variables, a drawback in grid-based solutions of relativistic hydrodynamics equations. This approach shows promise for modeling relativistic flows in astrophysics and particle physics with discontinuous solutions.
Foundation models are increasingly prominent in various physics subfields. Moreover, the application of supervised machine learning methods in astronomy suffers from scarce training data. We explore computer vision foundation models, focusing on their application to radio astronomical image data.
Specifically, we explore the unsupervised, morphological classification of radio sources through self-supervised representation learning. Our proposed three-step pipeline involves extracting image representations, identifying morphological clusters in the representation space, and ultimately classifying all data. To ensure morphological relevance in the obtained representations, we make use of saliency maps and employ tailored random augmentations.
Commencing with FRI and FRII radio sources, we uncover seven pre-existing subclasses. It is crucial to emphasize that our classification procedure, including the identification of subclasses, is entirely unsupervised.
In conclusion, we present a novel data-driven classification scheme for radio sources and highlight that utilizing pre-trained supervised classifier weights can obscure the detection of these subclasses.
In this work we demonstrate that significant gains in performance and data efficiency can be achieved by moving beyond the standard paradigm of sequential optimization in High Energy Physics (HEP). We conceptually connect HEP reconstruction and analysis to modern machine learning workflows such as pretraining, finetuning, domain adaptation and high-dimensional embedding spaces, and quantify the gains in the example use case of searches for heavy resonances decaying via an intermediate di-Higgs to four b-jets.
In any lattice QCD based study, gauge configurations have to be generated using some form of Monte Carlo simulation. These are then used to compute physical observables. In these measurements, physical observables (like the chiral condensate or baryon number density) can be expressed as a trace of a combination of products of the inverse fermion matrix. These traces are usually estimated stochastically using the random noise method. This method requires making a choice of the number of random sources used in the computation. In principle, only in the limit of infinitely many sources does one see the true physics results. Due to the finite number of sources, a systematic error is introduced in all measurements. We propose making use of an Unfolding algorithm based on a sequential neural network to learn the inverse of the transformation that takes a “true” distribution (using a sufficiently large number of random sources) to a “measured” distribution (small number of sources) in order to attempt to reduce errors on measured observables.
The BERT pretraining paradigm has proven to be highly effective in many domains including natural language processing, image processing and biology. To apply the BERT paradigm, the data needs to be described as a set of tokens, and each token needs to be labelled. To date the BERT paradigm has not been explored in the context of HEP. The samples that form the data used in HEP can be described as a set of particles (tokens) where each particle is represented as a continuous vector. We explore different approaches for discretising/labelling particles such that the BERT pretraining can be performed, and demonstrate the utility of the resulting pretrained models on common downstream HEP tasks.
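The random noise method mentioned above can be illustrated with a minimal sketch of a stochastic trace estimator: Tr(M) is approximated by the average of η^T M η over random noise vectors η. The sketch makes several assumptions for illustration only: Z2 noise (entries ±1), a small explicit 3x3 matrix standing in for the inverse fermion matrix (which in practice is never stored explicitly but applied through a linear solve), and hypothetical function names.

```python
import random


def stochastic_trace(matrix, n_sources, rng):
    """Estimate Tr(matrix) as the average of eta^T M eta over Z2 noise vectors."""
    n = len(matrix)
    total = 0.0
    for _ in range(n_sources):
        eta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        m_eta = [sum(matrix[i][j] * eta[j] for j in range(n)) for i in range(n)]
        total += sum(eta[i] * m_eta[i] for i in range(n))
    return total / n_sources


# Small stand-in matrix (hypothetical values)
M = [
    [2.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 3.0],
]
exact = sum(M[i][i] for i in range(3))

rng = random.Random(1)
few_sources = stochastic_trace(M, 4, rng)       # noisy estimate
many_sources = stochastic_trace(M, 20000, rng)  # close to the exact trace
```

The off-diagonal elements only cancel on average, so an estimate from a handful of sources fluctuates around the exact value, and the fluctuation shrinks as the number of sources grows.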
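The abstract does not state which discretisation schemes are studied. As one naive possibility, shown purely for illustration, each particle's continuous features can be binned on a regular grid and the bin indices combined into a single integer token. The feature choice (pt, eta, phi), the ranges and the number of bins below are all assumptions.

```python
import math

N_BINS = 8  # hypothetical number of bins per feature


def bin_index(value, low, high, n_bins=N_BINS):
    """Map a continuous value to an integer bin, clamping out-of-range values."""
    frac = (value - low) / (high - low)
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def particle_token(pt, eta, phi, n_bins=N_BINS):
    """Combine per-feature bin indices into one token in [0, n_bins**3)."""
    i_pt = bin_index(pt, 0.0, 100.0, n_bins)          # assumed pt range in GeV
    i_eta = bin_index(eta, -2.5, 2.5, n_bins)         # assumed acceptance
    i_phi = bin_index(phi, -math.pi, math.pi, n_bins)
    return (i_pt * n_bins + i_eta) * n_bins + i_phi


# A toy jet given as (pt, eta, phi) triples (hypothetical values)
jet = [(45.0, 0.1, 1.2), (12.5, -0.4, 1.0), (3.0, 0.3, 1.4)]
tokens = [particle_token(*particle) for particle in jet]
```

A fixed grid like this trades resolution against vocabulary size. Learned discretisations are an alternative, but whether either matches the approaches in the talk is not stated.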
We present a newly developed code, JERALD (JAX Enhanced Resolution Approximate Lagrangian Dynamics), which builds on the Lagrangian Deep Learning method (LDL) of Dai and Seljak (2021), improving on the time and memory requirements of the original code. JERALD takes as input DM particle positions from a low-resolution, computationally inexpensive run of the approximate N-body simulator FastPM and, using a parametrization inspired by effective theories, reproduces 3D maps of both DM and baryonic properties, such as stellar mass and neutral hydrogen number density, of higher-resolution full hydrodynamical simulations.
We train the model using either the TNG Illustris simulation suite (specifically, TNG300-1) or various simulations from the Sherwood suite, which differs from TNG both in implementation and in sub-grid physics treatment. We investigate the robustness of the learnt mapping at various redshifts by training on one set of simulations and validating it on the other and, using the numerous different Sherwood simulations available, we explore the performance of the model against changes in cosmology and/or dark matter models.
We find that the model can reproduce higher resolution DM and baryonic maps with excellent agreement in the power spectra at large/intermediate scales up to $k \sim 5$-$7\,h/\mathrm{Mpc}$, independently of the target simulation code and properties. We outline ongoing work aimed at integrating line-of-sight data, with a view to using our approach to produce fast, accurate and precise mock Ly-alpha data for use with upcoming surveys.
Traditionally, searches for new physics use complex computer simulations to reproduce what Standard Model processes should look like in collisions recorded by the LHC experiments. These are then compared to simulations of new-physics models (e.g. dark matter, supersymmetry, etc.).
The lack of evidence for new interactions and particles since the Higgs boson’s discovery has motivated the execution of generic searches to complement the existing rigorous, model-dependent analysis program. Unsupervised machine learning can offer a new style of analyses which is completely agnostic to types of new-physics models and to any expectations of scientists.
The application of anomaly detection to collider searches is a rapidly growing effort in the high-energy physics community [1]. Machine learning provides an excellent framework for the construction of tools that can isolate events in data solely because of their incompatibility with a background-only hypothesis. Building a tool to perform model-independent classification of collision events involves training on data events, and therefore requires the ability to cope with a lack of labels indicating whether inputs are signal or background. This distinguishes the typical supervised classification problem, where all inputs are labeled with a known origin, from the anomaly detection approach, which makes use of unsupervised (no input labels) or weakly supervised (noisy labels) training.
The first application of fully unsupervised machine learning reported by the ATLAS collaboration [2] will be shown: a VRNN is trained on jets in data to define an anomaly detection signal region (SR), which selects the X particle based solely on its substructural incompatibility with background jets.
Moreover, we will review the status of current efforts on anomaly detection in the ATLAS collaboration. In particular, Graph Anomaly Detection (GAD) exploits innovative machine learning algorithms known as Graph Neural Networks, which have proved to be more efficient than standard techniques when applied to heterogeneous data naturally structured as graphs.
[1] The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics, Rep. Prog. Phys. 84 (2021) 124201
[2] Phys. Rev. D 108 (2023) 052009
Foundation models are multi-dataset and multi-task machine learning methods that once pre-trained can be fine-tuned for a large variety of downstream applications. The successful development of such general-purpose models for physics data would be a major breakthrough as they could improve the achievable physics performance while at the same time drastically reduce the required amount of training time and data.
We report significant progress on this challenge on several fronts. First, a comprehensive set of evaluation methods is introduced to judge the quality of an encoding from physics data into a representation suitable for the autoregressive generation of particle jets with transformer architectures (the common backbone of foundation models). These measures motivate the choice of a higher-fidelity tokenization compared to previous works.
Finally, we demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new OmniJet-𝛼 model. This is the first successful transfer between two different and actively studied classes of tasks and constitutes a major step in the building of foundation models for particle physics.
A recent proposal suggests using autoregressive neural networks to approximate multi-dimensional probability distributions found in lattice field theories or statistical mechanics. Unlike Monte Carlo algorithms, these networks can serve as variational approximators to evaluate extensive properties of statistical systems, such as free energy.
In the case of two-dimensional systems, the numerical cost of such simulations scales like $L^6$ with increasing size $L$ of the $L \times L$ system and can be reduced to $L^3$ using a hierarchy of autoregressive neural networks.
In this poster, we will show the generalization of the two-dimensional hierarchical algorithm to the three-dimensional $L \times L \times L$ Ising model, whose cost scales as $L^6$ instead of the expected $L^9$. We present simulations performed on lattices of various sizes, up to $16 \times 16 \times 16$ spins. We also show various algorithms that allow us to train our networks faster.
Our proposed approach improves neural network training, yielding a closer approximation of the target probability distribution, leading to a more accurate variational free energy, reduced autocorrelation time in Markov Chain Monte Carlo simulations, and decreased memory requirements through the use of a hierarchical network structure.
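The quoted exponents can be reproduced with a back-of-envelope count. This is only a sketch under assumptions that the abstract does not state: the networks are taken to be dense autoregressive networks, so that sampling $N$ spins needs $N$ forward passes of cost $O(N^2)$ each, and the cost of the hierarchical scheme is taken to be dominated by its largest network, which acts on a boundary that splits the lattice.

```latex
\begin{align*}
\text{single network, } d=2:&\quad N = L^2 &&\Rightarrow\ N^3 = L^6,\\
\text{hierarchical, } d=2:&\quad N_{\text{boundary}} \sim L &&\Rightarrow\ N_{\text{boundary}}^3 \sim L^3,\\
\text{single network, } d=3:&\quad N = L^3 &&\Rightarrow\ N^3 = L^9,\\
\text{hierarchical, } d=3:&\quad N_{\text{boundary}} \sim L^2 &&\Rightarrow\ N_{\text{boundary}}^3 \sim L^6.
\end{align*}
```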
The next generation of observatories, such as the Vera C. Rubin Observatory and Euclid, poses a massive data challenge. An obstacle we need to overcome is the inference of accurate redshifts from photometric observations that can be limited to a handful of bands. We addressed this challenge with a forward modeling framework, pop-COSMOS, calibrated by fitting a population model to observations in photometry space. This high-dimensional fitting, complete with data-driven noise modeling and flexible selection effects, is achieved via a novel use of simulation-based inference. Sampling from our fitted model provides the full spectral energy distributions (SEDs), which encode the integrated information from all the stars, gas and dust in galaxies. pop-COSMOS therefore unlocks a medium for the study of galaxy evolution that was not available before, as it far surpasses the scope of current spectroscopic catalogs and their wavelength coverage. Analyzing the SEDs of high-volume galaxy populations sampled from pop-COSMOS will be the focus of this talk, which presents analyses of the SEDs directly and of their lower-dimensional representations constructed by unsupervised learning algorithms. First, I will demonstrate how key galaxy evolution diagnostics are captured by variational autoencoders (VAEs). I will then present our work using mutual information in two directions: (i) to measure the correlations between derived quantities and direct measures from galaxy SEDs, and (ii) to interpret the compressed latent representations constructed within the VAE. Direction (i) paves the way for robust predictions of galaxy properties when only limited observations are available, while (ii) provides a path for astrophysical discovery in an interpretable way on latent spaces.
Machine learning can be a powerful tool to discover new signal types in astronomical data. In our recent study, we have applied it for the first time to search for long-duration transient gravitational waves triggered by pulsar glitches, which could yield physical insight into the mostly unknown depths of the pulsar. Other methods previously applied to search for such signals rely on matched filtering and a brute-force grid search over possible signal durations, which is sensitive but can become very computationally expensive. We have developed a new method to search for post-glitch signals by combining matched filtering with convolutional neural networks, which reaches similar sensitivities to the standard method at false-alarm probabilities relevant for practical searches, while being significantly faster. We specialize to the Vela glitch during the LIGO-Virgo O2 run, and present new upper limits on the gravitational-wave strain amplitude from the data of the two LIGO detectors for both constant-amplitude and exponentially decaying signals.
Modern simulation-based inference techniques leverage neural networks to solve inverse problems efficiently. One notable strategy is neural posterior estimation (NPE), wherein a neural network parameterizes a distribution to approximate the posterior. This approach is particularly advantageous for tackling low-latency or high-volume inverse problems. However, the accuracy of NPE varies significantly within the learned parameter space. This variability is observed even in seemingly straightforward systems like coupled-harmonic oscillators. This paper emphasizes the critical role of prior selection in ensuring the consistency of NPE outcomes. Our findings indicate a clear relationship between NPE performance across the parameter space and the number of similar samples trained on by the model. Thus, the prior should match the sample diversity across the parameter space to promote strong, uniform performance. Furthermore, we introduce a novel procedure, in which amortized and sequential NPE are combined to swiftly refine NPE predictions for individual events. This method substantially improves sample efficiency, on average from nearly 0% to 10-80% within ten minutes. Notably, our research demonstrates its real-world applicability by achieving a significant milestone: accurate and swift inference of posterior distributions for low-mass binary black hole (BBH) events with NPE.
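The abstract does not define "sample efficiency". A minimal sketch, assuming it refers to the importance-sampling efficiency often quoted when NPE samples are reweighted to the true posterior, is given below. The function name and the example log-weights are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def sample_efficiency(log_weights):
    """Importance-sampling efficiency (sum w)^2 / (n * sum w^2), in [1/n, 1]."""
    log_w = np.asarray(log_weights, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability;
    # the efficiency is invariant under a common rescaling of the weights.
    w = np.exp(log_w - np.max(log_w))
    return float(np.sum(w) ** 2 / (len(w) * np.sum(w ** 2)))

# Hypothetical log-weights, log p(theta | x) - log q(theta | x), for two proposals.
uniform = np.zeros(1000)                      # proposal matches the target
collapsed = np.array([0.0] + [-50.0] * 999)   # a single sample dominates

eff_good = sample_efficiency(uniform)
eff_bad = sample_efficiency(collapsed)
```

With equal weights the efficiency is 1, and when a single sample carries almost all the weight it approaches its lower bound of 1/n.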
Gravitational wave parameter estimation plays a crucial role in understanding astrophysical phenomena, yet it is often challenged by real-world noise inherent in the detection process. In this work, we use the simulation-based-inference pipeline PEREGRINE to do robust parameter estimation and tailor it to address the complexities of real noise in gravitational wave data analysis. We aim to effectively distinguish gravitational wave signals from noise artifacts, enabling accurate parameter estimation even in challenging observational conditions. We showcase the performance through rigorous validation studies and real-data applications, demonstrating its effectiveness in extracting astrophysical insights from noisy gravitational wave signals. Our work highlights the importance of developing tailored solutions to mitigate real-world noise challenges in gravitational wave astronomy and paves the way for improved parameter estimation methodologies in future observations.
We present a Machine Learning approach to perform fully Bayesian inference of the neutron star equation of state given results from parameter estimation from gravitational wave signals of binary neutron star (BNS) mergers. The detection of gravitational waves from BNS merger GW170817 during the second observing run of the ground-based gravitational wave detector network provided a new medium through which to probe the neutron star equation of state. With the increased sensitivity of the current and future observing runs, we expect to detect more of such signals and therefore further constrain the equation of state. Traditionally, equation of state inference is computationally expensive and as such there is a need to improve analysis efficiency for future observing runs. Our analysis facilitates both model-independent and rapid equation of state inference to complement electromagnetic follow-up investigation of gravitational wave events. Using a conditional Normalising Flow, we can return O(1000) neutron star equations of state given mass and tidal deformability samples in O(0.1) seconds. We also discuss strategies for rapid hierarchical inference of the dense matter equation of state from multiple gravitational wave events.
Strong gravitational lenses are a singular probe of the universe's small-scale structure: they are sensitive to the gravitational effects of low-mass ($<10^{10} M_\odot$) halos even when these dark matter halos have no luminous counterpart. Strong-lensing analyses of dark matter structure generally rely on simulation-based inference (SBI). Modern SBI methods, which leverage neural networks as density estimators, have shown promise in extracting the halo-population signal. However, it is unclear whether the performance of these models is limited by the methodology or by the information content of the data. In this study, we investigate the major suspects that could pose a methodological limitation: model complexity, optimization strategy, length of training, and training dataset size. We find that only the training set size significantly impacts model performance. Considering this, we adopt a sequential neural posterior estimation (SNPE) approach, allowing us to iteratively refine the distribution of simulated training images to better align with the observed data. SNPE nearly quadruples the information extracted from mock Hubble Space Telescope (HST) images when compared to our best non-sequential model. The notable improvement in constraining power enabled by our sequential approach highlights that the current strong lensing constraints are limited primarily by methodology and not the data itself. Moreover, our results emphasize the need to treat training set generation and model optimization as interconnected stages of any cosmological analysis using simulation-based inference techniques.
Modern machine learning will allow for simulation-based inference from reionization-era 21cm observations at the Square Kilometre Array. Our framework combines a convolutional summary network and a conditional invertible network through a physics-inspired latent representation. It allows for an optimal and extremely fast determination of the posteriors of astrophysical and cosmological parameters. The sensitivity to non-Gaussian information makes our method a promising alternative to the established power spectra.
The Cosmic Dawn (CD) and the Epoch of Reionization (EoR) are epochs of the Universe which host invaluable information about the cosmology and astrophysics of X-ray heating and hydrogen reionization. Radio interferometric observations of the 21-cm line at high redshifts have the potential to revolutionize our understanding of the Universe during this time. However, modelling the evolution of these epochs is particularly challenging due to the complex interplay of many physical processes. This makes it difficult to perform conventional statistical analysis using likelihood-based Markov chain Monte Carlo (MCMC) methods, which scale poorly with the dimensionality of the parameter space. We show how simulation-based inference through Marginal Neural Ratio Estimation (MNRE) provides a step towards evading these issues. We use 21cmFAST to model the 21-cm power spectrum during CD–EoR with a six-dimensional parameter space. With the expected thermal noise from the Square Kilometre Array, we are able to accurately recover the posterior distribution for the parameters of our model at a significantly lower computational cost than the conventional likelihood-based methods. We further show how the same training data set can be utilized to investigate the sensitivity of the model parameters over different redshifts. Our results indicate that such efficient and scalable inference techniques enable us to significantly extend the modelling complexity beyond what is currently achievable with conventional MCMC methods.
Recently, conditional normalizing flows have shown promise to directly approximate the posterior distribution via amortized stochastic variational inference from raw simulation data without resorting to likelihood modelling.
In this contribution, I will discuss an open-source GitHub package, "jammy_flows", a PyTorch-based project which comes with many state-of-the-art normalizing flows out of the box and is tailor-made for this physics use case. It includes normalizing flows for different manifolds such as Euclidean space, intervals, the probability simplex or spheres, the latter being particularly important for directional distribution modelling. Joint probability distributions over multiple manifolds can be easily created via an autoregressive structure that is taken care of internally without extra work by the user. Information-geometric quantities such as entropy and KL-divergence-based asymmetry measures can be calculated, and convenience functions for coverage checks are also available. Finally, I will showcase an application of conditional NFs for neutrino event reconstruction in the IceCube detector.
The Dark Matter Particle Explorer (DAMPE), a satellite-borne experiment capable of detecting gamma rays from a few GeV to 10 TeV, studies the galactic and extragalactic gamma-ray sky and is at the forefront of the search for dark-matter spectral lines in the gamma-ray spectrum. In this contribution we detail the development of a convolutional neural network (CNN) model for the trajectory reconstruction of gamma rays. Four distinct models, each taking a different-resolution Hough image of the DAMPE silicon-tungsten tracker converter (STK) as input, were trained with Monte Carlo data. Their standalone and sequential-application performance was benchmarked, and a proof of concept with flight data was realized. The results indicate that the developed CNN is a viable approach for gamma-ray track reconstruction. Further studies aimed at pushing the CNN performance beyond the conventional Kalman algorithm are ongoing.
Doors open at 19:30, the event starts at 20:00
ID | Title | Presenter |
4 | Analyzing ML-enabled Full Population Model for Galaxy SEDs with Unsupervised Learning and Mutual Information | Sinan Deger |
7 | Quark/gluon discrimination and top tagging with dual attention transformer | Daohan Wang |
21 | Learning the ‘Match’ Manifold to Accelerate Template Bank Generation | Susanna Green |
25 | Optimal, fast, and robust inference of reionization-era cosmology with the 21cmPIE-INN | Benedikt Schosser |
40 | Rapidly searching and producing Bayesian posteriors for neutron stars in gravitational wave data. | Joe Bayley |
41 | Convolutional neural network search for long-duration transient gravitational waves from glitching pulsars | Rodrigo Tenorio |
44 | GNN for Λ Hyperon Reconstruction in the WASA-FRS Experiment | Snehankit Pattnaik |
54 | OmniJet: The first cross-task foundation model for particle physics | Joschka Birk |
55 | Turning optimal classifiers into anomaly detectors | Adrian Rubio Jimenez |
58 | Gradient-Annihilated PINNs for Solving Riemann Problems: Application to Relativistic Hydrodynamics | Antonio Ferrer Sánchez |
60 | Increasing the model agnosticity of weakly supervised anomaly detection | Marie Hein |
67 | Multi-class classification of gamma-ray sources and the nature of excess of GeV gamma rays near the Galactic center | Dmitry Malyshev |
69 | Estimation of Machine Learning model uncertainty in particle physics event classifiers | Julia Vazquez Escobar |
70 | Robust Uncertainty Quantification in Parton Distribution Function Inference | Mark Costantini |
75 | Symbolic regression for precision LHC physics | Manuel Morales-Alvarado |
78 | Deep learning techniques in the study of the hypertriton puzzle | Christophe Rappold |
82 | Next-Generation Source Analysis: AI Techniques for Data-Intensive Astronomical Observations | Rodney Nicolaas Nicolaas |
83 | Flexible joint conditional normalizing flow distributions over manifolds: the jammy-flows toolkit | Thorsten Glüsenkamp |
85 | Finetuning Foundation Models for Joint Analysis Optimization | Lukas Heinrich |
90 | Calculating entanglement entropy with generative neural networks | Dawid Zapolski |
95 | Energy-based graph autoencoders for semivisible jet tagging in the Lund representation | Roberto Seidita |
97 | Fast and Precise Amplitude Surrogates with Bayesian and Symmetry Preserving Networks | Víctor Bresó Pla |
98 | Galaxy redshift estimations with transfer and multi-task learning | Martin Boerstad Eriksen |
99 | Quark/gluon tagging in CMS Open Data with CWoLa and TopicFlow | Ayodele Ore |
105 | Generating Lagrangians for particle theories | Eliel Camargo-Molina |
106 | Evaluating Generative Models with non-parametric two-sample tests | Samuele Grossi |
107 | The flash-simulation of the LHCb experiment using the Lamarr framework | Matteo Barbetti |
110 | Utilizing Artificial Intelligence Technologies for the Enhancement of X-ray Spectroscopy with Metallic-Magnetic Calorimeters | Marc Oliver Herdrich |
112 | Applying hierarchical autoregressive neural networks for three-dimensional Ising model | Mateusz Winiarski |
117 | End-to-End Object Reconstruction in a Sampling-Calorimeter using YOLO | Pruthvi Suryadevara |
118 | Validating Explainable AI Techniques through High Energy Physics Data | Mariagrazia Monteleone |
126 | Transformer-inspired models for particle track reconstruction | Yue Zhao |
127 | Sensitivity of strong lenses to substructure with machine learning | Conor O'Riordan |
134 | A fast convolutional neural network for online particle track recognition | Viola Cavallini |
139 | A deep learning method for the gamma-ray identification with the DAMPE space mission | Jennifer Maria Frieden |
143 | Flavour Tagging with Graph Neural Networks with the ATLAS experiment | Walter Leinonen |
145 | A deep learning method for the trajectory reconstruction of gamma rays with the DAMPE space mission | Parzival Nussbaum |
146 | Unsupervised tagging of semivisible jets with energy-based autoencoders in CMS | Florian Eble |
152 | Precision-Machine Learning for the Matrix Element Method | Theo Heimel |
153 | Unsupervised Classification of Radio Sources Through Self-Supervised Representation Learning | Nicolas Baron Perez |
163 | Model selection with normalizing flows | Rahul Srinivasan |
164 | Towards the first time ever measurement of the $gg\rightarrow ZH$ process at the LHC using Transformer networks | Geoffrey Gilles |
165 | Next generation cosmological analysis with a re-usable library of machine learning emulators across a variety of cosmological models | Dily Duan Yi Ong |
172 | LHC Event Generation with JetGPT | Jonas Spinner |
179 | Machine-learning analysis of cosmic-ray nuclei data from the AMS-02 experiment | Shahid Khan |
182 | b-hive: a modular training framework for state-of-the-art object-tagging within the python ecosystem at the CMS experiment | Niclas Eich |
183 | FlashSim: an end-to-end fast simulation prototype using Normalizing Flow | Francesco Vaselli |
193 | Improving Two-Neutron Detection Efficiency on the NEBULA Detector using XGBoost Algorithm | Yutian Li |
195 | Reconstruction of Low Mass Vector Mesons via Dimuon decay channel using Machine Learning Technique for the CBM Experiment at FAIR | Abhishek Kumar Sharma |
201 | Reconstructing the Neutron Star Equation of State with Bayesian deep learning | Giulia Ventagli |
202 | A Neural-Network-defined Gaussian Mixture Model for particle identification in LHCb | Edoardo Franzoso |
204 | Deep learning predicted elliptic flow of identified particles in heavy-ion collisions at the RHIC and LHC energies | Gergely Gábor Barnaföldi |
205 | Anomaly detection search for BSM physics in ATLAS experiment at LHC | Francesco Cirotto |
208 | Simulation Based Inference from the CD-EoR 21-cm signal | Anchal Saxena |
214 | Deep support vector data description models on an analog in-memory computing platform for real-time unsupervised anomaly detection. | Dominique Kosters |
215 | Application of science-informed AI in experimental particle physics and neuroscience | Peter Levai |
217 | Tuning neural posterior estimation for gravitational wave inference | Alex Kolmus |
220 | Using ML based Unfolding to reduce error on lattice QCD observables | Simran Singh |
221 | Addressing Real-World Noise Challenges in Gravitational Wave Parameter Estimation with Truncated Marginal Neural Ratio Estimation | Alexandra Wernersson |
222 | Fully Bayesian Forecasts with Neural Bayes Ratio Estimation | Thomas Gessey-Jones |
Anomaly detection at the LHC broadens the search for BSM effects by making no assumptions about the signal hypothesis. We employ ML to perform density estimation on raw data and use the density estimate for anomaly detection. A neural network can learn the physics content of the raw data. However, the gain in sensitivity to features of interest can be hindered by redundant information already explainable in terms of known physics. This poses the question of constructing a representation space where known symmetries are manifest and discriminative features are retained. We use contrastive learning to define a representation space invariant under pre-defined transformations and test the learned representations with an autoencoder-based OOD detection task. I will present results on tagging dark jets from jet constituents and the detection of anomalies in reconstructed events using our CLR framework.
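To make the idea of a representation space that is invariant under pre-defined transformations concrete, the sketch below evaluates a contrastive loss on pairs of embeddings. The abstract does not say which loss is used; the NT-Xent form popularised by SimCLR is assumed here, and the random arrays are hypothetical stand-ins for embeddings of events and of their transformed copies.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent contrastive loss for paired embeddings z_a[i] <-> z_b[i]."""
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors
    sim = z @ z.T / temperature                        # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z_a)
    # Index of the positive partner for each of the 2n embeddings.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Row-wise log-sum-exp, compared with the similarity of the positive pair.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_norm - sim[np.arange(2 * n), pos]))

rng = np.random.default_rng(0)
events = rng.normal(size=(32, 16))                       # hypothetical embeddings
augmented = events + 0.01 * rng.normal(size=(32, 16))    # nearly identical views
unrelated = rng.normal(size=(32, 16))                    # views sharing nothing

loss_aligned = nt_xent_loss(events, augmented)
loss_random = nt_xent_loss(events, unrelated)
```

The loss is lower when each event and its transformed copy map to nearby points, which is the property that training on symmetry-transformed pairs is meant to encourage.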
This talk presents a comprehensive analysis of the potential role of Large Language Models (LLMs) and Question-Answering Machines (QAMs) in augmenting the field of fundamental physics, drawing upon a nuanced synthesis of insights from an interdisciplinary consortium encompassing various sub-disciplines of physics, philosophy of science, and computer science.
The primary objective of this paper is to explore possible advancements of our understanding of complex physical phenomena through the application of LLMs. This endeavour necessitates not only the development of such systems (which are currently lacking) but also their thorough evaluation and the identification of optimal use cases.
To this end, we present a detailed research agenda and roadmap. The talk argues for a collaborative paradigm in which AI development, assessment and reflection come together to critically evaluate and guide the integration of LLMs into physics research. This approach is based on the assertion that a multi-layered perspective is essential to recognize the nuanced capabilities and limitations of LLMs in fundamental physics.
How can we gain physical intuition in real-world datasets using ‘black-box’ machine learning? In this talk, I will discuss how ordered component analyses can be used to separate, identify, and understand physical signals in astronomical datasets. We introduce Information Ordered Bottlenecks (IOBs), a neural layer designed to adaptively compress data into latent variables organized by likelihood maximization. As a nonlinear extension of Principal Component Analysis, IOB autoencoders are designed to be truncated at any bottleneck width, controlling information flow through only the most crucial latent variables. With this architecture, we show how classical neural networks can be easily extended to dynamically order latent information, revealing learned structure in multisignal datasets. We demonstrate how this methodology can be extended to structure and classify physical phenomena, discover low-dimensional symbolic expressions in high-dimensional data, and regularize implicit inference. Along the way, we present several astronomical applications including emulation of the CMB power spectrum, analysis of binary black hole systems, and dimensionality reduction of galaxy properties in large cosmological simulations.
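The truncation idea can be illustrated with the linear case that the abstract names as the starting point, Principal Component Analysis. The sketch below is not the IOB layer itself: it uses PCA on hypothetical synthetic data to show what an ordered latent space that can be cut at any width looks like, with the masking step standing in for the bottleneck.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 200 samples in 6 dimensions with decaying variance per axis.
scales = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])
data = rng.normal(size=(200, 6)) * scales
data = data - data.mean(axis=0)

# PCA via SVD: rows of vt are the principal directions, ordered by variance.
_, _, vt = np.linalg.svd(data, full_matrices=False)
latent = data @ vt.T          # ordered latent variables

def reconstruct(latent, vt, width):
    """Decode using only the first `width` latent variables (the rest masked)."""
    mask = np.arange(latent.shape[1]) < width
    return (latent * mask) @ vt

# Mean squared reconstruction error for bottleneck widths 0 through 6.
errors = [float(np.mean((data - reconstruct(latent, vt, k)) ** 2))
          for k in range(7)]
```

Because the latent variables are ordered, the reconstruction error can only decrease as the bottleneck widens, and it vanishes at full width.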
Non-Gaussian transient noise artifacts, commonly referred to as glitches, are one of the most challenging limitations in the study of gravitational-wave interferometer data due to their similarities with astrophysical source signals in the time and frequency domains. Therefore, exploring novel methods to recover physical information from data corrupted by glitches is essential. In our work, we focus on modeling and generating glitches using deep generative algorithms. Namely, we employ a Pix2Pix-like architecture, a family of Generative Adversarial Networks for data-to-data translation. This strategy involves mapping glitches from carefully chosen auxiliary channels (uncorrelated with the physical signals) to the 'strain' (main) channel, allowing us to subtract the generated noise from the physically interesting data. In this talk, we outline our method and present some preliminary results.
Knowledge of the primordial matter density field from which the present non-linear observations formed is of fundamental importance for cosmology, as it contains an immense wealth of information about the physics, evolution, and initial conditions of the universe. Reconstructing this density field from the galaxy survey data is a notoriously difficult task, requiring sophisticated statistical methods, advanced cosmological simulators, and exploration of a multi-million-dimensional parameter space. In this talk, I will discuss how simulation-based inference with energy-based models implemented through sliced score matching allows us to tackle this problem and sequentially obtain data-constrained realisations of the primordial dark matter density field in a simulation-efficient way for general non-differentiable simulators. In addition, I will describe how graph neural networks can be used to get optimal data summaries for galaxy maps, and how our results compare to those obtained with classical likelihood-based methods such as Hamiltonian Monte Carlo.
Investigating the properties of QCD matter at extreme temperatures and densities is a fundamental objective of high energy nuclear physics. Such matter can be created in facilities like CERN and FAIR for short periods of time through heavy-ion collisions. Particularly interesting are the intermediate energy heavy-ion collision experiments such as CBM@FAIR, STAR-FXT@RHIC and experiments at NICA and HIAF which stand to benefit significantly from modern data driven AI techniques.
The talk provides a comprehensive overview of the diverse applications and potential of AI in studying heavy-ion collisions, emphasizing the exploitation of raw, detector-level point cloud information for ultra-fast, real-time analysis and the mitigation of biases introduced by preprocessing algorithms. PointNet-based models are employed to reconstruct crucial collision event features [1,2] and extract physics information such as the Equation of State [3] directly from detector-level information of the hits and/or tracks of particles in experiments. The PointNet-based models, distinguished by their ability to handle detector data directly, are shown to be versatile tools for studying heavy-ion collisions [4]. Additionally, a novel autoregressive point cloud generator which can perform fast simulations of heavy-ion collisions on an event-by-event basis is also introduced. This tool holds promise to meet the demand for large-scale simulations by future, next-generation experiments [5]. The choice of point cloud data structures not only amplifies the adaptability of these models to particle/high-energy physics but also establishes them as resilient tools applicable across a diverse range of physics disciplines dealing with electronic data [6].
[1] Omana Kuttan, M., Steinheimer, J., Zhou, K., Redelbach, A., & Stoecker, H. (2020). A fast centrality-meter for heavy-ion collisions at the CBM experiment. Physics Letters B, 811, 135872.
[2] Omana Kuttan, M., Steinheimer, J., Zhou, K., Redelbach, A., & Stoecker, H. (2021). Deep Learning Based Impact Parameter Determination for the CBM Experiment. Particles, 4, 47-52.
[3] Omana Kuttan, M., Zhou, K., Steinheimer, J., Redelbach, A., & Stoecker, H. (2021). An equation-of-state-meter for CBM using PointNet. Journal of High Energy Physics, 2021(10), 1-25.
[4] Omana Kuttan, M., Steinheimer, J., Zhou, K., Redelbach, A., & Stoecker, H. (2022). Extraction of global event features in heavy-ion collision experiments using PointNet. PoS FAIRness2022 (2023) 040, contribution to FAIRness 2022.
[5] Omana Kuttan, M., Steinheimer, J., Zhou, K., & Stoecker, H. (2023). QCD Equation of State of Dense Nuclear Matter from a Bayesian Analysis of Heavy-Ion Collision Data. Physical Review Letters, 131(20), 202303.
[6] Omana Kuttan, M. (2023). Artificial intelligence in heavy-ion collisions: bridging the gap between theory and experiments (Doctoral dissertation, Universitätsbibliothek Johann Christian Senckenberg).
Gauge symmetry is fundamental to describing quantum chromodynamics on a lattice. While the local nature of gauge symmetry presents challenges for machine learning due to the vast and intricate parameter space, which involves distinct group transformations at each spacetime point, it remains a fundamental and indispensable prior in physics. Lattice gauge equivariant convolutional neural networks (L-CNNs) can be utilized to approximate any gauge-covariant function in a robust way [1].
Here we apply L-CNNs to learn the fixed point (FP) action, implicitly defined through a renormalization group transformation [2]. FP actions, designed to be free of lattice artifacts on classical gauge-field configurations, can yield quantum physical predictions with greatly reduced lattice artifacts, even on coarse lattices. Training L-CNNs, we obtain an FP action for SU(3) gauge theory in four dimensions which significantly exceeds the accuracy of previous hand-crafted parametrizations. This is a first step towards future Monte Carlo simulations that are based on machine-learned FP actions, which have the potential to avoid typical problems such as critical slowing down and topological freezing.
[1] M. Favoni, A. Ipp, D. I. Müller, D. Schuh, Phys. Rev. Lett. 128 (2022), 032003, https://doi.org/10.1103/PhysRevLett.128.032003 , https://arxiv.org/abs/2012.12901
[2] K. Holland, A. Ipp, D. I. Müller, U. Wenger, https://arxiv.org/abs/2401.06481
In recent years, deep learning algorithms have excelled in various domains, including astronomy. Despite this success, few deep learning models are planned for online deployment in the O4 data collection run of the LIGO-Virgo-KAGRA collaboration. This is partly due to a lack of standardized software tools for quick implementation and deployment of novel ideas with confidence in production performance. Our team addressed this gap by developing the ml4gw and hermes libraries. We will discuss how these libraries enhanced efficiency and model robustness in several applications: Aframe, a low-latency machine learning pipeline for compact binary sources of gravitational waves, and a deep learning-based denoising scheme for astrophysical gravitational waves, covering Binary Neutron Star (BNS), Neutron Star-Black Hole (NSBH), and Binary Black Hole (BBH) events. We will explore the potential of machine learning for real-time detection and end-to-end searches for gravitational-wave transients. We also introduce anomaly detection techniques using deep recurrent autoencoders and a semi-supervised strategy called Gravitational Wave Anomalous Knowledge (GWAK) to identify binaries, detector glitches, and hypothesized astrophysical sources emitting GWs in the LIGO-Virgo-KAGRA frequency band. We discuss how these developments can lead to rapid deployment of next-generation deep learning technology for fast gravitational wave detection in the future.
The Multi-disciplinary Use Cases for Convergent new Approaches to AI explainability (MUCCA) project is pioneering efforts to enhance the transparency and interpretability of AI algorithms in complex scientific endeavours. The presented study focuses on the role of Explainable AI (xAI) in the domain of high-energy physics (HEP). Approaches based on Machine Learning (ML) methodologies, from classical boosted decision trees to Graph Neural Nets, are considered to search for new physics models.
A set of use cases is exploited to highlight the potential of ML, based on studies performed on the ATLAS experiment at the Large Hadron Collider (LHC). Results demonstrate that there can be significant enhancements in sensitivity when using ML approaches, affirming the effectiveness of these tools in exploring a broad range of phase space and new physics models that traditional searches may not reach. Maintaining a balance between such state-of-the-art ML techniques and the transparency achievable through xAI is critical for consistent result interpretation and scientific rigour. The studies performed so far and presented in this talk emphasise this crucial balance in HEP.
The theory of the strong force, quantum chromodynamics, describes the proton in terms of its constituents, the quarks and gluons. A major conundrum since the formulation of QCD five decades ago has been whether heavy quarks also exist as a part of the proton wavefunction determined by non-perturbative dynamics: so-called intrinsic heavy quarks. Innumerable efforts to establish intrinsic charm in the proton have remained inconclusive. Here we present evidence for intrinsic charm [1] by exploiting a high-precision determination of the quark–gluon content of the nucleon with state-of-the-art AI techniques [2] and the largest experimental dataset ever. We confirm these findings by comparing them to recent data on Z-boson production with charm jets from the CERN's LHCb experiment. We fingerprint the properties of intrinsic charm, including a possible matter-antimatter asymmetry, and quantify the implications of this discovery for the next generation of particle and astroparticle physics experiments. We also discuss how AI techniques are instrumental to disentangle possible signals of New Physics at the LHC from phenomena associated to proton structure dynamics [3].
[1] R. D. Ball et al. (NNPDF Collaboration), Evidence for intrinsic charm quarks in the proton, Nature 608 (2022) no.7923, 483-487 [arXiv:2208.08372].
[2] R. D. Ball et al. (NNPDF Collaboration), The path to proton structure at 1% accuracy, Eur. Phys. J. C 82 (2022) no.5, 428 [arXiv:2109.02653].
[3] S. Carrazza, C. Degrande, S. Iranipour, J. Rojo and M. Ubiali, Can New Physics hide inside the proton?, Phys. Rev. Lett. 123 (2019) no.13, 132001 [arXiv:1905.05215 [hep-ph]].
The Fair Universe project is building a large-compute-scale AI ecosystem for sharing datasets, training large models and hosting challenges and benchmarks. Furthermore, the project is exploiting this ecosystem for an AI challenge series focused on minimizing the effects of systematic uncertainties in High-Energy Physics (HEP), and on predicting accurate confidence intervals.
This talk will describe the challenge platform we have developed, which builds on the open-source benchmark ecosystem Codabench and interfaces it with the NERSC HPC center and its Perlmutter system with over 7000 A100 GPUs.
This presentation will also advertise the first of our Fair Universe public challenges hosted on this platform, the Fair Universe: HiggsML Uncertainty Challenge, which will run over summer 2024.
This challenge will present participants with a much larger training dataset than previous similar competitions, corresponding to an H to tau tau cross section measurement at the Large Hadron Collider, based on the four-vectors of the final state. Participants should design an advanced analysis technique that not only measures the signal strength but also provides a confidence interval, whose coverage will be evaluated automatically from pseudo-experiments. The confidence interval should include the statistical uncertainty as well as systematic uncertainties (concerning detector calibration, background levels, etc.). It is expected that advanced analysis techniques that are able to control the impact of systematics will perform best, thereby advancing the field of uncertainty-aware AI techniques for HEP and beyond.
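As an illustration of what evaluating coverage from pseudo-experiments can mean, the following is a minimal sketch and not the challenge's actual scoring code. The function names, the Gaussian toy estimator and the values of `mu_true` and `sigma` are all assumptions made for this example; the official metric may differ.

```python
import numpy as np


def empirical_coverage(mu_true, lower, upper):
    """Fraction of pseudo-experiments whose interval contains the true value."""
    mu_true = np.asarray(mu_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (lower <= mu_true) & (mu_true <= upper)
    return float(inside.mean())


def toy_pseudo_experiments(n_experiments=2000, mu_true=1.0, sigma=0.2, seed=0):
    """Hypothetical toy: a Gaussian estimator of the signal strength.

    Each pseudo-experiment yields an estimate mu_hat and a +/- 1 sigma
    interval, which should cover the truth in about 68.3% of cases.
    """
    rng = np.random.default_rng(seed)
    mu_hat = rng.normal(mu_true, sigma, size=n_experiments)
    truth = np.full(n_experiments, mu_true)
    return truth, mu_hat - sigma, mu_hat + sigma


if __name__ == "__main__":
    truth, lo, hi = toy_pseudo_experiments()
    print(f"empirical coverage: {empirical_coverage(truth, lo, hi):.3f}")
```

An interval that is too narrow (for example, one that ignores a systematic shift) would show up as an empirical coverage below the nominal level, while an overly conservative interval would overcover.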
The Codabench/NERSC platform also allows challenges from other communities to be hosted, and we intend to make our benchmark designs available as templates so that similar efforts can be easily launched in other domains.
This contribution describes work that pushes the state of the art in the ML challenge platform, in the ML benchmarks and the challenge itself, and in the evaluation of uncertainty-aware methods.
For the platform we describe a system capable of operating at much larger scale than other approaches, including on large datasets and with models trained and evaluated on multiple GPUs in parallel. The platform also provides a leaderboard and ecosystem for long-lived benchmarks, as well as capabilities to not only evaluate different models but also test models against new datasets.
For the “Fair Universe: HiggsML Uncertainty Challenge” we provide larger datasets, with multiple systematic uncertainties applied, as well as evaluation of uncertainties as part of the challenge, performed on multiple pseudo-experiments. All of these aspects are novel to HEP ML challenges as far as we are aware.
Furthermore, we will present methodological innovations, including novel metrics for the evaluation of uncertainty-aware methods as well as improvements in the uncertainty-aware methods themselves.
The LHCb experiment at the Large Hadron Collider (LHC) is designed to perform high-precision measurements of heavy-hadron decays, which requires the collection of large data samples and a good understanding and suppression of multiple background sources. Both factors are challenged by a five-fold increase in the average number of proton-proton collisions per bunch crossing, corresponding to a change in the detector operation conditions for the recently started LHC Run 3. The limits on the storage capacity of the trigger impose an inverse relation between the number of particles selected to be stored per event and the number of events that can be recorded, and the background levels have risen due to the enlarged combinatorics. To tackle both challenges, we have proposed a novel approach, never attempted before at a hadron collider: a Deep-learning based Full Event Interpretation (DFEI), to perform the simultaneous identification, isolation and hierarchical reconstruction of all the heavy-hadron decay chains in each event. We have developed a prototype for such an algorithm based on Graph Neural Networks. The construction of the algorithm and its current performance have recently been described in a publication [Comput.Softw.Big Sci. 7 (2023) 1, 12]. This contribution will summarise the main findings of that paper. In addition, new developments towards speeding up the inference of the algorithm will be presented, as well as novel applications of DFEI for data analysis. The applications, showcased using simulated datasets, focus on decay-mode-inclusive studies and automated methods for background suppression/characterisation.
A dedicated experimental search for a muon electric dipole moment (EDM) is being set up at PSI. This experiment will search for a muon EDM signal with a final precision of 6×10⁻²³ e·cm using the frozen-spin technique. This will be the most stringent test of the muon EDM to date, improving the current experimental limit by 3 orders of magnitude. A crucial component of the experiment is the off-axis injection of the muons into a 3 T solenoid, where they will be stored with the aid of a weakly focusing magnetic field. To achieve the precision objective, it is important to maximize the muon injection efficiency. However, the injection efficiency is a function of multiple design parameters, which makes simple Monte Carlo simulation techniques computationally demanding. Thus, we employ a surrogate model based on Polynomial Chaos Expansion (PCE) to optimize the injection efficiency as a function of the experimental design parameters and assess the model performance using regression-based techniques. In this talk, we report findings from our simulation studies using the PCE-based surrogate model and discuss the merits of this technique over alternative AI-based optimization methods.
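To make the idea of a PCE surrogate concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes two design parameters rescaled to [-1, 1] with uniform sampling (for which Legendre polynomials are the matching orthogonal basis), fits the expansion coefficients by least-squares regression, checks the fit with a hold-out R², and maximizes the cheap surrogate on a grid. The function `toy_efficiency` is a hypothetical, noise-free stand-in for the expensive injection simulation.

```python
import numpy as np
from numpy.polynomial.legendre import legvander2d

DEGREE = (6, 6)  # assumed maximum polynomial degree per design parameter


def toy_efficiency(x, y):
    """Hypothetical stand-in for an expensive injection simulation."""
    return np.exp(-((x - 0.3) ** 2 + (y + 0.2) ** 2))


def fit_pce(x, y, values, degree=DEGREE):
    """Least-squares fit of tensor-product Legendre coefficients."""
    basis = legvander2d(x, y, list(degree))
    coeffs, *_ = np.linalg.lstsq(basis, values, rcond=None)
    return coeffs


def predict_pce(coeffs, x, y, degree=DEGREE):
    """Evaluate the fitted expansion at the given points."""
    return legvander2d(x, y, list(degree)) @ coeffs


def r_squared(truth, prediction):
    """Coefficient of determination on an independent sample."""
    ss_res = np.sum((truth - prediction) ** 2)
    ss_tot = np.sum((truth - truth.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    # Training sample: a limited number of "expensive" simulation calls.
    x_train, y_train = rng.uniform(-1, 1, size=(2, 300))
    coeffs = fit_pce(x_train, y_train, toy_efficiency(x_train, y_train))
    # Hold-out sample to assess the surrogate quality.
    x_test, y_test = rng.uniform(-1, 1, size=(2, 200))
    r2 = r_squared(toy_efficiency(x_test, y_test),
                   predict_pce(coeffs, x_test, y_test))
    # Optimize the surrogate on a dense grid instead of the simulation.
    axis = np.linspace(-1, 1, 101)
    gx, gy = np.meshgrid(axis, axis)
    surrogate = predict_pce(coeffs, gx.ravel(), gy.ravel())
    best = np.argmax(surrogate)
    return r2, (gx.ravel()[best], gy.ravel()[best])


if __name__ == "__main__":
    r2, optimum = run_demo()
    print(f"hold-out R^2 = {r2:.4f}, surrogate optimum = {optimum}")
```

The surrogate is cheap to evaluate once fitted, so the grid search costs far less than repeating the simulation at every grid point. A full tensor-product basis grows quickly with the number of parameters, so a real study with many design parameters would likely need a truncated or sparse basis.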
19:00 - 19:30 Welcome drinks
19:30 - 21:00 Dinner
22:00 Closing