30 April 2024 to 3 May 2024
Amsterdam, Hotel CASA
Europe/Amsterdam timezone

Estimation of Machine Learning model uncertainty in particle physics event classifiers

30 Apr 2024, 17:19
3m
Oxford, Hotel CASA

Speaker

julia vazquez escobar (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas (CIEMAT))

Description

Particle physics experiments entail the collection of large data samples of complex information. To produce and detect low-probability processes of interest (signal), a huge number of particle collisions must be carried out. These experiments therefore produce huge sets of observations, most of which are of no interest (background). A mechanism able to single out rare signals buried in immense backgrounds is thus required. Machine Learning algorithms make it possible to process huge amounts of complex data efficiently, automate the classification of event categories, and produce signal-enriched filtered datasets better suited to subsequent physics studies. Although the classification of large imbalanced datasets has been undertaken in the past, generating predictions together with their uncertainties is still uncommon. In particle physics, as in other scientific domains, a point estimate is considered an incomplete answer if it is not accompanied by its uncertainty. As a benchmark, we present a real case study comparing three methods that estimate the uncertainty of Machine Learning predictions in the identification of the production and decay of top-antitop quark pairs in proton-proton collisions at the Large Hadron Collider at CERN. Datasets of detailed simulations of the signal and background processes, produced by the CMS experiment, are used. The three techniques proposed and evaluated for quantifying the prediction uncertainty of classification algorithms are: dropout training in deep neural networks as approximate Bayesian inference, variance estimation across an ensemble of trained deep neural networks, and the Probabilistic Random Forest. All of them exhibit excellent discrimination power with a small model uncertainty, showing that the predictions are precise and robust.
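The first two techniques share one idea: the uncertainty of a predicted signal probability is read off as the spread of that probability across several stochastic forward passes (dropout kept active at inference) or across several independently trained networks. A minimal sketch of that idea in plain Python follows. The tiny one-hidden-layer network, its weights `W_HIDDEN`/`W_OUT`, the input `X`, the keep probability, and the noise-perturbed "ensemble" members are all hypothetical stand-ins for real trained classifiers; the Probabilistic Random Forest is not shown.

```python
import math
import random
import statistics

# Hypothetical fixed weights standing in for a trained classifier.
W_HIDDEN = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.4, 0.2, 0.9], [0.7, -0.6, 0.2]]
W_OUT = [1.2, -0.7, 0.9, 0.4]
X = [1.0, 0.5, -0.3]  # hypothetical event features


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out, keep_prob=1.0, rng=None):
    """One forward pass returning a signal probability in (0, 1).

    If rng is given, each hidden unit is kept with probability keep_prob and
    dropped otherwise (inverted dropout: kept units are scaled by 1/keep_prob).
    """
    hidden = []
    for weights in w_hidden:
        h = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # ReLU unit
        if rng is not None:
            h = h / keep_prob if rng.random() < keep_prob else 0.0
        hidden.append(h)
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def mc_dropout_predict(x, w_hidden, w_out, n_passes=200, keep_prob=0.8, seed=0):
    """Mean and standard deviation over stochastic passes with dropout active."""
    rng = random.Random(seed)
    probs = [forward(x, w_hidden, w_out, keep_prob, rng) for _ in range(n_passes)]
    return statistics.mean(probs), statistics.stdev(probs)


def make_ensemble(n_members=5, noise=0.1, seed=1):
    """Hypothetical stand-in for independently trained networks: each member is
    the base weights plus Gaussian noise (a real ensemble would be retrained)."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        w_h = [[w + rng.gauss(0.0, noise) for w in row] for row in W_HIDDEN]
        w_o = [w + rng.gauss(0.0, noise) for w in W_OUT]
        members.append((w_h, w_o))
    return members


def ensemble_predict(x, members):
    """Mean and standard deviation of the prediction across ensemble members."""
    probs = [forward(x, w_h, w_o) for w_h, w_o in members]
    return statistics.mean(probs), statistics.stdev(probs)


if __name__ == "__main__":
    print("MC dropout (mean, std):", mc_dropout_predict(X, W_HIDDEN, W_OUT))
    print("Ensemble   (mean, std):", ensemble_predict(X, make_ensemble()))
```

In both cases the mean serves as the classifier output and the standard deviation as the model-uncertainty measure; a small standard deviation relative to the separation between signal and background scores is what "precise and robust" predictions would look like in this picture.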

Primary authors

Dr José M. Hernández (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas (CIEMAT))
Dr Miguel Cárdenas-Montes (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas (CIEMAT))
julia vazquez escobar (Centro de Investigaciones Energéticas Medioambientales y Tecnológicas (CIEMAT))
