
We demonstrate the power of our method with the finite statistics of LHC Run 2. We achieve statistical stability by enlarging the testable data set while focusing effectively on QCD structure.
REF TAGGER FULL
In this letter, we provide a simple but novel data preprocessing method that uses a Riemann sphere to exploit the full phase space by decorrelating QCD structure from kinematics. While this kind of boosted-jet analysis provides an efficient way to identify color structure, the constrained phase space reduces the amount of available data, resulting in low significance. Analyses of QCD color structure in the decay of a boosted particle have attracted attention because the information becomes well localized in the limited phase space. Identifying the quantum chromodynamics (QCD) color structure of processes provides additional information to enhance the reach of new physics searches at the Large Hadron Collider (LHC).
While the primary focus of this work remains a detailed study of the interpretability of DNN-based top tagger models, it also features state-of-the-art performance obtained from modified implementations of existing networks. We additionally illustrate the activity of hidden layers as Neural Activation Pattern (NAP) diagrams and demonstrate how they can be used to understand how DNNs relay information across layers, and how this understanding can make such models significantly simpler by enabling effective model reoptimization and hyperparameter tuning. Our studies uncover some major pitfalls of existing xAI methods and illustrate how they can be overcome to obtain consistent and meaningful interpretations of these models. We also investigate how and why feature importance varies across different xAI metrics, how feature correlations impact their explainability, and how latent space representations encode information and correlate with physically meaningful quantities. We review a subset of such existing top tagger models and explore different quantitative methods to identify which features play the most important roles in identifying top jets.
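As a rough illustration of the Riemann-sphere idea mentioned above, the sketch below implements the standard inverse stereographic projection, which maps a point of the plane onto the unit sphere, together with its inverse. It is not the preprocessing of the letter itself: which two coordinates are treated as the plane (for example, a constituent's position relative to the jet axis) and any subsequent rotation on the sphere are assumptions left open here, and the function names are hypothetical.

```python
import math


def to_riemann_sphere(x, y):
    """Inverse stereographic projection of a plane point (x, y) onto the unit sphere.

    The north pole (0, 0, 1) is the projection point, so the origin maps to the
    south pole and points far from the origin approach the north pole.
    """
    r2 = x * x + y * y
    return (2.0 * x / (1.0 + r2), 2.0 * y / (1.0 + r2), (r2 - 1.0) / (1.0 + r2))


def from_riemann_sphere(X, Y, Z):
    """Stereographic projection from the sphere back to the plane."""
    if math.isclose(Z, 1.0):
        # The north pole corresponds to the point at infinity.
        raise ValueError("north pole has no image in the plane")
    return (X / (1.0 - Z), Y / (1.0 - Z))


# Hypothetical plane coordinates of one jet constituent (an assumption).
point = to_riemann_sphere(0.3, -0.4)
recovered = from_riemann_sphere(*point)
```

Because the sphere is compact, rotations of the sphere act on all plane points at once, which is one reason such a map is convenient for separating shape information from overall scale; whether the letter uses it in exactly this way is not stated in the text.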
In this paper, we explore the interpretability of DNN models designed to identify jets coming from top quark decays in high-energy proton-proton collisions at the Large Hadron Collider (LHC). Recent developments in explainable AI (xAI) methods allow us to explore the inner workings of deep neural networks (DNNs), revealing crucial information about input-output relationships and showing how data connect with machine learning models. Implementations are also provided for our proposed method and all reference algorithms.
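One common quantitative way to rank input features, given here only as a generic sketch and not as the specific xAI metrics studied in the paper, is permutation importance: shuffle one feature at a time and record how much the model's accuracy drops. The `model` callable, the data layout (a list of feature lists), and all function names below are assumptions made for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model prediction equals the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(labels)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy stand-in for a tagger: it looks only at feature 0 (an assumption).
toy_model = lambda row: int(row[0] > 0.5)
toy_rows = [[i / 20.0, (7 * i % 20) / 20.0] for i in range(20)]
toy_labels = [int(row[0] > 0.5) for row in toy_rows]
scores = permutation_importance(toy_model, toy_rows, toy_labels)
```

Shuffling a feature also breaks its correlation with the other inputs, so strongly correlated features can share or hide importance under this metric, which is one way the dependence of explainability on feature correlations can show up.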
REF TAGGER CODE
To simplify adaptation to various problems, we provide easy-to-follow instructions on how graph-based representations of data structures relevant for fundamental physics can be constructed, and we provide code implementations for several of them. We show that our approach reaches performance close to that of dedicated methods on all datasets. As a showcase application, we present a simple yet flexible graph-based neural network architecture that can easily be applied to a wide range of supervised learning tasks. We discuss the design and structure and outline how additional datasets can be submitted for inclusion. While public datasets from multiple fundamental physics disciplines already exist, the common interface and the provided reference models simplify future work on cross-disciplinary machine learning and transfer learning in fundamental physics. The datasets contain hadronic top quarks, cosmic-ray-induced air showers, phase transitions in hadronic matter, and generator-level histories. We introduce a Python package that provides simple and unified access to a collection of datasets from fundamental physics research (including particle physics, astroparticle physics, and hadron and nuclear physics) for supervised machine learning studies.
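To make the notion of a graph-based representation concrete, the following minimal sketch builds a k-nearest-neighbour graph from a list of points using only the Python standard library. It is not the package's own implementation or interface: the function name, the choice of Euclidean distance, and the use of generic coordinate tuples as nodes are all assumptions for illustration.

```python
import math


def knn_graph(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbours."""
    edges = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Hypothetical two-dimensional positions of three objects in one event.
example_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
example_edges = knn_graph(example_points, k=1)
```

The resulting edge list, together with per-node features, is the typical input of a graph neural network. For angular coordinates a periodic distance would be needed in place of `math.dist`, which this sketch does not handle.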
