Introduction

Abstract

As the field of High Energy Particle Physics (HEPP) has begun exploring more exotic machine learning algorithms, such as Graph Neural Networks (GNNs), analyses commonly rely on pre-existing data science frameworks, including PyTorch, TensorFlow and Keras, which requires recasting ROOT samples into an appropriate data structure. This often results in tedious and computationally expensive conversion routines.

AnalysisG addresses these issues by following a similar philosophy to AnalysisTop: events and particles are treated as polymorphic objects. The framework translates ROOT n-tuples into user-defined particle and event objects, matches particles within complex decay chains, and constructs graph structures with edge, node and graph-level feature tensors ready for GNN training or inference.
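
As a rough illustration of what edge, node and graph-level features mean in practice, the sketch below builds a fully connected graph from a short list of particles in plain Python. It does not use the AnalysisG API: the `Particle` dataclass, the `build_graph` function and the `parent_id` field are hypothetical names chosen for this example, and Python lists stand in for tensors.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Particle:
    # Hypothetical stand-in for a user-defined particle object.
    pt: float        # transverse momentum
    parent_id: int   # index of the truth parent; -1 if unmatched


def build_graph(particles):
    """Return node, edge and graph-level features for a fully connected graph."""
    n = len(particles)
    # Directed edges between every ordered pair of distinct nodes.
    edge_index = list(permutations(range(n), 2))
    # Node feature: one value per particle.
    node_pt = [p.pt for p in particles]
    # Edge truth feature: 1 if both particles share the same valid parent.
    edge_same_parent = [
        int(particles[i].parent_id == particles[j].parent_id
            and particles[i].parent_id >= 0)
        for i, j in edge_index
    ]
    return {
        "edge_index": edge_index,
        "node_pt": node_pt,
        "edge_same_parent": edge_same_parent,
        "graph_n_nodes": n,  # graph-level feature: one value per event
    }


# Toy event: two particles from parent 0, one from parent 1.
event = [Particle(50.0, 0), Particle(30.0, 0), Particle(20.0, 1)]
graph = build_graph(event)
```

Here the edge truth feature plays the role of a training target for a GNN that learns which particles originate from the same decay; in the framework itself such features would be registered through the graph template rather than assembled by hand.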

For cut-based analyses the framework provides selection templates that accept event objects, perform detailed studies, and export results to ROOT n-tuples or serialised plot objects.

To facilitate fast machine learning in HEP, a self-contained sub-package called pyc (Python CUDA) implements high-performance C++ and CUDA kernels via the LibTorch API. These include \(\Delta R\), polar/Cartesian transforms, invariant-mass computation, edge/node aggregation, and analytical single/double neutrino reconstruction.
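
For readers unfamiliar with the quantities listed above, the following is a minimal pure-Python sketch of the standard definitions: the polar-to-Cartesian transform, \(\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}\), and the invariant mass \(m^2 = E^2 - |\vec{p}|^2\) in natural units. It is not the pyc interface; the function names are assumptions made for this example, and it works on scalars rather than batched tensors.

```python
import math


def polar_to_cartesian(pt, eta, phi):
    """Convert (pt, eta, phi) to (px, py, pz)."""
    return pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with delta-phi wrapped into [-pi, pi]."""
    dphi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))
    return math.hypot(eta1 - eta2, dphi)


def invariant_mass(particles):
    """Invariant mass of a list of (pt, eta, phi, e) four-vectors."""
    px = py = pz = e = 0.0
    for pt, eta, phi, energy in particles:
        x, y, z = polar_to_cartesian(pt, eta, phi)
        px, py, pz, e = px + x, py + y, pz + z, e + energy
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding


# Two massless, back-to-back particles in the transverse plane.
pair = [(10.0, 0.0, 0.0, 10.0), (10.0, 0.0, math.pi, 10.0)]
mass = invariant_mass(pair)
separation = delta_r(0.0, 3.0, 0.0, -3.0)
```

Wrapping \(\Delta\phi\) matters near the \(\pm\pi\) boundary: the two directions at \(\phi = 3.0\) and \(\phi = -3.0\) are separated by \(2\pi - 6\), not by 6.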

Core Modules

Module: Description

  • ParticleTemplate: C++ base class for user-defined particles. Provides kinematic properties (pt, eta, phi, e, px, py, pz, Mass), classification flags (is_b, is_lep, is_nu, is_add), decay-tree bookkeeping (Children, Parents), and the add_leaf / apply_type_prefix / build interface for ROOT branch mapping.

  • EventTemplate: C++ base class for user-defined physics events. Declares the ROOT trees/branches to read (trees, add_leaf), registers particle collections (register_particle), and provides build and CompileEvent hooks for constructing the event from raw branch data.

  • GraphTemplate: C++ base class for graph construction. Inside CompileEvent users call define_particle_nodes, add_node_data_feature, add_edge_truth_feature etc. to assemble the graph tensors that the ML pipeline consumes.

  • SelectionTemplate: Template for custom cut-based event selections. Provides dump/load serialisation, GetMetaData, InterpretROOT, and Postprocessing hooks.

  • MetricTemplate: Template for ML evaluation metrics. Exposes RunNames (dict), Variables (list), Postprocessing, and InterpretROOT.

  • ModelTemplate: C++ base class for GNN model definitions. Users override forward to fetch tensors from a graph_t object and write predictions back.

  • OptimizerConfig: Configuration struct for PyTorch optimizers (Adam, SGD, RMSprop, Adagrad, LBFGS) and learning-rate schedulers (StepLR, CyclicLR, ExponentialLR). Passed to Analysis.AddModel.

  • IO: C++ class (inheriting tools + notification) for reading CERN ROOT n-tuples. Iterable in Python; each iteration yields a dict whose keys are bytes in the format b'tree.leaf.leaf'.

  • Analysis: Top-level Python pipeline compiler. Chains AddSamples / AddEvent / AddGraph / AddModel / AddSelection / AddMetric registrations and launches the full pipeline with Start().

  • Meta / MetaLookup: ATLAS dataset metadata (AMI) cache and lookup helpers. Meta stores per-dataset fields (DSID, cross-section, generator, …); MetaLookup aggregates them and computes luminosity-weighted yields.

  • Plotting: Python histogram and line-plot wrappers (TH1F, TH2F, TLine, ROC) built on mplhep and boost-histogram.

  • Tools: Utility class (file-system, string, hashing, math helpers) used throughout the framework.

  • pyc: Self-contained C++/CUDA sub-package for HEP-specific PyTorch custom operators: \(\Delta R\), polar/Cartesian transforms, invariant-mass computation, edge/node aggregation, and neutrino reconstruction.
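
The IO entry above describes dictionary keys as bytes of the form b'tree.leaf.leaf'. A small helper like the one below can make such keys easier to work with. This is only a sketch: `split_key` is a hypothetical function, not part of the framework, and it assumes that tree and leaf names contain no dots themselves.

```python
def split_key(key: bytes):
    """Split an IO-style key into (tree name, list of leaf path components)."""
    # Decode the bytes key, then cut at the first dot to isolate the tree name.
    tree, _, leaves = key.decode("utf-8").partition(".")
    return tree, leaves.split(".")


tree, leaves = split_key(b"nominal.jet_pt.jet_pt")
```

With the key b'nominal.jet_pt.jet_pt', `tree` is "nominal" and `leaves` is ["jet_pt", "jet_pt"].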

Note

Verified sample statistics — as a sanity check the IO class was run against the dilepton test sample shipped with the repository:

  • File: test/samples/dilepton/DAOD_TOPQ1.21955717._000001.root

  • Tree: nominal

  • Events: 1,098

  • Total jets (b'nominal.jet_pt.jet_pt'): 8,161

  • Average jets per event: 7.43 (min 4, max 14)

This result was obtained by iterating the IO class over the file and summing len(entry[b'nominal.jet_pt.jet_pt']) across all events.
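
The counting procedure described above can be sketched as follows. The function accepts any iterable of dict-like entries, so `entries` stands in for an iterated IO object; how the IO object is constructed and pointed at a file is deliberately not shown, since that signature is not given here. The function name `jet_statistics` and the toy values are assumptions for illustration and are not taken from the test sample.

```python
def jet_statistics(entries, key=b"nominal.jet_pt.jet_pt"):
    """Count jets per event by taking len() of the jet_pt array in each entry."""
    counts = [len(entry[key]) for entry in entries]
    n_events = len(counts)
    total = sum(counts)
    return {
        "events": n_events,
        "total_jets": total,
        "mean": total / n_events if n_events else 0.0,
        "min": min(counts, default=0),
        "max": max(counts, default=0),
    }


# Toy stand-in for the dicts yielded per event (values are made up).
toy_entries = [
    {b"nominal.jet_pt.jet_pt": [45.1, 38.2, 30.0, 25.5]},
    {b"nominal.jet_pt.jet_pt": [60.3, 41.7, 33.3, 28.9, 26.0, 25.1]},
]
stats = jet_statistics(toy_entries)
```

Applied to the figures quoted in the note, the same arithmetic gives 8,161 / 1,098 ≈ 7.43 jets per event.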

See Quick Start for a step-by-step walkthrough with code examples.

Languages and Technologies

  • C++20 — core engine, modules, CUDA wrappers.

  • Cython — Python/C++ bridge with minimal overhead.

  • CUDA — GPU kernels for physics computations.

  • LibTorch — tensor operations inside CUDA kernels.

  • Doxygen + Breathe + Sphinx — documentation pipeline.