Introduction
Abstract
As High Energy Particle Physics (HEPP) has begun exploring more exotic machine learning algorithms, such as Graph Neural Networks (GNNs), analyses commonly rely on pre-existing data science frameworks, including PyTorch, TensorFlow and Keras, to recast ROOT samples into an appropriate data structure. This often results in tedious and computationally expensive conversion routines.
AnalysisG addresses these issues by following a similar philosophy to AnalysisTop: events and particles are treated as polymorphic objects. The framework translates ROOT n-tuples into user-defined particle and event objects, matches particles within complex decay chains, and constructs graph structures with edge, node and graph-level feature tensors ready for GNN training or inference.
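The idea of polymorphic particle objects feeding a graph with node, edge and graph-level features can be pictured with a minimal, framework-independent sketch. Everything below is an assumption made for illustration: the class names `Particle` and `Jet`, the helper `build_graph`, and the chosen features are hypothetical and are not the AnalysisG API.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Particle:
    """Hypothetical stand-in for a user-defined particle object."""
    pt: float
    eta: float
    phi: float
    e: float


@dataclass
class Jet(Particle):
    """Specialised particle type; inherits the kinematic fields."""
    is_b: bool = False


def build_graph(particles, met):
    """Return node, edge and graph-level features for a fully connected graph."""
    # Node features: one row of kinematics per particle.
    nodes = [[p.pt, p.eta, p.phi, p.e] for p in particles]
    # Edge index: every ordered pair (i, j) with i != j.
    edge_index = list(permutations(range(len(particles)), 2))
    # Edge features: here simply the pseudorapidity difference of each pair.
    edges = [[particles[i].eta - particles[j].eta] for i, j in edge_index]
    # Graph features: event-level quantities (illustrative choice).
    graph = [met, float(len(particles))]
    return nodes, edge_index, edges, graph


# Illustrative event: two jets and one generic particle (made-up values).
event = [
    Jet(50.0, 0.5, 0.1, 60.0, is_b=True),
    Jet(40.0, -1.0, 2.0, 65.0),
    Particle(30.0, 0.0, -1.5, 30.0),
]
nodes, edge_index, edges, graph = build_graph(event, met=25.0)
```

Because `Jet` inherits from `Particle`, the same `build_graph` function accepts any mix of particle types, which is the property the polymorphic design relies on.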
For cut-based analyses the framework provides selection templates that accept event objects, perform detailed studies, and export results to ROOT n-tuples or serialised plot objects.
To facilitate fast machine learning in HEP, a self-contained sub-package called pyc (Python CUDA) implements high-performance C++ and CUDA kernels via the LibTorch API. These include \(\Delta R\), polar/Cartesian transforms, invariant-mass computation, edge/node aggregation, and analytical single/double neutrino reconstruction.
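For orientation, the quantities named above follow standard collider-kinematics definitions. The sketch below spells them out in plain Python for single particles given as `(pt, eta, phi, E)` tuples. It is not the pyc interface: the function names are hypothetical, and pyc operates on tensors through LibTorch rather than on Python floats.

```python
import math


def to_cartesian(pt, eta, phi, e):
    """Convert (pt, eta, phi, E) to (px, py, pz, E)."""
    return pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), e


def invariant_mass(*particles):
    """Invariant mass of the summed four-momenta, each given as (pt, eta, phi, E)."""
    px = py = pz = e = 0.0
    for p in particles:
        cx, cy, cz, ce = to_cartesian(*p)
        px, py, pz, e = px + cx, py + cy, pz + cz, e + ce
    m2 = e * e - (px * px + py * py + pz * pz)
    # Clamp small negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))


def delta_r(a, b):
    """Angular separation sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi]."""
    d_eta = a[1] - b[1]
    d_phi = math.atan2(math.sin(a[2] - b[2]), math.cos(a[2] - b[2]))
    return math.hypot(d_eta, d_phi)


# Two back-to-back massless particles (illustrative values).
p1 = (50.0, 0.0, 0.0, 50.0)
p2 = (50.0, 0.0, math.pi, 50.0)
mass = invariant_mass(p1, p2)

# Azimuthal wrap-around: phi = 3.0 and phi = -3.0 are close on the circle.
separation = delta_r((10.0, 0.0, 3.0, 10.0), (10.0, 0.0, -3.0, 10.0))
```

The `atan2` form of the azimuthal difference avoids explicit branching on the wrap-around at ±π, which is also convenient when the same expression is later vectorised.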
Core Modules
| Module | Description |
|---|---|
| ParticleTemplate | C++ base class for user-defined particles. Provides kinematic properties (…) |
| EventTemplate | C++ base class for user-defined physics events. Declares the ROOT trees/branches to read (…) |
| GraphTemplate | C++ base class for graph construction. Inside … |
| SelectionTemplate | Template for custom cut-based event selections. Provides … |
| MetricTemplate | Template for ML evaluation metrics. Exposes … |
| ModelTemplate | C++ base class for GNN model definitions. Users override … |
| OptimizerConfig | Configuration struct for PyTorch optimizers (Adam, SGD, RMSprop, Adagrad, LBFGS) and learning-rate schedulers (StepLR, CyclicLR, ExponentialLR). Passed to … |
| IO | C++ class (inheriting …) |
| Analysis | Top-level Python pipeline compiler. Chains … |
| Meta / MetaLookup | ATLAS dataset metadata (AMI) cache and lookup helpers. |
| Plotting | Python histogram and line-plot wrappers (…) |
| Tools | Utility class (file-system, string, hashing, math helpers) used throughout the framework. |
| pyc | Self-contained C++/CUDA sub-package for HEP-specific PyTorch custom operators: \(\Delta R\), polar/Cartesian transforms, invariant-mass computation, edge/node aggregation, and neutrino reconstruction. |
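The edge/node aggregation listed for pyc can be read as a scatter-sum: each edge's feature vector is accumulated onto one of its endpoint nodes. The following plain-Python sketch illustrates that idea only. The function name `aggregate_to_nodes` and the choice to sum onto the destination node are assumptions for illustration, not the behaviour or signature of the pyc operator.

```python
def aggregate_to_nodes(edge_index, edge_features, num_nodes):
    """Sum each edge's feature vector into its destination node."""
    width = len(edge_features[0]) if edge_features else 0
    out = [[0.0] * width for _ in range(num_nodes)]
    for (src, dst), feat in zip(edge_index, edge_features):
        for k, value in enumerate(feat):
            out[dst][k] += value
    return out


# Three directed edges on a three-node graph (illustrative values).
edge_index = [(0, 1), (2, 1), (1, 0)]
edge_features = [[1.0], [2.0], [5.0]]
node_sums = aggregate_to_nodes(edge_index, edge_features, num_nodes=3)
```

Node 2 receives no incoming edge in this example, so its aggregated feature stays at zero. A GPU implementation would typically perform the same accumulation in parallel with atomic adds or a sorted segment reduction.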
Note
Verified sample statistics: as a sanity check, the IO class was run against the dilepton test sample shipped with the repository.
File: test/samples/dilepton/DAOD_TOPQ1.21955717._000001.root
Tree: nominal
Events: 1,098
Total jets (b'nominal.jet_pt.jet_pt'): 8,161
Average jets per event: 7.43 (min 4, max 14)
This result was obtained by iterating the IO class over the file and summing len(entry[b'nominal.jet_pt.jet_pt']) across all events.
See Quick Start for a step-by-step walkthrough with code examples.
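The counting procedure described in the note can be sketched independently of the framework. The example below assumes only that each entry behaves like a mapping from a branch key to a per-event list of values, as the expression `len(entry[b'nominal.jet_pt.jet_pt'])` suggests. The helper name `jet_statistics` is hypothetical, and the toy entries are made up for illustration; they are not taken from the test sample.

```python
def jet_statistics(entries, branch=b"nominal.jet_pt.jet_pt"):
    """Return (events, total jets, mean, min, max) over an iterable of entries."""
    counts = [len(entry[branch]) for entry in entries]
    if not counts:
        return 0, 0, 0.0, 0, 0
    total = sum(counts)
    return len(counts), total, total / len(counts), min(counts), max(counts)


# Made-up entries for illustration only; not values from the test sample.
toy_entries = [
    {b"nominal.jet_pt.jet_pt": [52.1, 40.3, 33.0, 27.8]},
    {b"nominal.jet_pt.jet_pt": [61.0, 45.5, 38.2, 30.1, 26.4, 25.2]},
]
stats = jet_statistics(toy_entries)
```

The same arithmetic is consistent with the figures quoted in the note: 8,161 jets over 1,098 events gives an average of about 7.43 jets per event.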
Languages and Technologies
C++20 — core engine, modules, CUDA wrappers.
Cython — Python/C++ bridge with minimal overhead.
CUDA — GPU kernels for physics computations.
LibTorch — tensor operations inside CUDA kernels.
Doxygen + Breathe + Sphinx — documentation pipeline.