Structs and Enumerations

Framework-wide plain data structs and enumeration types. They are used throughout the C++ core and serve as the primary data contracts between modules (e.g. between the IO layer and the GNN training pipeline).

Enumerations

enum class data_enum

Identifies the concrete C++ type of a stored data element.

Values:

enumerator d: Scalar double.

enumerator v_d: std::vector<double>.

enumerator vv_d: std::vector<std::vector<double>>.

enumerator vvv_d: std::vector<std::vector<std::vector<double>>>.

enumerator f: Scalar float.

enumerator v_f: std::vector<float>.

enumerator vv_f: std::vector<std::vector<float>>.

enumerator vvv_f: std::vector<std::vector<std::vector<float>>>.

enumerator l: Scalar long.

enumerator v_l: std::vector<long>.

enumerator vv_l: std::vector<std::vector<long>>.

enumerator vvv_l: std::vector<std::vector<std::vector<long>>>.

enumerator i: Scalar int.

enumerator v_i: std::vector<int>.

enumerator vv_i: std::vector<std::vector<int>>.

enumerator vvv_i: std::vector<std::vector<std::vector<int>>>.

enumerator ull: Scalar unsigned long long.

enumerator v_ull: std::vector<unsigned long long>.

enumerator vv_ull: std::vector<std::vector<unsigned long long>>.

enumerator vvv_ull: std::vector<std::vector<std::vector<unsigned long long>>>.

enumerator b: Scalar bool.

enumerator v_b: std::vector<bool>.

enumerator vv_b: std::vector<std::vector<bool>>.

enumerator vvv_b: std::vector<std::vector<std::vector<bool>>>.

enumerator ui: Scalar unsigned int.

enumerator v_ui: std::vector<unsigned int>.

enumerator vv_ui: std::vector<std::vector<unsigned int>>.

enumerator vvv_ui: std::vector<std::vector<std::vector<unsigned int>>>.

enumerator c: Scalar char.

enumerator v_c: std::vector<char>.

enumerator vv_c: std::vector<std::vector<char>>.

enumerator vvv_c: std::vector<std::vector<std::vector<char>>>.

enumerator undef: Type is unknown / undefined.

enumerator unset: Type has not yet been assigned.

enum class opt_enum

Identifies the PyTorch optimizer algorithm.

Values:

enumerator adam: Adam optimizer.

enumerator adagrad: Adagrad optimizer.

enumerator adamw: AdamW optimizer.

enumerator lbfgs: L-BFGS optimizer.

enumerator rmsprop: RMSprop optimizer.

enumerator sgd: Stochastic gradient descent optimizer.

enumerator invalid_optimizer: Sentinel value indicating no valid optimizer was selected.
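The sentinel makes it possible to turn a user-supplied optimizer name into an enum value without throwing. The sketch below assumes a case-insensitive match on the enumerator names; parse_optimizer is a hypothetical helper, not a documented function, and the framework's own translation may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Same enumerators as documented above.
enum class opt_enum { adam, adagrad, adamw, lbfgs, rmsprop, sgd, invalid_optimizer };

// Hypothetical helper: lower-case the name, then match it against the
// enumerator names. Unknown names map to the sentinel.
inline opt_enum parse_optimizer(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    if (name == "adam")    { return opt_enum::adam; }
    if (name == "adagrad") { return opt_enum::adagrad; }
    if (name == "adamw")   { return opt_enum::adamw; }
    if (name == "lbfgs")   { return opt_enum::lbfgs; }
    if (name == "rmsprop") { return opt_enum::rmsprop; }
    if (name == "sgd")     { return opt_enum::sgd; }
    return opt_enum::invalid_optimizer;
}
```

A caller would compare the result against opt_enum::invalid_optimizer before constructing any optimizer object.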

enum class mlp_init

Weight-initialisation schemes for torch::nn::Sequential modules.

Values:

enumerator uniform: Uniform random initialisation.

enumerator normal: Normal (Gaussian) random initialisation.

enumerator xavier_normal: Xavier normal initialisation.

enumerator xavier_uniform: Xavier uniform initialisation.

enumerator kaiming_uniform: Kaiming (He) uniform initialisation.

enumerator kaiming_normal: Kaiming (He) normal initialisation.

enum class loss_enum

Identifies the PyTorch loss function to use.

Values:

enumerator bce: Binary cross-entropy loss.

enumerator bce_with_logits: Binary cross-entropy with logits loss.

enumerator cosine_embedding: Cosine embedding loss.

enumerator cross_entropy: Cross-entropy loss.

enumerator ctc: CTC loss.

enumerator hinge_embedding: Hinge embedding loss.

enumerator huber: Huber loss.

enumerator kl_div: Kullback-Leibler divergence loss.

enumerator l1: L1 (mean absolute error) loss.

enumerator margin_ranking: Margin ranking loss.

enumerator mse: Mean squared error loss.

enumerator multi_label_margin: Multi-label margin loss.

enumerator multi_label_soft_margin: Multi-label soft margin loss.

enumerator multi_margin: Multi-margin loss.

enumerator nll: Negative log-likelihood loss.

enumerator poisson_nll: Poisson NLL loss.

enumerator smooth_l1: Smooth L1 loss.

enumerator soft_margin: Soft margin loss.

enumerator triplet_margin: Triplet margin loss.

enumerator triplet_margin_with_distance: Triplet margin with distance loss.

enumerator invalid_loss: Sentinel value indicating no valid loss was selected.

enum class scheduler_enum

Identifies the learning-rate scheduler.

Values:

enumerator steplr: Step-based learning-rate decay (StepLR).

enumerator reducelronplateauscheduler: Reduce LR on plateau scheduler.

enumerator lrscheduler: Generic LR scheduler base.

enumerator invalid_scheduler: Sentinel value indicating no valid scheduler was selected.

enum class graph_enum

Identifies which tensor slot of a graph_t object is being accessed.

Values:

enumerator data_graph: Input feature tensor at graph level.

enumerator data_node: Input feature tensor at node level.

enumerator data_edge: Input feature tensor at edge level.

enumerator truth_graph: Ground-truth label tensor at graph level.

enumerator truth_node: Ground-truth label tensor at node level.

enumerator truth_edge: Ground-truth label tensor at edge level.

enumerator edge_index: COO edge-index tensor ([2, num_edges]).

enumerator weight: Per-event weight tensor.

enumerator batch_index: Batch-assignment index tensor for batched graphs.

enumerator batch_events: Global event indices of all graphs in the batch.

enumerator pred_graph: Model prediction tensor at graph level.

enumerator pred_node: Model prediction tensor at node level.

enumerator pred_edge: Model prediction tensor at edge level.

enumerator pred_extra: Miscellaneous model prediction tensor.

enum class mode_enum

Identifies the phase the model is running in: training, validation, or evaluation.

Values:

enumerator training: Model is in training mode.

enumerator validation: Model is in validation mode.

enumerator evaluation: Model is in evaluation (test/inference) mode.

enum class particle_enum

Identifies which kinematic or metadata attribute of a particle to read or write.

Values:

enumerator index: Particle index in the event.

enumerator pdgid: PDG Monte Carlo particle ID.

enumerator pt: Transverse momentum.

enumerator eta: Pseudorapidity.

enumerator phi: Azimuthal angle.

enumerator energy: Energy.

enumerator px: x-component of momentum.

enumerator pz: z-component of momentum.

enumerator py: y-component of momentum.

enumerator mass: Invariant mass.

enumerator charge: Electric charge.

enumerator is_b: Flag: particle is a b-quark/hadron.

enumerator is_lep: Flag: particle is a lepton.

enumerator is_nu: Flag: particle is a neutrino.

enumerator is_add: Flag: additional particle classification.

enumerator pmc: Bulk Cartesian four-momentum write-out (px, py, pz, e).

enumerator pmu: Bulk polar four-momentum write-out (pt, eta, phi, e).

Core Data Structs

bsc_t — polymorphic leaf buffer

bsc_t (base struct) is the polymorphic root of the ROOT leaf-reading hierarchy. Each instance holds at most one heap-allocated buffer, whose concrete type is the data_enum value discovered at runtime. Every element() overload reads the buffer at the current index and writes the result through the caller-supplied pointer.

struct bsc_t

Type-erased data container that stores a single typed pointer.

bsc_t holds one pointer for each supported data type (scalars, vectors, 2-deep nested vectors, and 3-deep nested vectors of every element type declared in data_enum). At most one pointer should be non-null at any time; the active type is recorded in the type member. The index member controls which element of a vector-typed buffer is returned by the element() accessors. Depending on the clear flag, flush_buffer() either deletes the active buffer or only resets it (zeroing a scalar or clearing a vector). The struct is the base of data_t (read buffer for ROOT branches) and variable_t (write buffer for output ROOT branches).

Subclassed by data_t, variable_t

Public Functions

bsc_t(): Default constructor. Initialises all pointers to nullptr.

virtual ~bsc_t(): Virtual destructor.

void flush_buffer(): Clear or delete the active data pointer according to bsc_t::clear.

std::string as_string()

Return the string name of the currently active type.

Returns:: Type name as a string.

std::string scan_buffer()

Scan and return a debug summary of the buffer state.

Returns:: String describing the active pointer.

data_enum root_type_translate(std::string*)

Translate a ROOT leaf-type string to the corresponding data_enum.

Parameters:: root_str – Pointer to the ROOT type string (e.g. "D" for double).
Returns:: Corresponding data_enum value.

bool element(std::vector<std::vector<std::vector<float>>> *el)

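Copy the buffer element at the current index into *el. The same description applies to every element() overload listed below.

Parameters:: el – Pointer to the variable that receives the value.
Returns:: true on success.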

bool element(std::vector<std::vector<std::vector<double>>> *el)

bool element(std::vector<std::vector<std::vector<long>>> *el)

bool element(std::vector<std::vector<std::vector<int>>> *el)

bool element(std::vector<std::vector<std::vector<bool>>> *el)

bool element(std::vector<std::vector<float>> *el)

bool element(std::vector<std::vector<double>> *el)

bool element(std::vector<std::vector<long>> *el)

bool element(std::vector<std::vector<int>> *el)

bool element(std::vector<std::vector<bool>> *el)

bool element(std::vector<float> *el)

bool element(std::vector<double> *el)

bool element(std::vector<long> *el)

bool element(std::vector<int> *el)

bool element(std::vector<char> *el)

bool element(std::vector<bool> *el)

bool element(double *el)

bool element(float *el)

bool element(long *el)

bool element(int *el)

bool element(bool *el)

bool element(unsigned long long *el)

bool element(unsigned int *el)

bool element(char *el)

template<typename T>
inline bool flush_buffer(std::vector<T> **data)

template<typename T>
inline bool flush_buffer(T **data)

Public Members

std::vector<std::vector<std::vector<unsigned long long>>> *vvv_ull = nullptr

std::vector<std::vector<std::vector<unsigned int>>> *vvv_ui = nullptr

std::vector<std::vector<std::vector<double>>> *vvv_d = nullptr

std::vector<std::vector<std::vector<long>>> *vvv_l = nullptr

std::vector<std::vector<std::vector<float>>> *vvv_f = nullptr

std::vector<std::vector<std::vector<int>>> *vvv_i = nullptr

std::vector<std::vector<std::vector<bool>>> *vvv_b = nullptr

std::vector<std::vector<std::vector<char>>> *vvv_c = nullptr

std::vector<std::vector<unsigned long long>> *vv_ull = nullptr

std::vector<std::vector<unsigned int>> *vv_ui = nullptr: Pointer to depth-2 unsigned int data, or nullptr.

std::vector<std::vector<double>> *vv_d = nullptr

std::vector<std::vector<long>> *vv_l = nullptr

std::vector<std::vector<float>> *vv_f = nullptr

std::vector<std::vector<int>> *vv_i = nullptr

std::vector<std::vector<bool>> *vv_b = nullptr

std::vector<std::vector<char>> *vv_c = nullptr

std::vector<unsigned long long> *v_ull = nullptr: Pointer to a flat vector of unsigned long longs, or nullptr.

std::vector<unsigned int> *v_ui = nullptr: Pointer to a flat vector of unsigned ints, or nullptr.

std::vector<double> *v_d = nullptr: Pointer to a flat vector of doubles, or nullptr.

std::vector<long> *v_l = nullptr: Pointer to a flat vector of longs, or nullptr.

std::vector<float> *v_f = nullptr: Pointer to a flat vector of floats, or nullptr.

std::vector<int> *v_i = nullptr: Pointer to a flat vector of ints, or nullptr.

std::vector<bool> *v_b = nullptr: Pointer to a flat vector of bools, or nullptr.

std::vector<char> *v_c = nullptr: Pointer to a flat vector of chars, or nullptr.

unsigned long long *ull = nullptr: Pointer to a scalar unsigned long long, or nullptr.

unsigned int *ui = nullptr: Pointer to a scalar unsigned int, or nullptr.

double *d = nullptr: Pointer to a scalar double, or nullptr.

long *l = nullptr: Pointer to a scalar long, or nullptr.

float *f = nullptr: Pointer to a scalar float, or nullptr.

int *i = nullptr: Pointer to a scalar int, or nullptr.

bool *b = nullptr: Pointer to a scalar bool, or nullptr.

char *c = nullptr: Pointer to a scalar char, or nullptr.

long index = 0: Current index into a vector-typed buffer (used during iteration).

bool clear = false: If true, flush_buffer deletes the pointed-to object; otherwise it only resets the value to 0 / clears the vector.

data_enum type = data_enum::unset: Active data type of this container (default data_enum::unset).
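The interplay of type, index, clear, element(), and flush_buffer() can be illustrated with a heavily reduced model. This is a sketch under stated assumptions, not the framework's implementation: buffer_sketch is a hypothetical type that keeps a single buffer (v_d), assumes the buffer holds one entry per index, and assumes the buffer was allocated with new.

```cpp
#include <cstddef>
#include <vector>

// Reduced stand-in for data_enum; the documented enum has many more values.
enum class data_enum { v_d, unset };

// Hypothetical, heavily reduced model of the documented layout.
struct buffer_sketch {
    std::vector<double>* v_d = nullptr;  // the only buffer type kept in this sketch
    long index = 0;                      // entry returned by element()
    bool clear = false;                  // true: flush_buffer() deletes the buffer
    data_enum type = data_enum::unset;   // tag naming the active pointer

    buffer_sketch() = default;
    buffer_sketch(const buffer_sketch&) = delete;  // owns a raw pointer, so no copies
    buffer_sketch& operator=(const buffer_sketch&) = delete;
    ~buffer_sketch() { delete v_d; }

    // Copy the entry at `index` into *el. Returns false when the buffer is
    // inactive or the index is out of range.
    bool element(double* el) const {
        if (type != data_enum::v_d || v_d == nullptr) { return false; }
        if (index < 0 || static_cast<std::size_t>(index) >= v_d->size()) { return false; }
        *el = (*v_d)[static_cast<std::size_t>(index)];
        return true;
    }

    // clear == false: keep the allocation and empty the vector.
    // clear == true: delete the buffer and mark the container as unset
    // (resetting the tag is a choice made for this sketch).
    void flush_buffer() {
        if (v_d == nullptr) { return; }
        if (clear) {
            delete v_d;
            v_d = nullptr;
            type = data_enum::unset;
        } else {
            v_d->clear();
        }
    }
};
```

Returning false instead of aborting on a tag mismatch is a simplification of this sketch; element_t::get() is documented to abort in that case.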

data_t — single ROOT leaf accessor

data_t extends bsc_t with ROOT bookkeeping (TLeaf/TBranch/TTree pointers, file path, leaf type string) and sequential iteration helpers.

struct data_t : public bsc_t 

Represents a single ROOT branch/leaf with its associated metadata and typed data buffer.

Inherits bsc_t to hold the branch data in a typed buffer. A data_t instance keeps pointers to the TFile, TTree, TBranch, and TLeaf it reads from and steps through events via next().

Public Functions

data_t(): Default constructor.

~data_t() override: Destructor. Releases the typed data buffer.

void initialize(): Open the ROOT file and prepare the branch for reading.

void flush(): Clear the internal data buffer.

bool next()

Advance to the next event entry.

Returns:: true if a next entry was found; false at end of file.

Public Members

std::string leaf_name = "": Name of the ROOT TLeaf.

std::string branch_name = "": Name of the ROOT TBranch.

std::string tree_name = "": Name of the ROOT TTree.

std::string leaf_type = "": ROOT leaf type string (e.g. "D" for double, "F" for float).

std::string path = "": File path from which this branch is read.

std::string *fname = nullptr: Pointer to the current filename (not owned).

TLeaf *leaf = nullptr: Pointer to the associated ROOT TLeaf object.

TBranch *branch = nullptr: Pointer to the associated ROOT TBranch object.

TTree *tree = nullptr: Pointer to the ROOT TTree that contains this branch.

TFile *file = nullptr: Pointer to the ROOT TFile.

int file_index = 0: Index into the files_s / files_i / files_t arrays for multi-file iteration.

std::vector<std::string> *files_s = nullptr: Pointer to the list of file paths for multi-file reading.

std::vector<long> *files_i = nullptr: Pointer to per-file event counts.

std::vector<TFile*> *files_t = nullptr: Pointer to the list of open TFile objects.
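The file_index and files_i members suggest that next() walks the entries of one file and then moves on to the following file. The sketch below shows that iteration pattern driven only by per-file event counts. file_cursor is a hypothetical type, no ROOT objects are involved, and the real next() may behave differently at file boundaries.

```cpp
#include <vector>

// Hypothetical cursor over several files, driven only by per-file event counts.
struct file_cursor {
    std::vector<long> counts;  // plays the role of *files_i
    int file_index = 0;        // file currently being read
    long entry = -1;           // entry within that file; -1 before the first next()

    // Advance to the next entry, skipping exhausted or empty files.
    // Returns false once every file has been consumed.
    bool next() {
        while (file_index < static_cast<int>(counts.size())) {
            if (entry + 1 < counts[file_index]) {
                ++entry;
                return true;
            }
            ++file_index;  // current file exhausted: move to the next one
            entry = -1;
        }
        return false;
    }
};
```

With counts {2, 0, 1}, repeated calls visit two entries of the first file, skip the empty second file, visit one entry of the third, and then return false.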

element_t — per-event leaf handle map

element_t is handed to particle_template::build() and event_template::build(). The get<g>(key, var) template method looks up the data_t registered under key and copies its current element into *var via the matching bsc_t::element() overload.

struct element_t

Aggregates all data_t branches for one event, providing a named-key access interface.

Public Functions

bool next()

Advance all branches to the next event.

Returns:: true if successful.

void set_meta(): Update metadata (filename, event index) from the active branches.

bool boundary()

Check whether the current entry is on a file boundary.

Returns:: true if the current entry is the first of a new file.

template<typename g> inline bool get(std::string key, g *var)

Retrieve the value stored under key and place it in var.

Template Parameters:

g – Target type; must match the leaf type or an abort is triggered.

Parameters:

key – Leaf/branch name used as the lookup key.
var – Pointer to the variable to fill.

Returns:

true on success; aborts on type mismatch.

Public Members

std::string tree = "": Name of the ROOT TTree this element belongs to.

long event_index = -1: Sequential event index (default -1 = not yet set).

std::string filename = "": Path of the file currently being read.

std::map<std::string, data_t*> handle = {}: Map of branch/leaf names to their data_t objects.

write_t / writer — ROOT output helpers

struct write_t

Groups a ROOT TFile, TTree, and meta_t pointer together with a map of named variable_t write buffers.

Public Functions

variable_t *process(std::string *name)

Look up or create the variable_t for name.

Parameters:: name – Variable name.
Returns:: Pointer to the corresponding variable_t.

void write(): Fill the current TTree entry from all registered variable_t buffers.

void create(std::string tr_name, std::string path)

Open the output ROOT file at path and create a TTree named tr_name.

Parameters:

tr_name – TTree name.
path – Output file path.

void close(): Write and close the output ROOT file.

Public Members

TFile *file = nullptr: Output ROOT TFile.

TTree *tree = nullptr: Output ROOT TTree.

meta_t *mtx = nullptr: Pointer to the metadata for this file.

std::map<std::string, variable_t*> *data = nullptr: Map of variable names to their variable_t write buffers.

struct writer

High-level helper that manages multiple write_t objects keyed by TTree name.

Public Functions

writer(): Default constructor.

~writer(): Destructor. Closes all open files.

void create(std::string *pth)

Open the output ROOT file at the given path.

Parameters:: pth – Pointer to the file path string.

void write(std::string *tree)

Fill one TTree entry for the tree named tree.

Parameters:: tree – Pointer to the TTree name.

template<typename g> inline void process(std::string *tree, std::string *name, g *t)

Write value t to branch name in TTree tree.

Template Parameters:

g – Value type.

Parameters:

tree – Pointer to the TTree name.
name – Pointer to the branch/variable name.
t – Pointer to the value to write.
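The two levels of lookup (tree name, then variable name, each created on first use) can be sketched without ROOT. Everything below is hypothetical: column_sketch, tree_sketch, and writer_sketch are stand-ins for variable_t, write_t, and writer, only double values are handled, and names are passed by const reference rather than by pointer.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for variable_t: a staged value plus the entries written so far.
struct column_sketch {
    double staged = 0;
    std::vector<double> values;
};

// Stand-in for write_t: named columns belonging to one output tree.
struct tree_sketch {
    std::map<std::string, std::unique_ptr<column_sketch>> data;
    long entries = 0;

    // Look up the column for `name`, creating it on first use.
    column_sketch* process(const std::string& name) {
        std::unique_ptr<column_sketch>& slot = data[name];
        if (!slot) { slot = std::make_unique<column_sketch>(); }
        return slot.get();
    }

    // Commit the staged value of every registered column as one entry.
    void write() {
        for (auto& kv : data) { kv.second->values.push_back(kv.second->staged); }
        ++entries;
    }
};

// Stand-in for writer: trees keyed by name, created on first use.
struct writer_sketch {
    std::map<std::string, tree_sketch> trees;

    void process(const std::string& tree, const std::string& name, const double* t) {
        trees[tree].process(name)->staged = *t;
    }
    void write(const std::string& tree) { trees[tree].write(); }
};
```

The calling pattern is one process() call per variable followed by a single write() per entry, which matches the order of operations described above.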

Event, Particle, and Graph Payload Structs

particle_t — raw kinematic payload

particle_t carries the floating-point kinematics and integer metadata for a single particle. particle_template stores one of these as its internal representation.

struct particle_t

Raw kinematic and identification data for a single particle, together with the PDG ID lists used for lepton and neutrino classification.

Public Members

double e = -0.000000000000001: Energy in MeV (default -1e-15 to signal “unset”).

double mass = -1: Invariant mass in MeV (default -1).

double px = 0: Cartesian x-component of three-momentum.

double py = 0: Cartesian y-component of three-momentum.

double pz = 0: Cartesian z-component of three-momentum.

double pt = 0: Transverse momentum.

double eta = 0: Pseudorapidity.

double phi = 0: Azimuthal angle in radians.

bool cartesian = false: true if Cartesian (px, py, pz, e) coordinates are valid.

bool polar = false: true if polar (pt, eta, phi, e) coordinates are valid.

double charge = 0: Electric charge.

int pdgid = 0: PDG Monte Carlo particle identifier.

int index = -1: Index of this particle within its parent event container.

std::string type = "": String label identifying the particle type (e.g. "top", "b").

std::string hash = "": Unique hash string for this particle instance.

std::string symbol = "": LaTeX-style or text symbol for the particle (e.g. "t", "b").

std::vector<int> lepdef = {11, 13, 15}: PDG IDs considered as leptons for is_lep classification (default {11, 13, 15}).

std::vector<int> nudef = {12, 14, 16}: PDG IDs considered as neutrinos for is_nu classification (default {12, 14, 16}).

std::map<std::string, bool> children = {}: Map of child particle hashes to a presence flag.

std::map<std::string, bool> parents = {}: Map of parent particle hashes to a presence flag.

std::map<std::string, particle_template*> *data_p = nullptr: Pointer to the event-level particle map; used during event building.
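The cartesian and polar flags imply that one representation can be derived from the other on demand. The sketch below uses the standard collider relations px = pt cos(phi), py = pt sin(phi), pz = pt sinh(eta), and m^2 = e^2 - |p|^2. kin_sketch and the three free functions are hypothetical and only illustrate the arithmetic; they are not the framework's conversion code.

```cpp
#include <cmath>

// Hypothetical subset of particle_t holding only the kinematic fields.
struct kin_sketch {
    double e = 0, px = 0, py = 0, pz = 0;
    double pt = 0, eta = 0, phi = 0;
    bool cartesian = false;  // (px, py, pz, e) are valid
    bool polar = false;      // (pt, eta, phi, e) are valid
};

// Fill the Cartesian components from the polar ones, if needed and possible.
inline void to_cartesian(kin_sketch& p) {
    if (p.cartesian || !p.polar) { return; }
    p.px = p.pt * std::cos(p.phi);
    p.py = p.pt * std::sin(p.phi);
    p.pz = p.pt * std::sinh(p.eta);
    p.cartesian = true;
}

// Fill the polar components from the Cartesian ones, if needed and possible.
inline void to_polar(kin_sketch& p) {
    if (p.polar || !p.cartesian) { return; }
    p.pt = std::hypot(p.px, p.py);
    p.phi = std::atan2(p.py, p.px);
    p.eta = (p.pt > 0) ? std::asinh(p.pz / p.pt) : 0.0;  // eta is undefined for pt == 0
    p.polar = true;
}

// Invariant mass from the Cartesian four-momentum. A negative m^2 caused by
// rounding is clamped to zero in this sketch.
inline double invariant_mass(const kin_sketch& p) {
    const double m2 = p.e * p.e - (p.px * p.px + p.py * p.py + p.pz * p.pz);
    return (m2 > 0) ? std::sqrt(m2) : 0.0;
}
```

For example, a particle with pt = 3, eta = 0, phi = 0, and e = 5 converts to px = 3, py = pz = 0 and has an invariant mass of 4 in the same units.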

event_t — event identity

event_t is embedded in every event_template and carries the event index, weight, ROOT tree name and the unique hash used for graph caching.

struct event_t

Minimal event identification and state container.

Public Members

std::string name = "": Human-readable name of the event type.

double weight = 1: Event weight (default 1.0).

long index = -1: Sequential event index within the file (default -1 = unset).

std::string hash = "": Unique hash string identifying this event.

std::string tree = "": Name of the ROOT TTree from which this event was read.

graph_t — GNN tensor container

graph_t is the central data structure passed between the graph builder, the dataloader, and the model_template::forward() method. It stores batched PyTorch tensors for node/edge/graph data features, truth features, the COO edge index, and batching meta-data.

struct graph_t

Runtime container for a single graph’s input features, truth labels, edge index, and device-resident tensors.

Stores input features (data_graph, data_node, data_edge), truth labels (truth_graph, truth_node, truth_edge), a COO edge-index tensor, an event-weight tensor, and a batch-index tensor. All tensors are cached per CUDA device index (in the dev_* maps) so that multi-GPU training can transfer data without repeated host-to-device copies.

The has_feature(graph_enum, name, dev) method is the unified lookup entry-point for all feature categories. Friends graph_template and dataloader have write access to the private tensor storage. The in_use counter is managed by dataloader to implement a simple object pool.
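A lookup keyed by category, feature name, and device index can be sketched with nested maps. The layout below is an assumption made for illustration: graph_sketch and tensor_sketch are hypothetical stand-ins (no torch::Tensor is involved), and the real dev_* maps may be organised differently.

```cpp
#include <map>
#include <string>
#include <vector>

// Reduced stand-in for graph_enum.
enum class graph_enum { data_node, truth_node };

// Stand-in for torch::Tensor.
struct tensor_sketch {
    std::vector<float> values;
};

struct graph_sketch {
    // Per category: feature name -> slot index (the role of the `maps` arguments).
    std::map<graph_enum, std::map<std::string, int>> index_of;
    // Per device index and category: tensors stored in slot order.
    std::map<int, std::map<graph_enum, std::vector<tensor_sketch>>> dev_cache;

    // Returns the cached tensor, or nullptr if the name is unknown or the
    // graph has not been transferred to device `dev`.
    tensor_sketch* has_feature(graph_enum tp, const std::string& name, int dev) {
        auto category = index_of.find(tp);
        if (category == index_of.end()) { return nullptr; }
        auto slot = category->second.find(name);
        if (slot == category->second.end()) { return nullptr; }
        auto device = dev_cache.find(dev);
        if (device == dev_cache.end()) { return nullptr; }
        auto tensors = device->second.find(tp);
        if (tensors == device->second.end()) { return nullptr; }
        const int idx = slot->second;
        if (idx < 0 || idx >= static_cast<int>(tensors->second.size())) { return nullptr; }
        return &tensors->second[idx];
    }
};
```

In this arrangement the typed getters (get_data_node() and so on) would reduce to one-line wrappers that pass the matching graph_enum value and the model's device index.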

Public Functions

template<typename g> inline torch::Tensor *get_truth_graph(std::string _name, g *mdl)

Retrieve the truth graph-level tensor named _name on the device of model mdl.

Template Parameters:

g – A model type with a device_index member.

Parameters:

_name – Feature name.
mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr if not found.

template<typename g> inline torch::Tensor *get_truth_node(std::string _name, g *mdl)

Retrieve the truth node-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:

_name – Feature name.
mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g> inline torch::Tensor *get_truth_edge(std::string _name, g *mdl)

Retrieve the truth edge-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:

_name – Feature name.
mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g> inline torch::Tensor *get_data_graph(std::string _name, g *mdl)

Retrieve the input data graph-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:

_name – Feature name.
mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g> inline torch::Tensor *get_data_node(std::string _name, g *mdl)

Retrieve the input data node-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:

_name – Feature name.
mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g> inline torch::Tensor *get_data_edge(std::string _name, g *mdl)

Retrieve the input data edge-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:

_name – Feature name.
mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g> inline torch::Tensor *get_edge_index(g *mdl)

Retrieve the COO edge-index tensor for model mdl's device.

Template Parameters:: g – A model type with a device_index member.
Parameters:: mdl – Pointer to the model.
Returns:: Pointer to the edge-index tensor, or nullptr.

template<typename g> inline torch::Tensor *get_event_weight(g *mdl)

Retrieve the event-weight tensor.

Template Parameters:: g – A model type with a device_index member.
Parameters:: mdl – Pointer to the model.
Returns:: Pointer to the weight tensor.

template<typename g> inline torch::Tensor *get_batch_index(g *mdl)

Retrieve the batch-assignment index tensor.

Template Parameters:: g – A model type with a device_index member.
Parameters:: mdl – Pointer to the model.
Returns:: Pointer to the batch-index tensor.

template<typename g> inline torch::Tensor *get_batched_events(g *mdl)

Retrieve the global event-index tensor for the batch.

Template Parameters:: g – A model type with a device_index member.
Parameters:: mdl – Pointer to the model.
Returns:: Pointer to the tensor.

torch::Tensor *has_feature(graph_enum tp, std::string _name, int dev)

Generic feature lookup by type, name, and device index.

Parameters:

tp – Feature category (graph_enum).
_name – Feature name.
dev – Device index.

Returns:

Pointer to the tensor, or nullptr.

void add_truth_graph(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps): Register truth graph-level tensors from data with index map maps.

void add_truth_node(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps): Register truth node-level tensors.

void add_truth_edge(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps): Register truth edge-level tensors.

void add_data_graph(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps): Register input data graph-level tensors.

void add_data_node(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps): Register input data node-level tensors.

void add_data_edge(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps): Register input data edge-level tensors.

void transfer_to_device(torch::TensorOptions *dev)

Copy all tensors to the device described by dev.

Parameters:: dev – Device options.

void _purge_all(): Delete all owned tensor data.

Public Members

int num_nodes = 0: Number of nodes in this graph.

long event_index = 0: Global event index.

double event_weight = 1: Event weight (default 1.0).

bool preselection = false: Result of the graph's pre-selection (true if the graph passed).

std::vector<long> batched_events = {}: Global event indices of all graphs combined in the batch.

std::vector<std::string*> batched_filenames = {}: Source filenames of all graphs combined in the batch.

std::string *hash = nullptr: Pointer to the event hash string (not owned).

std::string *filename = nullptr: Pointer to the source file path (not owned).

std::string *graph_name = nullptr: Pointer to the graph type name (not owned).

c10::DeviceType device = c10::kCPU: Device on which data tensors reside (default CPU).

int in_use = 1: Reference count / in-use flag for the dataloader pool.

folds_t — k-fold assignment

struct folds_t

Associates an event hash with its k-fold split membership.

Public Functions

inline void flush_data(): Free the heap-allocated hash string and set it to nullptr.

Public Members

int k = -1: K-fold index to which this event belongs (default -1 = unset).

bool is_train = false: true if this event is assigned to the training set.

bool is_valid = false: true if this event is assigned to the validation set.

bool is_eval = false: true if this event is assigned to the evaluation (test) set.

char *hash = nullptr: Null-terminated C string holding the event hash.

graph_hdf5 / graph_hdf5_w — HDF5 serialisation records

struct graph_hdf5

std::string-based representation of a serialised graph for HDF5 I/O.

All tensor and map data are encoded as base-64 strings.

Public Members

int num_nodes = -1: Number of nodes in the graph.

double event_weight = 1: Event weight.

long event_index = -1: Global event index.

std::string hash: Event hash.

std::string filename: Source file path.

std::string edge_index: Serialised edge-index tensor.

std::string data_map_graph: Serialised data-feature name-to-index map (graph level).

std::string data_map_node: Serialised data-feature name-to-index map (node level).

std::string data_map_edge: Serialised data-feature name-to-index map (edge level).

std::string truth_map_graph: Serialised truth-feature name-to-index map (graph level).

std::string truth_map_node: Serialised truth-feature name-to-index map (node level).

std::string truth_map_edge: Serialised truth-feature name-to-index map (edge level).

std::string data_graph: Serialised data-feature tensors (graph level).

std::string data_node: Serialised data-feature tensors (node level).

std::string data_edge: Serialised data-feature tensors (edge level).

std::string truth_graph: Serialised truth-label tensors (graph level).

std::string truth_node: Serialised truth-label tensors (node level).

std::string truth_edge: Serialised truth-label tensors (edge level).

struct graph_hdf5_w

C-string (char*) variant of graph_hdf5 used for direct HDF5 write operations.

All string fields are heap-allocated null-terminated C strings. Call flush_data() to release them.

Public Functions

void flush_data(): Free all heap-allocated char* members and set them to nullptr.

Public Members

int num_nodes = -1: Number of nodes in the graph.

double event_weight = 1: Event weight.

long event_index = -1: Global event index.

char *hash = nullptr: Null-terminated event hash string.

char *filename = nullptr: Null-terminated source file path.

char *edge_index = nullptr: Null-terminated serialised edge-index tensor.

char *data_map_graph = nullptr

char *data_map_node = nullptr

char *data_map_edge = nullptr

char *truth_map_graph = nullptr

char *truth_map_node = nullptr

char *truth_map_edge = nullptr

char *data_graph = nullptr

char *data_node = nullptr

char *data_edge = nullptr

char *truth_graph = nullptr

char *truth_node = nullptr

char *truth_edge = nullptr
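The relationship between the std::string record and its char* counterpart can be sketched with two fields. This is an illustration only: record_s and record_w are hypothetical reductions of graph_hdf5 and graph_hdf5_w, and allocation with new[] and delete[] is an assumption, since the source does not state how the C strings are allocated.

```cpp
#include <cstring>
#include <string>

// Reduced stand-in for graph_hdf5.
struct record_s {
    std::string hash;
    std::string filename;
};

// Reduced stand-in for graph_hdf5_w: the same fields as owned C strings.
struct record_w {
    char* hash = nullptr;
    char* filename = nullptr;

    // Release every string and reset the pointers, so a second call is harmless.
    void flush_data() {
        delete[] hash;
        hash = nullptr;
        delete[] filename;
        filename = nullptr;
    }
};

// Copy a std::string, including its terminator, into a new heap buffer.
inline char* heap_copy(const std::string& s) {
    char* out = new char[s.size() + 1];
    std::memcpy(out, s.c_str(), s.size() + 1);
    return out;
}

// Build the writable record; the caller must call flush_data() when done.
inline record_w to_writable(const record_s& in) {
    record_w out;
    out.hash = heap_copy(in.hash);
    out.filename = heap_copy(in.filename);
    return out;
}
```

Resetting each pointer to nullptr after freeing it is what makes a repeated flush_data() call safe.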

Settings and Configuration Structs

settings_t — global analysis settings

settings_t is the plain configuration struct stored inside analysis. Its fields are exposed to Python: every Analysis.* property maps onto a field of this struct.

struct settings_t

Global run configuration covering I/O paths, ML hyperparameters, plotting settings, and execution options.

Passed directly to the analysis class via analysis::m_settings and propagated to dataloader, optimizer, metrics, and io.

The I/O fields (output_path, run_name, sow_name, metacache_path) control where results are written and how the sum-of-weights histogram is named.

The ML fields (epochs, kfolds, batch_size, kfold, num_examples, train_size, training, validation, evaluation, continue_training, training_dataset, graph_cache) configure the k-fold cross-validation loop, mini-batch size, which phases to run, whether to resume from a checkpoint, and paths to pre-built graph caches.

The plotting fields (var_pt, var_eta, var_phi, var_energy, targets, nbins, max_range, logy) configure auto-generated invariant-mass histograms.

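The execution fields (threads, intra_th, debug_mode, build_cache, selection_root) control the number of worker threads, the PyTorch intra-op thread count, verbose output, whether a local graph cache is built, and whether selection output is written to ROOT files.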

Public Members

std::string output_path = "./ProjectName": Root output directory for plots, models, and selection files (default "./ProjectName").

std::string run_name = "": Optional tag appended to output directories for this run.

std::string sow_name = "": Name of the sum-of-weights histogram in the ROOT file.

std::string metacache_path = "./": Path to the directory used for caching AMI metadata (default "./").

bool fetch_meta = false: If true, query AMI for dataset metadata; otherwise use cache.

bool pretagevents = false: If true, pre-tag events before building graphs.

int epochs = 10: Number of training epochs (default 10).

int kfolds = 10: Number of k-fold splits (default 10).

int batch_size = 1: Minibatch size for training (default 1).

std::vector<int> kfold = {}: Explicit list of k-fold indices to run; empty means run all.

int num_examples = 3: Number of example graphs displayed during progress reporting (default 3).

float train_size = 50: Percentage of data used for training (default 50).

bool training = true: Enable the training phase (default true).

bool validation = true: Enable the validation phase (default true).

bool evaluation = true: Enable the evaluation phase (default true).

bool continue_training = true: If true, resume training from the last saved checkpoint (default true).

std::string training_dataset = "": Path to a pre-built graph dataset for training.

std::string graph_cache = "": Path to a graph cache directory.

std::string var_pt = "pt": Leaf name of the transverse-momentum variable (default "pt").

std::string var_eta = "eta": Leaf name of the pseudorapidity variable (default "eta").

std::string var_phi = "phi": Leaf name of the azimuthal angle variable (default "phi").

std::string var_energy = "energy": Leaf name of the energy variable (default "energy").

std::vector<std::string> targets = {}: List of variable names to use as training targets.

int nbins = 400: Number of histogram bins (default 400).

int max_range = 400: Maximum axis range for mass histograms (default 400 GeV).

bool logy = false: Use logarithmic y-axis on histograms (default false).

int threads = 10: Number of parallel worker threads (default 10).

int intra_th = -1: Number of intra-op threads for PyTorch (default -1 = use system default).

bool debug_mode = false: Enable verbose debug output (default false).

bool build_cache = false: If true, build and save a local graph cache (default false).

bool selection_root = false: If true, write selection output to ROOT files (default false).
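As an illustration of how the kfolds, kfold, and train_size fields could be interpreted together, the sketch below resolves which folds to run and how many entries fall into the training share. The struct split_settings and both helper functions are hypothetical stand-ins, not framework code; numbering folds from 0 and rounding the training share down are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in mirroring three of the documented fields.
struct split_settings {
    int kfolds = 10;              // number of k-fold splits
    std::vector<int> kfold = {};  // explicit folds to run; empty means all
    float train_size = 50;        // percentage of data used for training
};

// Returns the fold indices to run.
// Assumption: folds are numbered from 0 to kfolds - 1.
std::vector<int> folds_to_run(const split_settings& s) {
    if (!s.kfold.empty()) { return s.kfold; }
    std::vector<int> all;
    for (int k = 0; k < s.kfolds; ++k) { all.push_back(k); }
    return all;
}

// Number of training entries out of n_total.
// Assumption: the percentage is applied directly and rounded down.
std::size_t training_entries(const split_settings& s, std::size_t n_total) {
    return static_cast<std::size_t>(n_total * (s.train_size / 100.0));
}
```

With the documented defaults, folds_to_run yields ten folds and training_entries assigns half of the entries to training.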

model_settings_t — per-model ML configuration

model_settings_t is populated by model_template and carries optimizer choice, I/O feature maps, weight/tree names, and device info.

struct model_settings_t

Snapshot of a model_template configuration suitable for serialisation and transfer between objects.

Public Members

opt_enum e_optim: Optimizer type enum.

std::string s_optim: Optimizer type as a string (e.g. "adam").

std::string weight_name: Name of the event-weight leaf in the ROOT tree.

std::string tree_name: Name of the ROOT TTree to read from.

std::string model_name: Human-readable model name.

std::string model_device: PyTorch device string (e.g. "cpu" or "cuda:0").

std::string model_checkpoint_path: Path where model checkpoints are saved.

bool inference_mode: true when the model is used for inference only.

bool is_mc: true if the dataset is Monte Carlo simulation.

std::map<std::string, std::string> o_graph: Output feature names to loss-function names (graph level).

std::map<std::string, std::string> o_node: Output feature names to loss-function names (node level).

std::map<std::string, std::string> o_edge: Output feature names to loss-function names (edge level).

std::vector<std::string> i_graph: Names of requested input graph-level features.

std::vector<std::string> i_node: Names of requested input node-level features.

std::vector<std::string> i_edge: Names of requested input edge-level features.

loss_opt — loss function options

struct loss_opt

Options controlling the behaviour of a loss function.

Public Members

loss_enum fx = loss_enum::invalid_loss : Loss function type (default invalid_loss).

bool mean = false: Use torch::kMean reduction.

bool sum = false: Use torch::kSum reduction.

bool none = false: Use torch::kNone reduction.

bool swap = false: Enable the swap option (TripletMargin).

bool full = false: Enable the full option (KLDiv).

bool batch_mean = false: Use torch::kBatchMean reduction (KLDiv).

bool target = false: Enable log_target (KLDiv).

bool zero_inf = false: Enable zero_infinity (CTC).

bool defaults = true: If true, use the loss function’s default options.

int ignore = 1000: Index to ignore in cross-entropy/NLL (default 1000 = disabled).

int blank = 0: Blank token index for CTC (default 0).

double margin = 0: Margin value for margin-based losses.

double beta = 0: Beta parameter for Huber/SmoothL1.

double eps = 0: Epsilon for numerical stability.

double smoothing = 0: Label-smoothing factor for cross-entropy.

double delta = 0: Delta parameter for Huber loss.

std::vector<double> weight = {}: Per-class weight vector for cross-entropy and NLL.
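Because the reduction is expressed as several independent booleans, a consumer has to collapse them into a single choice. The sketch below shows one way this might be done. The struct reduction_flags, the function resolve_reduction, the returned names, and the precedence order are all assumptions made for illustration; the source does not state how the framework resolves conflicting flags.

```cpp
#include <string>

// Hypothetical mirror of the reduction-related flags of loss_opt.
struct reduction_flags {
    bool mean = false;
    bool sum = false;
    bool none = false;
    bool batch_mean = false;
    bool defaults = true;
};

// Collapse the flags into one reduction name.
// Assumed precedence: defaults, then mean, sum, batch_mean, none.
std::string resolve_reduction(const reduction_flags& o) {
    if (o.defaults)   { return "default"; }
    if (o.mean)       { return "mean"; }
    if (o.sum)        { return "sum"; }
    if (o.batch_mean) { return "batchmean"; }
    if (o.none)       { return "none"; }
    return "default";  // nothing requested: fall back to the loss default
}
```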

optimizer_params_t — optimizer hyper-parameters

optimizer_params_t is the C++ counterpart of OptimizerConfig (Cython layer). Each cproperty field sets an m_* sentinel flag so the optimizer builder knows which hyper-parameters have been explicitly specified.

class optimizer_params_t

Typed hyper-parameter set for a PyTorch optimizer.

Wraps each hyper-parameter as a cproperty so that setting one automatically records that it was explicitly provided (the corresponding m_* flag is set to true).

Public Functions

optimizer_params_t(): Constructor. Registers setter callbacks for all hyper-parameters.

Public Members

std::string optimizer = "": Optimizer name string (e.g. "adam").

cproperty<double, optimizer_params_t> lr: Learning rate.

cproperty<double, optimizer_params_t> lr_decay: Learning-rate decay (Adagrad).

cproperty<double, optimizer_params_t> weight_decay: L2 weight-decay regularisation.

cproperty<double, optimizer_params_t> initial_accumulator_value: Initial accumulator value (Adagrad).

cproperty<double, optimizer_params_t> eps: Epsilon for numerical stability.

cproperty<double, optimizer_params_t> tolerance_grad: Gradient tolerance (L-BFGS).

cproperty<double, optimizer_params_t> tolerance_change: Function value/parameter change tolerance (L-BFGS).

cproperty<double, optimizer_params_t> alpha: Alpha / smoothing constant (RMSprop).

cproperty<double, optimizer_params_t> momentum: Momentum factor (SGD / RMSprop).

cproperty<double, optimizer_params_t> dampening: Dampening for momentum (SGD).

cproperty<bool, optimizer_params_t> amsgrad: Enable AMSGrad variant of Adam/AdamW.

cproperty<bool, optimizer_params_t> centered: Centered RMSprop.

cproperty<bool, optimizer_params_t> nesterov: Nesterov momentum (SGD).

cproperty<int, optimizer_params_t> max_iter: Maximum number of iterations per optimisation step (L-BFGS).

cproperty<int, optimizer_params_t> max_eval: Maximum number of function evaluations per step (L-BFGS).

cproperty<int, optimizer_params_t> history_size: History size for L-BFGS.

cproperty<std::tuple<float, float>, optimizer_params_t> betas: Adam β₁/β₂ coefficients.

cproperty<std::vector<float>, optimizer_params_t> beta_hack: Alternative way to set β₁/β₂ as a vector (for Cython compatibility).

std::string scheduler = "": Scheduler type string (e.g. "steplr").

unsigned int step_size = 1: StepLR step size (default 1).

double gamma = 0.1: Multiplicative factor for LR decay (default 0.1).

bool m_lr = false

bool m_lr_decay = false

bool m_weight_decay = false

bool m_initial_accumulator_value = false

bool m_eps = false

bool m_betas = false

bool m_amsgrad = false

bool m_max_iter = false

bool m_max_eval = false

bool m_tolerance_grad = false

bool m_tolerance_change = false

bool m_history_size = false

bool m_alpha = false

bool m_momentum = false

bool m_centered = false

bool m_dampening = false

bool m_nesterov = false
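The pairing of each hyper-parameter with an m_* flag can be pictured with a small property wrapper whose assignment triggers a callback registered in the owner's constructor. This is a simplified sketch only: tracked and params_sketch are hypothetical stand-ins, and the real cproperty template may differ in interface and behaviour.

```cpp
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for cproperty<T, Owner>: assignment
// stores the value and then invokes a callback registered by the owner.
template <typename T>
class tracked {
public:
    void on_set(std::function<void()> cb) { callback_ = std::move(cb); }

    tracked& operator=(const T& v) {
        value_ = v;
        if (callback_) { callback_(); }
        return *this;
    }

    const T& get() const { return value_; }

private:
    T value_{};
    std::function<void()> callback_;
};

// Hypothetical owner with two hyper-parameters and their m_* flags.
struct params_sketch {
    params_sketch() {
        // Register setter callbacks, as the documented constructor does.
        lr.on_set([this] { m_lr = true; });
        momentum.on_set([this] { m_momentum = true; });
    }
    // The callbacks capture `this`, so copying is disabled in this sketch.
    params_sketch(const params_sketch&) = delete;
    params_sketch& operator=(const params_sketch&) = delete;

    tracked<double> lr;
    tracked<double> momentum;
    bool m_lr = false;
    bool m_momentum = false;
};
```

After `p.lr = 1e-3;` on a params_sketch instance p, m_lr is true while m_momentum stays false, which is the information an optimizer builder would need to decide which options to override.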

Meta Structs

meta_t — ATLAS dataset metadata

meta_t holds all AMI / ATLAS metadata for a dataset: DSID, campaign, generator, cross-section, filter efficiency, luminosity, sum-of-weights, run numbers, file GUIDs and per-systematic weight dictionaries.

struct meta_t

Complete metadata for one Monte Carlo or data dataset.

Aggregates ATLAS AnalysisBase tracking values, AMI attributes, physics cross-section and luminosity information, provenance information, and the ROOT file inventory.

Public Members

unsigned int dsid = 0: Dataset identifier.

bool isMC = true: true if the dataset is Monte Carlo simulation.

std::string derivationFormat = "": ATLAS derivation format string (e.g. "DAOD_PHYS").

std::map<int, std::string> inputfiles = {}: Map of integer index to input file path.

std::map<std::string, std::string> config = {}: Arbitrary key-value configuration map.

std::string AMITag = "": AMI tag string (e.g. "e8496_s3126_r12305_p5169").

std::string generators = "": Space-separated list of generator names.

std::map<int, int> inputrange = {}: Map of file index to event count in that file.

double eventNumber = -1: ROOT event number (reserved for ROOT-specific mapping).

double event_index = -1: Free-use event index within the framework.

bool found = false: true if the dataset was found in AMI / the local cache.

std::string DatasetName = "": Full AMI logical dataset name.

double totalSize = 0: Total dataset size in bytes.

double kfactor = 0: QCD k-factor for the process.

double ecmEnergy = 0: Centre-of-mass energy in MeV.

double genFiltEff = 0: Generator filter efficiency.

double completion = 0: Fraction of the dataset that has been processed.

double beam_energy = 0: Beam energy in MeV.

double crossSection = 0: Cross section in nb.

double crossSection_mean = 0: Mean cross section (e.g. averaged over PDF members).

double campaign_luminosity = 0: Integrated luminosity of the corresponding campaign in fb⁻¹.

unsigned int nFiles = 0: Number of files in the dataset.

unsigned int totalEvents = 0: Total number of events across all files.

unsigned int datasetNumber = 0: Numeric dataset identifier (duplicates dsid; kept for legacy use).

std::string identifier = "": Human-readable unique identifier string for the dataset.

std::string prodsysStatus = "": Production system status string.

std::string dataType = "": Data type string (e.g. "MC" or "DATA").

std::string version = "": Software version tag.

std::string PDF = "": PDF set used for generation.

std::string AtlasRelease = "": ATLAS software release string.

std::string principalPhysicsGroup = "": ATLAS physics group that owns the dataset.

std::string physicsShort = "": Short physics description tag.

std::string generatorName = "": Primary generator name.

std::string geometryVersion = "": ATLAS detector geometry version.

std::string conditionsTag = "": Conditions database tag.

std::string generatorTune = "": Generator tune identifier.

std::string amiStatus = "": AMI dataset status string.

std::string beamType = "": Beam type string (e.g. "collisions").

std::string productionStep = "": Production step label.

std::string projectName = "": ATLAS project name.

std::string statsAlgorithm = "": Statistics algorithm used.

std::string genFilterNames = "": Comma-separated generator filter names.

std::string file_type = "": File type identifier (e.g. "ROOT" or "HDF5").

std::string sample_name = "": Short sample name used in plots and outputs.

std::string logicalDatasetName = "": Full logical dataset name (LDN) in AMI.

std::string campaign = "": ATLAS MC campaign identifier (e.g. "mc21").

std::vector<std::string> keywords = {}: List of AMI keyword strings.

std::vector<std::string> weights = {}: Names of the event weights stored in the dataset.

std::vector<std::string> keyword = {}: Additional keyword list (complementary to keywords).

std::vector<int> events = {}: Per-file event counts.

std::vector<int> run_number = {}: Per-file run numbers.

std::vector<double> fileSize = {}: Per-file sizes in bytes.

std::vector<std::string> fileGUID = {}: Per-file GUIDs.

std::map<std::string, int> LFN = {}: Map of logical file name (LFN) to integer index.

std::map<std::string, weights_t> misc = {}: Per-file or per-tag supplementary weight data.
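The documented units matter when the physics fields are combined: crossSection is in nb while campaign_luminosity is in fb⁻¹, so a factor of 10⁶ (1 nb = 10⁶ fb) is needed. The sketch below applies the conventional Monte Carlo normalisation, expected events = σ · k · ε · L. The struct meta_sketch and both functions are hypothetical, and whether the framework computes its weights this way is an assumption.

```cpp
// Hypothetical subset of meta_t, using the documented units.
struct meta_sketch {
    double crossSection = 0;         // nb
    double kfactor = 0;              // dimensionless
    double genFiltEff = 0;           // dimensionless
    double campaign_luminosity = 0;  // fb^-1
};

// Expected event yield: sigma * k-factor * filter efficiency * luminosity.
// The cross section is converted from nb to fb (1 nb = 1e6 fb).
double expected_events(const meta_sketch& m) {
    const double nb_to_fb = 1.0e6;
    return m.crossSection * nb_to_fb * m.kfactor * m.genFiltEff
         * m.campaign_luminosity;
}

// Per-event scale factor given the sample's sum of weights
// (supplied by the caller; must be non-zero).
double event_scale(const meta_sketch& m, double sum_of_weights) {
    return expected_events(m) / sum_of_weights;
}
```

For example, a 0.5 nb sample with k-factor 1, filter efficiency 0.5, and 2 fb⁻¹ of luminosity corresponds to 500000 expected events.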

weights_t — per-systematic sum-of-weights record

struct weights_t

Sample weighting information from AMI.

Holds normalisation weights and statistics for a single dataset.

Public Members

int dsid = -1: Dataset identifier (DSID).

bool isAFII = false: true if the dataset uses ATLAS Fast II simulation.

std::string generator = "": Name of the Monte Carlo generator.

std::string ami_tag = "": AMI processing tag string.

float total_events_weighted = -1: Sum of weights for all events in the dataset.

float total_events = -1: Raw total event count.

float processed_events = -1: Number of events processed.

float processed_events_weighted = -1: Sum of weights for processed events.

float processed_events_weighted_squared = -1: Sum of squared weights for processed events (for uncertainty estimation).

std::map<std::string, float> hist_data = {}: Histogram data keyed by variable name.
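The sum of squared weights is typically used in two standard ways: its square root gives the statistical uncertainty on the weighted yield, and (Σw)² / Σw² gives the effective number of unweighted entries. The sketch below shows both. weight_sums and the two functions are hypothetical names, and treating a negative value as "unset" follows the -1 defaults documented above.

```cpp
#include <cmath>

// Hypothetical subset of weights_t; -1 marks an unset value.
struct weight_sums {
    float processed_events_weighted = -1;
    float processed_events_weighted_squared = -1;
};

// Statistical uncertainty on the weighted yield: sqrt(sum of w^2).
// Returns -1 when the sum of squared weights is unset (negative).
double yield_uncertainty(const weight_sums& w) {
    if (w.processed_events_weighted_squared < 0) { return -1.0; }
    return std::sqrt(static_cast<double>(w.processed_events_weighted_squared));
}

// Effective number of unweighted entries: (sum w)^2 / (sum w^2).
// Returns -1 when the sum of squared weights is not positive.
double effective_entries(const weight_sums& w) {
    if (w.processed_events_weighted_squared <= 0) { return -1.0; }
    const double sum_w = w.processed_events_weighted;
    return sum_w * sum_w / w.processed_events_weighted_squared;
}
```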

Training Report Structs

model_report — per-epoch training summary

model_report is produced by the dataloader after each epoch and carries loss/accuracy maps keyed by mode_enum and feature name, as well as the current learning rates and iteration counters.

struct model_report

Stores and formats the training progress of a single epoch for one k-fold.

Tracks per-epoch, per-mode (training / validation / evaluation) loss and accuracy values at graph, node, and edge level, indexed by feature name. The current_lr vector records the learning rate of each parameter group at the end of the epoch.

is_complete is set to true when the epoch loop finishes; waiting_plot points to the metrics object that is waiting to dump plots for this epoch (or nullptr). progress tracks fractional completion within an epoch; iters and num_evnt count gradient updates and processed events respectively.

print() formats all maps as a multi-line human-readable string. prx() formats one individual map with a given title prefix.

Public Functions

std::string print()

Format all accumulated metrics as a multi-line human-readable string.

Returns:

Formatted report string.

std::string prx(std::map<mode_enum, std::map<std::string, float>> *data, std::string title)

Format one metric map (e.g. loss_graph) with the given title.

Parameters:

data – Pointer to the metric map to format.
title – Section title to prepend.

Returns:

Formatted section string.

Public Members

int k: K-fold index for this report.

int epoch: Epoch number for this report.

bool is_complete = false: true once the epoch has finished processing.

metrics *waiting_plot = nullptr: Pointer to the metrics object waiting to produce plots for this epoch, or nullptr if not applicable.

std::vector<double> current_lr = {}: Per-parameter-group learning rates at the end of this epoch.

std::map<mode_enum, std::map<std::string, float>> loss_graph = {}: Graph-level loss values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> loss_node = {}: Node-level loss values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> loss_edge = {}: Edge-level loss values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> accuracy_graph = {}: Graph-level accuracy values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> accuracy_node = {}: Node-level accuracy values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> accuracy_edge = {}: Edge-level accuracy values keyed by mode then feature name.

std::string run_name: Name of the training run (as passed to analysis::add_model).

std::string mode: Current mode string ("training", "validation", or "evaluation").

long iters = 0: Number of gradient-update iterations performed so far.

long num_evnt = 0: Number of events processed in the current epoch.

float progress: Fractional progress through the epoch in [0, 1].
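To make the nested layout of the loss and accuracy maps concrete, the sketch below walks one such map (mode first, then feature name) and renders it under a title, in the spirit of prx(). The enum mode_sketch, the helper mode_name, and the output layout are hypothetical; the real mode_enum values and the exact format produced by prx() are not specified here. Unlike prx(), this sketch takes the map by const reference instead of by pointer.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the framework's mode_enum.
enum class mode_sketch { training, validation, evaluation };

std::string mode_name(mode_sketch m) {
    switch (m) {
        case mode_sketch::training:   return "training";
        case mode_sketch::validation: return "validation";
        case mode_sketch::evaluation: return "evaluation";
    }
    return "unknown";
}

using metric_map = std::map<mode_sketch, std::map<std::string, float>>;

// Render one metric map as a titled block, one line per (mode, feature).
std::string format_metrics(const metric_map& data, const std::string& title) {
    std::ostringstream out;
    out << title << "\n";
    for (const auto& by_mode : data) {
        for (const auto& by_feature : by_mode.second) {
            out << "  " << mode_name(by_mode.first) << " "
                << by_feature.first << ": " << by_feature.second << "\n";
        }
    }
    return out.str();
}
```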

roc_t — ROC curve data

struct roc_t

Holds the data for one ROC curve (one class, one k-fold, one model).

Aggregates the ground-truth label arrays (truth) and classifier score arrays (scores) together with the computed true-positive rate (tpr_), false-positive rate (fpr_), and area-under-curve values (_auc) for a specific class index (cls) and k-fold (kfold). The truth and scores pointers are not owned by this struct.

Public Members

int cls = 0: Class index.

int kfold = 0: K-fold index.

std::string model = "": Name of the model that produced this ROC curve.

std::vector<double> _auc = {}: Area under the ROC curve for each threshold sweep.

std::vector<std::vector<double>> tpr_ = {}: True positive rate (recall) values; one vector per threshold sweep.

std::vector<std::vector<double>> fpr_ = {}: False positive rate values; one vector per threshold sweep.

std::vector<std::vector<int>> *truth = nullptr: Pointer to the ground-truth label arrays (not owned).

std::vector<std::vector<double>> *scores = nullptr: Pointer to the classifier score arrays (not owned).
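Given one fpr_/tpr_ pair, an area-under-curve value can be obtained with the trapezoidal rule. The sketch below is a generic illustration of that rule, not the framework's own AUC routine; trapezoid_auc is a hypothetical helper, and it assumes the points are ordered by increasing false-positive rate.

```cpp
#include <cstddef>
#include <vector>

// Trapezoidal-rule area under a ROC curve.
// Assumption: fpr is sorted in increasing order and pairs up with tpr
// element by element; extra elements in the longer vector are ignored.
double trapezoid_auc(const std::vector<double>& fpr,
                     const std::vector<double>& tpr) {
    const std::size_t n = fpr.size() < tpr.size() ? fpr.size() : tpr.size();
    double area = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        // Width of the step times the average height of its two ends.
        area += 0.5 * (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]);
    }
    return area;
}
```

A perfect classifier (fpr = {0, 0, 1}, tpr = {0, 1, 1}) gives an area of 1, and the diagonal (fpr = {0, 1}, tpr = {0, 1}) gives 0.5.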