Structs and Enumerations

Framework-wide plain-old-data structures and enumeration types. These types are used pervasively across the C++ core and are the primary data contracts between modules (e.g. between the IO layer and the GNN training pipeline).

Enumerations

enum class data_enum

Identifies the concrete C++ type of a stored data element.

Values:

enumerator d

Scalar double.

enumerator v_d

std::vector<double>.

enumerator vv_d

std::vector<std::vector<double>>.

enumerator vvv_d

std::vector<std::vector<std::vector<double>>>.

enumerator f

Scalar float.

enumerator v_f

std::vector<float>.

enumerator vv_f

std::vector<std::vector<float>>.

enumerator vvv_f

std::vector<std::vector<std::vector<float>>>.

enumerator l

Scalar long.

enumerator v_l

std::vector<long>.

enumerator vv_l

std::vector<std::vector<long>>.

enumerator vvv_l

std::vector<std::vector<std::vector<long>>>.

enumerator i

Scalar int.

enumerator v_i

std::vector<int>.

enumerator vv_i

std::vector<std::vector<int>>.

enumerator vvv_i

std::vector<std::vector<std::vector<int>>>.

enumerator ull

Scalar unsigned long long.

enumerator v_ull

std::vector<unsigned long long>.

enumerator vv_ull

std::vector<std::vector<unsigned long long>>.

enumerator vvv_ull

std::vector<std::vector<std::vector<unsigned long long>>>.

enumerator b

Scalar bool.

enumerator v_b

std::vector<bool>.

enumerator vv_b

std::vector<std::vector<bool>>.

enumerator vvv_b

std::vector<std::vector<std::vector<bool>>>.

enumerator ui

Scalar unsigned int.

enumerator v_ui

std::vector<unsigned int>.

enumerator vv_ui

std::vector<std::vector<unsigned int>>.

enumerator vvv_ui

std::vector<std::vector<std::vector<unsigned int>>>.

enumerator c

Scalar char.

enumerator v_c

std::vector<char>.

enumerator vv_c

std::vector<std::vector<char>>.

enumerator vvv_c

std::vector<std::vector<std::vector<char>>>.

enumerator undef

Type is unknown / undefined.

enumerator unset

Type has not yet been assigned.

enum class opt_enum

Identifies the PyTorch optimizer algorithm.

Values:

enumerator adam

Adam optimizer.

enumerator adagrad

Adagrad optimizer.

enumerator adamw

AdamW optimizer.

enumerator lbfgs

L-BFGS optimizer.

enumerator rmsprop

RMSprop optimizer.

enumerator sgd

Stochastic gradient descent optimizer.

enumerator invalid_optimizer

Sentinel value indicating no valid optimizer was selected.
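
The sentinel makes string-to-enum translation total: any unrecognised name can map to invalid_optimizer instead of failing. The sketch below illustrates that pattern with a local mirror of the enumerators. The helper name optimizer_from_string and the case-insensitive matching are assumptions made for illustration, not the framework's own translation routine.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Local mirror of the documented enumerators (stand-in, not the framework header).
enum class opt_enum { adam, adagrad, adamw, lbfgs, rmsprop, sgd, invalid_optimizer };

// Hypothetical helper: lower-case the name, then look it up.
// Unknown names fall through to the sentinel value.
inline opt_enum optimizer_from_string(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    static const std::unordered_map<std::string, opt_enum> table = {
        {"adam", opt_enum::adam},       {"adagrad", opt_enum::adagrad},
        {"adamw", opt_enum::adamw},     {"lbfgs", opt_enum::lbfgs},
        {"rmsprop", opt_enum::rmsprop}, {"sgd", opt_enum::sgd}};
    const auto it = table.find(name);
    return it == table.end() ? opt_enum::invalid_optimizer : it->second;
}
```

A caller would compare the result against opt_enum::invalid_optimizer before constructing an optimizer.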

enum class mlp_init

Weight-initialisation schemes for torch::nn::Sequential modules.

Values:

enumerator uniform

Uniform random initialisation.

enumerator normal

Normal (Gaussian) random initialisation.

enumerator xavier_normal

Xavier normal initialisation.

enumerator xavier_uniform

Xavier uniform initialisation.

enumerator kaiming_uniform

Kaiming (He) uniform initialisation.

enumerator kaiming_normal

Kaiming (He) normal initialisation.
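
For orientation, the Xavier and Kaiming schemes differ only in how the spread of the random draw is derived from the layer's fan-in and fan-out. The sketch below computes those spreads with the standard formulas used by PyTorch's init routines. The function names are hypothetical, and the gain and fan mode the framework actually passes are not stated here, so the defaults shown are assumptions.

```cpp
#include <cmath>

// Half-width a of U(-a, a) for Xavier (Glorot) uniform initialisation.
inline double xavier_uniform_bound(int fan_in, int fan_out, double gain = 1.0) {
    return gain * std::sqrt(6.0 / (fan_in + fan_out));
}

// Standard deviation of N(0, std^2) for Xavier normal initialisation.
inline double xavier_normal_std(int fan_in, int fan_out, double gain = 1.0) {
    return gain * std::sqrt(2.0 / (fan_in + fan_out));
}

// Half-width of the Kaiming (He) uniform range in fan-in mode.
// ASSUMPTION: gain defaults to sqrt(2), the usual value for ReLU.
inline double kaiming_uniform_bound(int fan_in, double gain = std::sqrt(2.0)) {
    return gain * std::sqrt(3.0 / fan_in);
}

// Standard deviation for Kaiming (He) normal initialisation in fan-in mode.
inline double kaiming_normal_std(int fan_in, double gain = std::sqrt(2.0)) {
    return gain / std::sqrt(static_cast<double>(fan_in));
}
```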

enum class loss_enum

Identifies the PyTorch loss function to use.

Values:

enumerator bce

Binary cross-entropy loss.

enumerator bce_with_logits

Binary cross-entropy with logits loss.

enumerator cosine_embedding

Cosine embedding loss.

enumerator cross_entropy

Cross-entropy loss.

enumerator ctc

CTC loss.

enumerator hinge_embedding

Hinge embedding loss.

enumerator huber

Huber loss.

enumerator kl_div

Kullback-Leibler divergence loss.

enumerator l1

L1 (mean absolute error) loss.

enumerator margin_ranking

Margin ranking loss.

enumerator mse

Mean squared error loss.

enumerator multi_label_margin

Multi-label margin loss.

enumerator multi_label_soft_margin

Multi-label soft margin loss.

enumerator multi_margin

Multi-margin loss.

enumerator nll

Negative log-likelihood loss.

enumerator poisson_nll

Poisson NLL loss.

enumerator smooth_l1

Smooth L1 loss.

enumerator soft_margin

Soft margin loss.

enumerator triplet_margin

Triplet margin loss.

enumerator triplet_margin_with_distance

Triplet margin with distance loss.

enumerator invalid_loss

Sentinel value indicating no valid loss was selected.

enum class scheduler_enum

Identifies the learning-rate scheduler.

Values:

enumerator steplr

Step-based learning-rate decay (StepLR).

enumerator reducelronplateauscheduler

Reduce LR on plateau scheduler.

enumerator lrscheduler

Generic LR scheduler base.

enumerator invalid_scheduler

Sentinel value indicating no valid scheduler was selected.
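
As a reminder of what the steplr option computes, StepLR multiplies the base learning rate by a factor gamma once every step_size epochs. The sketch below writes that rule as a plain function. The function and parameter names are illustrative and do not describe how the framework configures its scheduler.

```cpp
#include <cmath>

// StepLR rule: lr(epoch) = base_lr * gamma^floor(epoch / step_size).
// Precondition: step_size > 0 and epoch >= 0, so integer division is the floor.
inline double step_lr(double base_lr, double gamma, int step_size, int epoch) {
    return base_lr * std::pow(gamma, epoch / step_size);
}
```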

enum class graph_enum

Identifies which tensor slot of a graph_t object is being accessed.

Values:

enumerator data_graph

Input feature tensor at graph level.

enumerator data_node

Input feature tensor at node level.

enumerator data_edge

Input feature tensor at edge level.

enumerator truth_graph

Ground-truth label tensor at graph level.

enumerator truth_node

Ground-truth label tensor at node level.

enumerator truth_edge

Ground-truth label tensor at edge level.

enumerator edge_index

COO edge-index tensor ([2, num_edges]).

enumerator weight

Per-event weight tensor.

enumerator batch_index

Batch-assignment index tensor for batched graphs.

enumerator batch_events

Global event indices of all graphs in the batch.

enumerator pred_graph

Model prediction tensor at graph level.

enumerator pred_node

Model prediction tensor at node level.

enumerator pred_edge

Model prediction tensor at edge level.

enumerator pred_extra

Miscellaneous model prediction tensor.
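
The edge_index and batch_index slots follow the usual GNN layout: a [2, num_edges] COO pair of source and destination rows, and one graph id per node. The sketch below builds both with std::vector standing in for tensors. The fully connected topology and the helper names are assumptions for illustration; the framework's graph builder decides the real topology.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Row 0 holds source node ids, row 1 holds destination node ids.
using coo_index = std::array<std::vector<long>, 2>;

// Hypothetical helper: directed, fully connected graph without self-loops.
inline coo_index fully_connected(int num_nodes) {
    coo_index idx;
    for (long src = 0; src < num_nodes; ++src) {
        for (long dst = 0; dst < num_nodes; ++dst) {
            if (src == dst) continue;
            idx[0].push_back(src);
            idx[1].push_back(dst);
        }
    }
    return idx;
}

// Hypothetical helper: one entry per node, naming the graph it belongs to.
inline std::vector<long> batch_index(const std::vector<int>& nodes_per_graph) {
    std::vector<long> out;
    for (std::size_t g = 0; g < nodes_per_graph.size(); ++g) {
        out.insert(out.end(), static_cast<std::size_t>(nodes_per_graph[g]),
                   static_cast<long>(g));
    }
    return out;
}
```

A graph with n nodes yields n * (n - 1) edges under this topology, so both rows of the index always have the same length.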

enum class mode_enum

Identifies the phase of the run: training, validation, or evaluation.

Values:

enumerator training

Model is in training mode.

enumerator validation

Model is in validation mode.

enumerator evaluation

Model is in evaluation (test/inference) mode.

enum class particle_enum

Identifies which kinematic or metadata attribute of a particle to read or write.

Values:

enumerator index

Particle index in the event.

enumerator pdgid

PDG Monte Carlo particle ID.

enumerator pt

Transverse momentum.

enumerator eta

Pseudorapidity.

enumerator phi

Azimuthal angle.

enumerator energy

Energy.

enumerator px

x-component of momentum.

enumerator pz

z-component of momentum.

enumerator py

y-component of momentum.

enumerator mass

Invariant mass.

enumerator charge

Electric charge.

enumerator is_b

Flag: particle is a b-quark/hadron.

enumerator is_lep

Flag: particle is a lepton.

enumerator is_nu

Flag: particle is a neutrino.

enumerator is_add

Flag: additional particle classification.

enumerator pmc

Bulk Cartesian four-momentum write-out (px, py, pz, e).

enumerator pmu

Bulk polar four-momentum write-out (pt, eta, phi, e).
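
The is_lep and is_nu flags can be derived from the PDG ID alone. The sketch below tests a PDG ID against the default lepton and neutrino lists documented for particle_t ({11, 13, 15} and {12, 14, 16}). Comparing the absolute value, so that antiparticles match too, is an assumption, and the free functions are hypothetical helpers rather than framework API.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// True if |pdgid| appears in the list of definitions.
// ASSUMPTION: antiparticles (negative PDG IDs) count as matches.
inline bool in_pdg_list(int pdgid, const std::vector<int>& defs) {
    return std::find(defs.begin(), defs.end(), std::abs(pdgid)) != defs.end();
}

// Charged leptons: electron, muon, tau by default.
inline bool is_lep(int pdgid, const std::vector<int>& lepdef = {11, 13, 15}) {
    return in_pdg_list(pdgid, lepdef);
}

// Neutrinos: electron, muon, tau flavours by default.
inline bool is_nu(int pdgid, const std::vector<int>& nudef = {12, 14, 16}) {
    return in_pdg_list(pdgid, nudef);
}
```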

Core Data Structs

bsc_t — polymorphic leaf buffer

bsc_t (base struct) is the polymorphic root of the ROOT leaf-reading hierarchy. Each instance holds at most one heap-allocated buffer corresponding to the concrete data_enum type discovered at runtime. All overloads of element() read the buffer entry at index and write it into the caller-supplied pointer.

struct bsc_t

Type-erased data container that stores a single typed pointer.

bsc_t holds one pointer for each supported data type (scalars, vectors, 2-deep nested vectors, and 3-deep nested vectors of all the numeric types declared in data_enum). Exactly one pointer should be non-null at any time; the active type is recorded in the type member. The index member controls which element of a vector-typed buffer is returned by the element() accessors. flush_buffer() either zeros or deletes the active pointer depending on the clear flag. The class is the base of data_t (read buffer for ROOT branches) and variable_t (write buffer for output ROOT branches).

Subclassed by data_t, variable_t

Public Functions

bsc_t()

Default constructor. Initialises all pointers to nullptr.

virtual ~bsc_t()

Virtual destructor.

void flush_buffer()

Clear or delete the active data pointer according to bsc_t::clear.

std::string as_string()

Return the string name of the currently active type.

Returns:

Type name as a string.

std::string scan_buffer()

Scan and return a debug summary of the buffer state.

Returns:

String describing the active pointer.

data_enum root_type_translate(std::string*)

Translate a ROOT leaf-type string to the corresponding data_enum.

Parameters:

root_str – Pointer to the ROOT type string (e.g. "D" for double).

Returns:

Corresponding data_enum value.

bool element(std::vector<std::vector<std::vector<float>>> *el)

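Copy the entry at index from the active buffer into *el. The same contract applies to every element() overload listed below.

Parameters:

el – Caller-supplied pointer that receives the value.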

Returns:

true on success.

bool element(std::vector<std::vector<std::vector<double>>> *el)
bool element(std::vector<std::vector<std::vector<long>>> *el)
bool element(std::vector<std::vector<std::vector<int>>> *el)
bool element(std::vector<std::vector<std::vector<bool>>> *el)
bool element(std::vector<std::vector<float>> *el)
bool element(std::vector<std::vector<double>> *el)
bool element(std::vector<std::vector<long>> *el)
bool element(std::vector<std::vector<int>> *el)
bool element(std::vector<std::vector<bool>> *el)
bool element(std::vector<float> *el)
bool element(std::vector<double> *el)
bool element(std::vector<long> *el)
bool element(std::vector<int> *el)
bool element(std::vector<char> *el)
bool element(std::vector<bool> *el)
bool element(double *el)
bool element(float *el)
bool element(long *el)
bool element(int *el)
bool element(bool *el)
bool element(unsigned long long *el)
bool element(unsigned int *el)
bool element(char *el)
template<typename T>
inline bool flush_buffer(std::vector<T> **data)
template<typename T>
inline bool flush_buffer(T **data)

Public Members

std::vector<std::vector<std::vector<unsigned long long>>> *vvv_ull = nullptr
std::vector<std::vector<std::vector<unsigned int>>> *vvv_ui = nullptr
std::vector<std::vector<std::vector<double>>> *vvv_d = nullptr
std::vector<std::vector<std::vector<long>>> *vvv_l = nullptr
std::vector<std::vector<std::vector<float>>> *vvv_f = nullptr
std::vector<std::vector<std::vector<int>>> *vvv_i = nullptr
std::vector<std::vector<std::vector<bool>>> *vvv_b = nullptr
std::vector<std::vector<std::vector<char>>> *vvv_c = nullptr
std::vector<std::vector<unsigned long long>> *vv_ull = nullptr
std::vector<std::vector<unsigned int>> *vv_ui = nullptr

Pointer to depth-2 unsigned int data, or nullptr.

std::vector<std::vector<double>> *vv_d = nullptr
std::vector<std::vector<long>> *vv_l = nullptr
std::vector<std::vector<float>> *vv_f = nullptr
std::vector<std::vector<int>> *vv_i = nullptr
std::vector<std::vector<bool>> *vv_b = nullptr
std::vector<std::vector<char>> *vv_c = nullptr
std::vector<unsigned long long> *v_ull = nullptr

Pointer to a flat vector of unsigned long longs, or nullptr.

std::vector<unsigned int> *v_ui = nullptr

Pointer to a flat vector of unsigned ints, or nullptr.

std::vector<double> *v_d = nullptr

Pointer to a flat vector of doubles, or nullptr.

std::vector<long> *v_l = nullptr

Pointer to a flat vector of longs, or nullptr.

std::vector<float> *v_f = nullptr

Pointer to a flat vector of floats, or nullptr.

std::vector<int> *v_i = nullptr

Pointer to a flat vector of ints, or nullptr.

std::vector<bool> *v_b = nullptr

Pointer to a flat vector of bools, or nullptr.

std::vector<char> *v_c = nullptr

Pointer to a flat vector of chars, or nullptr.

unsigned long long *ull = nullptr

Pointer to a scalar unsigned long long, or nullptr.

unsigned int *ui = nullptr

Pointer to a scalar unsigned int, or nullptr.

double *d = nullptr

Pointer to a scalar double, or nullptr.

long *l = nullptr

Pointer to a scalar long, or nullptr.

float *f = nullptr

Pointer to a scalar float, or nullptr.

int *i = nullptr

Pointer to a scalar int, or nullptr.

bool *b = nullptr

Pointer to a scalar bool, or nullptr.

char *c = nullptr

Pointer to a scalar char, or nullptr.

long index = 0

Current index into a vector-typed buffer (used during iteration).

bool clear = false

If true, flush_buffer deletes the pointed-to object; otherwise it only resets the value to 0 / clears the vector.

data_enum type = data_enum::unset

Active data type of this container (default data_enum::unset).
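
The tagged-pointer layout described for bsc_t can be sketched with two buffer types instead of the full set. Everything below (mini_bsc, mini_enum, returning false on a type mismatch) is a simplified stand-in written for illustration. In particular, how the real element() overloads apply index to each buffer type is an assumption.

```cpp
#include <cstddef>
#include <vector>

enum class mini_enum { v_d, vv_d, unset };  // hypothetical subset of data_enum

struct mini_bsc {
    std::vector<double>* v_d = nullptr;                // one double per entry
    std::vector<std::vector<double>>* vv_d = nullptr;  // one vector per entry
    long index = 0;       // entry returned by element()
    bool clear = false;   // true: delete the buffer, false: only empty it
    mini_enum type = mini_enum::unset;

    mini_bsc() = default;
    mini_bsc(const mini_bsc&) = delete;             // the buffers are owned
    mini_bsc& operator=(const mini_bsc&) = delete;
    ~mini_bsc() { delete v_d; delete vv_d; }

    // Copy entry `index` of the active buffer into *el.
    bool element(double* el) const {
        if (type != mini_enum::v_d || !v_d) return false;
        return copy_at(*v_d, el);
    }
    bool element(std::vector<double>* el) const {
        if (type != mini_enum::vv_d || !vv_d) return false;
        return copy_at(*vv_d, el);
    }

    // Mirrors the documented template: empty or delete one buffer.
    template <typename T>
    bool flush_buffer(std::vector<T>** data) {
        if (!*data) return false;
        if (clear) { delete *data; *data = nullptr; }
        else { (*data)->clear(); }
        return true;
    }
    void flush_buffer() { flush_buffer(&v_d); flush_buffer(&vv_d); }

private:
    template <typename T>
    bool copy_at(const std::vector<T>& buf, T* el) const {
        if (index < 0 || static_cast<std::size_t>(index) >= buf.size()) return false;
        *el = buf[static_cast<std::size_t>(index)];
        return true;
    }
};
```

Because the active type is a runtime tag, asking for the wrong type is only detected when element() is called, which is why each overload checks type first.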

data_t — single ROOT leaf accessor

data_t extends bsc_t with ROOT bookkeeping (TLeaf/TBranch/TTree pointers, file path, leaf type string) and sequential iteration helpers.

struct data_t : public bsc_t

Represents a single ROOT branch/leaf with its associated metadata and typed data buffer.

Inherits bsc_t to hold the branch data in a typed buffer. A data_t instance keeps pointers to the TFile, TTree, TBranch, and TLeaf it reads from and iterates over events via next().

Public Functions

data_t()

Default constructor.

~data_t() override

Destructor. Releases the typed data buffer.

void initialize()

Open the ROOT file and prepare the branch for reading.

void flush()

Clear the internal data buffer.

bool next()

Advance to the next event entry.

Returns:

true if a next entry was found; false at end of file.

Public Members

std::string leaf_name = ""

Name of the ROOT TLeaf.

std::string branch_name = ""

Name of the ROOT TBranch.

std::string tree_name = ""

Name of the ROOT TTree.

std::string leaf_type = ""

ROOT leaf type string (e.g. "D" for double, "F" for float).

std::string path = ""

File path from which this branch is read.

std::string *fname = nullptr

Pointer to the current filename (not owned).

TLeaf *leaf = nullptr

Pointer to the associated ROOT TLeaf object.

TBranch *branch = nullptr

Pointer to the associated ROOT TBranch object.

TTree *tree = nullptr

Pointer to the ROOT TTree that contains this branch.

TFile *file = nullptr

Pointer to the ROOT TFile.

int file_index = 0

Index into the files_s / files_i / files_t arrays for multi-file iteration.

std::vector<std::string> *files_s = nullptr

Pointer to the list of file paths for multi-file reading.

std::vector<long> *files_i = nullptr

Pointer to per-file event counts.

std::vector<TFile*> *files_t = nullptr

Pointer to the list of open TFile objects.
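
The interplay of file_index, files_i, and next() can be pictured as a cursor that walks the per-file entry counts and rolls over to the next file when one is exhausted. The sketch below shows that bookkeeping without any ROOT dependency. The struct file_cursor and its starting state (index = -1 before the first call) are assumptions for illustration, not the behaviour of data_t itself.

```cpp
#include <cstddef>
#include <vector>

struct file_cursor {
    const std::vector<long>* files_i = nullptr;  // per-file entry counts (not owned)
    int file_index = 0;                          // file currently being read
    long index = -1;                             // entry within that file

    // Advance to the next entry, skipping empty files.
    // Returns false once every file has been consumed.
    bool next() {
        if (!files_i) return false;
        ++index;
        while (static_cast<std::size_t>(file_index) < files_i->size() &&
               index >= (*files_i)[static_cast<std::size_t>(file_index)]) {
            ++file_index;  // current file exhausted: move to the next one
            index = 0;
        }
        return static_cast<std::size_t>(file_index) < files_i->size();
    }
};
```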

element_t — per-event leaf handle map

element_t is handed to particle_template::build() and event_template::build(). The get<g>(key, var) template method looks up the named data_t and copies the current element into *var via the appropriate bsc_t::element() overload.

struct element_t

Aggregates all data_t branches for one event, providing a named-key access interface.

Public Functions

bool next()

Advance all branches to the next event.

Returns:

true if successful.

void set_meta()

Update metadata (filename, event index) from the active branches.

bool boundary()

Check whether the current entry is on a file boundary.

Returns:

true if the current entry is the first of a new file.

template<typename g>
inline bool get(std::string key, g *var)

Retrieve the value stored under key and place it in var.

Template Parameters:

g – Target type; must match the leaf type or an abort is triggered.

Parameters:
  • key – Leaf/branch name used as the lookup key.

  • var – Pointer to the variable to fill.

Returns:

true on success; aborts on type mismatch.

Public Members

std::string tree = ""

Name of the ROOT TTree this element belongs to.

long event_index = -1

Sequential event index (default -1 = not yet set).

std::string filename = ""

Path of the file currently being read.

std::map<std::string, data_t*> handle = {}

Map of branch/leaf names to their data_t objects.
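
The named lookup behind get() amounts to finding the handle for a key and forwarding the typed pointer to the matching element() overload. The sketch below reproduces that flow with a two-type stand-in for data_t. The names mini_handle and mini_element are hypothetical, and, unlike the documented method, this version returns false on a type mismatch instead of aborting.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for data_t: holds either doubles or ints, one value per entry.
struct mini_handle {
    std::vector<double> d_entries;
    std::vector<int> i_entries;
    bool is_double = true;  // which of the two buffers is active
    long index = 0;         // current entry

    bool element(double* out) const {
        if (!is_double) return false;
        return read(d_entries, out);
    }
    bool element(int* out) const {
        if (is_double) return false;
        return read(i_entries, out);
    }

private:
    template <typename T>
    bool read(const std::vector<T>& buf, T* out) const {
        if (index < 0 || static_cast<std::size_t>(index) >= buf.size()) return false;
        *out = buf[static_cast<std::size_t>(index)];
        return true;
    }
};

// Stand-in for element_t: name-keyed access to the handles (not owned).
struct mini_element {
    std::map<std::string, mini_handle*> handle;

    template <typename g>
    bool get(const std::string& key, g* var) const {
        const auto it = handle.find(key);
        if (it == handle.end()) return false;  // unknown leaf name
        return it->second->element(var);       // overload chosen by g
    }
};
```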

write_t / writer — ROOT output helpers

struct write_t

Groups a ROOT TFile, TTree, and meta_t pointer together with a map of named variable_t write buffers.

Public Functions

variable_t *process(std::string *name)

Look up or create the variable_t for name.

Parameters:

name – Variable name.

Returns:

Pointer to the corresponding variable_t.

void write()

Fill the current TTree entry from all registered variable_t buffers.

void create(std::string tr_name, std::string path)

Open the output ROOT file at path and create a TTree named tr_name.

Parameters:
  • tr_name – TTree name.

  • path – Output file path.

void close()

Write and close the output ROOT file.

Public Members

TFile *file = nullptr

Output ROOT TFile.

TTree *tree = nullptr

Output ROOT TTree.

meta_t *mtx = nullptr

Pointer to the metadata for this file.

std::map<std::string, variable_t*> *data = nullptr

Map of variable names to their variable_t write buffers.

struct writer

High-level helper that manages multiple write_t objects keyed by TTree name.

Public Functions

writer()

Default constructor.

~writer()

Destructor. Closes all open files.

void create(std::string *pth)

Open the output ROOT file at the given path.

Parameters:

pth – Pointer to the file path string.

void write(std::string *tree)

Fill one TTree entry for the tree named tree.

Parameters:

tree – Pointer to the TTree name.

template<typename g>
inline void process(std::string *tree, std::string *name, g *t)

Write value t to branch name in TTree tree.

Template Parameters:

g – Value type.

Parameters:
  • tree – Pointer to the TTree name.

  • name – Pointer to the branch/variable name.

  • t – Pointer to the value to write.
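
The process/write pair follows a stage-then-fill pattern: process() records the current value of a branch, creating the tree and branch on first use, and write() commits one entry for that tree. The sketch below mimics this with in-memory maps in place of TFile and TTree. mini_writer and mini_tree are hypothetical stand-ins, and storing every value as double is a simplification.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for write_t: staged values plus the entries already filled.
struct mini_tree {
    std::map<std::string, double> staged;                // current value per branch
    std::map<std::string, std::vector<double>> written;  // one element per filled entry
};

// Stand-in for writer: trees keyed by name, created on demand.
struct mini_writer {
    std::map<std::string, mini_tree> trees;

    // Stage *value for branch *name in tree *tree (operator[] creates both if needed).
    template <typename g>
    void process(const std::string* tree, const std::string* name, const g* value) {
        trees[*tree].staged[*name] = static_cast<double>(*value);
    }

    // Commit one entry: append every staged value to its branch.
    void write(const std::string* tree) {
        const auto it = trees.find(*tree);
        if (it == trees.end()) return;
        for (const auto& kv : it->second.staged) {
            it->second.written[kv.first].push_back(kv.second);
        }
    }
};
```

Values stay staged between calls in this sketch, so a branch that is not updated before the next write() repeats its previous value.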

Event, Particle, and Graph Payload Structs

particle_t — raw kinematic payload

particle_t carries the floating-point kinematics and integer metadata for a single particle. particle_template stores one of these as its internal representation.

struct particle_t

Raw kinematic, identification, and classification data for a single particle.

Public Members

double e = -0.000000000000001

Energy in MeV (default -1e-15 to signal “unset”).

double mass = -1

Invariant mass in MeV (default -1).

double px = 0

Cartesian x-component of three-momentum.

double py = 0

Cartesian y-component of three-momentum.

double pz = 0

Cartesian z-component of three-momentum.

double pt = 0

Transverse momentum.

double eta = 0

Pseudorapidity.

double phi = 0

Azimuthal angle in radians.

bool cartesian = false

true if Cartesian (px, py, pz, e) coordinates are valid.

bool polar = false

true if polar (pt, eta, phi, e) coordinates are valid.

double charge = 0

Electric charge.

int pdgid = 0

PDG Monte Carlo particle identifier.

int index = -1

Index of this particle within its parent event container.

std::string type = ""

String label identifying the particle type (e.g. "top", "b").

std::string hash = ""

Unique hash string for this particle instance.

std::string symbol = ""

LaTeX-style or text symbol for the particle (e.g. "t", "b").

std::vector<int> lepdef = {11, 13, 15}

PDG IDs considered as leptons for is_lep classification (default {11, 13, 15}).

std::vector<int> nudef = {12, 14, 16}

PDG IDs considered as neutrinos for is_nu classification (default {12, 14, 16}).

std::map<std::string, bool> children = {}

Map of child particle hashes to a presence flag.

std::map<std::string, bool> parents = {}

Map of parent particle hashes to a presence flag.

std::map<std::string, particle_template*> *data_p = nullptr

Pointer to the event-level particle map; used during event building.
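
The cartesian and polar flags imply that one representation may need to be derived from the other. The sketch below applies the standard collider-physics relations (px = pt cos phi, py = pt sin phi, pz = pt sinh eta) on a reduced copy of the struct. mini_particle and the free functions are hypothetical, and returning 0 for a negative mass squared is a convention chosen here, not necessarily the framework's.

```cpp
#include <cmath>

// Reduced stand-in for particle_t: kinematics and validity flags only.
struct mini_particle {
    double e = 0, px = 0, py = 0, pz = 0;
    double pt = 0, eta = 0, phi = 0;
    bool cartesian = false;  // (px, py, pz, e) valid
    bool polar = false;      // (pt, eta, phi, e) valid
};

// Fill the Cartesian components from the polar ones.
inline void to_cartesian(mini_particle& p) {
    if (p.cartesian || !p.polar) return;
    p.px = p.pt * std::cos(p.phi);
    p.py = p.pt * std::sin(p.phi);
    p.pz = p.pt * std::sinh(p.eta);
    p.cartesian = true;
}

// Fill the polar components from the Cartesian ones.
inline void to_polar(mini_particle& p) {
    if (p.polar || !p.cartesian) return;
    p.pt = std::hypot(p.px, p.py);
    p.phi = std::atan2(p.py, p.px);
    p.eta = p.pt > 0 ? std::asinh(p.pz / p.pt) : 0.0;
    p.polar = true;
}

// m^2 = e^2 - |p|^2, clamped at zero (ASSUMPTION: no negative-mass convention).
inline double invariant_mass(const mini_particle& p) {
    const double m2 = p.e * p.e - p.px * p.px - p.py * p.py - p.pz * p.pz;
    return m2 > 0 ? std::sqrt(m2) : 0.0;
}
```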

event_t — event identity

event_t is embedded in every event_template and carries the event index, weight, ROOT tree name and the unique hash used for graph caching.

struct event_t

Minimal event identification and state container.

Public Members

std::string name = ""

Human-readable name of the event type.

double weight = 1

Event weight (default 1.0).

long index = -1

Sequential event index within the file (default -1 = unset).

std::string hash = ""

Unique hash string identifying this event.

std::string tree = ""

Name of the ROOT TTree from which this event was read.

graph_t — GNN tensor container

graph_t is the central data structure passed between the graph builder, the dataloader, and the model_template::forward() method. It stores batched PyTorch tensors for node/edge/graph data features, truth features, the COO edge index, and batching metadata.

struct graph_t

Runtime container for a single graph’s input features, truth labels, edge index, and device-resident tensors.

Stores input features (data_graph, data_node, data_edge), truth labels (truth_graph, truth_node, truth_edge), a COO edge-index tensor, an event-weight tensor, and a batch-index tensor. All tensors are cached per CUDA device index (in the dev_* maps) so that multi-GPU training can transfer data without repeated host-to-device copies.

The has_feature(graph_enum, name, dev) method is the unified lookup entry-point for all feature categories. The friend types graph_template and dataloader have write access to the private tensor storage. The in_use counter is managed by dataloader to implement a simple object pool.

Public Functions

template<typename g>
inline torch::Tensor *get_truth_graph(std::string _name, g *mdl)

Retrieve the truth graph-level tensor named _name on the device of model mdl.

Template Parameters:

g – A model type with a device_index member.

Parameters:
  • _name – Feature name.

  • mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr if not found.

template<typename g>
inline torch::Tensor *get_truth_node(std::string _name, g *mdl)

Retrieve the truth node-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:
  • _name – Feature name.

  • mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g>
inline torch::Tensor *get_truth_edge(std::string _name, g *mdl)

Retrieve the truth edge-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:
  • _name – Feature name.

  • mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g>
inline torch::Tensor *get_data_graph(std::string _name, g *mdl)

Retrieve the input data graph-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:
  • _name – Feature name.

  • mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g>
inline torch::Tensor *get_data_node(std::string _name, g *mdl)

Retrieve the input data node-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:
  • _name – Feature name.

  • mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g>
inline torch::Tensor *get_data_edge(std::string _name, g *mdl)

Retrieve the input data edge-level tensor named _name.

Template Parameters:

g – A model type with a device_index member.

Parameters:
  • _name – Feature name.

  • mdl – Pointer to the model.

Returns:

Pointer to the tensor, or nullptr.

template<typename g>
inline torch::Tensor *get_edge_index(g *mdl)

Retrieve the COO edge-index tensor for model mdl's device.

Template Parameters:

g – A model type with a device_index member.

Parameters:

mdl – Pointer to the model.

Returns:

Pointer to the edge-index tensor, or nullptr.

template<typename g>
inline torch::Tensor *get_event_weight(g *mdl)

Retrieve the event-weight tensor.

Template Parameters:

g – A model type with a device_index member.

Parameters:

mdl – Pointer to the model.

Returns:

Pointer to the weight tensor.

template<typename g>
inline torch::Tensor *get_batch_index(g *mdl)

Retrieve the batch-assignment index tensor.

Template Parameters:

g – A model type with a device_index member.

Parameters:

mdl – Pointer to the model.

Returns:

Pointer to the batch-index tensor.

template<typename g>
inline torch::Tensor *get_batched_events(g *mdl)

Retrieve the global event-index tensor for the batch.

Template Parameters:

g – A model type with a device_index member.

Parameters:

mdl – Pointer to the model.

Returns:

Pointer to the tensor.

torch::Tensor *has_feature(graph_enum tp, std::string _name, int dev)

Generic feature lookup by type, name, and device index.

Parameters:
  • tp – Feature category (graph_enum).

  • _name – Feature name.

  • dev – Device index.

Returns:

Pointer to the tensor, or nullptr.

void add_truth_graph(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps)

Register truth graph-level tensors from data with index map maps.

void add_truth_node(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps)

Register truth node-level tensors.

void add_truth_edge(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps)

Register truth edge-level tensors.

void add_data_graph(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps)

Register input data graph-level tensors.

void add_data_node(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps)

Register input data node-level tensors.

void add_data_edge(std::map<std::string, torch::Tensor*> *data, std::map<std::string, int> *maps)

Register input data edge-level tensors.

void transfer_to_device(torch::TensorOptions *dev)

Copy all tensors to the device described by dev.

Parameters:

dev – Device options.

void _purge_all()

Delete all owned tensor data.

Public Members

int num_nodes = 0

Number of nodes in this graph.

long event_index = 0

Global event index.

double event_weight = 1

Event weight (default 1.0).

bool preselection = false

Outcome of the graph’s pre-selection.

std::vector<long> batched_events = {}

Global event indices of all graphs combined in the batch.

std::vector<std::string*> batched_filenames = {}

Source filenames of all graphs combined in the batch.

std::string *hash = nullptr

Pointer to the event hash string (not owned).

std::string *filename = nullptr

Pointer to the source file path (not owned).

std::string *graph_name = nullptr

Pointer to the graph type name (not owned).

c10::DeviceType device = c10::kCPU

Device on which data tensors reside (default CPU).

int in_use = 1

Reference count / in-use flag for the dataloader pool.
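
The per-device cache and the unified has_feature() lookup can be sketched as nested maps keyed by device index, feature category, and feature name. In the sketch below a std::vector<float> stands in for torch::Tensor, and the name-to-index maps of the real struct are collapsed into direct name keys. mini_graph and its add() method are hypothetical simplifications.

```cpp
#include <map>
#include <string>
#include <vector>

using tensor_stub = std::vector<float>;  // stand-in for torch::Tensor

// Subset of the documented graph_enum values.
enum class graph_enum { data_graph, data_node, data_edge,
                        truth_graph, truth_node, truth_edge };

struct mini_graph {
    // device index -> feature category -> feature name -> tensor
    std::map<int, std::map<graph_enum, std::map<std::string, tensor_stub>>> dev;

    // Register (or overwrite) a feature on one device.
    void add(graph_enum tp, const std::string& name, const tensor_stub& t, int device) {
        dev[device][tp][name] = t;
    }

    // Unified lookup: nullptr if the device, category, or name is unknown.
    tensor_stub* has_feature(graph_enum tp, const std::string& name, int device) {
        const auto d = dev.find(device);
        if (d == dev.end()) return nullptr;
        const auto c = d->second.find(tp);
        if (c == d->second.end()) return nullptr;
        const auto f = c->second.find(name);
        if (f == c->second.end()) return nullptr;
        return &f->second;
    }

    // Typed convenience wrapper in the spirit of get_data_node().
    tensor_stub* get_data_node(const std::string& name, int device) {
        return has_feature(graph_enum::data_node, name, device);
    }
};
```

Since a missing feature yields nullptr rather than an error, callers should check the returned pointer before dereferencing it.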

folds_t — k-fold assignment

struct folds_t

Associates an event hash with its k-fold split membership.

Public Functions

inline void flush_data()

Free the heap-allocated hash string and set it to nullptr.

Public Members

int k = -1

K-fold index to which this event belongs (default -1 = unset).

bool is_train = false

true if this event is assigned to the training set.

bool is_valid = false

true if this event is assigned to the validation set.

bool is_eval = false

true if this event is assigned to the evaluation (test) set.

char *hash = nullptr

Null-terminated C string holding the event hash.

graph_hdf5 / graph_hdf5_w — HDF5 serialisation records

struct graph_hdf5

std::string-based representation of a serialised graph for HDF5 I/O.

All tensor and map data are encoded as base-64 strings.

Public Members

int num_nodes = -1

Number of nodes in the graph.

double event_weight = 1

Event weight.

long event_index = -1

Global event index.

std::string hash

Event hash.

std::string filename

Source file path.

std::string edge_index

Serialised edge-index tensor.

std::string data_map_graph

Serialised data-feature name-to-index map (graph level).

std::string data_map_node

Serialised data-feature name-to-index map (node level).

std::string data_map_edge

Serialised data-feature name-to-index map (edge level).

std::string truth_map_graph

Serialised truth-feature name-to-index map (graph level).

std::string truth_map_node

Serialised truth-feature name-to-index map (node level).

std::string truth_map_edge

Serialised truth-feature name-to-index map (edge level).

std::string data_graph

Serialised data-feature tensors (graph level).

std::string data_node

Serialised data-feature tensors (node level).

std::string data_edge

Serialised data-feature tensors (edge level).

std::string truth_graph

Serialised truth-label tensors (graph level).

std::string truth_node

Serialised truth-label tensors (node level).

std::string truth_edge

Serialised truth-label tensors (edge level).
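
To make the base-64 encoding concrete, the sketch below implements the standard alphabet with '=' padding and applies it to the raw bytes of a float buffer. How the framework actually lays out tensor bytes before encoding is not described here, so encode_floats is only an assumed, illustrative serialisation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Standard base-64 encoding (RFC 4648 alphabet, '=' padding).
inline std::string base64_encode(const std::string& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);
    std::size_t i = 0;
    for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups -> 4 characters
        const unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                           (static_cast<unsigned char>(in[i + 1]) << 8) |
                           static_cast<unsigned char>(in[i + 2]);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += tbl[n & 63];
    }
    const std::size_t rem = in.size() - i;  // 0, 1, or 2 trailing bytes
    if (rem == 1) {
        const unsigned n = static_cast<unsigned char>(in[i]) << 16;
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        const unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                           (static_cast<unsigned char>(in[i + 1]) << 8);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// ASSUMPTION: a tensor is serialised as its raw float bytes in host byte order.
inline std::string encode_floats(const std::vector<float>& values) {
    if (values.empty()) return std::string();
    const std::string raw(reinterpret_cast<const char*>(values.data()),
                          values.size() * sizeof(float));
    return base64_encode(raw);
}
```

The encoded length is always four characters per started group of three input bytes, which is convenient when sizing fixed-width HDF5 string fields.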

struct graph_hdf5_w

C-string (char*) variant of graph_hdf5 used for direct HDF5 write operations.

All string fields are heap-allocated null-terminated C strings. Call flush_data() to release them.

Public Functions

void flush_data()

Free all heap-allocated char* members and set them to nullptr.

Public Members

int num_nodes = -1

Number of nodes in the graph.

double event_weight = 1

Event weight.

long event_index = -1

Global event index.

char *hash = nullptr

Null-terminated event hash string.

char *filename = nullptr

Null-terminated source file path.

char *edge_index = nullptr

Null-terminated serialised edge-index tensor.

char *data_map_graph = nullptr
char *data_map_node = nullptr
char *data_map_edge = nullptr
char *truth_map_graph = nullptr
char *truth_map_node = nullptr
char *truth_map_edge = nullptr
char *data_graph = nullptr
char *data_node = nullptr
char *data_edge = nullptr
char *truth_graph = nullptr
char *truth_node = nullptr
char *truth_edge = nullptr
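
The relationship between the two records is a field-by-field copy from std::string into owned C strings, followed by flush_data() once the write is done. The sketch below shows this for two fields only. The reduced structs, the helper names, and the use of new[]/delete[] are assumptions; the framework may allocate these buffers differently.

```cpp
#include <cstring>
#include <string>

// Two-field stand-ins for graph_hdf5 and graph_hdf5_w.
struct mini_hdf5 {
    std::string hash;
    std::string filename;
};

struct mini_hdf5_w {
    char* hash = nullptr;
    char* filename = nullptr;

    // Release every owned string; safe to call repeatedly.
    void flush_data() {
        delete[] hash;
        hash = nullptr;
        delete[] filename;
        filename = nullptr;
    }
};

// Heap-allocate a null-terminated copy of s (ASSUMPTION: new[] ownership).
inline char* to_c_string(const std::string& s) {
    char* out = new char[s.size() + 1];
    std::memcpy(out, s.c_str(), s.size() + 1);  // includes the terminator
    return out;
}

// Copy each field, releasing whatever the target held before.
inline void to_write_record(const mini_hdf5& in, mini_hdf5_w* out) {
    out->flush_data();
    out->hash = to_c_string(in.hash);
    out->filename = to_c_string(in.filename);
}
```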

Settings and Configuration Structs

settings_t — global analysis settings

settings_t is the POD configuration object stored inside analysis (and accessible from Python as properties). All Analysis.* properties map onto fields of this struct.

struct settings_t

Global analysis run settings: an aggregated configuration covering I/O paths, ML hyperparameters, plotting, and execution options.

Passed directly to the analysis class via analysis::m_settings and propagated to dataloader, optimizer, metrics, and io.

The I/O fields (output_path, run_name, sow_name, metacache_path) control where results are written and how the sum-of-weights histogram is named.

The ML fields (epochs, kfolds, batch_size, kfold, num_examples, train_size, training, validation, evaluation, continue_training, training_dataset, graph_cache) configure the k-fold cross-validation loop, mini-batch size, which phases to run, whether to resume from a checkpoint, and paths to pre-built graph caches.

The plotting fields (var_pt, var_eta, var_phi, var_energy, targets, nbins, max_range, logy) configure auto-generated invariant-mass histograms.

The execution fields (threads, intra_th, debug_mode, build_cache, selection_root) control parallelism, intra-op thread count, verbose output, and output format.

Public Members

std::string output_path = "./ProjectName"

Root output directory for plots, models, and selection files (default "./ProjectName").

std::string run_name = ""

Optional tag appended to output directories for this run.

std::string sow_name = ""

Name of the sum-of-weights histogram in the ROOT file.

std::string metacache_path = "./"

Path to the directory used for caching AMI metadata (default "./").

bool fetch_meta = false

If true, query AMI for dataset metadata; otherwise use cache.

bool pretagevents = false

If true, pre-tag events before building graphs.

int epochs = 10

Number of training epochs (default 10).

int kfolds = 10

Number of k-fold splits (default 10).

int batch_size = 1

Minibatch size for training (default 1).

std::vector<int> kfold = {}

Explicit list of k-fold indices to run; empty means run all.

int num_examples = 3

Number of example graphs displayed during progress reporting (default 3).

float train_size = 50

Percentage of data used for training (default 50).

bool training = true

Enable the training phase (default true).

bool validation = true

Enable the validation phase (default true).

bool evaluation = true

Enable the evaluation phase (default true).

bool continue_training = true

If true, resume training from the last saved checkpoint (default true).

std::string training_dataset = ""

Path to a pre-built graph dataset for training.

std::string graph_cache = ""

Path to a graph cache directory.

std::string var_pt = "pt"

Leaf name of the transverse-momentum variable (default "pt").

std::string var_eta = "eta"

Leaf name of the pseudorapidity variable (default "eta").

std::string var_phi = "phi"

Leaf name of the azimuthal angle variable (default "phi").

std::string var_energy = "energy"

Leaf name of the energy variable (default "energy").

std::vector<std::string> targets = {}

List of variable names to use as training targets.

int nbins = 400

Number of histogram bins (default 400).

int max_range = 400

Maximum axis range for mass histograms (default 400 GeV).

bool logy = false

Use logarithmic y-axis on histograms (default false).

int threads = 10

Number of parallel worker threads (default 10).

int intra_th = -1

Number of intra-op threads for PyTorch (default -1 = use system default).

bool debug_mode = false

Enable verbose debug output (default false).

bool build_cache = false

If true, build and save a local graph cache (default false).

bool selection_root = false

If true, write selection output to ROOT files (default false).

model_settings_t — per-model ML configuration

model_settings_t is populated by model_template and carries optimizer choice, I/O feature maps, weight/tree names, and device info.

struct model_settings_t

Snapshot of a model_template configuration suitable for serialisation and transfer between objects.

Public Members

opt_enum e_optim

Optimizer type enum.

std::string s_optim

Optimizer type as a string (e.g. "adam").

std::string weight_name

Name of the event-weight leaf in the ROOT tree.

std::string tree_name

Name of the ROOT TTree to read from.

std::string model_name

Human-readable model name.

std::string model_device

PyTorch device string (e.g. "cpu" or "cuda:0").

std::string model_checkpoint_path

Path where model checkpoints are saved.

bool inference_mode

true when the model is used for inference only.

bool is_mc

true if the dataset is Monte Carlo simulation.

std::map<std::string, std::string> o_graph

Map from graph-level output feature names to loss-function names.

std::map<std::string, std::string> o_node

Map from node-level output feature names to loss-function names.

std::map<std::string, std::string> o_edge

Map from edge-level output feature names to loss-function names.

std::vector<std::string> i_graph

Names of requested input graph-level features.

std::vector<std::string> i_node

Names of requested input node-level features.

std::vector<std::string> i_edge

Names of requested input edge-level features.

loss_opt — loss function options

struct loss_opt

Options controlling the behaviour of a loss function.

Public Members

loss_enum fx = loss_enum::invalid_loss

Loss function type (default invalid_loss).

bool mean = false

Use torch::kMean reduction.

bool sum = false

Use torch::kSum reduction.

bool none = false

Use torch::kNone reduction.

bool swap = false

Enable the swap option (TripletMargin).

bool full = false

Enable the full option (KLDiv).

bool batch_mean = false

Use torch::kBatchMean reduction (KLDiv).

bool target = false

Enable log_target (KLDiv).

bool zero_inf = false

Enable zero_infinity (CTC).

bool defaults = true

If true, use the loss function’s default options.

int ignore = 1000

Index to ignore in cross-entropy/NLL (default 1000 = disabled).

int blank = 0

Blank token index for CTC (default 0).

double margin = 0

Margin value for margin-based losses.

double beta = 0

Beta threshold for SmoothL1 loss (Huber loss uses delta instead).

double eps = 0

Epsilon for numerical stability.

double smoothing = 0

Label-smoothing factor for cross-entropy.

double delta = 0

Delta parameter for Huber loss.

std::vector<double> weight = {}

Per-class weight vector for cross-entropy and NLL.

optimizer_params_t — optimizer hyper-parameters

optimizer_params_t is the C++ counterpart of OptimizerConfig (Cython layer). Each cproperty field sets a m_* sentinel flag so the optimizer builder knows which hyper-parameters have been explicitly specified.

class optimizer_params_t

Typed hyper-parameter set for a PyTorch optimizer.

Wraps each hyper-parameter as a cproperty so that setting one automatically records that it was explicitly provided (the corresponding m_* flag is set to true).

Public Functions

optimizer_params_t()

Constructor. Registers setter callbacks for all hyper-parameters.

Public Members

std::string optimizer = ""

Optimizer name string (e.g. "adam").

cproperty<double, optimizer_params_t> lr

Learning rate.

cproperty<double, optimizer_params_t> lr_decay

Learning-rate decay (Adagrad).

cproperty<double, optimizer_params_t> weight_decay

L2 weight-decay regularisation.

cproperty<double, optimizer_params_t> initial_accumulator_value

Initial accumulator value (Adagrad).

cproperty<double, optimizer_params_t> eps

Epsilon for numerical stability.

cproperty<double, optimizer_params_t> tolerance_grad

Gradient tolerance (L-BFGS).

cproperty<double, optimizer_params_t> tolerance_change

Tolerance on the change in function value or parameters (L-BFGS).

cproperty<double, optimizer_params_t> alpha

Alpha / smoothing constant (RMSprop).

cproperty<double, optimizer_params_t> momentum

Momentum factor (SGD / RMSprop).

cproperty<double, optimizer_params_t> dampening

Dampening for momentum (SGD).

cproperty<bool, optimizer_params_t> amsgrad

Enable AMSGrad variant of Adam/AdamW.

cproperty<bool, optimizer_params_t> centered

Centered RMSprop.

cproperty<bool, optimizer_params_t> nesterov

Nesterov momentum (SGD).

cproperty<int, optimizer_params_t> max_iter

Maximum number of iterations per optimisation step (L-BFGS).

cproperty<int, optimizer_params_t> max_eval

Maximum number of function evaluations per step (L-BFGS).

cproperty<int, optimizer_params_t> history_size

History size for L-BFGS.

cproperty<std::tuple<float, float>, optimizer_params_t> betas

Adam β₁/β₂ coefficients.

cproperty<std::vector<float>, optimizer_params_t> beta_hack

Alternative way to set β₁/β₂ as a vector (for Cython compatibility).

std::string scheduler = ""

Scheduler type string (e.g. "steplr").

unsigned int step_size = 1

StepLR step size (default 1).

double gamma = 0.1

Multiplicative factor for LR decay (default 0.1).

bool m_lr = false
bool m_lr_decay = false
bool m_weight_decay = false
bool m_initial_accumulator_value = false
bool m_eps = false
bool m_betas = false
bool m_amsgrad = false
bool m_max_iter = false
bool m_max_eval = false
bool m_tolerance_grad = false
bool m_tolerance_change = false
bool m_history_size = false
bool m_alpha = false
bool m_momentum = false
bool m_centered = false
bool m_dampening = false
bool m_nesterov = false

Meta Structs

meta_t — ATLAS dataset metadata

meta_t holds all AMI / ATLAS metadata for a dataset: DSID, campaign, generator, cross-section, filter efficiency, luminosity, sum-of-weights, run numbers, file GUIDs and per-systematic weight dictionaries.

struct meta_t

Complete metadata for one Monte Carlo or data dataset.

Aggregates ATLAS AnalysisBase tracking values, AMI attributes, physics cross-section and luminosity information, provenance, and the ROOT file inventory for a single dataset.

Public Members

unsigned int dsid = 0

Dataset identifier.

bool isMC = true

true if the dataset is Monte Carlo simulation.

std::string derivationFormat = ""

ATLAS derivation format string (e.g. "DAOD_PHYS").

std::map<int, std::string> inputfiles = {}

Map of integer index to input file path.

std::map<std::string, std::string> config = {}

Arbitrary key-value configuration map.

std::string AMITag = ""

AMI tag string (e.g. "e8496_s3126_r12305_p5169").

std::string generators = ""

Space-separated list of generator names.

std::map<int, int> inputrange = {}

Map of file index to event count in that file.

double eventNumber = -1

ROOT event number (reserved for ROOT-specific mapping).

double event_index = -1

Free-use event index within the framework.

bool found = false

true if the dataset was found in AMI / the local cache.

std::string DatasetName = ""

Full AMI logical dataset name.

double totalSize = 0

Total dataset size in bytes.

double kfactor = 0

QCD k-factor for the process.

double ecmEnergy = 0

Centre-of-mass energy in MeV.

double genFiltEff = 0

Generator filter efficiency.

double completion = 0

Fraction of the dataset that has been processed.

double beam_energy = 0

Beam energy in MeV.

double crossSection = 0

Cross section in nb.

double crossSection_mean = 0

Mean cross section (e.g. averaged over PDF members).

double campaign_luminosity = 0

Integrated luminosity of the corresponding campaign in fb⁻¹.

unsigned int nFiles = 0

Number of files in the dataset.

unsigned int totalEvents = 0

Total number of events across all files.

unsigned int datasetNumber = 0

Numeric dataset identifier; duplicates dsid and is kept for legacy use.

std::string identifier = ""

Human-readable unique identifier string for the dataset.

std::string prodsysStatus = ""

Production system status string.

std::string dataType = ""

Data type string (e.g. "MC" or "DATA").

std::string version = ""

Software version tag.

std::string PDF = ""

PDF set used for generation.

std::string AtlasRelease = ""

ATLAS software release string.

std::string principalPhysicsGroup = ""

ATLAS physics group that owns the dataset.

std::string physicsShort = ""

Short physics description tag.

std::string generatorName = ""

Primary generator name.

std::string geometryVersion = ""

ATLAS detector geometry version.

std::string conditionsTag = ""

Conditions database tag.

std::string generatorTune = ""

Generator tune identifier.

std::string amiStatus = ""

AMI dataset status string.

std::string beamType = ""

Beam type string (e.g. "collisions").

std::string productionStep = ""

Production step label.

std::string projectName = ""

ATLAS project name.

std::string statsAlgorithm = ""

Statistics algorithm used.

std::string genFilterNames = ""

Comma-separated generator filter names.

std::string file_type = ""

File type identifier (e.g. "ROOT" or "HDF5").

std::string sample_name = ""

Short sample name used in plots and outputs.

std::string logicalDatasetName = ""

Full logical dataset name (LDN) in AMI.

std::string campaign = ""

ATLAS MC campaign identifier (e.g. "mc21").

std::vector<std::string> keywords = {}

List of AMI keyword strings.

std::vector<std::string> weights = {}

Names of the event weights stored in the dataset.

std::vector<std::string> keyword = {}

Additional keyword list (complementary to keywords).

std::vector<int> events = {}

Per-file event counts.

std::vector<int> run_number = {}

Per-file run numbers.

std::vector<double> fileSize = {}

Per-file sizes in bytes.

std::vector<std::string> fileGUID = {}

Per-file GUIDs.

std::map<std::string, int> LFN = {}

Map of logical file name (LFN) to integer index.

std::map<std::string, weights_t> misc = {}

Per-file or per-tag supplementary weight data.

weights_t — per-systematic sum-of-weights record

struct weights_t

Sample weighting information from AMI.

Holds normalisation weights and statistics for a single dataset.

Public Members

int dsid = -1

Dataset identifier (DSID).

bool isAFII = false

true if the dataset uses ATLAS Fast II simulation.

std::string generator = ""

Name of the Monte Carlo generator.

std::string ami_tag = ""

AMI processing tag string.

float total_events_weighted = -1

Sum of weights for all events in the dataset.

float total_events = -1

Raw total event count.

float processed_events = -1

Number of events processed.

float processed_events_weighted = -1

Sum of weights for processed events.

float processed_events_weighted_squared = -1

Sum of squared weights for processed events (for uncertainty estimation).

std::map<std::string, float> hist_data = {}

Histogram data keyed by variable name.

Training Report Structs

model_report — per-epoch training summary

model_report is produced by the dataloader after each epoch and carries loss/accuracy maps keyed by mode_enum and feature name, as well as the current learning rates and iteration counters.

struct model_report

Stores and formats the training progress of a single epoch for one k-fold.

Tracks per-epoch, per-mode (training / validation / evaluation) loss and accuracy values at graph, node, and edge level, indexed by feature name. The current_lr vector records the learning rate of each parameter group at the end of the epoch.

is_complete is set to true when the epoch loop finishes; waiting_plot points to the metrics object that is waiting to dump plots for this epoch (or nullptr). progress tracks fractional completion within an epoch; iters and num_evnt count gradient updates and processed events respectively.

print() formats all maps as a multi-line human-readable string. prx() formats one individual map with a given title prefix.

Public Functions

std::string print()

Format all accumulated metrics as a multi-line human-readable string.

Returns:

Formatted report string.

std::string prx(std::map<mode_enum, std::map<std::string, float>> *data, std::string title)

Format one metric map (e.g. loss_graph) with the given title.

Parameters:
  • data – Pointer to the metric map to format.

  • title – Section title to prepend.

Returns:

Formatted section string.

Public Members

int k

K-fold index for this report.

int epoch

Epoch number for this report.

bool is_complete = false

true once the epoch has finished processing.

metrics *waiting_plot = nullptr

Pointer to the metrics object waiting to produce plots for this epoch, or nullptr if not applicable.

std::vector<double> current_lr = {}

Per-parameter-group learning rates at the end of this epoch.

std::map<mode_enum, std::map<std::string, float>> loss_graph = {}

Graph-level loss values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> loss_node = {}

Node-level loss values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> loss_edge = {}

Edge-level loss values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> accuracy_graph = {}

Graph-level accuracy values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> accuracy_node = {}

Node-level accuracy values keyed by mode then feature name.

std::map<mode_enum, std::map<std::string, float>> accuracy_edge = {}

Edge-level accuracy values keyed by mode then feature name.

std::string run_name

Name of the training run (as passed to analysis::add_model).

std::string mode

Current mode string ("training", "validation", or "evaluation").

long iters = 0

Number of gradient-update iterations performed so far.

long num_evnt = 0

Number of events processed in the current epoch.

float progress

Fractional progress through the epoch in [0, 1].

roc_t — ROC curve data

struct roc_t

Holds the data for one ROC curve (one class, one k-fold, one model).

Aggregates the ground-truth label arrays (truth) and classifier score arrays (scores) together with the computed true-positive rate (tpr_), false-positive rate (fpr_), and area-under-curve values (_auc) for a specific class index (cls) and k-fold (kfold). The truth and scores pointers are not owned by this struct.

Public Members

int cls = 0

Class index.

int kfold = 0

K-fold index.

std::string model = ""

Name of the model that produced this ROC curve.

std::vector<double> _auc = {}

Area under the ROC curve for each threshold sweep.

std::vector<std::vector<double>> tpr_ = {}

True positive rate (recall) values; one vector per threshold sweep.

std::vector<std::vector<double>> fpr_ = {}

False positive rate values; one vector per threshold sweep.

std::vector<std::vector<int>> *truth = nullptr

Pointer to the ground-truth label arrays (not owned).

std::vector<std::vector<double>> *scores = nullptr

Pointer to the classifier score arrays (not owned).