Analysis Interface

The main interface class, which defines and automates the analysis workflow: MVA training, ROOT n-tuple production, GNN inference, sample generation and much more.

The C++ Interface

class analysis : public notification, public tools
settings_t m_settings

A member struct variable used to control and specify runtime behaviour.

void add_samples(std::string path, std::string label)

A function used to specify the directory of the ROOT samples used for the analysis. Accepted syntax for the path parameter is /path/<name>.root or /path/*.root. The label parameter is optional, but is useful when samples need to be separated.
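The two accepted path forms can be illustrated with a small, framework-independent sketch. The helper name resolve_samples is hypothetical and is not part of the framework. It only shows how a single-file path and a /path/*.root wildcard could be expanded into a list of ROOT files using the Python standard library; the framework's own resolution logic may differ.

```python
import glob
import os


def resolve_samples(path):
    """Hypothetical helper (not part of the framework).

    Expands either a single file path (/path/<name>.root) or a
    wildcard pattern (/path/*.root) into a sorted list of .root files.
    """
    if any(ch in path for ch in "*?["):
        # Wildcard form: let glob expand the pattern.
        files = sorted(glob.glob(path))
    else:
        # Single-file form: keep the path only if the file exists.
        files = [path] if os.path.isfile(path) else []
    # Only ROOT files are relevant as analysis samples.
    return [f for f in files if f.endswith(".root")]
```

A call such as resolve_samples("/path/*.root") returns every matching ROOT file in that directory, while a path to a file that does not exist returns an empty list.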

class Analysis
AddSamples(str path, str label)

A function used to assign a sample label (arbitrary name) to a particular dataset.

AddEvent(EventTemplate ev, str label)

A function used to pass the event implementation to be used for subsequent compilations.

AddGraph(GraphTemplate ev, str label)

A function used to tell the framework which graph implementation should be used.

AddSelection(SelectionTemplate selc)

A function which adds any selection templates to the current analysis workflow.

AddModel(ModelTemplate model, OptimizerConfig op, str run_name)

A function used to add a model to be trained, along with any optimizer hyperparameters that should be applied to the model. The additional run_name variable is used to create folders that contain the training output.

AddModelInference(ModelTemplate model, str run_name = "run_name")

A function used to add a trained model that should be used for inference studies. The run_name variable is used to generate folder structures for output ROOT files that hold model predictions.

Start()
Variables:
  • BatchSize (int)

  • FetchMeta (bool)

  • BuildCache (str)

  • PreTagEvents (bool)

  • SaveSelectionToROOT (bool)

  • GetMetaData (bool) – Attempts to identify any meta-data associated with the input samples and queries PyAMI to match any results.

  • SumOfWeightsTreeName (list) – Scans the ROOT file for possible sum of weights trees and histograms.

  • OutputPath (str) – The output path of the results.

  • kFolds (int) – Number of folds to train the model with.

  • kFold (list) – A list of specific k-folds to train the model on. Useful if not enough resources are available to do a full k-fold training at once.

  • Epochs (int) – Number of epochs to train the model.

  • NumExamples (int) – Number of test examples used to validate the runtime of the model.

  • TrainingDataset (str) – Path of the training set to use. If a value is given but no training set is available, the framework will dump a .h5 file.

  • TrainSize (int) – Size of the training set as a percentage.

  • Training (bool) – Run the model over the training set.

  • Validation (bool) – Run the model over the validation set in a k-fold training session.

  • Evaluation (bool) – Run the model over the evaluation set.

  • ContinueTraining (bool) – Continue training the model at the last known checkpoint.

  • nBins (int) – Number of bins to plot the invariant mass metrics with.

  • Refresh (int) – Progress bar refresh step.

  • MaxRange (float) – Maximum range of the invariant mass metric plots.

  • VarPt (str) – The transverse momentum variable string name to use for the invariant mass computation.

  • VarEta (str) – The pseudorapidity variable string name to use for the invariant mass computation.

  • VarPhi (str) – The azimuthal angle variable string name to use for the invariant mass computation.

  • VarEnergy (str) – The energy variable string name to use for the invariant mass computation.

  • Targets (list) – The targets to plot (the output of the model) e.g. top_edge.

  • DebugMode (bool) – Disables all threading.

  • Threads (int) – Number of threads to run the framework over.

  • GraphCache (str) – Specifies a directory in which graph_template outputs should be cached. This will generate .h5 files that can be reused.
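The VarPt, VarEta, VarPhi and VarEnergy variables above name the four kinematic quantities that enter the invariant mass computation. As a minimal sketch of that computation, the function below uses the standard collider-coordinate conversion (px = pt cos(phi), py = pt sin(phi), pz = pt sinh(eta)) and m² = E² − |p|². It assumes that eta is the pseudorapidity; the function name and the tuple layout are illustrative assumptions, and the code is not taken from the framework.

```python
import math


def invariant_mass(particles):
    """Invariant mass of a set of particles.

    particles: iterable of (pt, eta, phi, energy) tuples. This layout is
    an assumption made for illustration, not the framework's data format.
    """
    px = py = pz = e = 0.0
    for pt, eta, phi, energy in particles:
        # Convert collider coordinates to Cartesian momentum components.
        px += pt * math.cos(phi)
        py += pt * math.sin(phi)
        pz += pt * math.sinh(eta)
        e += energy
    m2 = e * e - (px * px + py * py + pz * pz)
    # Clamp small negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))
```

For two massless particles emitted back to back in the transverse plane, each with an energy of 50, the summed momentum cancels and the invariant mass equals the total energy of 100, in whatever energy units the inputs use.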