The SampleTracer

A core module inherited by all generator classes. This class focuses on tracking sample generation, including events, graphs and selections, and how these are cached in memory. The class can be used as a standalone package, but it is primarily intended to be integrated into any abstract generator class one might want to implement, as illustrated later. Most of the functionality, such as the magic functions, is implemented within this class, making it a rather useful core module.
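
The intended inheritance pattern can be sketched with stand-in classes; the `EventGenerator` subclass and all internals below are illustrative assumptions, not the package's real implementation:

```python
# Illustrative sketch only: a minimal stand-in SampleTracer and a
# hypothetical generator class inheriting its caching machinery.
class SampleTracer:
    """Stand-in for the real SampleTracer core module."""
    def __init__(self):
        self._events = {}            # hash -> cached event batch

    def __len__(self):
        return len(self._events)     # length of the entire sample

class EventGenerator(SampleTracer):
    """A generator class inheriting the tracer's magic functions."""
    def add(self, hash_, event):
        self._events[hash_] = event  # index the event under its hash

gen = EventGenerator()
gen.add("0xabc", {"tree": "nominal"})
assert len(gen) == 1                 # __len__ comes from the core module
```

The point is that subclasses only add generation logic; indexing, length, addition and iteration come for free from the core module.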

Methods and Attributes

class Event

A simple wrapper class used to batch cached objects into a single object. When a specific attribute is requested, the class scans the available objects for it. Unlike most sub-modules within the package, this class has limited functionality in terms of magic functions.

release_event() EventTemplate

A method which releases the event object from the sample tracer batch.

release_graph() GraphTemplate

A method which releases the graph object from the sample tracer batch.

release_selection() SelectionTemplate

A method which releases the selection object from the sample tracer batch.

event_cache_dir() dict

Returns a dictionary of the current caching directory of the given event name.

graph_cache_dir() dict

Returns a dictionary of the current caching directory of the given graph name.

selection_cache_dir() dict

Returns a dictionary of the current caching directory of the given selection name.

meta() MetaData

Returns a MetaData object for the current event.

__eq__() bool

Returns True if the events have the same hash.

__hash__() int

Returns an integer hash, allowing the use of set and dict, where the event can be interpreted as a key in a dictionary.
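
The hash-based `__eq__`/`__hash__` pair can be sketched with a minimal mock (not the real Event class): two events sharing a hash compare equal and collapse to a single dictionary key.

```python
# Minimal, illustrative Event stand-in: equality and hashing are both
# driven by the event's hash string.
class Event:
    def __init__(self, hash_):
        self.hash = hash_

    def __eq__(self, other):
        return self.hash == other.hash   # same hash -> same event

    def __hash__(self):
        return hash(self.hash)           # enables use in set/dict

a, b = Event("0x1"), Event("0x1")
index = {a: "cached"}
assert b in index                        # b acts as the same dict key as a
assert len({a, b}) == 1                  # duplicates collapse in a set
```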

__getstate__() tuple[meta_t, batch_t]

Allows the event to be pickled.

__setstate__(tuple[meta_t, batch_t])

Rebuilds the Event from a meta_t and batch_t data type.
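
The pickling contract above can be sketched as follows; the `meta_t`/`batch_t` payloads are stubbed as plain dictionaries for illustration:

```python
import pickle

# Sketch of the __getstate__/__setstate__ round trip described above.
class Event:
    def __init__(self, meta=None, batch=None):
        self.meta, self.batch = meta, batch

    def __getstate__(self):
        return (self.meta, self.batch)   # tuple[meta_t, batch_t]

    def __setstate__(self, state):
        self.meta, self.batch = state    # rebuild from the tuple

ev = Event({"sample": "example"}, {"tree": "nominal"})
clone = pickle.loads(pickle.dumps(ev))   # full pickle round trip
assert clone.meta == ev.meta and clone.batch == ev.batch
```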

Variables:

hash (str) – Returns the hash of the current event.

class SampleTracer
__getstate__() tracer_t

Export this tracer including all samples (selections, graphs, events) and state (settings).

__setstate__(tracer_t inpt)

Import tracer parameters including all samples (selections, graphs, events) and state (settings).

Parameters:

inpt (tracer_t) – An exported tracer data type.

__getitem__(key Union[list, str]) bool or list

Scans the indexed content and returns a list of matches, or False if nothing is found.

Params Union[list, str] key:

The requested search term to scan for (ROOT name, Event Hash, …).
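
The list-or-False lookup semantics can be sketched with a mock tracer; the flat dictionary index below is a stand-in, not the real data structure:

```python
# Illustrative sketch of __getitem__: return all matches for the given
# key(s), or False when nothing in the index matches.
class Tracer:
    def __init__(self):
        self._index = {"sample.root": ["0x1", "0x2"], "0x1": ["event-1"]}

    def __getitem__(self, key):
        keys = key if isinstance(key, list) else [key]
        found = [v for k in keys for v in self._index.get(k, [])]
        return found if found else False

t = Tracer()
assert t["sample.root"] == ["0x1", "0x2"]   # matches for a ROOT name
assert t["missing"] is False                # nothing found -> boolean
```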

__contains__(str val) bool

Check if query is in sample tracer.

Params str val:

The string to check against.

__len__() int

Return length of the entire sample.

__add__(other) SampleTracer

Add two SampleTracers to create an independent SampleTracer. Content of both samples is compared and summed as a set.

Params SampleTracer other:

The other SampleTracer inherited object to sum.

__radd__(other) SampleTracer

Add two SampleTracers to create an independent SampleTracer. Content of both samples is compared and summed as a set.

Params SampleTracer other:

The other SampleTracer inherited object to sum.

__iadd__(SampleTracer other) SampleTracer

Append the incoming tracer object to this tracer.
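
The set-like summation described for `__add__`/`__radd__`/`__iadd__` can be sketched with a mock tracer holding only hashes; the class below is illustrative, not the real implementation:

```python
# Illustrative sketch: content of both tracers is compared and merged
# as a set, yielding an independent object for __add__.
class Tracer:
    def __init__(self, hashes=()):
        self.hashes = set(hashes)

    def __add__(self, other):
        return Tracer(self.hashes | other.hashes)   # new, independent tracer

    def __radd__(self, other):
        # lets sum([...]) work, since sum starts from the integer 0
        return self if other == 0 else self.__add__(other)

    def __iadd__(self, other):
        self.hashes |= other.hashes                 # append in place
        return self

merged = Tracer({"0x1", "0x2"}) + Tracer({"0x2", "0x3"})
assert merged.hashes == {"0x1", "0x2", "0x3"}       # duplicates collapse
```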

__iter__()

Iterate over the SampleTracer with the given parameters, e.g. cache type.

__next__() Event

The iterator returns an Event (not to be confused with EventTemplate). This Event is a batched version of SelectionTemplate/GraphTemplate/EventTemplate and MetaData.

preiteration() bool

A placeholder for adding last-minute behaviour changes to the iteration process. This can include loading specific caches or changing general behaviour, e.g. pre-fetching. By default this function returns False to indicate that no errors occurred. If True is returned, the iterator is nulled.
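
The iteration protocol and the preiteration hook can be sketched together; the classes below are illustrative stand-ins under the assumption that preiteration() runs once before the loop starts:

```python
# Illustrative sketch: preiteration() returning True nulls the iterator,
# returning False lets iteration proceed normally.
class Tracer:
    def __init__(self, events):
        self._events = events

    def preiteration(self):
        return False   # False -> no errors, iteration proceeds

    def __iter__(self):
        self._it = iter([] if self.preiteration() else self._events)
        return self

    def __next__(self):
        return next(self._it)

class Strict(Tracer):
    """Hypothetical override: null the iterator on empty samples."""
    def preiteration(self):
        return len(self._events) == 0

assert list(Tracer(["ev1", "ev2"])) == ["ev1", "ev2"]
assert list(Strict([])) == []        # preiteration() returned True
```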

DumpTracer(retag: str | None) None

Preserve the index map of the samples within the tracer. The output of this is a set of HDF5 files, which are written in the form of their Logical File Names or original sample name.

Parameters:

retag (str, None) – Allows specific samples of the tracer to be tagged.

RestoreTracer(dict tracers = {}, sample_name: Union[None, str]) None

Restore the index map of the samples within the tracer.

Parameters:
  • tracers (dict) – Restore these HDF5 file directories

  • sample_name (None, str) – Restore only tracer samples with a particular sample name tag.

DumpEvents() None

Preserve the EventTemplates in HDF5 files.

DumpGraphs() None

Preserve the GraphTemplates in HDF5 files.

DumpSelections() None

Preserve the SelectionTemplates in HDF5 files.

RestoreEvents(list these_hashes = []) None

Restore EventTemplates matching a particular set of hashes.

Params list these_hashes:

A list of hashes consistent with events indexed by the tracer.

RestoreGraphs(list these_hashes = []) None

Restore GraphTemplates matching a particular set of hashes.

Params list these_hashes:

A list of hashes consistent with events indexed by the tracer.

RestoreSelections(list these_hashes = []) None

Restore SelectionTemplates matching a particular set of hashes.

Params list these_hashes:

A list of hashes consistent with events indexed by the tracer.

FlushEvents(list these_hashes = []) None

Delete EventTemplates matching a particular set of hashes from RAM.

Params list these_hashes:

A list of hashes consistent with events indexed by the tracer.

FlushGraphs(list these_hashes = []) None

Delete GraphTemplates matching a particular set of hashes from RAM.

Params list these_hashes:

A list of hashes consistent with events indexed by the tracer.

FlushSelections(list these_hashes = []) None

Delete SelectionTemplates matching a particular set of hashes from RAM.

Params list these_hashes:

A list of hashes consistent with events indexed by the tracer.
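
The Dump/Flush/Restore lifecycle can be sketched with a mock in which the HDF5 store is stubbed as a plain dictionary; method names mirror the documentation, but the bodies are illustrative only:

```python
# Illustrative sketch of the cache lifecycle: Flush* removes templates
# from RAM while the preserved on-"disk" copy stays available, and
# Restore* loads them back by hash.
class Tracer:
    def __init__(self):
        self._ram, self._disk = {}, {}

    def DumpEvents(self):
        self._disk.update(self._ram)     # stand-in for writing HDF5 files

    def FlushEvents(self, these_hashes=[]):
        for h in these_hashes:
            self._ram.pop(h, None)       # free RAM only

    def RestoreEvents(self, these_hashes=[]):
        for h in these_hashes:
            if h in self._disk:
                self._ram[h] = self._disk[h]

t = Tracer()
t._ram["0x1"] = "event-template"
t.DumpEvents()
t.FlushEvents(["0x1"])
assert "0x1" not in t._ram               # gone from RAM...
t.RestoreEvents(["0x1"])
assert t._ram["0x1"] == "event-template" # ...but recoverable from the dump
```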

_makebar(inpt: int, CustTitle: None | str = None)

Creates a tqdm progress bar.

Params int inpt:

Length of the sample, i.e. the range of the bar.

Params None, str CustTitle:

Override the default progress prefix title (see Caller).

trace_code(obj) code_t

Preserve an object which is independent of the current file implementation (see Code).

Params obj:

Any Python object

rebuild_code(val: list | str | None) list[Code]

Rebuild a set of Code objects which mimic the originally traced code.

Params list, str, None val:

Rebuild these strings from the traced code of the SampleTracer.
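
The trace/rebuild round trip can be sketched in plain Python; here the traced source is stubbed as a string, whereas the real trace_code derives it from the object itself (see Code):

```python
# Sketch of the idea behind trace_code/rebuild_code: preserve an object's
# source text so it can be rebuilt independently of the original file.
# The "selection" function and its source string are illustrative only.
traced = "def selection(event):\n    return event > 0\n"  # stand-in for code_t

namespace = {}
exec(traced, namespace)             # rebuild the object from preserved source
rebuilt = namespace["selection"]

assert rebuilt(1) is True           # mimics the originally traced code
assert rebuilt(-1) is False
```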

ImportSettings(settings_t inpt) None

Apply settings from the input to the current SampleTracer.

Params settings_t inpt:

A dictionary like object with specific keys. See the Data Type and Dictionary Section.

ExportSettings() settings_t

Export the current settings of the SampleTracer.

clone() SampleTracer

Returns a copy of the current SampleTracer object. This will NOT clone the content of the source tracer.

is_self(inpt, obj=SampleTracer) bool

Checks whether the input's type is consistent with the target object type (inherited objects are also permitted).

Params inpt:

Any Python object

Params obj:

The target object type to check against, e.g. SampleTracer type.

makehashes() dict

Returns a dictionary of current hashes not found in RAM.

makelist() list[Event]

Returns a list of Event objects, regardless of whether the underlying templates are loaded in memory.

AddEvent(event_inpt, meta_inpt=None) None

An internal function used to add an EventTemplate to the sample tracer.

Params EventTemplate event_inpt:

The EventTemplate object to add.

Params MetaData meta_inpt:

An optional parameter that decorates the template with meta-data.

AddGraph(graph_inpt, meta_inpt=None) None

An internal function used to add a GraphTemplate to the sample tracer.

Params GraphTemplate graph_inpt:

The GraphTemplate object to add.

Params MetaData meta_inpt:

An optional parameter that decorates the template with meta-data.

AddSelections(selection_inpt, meta_inpt=None) None

An internal function used to add a SelectionTemplate to the sample tracer.

Params SelectionTemplate selection_inpt:

The SelectionTemplate object to add.

Params MetaData meta_inpt:

An optional parameter that decorates the template with meta-data.

SetAttribute(fx, str name) bool
Params callable fx:

A function used to apply to the GraphTemplate (this is an internal function).

Params str name:

The name of the feature to add.
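
The SetAttribute pattern can be sketched with a mock tracer: a callable is registered under a feature name and applied to each graph during construction. All names and internals below are illustrative assumptions:

```python
# Illustrative sketch of SetAttribute: register a feature-building
# callable by name; refuse duplicate registrations.
class Tracer:
    def __init__(self):
        self._features = {}

    def SetAttribute(self, fx, name):
        if name in self._features:
            return False              # feature name already registered
        self._features[name] = fx
        return True

    def build(self, graph):
        # hypothetical helper: apply every registered feature to a graph
        return {name: fx(graph) for name, fx in self._features.items()}

t = Tracer()
assert t.SetAttribute(lambda g: len(g["nodes"]), "num_nodes") is True
assert t.SetAttribute(lambda g: 0, "num_nodes") is False   # duplicate name
assert t.build({"nodes": [1, 2, 3]}) == {"num_nodes": 3}
```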

Variables:
  • Tree (str) – Returns current ROOT Tree being used.

  • ShowTrees (list[str]) – Returns a list of ROOT Trees found within the index.

  • Event (Union[EventTemplate, Code]) – Specifies an EventTemplate-inherited event implementation to use for building Event objects from ROOT files.

  • ShowEvents (list[str]) – Returns a list of EventTemplate implementations found within the index.

  • GetEvent (bool) – Forcefully get or ignore EventTemplate types from the Event object. This is useful to avoid redundant sample fetching from RAM.

  • EventCache (bool) – Specifies whether to generate a cache after constructing Event objects. If this is enabled without specifying a ProjectName, a folder called UNTITLED is generated.

  • EventName (str) – The event name to fetch from cache.

  • Graph (Union[GraphTemplate, Code]) – Specifies the event graph implementation to use for constructing graphs.

  • ShowGraphs (list[str]) – Returns a list of GraphTemplate implementations found within the index.

  • GetGraph (bool) – Forcefully get or ignore GraphTemplate types from the Graph object. This is useful to avoid redundant sample fetching from RAM.

  • DataCache (bool) – Specifies whether to generate a cache after constructing graph objects. If this is enabled without having an event cache, the Event attribute needs to be set.

  • GraphName (str) – The graph name to fetch from cache.

  • Selections (dict[str, SelectionTemplate or Code])

  • ShowSelections (list[str])

  • GetSelection (bool) – Forcefully get or ignore SelectionTemplate types from the Selection object. This is useful to avoid redundant sample fetching from RAM.

  • SelectionName (str) – The selection name to fetch from cache.

  • Optimizer (str) – Expects a string naming the optimizer to use. Current choices are: SGD (Stochastic Gradient Descent) and ADAM.

  • Scheduler (str) – Expects a string naming the scheduler to use. Current choices are: ExponentialLR and CyclicLR. More can be added under the loss function class.

  • Model (Union[ModelWrapper, Code]) – The target model to be trained.

  • OptimizerParams (dict) – A dictionary containing the specific input parameters for the chosen Optimizer.

  • SchedulerParams (dict) – A dictionary containing the specific input parameters for the chosen Scheduler.

  • ModelParams (dict) – A dictionary used for initializing the model. This is only relevant if the model has input requirements to be initialized.

  • kFold (list[int]) – Explicitly use these kFolds during training. This can be quite useful when doing parallel training, since each kFold is trained completely independently. The variable can be set to a single integer or a list of integers.

  • Epoch (int) – The epoch to start from.

  • kFolds (Union[list[int], int]) – Number of folds to use for training.

  • Epochs (int) – Number of epochs to train the model with.

  • BatchSize (int) – How many Graphs to group into a single big graph (also known as batch training).

  • GetAll (bool) – Used to forcefully get all event hashes in the tracer index.

  • nHashes (int) – Shows the number of hashes that have been indexed.

  • ShowLength (dict) – Shows information about the number of hashes associated with a particular tree/event/graph/selection implementation.

  • EventStart (Union[int, None]) – The event to start from given a set of ROOT samples. Useful for debugging specific events.

  • EventStop (Union[int, None]) – The number of events to generate.

  • EnablePyAMI (bool) – Try to scan the input samples' metadata via PyAMI.

  • Files (dict) – Files found under some specified directory.

  • SampleMap (dict) – A map of the sample names and associated ROOT samples.

  • ProjectName (str) – Specifies the output folder of the analysis. If the folder is non-existent, a folder will be created.

  • OutputDirectory (str) – Specifies the output directory of the analysis. This is useful if the output needs to be placed outside of the working directory.

  • WorkingPath (str) – Returns the current working path of the Analysis, constructed as: OutputDirectory/ProjectName.

  • RunName (str) – The name given to the particular training session of the Graph Neural Network.

  • Caller (str) – A string controlling the verbose information prefix.

  • Verbose (int) – An integer which increases the verbosity of the framework, with 3 being the highest and 0 the lowest.

  • DebugMode (bool) – Expects a boolean; if set to True, a complete printout of the training is displayed.

  • Chunks (int) – An integer which regulates the number of entries to process for each given core. This is particularly relevant when constructing events, as to avoid memory issues. As an example, if Threads is set to 2 and Chunks is set to 10, then 10 events will be processed per core.

  • Threads (int) – The number of CPU threads to use for running the framework. If the number of threads is set to 1, then the framework will not print a progress bar.

  • Device (str) – The device used to run PyTorch training on. Options are cuda or cpu.

  • TrainingName (str) – Name of the training sample to be used.

  • SortByNodes (bool) – Sort the input graph sample by nodes. This is useful when the model is node agnostic, but requires recomputation of internal variables based on variable graph node sizes. For instance, when computing the combinatorial of a graph, it is faster to compute the combinations for n-nodes and batch n-sized graphs into a single sample set.

  • ContinueTraining (bool) – Whether to continue the training from the last known checkpoint (after each epoch).

  • KinematicMap (dict) –

    An attribute enabling mass reconstruction during and after GNN training. The following syntax is used to select a given feature from the GNN:

    <ana>.KinematicMap = {"<the feature to reconstruct>" : "<coordinate system (polar/cartesian)> -> pT, eta, phi, e"}
    

  • PlotLearningMetrics (bool) – Whether to output various metric plots whilst training. This can be enabled before training or re-run after training from the training cache.

  • MaxGPU (float) – This sets the upper limit of the GPU memory allowed during training/validation/testing.

  • MaxRAM (float) – Sets the upper limit of the RAM used by the framework. This is independent from the GPU memory and is predominantly used to monitor general memory usage. If the data index becomes greater than the specified limit, parts of the cache is purged from memory.