The SampleTracer

A core module inherited by all generator classes. It tracks sample generation, including events, graphs and selections, and how these are cached in memory. The class can be used standalone, but it is primarily intended to be integrated into any abstract generator class that one might want to implement; this will be illustrated later. Most functionality, such as the magic functions, is implemented within this class.

Methods and Attributes

class Event

A simple wrapper class used to batch cached objects into a single object. Upon calling for a specific attribute, the class will scan available objects for the attribute. Unlike most sub-modules within the package, this class has limited functionalities in terms of magic functions.

release_event() EventTemplate

A method which releases the event object from the sample tracer batch.

release_graph() GraphTemplate

A method which releases the graph object from the sample tracer batch.

release_selection() SelectionTemplate

A method which releases the selection object from the sample tracer batch.

event_cache_dir() dict

Returns a dictionary of the current caching directory of the given event name.

graph_cache_dir() dict

Returns a dictionary of the current caching directory of the given graph name.

selection_cache_dir() dict

Returns a dictionary of the current caching directory of the given selection name.

meta() MetaData

Returns a MetaData object for the current event.

__eq__() bool

Returns true if the events have the same hash.

__hash__() int

Allows for the use of set and dict, where the event can be interpreted as a key in a dictionary.

hash -> str

Returns the hash of the current event.
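The hash-based identity described above can be illustrated with a minimal sketch. ToyEvent is a hypothetical stand-in written for this example and is not the package's Event class; it only assumes that equality and hashing are both derived from the hash string.

```python
class ToyEvent:
    """Hypothetical stand-in for the batched Event; not the package's class."""

    def __init__(self, event_hash, payload=None):
        self.hash = event_hash      # string identifier of the event
        self.payload = payload      # anything attached to the event

    def __eq__(self, other):
        # Two events compare equal when their hash strings match.
        if not isinstance(other, ToyEvent):
            return NotImplemented
        return self.hash == other.hash

    def __hash__(self):
        # Python requires an integer here, so derive it from the hash string.
        return hash(self.hash)


a = ToyEvent("0xabc", payload="first copy")
b = ToyEvent("0xabc", payload="second copy")
c = ToyEvent("0xdef")

unique = {a, b, c}          # a and b collapse into a single set entry
lookup = {a: "signal"}      # an event used as a dictionary key
```

Because b carries the same hash as a, lookup[b] resolves to the entry stored under a.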

__getstate__ tuple[meta_t, batch_t]

Allows the event to be pickled.

__setstate__(tuple[meta_t, batch_t])

Rebuilds the Event from a meta_t and batch_t data type.
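As a rough sketch of the pickling round trip, the example below exports and rebuilds an object from a two-element tuple. ToyEvent and the plain dictionaries standing in for meta_t and batch_t are assumptions made for illustration; the real data types are defined by the package.

```python
import pickle


class ToyEvent:
    """Hypothetical stand-in; meta and batch are plain dicts in this sketch."""

    def __init__(self, meta=None, batch=None):
        self.meta = meta if meta is not None else {}
        self.batch = batch if batch is not None else {}

    def __getstate__(self):
        # Export the state as a (meta, batch) tuple.
        return (self.meta, self.batch)

    def __setstate__(self, state):
        # Rebuild the object from the same tuple layout.
        self.meta, self.batch = state


original = ToyEvent(meta={"sample": "ttbar"}, batch={"hash": "0xabc"})
restored = pickle.loads(pickle.dumps(original))
```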

class SampleTracer
__getstate__ -> tracer_t

Export this tracer including all samples (selections, graphs, events) and state (settings).

__setstate__(tracer_t inpt)

Import tracer parameters including all samples (selections, graphs, events) and state (settings).

__getitem__(key: list | str) bool or list

Scan indexed content and return a list of matches or a boolean if nothing has been found.

__contains__(str val) bool

Check if query is in sample tracer.

__len__ -> int

Return length of the entire sample.
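The lookup behaviour of these three magic functions might look roughly as follows. ToyTracer and its hash-to-sample-name index are hypothetical; the sketch only mirrors the documented return types, namely a list of matches or False when nothing is found.

```python
class ToyTracer:
    """Hypothetical stand-in that indexes hashes by sample name."""

    def __init__(self, index):
        # Maps an event hash to the sample name it came from (assumed layout).
        self._index = dict(index)

    def __len__(self):
        return len(self._index)

    def __contains__(self, val):
        # A query matches either a hash or a sample name.
        return val in self._index or val in self._index.values()

    def __getitem__(self, key):
        # Accept a single string or a list of strings.
        keys = [key] if isinstance(key, str) else list(key)
        matches = [h for h, name in self._index.items()
                   if h in keys or name in keys]
        return matches if matches else False


tracer = ToyTracer({"h1": "sample_a", "h2": "sample_a", "h3": "sample_b"})
by_name = tracer["sample_a"]
by_list = tracer[["h3", "h1"]]
missing = tracer["unknown"]
```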

__add__(SampleTracer other) SampleTracer

Add two SampleTracers to create an independent SampleTracer. Content of both samples is compared and summed as a set.

__radd__(other) SampleTracer
__iadd__(SampleTracer other) SampleTracer

Append the incoming tracer object to this tracer.
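The difference between the addition operators can be sketched with a toy container keyed by hash. ToyTracer is hypothetical, and the __radd__ behaviour shown (accepting the integer 0 so that sum() works) is an assumption about why the method exists, not something the source states.

```python
class ToyTracer:
    """Hypothetical stand-in storing entries in a dict keyed by hash."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def __len__(self):
        return len(self.entries)

    def __add__(self, other):
        # Build an independent tracer holding the union of both contents.
        if not isinstance(other, ToyTracer):
            return NotImplemented
        merged = ToyTracer(self.entries)
        merged.entries.update(other.entries)
        return merged

    def __radd__(self, other):
        # Assumption: lets sum([...]) work, since sum() starts from 0.
        if other == 0:
            return ToyTracer(self.entries)
        return NotImplemented

    def __iadd__(self, other):
        # Append the incoming content to this tracer in place.
        self.entries.update(other.entries)
        return self


first = ToyTracer({"h1": "event", "h2": "event"})
second = ToyTracer({"h2": "event", "h3": "graph"})

combined = first + second        # new object; first and second are unchanged
total = sum([first, second])     # relies on __radd__ for the initial 0
second += first                  # modifies second in place
```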

__iter__()

Iterate over the SampleTracer with the given parameters, e.g. cache type.

__next__ -> Event

The iterator returns an Event (not to be confused with EventTemplate). This Event is a batched version of SelectionTemplate/GraphTemplate/EventTemplate and MetaData.

preiteration -> bool

A placeholder for adding last-minute behaviour changes to the iteration process. This can include loading specific caches or changing general behaviour, e.g. pre-fetching.
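A minimal sketch of how such a hook could sit in front of the iterator protocol is given below. ToyTracer and FilteringTracer are hypothetical, and the meaning of the boolean returned by preiteration is not specified in this section, so the sketch calls the hook without interpreting its return value.

```python
class ToyTracer:
    """Hypothetical stand-in implementing __iter__/__next__ with a hook."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0

    def preiteration(self):
        # Hook meant to be overridden; the base version changes nothing.
        return False

    def __iter__(self):
        self.preiteration()     # last-minute adjustments happen here
        self._index = 0
        return self

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item


class FilteringTracer(ToyTracer):
    def preiteration(self):
        # Example adjustment: restrict the iteration to graph entries only.
        self._items = [i for i in self._items if i.startswith("graph")]
        return super().preiteration()


plain = list(ToyTracer(["graph-1", "event-1", "graph-2"]))
filtered = list(FilteringTracer(["graph-1", "event-1", "graph-2"]))
```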

DumpTracer(retag: str | None) None

Preserve the index map of the samples within the tracer. The output of this is a set of HDF5 files, which are written in the form of their Logical File Names or original sample name.

Parameters:

retag (str, None) – Allows specific samples of the tracer to be tagged.

RestoreTracer(dict tracers = {}, sample_name: Union[None, str]) None

Restore the index map of the samples within the tracer.

Parameters:
  • tracers (dict) – Restore these HDF5 file directories

  • sample_name (None, str) – Restore only tracer samples with a particular sample name tag.

DumpEvents -> None

Preserve the EventTemplates in HDF5 files.

DumpGraphs -> None

Preserve the GraphTemplates in HDF5 files.

DumpSelections -> None

Preserve the SelectionTemplates in HDF5 files.

RestoreEvents(list these_hashes = []) None

Restore EventTemplates matching a particular set of hashes.

Parameters:

these_hashes (list) – A list of hashes consistent with events indexed by the tracer.

RestoreGraphs(list these_hashes = []) None

Restore GraphTemplates matching a particular set of hashes.

Parameters:

these_hashes (list) – A list of hashes consistent with events indexed by the tracer.

RestoreSelections(list these_hashes = []) None

Restore SelectionTemplates matching a particular set of hashes.

Parameters:

these_hashes (list) – A list of hashes consistent with events indexed by the tracer.

FlushEvents(list these_hashes = []) None

Delete EventTemplates matching a particular set of hashes from RAM.

Parameters:

these_hashes (list) – A list of hashes consistent with events indexed by the tracer.

FlushGraphs(list these_hashes = []) None

Delete GraphTemplates matching a particular set of hashes from RAM.

Parameters:

these_hashes (list) – A list of hashes consistent with events indexed by the tracer.

FlushSelections(list these_hashes = []) None

Delete SelectionTemplates matching a particular set of hashes from RAM.

Parameters:

these_hashes (list) – A list of hashes consistent with events indexed by the tracer.
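The dump, flush and restore cycle can be sketched with a toy cache. Two things in this example are assumptions: pickle files in a temporary directory are used in place of HDF5 files to keep the sketch dependency-free, and an empty hash list is treated as "apply to everything". ToyCache and all of its method names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


class ToyCache:
    """Hypothetical stand-in; pickle files replace the HDF5 files."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.in_memory = {}     # hash -> object currently held in RAM
        self.known = set()      # every hash the index knows about

    def add(self, event_hash, obj):
        self.in_memory[event_hash] = obj
        self.known.add(event_hash)

    def dump(self):
        # Write every object held in RAM to its own file.
        for h, obj in self.in_memory.items():
            with open(self.directory / f"{h}.pkl", "wb") as handle:
                pickle.dump(obj, handle)

    def flush(self, these_hashes=None):
        # Drop objects from RAM; the index still remembers their hashes.
        targets = list(these_hashes) if these_hashes else list(self.in_memory)
        for h in targets:
            self.in_memory.pop(h, None)

    def restore(self, these_hashes=None):
        # Reload objects from disk for the requested hashes.
        targets = list(these_hashes) if these_hashes else sorted(self.known)
        for h in targets:
            path = self.directory / f"{h}.pkl"
            if path.exists():
                with open(path, "rb") as handle:
                    self.in_memory[h] = pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    cache = ToyCache(tmp)
    cache.add("h1", {"njets": 4})
    cache.add("h2", {"njets": 6})
    cache.dump()
    cache.flush(["h1"])
    after_flush = sorted(cache.in_memory)
    cache.restore(["h1"])
    after_restore = sorted(cache.in_memory)
```

Flushing only removes the in-memory copy, which is why the later restore can bring the same object back from disk.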

_makebar(inpt: int, CustTitle: None | str = None)

Creates a tqdm progress bar.

Parameters:
  • inpt (int) – Length of the sample, i.e. the range of the bar.

  • CustTitle (None, str) – Override the default progress prefix title (see Caller).

trace_code(obj) code_t

Preserve an object which is independent of the current file implementation (see Code).

Parameters:

obj – Any Python object.

rebuild_code(val: list | str | None) list[Code]

Rebuild a set of Code objects which mimic the originally traced code.

Parameters:

val (list, str, None) – Rebuild these strings from the traced code of the SampleTracer.

ImportSettings(settings_t inpt) None

Apply settings from the input to the current SampleTracer.

Parameters:

inpt (settings_t) – A dictionary-like object with specific keys. See the Data Type and Dictionary Section.

ExportSettings -> settings_t

Export the current settings of the SampleTracer.

clone -> SampleTracer

Returns a copy of the current SampleTracer object. This will NOT clone the content of the source tracer.

is_self(inpt, obj=SampleTracer) bool

Checks whether the input has a type consistent with the object type (also inherited objects are permitted).

Parameters:
  • inpt – Any Python object.

  • obj – The target object type to check against, e.g. SampleTracer type.
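A check of this kind is commonly expressed with isinstance, which also accepts inherited types. The sketch below assumes that this is the intended behaviour; the two classes are empty hypothetical stand-ins and the function is written as a free function for brevity.

```python
class SampleTracer:
    """Empty hypothetical stand-in for the real class."""


class Generator(SampleTracer):
    """Hypothetical generator class inheriting from the tracer."""


def is_self(inpt, obj=SampleTracer):
    # isinstance is True for the target type and for any subclass of it.
    return isinstance(inpt, obj)


direct = is_self(SampleTracer())
inherited = is_self(Generator())
unrelated = is_self("not a tracer")
narrowed = is_self(SampleTracer(), Generator)
```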

makehashes() dict

Returns a dictionary of current hashes not found in RAM.

makelist() list[Event]

Returns a list of Event objects, regardless of whether the Templates are loaded in memory.

AddEvent(event_inpt, meta_inpt=None) None
AddGraph(graph_inpt, meta_inpt=None) None
AddSelections(selection_inpt, meta_inpt=None) None
SetAttribute(fx, str name) bool
Tree -> str

Returns current ROOT Tree being used.

ShowTrees -> list[str]

Returns a list of ROOT Trees found within the index.

Event -> EventTemplate or Code

Specifies the EventTemplate-inherited event implementation to use for building Event objects from ROOT files.

ShowEvents -> list[str]

Returns a list of EventTemplate implementations found within the index.

GetEvent -> bool

Forcefully get or ignore EventTemplate types from the Event object. This is useful to avoid redundant sample fetching from RAM.

EventCache -> bool

Specifies whether to generate a cache after constructing Event objects. If this is enabled without specifying a ProjectName, a folder called UNTITLED is generated.

EventName -> str
Graph -> GraphTemplate or Code

Specifies the event graph implementation to use for constructing graphs.

ShowGraphs -> list[str]
GetGraph -> bool
DataCache -> bool

Specifies whether to generate a cache after constructing graph objects. If this is enabled without having an event cache, the Event attribute needs to be set.

GraphName -> str
Selections -> dict[str, SelectionTemplate or Code]
ShowSelections -> list[str]
GetSelection -> bool
SelectionName -> str
Optimizer -> str

Expects a string naming the optimizer to use. Current choices are SGD (Stochastic Gradient Descent) and ADAM.

Scheduler -> str

Expects a string naming the scheduler to use. Current choices are ExponentialLR and CyclicLR.

Model -> ModelWrapper or Code

The target model to be trained.

OptimizerParams -> dict

A dictionary containing the specific input parameters for the chosen Optimizer.

SchedulerParams -> dict

A dictionary containing the specific input parameters for the chosen Scheduler.

ModelParams -> dict
kFold -> list[int]

Explicitly use these kFolds during training. This can be quite useful when doing parallel training, since each kFold is trained completely independently. The variable can be set to a single integer or a list of integers.

Epoch -> int
kFolds -> int

Number of folds to use for training.
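How kFolds and kFold could interact is sketched below: the sample is partitioned into kFolds folds and only the folds listed in kFold are trained. The helper names, the 1-based fold numbering and the round-robin partitioning are assumptions made for this example and may differ from the package's own splitting.

```python
def make_folds(n_samples, k_folds):
    """Partition sample indices into folds numbered from 1 (an assumption)."""
    indices = list(range(n_samples))
    return {k + 1: indices[k::k_folds] for k in range(k_folds)}


def select_folds(k_fold):
    """Accept a single integer or a list of integers."""
    return [k_fold] if isinstance(k_fold, int) else list(k_fold)


folds = make_folds(n_samples=10, k_folds=5)

jobs = {}
for k in select_folds([2, 4]):
    validation = folds[k]
    # Every other fold contributes to the training set of this job.
    training = [i for j, fold in folds.items() if j != k for i in fold]
    jobs[k] = (training, validation)
```

Each entry in jobs is independent of the others, so the selected folds could be trained in separate processes.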

Epochs -> int
BatchSize -> int

How many Graphs to group into a single big graph (also known as batch training).

GetAll -> bool
nHashes -> int
ShowLength -> dict
EventStart -> int or None

The event to start from given a set of ROOT samples. Useful for debugging specific events.

EventStop -> int or None

The number of events to generate.

EnablePyAMI -> bool
Files -> dict
SampleMap -> dict
ProjectName -> str

Specifies the output folder of the analysis. If the folder is non-existent, a folder will be created.

OutputDirectory -> str

Specifies the output directory of the analysis. This is useful if the output needs to be placed outside of the working directory.

WorkingPath -> str

Returns the current working path of the Analysis, constructed as OutputDirectory/ProjectName.
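The path construction can be illustrated with os.path.join. The function name working_path is hypothetical, and the UNTITLED default is borrowed from the EventCache description above rather than confirmed for this attribute.

```python
import os


def working_path(output_directory, project_name="UNTITLED"):
    # Join the two settings in the documented order.
    return os.path.join(output_directory, project_name)


path = working_path("results", "MyProject")
default = working_path("results")
```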

RunName -> str

The name given to the particular training session of the Graph Neural Network.

Caller -> str

A string controlling the verbose information prefix.

Verbose -> int

An integer which increases the verbosity of the framework, with 3 being the highest and 0 the lowest.

DebugMode -> bool

Expects a boolean. If set to True, a complete printout of the training is displayed.

Chunks -> int

An integer which regulates the number of entries to process per core. This is particularly relevant when constructing events, so as to avoid memory issues. For example, if Threads is set to 2 and Chunks is set to 10, then 10 events will be processed per core.
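The interplay between Chunks and Threads can be sketched with the standard library. The helpers make_chunks and process_chunk are hypothetical, and a thread pool is used here only as a convenient stand-in for whatever parallelism the framework applies internally.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(entries, chunks):
    # Split the entries into consecutive slices of at most `chunks` items.
    return [entries[i:i + chunks] for i in range(0, len(entries), chunks)]


def process_chunk(chunk):
    # Placeholder for the real work, e.g. constructing events.
    return [entry * 2 for entry in chunk]


entries = list(range(25))
threads, chunks = 2, 10

work = make_chunks(entries, chunks)
with ThreadPoolExecutor(max_workers=threads) as pool:
    # map() returns the results in the same order as the submitted chunks.
    results = list(pool.map(process_chunk, work))

flat = [value for chunk in results for value in chunk]
```

With 25 entries and a chunk size of 10, the workers receive slices of 10, 10 and 5 entries, so no worker ever holds more than 10 entries at once.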

Threads -> int

The number of CPU threads to use for running the framework. If the number of threads is set to 1, then the framework will not print a progress bar.

Device -> str

The device used to run PyTorch training on. Options are cuda or cpu.

TrainingName -> str

Name of the training sample to be used.

SortByNodes -> bool

Sort the input graph sample by nodes. This is useful when the model is node agnostic, but requires recomputation of internal variables based on variable graph node sizes. For instance, when computing the combinatorial of a graph, it is faster to compute the combinations for n-nodes and batch n-sized graphs into a single sample set.
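The benefit described above can be sketched by grouping graphs by node count and computing the node combinations once per group. The dictionary-based graph records and the helper group_by_nodes are hypothetical; real graph objects would carry far more information.

```python
from collections import defaultdict
from itertools import combinations


def group_by_nodes(graphs):
    # Collect graphs with the same number of nodes into one group.
    groups = defaultdict(list)
    for graph in graphs:
        groups[graph["num_nodes"]].append(graph)
    return dict(sorted(groups.items()))


graphs = [
    {"hash": "g1", "num_nodes": 4},
    {"hash": "g2", "num_nodes": 3},
    {"hash": "g3", "num_nodes": 4},
]

grouped = group_by_nodes(graphs)

# The node pairs depend only on n, so they are computed once per group
# instead of once per graph.
pairs = {n: list(combinations(range(n), 2)) for n in grouped}
```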

ContinueTraining -> bool

Whether to continue the training from the last known checkpoint (after each epoch).

KinematicMap -> dict

Placeholder.

EnableReconstruction -> bool

Placeholder.

PlotLearningMetrics -> bool

Whether to output various metric plots whilst training. This can be enabled before training or re-run after training from the training cache.