pygama.evt package¶

Utilities for grouping hit data into events.

Submodules¶

pygama.evt.aggregators module¶

This module provides aggregators to build the evt tier.

pygama.evt.aggregators.evaluate_at_channel(datainfo, tcm, channels, channels_skip, expr, field_list, ch_comp, pars_dict=None, default_value=nan, channel_mapping=None)¶

Aggregates by evaluating the expression at a given channel.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
tcm – TCM data arrays in an object that can be accessed by attribute.
channels – list of channels to be included for evaluation.
channels_skip – list of channels to be skipped from evaluation and set to default value.
expr – expression string to be evaluated.
field_list – list of dsp/hit/evt parameter tuples in expression (tier, field).
ch_comp – array of rawids at which the expression is evaluated.
pars_dict – dictionary of evt and additional parameters and their values.
default_value – default value.

Return type:

Array
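To make "evaluating at a given channel" concrete, here is a minimal pure-Python sketch of the idea. It does not call pygama: `pick_at_channel` and the per-event dictionaries are hypothetical stand-ins for the data that the real function looks up through the TCM, and the handling of skipped channels is an assumption based on the parameter descriptions above.

```python
import math


def pick_at_channel(values_by_channel, ch_comp, channels_skip=(), default_value=math.nan):
    """For each event, return the value of the channel named in ch_comp.

    values_by_channel: one dict per event, mapping rawid -> evaluated expression.
    ch_comp: one rawid per event, i.e. the channel at which to evaluate.
    """
    out = []
    for event_values, rawid in zip(values_by_channel, ch_comp):
        # Skipped channels and channels without data fall back to the default.
        if rawid in channels_skip or rawid not in event_values:
            out.append(default_value)
        else:
            out.append(event_values[rawid])
    return out


# Toy data: three events, each with the expression evaluated per channel.
events = [
    {1084803: 1460.8, 1084804: 30.2},
    {1121600: 2614.5},
    {1084804: 583.2},
]
ch_comp = [1084803, 1121600, 1084803]
result = pick_at_channel(events, ch_comp)
```

In this toy input the third event has no entry for the requested channel, so its output slot holds the default value.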

pygama.evt.aggregators.evaluate_at_channel_vov(datainfo, tcm, expr, field_list, ch_comp, channels, channels_skip, pars_dict=None, default_value=nan, channel_mapping=None)¶

Same as evaluate_at_channel(), but evaluates the expression at channels given as a non-flat VectorOfVectors.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
tcm – TCM data arrays in an object that can be accessed by attribute.
expr – expression string to be evaluated.
field_list – list of dsp/hit/evt parameter tuples in expression (tier, field).
ch_comp – array of “rawid”s at which the expression is evaluated.
channels – list of channels to be included for evaluation.
channels_skip – list of channels to be skipped from evaluation and set to default value.
pars_dict – dictionary of evt and additional parameters and their values.
default_value – default value.

Return type:

VectorOfVectors

pygama.evt.aggregators.evaluate_to_aoesa(datainfo, tcm, channels, channels_skip, expr, field_list, query, n_rows, pars_dict=None, default_value=nan, missing_value=nan, channel_mapping=None)¶

Aggregates by returning an ArrayOfEqualSizedArrays of evaluated expressions of channels that fulfill a query expression.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
tcm – TCM data arrays in an object that can be accessed by attribute.
channels – list of channels to be aggregated.
channels_skip – list of channels to be skipped from evaluation and set to default value.
expr – expression string to be evaluated.
field_list – list of dsp/hit/evt parameter tuples in expression (tier, field).
query – query expression to mask aggregation.
n_rows – length of output ArrayOfEqualSizedArrays.
ch_comp – array of “rawid”s at which the expression is evaluated.
pars_dict – dictionary of evt and additional parameters and their values.
default_value – default value.
missing_value – missing value.
sorter – sorts the entries in the vector according to sorter expression.

Return type:

ArrayOfEqualSizedArrays
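The difference between default_value and missing_value is easiest to see on a small example. The sketch below is an illustration only, not pygama's implementation: it assumes one output slot per channel, with missing_value where a channel has no entry in the event and default_value where a channel is skipped or fails the query. The function name `to_equal_sized` and the callable `query` are hypothetical.

```python
import math


def to_equal_sized(events, channels, channels_skip=(), query=lambda value: True,
                   default_value=math.nan, missing_value=math.nan):
    """Build one equally sized row per event, with one slot per channel."""
    rows = []
    for event in events:
        row = []
        for ch in channels:
            if ch not in event:
                # Channel did not take part in this event.
                row.append(missing_value)
            elif ch in channels_skip or not query(event[ch]):
                # Channel is present but skipped or masked by the query.
                row.append(default_value)
            else:
                row.append(event[ch])
        rows.append(row)
    return rows


channels = ["ch1", "ch2", "ch3"]
events = [{"ch1": 40.0, "ch3": 10.0}, {"ch2": 100.0}]
table = to_equal_sized(events, channels, query=lambda e: e > 25,
                       default_value=0.0, missing_value=-1.0)
```

Every row has the same length regardless of how many channels fired, which is what distinguishes this output from a VectorOfVectors.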

pygama.evt.aggregators.evaluate_to_first_or_last(datainfo, tcm, channels, channels_skip, expr, field_list, query, n_rows, sorter, pars_dict=None, default_value=nan, is_first=True, channel_mapping=None)¶

Aggregates across channels by returning the expression of the channel with the smallest or largest value of sorter.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
tcm – TCM data arrays in an object that can be accessed by attribute.
channels – list of channels to be aggregated.
channels_skip – list of channels to be skipped from evaluation and set to default value.
expr – expression string to be evaluated.
field_list – list of dsp/hit/evt parameter tuples in expression (tier, field).
query – query expression to mask aggregation.
n_rows – length of output array.
sorter – tuple of field in hit/dsp/evt tier to evaluate (tier, field).
pars_dict – dictionary of evt and additional parameters and their values.
default_value – default value.
is_first (bool) – if True, the channel with the smallest value of sorter is selected; otherwise the channel with the largest value.

Return type:

Array

pygama.evt.aggregators.evaluate_to_scalar(datainfo, tcm, mode, channels, channels_skip, expr, field_list, query, n_rows, pars_dict=None, default_value=nan, channel_mapping=None)¶

Aggregates across channels into one value per event according to mode (for example, by summation).

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
tcm – TCM data arrays in an object that can be accessed by attribute.
mode – aggregation mode.
channels – list of channels to be aggregated.
channels_skip – list of channels to be skipped from evaluation and set to default value.
expr – expression string to be evaluated.
field_list – list of dsp/hit/evt parameter tuples in expression (tier, field).
query – query expression to mask aggregation.
n_rows – length of output array.
pars_dict – dictionary of evt and additional parameters and their values.
default_value – default value.

Return type:

Array

pygama.evt.aggregators.evaluate_to_vector(datainfo, tcm, channels, channels_skip, expr, field_list, query, n_rows, pars_dict=None, default_value=nan, sorter=None, channel_mapping=None)¶

Aggregates by returning a VectorOfVectors of evaluated expressions of channels that fulfill a query expression.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
tcm – TCM data arrays in an object that can be accessed by attribute.
channels – list of channels to be aggregated.
channels_skip – list of channels to be skipped from evaluation and set to default value.
expr – expression string to be evaluated.
field_list – list of dsp/hit/evt parameter tuples in expression (tier, field).
query – query expression to mask aggregation.
n_rows – length of output VectorOfVectors.
ch_comp – array of “rawids” at which the expression is evaluated.
pars_dict – dictionary of evt and additional parameters and their values.
default_value – default value.
sorter – sorts the entries in the vector according to sorter expression. ascend_by:<hit|dsp.field> results in a vector ordered ascending, descend_by:<hit|dsp.field> sorts descending.

Return type:

VectorOfVectors
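The sorter string combines a direction and a field, separated by a colon. As a rough sketch of how such a string could be interpreted for a single event (the helper `sort_gathered` is hypothetical, and the lookup of the field values is replaced by a plain list):

```python
def sort_gathered(values, sort_keys, sorter=None):
    """Reorder the gathered values of one event according to a sorter string.

    values: per-channel results gathered for the event.
    sort_keys: the sorter field evaluated for the same channels.
    sorter: e.g. "ascend_by:dsp.tp_0_est"; None keeps the input order.
    """
    if sorter is None:
        return list(values)
    mode, _field = sorter.split(":", 1)
    # In this sketch, anything other than "ascend_by" is treated as the
    # descending variant.
    descending = mode != "ascend_by"
    order = sorted(range(len(values)), key=lambda i: sort_keys[i], reverse=descending)
    return [values[i] for i in order]


channel_ids = [1084804, 1084803, 1121600]
tp_0_est = [48210.0, 47950.0, 48500.0]
ordered = sort_gathered(channel_ids, tp_0_est, "ascend_by:dsp.tp_0_est")
```

Here the channel with the earliest tp_0_est comes first, which is the ordering an energy_id-style field would have when sorted by rise time.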

pygama.evt.build_evt module¶

This module implements routines to build the evt tier.

pygama.evt.build_evt.build_evt(datainfo, config, wo_mode='write_safe', buffer_len=10000)¶

Transform data from hit-structured tiers to event-structured data.

Parameters:

datainfo (DataInfo | Mapping[str, Sequence[str, ...]]) –

input and output LH5 datainfo with HDF5 groups where tables are found (see utils.DataInfo). Example:

# syntax: {"tier-name": ("file-name", "hdf5-group"[, "table-format"])}
{
  "tcm": ("data-tier_tcm.lh5", "hardware_tcm_1"),
  "dsp": ("data-tier_dsp.lh5", "dsp", "ch{}"),
  "hit": ("data-tier_hit.lh5", "hit", "ch{}"),
  "evt": ("data-tier_evt.lh5", "evt")
}

config (str | Mapping[str, ...]) –

name of configuration file or dictionary defining event fields. Channel lists can be defined by importing a metadata module.

channels specifies the channels used for this field (either a string or a list of strings).
channel_mapping specifies a dictionary that maps the channel to a name.
operations defines the event fields (name=key). If the key contains slashes, it is interpreted as the path to the output field inside nested sub-tables.
outputs defines the fields that are actually included in the output table.

Inside the operations block:

aggregation_mode defines how the channels should be combined (see evaluate_expression()).
expression defines the expression or function call to apply (see evaluate_expression()).
query defines an expression to mask the aggregation.
parameters defines any other parameter used in expression.
dtype defines the NumPy data type of the resulting data.
initial defines the initial/default value. Useful with some types of aggregators.

For example:

{
  "channels": {
    "geds_on": ["ch1084803", "ch1084804", "ch1121600"],
    "spms_on": ["ch1057600", "ch1059201", "ch1062405"],
    "muon": "ch1027202"
  },
  "channelmap" : {
    "ch1084803": "Gethin",
    "ch1084804": "Gertrude",
    "ch1121600": "Geoffrey",
    "ch1057600": "Simon",
    "ch1059201": "Sinbad",
    "ch1062405": "Silvia",
    "ch1027202": "Mulan"
  },
  "outputs": ["energy_id", "multiplicity"],
  "operations": {
    "energy_id":{
      "channels": "geds_on",
      "aggregation_mode": "gather",
      "query": "hit.cuspEmax_ctc_cal > 25",
      "expression": "tcm.table_key",
      "sort": "ascend_by:dsp.tp_0_est"
    },
    "energy":{
      "aggregation_mode": "keep_at_ch:evt.energy_id",
      "expression": "hit.cuspEmax_ctc_cal > 25"
    },
    "is_muon_rejected":{
      "channels": "muon",
      "aggregation_mode": "any",
      "expression": "dsp.wf_max > a",
      "parameters": {"a": 15100},
      "initial": false
    },
    "multiplicity":{
      "channels":  ["geds_on", "geds_no_psd", "geds_ac"],
      "aggregation_mode": "sum",
      "expression": "hit.cuspEmax_ctc_cal > a",
      "parameters": {"a": 25},
      "initial": 0
    },
    "t0":{
      "aggregation_mode": "keep_at_ch:evt.energy_id",
      "expression": "dsp.tp_0_est",
      "initial": "np.nan"
    },
    "lar_energy":{
      "channels": "spms_on",
      "aggregation_mode": "function",
      "expression": "pygama.evt.modules.spms.gather_pulse_data(<...>, observable='hit.energy_in_pe')"
    }
  }
}

wo_mode (str) – writing mode, see lh5.io.core.write().

Return type:

None | Table
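Before handing a configuration to build_evt(), it can help to check its internal consistency. The following is a minimal sketch using only the standard library; `check_config` is a hypothetical helper, not part of pygama, and it assumes (based on the example above) that every name in outputs should appear in operations and that channels entries inside an operation refer to groups defined in the top-level channels block.

```python
import json

# A trimmed-down version of the example configuration, as JSON text.
config_text = """
{
  "channels": {"geds_on": ["ch1084803", "ch1084804", "ch1121600"]},
  "outputs": ["energy_id", "multiplicity"],
  "operations": {
    "energy_id": {
      "channels": "geds_on",
      "aggregation_mode": "gather",
      "query": "hit.cuspEmax_ctc_cal > 25",
      "expression": "tcm.table_key"
    },
    "multiplicity": {
      "channels": "geds_on",
      "aggregation_mode": "sum",
      "expression": "hit.cuspEmax_ctc_cal > a",
      "parameters": {"a": 25},
      "initial": 0
    }
  }
}
"""


def check_config(config):
    """Return a list of human-readable consistency problems (empty if none)."""
    problems = []
    operations = config.get("operations", {})
    groups = config.get("channels", {})
    for name in config.get("outputs", []):
        if name not in operations:
            problems.append(f"output '{name}' has no matching operation")
    for name, op in operations.items():
        refs = op.get("channels", [])
        if isinstance(refs, str):  # a single group name is allowed
            refs = [refs]
        for ref in refs:
            if ref not in groups:
                problems.append(f"operation '{name}' uses undefined channel group '{ref}'")
    return problems


config = json.loads(config_text)
problems = check_config(config)
```

Parsing with `json.loads` also catches syntax slips such as trailing commas before the configuration reaches the event builder.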

pygama.evt.build_evt.build_evt_cols(datainfo, config, channels, wo_mode='write_safe', buffer_len=10000, channel_mapping=None)¶

Iterates through the TCM file and builds the event table according to the configuration file. The event table is written to the output file if an evt output file is specified; otherwise it is returned.

Parameters:

datainfo (DataInfo | Mapping[str, Sequence[str, ...]]) –

input and output LH5 datainfo with HDF5 groups where tables are found (see utils.DataInfo). Example:

# syntax: {"tier-name": ("file-name", "hdf5-group"[, "table-format"])}
{
  "tcm": ("data-tier_tcm.lh5", "hardware_tcm_1"),
  "dsp": ("data-tier_dsp.lh5", "dsp", "ch{}"),
  "hit": ("data-tier_hit.lh5", "hit", "ch{}"),
  "evt": ("data-tier_evt.lh5", "evt")
}

config (dict) – dict as defined in the build_evt() function.
channels (list) – list of channels to be used in the event table.
wo_mode (str) – writing mode, see lh5.io.core.write().
buffer_len – number of rows to be processed at once.
channel_mapping (dict | None) – dictionary that maps the channel to a name. This can be used in functions to get the channel name instead of the channel number.

Returns:

None – if an evt output file is specified; otherwise the event table is returned.

Return type:

None | Table

pygama.evt.build_evt.evaluate_expression(datainfo, tcm, channels, channels_skip, mode, expr, n_rows, table=None, parameters=None, query=None, default_value=nan, sorter=None, channel_mapping=None)¶

Evaluates the expression defined by the user across all channels according to the mode.

Parameters:

datainfo (DataInfo | Mapping[str, Sequence[str, ...]]) – input and output LH5 files with HDF5 groups where tables are found. (see utils.DataInfo)
tcm (TCMData) – tcm data structure (see utils.TCMData)
channels (Sequence[str]) – list of channel names across which expression gets evaluated
channels_skip (Sequence[list]) – list of channels which get set to default value during evaluation. In function mode they are removed entirely.
mode (str) –
The mode determines how the event entry is calculated across channels. Options are:
- first_at:sorter: aggregates across channels by returning the expression of the channel with smallest value of sorter.
- last_at:sorter: aggregates across channels by returning the expression of the channel with largest value of sorter.
- sum: aggregates by summation.
- any: aggregates by logical or.
- all: aggregates by logical and.
- keep_at_ch:ch_field: aggregates according to passed ch_field.
- keep_at_idx:tcm_idx_field: aggregates according to passed tcm index field.
- gather: channels are not combined, but result saved as VectorOfVectors.
- function: the function call specified in expr is evaluated, and the resulting column is inserted into the output table.
query (str | None) – a query that can mask the aggregation.
expr (str) – the expression, which can be any mathematical equation or comparison. If mode is function, the expression needs to be a special processing function defined in modules. The expression can use parameters from evt or lower tiers (only from operations performed before this one: the order of the operations dictionary matters) or from the parameters field. Fields can be prefixed with the tier id (e.g. evt.energy or hit.quality_flag).
n_rows (int) – number of rows to be processed.
table (Table) – table of evt tier data.
parameters (Mapping[str, Any] | None) – dictionary of parameters defined in the parameters field in the configuration dictionary.
default_value (bool | int | float) – default value of evaluation.
sorter (str | None) – can be used to sort vector outputs according to sorter expression (see evaluate_to_vector()).
channel_mapping (dict | None) – dictionary that maps the channel to a name. This can be used in functions to get the channel name instead of the channel number.

Return type:

Array | ArrayOfEqualSizedArrays | VectorOfVectors

Note

The specification of custom functions that can be used as expression is documented in modules.
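The aggregation modes listed above can be illustrated on the per-channel values of a single event. This is a simplified sketch in plain Python, not the pygama implementation: the function `aggregate` is hypothetical, the mode is passed without its `:sorter` suffix, and the sorter values are supplied as a separate list.

```python
def aggregate(mode, values, sorter_values=None):
    """Combine the per-channel values of one event according to mode."""
    if mode == "sum":
        return sum(values)
    if mode == "any":
        return any(values)
    if mode == "all":
        return all(values)
    if mode == "gather":
        # Channels are not combined; the per-channel values are kept.
        return list(values)
    if mode in ("first_at", "last_at"):
        # Pick the channel with the smallest (first) or largest (last) sorter value.
        pick = min if mode == "first_at" else max
        idx = pick(range(len(values)), key=lambda i: sorter_values[i])
        return values[idx]
    raise ValueError(f"unsupported mode: {mode}")


# One event seen by three channels.
energies = [1460.8, 30.2, 583.2]
t0 = [48210.0, 47950.0, 48500.0]

above_threshold = [e > 25 for e in energies]
multiplicity = aggregate("sum", above_threshold)    # booleans add up like 0/1
any_above = aggregate("any", above_threshold)
first_energy = aggregate("first_at", energies, t0)  # channel with smallest t0
last_energy = aggregate("last_at", energies, t0)    # channel with largest t0
```

Summing a boolean expression, as in the multiplicity field of the example configuration, counts the channels that satisfy it.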

pygama.evt.build_tcm module¶

pygama.evt.build_tcm._concat_tables(tbls)¶

pygama.evt.build_tcm.build_tcm(input_tables, coin_cols, hash_func='\\d+', coin_windows=0, window_refs='last', out_file=None, out_name='tcm', wo_mode='write_safe', buffer_len=None, out_fields=None)¶

Build a Time Coincidence Map (TCM).

Given a list of input tables, create an output table containing an entry list of coincidences among the inputs. Uses evt.tcm.generate_tcm_cols(). For use with the DataLoader.

Parameters:

input_tables (list[tuple[str, str | list[str]]]) – Each entry is (filename, table_name_pattern). table_name_pattern may be a string or list of strings. All tables matching each pattern in filename will be used as input tables.
coin_cols (str | list[str]) – Name of the column (or columns) in each table used to build coincidences. All input tables must contain these columns.
hash_func (str | None) – mapping of table names to integers for use in the TCM. hash_func is a regexp pattern that acts on each table name. The default hash_func r"\d+" pulls the first integer out of the table name. Setting to None will use a table’s index in input_tables.
coin_windows (float | list[float]) – Width of the clustering window(s). If a single value is supplied it will be used for all coin_cols.
window_refs (str | list[str]) – Window reference for the clustering window. Currently only "last" is implemented.
out_file (str | None) – name (including path) for the output file. If None, no file will be written; the TCM will just be returned in memory.
out_name (str) – name for the TCM table in the output file.
wo_mode (str) – mode to send to write().
out_fields (str | list[str] | None) – Optional additional fields to propagate from the input tables into the output TCM.

Returns:

lgdo.Table or None – If out_file is None the resulting TCM is returned as a lgdo.Table. Otherwise None is returned after writing the table to out_file.

Return type:

Table | None
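The hash_func behaviour described above can be reproduced with the standard `re` module. The helper below is a sketch for illustration (its name and the `index` argument are assumptions, not pygama API): it returns the first match of the pattern as an integer, or the table's position in the input list when hash_func is None.

```python
import re


def table_name_to_key(name, hash_func=r"\d+", index=0):
    """Map a table name to the integer key used in the TCM."""
    if hash_func is None:
        # Fall back to the table's position in the list of input tables.
        return index
    match = re.search(hash_func, name)
    if match is None:
        raise ValueError(f"pattern {hash_func!r} does not match table name {name!r}")
    return int(match.group())


key = table_name_to_key("ch1084803/hit")
fallback = table_name_to_key("ch1084803/hit", hash_func=None, index=2)
```

With the default pattern, a table named `ch1084803/hit` is keyed by `1084803`, which matches the channel naming used in the datainfo examples.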

pygama.evt.tcm module¶

class pygama.evt.tcm.coin_groups(name, window, window_ref)¶

Bases: tuple

Create new instance of coin_groups(name, window, window_ref)

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults = {}¶

_fields = ('name', 'window', 'window_ref')¶

classmethod _make(iterable)¶: Make a new coin_groups object from a sequence or iterable

_replace(**kwds)¶: Return a new coin_groups object replacing specified fields with new values

name¶: Alias for field number 0

window¶: Alias for field number 1

window_ref¶: Alias for field number 2

pygama.evt.tcm.generate_tcm_cols(iterators, coin_windows=0, table_keys=None, row_in_tables=None, fields=None)¶

Generate the columns of a time coincidence map.

Generate the columns of a time coincidence map from a list of arrays of coincidence data (e.g. hit times from different channels). Returns 3 numpy.ndarrays representing a vector-of-vector-like structure: two flattened arrays table_key (e.g. channel number) and row_in_table (e.g. hit index) that specify the location in the input coin_data of each datum belonging to a coincidence event, and a cumulative_length array that specifies which rows of the other two output arrays correspond to which coincidence event. These can be used to retrieve other data at the same tier as the input data into coincidence structures.

The 0’th entry of cumulative_length contains the number of hits in the zeroth coincidence event, and the i’th entry is set to cumulative_length[i-1] plus the number of hits in the i’th event. Thus, the hits of the i’th event can be found in rows cumulative_length[i-1] to cumulative_length[i] - 1 of table_key and row_in_table.

An example: cumulative_length = [4, 7, ...]. Then rows 0 to 3 in table_key and row_in_table correspond to the hits in event 0, rows 4 to 6 correspond to event 1, and so on.
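The indexing rule above translates directly into slicing. The sketch below uses plain Python lists instead of NumPy arrays, and the contents of `table_key` and `row_in_table` are made-up values chosen to match the `cumulative_length = [4, 7]` example; `hits_of_event` is a hypothetical helper.

```python
def hits_of_event(i, cumulative_length, table_key, row_in_table):
    """Return the (table_key, row_in_table) pairs belonging to event i."""
    # Event 0 starts at row 0; later events start where the previous one ended.
    start = cumulative_length[i - 1] if i > 0 else 0
    stop = cumulative_length[i]
    return list(zip(table_key[start:stop], row_in_table[start:stop]))


cumulative_length = [4, 7]
table_key = [1, 2, 3, 5, 2, 3, 4]      # e.g. channel numbers
row_in_table = [0, 0, 0, 0, 1, 1, 0]   # e.g. hit index within each channel

event0 = hits_of_event(0, cumulative_length, table_key, row_in_table)
event1 = hits_of_event(1, cumulative_length, table_key, row_in_table)
```

Event 0 covers rows 0 to 3 and event 1 covers rows 4 to 6, as in the text.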

This implementation uses Awkward Arrays for concatenation, sorting, and clustering.

Parameters:

coin_data – a list of arrays of the data to be clustered.
coin_window – the clustering window width. coin_data within the coin_window get aggregated into the same coincidence cluster. A value of 0 means an equality test.
window_ref –
when testing one datum for inclusion in a cluster, test if it is within coin_window of
- "first" – the first element in the cluster (rigid window width)
- "last" – the last element in the cluster (window grows until two data are separated by more than coin_window)
table_keys (list[int] | None) – if provided, use table_keys in place of “index in coin_data” as the integer corresponding to each element of coin_data (e.g. a channel number).
row_in_tables (list[int] | None) – if provided, use these values in place of the DataFrame index for the return values of row_in_table.

Returns:

col_dict – keys are cumulative_length, table_key, and row_in_table. cumulative_length specifies which rows of the other two output arrays correspond to which coincidence event. table_key and row_in_table specify the location in coin_data of each datum belonging to the coincidence event.

Return type:

dict[ndarray]
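The difference between the two window references is easiest to see on a short list of times. The following is a simplified pure-Python sketch of the clustering rule as described above, not the Awkward Array implementation; `cluster` is a hypothetical function, it expects already sorted input, and treating the window boundary as inclusive is an assumption (chosen so that a window of 0 reduces to an equality test).

```python
from itertools import accumulate


def cluster(sorted_times, coin_window=0, window_ref="last"):
    """Group sorted times into coincidence clusters."""
    if window_ref not in ("first", "last"):
        raise ValueError(f"unknown window_ref: {window_ref}")
    clusters = []
    for t in sorted_times:
        if clusters:
            current = clusters[-1]
            # Compare against the first or the most recent element of the cluster.
            ref = current[0] if window_ref == "first" else current[-1]
            if t - ref <= coin_window:
                current.append(t)
                continue
        clusters.append([t])
    return clusters


times = [0, 3, 6, 20, 21]
growing = cluster(times, coin_window=4, window_ref="last")
rigid = cluster(times, coin_window=4, window_ref="first")

# The cumulative_length column is the running sum of the cluster sizes.
cumulative_length = list(accumulate(len(c) for c in growing))
```

With `"last"` the window keeps growing as long as consecutive data are close enough, so 0, 3 and 6 end up in one cluster; with `"first"` the datum at 6 is more than 4 away from the first element and starts a new cluster.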

pygama.evt.utils module¶

This module provides utilities to build the evt tier.

class pygama.evt.utils.DataInfo(raw, tcm, evt)¶

Bases: tuple

Create new instance of DataInfo(raw, tcm, evt)

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults = {'evt': None, 'raw': None, 'tcm': None}¶

_fields = ('raw', 'tcm', 'evt')¶

classmethod _make(iterable)¶: Make a new DataInfo object from a sequence or iterable

_replace(**kwds)¶: Return a new DataInfo object replacing specified fields with new values

evt¶: Alias for field number 2

raw¶: Alias for field number 0

tcm¶: Alias for field number 1

class pygama.evt.utils.H5DataLoc(file, group, table_fmt)¶

Bases: tuple

Create new instance of H5DataLoc(file, group, table_fmt)

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults = {'file': None, 'group': None, 'table_fmt': None}¶

_fields = ('file', 'group', 'table_fmt')¶

classmethod _make(iterable)¶: Make a new H5DataLoc object from a sequence or iterable

_replace(**kwds)¶: Return a new H5DataLoc object replacing specified fields with new values

file¶: Alias for field number 0

group¶: Alias for field number 1

table_fmt¶: Alias for field number 2
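As a sketch of how a named tuple with these fields and defaults behaves, the snippet below recreates an equivalent type with `collections.namedtuple` and fills it from a datainfo-style mapping. `H5DataLocSketch` is a stand-in defined here for illustration; the snippet does not import pygama and does not claim to mirror how pygama builds its objects internally.

```python
from collections import namedtuple

# Same field names and None defaults as documented for H5DataLoc.
H5DataLocSketch = namedtuple(
    "H5DataLocSketch", ("file", "group", "table_fmt"), defaults=(None, None, None)
)

datainfo = {
    "tcm": ("data-tier_tcm.lh5", "hardware_tcm_1"),
    "dsp": ("data-tier_dsp.lh5", "dsp", "ch{}"),
    "hit": ("data-tier_hit.lh5", "hit", "ch{}"),
    "evt": ("data-tier_evt.lh5", "evt"),
}

# Two-element specs leave table_fmt at its default of None.
locations = {tier: H5DataLocSketch(*spec) for tier, spec in datainfo.items()}

dsp_table = locations["dsp"].table_fmt.format(1084803)
```

Entries are accessible both by attribute (`locations["dsp"].group`) and by position, which is what the "Alias for field number" entries above refer to.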

class pygama.evt.utils.TCMData(table_key, row_in_table)¶

Bases: tuple

Create new instance of TCMData(table_key, row_in_table)

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults = {}¶

_fields = ('table_key', 'row_in_table')¶

classmethod _make(iterable)¶: Make a new TCMData object from a sequence or iterable

_replace(**kwds)¶: Return a new TCMData object replacing specified fields with new values

row_in_table¶: Alias for field number 1

table_key¶: Alias for field number 0

pygama.evt.utils.copy_lgdo_attrs(obj)¶

pygama.evt.utils.find_parameters(datainfo, ch, idx_ch, field_list)¶

Finds and returns parameters from tiers other than tcm and evt.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
ch – “rawid” in the tiers.
idx_ch – index array of entries to be read from datainfo.
field_list – list of tuples (tier, field) to be found in tiers other than tcm and evt.

Return type:

dict

pygama.evt.utils.get_data_at_channel(datainfo, ch, tcm, expr, field_list, pars_dict)¶

Evaluates an expression and returns the result.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
ch – “rawid” of channel to be evaluated.
tcm – TCM data arrays in an object that can be accessed by attribute.
expr – expression to be evaluated.
field_list – list of parameter-tuples (root_group, field) found in the expression.
pars_dict – dict of additional parameters that are not channel dependent.
is_evaluated – if false, the expression does not get evaluated but an array of default values is returned.
default_value – default value.

Return type:

ndarray

pygama.evt.utils.get_mask_from_query(datainfo, query, length, ch, idx_ch)¶

Evaluates a query expression and returns a mask accordingly.

Parameters:

datainfo – input and output LH5 datainfo with HDF5 groups where tables are found.
query – query expression.
length – length of the return mask.
ch – “rawid” of channel to be evaluated.
idx_ch – channel indices to be read.

Return type:

ndarray

pygama.evt.utils.get_table_name_by_pattern(table_id_fmt, ch_id)¶

Return type: str

pygama.evt.utils.get_tcm_id_by_pattern(table_id_fmt, ch)¶

Return type: int
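These two helpers are not described further here. Judging only from their signatures and from the "ch{}" table format used in the datainfo examples, they plausibly convert between a numeric channel id and a table name. The sketch below shows one way such a round trip could work; the function names and behaviour are assumptions for illustration, not the documented pygama implementation.

```python
def table_name_from_id(table_id_fmt, ch_id):
    """Fill a format pattern such as "ch{}" with a channel id."""
    return table_id_fmt.format(ch_id)


def id_from_table_name(table_id_fmt, name):
    """Invert the pattern: strip the fixed prefix and suffix, keep the integer."""
    prefix, _, rest = table_id_fmt.partition("{")
    suffix = rest.partition("}")[2]
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError(f"{name!r} does not match pattern {table_id_fmt!r}")
    return int(name[len(prefix):len(name) - len(suffix)])


name = table_name_from_id("ch{}", 1084803)
ch_id = id_from_table_name("ch{}", name)
```

The two functions are inverses of each other for simple patterns with a single `{}` placeholder.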

pygama.evt.utils.make_files_config(data)¶

pygama.evt.utils.make_numpy_full(size, fill_value, try_dtype)¶
