pygama.lgdo package#

Pygama works with “LEGEND Data Objects” (LGDO) defined in the LEGEND data format specification. This subpackage serves as the Python implementation of that specification. The general strategy for the implementation is to dress standard Python and NumPy objects with an attr dictionary holding LGDO metadata, plus some convenience functions. The basic data object classes are:

  • Scalar: typed Python scalar. Access data via the value attribute.

  • Array: basic numpy.ndarray. Access data via the nda attribute.

  • FixedSizeArray: basic numpy.ndarray. Access data via the nda attribute.

  • ArrayOfEqualSizedArrays: multi-dimensional numpy.ndarray. Access data via the nda attribute.

  • VectorOfVectors: a variable length array of variable length arrays. Implemented as a pair of Array: flattened_data holding the raw data, and cumulative_length whose ith element is the sum of the lengths of the vectors with index <= i

  • Struct: a dictionary containing LGDO objects. Derives from dict

  • Table: a Struct whose elements (“columns”) are all array types with the same length (number of rows)
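The VectorOfVectors layout described in the list above can be illustrated with a minimal sketch. Plain Python lists stand in for the two Array objects, and the helper name vector_at is hypothetical (it is not part of pygama):

```python
# Three vectors of different lengths: [1, 2], [3, 4, 5], [6]
flattened_data = [1, 2, 3, 4, 5, 6]
# cumulative_length[i] = total length of the vectors with index <= i
cumulative_length = [2, 5, 6]


def vector_at(i):
    """Return the i-th vector by slicing the flattened data (hypothetical helper)."""
    start = cumulative_length[i - 1] if i > 0 else 0
    stop = cumulative_length[i]
    return flattened_data[start:stop]


vectors = [vector_at(i) for i in range(len(cumulative_length))]
print(vectors)
```

The last element of cumulative_length is always the number of used entries in flattened_data.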

Currently the primary on-disk format for LGDO objects is LEGEND HDF5 (LH5) files. IO is done via the class lh5_store.LH5Store. LH5 files can also be browsed easily in Python like any HDF5 file using h5py.

Submodules#

pygama.lgdo.array module#

Implements a LEGEND Data Object representing an n-dimensional array and corresponding utilities.

class pygama.lgdo.array.Array(nda: Optional[ndarray] = None, shape: tuple[int, ...] = (), dtype: Optional[dtype] = None, fill_val: Optional[Union[float, int]] = None, attrs: Optional[dict[str, Any]] = None)#

Bases: object

Holds a numpy.ndarray and attributes.

Array (and the other various array types) holds an nda instead of deriving from numpy.ndarray for the following reasons:

  • It keeps management of the nda totally under the control of the user. The user can point it to another object’s buffer, grab the nda and toss the Array, etc.

  • It allows the management code to send just the nda’s to the central routines for data manipulation. Keeping LGDO’s out of that code allows for more standard, reusable, and (we expect) performant Python.

  • It allows the first axis of the nda to be treated as “special” for storage in Tables.
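As a rough sketch of the "hold, don't derive" design argued for above: the class MiniArray and the routine scale_in_place below are hypothetical, and a plain list stands in for the numpy.ndarray so the example needs no third-party packages.

```python
class MiniArray:
    """Hypothetical stand-in for an LGDO array: holds data plus attrs."""

    def __init__(self, nda, attrs=None):
        self.nda = nda  # used directly, not copied
        self.attrs = dict(attrs or {})


def scale_in_place(nda, factor):
    """A 'central routine' that only knows about plain buffers, not LGDOs."""
    for i, value in enumerate(nda):
        nda[i] = value * factor


buffer = [1.0, 2.0, 3.0]
arr = MiniArray(buffer, attrs={"units": "ns"})
scale_in_place(arr.nda, 2.0)  # pass only the raw data, not the wrapper
shares_buffer = arr.nda is buffer  # the wrapper still points at the user's buffer
```

Because the wrapper only holds a reference, the user keeps full control of the underlying buffer while the metadata travels alongside it.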

Parameters:
  • nda (np.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.

  • shape (tuple[int, ...]) – A numpy-format shape specification for shape of the internal ndarray. Required if nda is None, otherwise unused.

  • dtype (np.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.

  • fill_val (float | int) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.

  • attrs (dict[str, Any]) – A set of user attributes to be carried along with this LGDO.

datatype_name() str#

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype() str#

Return this LGDO’s datatype attribute string.

Return type:

str

resize(new_size: int) None#

Resize the array to new_size.

pygama.lgdo.arrayofequalsizedarrays module#

Implements a LEGEND Data Object representing an array of equal-sized arrays and corresponding utilities.

class pygama.lgdo.arrayofequalsizedarrays.ArrayOfEqualSizedArrays(dims: Optional[tuple[int, ...]] = None, nda: Optional[ndarray] = None, shape: tuple[int, ...] = (), dtype: Optional[dtype] = None, fill_val: Optional[Union[float, int]] = None, attrs: Optional[dict[str, Any]] = None)#

Bases: Array

An array of equal-sized arrays.

The arrays are of equal size within a file, but the size could differ from application to application. Canonical example: array of same-length waveforms.

Parameters:
  • dims (tuple[int, ...]) – specifies the dimensions required for building the ArrayOfEqualSizedArrays datatype attribute.

  • nda (numpy.ndarray) – A numpy.ndarray to be used for this object’s internal array. Note: the array is used directly, not copied. If not supplied, internal memory is newly allocated based on the shape and dtype arguments.

  • shape (tuple[int, ...]) – A NumPy-format shape specification for shape of the internal array. Required if nda is None, otherwise unused.

  • dtype (numpy.dtype) – Specifies the type of the data in the array. Required if nda is None, otherwise unused.

  • fill_val (int | float) – If None, memory is allocated without initialization. Otherwise, the array is allocated with all elements set to the corresponding fill value. If nda is not None, this parameter is ignored.

  • attrs (dict[str, Any]) – A set of user attributes to be carried along with this LGDO.

Notes

If shape is not “1D array of arrays of shape given by axes 1-N” (of nda) then specify the dimensionality split in the constructor.

See also

Array

datatype_name() str#

Returns the name for this LGDO’s datatype attribute.

Return type:

str

form_datatype() str#

Return this LGDO’s datatype attribute string.

Return type:

str

pygama.lgdo.fixedsizearray module#

Implements a LEGEND Data Object representing an n-dimensional array of fixed size and corresponding utilities.

class pygama.lgdo.fixedsizearray.FixedSizeArray(nda: Optional[ndarray] = None, shape: tuple[int, ...] = (), dtype: Optional[dtype] = None, fill_val: Optional[Union[float, int]] = None, attrs: Optional[dict[str, Any]] = None)#

Bases: Array

An array of fixed-size arrays.

Arrays with guaranteed shape along axes > 0: for example, an array of vectors will always have length 3 on axis 1, and it will never change from application to application. This data type is used for optimized memory handling on some platforms. We are not that sophisticated so we are just storing this identification for LGDO validity, i.e. for now this class is just an alias for Array, but keeps track of the datatype name.

See also

Array

datatype_name() str#

The name for this object’s HDF5 datatype attribute.

Return type:

str

pygama.lgdo.lgdo_utils module#

Implements utilities for LEGEND Data Objects.

pygama.lgdo.lgdo_utils.expand_path(path: str, list: bool = False) str | list#

Expand environment variables and wildcards to return an absolute path.

Parameters:
  • path (str) – name of path, which may include environment variables and wildcards

  • list (bool) – if True, return a list. If False, return a string; if False and a unique file is not found, raise an Exception

Returns:

path or list of paths – Unique absolute path, or list of all absolute paths

Return type:

str | list
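The behavior described above can be approximated with the standard library alone. This is a sketch, not the pygama implementation: the name expand_path_sketch, the as_list parameter (standing in for list), the choice of FileNotFoundError, and the LH5_SKETCH_DIR variable are all assumptions made for illustration.

```python
import glob
import os
import tempfile


def expand_path_sketch(path, as_list=False):
    """Expand environment variables and wildcards in ``path`` (hypothetical helper)."""
    expanded = os.path.expandvars(path)
    matches = sorted(os.path.abspath(p) for p in glob.glob(expanded))
    if as_list:
        return matches
    if len(matches) != 1:
        # ASSUMPTION: the source only says "raise an Exception"
        raise FileNotFoundError(f"expected one match for {path!r}, found {len(matches)}")
    return matches[0]


with tempfile.TemporaryDirectory() as tmp:
    os.environ["LH5_SKETCH_DIR"] = tmp  # made-up variable for this demo
    for name in ("run0.lh5", "run1.lh5"):
        open(os.path.join(tmp, name), "w").close()
    all_files = expand_path_sketch("$LH5_SKETCH_DIR/*.lh5", as_list=True)
    one_file = expand_path_sketch("$LH5_SKETCH_DIR/run0.lh5")
```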

pygama.lgdo.lgdo_utils.get_element_type(obj: object) str#

Get the LGDO element type of a scalar or array.

For use in LGDO datatype attributes.

Parameters:

obj (object) – if a str, will automatically return string; if the object has a numpy.dtype, that will be used for determining the element type; otherwise, will attempt to cast the type of the object to a numpy.dtype.

Returns:

element_type – A string stating the determined element type of the object.

Return type:

str

pygama.lgdo.lgdo_utils.parse_datatype(datatype: str) tuple[str, tuple[int, ...], str | list[str]]#

Parse datatype string and return type, dimensions and elements.

Parameters:

datatype (str) – a LGDO-formatted datatype string.

Returns:

  • element_type – the datatype name.

  • dims – if not None, a tuple of dimensions for the LGDO. Note this is not the same as the NumPy shape of the underlying data object. See the LGDO specification for more information. Also see ArrayOfEqualSizedArrays and lh5_store.LH5Store.read_object() for example code.

  • elements – for numeric objects, the element type; for struct-like objects, the list of fields in the struct.

Return type:

tuple[str, tuple[int, …], str | list[str]]
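To make the three return values concrete, here is a regular-expression sketch that splits datatype strings of the kind printed by show() elsewhere on this page. The function parse_datatype_sketch is hypothetical and only handles strings of the form name<dims>{elements} or name{fields}; how pygama treats other strings is not covered here.

```python
import re

# name, optional <dims>, then {elements}
_DATATYPE = re.compile(r"^(?P<name>\w+)(?:<(?P<dims>[\d,]+)>)?\{(?P<elements>.*)\}$")


def parse_datatype_sketch(datatype):
    """Split a datatype string into (name, dims, elements) (hypothetical helper)."""
    match = _DATATYPE.match(datatype)
    if match is None:
        raise ValueError(f"not handled by this sketch: {datatype!r}")
    name = match["name"]
    if match["dims"] is not None:
        # numeric object: dims tuple plus a single element type
        dims = tuple(int(d) for d in match["dims"].split(","))
        return name, dims, match["elements"]
    # struct-like object: no dims, list of field names
    return name, None, match["elements"].split(",")


parsed_array = parse_datatype_sketch("array<1>{real}")
parsed_aoesa = parse_datatype_sketch("array_of_equalsized_arrays<1,1>{real}")
parsed_table = parse_datatype_sketch("table{t0,dt,values}")
```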

pygama.lgdo.lh5_store module#

This module implements routines for reading and writing LEGEND Data Objects in HDF5 files.

class pygama.lgdo.lh5_store.LH5Iterator(lh5_files: str | list[str], group: str, base_path: str = '', entry_list: Optional[Union[list[int], list[list[int]]]] = None, entry_mask: Optional[Union[list[bool], list[list[bool]]]] = None, field_mask: Optional[Union[dict[str, bool], list[str], tuple[str]]] = None, buffer_len: int = 3200)#

Bases: object

A class for iterating through one or more LH5 files, one block of entries at a time. This also accepts an entry list/mask to enable event selection, and a field mask.

This class can be used either for random access:

>>> lh5_obj, n_rows = lh5_it.read(entry)

to read the block of entries starting at entry. In case of multiple files or the use of an event selection, entry refers to a global event index across files and does not count events that are excluded by the selection.

This can also be used as an iterator:

>>> for lh5_obj, entry, n_rows in LH5Iterator(...):
...     pass  # do the thing!

This is intended for reading a large quantity of data while limiting memory usage (particularly when reading in waveforms!). The lh5_obj that is read by this class is reused in order to avoid reallocation of memory; this means that if you want to hold on to data between reads, you will have to copy it somewhere!
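The consequence of buffer reuse is easy to miss, so here is a small pure-Python sketch of the same pattern. The generator iter_blocks is hypothetical and a list stands in for the LH5 buffer; it only illustrates why data kept between reads must be copied.

```python
def iter_blocks(data, buffer_len):
    """Yield (buffer, entry, n_rows), reusing one buffer for every block."""
    buffer = [None] * buffer_len
    for entry in range(0, len(data), buffer_len):
        block = data[entry : entry + buffer_len]
        n_rows = len(block)
        buffer[:n_rows] = block  # overwrite in place, no new allocation
        yield buffer, entry, n_rows


data = list(range(7))
kept_refs, kept_copies = [], []
for buf, entry, n_rows in iter_blocks(data, buffer_len=3):
    kept_refs.append(buf)  # the same object on every iteration
    kept_copies.append(buf[:n_rows])  # snapshot of the valid rows only
```

After the loop, every element of kept_refs shows the contents of the last read (including stale rows beyond n_rows), while kept_copies holds the data as it was at each step.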

Parameters:
  • lh5_files (str | list[str]) – file or files to read from. May include wildcards and environment variables.

  • group (str) – HDF5 group to read.

  • base_path (str) – HDF5 path to prepend.

  • entry_list (list[int] | list[list[int]]) – list of entry numbers to read. If a nested list is provided, expect one top-level list for each file, containing a list of local entries. If a list of ints is provided, use global entries.

  • entry_mask (list[bool] | list[list[bool]]) – mask of entries to read. If a list of arrays is provided, expect one for each file. Ignore if a selection list is provided.

  • field_mask (dict[str, bool] | list[str] | tuple[str]) – mask of which fields to read. See LH5Store.read_object() for more details.

  • buffer_len (int) – number of entries to read at a time while iterating through files.
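The two accepted forms of entry_list (global entries versus one list of local entries per file) are related by the number of entries in each file. The following sketch shows one way to convert between them; split_global_entries and entries_per_file are hypothetical names, not part of the LH5Iterator interface.

```python
from bisect import bisect_right
from itertools import accumulate


def split_global_entries(global_entries, entries_per_file):
    """Convert global entry numbers into one list of local entries per file."""
    # cumulative number of entries at the end of each file
    file_ends = list(accumulate(entries_per_file))
    local = [[] for _ in entries_per_file]
    for entry in global_entries:
        # first file whose end lies beyond this entry
        # (an entry past the last file raises IndexError in this sketch)
        i_file = bisect_right(file_ends, entry)
        file_start = file_ends[i_file - 1] if i_file > 0 else 0
        local[i_file].append(entry - file_start)
    return local


# three files holding 5, 3 and 4 entries
local_entries = split_global_entries([0, 4, 5, 9], entries_per_file=[5, 3, 4])
```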

read(entry: int) tuple[Union[pygama.lgdo.array.Array, pygama.lgdo.scalar.Scalar, pygama.lgdo.struct.Struct, pygama.lgdo.vectorofvectors.VectorOfVectors], int]#

Read the next chunk of events, starting at entry. Return the LH5 buffer and number of rows read.

Return type:

tuple[Union[pygama.lgdo.array.Array, pygama.lgdo.scalar.Scalar, pygama.lgdo.struct.Struct, pygama.lgdo.vectorofvectors.VectorOfVectors], int]

class pygama.lgdo.lh5_store.LH5Store(base_path: str = '', keep_open: bool = False)#

Bases: object

Class to represent a store of LEGEND HDF5 files. The two main methods implemented by the class are read_object() and write_object().

Examples

>>> from pygama.lgdo import LH5Store
>>> store = LH5Store()
>>> obj, _ = store.read_object("/geds/waveform", "file.lh5")
>>> type(obj)
pygama.lgdo.waveform_table.WaveformTable

Parameters:
  • base_path (str) – directory path to prepend to LH5 files.

  • keep_open (bool) – whether to keep files open by storing the h5py objects as class attributes.

get_buffer(name: str, lh5_file: str | h5py._hl.files.File | list[str | h5py._hl.files.File], size: Optional[int] = None, field_mask: Optional[Union[dict[str, bool], list[str], tuple[str]]] = None) Union[Array, Scalar, Struct, VectorOfVectors]#

Returns an LH5 object appropriate for use as a pre-allocated buffer in a read loop. Sets size to size if object has a size.

Return type:

Union[Array, Scalar, Struct, VectorOfVectors]

gimme_file(lh5_file: str | h5py._hl.files.File, mode: str = 'r') File#

Returns a h5py file object from the store or creates a new one.

Parameters:
  • lh5_file (str | h5py._hl.files.File) – LH5 file name.

  • mode (str) – mode in which to open file. See h5py.File documentation.

Return type:

File

gimme_group(group: str, base_group: Group, grp_attrs: Optional[dict[str, Any]] = None, overwrite: bool = False) Group#

Returns an existing h5py group from a base group or creates a new one. Can also set (or replace) group attributes.

Parameters:
  • group (str) – name of the HDF5 group.

  • base_group (Group) – HDF5 group to be used as a base.

  • grp_attrs (Optional[dict[str, Any]]) – HDF5 group attributes.

  • overwrite (bool) – whether to overwrite group attributes; ignored if grp_attrs is None.

Return type:

Group

read_n_rows(name: str, lh5_file: str | h5py._hl.files.File) int | None#

Look up the number of rows in an Array-like object called name in lh5_file.

Return None if it is a Scalar or a Struct.

Return type:

int | None

read_object(name: str, lh5_file: str | h5py._hl.files.File | list[str | h5py._hl.files.File], start_row: int = 0, n_rows: int = 9223372036854775807, idx: Optional[Union[ndarray, list, tuple, list[numpy.ndarray | list | tuple]]] = None, field_mask: Optional[Union[dict[str, bool], list[str], tuple[str]]] = None, obj_buf: Optional[Union[Array, Scalar, Struct, VectorOfVectors]] = None, obj_buf_start: int = 0) tuple[Union[pygama.lgdo.array.Array, pygama.lgdo.scalar.Scalar, pygama.lgdo.struct.Struct, pygama.lgdo.vectorofvectors.VectorOfVectors], int]#

Read LH5 object data from a file.

Parameters:
  • name (str) – Name of the LH5 object to be read (including its group path).

  • lh5_file (str | h5py._hl.files.File | list[str | h5py._hl.files.File]) – The file(s) containing the object to be read out. If a list of files, array-like object data will be concatenated into the output object.

  • start_row (int) – Starting entry for the object read (for array-like objects). For a list of files, only applies to the first file.

  • n_rows (int) – The maximum number of rows to read (for array-like objects). The actual number of rows read will be returned as one of the return values (see below).

  • idx (Optional[Union[ndarray, list, tuple, list[numpy.ndarray | list | tuple]]]) – For NumPy-style “fancy indexing” for the read. Used to read out rows that pass some selection criteria. Only selection along the first axis is supported, so tuple arguments must be one-tuples. If n_rows is not false, idx will be truncated to n_rows before reading. To use with a list of files, can pass in a list of idx’s (one for each file) or use a long contiguous list (e.g. built from a previous identical read). If used in conjunction with start_row and n_rows, will be sliced to obey those constraints, where n_rows is interpreted as the (max) number of selected values (in idx) to be read out.

  • field_mask (Optional[Union[dict[str, bool], list[str], tuple[str]]]) – For tables and structs, determines which fields get read out. Only applies to immediate fields of the requested objects. If a dict is used, a default dict will be made with the default set to the opposite of the first element in the dict. This way if one specifies a few fields at False, all but those fields will be read out, while if one specifies just a few fields as True, only those fields will be read out. If a list is provided, the listed fields will be set to True, while the rest will default to False.

  • obj_buf (Optional[Union[Array, Scalar, Struct, VectorOfVectors]]) – Read directly into memory provided in obj_buf. Note: the buffer will be expanded to accommodate the data requested. To maintain the buffer length, send in n_rows = len(obj_buf).

  • obj_buf_start (int) – Start location in obj_buf for read. For concatenating data to array-like objects.

Returns:

(object, n_rows_read) – object is the read-out object. n_rows_read is the number of rows successfully read out; it is essential for arrays when the amount of data is smaller than the object buffer. For scalars and structs n_rows_read will be 1. For tables it is redundant with table.loc.

Return type:

tuple[Union[pygama.lgdo.array.Array, pygama.lgdo.scalar.Scalar, pygama.lgdo.struct.Struct, pygama.lgdo.vectorofvectors.VectorOfVectors], int]
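The field_mask rules described in the parameter list above can be sketched with collections.defaultdict. The function normalize_field_mask is hypothetical; in particular, treating None as "read every field" is an assumption, since the text does not state the default behavior.

```python
from collections import defaultdict


def normalize_field_mask(field_mask):
    """Turn a dict/list/tuple field mask into a defaultdict of bools (sketch)."""
    if field_mask is None:
        # ASSUMPTION: no mask means every field is read
        return defaultdict(lambda: True)
    if isinstance(field_mask, dict):
        # default is the opposite of the first value in the dict
        default = not next(iter(field_mask.values())) if field_mask else True
        mask = defaultdict(lambda: default)
        mask.update(field_mask)
        return mask
    # list or tuple: listed fields are True, everything else False
    mask = defaultdict(bool)
    for name in field_mask:
        mask[name] = True
    return mask


fields = ["t0", "dt", "values"]
exclude_values = normalize_field_mask({"values": False})
only_t0 = normalize_field_mask(["t0"])
read_when_excluding = [f for f in fields if exclude_values[f]]
read_when_including = [f for f in fields if only_t0[f]]
```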

write_object(obj: Union[Array, Scalar, Struct, VectorOfVectors], name: str, lh5_file: str | h5py._hl.files.File, group: str | h5py._hl.group.Group = '/', start_row: int = 0, n_rows: Optional[int] = None, wo_mode: str = 'append', write_start: int = 0) None#

Write an LGDO into an LH5 file.

Parameters:
  • obj (Union[Array, Scalar, Struct, VectorOfVectors]) – LH5 object. If object is array-like, writes n_rows starting from start_row in obj.

  • name (str) – name of the object in the output HDF5 file.

  • lh5_file (str | h5py._hl.files.File) – HDF5 file name or h5py.File object.

  • group (str | h5py._hl.group.Group) – HDF5 group name or h5py.Group object in which obj should be written.

  • start_row (int) – first row in obj to be written.

  • n_rows (Optional[int]) – number of rows in obj to be written.

  • wo_mode (str) –

    • write_safe or w: only proceed with writing if the object does not already exist in the file.

    • append or a: append along axis 0 (the first dimension) of array-like objects and array-like subfields of structs. Scalar objects get overwritten.

    • overwrite or o: replace data in the file if present, starting from write_start. Note: overwriting with write_start = end of array is the same as append.

    • overwrite_file or of: delete file if present prior to writing to it. write_start should be 0 (it’s ignored).

  • write_start (int) – row in the output file (if already existing) to start overwriting from.
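The note that overwriting with write_start equal to the end of the array is the same as appending can be checked on a toy model. In this sketch lists stand in for on-disk datasets and write_rows is a hypothetical helper. It assumes that overwrite keeps the rows before write_start and replaces everything from there on; the text above does not spell out what happens to rows beyond the newly written ones.

```python
def write_rows(stored, new_rows, wo_mode="append", write_start=0):
    """Return the stored rows after writing ``new_rows`` (lists model datasets)."""
    if wo_mode in ("append", "a"):
        return stored + new_rows
    if wo_mode in ("overwrite", "o"):
        # ASSUMPTION: rows before write_start are kept, later rows are replaced
        return stored[:write_start] + new_rows
    raise ValueError(f"wo_mode not covered by this sketch: {wo_mode!r}")


on_disk = [1, 2, 3]
appended = write_rows(on_disk, [4, 5], "append")
overwritten_at_end = write_rows(on_disk, [4, 5], "overwrite", write_start=len(on_disk))
overwritten_at_1 = write_rows(on_disk, [8, 9], "o", write_start=1)
```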

pygama.lgdo.lh5_store._make_fd_idx(starts, stops, idx)#
pygama.lgdo.lh5_store.load_dfs(f_list: str | list[str], par_list: list[str], lh5_group: str = '', idx_list: Optional[list[numpy.ndarray | list | tuple]] = None) DataFrame#

Build a pandas.DataFrame from LH5 data.

Given a list of files (can use wildcards), a list of LH5 columns, and optionally the group path, return a pandas.DataFrame with all values for each parameter.

See also

load_nda()

Returns:

dataframe – contains columns for each parameter in par_list, and rows containing all data for the associated parameters concatenated over all files in f_list.

Return type:

DataFrame

pygama.lgdo.lh5_store.load_nda(f_list: str | list[str], par_list: list[str], lh5_group: str = '', idx_list: Optional[list[numpy.ndarray | list | tuple]] = None) dict[str, numpy.ndarray]#

Build a dictionary of numpy.ndarrays from LH5 data.

Given a list of files, a list of LH5 table parameters, and an optional group path, return a dictionary of NumPy arrays with all values for each parameter.

Parameters:
  • f_list (str | list[str]) – A list of files. Can contain wildcards.

  • par_list (list[str]) – A list of parameters to read from each file.

  • lh5_group (str) – group path within which to find the specified parameters.

  • idx_list (Optional[list[numpy.ndarray | list | tuple]]) – for fancy-indexed reads. Must be one index array for each file in f_list.

Returns:

par_data – A dictionary of the parameter data keyed by the elements of par_list. Each entry contains the data for the specified parameter concatenated over all files in f_list.

Return type:

dict[str, numpy.ndarray]

pygama.lgdo.lh5_store.ls(lh5_file: str | h5py._hl.group.Group, lh5_group: str = '') list[str]#

Return a list of LH5 groups in the input file and group, similar to ls or h5ls. Supports wildcards in group names.

Parameters:
  • lh5_file (str | h5py._hl.group.Group) – name of file.

  • lh5_group (str) – group to search. Add a / to the end of the group name if you want to list all objects inside that group.

Return type:

list[str]

pygama.lgdo.lh5_store.show(lh5_file: str | h5py._hl.group.Group, lh5_group: str = '/', indent: str = '', header: bool = True) None#

Print a tree of LH5 file contents with LGDO datatype.

Parameters:
  • lh5_file (str | h5py._hl.group.Group) – the LH5 file.

  • lh5_group (str) – print only contents of this HDF5 group.

  • indent (str) – indent the diagram with this string.

  • header (bool) – print lh5_group at the top of the diagram.

Examples

>>> from pygama.lgdo import show
>>> show("file.lh5", "/geds/raw")
/geds/raw
├── channel · array<1>{real}
├── energy · array<1>{real}
├── timestamp · array<1>{real}
├── waveform · table{t0,dt,values}
│   ├── dt · array<1>{real}
│   ├── t0 · array<1>{real}
│   └── values · array_of_equalsized_arrays<1,1>{real}
└── wf_std · array<1>{real}

pygama.lgdo.scalar module#

Implements a LEGEND Data Object representing a scalar and corresponding utilities.

class pygama.lgdo.scalar.Scalar(value: int | float, attrs: Optional[dict[str, Any]] = None)#

Bases: object

Holds just a scalar value and some attributes (datatype, units, …).

Parameters:
  • value (int | float) – the value for this scalar.

  • attrs (dict[str, Any]) – a set of user attributes to be carried along with this LGDO.

datatype_name() str#

Returns the name for this LGDO’s datatype attribute.

Return type:

str

form_datatype() str#

Return this LGDO’s datatype attribute string.

Return type:

str

pygama.lgdo.struct module#

Implements a LEGEND Data Object representing a struct and corresponding utilities.

class pygama.lgdo.struct.Struct(obj_dict: Optional[dict[str, Union[pygama.lgdo.scalar.Scalar, pygama.lgdo.array.Array, pygama.lgdo.vectorofvectors.VectorOfVectors, pygama.lgdo.struct.Struct]]] = None, attrs: Optional[dict[str, Any]] = None)#

Bases: dict

A dictionary of LGDO’s with an optional set of attributes.

After instantiation, add fields using add_field() to keep the datatype updated, or call update_datatype() after adding.

Parameters:
  • obj_dict (dict[str, LGDO | Struct]) – instantiate this Struct using the supplied named LGDO’s. Note: no copy is performed, the objects are used directly.

  • attrs (dict[str, Any]) – a set of user attributes to be carried along with this LGDO.

add_field(name: str, obj: Union[Scalar, Array, VectorOfVectors, Struct]) None#

Add a field to the struct.

datatype_name() str#

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype() str#

Return this LGDO’s datatype attribute string.

Return type:

str

remove_field(name: str, delete: bool = False) None#

Remove a field from the struct.

Parameters:
  • name (str) – name of the field to be removed

  • delete (bool) – if True, delete the field object using the del statement.

update_datatype() None#

pygama.lgdo.table module#

Implements a LEGEND Data Object representing a special struct of arrays of equal length and corresponding utilities.

class pygama.lgdo.table.Table(size: Optional[int] = None, col_dict: Optional[dict[str, Union[pygama.lgdo.scalar.Scalar, pygama.lgdo.array.Array, pygama.lgdo.vectorofvectors.VectorOfVectors, pygama.lgdo.struct.Struct]]] = None, attrs: Optional[dict[str, Any]] = None)#

Bases: Struct

A special struct of arrays or subtable columns of equal length.

Holds onto an internal read/write location loc that is useful in managing table I/O using functions like push_row(), is_full(), and clear().

Note

If you write to a table and don’t fill it up to its total size, be sure to resize it before passing to data processing functions, as they will call __len__() to access valid data, which returns the size attribute.

Parameters:
  • size (int) – sets the number of rows in the table. Arrays in col_dict will be resized to match size if both are not None. If size is left as None, the number of table rows is determined from the length of the first array in col_dict. If neither is provided, a default length of 1024 is used.

  • col_dict (dict[str, LGDO]) – instantiate this table using the supplied named array-like LGDO’s. Note 1: no copy is performed, the objects are used directly. Note 2: if size is not None, all arrays will be resized to match it. Note 3: if the arrays have different lengths, all will be resized to match the length of the first array.

  • attrs (dict[str, Any]) – A set of user attributes to be carried along with this LGDO.

Notes

The loc attribute is initialized to 0.
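The interplay between loc and the table size can be sketched as follows. MiniTable is a hypothetical class, and the method bodies are assumptions made for illustration (push_row advances loc by one, is_full reports that loc has reached size, clear resets loc to 0); the text above names these methods but does not define them.

```python
class MiniTable:
    """Hypothetical fixed-size column buffer with a write location."""

    def __init__(self, size, columns):
        self.size = size
        self.loc = 0  # next row to be written
        self.columns = {name: [0] * size for name in columns}

    def push_row(self):
        self.loc += 1  # ASSUMPTION: advance the write location by one row

    def is_full(self):
        return self.loc >= self.size  # ASSUMPTION

    def clear(self):
        self.loc = 0  # ASSUMPTION: data stay in place, only loc is reset

    def __len__(self):
        return self.size  # the allocated size, not the number of valid rows


table = MiniTable(size=2, columns=["energy"])
flushed = []
for energy in [10, 20, 30]:
    table.columns["energy"][table.loc] = energy
    table.push_row()
    if table.is_full():
        flushed.extend(table.columns["energy"][: table.loc])
        table.clear()
# one row (30) is still pending: only the first `loc` rows are valid
pending = table.columns["energy"][: table.loc]
```

Here len(table) stays at 2 even though only one pending row is valid, which is why a partially filled table should be resized before being handed to processing code.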

add_column(name: str, obj: Union[Scalar, Struct, Array, VectorOfVectors], use_obj_size: bool = False, do_warn: bool = True) None#

Alias for add_field() using table terminology ‘column’.

add_field(name: str, obj: Union[Scalar, Struct, Array, VectorOfVectors], use_obj_size: bool = False, do_warn=True) None#

Add a field (column) to the table.

Use the name “field” here to match the terminology used in Struct.

Parameters:
  • name (str) – the name for the field in the table.

  • obj (Union[Scalar, Struct, Array, VectorOfVectors]) – the object to be added to the table.

  • use_obj_size (bool) – if True, resize the table to match the length of obj.

  • do_warn – print or don’t print useful info. Passed to resize() when use_obj_size is True.

clear() None.  Remove all items from D.#
datatype_name() str#

The name for this LGDO’s datatype attribute.

Return type:

str

eval(expr_config: dict) Table#

Apply column operations to the table and return a new table holding the resulting columns.

Currently defers all the job to numexpr.evaluate(). This might change in the future.

Parameters:

expr_config (dict) –

dictionary that configures expressions according to the following specification:

{
    "O1": {
        "expression": "p1 + p2 * a**2",
        "parameters": {
            "p1": "2",
            "p2": "3"
        }
    },
    "O2": {
        "expression": "O1 - b"
    }
    // ...
}

where:

  • expression is an expression string supported by numexpr.evaluate() (see also the numexpr documentation). Note: because of internal limitations, reduction operations must appear last in the stack.

  • parameters is a dictionary of function parameters. Passed to numexpr.evaluate() as local_dict argument.

Return type:

Table

Warning

Blocks in expr_config must be ordered according to mutual dependency.
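The ordering requirement can be seen in a small stand-alone sketch. It deliberately swaps numexpr.evaluate() for Python's built-in eval applied row by row, uses plain lists as columns, and passes numeric parameters rather than the string values shown in the specification; eval_columns is a hypothetical helper, not the Table.eval() implementation.

```python
expr_config = {
    "O1": {"expression": "p1 + p2 * a**2", "parameters": {"p1": 2, "p2": 3}},
    "O2": {"expression": "O1 - b"},
}
columns = {"a": [1, 2, 3], "b": [1, 1, 1]}


def eval_columns(columns, expr_config):
    """Evaluate each block in order; later blocks may use earlier outputs."""
    n_rows = len(next(iter(columns.values())))
    known = dict(columns)  # input columns plus outputs computed so far
    outputs = {}
    for name, block in expr_config.items():  # dicts preserve insertion order
        params = block.get("parameters", {})
        result = []
        for i in range(n_rows):
            scope = {col: values[i] for col, values in known.items()}
            scope.update(params)
            # built-in eval stands in for numexpr.evaluate() in this sketch
            result.append(eval(block["expression"], {"__builtins__": {}}, scope))
        outputs[name] = result
        known[name] = result
    return outputs


new_columns = eval_columns(columns, expr_config)
```

If the "O2" block came first, its expression would fail with a NameError because O1 would not yet be defined.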

get_dataframe(cols: Optional[list[str]] = None, copy: bool = False) DataFrame#

Get a pandas.DataFrame from the data in the table.

Notes

The requested data must be array-like, with the nda attribute.

Parameters:
  • cols (Optional[list[str]]) – a list of column names specifying the subset of the table’s columns to be added to the dataframe.

  • copy (bool) – When True, the dataframe allocates new memory and copies data into it. Otherwise, the raw nda’s from the table are used directly.

Return type:

DataFrame

is_full() bool#
Return type:

bool

join(other_table: Table, cols: Optional[list[str]] = None, do_warn: bool = True) None#

Add the columns of another table to this table.

Notes

Following the join, both tables have access to other_table’s fields (but other_table doesn’t have access to this table’s fields). No memory is allocated in this process. other_table can go out of scope and this table will retain access to the joined data.

Parameters:
  • other_table (Table) – the table whose columns are to be joined into this table.

  • cols (Optional[list[str]]) – a list of names of columns from other_table to be joined into this table.

  • do_warn (bool) – set to False to turn off warnings associated with mismatched loc parameter or add_column() warnings.

push_row() None#
remove_column(name: str, delete: bool = False) None#

Alias for remove_field() using table terminology ‘column’.

resize(new_size: Optional[int] = None, do_warn: bool = False) None#

pygama.lgdo.vectorofvectors module#

Implements a LEGEND Data Object representing a variable-length array of variable-length arrays and corresponding utilities.

class pygama.lgdo.vectorofvectors.VectorOfVectors(flattened_data: Optional[Array] = None, cumulative_length: Optional[Array] = None, shape_guess: Optional[tuple[int, int]] = None, dtype: Optional[dtype] = None, attrs: Optional[dict[str, Any]] = None)#

Bases: object

A variable-length array of variable-length arrays.

For now only a 1D vector of 1D vectors is supported. Internal representation is as two NumPy arrays, one to store the flattened data contiguously and one to store the cumulative sum of lengths of each vector.

Parameters:
  • flattened_data (Array) – If not None, used as the internal memory array for flattened_data. Otherwise, an internal flattened_data is allocated based on shape_guess and dtype.

  • cumulative_length (Array) – If not None, used as the internal memory array for cumulative_length. Should be dtype numpy.uint32. If cumulative_length is None, an internal cumulative_length is allocated based on the first element of shape_guess.

  • shape_guess (tuple[int, int]) – A NumPy-format shape specification, required if either flattened_data or cumulative_length is not supplied. The first element should not be a guess and sets the number of vectors to be stored. The second element is a guess or approximation of the typical length of a stored vector, used to set the initial length of flattened_data if it was not supplied.

  • dtype (np.dtype) – Sets the type of data stored in flattened_data. Required if flattened_data is None.

  • attrs (dict[str, Any]) – A set of user attributes to be carried along with this LGDO.

datatype_name() str#

The name for this LGDO’s datatype attribute.

Return type:

str

form_datatype() str#

Return this LGDO’s datatype attribute string.

Return type:

str

get_vector(i_vec: int) ndarray#

Get vector at index i_vec.

Return type:

ndarray

resize(new_size: int) None#
set_vector(i_vec: int, nda: ndarray) None#

Insert vector nda at location i_vec.

Notes

flattened_data is doubled in length until nda can be appended to it.
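The doubling strategy mentioned in the note can be sketched with plain lists. The helper append_vector is hypothetical and simplified: it only appends a vector at the end and returns a new, larger buffer, whereas set_vector() takes an index i_vec and operates on the object's own arrays.

```python
def append_vector(flattened_data, cumulative_length, used, vector):
    """Append ``vector``; return the (possibly larger) buffer and new used length."""
    needed = used + len(vector)
    while len(flattened_data) < needed:
        # double the capacity (at least 1) until the vector fits
        flattened_data = flattened_data + [0] * max(len(flattened_data), 1)
    flattened_data[used:needed] = vector
    cumulative_length.append(needed)
    return flattened_data, needed


buffer, cl, used = [0, 0], [], 0
capacities = []
for vec in ([1, 2, 3], [4, 5, 6, 7, 8]):
    buffer, used = append_vector(buffer, cl, used, vec)
    capacities.append(len(buffer))
```

Doubling keeps the number of reallocations logarithmic in the final size, at the cost of some unused capacity at the end of the buffer.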

to_aoesa() ArrayOfEqualSizedArrays#

Convert to ArrayOfEqualSizedArrays, padding with NaNs

Return type:

ArrayOfEqualSizedArrays
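A pure-Python sketch of NaN padding, with nested lists standing in for the resulting ArrayOfEqualSizedArrays. The function pad_to_equal_size is hypothetical; values are converted to float because NaN is a floating-point value.

```python
import math


def pad_to_equal_size(flattened_data, cumulative_length):
    """Return equal-length rows, padding short vectors with NaN (sketch)."""
    starts = [0] + list(cumulative_length[:-1])
    vectors = [flattened_data[a:b] for a, b in zip(starts, cumulative_length)]
    width = max((len(v) for v in vectors), default=0)
    return [[float(x) for x in v] + [math.nan] * (width - len(v)) for v in vectors]


# vectors [1, 2], [3, 4, 5] and [6] become three rows of width 3
rows = pad_to_equal_size([1, 2, 3, 4, 5, 6], [2, 5, 6])
```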

pygama.lgdo.vectorofvectors.build_cl(sorted_array_in: Array, cumulative_length_out: Optional[ndarray] = None) ndarray#

Build a cumulative_length array from an array of sorted data.

So for example, if sorted_array_in contains [ 3, 3, 3, 4 ], this would return [ 3, 4 ]: three entries equal to 3, followed by one entry equal to 4.

For a sorted_array_in of indices, this is the inverse of explode_cl() below, in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.

Parameters:
  • sorted_array_in (Array) – Array of data already sorted; each N matching contiguous entries will be converted into a new row of cumulative_length_out

  • cumulative_length_out (Optional[ndarray]) – This is an optional pre-allocated array for the output cumulative_length. It will always have length <= sorted_array_in, so giving them the same length is safe if there is not a better guess.

Returns:

cumulative_length_out – The output cumulative_length. If the user provides a cumulative_length_out that is too long, this return value is sliced to contain only the used portion of the allocated memory

Return type:

ndarray
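A plain-Python sketch of the idea, assuming the definition of cumulative_length given in the package overview (the i-th element is the sum of the lengths of the vectors with index <= i). build_cl_sketch is a hypothetical name and works on lists rather than Array objects.

```python
def build_cl_sketch(sorted_values):
    """Build cumulative lengths from runs of equal, contiguous values."""
    cumulative_length = []
    for i, value in enumerate(sorted_values):
        if i > 0 and value == sorted_values[i - 1]:
            cumulative_length[-1] += 1  # same run: extend the current row
        else:
            # new run: its cumulative length starts at the running total
            cumulative_length.append(i + 1)
    return cumulative_length


cl_from_values = build_cl_sketch([3, 3, 3, 4])
cl_from_indices = build_cl_sketch([0, 0, 1])
```

The output can never be longer than the input, which is why a pre-allocated output of the same length as sorted_array_in is always large enough.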

pygama.lgdo.vectorofvectors.explode(cumulative_length: Array, array_in: Array, array_out: Optional[ndarray] = None) ndarray#

Explode a data array using a cumulative_length array.

This is identical to explode_cl(), except array_in gets exploded instead of cumulative_length. So for example, if array_in = [ 3, 4 ] and cumulative_length = [ 2, 3 ], array_out would be [ 3, 3, 4 ].

Parameters:
  • cumulative_length (Array) – the cumulative_length array to use for exploding

  • array_in (Array) – the data to be exploded. Must have same length as cumulative_length

  • array_out (Optional[ndarray]) – a pre-allocated array to hold the exploded data. The length should be equal to cumulative_length[-1]

Return type:

ndarray
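As a list-based sketch of this operation (explode_sketch is a hypothetical name): each element of array_in is repeated once per entry of the corresponding vector, so the output has length cumulative_length[-1].

```python
def explode_sketch(cumulative_length, array_in):
    """Repeat array_in[i] once for every entry of vector i."""
    array_out = []
    start = 0
    for stop, value in zip(cumulative_length, array_in):
        array_out.extend([value] * (stop - start))
        start = stop
    return array_out


exploded = explode_sketch([2, 3], [3, 4])
# exploding several arrays with the same cumulative_length, one after another
exploded_many = [explode_sketch([2, 3], a) for a in ([3, 4], [10, 20])]
```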

pygama.lgdo.vectorofvectors.explode_arrays(cumulative_length: Array, arrays: list, out_arrays: Optional[list] = None) list#

Explode a set of arrays using a cumulative_length array.

Parameters:
  • cumulative_length (Array) – the cumulative_length array to use for exploding

  • arrays (list) – the data arrays to be exploded. Each array must have same length as cumulative_length

  • out_arrays (Optional[list]) – an optional list of pre-allocated arrays to hold the exploded data. The length of the list should be equal to the number of arrays, and each entry in out_arrays should have length cumulative_length[-1]. If not provided, output arrays are allocated for the user.

Return type:

list
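The list variant can be pictured as the same operation applied to each array in turn. The sketch below is NumPy-only and uses a hypothetical name, `explode_arrays_sketch`; how pygama itself allocates or validates the outputs is not shown in this section, so that part is an assumption.

```python
import numpy as np


def explode_arrays_sketch(cumulative_length, arrays, out_arrays=None) -> list:
    """Hypothetical helper: explode several arrays with one cumulative_length."""
    cl = np.asarray(cumulative_length)
    lengths = np.diff(cl, prepend=0)
    n_out = int(cl[-1]) if len(cl) else 0
    if out_arrays is None:
        # Assumption: allocate one output per input, keeping the input dtype.
        out_arrays = [np.empty(n_out, dtype=np.asarray(a).dtype) for a in arrays]
    if len(out_arrays) != len(arrays):
        raise ValueError("out_arrays must have one entry per input array")
    for arr, out in zip(arrays, out_arrays):
        # Raises ValueError if a pre-allocated output has the wrong length.
        out[:] = np.repeat(np.asarray(arr), lengths)
    return out_arrays


energies = np.array([10.5, 20.5])
channels = np.array([1, 2])
exploded = explode_arrays_sketch([2, 3], [energies, channels])
print([a.tolist() for a in exploded])  # [[10.5, 10.5, 20.5], [1, 1, 2]]
```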

pygama.lgdo.vectorofvectors.explode_cl(cumulative_length: Array, array_out: Optional[ndarray] = None) ndarray#

explode a cumulative_length array

For example, if cumulative_length is [ 2, 3 ], this returns [ 0, 0, 1 ].

This is the inverse of build_cl() above, in the sense that doing build_cl(explode_cl(cumulative_length)) would recover the original cumulative_length.

Parameters:
  • cumulative_length (Array) – the cumulative_length array to be exploded

  • array_out (Optional[ndarray]) – an optional pre-allocated array to hold the exploded cumulative_length. The length should be equal to cumulative_length[-1]

Returns:

array_out – the exploded cumulative_length array

Return type:

ndarray
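The relationship between exploding and rebuilding a cumulative_length can be checked with a small NumPy-only sketch. `explode_cl_sketch` is a hypothetical name, and the rebuild step is written inline rather than calling pygama's build_cl.

```python
import numpy as np


def explode_cl_sketch(cumulative_length) -> np.ndarray:
    """Hypothetical helper: emit vector index i once per element of vector i."""
    cl = np.asarray(cumulative_length)
    lengths = np.diff(cl, prepend=0)
    return np.repeat(np.arange(len(cl)), lengths)


exploded = explode_cl_sketch([2, 3])
print(exploded.tolist())  # [0, 0, 1]

# Rebuild a cumulative length from the runs of equal indices.
run_ends = np.flatnonzero(np.diff(exploded)) + 1
recovered = np.append(run_ends, len(exploded))
print(recovered.tolist())  # [2, 3]
```

In this sketch the round trip only recovers the input when no vector is empty: a zero-length vector contributes no entries to the exploded array, so its index cannot be reconstructed from the runs.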

pygama.lgdo.vectorofvectors.nb_build_cl(sorted_array_in: np.ndarray, cumulative_length_out: np.ndarray) np.ndarray#

Numba-compiled inner loop for build_cl

Return type:

np.ndarray

pygama.lgdo.vectorofvectors.nb_explode(cumulative_length: np.ndarray, array_in: np.ndarray, array_out: np.ndarray) np.ndarray#

Numba-compiled inner loop for explode

Return type:

np.ndarray

pygama.lgdo.vectorofvectors.nb_explode_cl(cumulative_length: np.ndarray, array_out: np.ndarray) np.ndarray#

Numba-compiled inner loop for explode_cl

Return type:

np.ndarray

pygama.lgdo.waveform_table module#

Implements a LEGEND Data Object representing a special Table to store blocks of one-dimensional time-series data.

class pygama.lgdo.waveform_table.WaveformTable(size: Optional[int] = None, t0: float | pygama.lgdo.array.Array | numpy.ndarray = 0, t0_units: Optional[str] = None, dt: float | pygama.lgdo.array.Array | numpy.ndarray = 1, dt_units: Optional[str] = None, values: Optional[Union[ArrayOfEqualSizedArrays, VectorOfVectors, ndarray]] = None, values_units: Optional[str] = None, values_adc_bit_depth: Optional[int] = None, wf_len: Optional[int] = None, dtype: Optional[dtype] = None, attrs: Optional[dict[str, Any]] = None)#

Bases: Table

An LGDO for storing blocks of (1D) time-series data.

A WaveformTable is an LGDO Table with the 3 columns t0, dt, and values:

  • t0[i] is a time offset (relative to a user-defined global reference) for the sample in values[i][0]. Implemented as an LGDO Array with optional attribute units.

  • dt[i] is the sampling period for the waveform at values[i]. Implemented as an LGDO Array with optional attribute units.

  • values[i] is the i’th waveform in the table. Internally, the waveform values may be stored either as an LGDO ArrayOfEqualSizedArrays<1,1> or as an LGDO VectorOfVectors, which supports waveforms of unequal length. Can optionally be given a units attribute, as well as an adc_bit_depth attribute.
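To make the role of the three columns concrete, the sketch below computes the time of every sample in one waveform from t0 and dt. It uses plain NumPy arrays as stand-ins for the LGDO columns; the numbers and the `sample_times` helper are made up for illustration and are not part of pygama.

```python
import numpy as np

# Stand-ins for the three columns of a two-row table (illustrative values).
t0 = np.array([0.0, 100.0])           # time offset of the first sample
dt = np.array([16.0, 8.0])            # sampling period of each waveform
values = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])     # equal-length waveforms, one per row


def sample_times(i: int) -> np.ndarray:
    """Hypothetical helper: time of each sample in waveform i."""
    n_samples = values.shape[1]
    return t0[i] + dt[i] * np.arange(n_samples)


times = sample_times(1)
print(times.tolist())  # [100.0, 108.0, 116.0, 124.0]
```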

Note

On-disk and in-memory versions may differ, e.g. if a compression routine is used.

Parameters:
  • size (int) – sets the number of rows in the table. If None, the size will be determined from the first among t0, dt, or values to return a valid length. If not None, t0, dt, and values will be resized as necessary to match size. If size is None and t0, dt, and values are all non-array-like, a default size of 1024 is used.

  • t0 (float | Array | np.ndarray) – \(t_0\) values to be used (or broadcast) to the t0 column.

  • t0_units (str) – units for the \(t_0\) values. If not None and t0 is an LGDO Array, overrides what’s in t0.

  • dt (float | Array | np.ndarray) – \(\delta t\) values (sampling period) to be used (or broadcast) to the dt column.

  • dt_units (str) – units for the dt values. If not None and dt is an LGDO Array, overrides what’s in dt.

  • values (ArrayOfEqualSizedArrays | VectorOfVectors | np.ndarray) – The waveform data to be stored in the table. If None a block of data is prepared based on the wf_len and dtype arguments.

  • values_units (str) – units for the waveform values. If not None and values is an LGDO Array, overrides what’s in values.

  • values_adc_bit_depth (int) – an integer for storing the ADC bit depth used to record this waveform

  • wf_len (int) – The length of the waveforms in each entry of a table. If None (the default), unequal lengths are assumed and VectorOfVectors is used for the values column. Ignored if values is a 2D ndarray, in which case values.shape[1] is used.

  • dtype (np.dtype) – The numpy.dtype of the waveform data. If values is not None, this argument is ignored. If both values and dtype are None, numpy.float64 is used.

  • attrs (dict[str, Any]) – A set of user attributes to be carried along with this LGDO.
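The rule for choosing the number of rows, as described for the size parameter, can be sketched as follows. `resolve_size_sketch` is a hypothetical function written only to mirror the documented behavior; the actual constructor may implement the check differently.

```python
import numpy as np


def resolve_size_sketch(size=None, t0=0, dt=1, values=None, default=1024) -> int:
    """Hypothetical helper mirroring the documented row-count rule."""
    if size is not None:
        # An explicit size wins; the columns would be resized to match it.
        return size
    # Otherwise use the first of t0, dt, values that has a valid length.
    for candidate in (t0, dt, values):
        try:
            return len(candidate)
        except TypeError:
            continue  # scalars and None have no length
    # Nothing array-like was given: fall back to the documented default.
    return default


print(resolve_size_sketch())                           # 1024
print(resolve_size_sketch(t0=np.zeros(5)))             # 5
print(resolve_size_sketch(values=np.zeros((7, 100))))  # 7
```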

property dt: Array#
property dt_units: str#
resize_wf_len(new_len: int) None#

Alias for wf_len.setter, for when we want to make it clear in the code that memory is being reallocated.

property t0: Array#
property t0_units: str#
property values: pygama.lgdo.arrayofequalsizedarrays.ArrayOfEqualSizedArrays | pygama.lgdo.vectorofvectors.VectorOfVectors#
property values_adc_bit_depth: str#
property values_units: str#
property wf_len: int#