pygama.raw package#

The primary function for data conversion into raw-tier LH5 files is build_raw(). This is a one-to-many function: one input DAQ file can generate one or more output raw files. Which data ends up in which files, and in which HDF5 groups inside each file, is controlled via raw_buffer (see below). If no raw buffer specification is given, all decoded data is written to a single output file, with all fields from each hardware decoder in their own output table.

Currently we support the following hardware:

  • FlashCam ADC (requires fcutils)

  • FlashCam ADC read out with ORCA

Subpackages#

Submodules#

pygama.raw.build_raw module#

pygama.raw.build_raw.build_raw(in_stream: int, in_stream_type: Optional[str] = None, out_spec: Optional[Union[str, dict, RawBufferLibrary]] = None, buffer_size: int = 8192, n_max: int = inf, overwrite: bool = False, **kwargs) None#

Convert data into LEGEND HDF5 raw-tier format.

Takes an input stream of a given type and writes to output file(s) according to the user’s specification.

Parameters:
  • in_stream (int) – the name of the input stream to be converted. Typically a filename, including path. Can use environment variables. Some streamers may be able to (eventually) accept e.g. streaming over a port as an input.

  • in_stream_type ('ORCA', 'FlashCam', 'LlamaDaq', 'Compass' or 'MGDO') – type of stream used to write the input file.

  • out_spec (Optional[Union[str, dict, RawBufferLibrary]]) –

    Specification for the output stream.

    • if None, uses {in_stream}.lh5 as the output filename.

    • if a str not ending in .json, interpreted as the output filename.

    • if a str ending in .json, interpreted as a filename containing json-shorthand for the output specification (see raw_buffer).

    • if a JSON dict, should be a dict loaded from the json shorthand notation for RawBufferLibraries (see raw_buffer), which is then used to build a RawBufferLibrary.

    • if a RawBufferLibrary, the mapping of data to output file / group is taken from that.

  • buffer_size (int) – default size to use for data buffering.

  • n_max (int) – maximum number of rows of data to process from the input file.

  • overwrite (bool) – sets whether to overwrite the output file(s) if it (they) already exist.

  • **kwargs – sent to RawBufferLibrary generation as kw_dict.
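The out_spec rules above amount to a small dispatch on the argument's type and suffix. The sketch below is illustrative only: classify_out_spec is a hypothetical helper and the RawBufferLibrary class is a local stand-in, neither is pygama code. Because RawBufferLibrary derives from dict (see raw_buffer), the sketch checks for it before the plain dict case.

```python
class RawBufferLibrary(dict):
    """Local stand-in for pygama's class, only used to show the ordering."""


def classify_out_spec(in_stream, out_spec=None):
    """Hypothetical helper mirroring the documented out_spec rules.

    Returns a (kind, value) tuple and never touches any file.
    """
    if out_spec is None:
        # default: output file named after the input stream
        return "filename", f"{in_stream}.lh5"
    if isinstance(out_spec, str):
        if out_spec.endswith(".json"):
            # a file containing the JSON shorthand
            return "json_file", out_spec
        return "filename", out_spec
    if isinstance(out_spec, RawBufferLibrary):
        # must be tested before dict, since it is a dict subclass
        return "raw_buffer_library", out_spec
    if isinstance(out_spec, dict):
        # an already loaded JSON shorthand dictionary
        return "json_dict", out_spec
    raise TypeError(f"unsupported out_spec type: {type(out_spec).__name__}")
```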

pygama.raw.data_decoder module#

Base classes for decoding data into raw LGDO Tables or files

class pygama.raw.data_decoder.DataDecoder(garbage_length: int = 256, packet_size_guess: int = 1024)#

Bases: object

Decodes packets from a data stream.

Most decoders will repeatedly decode the same set of values from each packet. The values that get decoded need to be described by a dict called decoded_values that helps determine how to set up the buffers and write them to file. Tables are made whose columns correspond to the elements of decoded_values, and packet data gets pushed to the end of the table one row at a time. See FCEventDecoder or ORCAStruck3302 for an example.

Some decoders (like for file headers) do not need to push to a table, so they do not need decoded_values. Such classes should still derive from DataDecoder and define how data gets formatted into LGDO’s.

Subclasses should define a method for decoding data to a buffer like decode_packet(packet, raw_buffer_list, packet_id). This function should return the number of bytes read.

Garbage collection writes binary data as an array of uint32s to a variable-length array in the output file. If a problematic packet is found, call put_in_garbage(). User should set up an enum or bitbank of garbage codes to be stored along with the garbage packets.
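The decode-and-push pattern described above can be sketched without pygama. Everything here is an assumption made for illustration: ToyDecoder, its 6-byte packet layout, the layout of decoded_values and the garbage code 1 are invented, and a plain dict of lists stands in for an LGDO Table.

```python
import struct


class ToyDecoder:
    """Stand-in for a DataDecoder subclass (not pygama code)."""

    def __init__(self):
        # one entry per column of the output table
        self.decoded_values = {
            "channel": {"dtype": "uint16"},
            "energy": {"dtype": "uint32"},
        }
        self.garbage = []  # (packet_id, code, raw bytes)

    def decode_packet(self, packet, table, packet_id):
        """Append one row to ``table`` and return the number of bytes read."""
        if len(packet) != 6:
            # problematic packet: store it with a garbage code (1 = bad length)
            self.garbage.append((packet_id, 1, packet))
            return len(packet)
        # little-endian uint16 followed by uint32, no padding
        channel, energy = struct.unpack("<HI", packet)
        table["channel"].append(channel)
        table["energy"].append(energy)
        return len(packet)


decoder = ToyDecoder()
table = {name: [] for name in decoder.decoded_values}
n_read = decoder.decode_packet(struct.pack("<HI", 3, 1500), table, packet_id=0)
decoder.decode_packet(b"\x00", table, packet_id=1)  # ends up in the garbage
```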

buffer_is_full(rb: RawBuffer) bool#

Returns whether the buffer is too full to read in another packet.

Return type:

bool

get_decoded_values(key: Optional[Union[int, str]] = None) dict#

Get decoded values (optionally for a given key, typically a channel).

Notes

Must be overloaded for your decoder if it has key-specific decoded values. The overload must also return a “default” decoded_values when key is None. Otherwise, just returns self.decoded_values, which should be defined in the constructor.

Return type:

dict

get_key_list() list[int | str]#

Overload with list of keys for this decoder, e.g. return range(n_channels). The default version works for decoders with single / no keys.

Return type:

list[int | str]

get_max_rows_in_packet() int#

Returns the maximum number of rows that could be read out in a packet.

1 by default, overload as necessary to avoid writing past the ends of buffers.

Return type:

int
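One plausible way to combine buffer_is_full() with get_max_rows_in_packet() is to compare the free rows left in a buffer with the largest number of rows a packet can produce. The sketch below assumes a buffer that tracks its allocated size and current write location; ToyBuffer and this particular comparison are illustrative and not taken from the pygama source.

```python
from dataclasses import dataclass


@dataclass
class ToyBuffer:
    size: int     # number of rows allocated
    loc: int = 0  # next row to be written


def buffer_is_full(buf, max_rows_in_packet=1):
    """True when fewer free rows remain than one packet may need."""
    return buf.size - buf.loc < max_rows_in_packet
```

With the default of one row per packet the buffer only counts as full once every row is used. A decoder that can emit several rows per packet reports full earlier, which is what prevents writing past the end of the buffer.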

make_lgdo(key: Optional[Union[int, str]] = None, size: Optional[int] = None) Union[Scalar, Struct, Array, VectorOfVectors]#

Make an LGDO for this DataDecoder to fill.

This default version of this function allocates a Table using the decoded_values for key. If a different type of LGDO object is required for this decoder, overload this function.

Parameters:
  • key (Optional[Union[int, str]]) – used to initialize the LGDO for a particular key (e.g. to have different trace lengths for different channels of a piece of hardware). Leave as None if such specialization is not necessary.

  • size (Optional[int]) – the size to be allocated for the LGDO, if applicable.

Returns:

data_obj – the newly allocated LGDO.

Return type:

Union[Scalar, Struct, Array, VectorOfVectors]

put_in_garbage(packet: int, packet_id: int, code: int) None#

write_out_garbage(filename: str, group: str = '/', lh5_store: Optional[LH5Store] = None) None#

pygama.raw.data_streamer module#

Base classes for streaming data.

class pygama.raw.data_streamer.DataStreamer#

Bases: ABC

Base class for data streams.

Provides a uniform interface for streaming, e.g.:

>>> header = ds.open_stream(stream_name)
>>> for chunk in ds: do_something(chunk)

Also provides default management of the RawBufferLibrary used for data reading: allocation (if needed), configuration (to match the stream) and fill level checking. Derived classes must define the functions get_decoder_list(), open_stream(), and read_packet(); see below.

build_default_rb_lib(out_stream: str = '') RawBufferLibrary#

Build the most basic RawBufferLibrary that will work for this stream.

A RawBufferList containing a single RawBuffer is built for each decoder name returned by get_decoder_list(). Each buffer’s out_name is set to the decoder name. The LGDO’s do not get initialized.

Return type:

RawBufferLibrary

abstract close_stream() None#

Close this data stream.

Note

Needs to be overloaded.

abstract get_decoder_list() list#

Returns a list of decoder objects for this data stream.

Notes

Needs to be overloaded. Gets called during open_stream().

Return type:

list

abstract open_stream(stream_name: str, rb_lib: Optional[RawBufferLibrary] = None, buffer_size: int = 8192, chunk_mode: str = 'any_full', out_stream: str = '') tuple[list[pygama.raw.raw_buffer.RawBuffer], int]#

Open and initialize a data stream.

Open the stream, read in the header, set up the buffers.

Call super().open_stream([args]) from the derived class after loading header info to run this default version, which sets up buffers in rb_lib using the stream’s decoders.

Notes

This default version has no actual return value! You must overload this function, set self.n_bytes_read to the header packet size, and return the header data.

Parameters:
  • stream_name (str) – typically a filename or e.g. a port for streaming.

  • rb_lib (Optional[RawBufferLibrary]) – a library of buffers for readout from the data stream. rb_lib will have its LGDO’s initialized during this function.

  • buffer_size (int) – length of buffers to be read out in read_chunk() (for buffers with variable length).

  • chunk_mode ('any_full', 'only_full' or 'single_packet') – sets the mode used for read_chunk().

  • out_stream (str) – optional name of output stream for default rb_lib generation.

Returns:

header_data – header_data is a list of RawBuffers containing any file header data, ready for writing to file or further processing. It’s not a RawBufferList since the buffers may have a different format.

Return type:

tuple[list[pygama.raw.raw_buffer.RawBuffer], int]

read_chunk(chunk_mode_override: Optional[str] = None, rp_max: int = 1000000, clear_full_buffers: bool = True) tuple[list[pygama.raw.raw_buffer.RawBuffer], int]#

Reads a chunk of data into raw buffers.

Reads packets until at least one buffer is too full to perform another read. Default version just calls read_packet() over and over. Overload as necessary.

Notes

The user is responsible for resetting / clearing the raw buffers prior to calling read_chunk() again.

Parameters:
  • chunk_mode_override ('any_full', 'only_full' or 'single_packet') –

    • None : do not override self.chunk_mode

    • any_full : returns all raw buffers with data as soon as any one buffer gets full

    • only_full : returns only those raw buffers that became full (or nearly full) during the read. This minimizes the number of write calls.

    • single_packet : returns all raw buffers with data after a single read is performed. This is useful for streaming data out as soon as it is read in (e.g. for diagnostics or in-line analysis).

  • rp_max (int) – maximum number of packets to read before returning anyway, even if one of the other conditions is not met.

  • clear_full_buffers (bool) – automatically clear any buffers that report themselves as being full prior to reading the chunk. Set to False if clearing manually for a minor speed-up.

Returns:

chunk_list (list of RawBuffers, int) – chunk_list is the list of RawBuffers with data ready for writing to file or further processing. The list contains all buffers with data or just all full buffers, depending on the chunk mode. Note chunk_list is not a RawBufferList since the RawBuffers inside may not all have the same structure.

Return type:

tuple[list[pygama.raw.raw_buffer.RawBuffer], int]
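The three chunk modes differ in when reading stops and in which buffers are handed back. The sketch below covers only the second part, using (name, loc, size) tuples as stand-ins for RawBuffers. select_chunk is a hypothetical helper, and "nearly full" is simplified here to loc >= size.

```python
def select_chunk(buffers, chunk_mode):
    """Pick the buffers to return, given (name, loc, size) tuples."""
    with_data = [b for b in buffers if b[1] > 0]
    if chunk_mode in ("any_full", "single_packet"):
        # both modes return every buffer holding data; they differ only in
        # when the read loop stops (first full buffer vs. after one read)
        return with_data
    if chunk_mode == "only_full":
        return [b for b in with_data if b[1] >= b[2]]
    raise ValueError(f"unknown chunk_mode: {chunk_mode}")


buffers = [("geds", 4, 4), ("spms", 2, 4), ("auxs", 0, 4)]
```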

abstract read_packet() bool#

Reads a single packet’s worth of data into the RawBufferLibrary.

Needs to be overloaded. Gets called by read_chunk(). Needs to update self.any_full if any buffers would possibly over-fill on the next read. Needs to update self.n_bytes_read too.

Returns:

still_has_data – returns True while there is still data to read.

Return type:

bool
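The interplay between read_packet() and the default read_chunk() loop can be sketched with a toy stream. ToyStreamer is an illustrative stand-in with a single list as its only buffer. It follows the contract described above (update any_full and n_bytes_read, return True while data remains) but is not pygama's implementation.

```python
class ToyStreamer:
    """Stand-in for a DataStreamer subclass; not pygama's implementation."""

    def __init__(self, packets, buffer_size):
        self.packets = list(packets)
        self.buffer = []  # a single toy buffer
        self.buffer_size = buffer_size
        self.any_full = False
        self.n_bytes_read = 0

    def read_packet(self):
        """Read one packet; return True while there is still data to read."""
        if not self.packets:
            return False
        packet = self.packets.pop(0)
        self.buffer.append(packet)
        self.n_bytes_read += len(packet)
        # flag the buffer if the next read could over-fill it
        self.any_full = len(self.buffer) >= self.buffer_size
        return True

    def read_chunk(self, rp_max=1_000_000):
        """Call read_packet() until a buffer fills, data runs out or rp_max is hit."""
        self.buffer.clear()
        self.any_full = False
        n_packets = 0
        while n_packets < rp_max and not self.any_full:
            if not self.read_packet():
                break
            n_packets += 1
        return list(self.buffer)


streamer = ToyStreamer([b"ab", b"cd", b"ef", b"gh", b"ij"], buffer_size=2)
chunks = []
while chunk := streamer.read_chunk():
    chunks.append(chunk)
```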

pygama.raw.raw_buffer module#

Utilities to manage data buffering for raw data conversion. This module manages LGDO buffers and their corresponding output streams. Allows for one-to-many mapping of input streams to output streams.

Primary Classes#

RawBuffer: an LGDO (e.g. a table) along with buffer metadata, such as the current write location, the list of keys (e.g. channels) that write to it, the output stream it is associated with (if any), etc. Each DataDecoder is associated with a RawBuffer of a particular format.

RawBufferList: a collection of RawBuffer with LGDO’s that all have the same structure (same type, same fields, etc). A DataDecoder will write its output to a RawBufferList.

RawBufferLibrary: a dictionary of RawBufferLists, e.g. one for each DataDecoder. Keyed by the decoder name.

RawBuffer supports a JSON short-hand notation, see RawBufferLibrary.set_from_json_dict() for full specification.

Example JSON yielding a valid RawBufferLibrary is below. In the example, the user would call RawBufferLibrary.set_from_json_dict(json_dict, kw_dict) with kw_dict containing an entry for 'file_key'. The other keywords {key} and {name} are understood by and filled in during RawBufferLibrary.set_from_json_dict() unless overloaded in kw_dict. Note the use of the wildcard *: this will match all other decoder names / keys.

{
  "FCEventDecoder" : {
    "g{key:0>3d}" : {
      "key_list" : [ [24,64] ],
      "out_stream" : "$DATADIR/{file_key}_geds.lh5:/geds"
    },
    "spms" : {
      "key_list" : [ [6,23] ],
      "out_stream" : "$DATADIR/{file_key}_spms.lh5:/spms"
    },
    "puls" : {
      "key_list" : [ 0 ],
      "out_stream" : "$DATADIR/{file_key}_auxs.lh5:/auxs"
    },
    "muvt" : {
      "key_list" : [ 1, 5 ],
      "out_stream" : "$DATADIR/{file_key}_auxs.lh5:/auxs"
    }
  },
  "*" : {
    "{name}" : {
      "key_list" : [ "*" ],
      "out_stream" : "$DATADIR/{file_key}_{name}.lh5"
    }
  }
}
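
To make the shorthand in the example concrete, the sketch below expands the geds entry by hand: the [24, 64] pair becomes the inclusive range of keys, and g{key:0>3d} yields one zero-padded name per key. expand_key_list is a hypothetical helper written for this illustration, not the pygama function.

```python
def expand_key_list(key_list):
    """Replace [first, last] pairs by the explicit, inclusive integer range."""
    keys = []
    for entry in key_list:
        if isinstance(entry, list):
            first, last = entry
            keys.extend(range(first, last + 1))
        else:
            keys.append(entry)
    return keys


ged_keys = expand_key_list([[24, 64]])
# one buffer name per key, e.g. key 24 gives "g024"
ged_names = ["g{key:0>3d}".format(key=k) for k in ged_keys]
# plain entries are kept as they are: [1, 5] means keys 1 and 5 only
muvt_keys = expand_key_list([1, 5])
```
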
class pygama.raw.raw_buffer.RawBuffer(lgdo: Optional[Union[Scalar, Struct, Array, VectorOfVectors]] = None, key_list: Optional[list[int | str]] = None, out_stream: str = '', out_name: str = '')#

Bases: object

Base class to represent a buffer of raw data.

A RawBuffer is in essence an LGDO object (typically a Table) to which decoded data will be written, along with some metadata distinguishing what data goes into it, and where the LGDO gets written out. Also holds on to the current location in the buffer for writing.

Variables:
  • lgdo – the LGDO used as the actual buffer. Typically a Table. Set to None upon creation so that the user or a decoder can initialize it later.

  • key_list – a list of keys (e.g. channel numbers) identifying data to be written into this buffer. The key scheme is specific to the decoder with which the RawBuffer is associated. This is called key_list instead of keys to avoid confusion with the dict function dict.keys(), i.e. raw_buffer.lgdo.keys().

  • out_stream – the output stream to which the RawBuffer’s LGDO should be sent or written. A colon (:) can be used to separate the stream name/address from an in-stream path/port:

    • file example: /path/filename.lh5:/group

    • socket example: 198.0.0.100:8000

  • out_name – the name or identifier of the object in the output stream.
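
The colon convention for out_stream can be illustrated with a split at the first colon. split_out_stream is a hypothetical helper. How pygama itself parses the string is not stated here, so treat this as one possible reading of the two documented examples.

```python
def split_out_stream(out_stream):
    """Split 'address:path' at the first colon; path is '' if there is none."""
    address, _, path = out_stream.partition(":")
    return address, path


file_parts = split_out_stream("/path/filename.lh5:/group")
socket_parts = split_out_stream("198.0.0.100:8000")
```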

is_full() bool#
Return type:

bool

class pygama.raw.raw_buffer.RawBufferLibrary(json_dict: Optional[dict] = None, kw_dict: Optional[dict[str, str]] = None)#

Bases: dict

A RawBufferLibrary is a collection of RawBufferLists associated with the names of decoders that can write to them.

clear_full() None#

get_list_of(attribute: str, unique: bool = True) list#

Return a list of values of RawBuffer attributes.

Parameters:
  • attribute (str) – The RawBuffer attribute queried to make the list.

  • unique (bool) – whether to remove duplicates.

Returns:

values – The list of values of RawBuffer.attribute.

Return type:

list

Examples

>>> output_file_list = rbl.get_list_of('out_stream')

set_from_json_dict(json_dict: dict, kw_dict: Optional[dict[str, str]] = None) None#

Set up a RawBufferLibrary from a dictionary written in JSON shorthand.

Basic structure:

{
  "list_name" : {
    "name" : {
      "key_list" : [ "key1", "key2", "..." ],
      "out_stream" : "out_stream_str",
      "out_name" : "out_name_str" // (optional)
    }
  }
}

By default name is used for the RawBuffer’s out_name attribute, but this can be overridden if desired by providing an explicit out_name.

Allowed shorthands, in order of expansion:

  • key_list may have entries that are 2-integer lists corresponding to the first and last integer keys in a contiguous range (e.g. of channels) that get stored to the same buffer. These simply get replaced with the explicit list of integers in the range. We use lists not tuples for JSON compliance.

  • The name can include {key:xxx} format specifiers, indicating that each key in key_list should be given its own buffer with the corresponding name. The same specifier can appear in out_stream to write the key’s data to its own output path.

  • You may also include keywords in your out_stream and out_name specification whose values get sent in via kw_dict. These get evaluated simultaneously with the {key:xxx} specifiers.

  • Environment variables can also be used in out_stream. They get expanded after kw_dict is handled and thus can be used inside kw_dict.

  • list_name can use the wildcard * to match any other list_name known to a streamer.

  • out_stream and out_name can also include {name}, to be replaced with the buffer’s name. In the case of list_name="*", {name} evaluates to list_name.
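
The substitution order for out_stream can be sketched with str.format followed by os.path.expandvars: keyword fields are filled first, then environment variables are expanded, so a kw_dict value may itself contain an environment variable. expand_out_stream, the DATADIR and RUN values and the decoder name SomeDecoder are all hypothetical. The ${RUN} form is used so that the variable name ends before the following underscore.

```python
import os


def expand_out_stream(template, key=None, name="", kw_dict=None):
    """Fill {key}, {name} and kw_dict fields, then expand environment variables."""
    fields = {"key": key, "name": name}
    fields.update(kw_dict or {})  # kw_dict entries take precedence
    return os.path.expandvars(template.format(**fields))


os.environ["DATADIR"] = "/data"
os.environ["RUN"] = "r001"

geds = expand_out_stream(
    "$DATADIR/{file_key}_geds.lh5:/geds", key=24, kw_dict={"file_key": "${RUN}"}
)
other = expand_out_stream(
    "$DATADIR/{file_key}_{name}.lh5", name="SomeDecoder", kw_dict={"file_key": "run7"}
)
```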

Parameters:
  • json_dict (dict) – loaded from a JSON file written in the allowed shorthand. json_dict is changed by this function.

  • kw_dict (Optional[dict[str, str]]) – dictionary of keyword-value pairs for substitutions into the out_stream and out_name fields.

class pygama.raw.raw_buffer.RawBufferList#

Bases: list

A RawBufferList holds a collection of RawBuffers of identical structure (same format LGDO’s with the same fields).

clear_full() None#

get_keyed_dict() dict[int | str, pygama.raw.raw_buffer.RawBuffer]#

Returns a dictionary of RawBuffers built from the buffers’ key_lists.

Different keys may point to the same buffer. Requires the buffers in the RawBufferList to have non-overlapping key lists.

Return type:

dict[int | str, pygama.raw.raw_buffer.RawBuffer]
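
A keyed dictionary of this kind can be built by walking each buffer's key_list. The sketch below uses plain dicts as stand-ins for RawBuffers and raises on overlapping keys. That error handling is an assumption, since the text above only states the non-overlap requirement.

```python
def keyed_dict(buffers):
    """Map each key to the buffer whose key_list contains it."""
    mapping = {}
    for buf in buffers:
        for key in buf["key_list"]:
            if key in mapping:
                raise ValueError(f"key {key} appears in more than one buffer")
            mapping[key] = buf
    return mapping


spms = {"out_name": "spms", "key_list": [6, 7, 8]}
puls = {"out_name": "puls", "key_list": [0]}
by_key = keyed_dict([spms, puls])
```

Several keys (6, 7 and 8 here) point to the same buffer object, so a decoder can look up its target buffer directly from the key of the packet it is decoding.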

get_list_of(attribute: str) list#

Return a list of values of RawBuffer attributes.

Parameters:

attribute (str) – The RawBuffer attribute queried to make the list.

Returns:

values – The list of values of RawBuffer.attribute.

Return type:

list

Examples

>>> output_file_list = rbl.get_list_of('out_stream')

set_from_json_dict(json_dict: dict, kw_dict: Optional[dict[str, str]] = None) None#

Set up a RawBufferList from a dictionary written in JSON shorthand. See RawBufferLibrary.set_from_json_dict() for details.

Notes

json_dict is changed by this function.

pygama.raw.raw_buffer.expand_rblist_json_dict(json_dict: dict, kw_dict: dict[str, str]) None#

Expand shorthands in a JSON dictionary representing a RawBufferList.

See RawBufferLibrary.set_from_json_dict() for details.

Notes

The input JSON dictionary is changed by this function.

pygama.raw.raw_buffer.write_to_lh5_and_clear(raw_buffers: list[pygama.raw.raw_buffer.RawBuffer], lh5_store: Optional[LH5Store] = None, wo_mode: str = 'append') None#

Writes a list of RawBuffers to LH5 files and then clears them.

Parameters:
  • raw_buffers (list(RawBuffer)) – The list of RawBuffers to be written to file. Note this is not a RawBufferList because the RawBuffers may not have the same structure.

  • lh5_store (LH5Store or None) – Allows the user to send in a store holding a collection of already open files (saves some time opening / closing files).

  • wo_mode (str) – write mode, see also lgdo.lh5_store.LH5Store.write_object()