pygama.raw package#
The primary function for data conversion into raw-tier LH5 files is
build_raw(). This is a one-to-many function: one input DAQ file can
generate one or more output raw files. Which data ends up in which
files, and in which HDF5 groups inside each file, is controlled via
raw_buffer (see below). If no raw buffer specification is given,
all decoded data is written to a single output file, with all fields
from each hardware decoder in their own output table.
Currently we support the following hardware:
Subpackages#
- pygama.raw.fc package
- pygama.raw.orca package
- Submodules
- pygama.raw.orca.orca_base module
- pygama.raw.orca.orca_digitizers module
- pygama.raw.orca.orca_flashcam module
- pygama.raw.orca.orca_header module
- pygama.raw.orca.orca_header_decoder module
- pygama.raw.orca.orca_packet module
- pygama.raw.orca.orca_streamer module
Submodules#
pygama.raw.build_raw module#
- pygama.raw.build_raw.build_raw(in_stream: str, in_stream_type: Optional[str] = None, out_spec: Optional[Union[str, dict, RawBufferLibrary]] = None, buffer_size: int = 8192, n_max: int = inf, overwrite: bool = False, **kwargs) None#
Convert data into LEGEND HDF5 raw-tier format.
Takes an input stream of a given type and writes to output file(s) according to the user’s specification.
- Parameters:
in_stream (str) – the name of the input stream to be converted. Typically a filename, including path. Can use environment variables. Some streamers may eventually accept other inputs, e.g. streaming over a port.
in_stream_type ('ORCA', 'FlashCam', 'LlamaDaq', 'Compass' or 'MGDO') – type of stream used to write the input file.
out_spec (Optional[Union[str, dict, RawBufferLibrary]]) –
Specification for the output stream.
- if None, uses {in_stream}.lh5 as the output filename.
- if a str not ending in .json, interpreted as the output filename.
- if a str ending in .json, interpreted as a filename containing json-shorthand for the output specification (see raw_buffer).
- if a JSON dict, should be a dict loaded from the json shorthand notation for RawBufferLibraries (see raw_buffer), which is then used to build a RawBufferLibrary.
- if a RawBufferLibrary, the mapping of data to output file / group is taken from that.
buffer_size (int) – default size to use for data buffering.
n_max (int) – maximum number of rows of data to process from the input file.
overwrite (bool) – sets whether to overwrite the output file(s) if it (they) already exist.
**kwargs – sent to RawBufferLibrary generation as kw_dict.
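The rules for interpreting out_spec can be sketched in plain Python. This is only an illustration of the documented behavior, not the code pygama runs: the helper name classify_out_spec and its (kind, value) return convention are assumptions made for this example.

```python
import json
from pathlib import Path


def classify_out_spec(in_stream, out_spec=None):
    """Hypothetical helper mirroring the documented out_spec rules.

    Returns a (kind, value) tuple; it does not call into pygama.
    """
    if out_spec is None:
        # default: derive the output filename from the input stream name
        return "filename", f"{in_stream}.lh5"
    if isinstance(out_spec, str):
        if out_spec.endswith(".json"):
            # a file holding the JSON shorthand for the output specification
            return "json_dict", json.loads(Path(out_spec).read_text())
        # any other string is taken as the output filename
        return "filename", out_spec
    if isinstance(out_spec, dict):
        # a dict already loaded from the JSON shorthand
        return "json_dict", out_spec
    # anything else is assumed to already be a RawBufferLibrary
    return "library", out_spec


kind, value = classify_out_spec("run0.fcio")
```

With no out_spec, the sketch falls back to the input name with .lh5 appended, which matches the first rule in the list above.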
pygama.raw.data_decoder module#
Base classes for decoding data into raw LGDO Tables or files
- class pygama.raw.data_decoder.DataDecoder(garbage_length: int = 256, packet_size_guess: int = 1024)#
Bases: object
Decodes packets from a data stream.
Most decoders will repeatedly decode the same set of values from each packet. The values that get decoded need to be described by a dict called decoded_values that helps determine how to set up the buffers and write them to file.
Tables are made whose columns correspond to the elements of decoded_values, and packet data gets pushed to the end of the table one row at a time. See FCEventDecoder or ORCAStruck3302 for an example.
Some decoders (like for file headers) do not need to push to a table, so they do not need decoded_values. Such classes should still derive from DataDecoder and define how data gets formatted into LGDO’s.
Subclasses should define a method for decoding data to a buffer like decode_packet(packet, raw_buffer_list, packet_id). This function should return the number of bytes read.
Garbage collection writes binary data as an array of uint32s to a variable-length array in the output file. If a problematic packet is found, call put_in_garbage(). The user should set up an enum or bitbank of garbage codes to be stored along with the garbage packets.
- buffer_is_full(rb: RawBuffer) bool#
Returns whether the buffer is too full to read in another packet.
- Return type:
bool
- get_decoded_values(key: Optional[Union[int, str]] = None) dict#
Get decoded values (optionally for a given key, typically a channel).
Notes
Must overload for your decoder if it has key-specific decoded values. An overload must also handle key = None by returning a “default” decoded_values. Otherwise, just returns self.decoded_values, which should be defined in the constructor.
- Return type:
dict
- get_key_list() list[int | str]#
Overload with list of keys for this decoder, e.g. return range(n_channels). The default version works for decoders with single / no keys.
- get_max_rows_in_packet() int#
Returns the maximum number of rows that could be read out in a packet.
1 by default, overload as necessary to avoid writing past the ends of buffers.
- Return type:
int
- make_lgdo(key: Optional[Union[int, str]] = None, size: Optional[int] = None) Union[Scalar, Struct, Array, VectorOfVectors]#
Make an LGDO for this DataDecoder to fill.
The default version of this function allocates a Table using the decoded_values for key. If a different type of LGDO object is required for this decoder, overload this function.
- Parameters:
key (Optional[Union[int, str]]) – used to initialize the LGDO for a particular key (e.g. to have different trace lengths for different channels of a piece of hardware). Leave as None if such specialization is not necessary.
size (Optional[int]) – the size to be allocated for the LGDO, if applicable.
- Returns:
data_obj – the newly allocated LGDO.
- Return type:
Union[Scalar, Struct, Array, VectorOfVectors]
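The DataDecoder conventions described above (a decoded_values dict naming the output columns, and a decode_packet method that appends one row and returns the number of bytes read) can be illustrated without pygama. Everything here is an assumption made for the sketch: ToyDecoder is a stand-in rather than a subclass of the real base class, the packet layout is invented, and a dict of lists plays the role of the LGDO table.

```python
import struct


class ToyDecoder:
    """Stand-in for a DataDecoder subclass; not the pygama base class."""

    # hypothetical packet layout: little-endian uint16 channel, uint32 energy
    PACKET_FORMAT = "<HI"

    def __init__(self):
        # one entry per column of the output table
        self.decoded_values = {
            "channel": {"dtype": "uint16"},
            "energy": {"dtype": "uint32"},
        }

    def decode_packet(self, packet, table, packet_id):
        # packet_id is kept only to mirror the documented call signature
        channel, energy = struct.unpack_from(self.PACKET_FORMAT, packet)
        # push one row to the end of the table
        table["channel"].append(channel)
        table["energy"].append(energy)
        # report how many bytes of the packet were consumed
        return struct.calcsize(self.PACKET_FORMAT)


decoder = ToyDecoder()
# a dict of lists stands in for an LGDO table with one column per decoded value
table = {name: [] for name in decoder.decoded_values}
packet = struct.pack("<HI", 7, 1460)
n_bytes = decoder.decode_packet(packet, table, packet_id=0)
```

A real decoder would write into a RawBuffer's LGDO instead of Python lists, but the division of labor is the same: decoded_values fixes the columns, decode_packet fills them.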
pygama.raw.data_streamer module#
Base classes for streaming data.
- class pygama.raw.data_streamer.DataStreamer#
Bases: ABC
Base class for data streams.
Provides a uniform interface for streaming, e.g.:
>>> header = ds.open_stream(stream_name)
>>> for chunk in ds: do_something(chunk)
Also provides default management of the RawBufferLibrary used for data reading: allocation (if needed), configuration (to match the stream) and fill level checking. Derived classes must define the functions get_decoder_list(), open_stream(), and read_packet(); see below.
- _abc_impl = <_abc._abc_data object>#
- build_default_rb_lib(out_stream: str = '') RawBufferLibrary#
Build the most basic RawBufferLibrary that will work for this stream.
A RawBufferList containing a single RawBuffer is built for each decoder name returned by get_decoder_list(). Each buffer’s out_name is set to the decoder name. The LGDO’s do not get initialized.
- Return type:
RawBufferLibrary
- abstract get_decoder_list() list#
Returns a list of decoder objects for this data stream.
Notes
Needs to be overloaded. Gets called during open_stream().
- Return type:
list
- abstract open_stream(stream_name: str, rb_lib: Optional[RawBufferLibrary] = None, buffer_size: int = 8192, chunk_mode: str = 'any_full', out_stream: str = '') tuple[list[pygama.raw.raw_buffer.RawBuffer], int]#
Open and initialize a data stream.
Open the stream, read in the header, set up the buffers.
Call super().open_stream([args]) from the derived class after loading header info to run this default version, which sets up buffers in rb_lib using the stream’s decoders.
Notes
This default version has no actual return value! You must overload this function, set self.n_bytes_read to the header packet size, and return the header data.
- Parameters:
stream_name (str) – typically a filename or e.g. a port for streaming.
rb_lib (Optional[RawBufferLibrary]) – a library of buffers for readout from the data stream. rb_lib will have its LGDO’s initialized during this function.
buffer_size (int) – length of buffers to be read out in read_chunk() (for buffers with variable length).
chunk_mode ('any_full', 'only_full' or 'single_packet') – sets the mode used for read_chunk().
out_stream (str) – optional name of output stream for default rb_lib generation.
- Returns:
header_data – header_data is a list of RawBuffers containing any file header data, ready for writing to file or further processing. It’s not a RawBufferList since the buffers may have a different format.
- Return type:
tuple[list[pygama.raw.raw_buffer.RawBuffer], int]
- read_chunk(chunk_mode_override: Optional[str] = None, rp_max: int = 1000000, clear_full_buffers: bool = True) tuple[list[pygama.raw.raw_buffer.RawBuffer], int]#
Reads a chunk of data into raw buffers.
Reads packets until at least one buffer is too full to perform another read. Default version just calls read_packet() over and over. Overload as necessary.
Notes
The user is responsible for resetting / clearing the raw buffers prior to calling read_chunk() again.
- Parameters:
chunk_mode_override ('any_full', 'only_full' or 'single_packet') –
- None: do not override self.chunk_mode
- any_full: returns all raw buffers with data as soon as any one buffer gets full
- only_full: returns only those raw buffers that became full (or nearly full) during the read. This minimizes the number of write calls.
- single_packet: returns all raw buffers with data after a single read is performed. This is useful for streaming data out as soon as it is read in (e.g. for diagnostics or in-line analysis).
rp_max (int) – maximum number of packets to read before returning anyway, even if one of the other conditions is not met.
clear_full_buffers (bool) – automatically clear any buffers that report themselves as being full prior to reading the chunk. Set to False if clearing manually for a minor speed-up.
- Returns:
chunk_list (list of RawBuffers, int) – chunk_list is the list of RawBuffers with data ready for writing to file or further processing. The list contains all buffers with data or just all full buffers, depending on the chunk mode. Note that chunk_list is not a RawBufferList since the RawBuffers inside may not all have the same structure.
- Return type:
tuple[list[pygama.raw.raw_buffer.RawBuffer], int]
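The three chunk modes differ only in which buffers are handed back and when. The sketch below models that selection on plain fill counts; it is a simplified reading of the descriptions above, not the read_chunk() implementation. The function name buffers_to_return and the single shared capacity are assumptions, and the "nearly full" and end-of-stream cases are ignored.

```python
def buffers_to_return(fill_levels, capacity, chunk_mode):
    """Pick which buffers a chunk read would hand back (illustrative only).

    fill_levels maps a buffer name to its number of filled rows.
    An empty list means "keep reading packets".
    """
    with_data = [name for name, n in fill_levels.items() if n > 0]
    full = [name for name, n in fill_levels.items() if n >= capacity]
    if chunk_mode == "single_packet":
        # hand back everything that has data after every single read
        return with_data
    if chunk_mode not in ("any_full", "only_full"):
        raise ValueError(f"unknown chunk_mode: {chunk_mode}")
    if not full:
        # no buffer is full yet, so the read loop would continue
        return []
    # any_full returns every buffer with data, only_full just the full ones
    return with_data if chunk_mode == "any_full" else full


levels = {"geds": 8, "spms": 3, "puls": 0}
any_full = buffers_to_return(levels, capacity=8, chunk_mode="any_full")
only_full = buffers_to_return(levels, capacity=8, chunk_mode="only_full")
```

In this toy state, any_full hands back both geds and spms, while only_full hands back geds alone and leaves spms to keep filling, which is why only_full needs fewer write calls.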
- abstract read_packet() bool#
Reads a single packet’s worth of data into the RawBufferLibrary.
Needs to be overloaded. Gets called by read_chunk(). Needs to update self.any_full if any buffers would possibly over-fill on the next read. Needs to update self.n_bytes_read too.
- Returns:
still_has_data – returns True while there is still data to read.
- Return type:
bool
pygama.raw.raw_buffer module#
Utilities to manage data buffering for raw data conversion. This module manages LGDO buffers and their corresponding output streams. Allows for one-to-many mapping of input streams to output streams.
Primary Classes#
RawBuffer: an LGDO (e.g. a table) along with buffer metadata, such as the
current write location, the list of keys (e.g. channels) that write to it, the
output stream it is associated with (if any), etc. Each
DataDecoder is associated with a
RawBuffer of a particular format.
RawBufferList: a collection of RawBuffers with LGDO’s that
all have the same structure (same type, same fields, etc). A
DataDecoder will write its output to a
RawBufferList.
RawBufferLibrary: a dictionary of RawBufferLists, e.g. one
for each DataDecoder. Keyed by the decoder name.
RawBuffer supports a JSON short-hand notation, see
RawBufferLibrary.set_from_json_dict() for full specification.
Example JSON yielding a valid RawBufferLibrary is below. In the
example, the user would call RawBufferLibrary.set_from_json_dict(json_dict,
kw_dict) with kw_dict containing an entry for 'file_key'. The other
keywords {key} and {name} are understood by and filled in during
RawBufferLibrary.set_from_json_dict() unless overloaded in kw_dict.
Note the use of the wildcard *: this will match all other decoder names /
keys.
{
  "FCEventDecoder" : {
    "g{key:0>3d}" : {
      "key_list" : [ [24,64] ],
      "out_stream" : "$DATADIR/{file_key}_geds.lh5:/geds"
    },
    "spms" : {
      "key_list" : [ [6,23] ],
      "out_stream" : "$DATADIR/{file_key}_spms.lh5:/spms"
    },
    "puls" : {
      "key_list" : [ 0 ],
      "out_stream" : "$DATADIR/{file_key}_auxs.lh5:/auxs"
    },
    "muvt" : {
      "key_list" : [ 1, 5 ],
      "out_stream" : "$DATADIR/{file_key}_auxs.lh5:/auxs"
    }
  },
  "*" : {
    "{name}" : {
      "key_list" : [ "*" ],
      "out_stream" : "$DATADIR/{file_key}_{name}.lh5"
    }
  }
}
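Two shorthands in the example above can be worked through by hand: the entry [24,64] stands for the contiguous key range 24 to 64, and the name g{key:0>3d} yields one buffer name per key. The sketch below reproduces that expansion with standard Python string formatting. It assumes the range includes both endpoints (first and last key), and expand_key_list is a hypothetical helper, not a pygama function.

```python
def expand_key_list(key_list):
    """Replace [first, last] pairs with the explicit inclusive range."""
    keys = []
    for entry in key_list:
        if isinstance(entry, list):
            first, last = entry
            # both endpoints belong to the range
            keys.extend(range(first, last + 1))
        else:
            # plain keys pass through unchanged
            keys.append(entry)
    return keys


keys = expand_key_list([[24, 64]])
# zero-padded, three-digit names: one buffer name per key
names = ["g{key:0>3d}".format(key=k) for k in keys]
```

Under these assumptions the "geds" entry expands to 41 buffers named g024 through g064, while a list such as [ 1, 5 ] is left as two individual keys.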
- class pygama.raw.raw_buffer.RawBuffer(lgdo: Optional[Union[Scalar, Struct, Array, VectorOfVectors]] = None, key_list: Optional[list[int | str]] = None, out_stream: str = '', out_name: str = '')#
Bases: object
Base class to represent a buffer of raw data.
A RawBuffer is in essence an LGDO object (typically a Table) to which decoded data will be written, along with some meta-data distinguishing what data goes into it, and where the LGDO gets written out. Also holds on to the current location in the buffer for writing.
- Variables:
lgdo – the LGDO used as the actual buffer. Typically a Table. Set to None upon creation so that the user or a decoder can initialize it later.
key_list – a list of keys (e.g. channel numbers) identifying data to be written into this buffer. The key scheme is specific to the decoder with which the RawBuffer is associated. This is called key_list instead of keys to avoid confusion with the dict function dict.keys(), i.e. raw_buffer.lgdo.keys().
out_stream – the output stream to which the RawBuffer’s LGDO should be sent or written. A colon (:) can be used to separate the stream name/address from an in-stream path/port:
- file example: /path/filename.lh5:/group
- socket example: 198.0.0.100:8000
out_name – the name or identifier of the object in the output stream.
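The colon convention for out_stream can be illustrated with a small string split. The helper below is hypothetical and assumes the first colon is the separator. How pygama itself parses the string is not stated here.

```python
def split_out_stream(out_stream):
    """Hypothetical helper: split at the first colon into (address, path).

    The second element is None when no colon is present.
    """
    address, sep, path = out_stream.partition(":")
    return address, (path if sep else None)


file_target = split_out_stream("/path/filename.lh5:/group")
socket_target = split_out_stream("198.0.0.100:8000")
```

For the file example this gives the filename and the group inside it; for the socket example it gives the address and the port as strings.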
- class pygama.raw.raw_buffer.RawBufferLibrary(json_dict: Optional[dict] = None, kw_dict: Optional[dict[str, str]] = None)#
Bases: dict
A RawBufferLibrary is a collection of RawBufferLists associated with the names of decoders that can write to them.
- get_list_of(attribute: str, unique: bool = True) list#
Return a list of values of RawBuffer attributes.
- Parameters:
- Returns:
values – The list of values of RawBuffer.attribute.
- Return type:
list
Examples
>>> output_file_list = rbl.get_list_of('out_stream')
- set_from_json_dict(json_dict: dict, kw_dict: Optional[dict[str, str]] = None) None#
Set up a RawBufferLibrary from a dictionary written in JSON shorthand.
Basic structure:
{
  "list_name" : {
    "name" : {
      "key_list" : [ "key1", "key2", "..." ],
      "out_stream" : "out_stream_str",
      "out_name" : "out_name_str" // (optional)
    }
  }
}
By default name is used for the RawBuffer’s out_name attribute, but this can be overridden if desired by providing an explicit out_name.
Allowed shorthands, in order of expansion:
- key_list may have entries that are 2-integer lists corresponding to the first and last integer keys in a contiguous range (e.g. of channels) that get stored to the same buffer. These simply get replaced with the explicit list of integers in the range. We use lists not tuples for JSON compliance.
- The name can include {key:xxx} format specifiers, indicating that each key in key_list should be given its own buffer with the corresponding name. The same specifier can appear in out_stream to write the key’s data to its own output path.
- You may also include keywords in your out_stream and out_name specification whose values get sent in via kw_dict. These get evaluated simultaneously with the {key:xxx} specifiers.
- Environment variables can also be used in out_stream. They get expanded after kw_dict is handled and thus can be used inside kw_dict.
- list_name can use the wildcard * to match any other list_name known to a streamer.
- out_stream and out_name can also include {name}, to be replaced with the buffer’s name. In the case of list_name="*", {name} evaluates to list_name.
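The ordering of the keyword and environment-variable shorthands matters: because environment variables are expanded after kw_dict is applied, a kw_dict value may itself contain one. A minimal sketch of that two-step expansion follows, using only str.format and os.path.expandvars from the standard library. The helper expand_out_stream is hypothetical, and DATADIR is set inside the example purely so that the result is reproducible.

```python
import os


def expand_out_stream(template, name, kw_dict):
    """Illustrative two-step expansion: keywords first, then environment."""
    # step 1: fill {name} and any keywords supplied through kw_dict
    filled = template.format(name=name, **kw_dict)
    # step 2: expand $VARIABLES, including any that arrived via kw_dict
    return os.path.expandvars(filled)


os.environ["DATADIR"] = "/data"  # set only to make the example reproducible

out_stream = expand_out_stream(
    "$DATADIR/{file_key}_{name}.lh5", "FCEventDecoder", {"file_key": "run0042"}
)
# an environment variable passed in through kw_dict is expanded as well
nested = expand_out_stream("{file_key}.lh5", "unused", {"file_key": "$DATADIR/run0042"})
```

Reversing the two steps would leave $DATADIR unexpanded in the second call, since the variable only appears once the keyword has been substituted.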
- class pygama.raw.raw_buffer.RawBufferList#
Bases: list
A RawBufferList holds a collection of RawBuffers of identical structure (same format LGDO’s with the same fields).
- get_keyed_dict() dict[int | str, pygama.raw.raw_buffer.RawBuffer]#
Returns a dictionary of RawBuffers built from the buffers’ key_lists.
Different keys may point to the same buffer. Requires the buffers in the RawBufferList to have non-overlapping key lists.
- Return type:
dict[int | str, pygama.raw.raw_buffer.RawBuffer]
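The idea behind get_keyed_dict() is a lookup from each key to the buffer that receives its data. The sketch below shows that mapping with plain dicts standing in for RawBuffers. Raising an error on overlapping key lists is a choice made for this example. The text above only states that non-overlapping lists are required.

```python
def keyed_dict(buffers):
    """Map each key to the buffer whose key_list contains it (illustrative)."""
    result = {}
    for buf in buffers:
        for key in buf["key_list"]:
            if key in result:
                # the lookup is only well defined for non-overlapping key lists
                raise ValueError(f"key {key} appears in more than one buffer")
            result[key] = buf
    return result


# plain dicts stand in for RawBuffer objects
puls = {"out_name": "puls", "key_list": [0]}
muvt = {"out_name": "muvt", "key_list": [1, 5]}
by_key = keyed_dict([puls, muvt])
```

Keys 1 and 5 both resolve to the same muvt buffer, which is the "different keys may point to the same buffer" case.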
- get_list_of(attribute: str) list#
Return a list of values of RawBuffer attributes.
- Parameters:
attribute (str) – The RawBuffer attribute queried to make the list.
- Returns:
values – The list of values of RawBuffer.attribute.
- Return type:
list
Examples
>>> output_file_list = rbl.get_list_of('out_stream')
- set_from_json_dict(json_dict: dict, kw_dict: Optional[dict[str, str]] = None) None#
Set up a RawBufferList from a dictionary written in JSON shorthand. See RawBufferLibrary.set_from_json_dict() for details.
Notes
json_dict is changed by this function.
- pygama.raw.raw_buffer.expand_rblist_json_dict(json_dict: dict, kw_dict: dict[str, str]) None#
Expand shorthands in a JSON dictionary representing a RawBufferList.
See RawBufferLibrary.set_from_json_dict() for details.
Notes
The input JSON dictionary is changed by this function.
- pygama.raw.raw_buffer.write_to_lh5_and_clear(raw_buffers: list[pygama.raw.raw_buffer.RawBuffer], lh5_store: Optional[LH5Store] = None, wo_mode: str = 'append') None#
Write a list of RawBuffers to LH5 files and then clear them.
- Parameters:
raw_buffers (list(RawBuffer)) – The list of RawBuffers to be written to file. Note this is not a RawBufferList because the RawBuffers may not have the same structure.
lh5_store (LH5Store or None) – Allows the user to send in a store holding a collection of already open files (saves some time opening / closing files).
wo_mode (str) – write mode, see also lgdo.lh5_store.LH5Store.write_object().