Hit-tier production — pygama.hit
The pygama.hit sub-package transforms dsp-tier tables into
hit-tier tables by evaluating user-defined column expressions. It is the
principal mechanism through which calibrated quantities, quality-cut flags, and
other derived parameters are added to the data before event building.
Overview
The hit tier is produced by build_hit(). The
function reads one or more Table objects from an
LH5 file and, for each table, evaluates a set of string expressions against
the existing columns. The resulting new columns are written to an output LH5
file.
Expressions are evaluated column-by-column (not row-by-row) using
Table.eval(), which internally relies on numexpr for fast, vectorised
execution without per-row Python overhead.
The transformation is entirely configuration-driven: no Python code is required beyond providing the JSON configuration. Parameters that change between detector channels or calibration periods (e.g. calibration coefficients) can be injected as named scalars in the configuration, keeping the expressions readable and the parameters easily updatable.
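As a rough illustration of what column-wise evaluation with injected scalars means, the sketch below evaluates one expression over whole arrays at once. It uses plain NumPy and Python's built-in eval() as a stand-in for numexpr; the helper evaluate_column, the column values, and the numeric parameters are assumptions made for this example and are not part of pygama.

```python
import numpy as np

# Hypothetical dsp-tier columns: one array per column, one entry per waveform.
columns = {
    "trapEmax": np.array([100.0, 250.0, 400.0]),
    "A_max": np.array([10.0, 20.0, 30.0]),
}

# Named scalars, as they might appear under "parameters" in the configuration.
parameters = {"a": 1.23, "b": 42.69}


def evaluate_column(expression, columns, parameters):
    """Evaluate one expression over whole columns at once (no per-row loop)."""
    # Only the columns, the parameters and sqrt are visible to the expression.
    namespace = {"sqrt": np.sqrt, **columns, **parameters}
    return eval(expression, {"__builtins__": {}}, namespace)


# The result is a new column with one value per input row.
calE = evaluate_column("sqrt(a + b * trapEmax**2)", columns, parameters)
```

Changing a calibration constant only requires editing the parameters mapping; the expression string stays the same.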
Configuration format
The hit configuration is a JSON object (or equivalent Python dict) with two mandatory keys:
outputs
    A list of column names to write to the output file. Only columns listed here appear in the hit tier; intermediate columns used only for subsequent expressions are discarded.
operations
    A mapping from output-column name to an operation descriptor. Each descriptor has the following fields:
    expression
        A string expression referencing existing columns by name. Supports standard arithmetic operators, NumPy ufuncs available through numexpr, and references to columns in the input table.
    parameters (optional)
        A mapping of parameter name to scalar value (e.g. numbers or strings) supported by eval(). These are made available to the expression under their given names, allowing calibration constants to be stored alongside the expression without hard-coding them.
    lgdo_attrs (optional)
        A mapping of LGDO attribute name to value (e.g. {"units": "keV"}), which is attached to the output column as metadata.
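The structure described above can be checked before running a production. The following is a minimal sketch of such a check; check_hit_config is a hypothetical helper written for this example, not a pygama function, and it assumes that input columns may also be listed under outputs.

```python
def check_hit_config(config, input_columns):
    """Return a list of problems found in a hit configuration (empty if none)."""
    problems = []

    # Both top-level keys are mandatory.
    for key in ("outputs", "operations"):
        if key not in config:
            problems.append(f"missing mandatory key '{key}'")
    if problems:
        return problems

    # Every operation descriptor needs an expression.
    for name, op in config["operations"].items():
        if "expression" not in op:
            problems.append(f"operation '{name}' has no 'expression'")

    # Assumption: an output is either an input column or a derived column.
    available = set(input_columns) | set(config["operations"])
    for name in config["outputs"]:
        if name not in available:
            problems.append(f"output '{name}' is not a known column")

    return problems


good = {"outputs": ["calE"], "operations": {"calE": {"expression": "2 * trapEmax"}}}
bad = {"outputs": ["x"], "operations": {"calE": {"parameters": {}}}}
```

Here check_hit_config(good, ["trapEmax"]) returns an empty list, while the second configuration is reported for both the missing expression and the unknown output.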
Example
The following configuration computes a calibrated energy calE from the
raw trapezoidal-filter energy trapEmax, and the amplitude-over-energy
ratio AoE:
{
    "outputs": ["calE", "AoE"],
    "operations": {
        "calE": {
            "expression": "sqrt(a + b * trapEmax**2)",
            "parameters": {"a": "1.23", "b": "42.69"},
            "lgdo_attrs": {"units": "keV"}
        },
        "AoE": {
            "expression": "A_max / calE"
        }
    }
}
Note that AoE references calE, which is itself a derived column.
Within a single table, build_hit() orders operations by their expression
dependencies rather than strictly by JSON insertion order, so calE is
always evaluated before AoE. The same configuration would therefore still
work if AoE were listed before calE.
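Dependency-based ordering of this kind can be sketched with a topological sort. The example below is not pygama's implementation: order_operations is a hypothetical helper that uses the standard-library ast module to find the names used in each expression and graphlib (Python 3.9 or later) to order them.

```python
import ast
from graphlib import TopologicalSorter


def order_operations(operations):
    """Return operation names so that each comes after the columns it uses."""
    graph = {}
    for name, op in operations.items():
        # Collect every identifier appearing in the expression.
        names = {
            node.id
            for node in ast.walk(ast.parse(op["expression"], mode="eval"))
            if isinstance(node, ast.Name)
        }
        # Keep only identifiers that are themselves derived columns.
        graph[name] = {n for n in names if n in operations and n != name}
    # static_order() yields dependencies before the operations that need them.
    return list(TopologicalSorter(graph).static_order())


# AoE is deliberately listed first, although it depends on calE.
operations = {
    "AoE": {"expression": "A_max / calE"},
    "calE": {"expression": "sqrt(a + b * trapEmax**2)"},
}
order = order_operations(operations)
```

In this sketch a circular dependency between operations would raise graphlib.CycleError instead of silently producing a wrong order.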
Per-table configuration
When an LH5 file contains tables for many channels, it is often convenient to
apply slightly different configurations to different channels (e.g. different
calibration constants). build_hit() supports this
through the lh5_tables_config argument, which maps LH5 table paths to
individual configuration dictionaries:
lh5_tables_config = {
    "ch1084803/dsp": {"outputs": [...], "operations": {...}},
    "ch1084804/dsp": {"outputs": [...], "operations": {...}},
}
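When channels differ only in their calibration constants, a mapping like the one above can be generated from a shared template. The sketch below shows one possible way to do this; the template, the second set of constants, and the variable names are illustrative assumptions.

```python
import copy

# Shared template; the calibration constants are filled in per channel.
template = {
    "outputs": ["calE"],
    "operations": {
        "calE": {
            "expression": "sqrt(a + b * trapEmax**2)",
            "parameters": {},
            "lgdo_attrs": {"units": "keV"},
        }
    },
}

# Hypothetical per-channel calibration constants (illustrative values only).
calibration = {
    "ch1084803/dsp": {"a": 1.23, "b": 42.69},
    "ch1084804/dsp": {"a": 1.31, "b": 41.97},
}

lh5_tables_config = {}
for table, constants in calibration.items():
    # Deep copy so that channels do not share nested dictionaries.
    config = copy.deepcopy(template)
    config["operations"]["calE"]["parameters"] = dict(constants)
    lh5_tables_config[table] = config
```

The resulting dictionary has one entry per LH5 table path and is the kind of object the lh5_tables_config argument expects.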
API reference
| Function | Description |
|---|---|
| build_hit() | Read DSP-tier LH5 tables and write calibrated hit-tier quantities by evaluating the supplied configuration expressions. |
For the complete parameter reference see pygama.hit.