Hit-tier production — pygama.hit

The pygama.hit sub-package transforms dsp-tier tables into hit-tier tables by evaluating user-defined column expressions. It is the principal mechanism through which calibrated quantities, quality-cut flags, and other derived parameters are added to the data before event building.

Overview

The hit tier is produced by build_hit(). The function reads one or more Table objects from an LH5 file and, for each table, evaluates a set of string expressions against the existing columns. The resulting new columns are written to an output LH5 file.

Expressions are evaluated column-by-column (not row-by-row) using Table.eval(), which internally relies on numexpr for fast, vectorised execution that avoids a Python-level loop over rows.

The transformation is entirely configuration-driven: no Python code is required beyond calling build_hit() with the JSON configuration. Parameters that change between detector channels or calibration periods (e.g. calibration coefficients) can be injected as named scalars in the configuration, which keeps the expressions readable and the parameters easy to update.

Configuration format

The hit configuration is a JSON object (or equivalent Python dict) with two mandatory keys:

outputs

A list of column names to write to the output file. Only columns listed here appear in the hit tier; intermediate columns used only for subsequent expressions are discarded.

operations

A mapping from output-column name to an operation descriptor. Each descriptor has the following fields:

expression

A string expression that references columns by name, either columns of the input table or columns produced by other operations. Standard arithmetic and comparison operators are supported, together with the mathematical functions available through numexpr (e.g. sqrt).

parameters (optional)

A mapping of parameter name to scalar value (e.g. numbers or strings) supported by eval(). These are made available to the expression under their given names, allowing calibration constants to be stored alongside the expression without hard-coding them.

lgdo_attrs (optional)

A mapping of LGDO attribute name to value (e.g. {"units": "keV"}), which is attached to the output column as metadata.
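The structure described above can be checked before any data is processed. The following is a minimal sketch, assuming only the keys and fields listed in this section; check_hit_config is a hypothetical helper written for illustration and is not part of pygama.

```python
import json


# Hypothetical helper (not a pygama function): it checks only the structure
# described in this section and knows nothing about the input table.
def check_hit_config(config):
    """Return a list of human-readable problems found in a hit configuration."""
    problems = []

    # Both top-level keys are mandatory.
    for key in ("outputs", "operations"):
        if key not in config:
            problems.append(f"missing mandatory key: {key!r}")

    # Every operation needs an expression; the other fields are optional.
    documented_fields = {"expression", "parameters", "lgdo_attrs"}
    for name, descriptor in config.get("operations", {}).items():
        if "expression" not in descriptor:
            problems.append(f"operation {name!r} has no 'expression'")
        extra = set(descriptor) - documented_fields
        if extra:
            problems.append(
                f"operation {name!r} has fields not described here: {sorted(extra)}"
            )
    return problems


config = json.loads("""
{
  "outputs": ["calE"],
  "operations": {
    "calE": {
      "expression": "sqrt(a + b * trapEmax**2)",
      "parameters": {"a": "1.23", "b": "42.69"},
      "lgdo_attrs": {"units": "keV"}
    }
  }
}
""")

problems = check_hit_config(config)

# A deliberately broken configuration: no "outputs" key, no expression.
bad_problems = check_hit_config({"operations": {"calE": {"parameters": {}}}})
```

The helper deliberately does not check that every name in outputs is produced by an operation, since whether a name refers to an existing input column can only be decided once the table has been read.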

Example

The following configuration computes a calibrated energy calE from the raw trapezoidal-filter energy trapEmax, and the amplitude-over-energy ratio AoE:

{
  "outputs": ["calE", "AoE"],
  "operations": {
    "calE": {
      "expression": "sqrt(a + b * trapEmax**2)",
      "parameters": {"a": "1.23", "b": "42.69"},
      "lgdo_attrs": {"units": "keV"}
    },
    "AoE": {
      "expression": "A_max / calE"
    }
  }
}

Note that AoE references calE, which is itself a derived column. Within a single table, build_hit() orders operations according to the dependencies between their expressions, so columns are evaluated in a dependency-respecting order rather than strictly in JSON insertion order. In the example calE happens to be listed before AoE, but because of this reordering the configuration would work equally well with the two entries swapped, i.e. with AoE referring forward to calE.
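The idea of ordering operations by their dependencies can be illustrated with a short standalone sketch. It is not pygama's implementation: names_in and evaluation_order are hypothetical helpers, and the sketch assumes that expressions parse as Python expressions and that a dependency is any identifier matching another operation's name.

```python
import ast
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def names_in(expression):
    """Return the set of identifiers used in a Python-style expression string."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def evaluation_order(operations):
    """Order operation names so that each follows the operations it references."""
    # Map every operation to the other operations named in its expression.
    graph = {
        name: names_in(descriptor["expression"]) & (operations.keys() - {name})
        for name, descriptor in operations.items()
    }
    # static_order() yields dependencies first and raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())


# AoE is listed first on purpose, so that it refers forward to calE.
operations = {
    "AoE": {"expression": "A_max / calE"},
    "calE": {"expression": "sqrt(a + b * trapEmax**2)"},
}

order = evaluation_order(operations)
```

Names such as A_max, trapEmax, a, b and sqrt are ignored by the sketch because they do not match any operation name; in a real run they would be resolved from the input table, the parameters, or the expression engine.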

Bit aggregations

An optional aggregations block packs several boolean columns into a single integer column, with one bit per source column. Each aggregation entry is a mapping from bit name to source-column name; bit i corresponds to the i-th entry in insertion order. Example:

{
  "outputs": ["aggr1"],
  "operations": {
    "is_valid_rt":   {"expression": "(tp_90-tp_10) > 96", "parameters": {}},
    "is_valid_t0":   {"expression": "tp_0_est > 47000",   "parameters": {}},
    "is_valid_tmax": {"expression": "tp_max < 120000",    "parameters": {}}
  },
  "aggregations": {
    "aggr1": {
      "bit0": "is_valid_rt",
      "bit1": "is_valid_t0",
      "bit2": "is_valid_tmax"
    }
  }
}

The dtype of the aggregated column is the smallest unsigned integer that fits the number of bits (uint8, uint16, uint32, or uint64). The source-column names are persisted on disk as the bit_names LGDO attribute of the aggregated column (a comma-separated string in bit order), so the encoding can be recovered at read time. Use unpack_bitmask() to expand an aggregation back into an awkward record array with one boolean field per bit name.
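The packing rule stated above (bit i taken from the i-th entry, narrowest unsigned width that fits) can be made concrete with a pure-Python sketch. The functions pack_bits, smallest_uint_width and unpack_bits are hypothetical names used only for illustration; they operate on plain lists rather than LGDO columns and are not the pygama implementation.

```python
def pack_bits(columns, aggregation):
    """Pack boolean columns into one integer per row, bit i from the i-th entry."""
    sources = list(aggregation.values())  # insertion order defines bit positions
    n_rows = len(columns[sources[0]])
    packed = [0] * n_rows
    for bit, source in enumerate(sources):
        for row, flag in enumerate(columns[source]):
            if flag:
                packed[row] |= 1 << bit
    return packed


def smallest_uint_width(n_bits):
    """Return the narrowest of 8, 16, 32 or 64 bits that can hold n_bits flags."""
    for width in (8, 16, 32, 64):
        if n_bits <= width:
            return width
    raise ValueError("more than 64 flags cannot be packed into one column")


def unpack_bits(packed, names):
    """Expand packed integers back into one list of booleans per name."""
    return {
        name: [bool((value >> bit) & 1) for value in packed]
        for bit, name in enumerate(names)
    }


aggregation = {
    "bit0": "is_valid_rt",
    "bit1": "is_valid_t0",
    "bit2": "is_valid_tmax",
}

# Made-up flag values for three rows, for illustration only.
columns = {
    "is_valid_rt": [True, False, True],
    "is_valid_t0": [True, True, False],
    "is_valid_tmax": [False, True, True],
}

packed = pack_bits(columns, aggregation)
width = smallest_uint_width(len(aggregation))
restored = unpack_bits(packed, list(aggregation.values()))
```

With three flags the sketch selects an 8-bit width, and unpacking with the source-column names in bit order reproduces the original boolean columns.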

Per-table configuration

When an LH5 file contains tables for many channels, it is often convenient to apply slightly different configurations to different channels (e.g. different calibration constants). build_hit() supports this through the lh5_tables_config argument, which maps LH5 table paths to individual configuration dictionaries:

lh5_tables_config = {
    "ch1084803/dsp": {"outputs": [...], "operations": {...}},
    "ch1084804/dsp": {"outputs": [...], "operations": {...}},
}
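When the channels differ only in their calibration constants, such a mapping can be generated from a shared template. The sketch below assumes exactly that situation; the constants are made-up values for illustration, and the file names in the final comment are placeholders.

```python
import copy

# Shared template: the expression is identical for every channel.
template = {
    "outputs": ["calE"],
    "operations": {
        "calE": {
            "expression": "sqrt(a + b * trapEmax**2)",
            "parameters": {},
            "lgdo_attrs": {"units": "keV"},
        }
    },
}

# Hypothetical per-channel calibration constants (illustrative values only).
calibration = {
    "ch1084803": {"a": 1.23, "b": 42.69},
    "ch1084804": {"a": 1.31, "b": 42.10},
}

lh5_tables_config = {}
for channel, constants in calibration.items():
    # Deep-copy so that channels never share a nested dictionary.
    config = copy.deepcopy(template)
    config["operations"]["calE"]["parameters"] = dict(constants)
    lh5_tables_config[f"{channel}/dsp"] = config

# The mapping would then be passed to build_hit(). The call below assumes an
# input-file argument and an outfile keyword; check the pygama.hit reference:
# build_hit("dsp.lh5", outfile="hit.lh5", lh5_tables_config=lh5_tables_config)
```

Copying the template for each channel keeps the expressions in one place, so a change to the calibration function only has to be made once.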

API reference

| Function | Description |
| --- | --- |
| build_hit() | Read DSP-tier LH5 tables and write calibrated hit-tier quantities by evaluating the supplied configuration expressions. |
| unpack_bitmask() | Expand a bit-aggregation column into an awkward record array with one boolean field per bit name. |

For the complete parameter reference see pygama.hit.