pygama.pargen package

Utilities to generate and optimize parameters of interest from data (e.g. calibration routines)

Submodules

pygama.pargen.AoE_cal module

This module provides functions for correcting the a/e energy dependence, determining the cut level and calculating survival fractions.

class pygama.pargen.AoE_cal.CalAoE(cal_dicts=None, cal_energy_param='cuspEmax_ctc_cal', eres_func=<function CalAoE.<lambda>>, pdf=<pygama.math.functions.sum_dists.SumDists object>, selection_string='index==index', dt_corr=False, dep_correct=False, dt_cut=None, dt_param='dt_eff', high_cut_val=3, mean_func=<class 'pygama.pargen.AoE_cal.Pol1'>, sigma_func=<class 'pygama.pargen.AoE_cal.SigmaFit'>, compt_bands_width=20, debug_mode=False)

Bases: object

Class for calibrating the A/E, this follow 5 steps: the time correction, drift time correction, energy correction, the cut level calculation and the survival fraction calculation.

Parameters:
  • cal_dicts (dict | None) – Dictionary of calibration parameters; can be empty/None for a single run, or keyed by timestamp in the format YYYYMMDDTHHMMSSZ for multiple runs.

  • cal_energy_param (str) – Calibrated energy parameter to use for A/E calibrations and for determining peak events.

  • eres_func (callable) – Function to determine the energy resolution; should accept a single calibrated energy value.

  • pdf – PDF to fit to the A/E distribution.

  • selection_string (str) – Selection string passed as a query to the data DataFrame.

  • dt_corr (bool) – Whether to apply drift-time correction.

  • dep_correct (bool) – Whether to correct the double escape peak into the single-site band before cut determination.

  • dt_cut (dict | None) –

    Dictionary of the drift time cut parameters in the form:

    {"out_param": "dt_cut", "hard": False}
    

    where out_param is the name of the parameter to cut on in the dataframe (should have been precalculated) and "hard" controls whether these events are removed completely for survival fraction calculations, or only for cut determination / A/E calibration steps.

  • dt_param (str) – Name of the drift-time parameter in the dataframe.

  • high_cut_val (float) – Value to cut the A/E distribution at on the high side.

  • mean_func (Callable) – Class implementing the energy-dependence model for the A/E mean (e.g. Pol1).

  • sigma_func (Callable) – Class implementing the energy-dependence model for the A/E sigma (e.g. SigmaFit).

  • compt_bands_width (float) – Width of the Compton bands in keV used for the energy correction.

  • debug_mode (bool) – If True, re-raise exceptions from the A/E calibration steps instead of returning NaN.

calculate_survival_fractions(data, aoe_param, peaks, fit_widths, mode='greater')

Calculate survival fractions for the A/E cut for a list of peaks for the final A/E cut value

Parameters:
  • data (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the parameter in the dataframe for the final A/E classifier.

  • peaks (list) – List of peak energies in keV to calculate survival fractions for.

  • fit_widths (list[tuple]) – List of (below, above) tuples in keV defining the fit range around each peak.

  • mode (str) – Whether to keep events with A/E "greater" or "less" than the cut value.

calculate_survival_fractions_sweep(data, aoe_param, peaks, fit_widths, n_samples=26, cut_range=(-5, 5), mode='greater')

Calculate survival fractions for the A/E cut for a list of peaks by sweeping through values of the A/E cut to show how this varies

Parameters:
  • data (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the parameter in the dataframe for the final A/E classifier.

  • peaks (list) – List of peak energies in keV to calculate survival fractions for.

  • fit_widths (list[tuple]) – List of (below, above) tuples in keV defining the fit range around each peak.

  • n_samples – Number of cut values to sample in the sweep.

  • cut_range – Range of A/E cut values to sweep through.

  • mode – Whether to keep events with A/E "greater" or "less" than the cut value.

calibrate(df, initial_aoe_param, peaks_of_interest=None, fit_widths=None, cut_peak_idx=0, dep_acc=0.9, sf_nsamples=11, sf_cut_range=(-5, 5), timecorr_mode='full', dtcorr_mode='mcdrift', override_dict=None)

Main function to run a full A/E calibration with all steps i.e. time correction, drift time correction, energy correction, A/E cut determination and survival fraction calculation

Parameters:
  • df (DataFrame) – DataFrame containing the data.

  • initial_aoe_param (str) – Name of the A/E parameter in the dataframe to use as starting point.

  • peaks_of_interest (list | None) – List of peak energies in keV to calculate survival fractions for.

  • fit_widths (list[tuple] | None) – List of (below, above) tuples in keV for the fit range around each peak.

  • cut_peak_idx (int) – Index into peaks_of_interest of the peak used for cut determination.

  • dep_acc (float) – Desired survival fraction in the DEP for the final cut value.

  • sf_nsamples (int) – Number of samples to take in the survival fraction sweep.

  • sf_cut_range (tuple) – Range of A/E cut values to use for the survival fraction sweep.

  • timecorr_mode (str) – Mode to use for the time correction; see time_correction() for details.

  • dtcorr_mode (str) – Mode to use for the drift time correction; see drift_time_correction() for details.

  • override_dict (dict | None) –

    Optional dictionary of pre-computed calibration entries that bypass one or more fit steps. Each key should map to a sub-dict with "expression" (a string expression) and "parameters" (a dict of named constants used by the expression). Supported keys and the fit step they replace:

    • "AoE_Timecorr" - skips time_correction(); the expression is evaluated with columns from df plus the stored parameters to populate df["AoE_Timecorr"].

    • "AoE_DTcorr" - skips drift_time_correction() (only relevant when self.dt_corr is True); populates df["AoE_DTcorr"]. Silently ignored if self.dt_corr is False.

    • "AoE_Corrected" and "AoE_Classifier" – together skip energy_correction(). Both keys must be present to take the override path; if only one is given a warning is logged and normal energy correction runs instead. "_AoE_Classifier_intermediate" is optional: if present it is evaluated and stored before "AoE_Classifier" is evaluated (useful when the "AoE_Classifier" expression references it).

    • "AoE_Low_Cut" - skips get_aoe_cut_fit(); the expression should evaluate to a boolean array. The numeric cut threshold is read from parameters["a"] (or the sole parameter value if "a" is absent) and stored as self.low_cut_val.

drift_time_correction(data, aoe_param='AoE_Timecorr', out_param='AoE_DTcorr', mode='mcdrift', display=0)

Apply a linear drift-time correction to the A/E distribution.

Estimates a linear correction coefficient to align the A/E values across drift time and writes the corrected parameter to the DataFrame and calibration dictionary.

Parameters:
  • data (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the A/E parameter to use as the starting point for correction.

  • out_param (str) – Name of the output parameter for the drift-time-corrected A/E, stored in the DataFrame and added to the calibration dictionary.

  • mode (str) – Mode used to estimate the correction coefficient. Options: - "mcdrift" : Minimum Covariance Determinant (MCD) based estimation (improved stability for low statistic and no evident structure) - "bimodal_dt_fit" : Fit the position of two drift time peaks (heavily relying on DEP events and clear structure).

  • display (int) – Plot verbosity level.

energy_correction(data, aoe_param, corrected_param='AoE_Corrected', classifier_param='AoE_Classifier', display=0)

Calculates the corrections needed for the energy dependence of the A/E. Does this by fitting the compton continuum in slices and then applies fits to the centroid and variance.

Parameters:
  • data (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the A/E parameter to use as starting point.

  • corrected_param (str) – Name of the output parameter for the energy-mean-corrected A/E to be added in the dataframe and to the calibration dictionary.

  • classifier_param (str) – Name of the output parameter for the full mean- and sigma-corrected A/E classifier to be added in the dataframe and to the calibration dictionary.

  • display (int) – Plot verbosity level.

get_aoe_cut_fit(data, aoe_param, peak, ranges, dep_acc=0.9, output_cut_param='AoE_Low_Cut', _display=1)

Determines A/E cut by sweeping through values and for each one fitting the DEP to determine how many events survive. Fits the resulting distribution and interpolates to get cut value at desired DEP survival fraction (typically 90%)

Parameters:
  • data (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the A/E parameter to use as starting point.

  • peak (float) – Energy in keV of the peak to use for cut determination, e.g. 1592.5.

  • ranges (tuple) – Tuple of (below, above) in keV defining the fit range around peak, e.g. (20, 40).

  • dep_acc (float) – Desired DEP survival fraction for the final cut.

  • output_cut_param (str) – Name of the output boolean parameter for events passing the A/E cut in the dataframe and to be added to the calibration dictionary.

  • display – Plot verbosity level.

time_correction(df, aoe_param, mode='full', output_name='AoE_Timecorr', display=0)

Calculates the time correction for the A/E parameter by fitting the 1-1.3 MeV Compton continuum and extracting the centroid. If only a single run is passed will just perform a shift of the A/E parameter to 1 using the centroid otherwise for multiple runs the shift will be determined on the mode given in.

Parameters:
  • df (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the A/E parameter to use.

  • mode (str) –

    Mode to use for the time correction, can be "full", "partial", or "none":

    none: just use the mean of the a/e centroids to shift all the data partial: iterate through the centroids if vary by less than 0.4 sigma then group and take mean otherwise when a run higher than 0.4 sigma is found if it is a single run set to nan otherwise start a new block full : each run will be corrected individually average_consecutive: average the consecutive centroids interpolate_consecutive: interpolate between the consecutive centroids

  • output_name (str) – Name of the output parameter for the time-corrected A/E in the dataframe and to be added to the calibration dictionary.

  • display (int) – Plot verbosity level.

update_cal_dicts(update_dict)

Merge new entries into the A/E calibration dictionary.

If cal_dicts is keyed by run timestamps, each timestamp’s sub-dict is updated individually, using update_dict directly as a fallback when a timestamp is absent. Otherwise update_dict is merged directly.

Parameters:

update_dict – Dictionary of new calibration entries to merge.

class pygama.pargen.AoE_cal.Pol1

Bases: object

Linear model for the A/E mean as a function of energy: mean(E) = a·E + b.

static func(x, a, b)

Evaluate a*x + b.

static guess(_bands, _means, _mean_errs)

Return initial parameter guess [a, b].

static string_func(input_param)

Return the expression string for the hit-dict format.

class pygama.pargen.AoE_cal.SigmaFit

Bases: object

Energy-dependent A/E sigma model: sigma(E) = sqrt(a + (b/E)^c).

static func(x, a, b, c)

Evaluate sqrt(a + (b/x)^c), returning NaN for non-positive arguments.

static guess(_bands, sigmas, _sigma_errs)

Return initial parameter guess [a, b, c].

static string_func(input_param)

Return the expression string for the hit-dict format.

class pygama.pargen.AoE_cal.SigmoidFit

Bases: object

Sigmoid model for A/E cut survival: sf(E) = (a + b·E) · erfc(c·E + d).

static func(x, a, b, c, d)

Evaluate (a + b*x) * erfc(c*x + d).

static guess(_xs, ys, _y_errs)

Return initial parameter guess [a, b, c, d].

pygama.pargen.AoE_cal.aoe_peak_bounds(func, guess, **kwargs)

Build parameter bounds for an A/E peak fit.

Parameters:
  • func – PDF to bound; one of aoe_peak, aoe_peak_with_high_tail, exgauss, or gaussian.

  • guess – Current parameter guess dict (used to derive relative bounds for mu and sigma).

  • **kwargs – Override specific bounds by name (e.g. tau=(0, 0.5)).

Returns:

bounds_dict – Dict mapping each parameter name to a (lower, upper) tuple suitable for passing to limits.

pygama.pargen.AoE_cal.aoe_peak_fixed(func, **_kwargs)

Return the fixed parameters and free-parameter mask for an A/E peak fit.

Parameters:
  • func – PDF to query; one of aoe_peak, aoe_peak_with_high_tail, exgauss, or gaussian.

  • **kwargs – Unused; reserved for future overrides.

Returns:

  • fixed – List of parameter names to hold fixed (typically the fit-range boundaries x_lo and x_hi).

  • mask – Boolean array aligned with func.required_args(); True for free parameters, False for fixed ones.

pygama.pargen.AoE_cal.aoe_peak_guess(func, hist, bins, var, **kwargs)

Build initial parameter guesses for an A/E peak fit.

Estimates the peak centroid and width from the histogram using pgf.gauss_mode_width_max(), falling back to a simple Gaussian guess when that fails. The returned values are packaged via convert_to_minuit() so they can be passed directly to Minuit.

Parameters:
  • func – PDF to guess parameters for; one of aoe_peak, aoe_peak_with_high_tail, exgauss, or gaussian.

  • hist – Histogram counts.

  • bins – Histogram bin edges.

  • var – Histogram bin variances.

  • **kwargs – Override specific guess values by name (e.g. mu=0.95).

Returns:

valuesValueView of initial parameter guesses for func.

pygama.pargen.AoE_cal.average_consecutive(tstamps, means)

Fit the time dependence of the means of the A/E distribution by average consecutive entries

Parameters:
  • tstamps – Timestamps of the data.

  • means – Means of the A/E distribution.

Returns:

out_dict – Dictionary mapping each timestamp to the average of its mean and the next.

pygama.pargen.AoE_cal.bimodal_dt_fit(data, aoe_param, dt_param, fit_selection, cal_energy_param, pdf, debug_mode=False, display=0, **_kwargs)

Calculates the correction needed to align the two drift time regions for ICPC detectors. This is done by fitting the two drift time peaks in the drift time spectrum and then fitting the A/E peaks in each of these regions. A simple linear correction is then applied to align these regions.

Parameters:
  • data (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the A/E parameter to use as starting point.

  • dt_param (str) – Name of the drift time parameter.

  • fit_selection (str) – Selection string for the fit.

  • cal_energy_param (str) – Name of the calibrated energy parameter.

  • pdf – PDF to use for the A/E fit.

  • debug_mode (bool) – If True, re-raises exceptions instead of logging them.

  • display (int) – Plot verbosity level.

Returns:

  • alpha – Linear correction coefficient.

  • dt_res_dict – Dictionary containing intermediate fit results.

Return type:

tuple[float, dict]

pygama.pargen.AoE_cal.drifttime_corr_plot(aoe_class, data, aoe_param='AoE_Timecorr', aoe_param_corr='AoE_DTcorr', figsize=(12, 8), fontsize=12)
pygama.pargen.AoE_cal.fit_time_means(tstamps, means, sigmas)

Fit the time dependence of the means of the A/E distribution

Parameters:
  • tstamps – Timestamps of the data.

  • means – Means of the A/E distribution.

  • sigmas – Sigmas of the A/E distribution.

Returns:

out_dict – Dictionary mapping each timestamp to its corrected mean value.

pygama.pargen.AoE_cal.get_peak_label(peak)
Return type:

str

pygama.pargen.AoE_cal.interpolate_consecutive(tstamps, means, times, aoe_param, output_name)

Fit the time dependence of the means of the A/E distribution by average consecutive entries

Parameters:
  • tstamps – Timestamps of the data.

  • means – Means of the A/E distribution.

  • times – Unix timestamps of each mean sample.

  • aoe_param – Name of the A/E parameter in the dataframe.

  • output_name – Key to use for the correction expression in the output dict.

Returns:

out_dict – Dictionary mapping each timestamp to a per-output-name correction expression.

pygama.pargen.AoE_cal.mcdrift(data, aoe_param, dt_param, fit_selection, cal_energy_param, debug_mode=False, **_kwargs)

Calculates the drift time correction for the A/E parameter using the Minimum Covariance Determinant (MCD) method and PCA on the (AoE, dt_eff) distribution.

Parameters:
  • data (DataFrame) – DataFrame containing the data.

  • aoe_param (str) – Name of the A/E parameter to use as starting point.

  • dt_param (str) – Name of the drift time parameter.

  • fit_selection (str) – Selection string for the fit.

  • cal_energy_param (str) – Name of the calibrated energy parameter.

  • debug_mode (bool) – If True, re-raises exceptions instead of logging them.

Returns:

  • alpha – Linear correction coefficient.

  • dt_res_dict – Dictionary containing intermediate fit results.

Return type:

tuple[float, dict]

pygama.pargen.AoE_cal.plot_aoe_mean_time(aoe_class, _data, time_param='AoE_Timecorr', figsize=(12, 8), fontsize=12)
pygama.pargen.AoE_cal.plot_aoe_res_time(aoe_class, _data, _time_param='AoE_Timecorr', figsize=(12, 8), fontsize=12)
pygama.pargen.AoE_cal.plot_classifier(aoe_class, data, aoe_param='AoE_Classifier', xrange=(900, 3000), yrange=(-50, 10), xn_bins=700, yn_bins=500, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.AoE_cal.plot_compt_bands_overlayed(aoe_class, data, eranges, aoe_param='AoE_Timecorr', aoe_range=None, title='Compton Bands', density=True, n_bins=50, figsize=(12, 8), fontsize=12)

Function to plot various compton bands to check energy dependence and corrections

pygama.pargen.AoE_cal.plot_cut_fit(aoe_class, _data, dep_acc=0.9, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.AoE_cal.plot_dt_dep(aoe_class, data, eranges, titles=None, aoe_param='AoE_Timecorr', bins=(200, 100), dt_max=2000, figsize=(12, 8), fontsize=12)

Function to produce 2d histograms of A/E against drift time to check dependencies

pygama.pargen.AoE_cal.plot_mean_fit(aoe_class, _data, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.AoE_cal.plot_sf_vs_energy(aoe_class, data, xrange=(900, 3000), n_bins=701, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.AoE_cal.plot_sigma_fit(aoe_class, _data, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.AoE_cal.plot_spectra(aoe_class, data, xrange=(900, 3000), n_bins=2101, xrange_inset=(1580, 1640), n_bins_inset=200, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.AoE_cal.plot_survival_fraction_curves(aoe_class, _data, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.AoE_cal.unbinned_aoe_fit(aoe, pdf=<pygama.math.functions.sum_dists.SumDists object>, display=0)

Fitting function for A/E, first fits just a Gaussian before using the full pdf to fit if fails will return NaN values

Parameters:
  • aoe (np.array) – A/E values.

  • pdf – PDF to fit to.

  • display (int) – Verbosity level; 0 = silent, >1 = diagnostic plots.

Returns:

  • values – Fitted parameter values.

  • errors – Fitted parameter errors.

  • covariance – Covariance matrix.

  • fit – Full fit info tuple (values, errors, covariance, gof, pdf, mask, valid, m).

Return type:

tuple(np.array, np.array)

pygama.pargen.data_cleaning module

This module provides routines for calculating and applying quality cuts

pygama.pargen.data_cleaning.find_pulser_properties(df, energy='daqenergy')

Identify pulser peaks in an energy spectrum by their time periodicity.

Locates local maxima in the energy histogram, fits each peak top to estimate its centroid and width, and then checks whether events in each peak arrive with a regular time spacing — a hallmark of pulser signals. Only peaks with a clearly periodic inter-arrival time are returned.

Parameters:
  • df – DataFrame with at least an energy column (named by energy) and a timestamp column in seconds.

  • energy – Name of the energy column to analyse.

Returns:

out_pulsers – List of (pulser_energy, peak_e_err, period, energy_name) tuples, one per detected pulser peak. pulser_energy is the peak centroid, peak_e_err is its half-width, period is the repetition period in seconds, and energy_name is the value of energy.

pygama.pargen.data_cleaning.fit_distributions(x_lo, x_hi, norm_par_array, display=0)

Fit several distribution models to a parameter distribution and return the best-fitting one.

Candidates (Gaussian, exGauss, skewed Gaussian, Gaussian-on-exGauss, double exGauss) are fitted unbinned over the range [x_lo, x_hi]. The winner is chosen by p-value (or lowest chi-squared when all p-values are zero).

Parameters:
  • x_lo – Lower bound of the fitting range.

  • x_hi – Upper bound of the fitting range.

  • norm_par_array – 1-D array of (typically normalised) parameter values.

  • display – Verbosity / plotting level. Values > 0 trigger diagnostic plots.

Returns:

  • func – Best-fitting distribution callable.

  • pars – Parameter array for the winning model.

Return type:

tuple

pygama.pargen.data_cleaning.generate_cut_classifiers(data, cut_dict, rounding=4, display=0)

Finds double sided cut boundaries for a file for the parameters specified

Parameters:
  • data (dict[str, ndarray]) – Input data as an LH5 table, a dictionary of arrays, or a pandas.DataFrame.

  • cut_dict (dict[str, int]) –

    Dictionary of the form:

    {
        "output_parameter_name": {
            "cut_parameter": "parameter_to_cut_on",
            "cut_level": "number_of_sigmas",
            "mode": "inclusive/exclusive"
        }
    }
    

    cut_level may also be a dict to set asymmetric or one-sided cuts:

    {
        "output_parameter_name": {
            "cut_parameter": "parameter_to_cut_on",
            "cut_level": {"low_side": "3", "high_side": "2"},
            "mode": "inclusive/exclusive"
        }
    }
    

    Alternatively, pass through an arbitrary expression directly:

    {
        "is_valid_t0":{
            "expression":"(tp_0_est>a)&(tp_0_est<b)",
            "parameters":{"a":"46000", "b":"52000"}
        }
    }
    

  • rounding (int) – Number of decimal places to round cut boundaries to.

  • display (int) – Verbosity level. Values > 0 produce diagnostic histograms.

Returns:

  • output_dict – Cut expressions in hit-dict format with an additional {out_par}_classifier normalisation entry for each fitted parameter:

    {
        "output_parameter_name": {
            "expression": "cut_expression",
            "parameters": {"a": lower_bound, "b": upper_bound},
        }
    }
    
  • plot_dict – Dictionary of matplotlib.figure.Figure objects keyed by parameter name. Only returned when display > 0.

Return type:

dict

pygama.pargen.data_cleaning.generate_cuts(data, cut_dict, rounding=4, display=0)

Finds double sided cut boundaries for a file for the parameters specified

Parameters:
  • data (dict[str, ndarray]) – Input data as an LH5 table, a dictionary of arrays, or a pandas.DataFrame.

  • cut_dict (dict[str, int]) –

    Dictionary of the form:

    {
        "output_parameter_name": {
            "cut_parameter": "parameter_to_cut_on",
            "cut_level": "number_of_sigmas",
            "mode": "inclusive/exclusive"
        }
    }
    

    cut_level may also be a dict to set asymmetric or one-sided cuts:

    {
        "output_parameter_name": {
            "cut_parameter": "parameter_to_cut_on",
            "cut_level": {"low_side": "3", "high_side": "2"},
            "mode": "inclusive/exclusive"
        }
    }
    

    Alternatively, pass through an arbitrary expression directly:

    {
        "is_valid_t0":{
            "expression":"(tp_0_est>a)&(tp_0_est<b)",
            "parameters":{"a":"46000", "b":"52000"}
        }
    }
    

  • rounding (int) – Number of decimal places to round cut boundaries to.

  • display (int) – Verbosity level. Values > 0 produce diagnostic histograms.

Returns:

  • output_dict – Cut expressions in hit-dict format:

    {
        "output_parameter_name": {
            "expression": "cut_expression",
            "parameters": {"a": lower_bound, "b": upper_bound},
        }
    }
    
  • plot_dict – Dictionary of matplotlib.figure.Figure objects keyed by parameter name. Only returned when display > 0.

Return type:

dict

pygama.pargen.data_cleaning.get_cut_indexes(data, cut_parameters)

Compute a boolean mask of events that pass all specified cuts.

Derives cut boundaries via generate_cuts(), evaluates each cut expression, and returns a combined pass/fail mask.

Parameters:
Returns:

ct_mask – Boolean array of length len(data); True for events passing all cuts.

pygama.pargen.data_cleaning.get_keys(in_data, cut_dict)

Extract the data column names referenced by a cut dictionary.

Scans each entry in cut_dict for the cut_parameter or expression key and returns the subset of those names that actually appear in in_data.

Parameters:
  • in_data – Source of available column names. Either a mapping (e.g. a dict or LH5 Table) whose .keys() give the column names, or an iterable of name strings.

  • cut_dict – Cut specification dictionary, as used by generate_cuts().

Returns:

keys – Sorted list of unique column names from in_data that are referenced by cut_dict.

pygama.pargen.data_cleaning.get_mode_stdev(par_array)

Estimate the mode and standard deviation of a parameter distribution.

Uses a histogram-based approach: the distribution is first clipped to the 1st-99th percentile range, the modal bin is located, and a Gaussian is fit in the neighbourhood of that bin to refine the estimates. Multiple fallback strategies are applied when the primary fit fails.

Parameters:

par_array – 1-D array of parameter values.

Returns:

  • mean – Estimated mode (peak position) of the distribution.

  • std – Estimated standard deviation corresponding to the FWHM of the peak.

Return type:

tuple

pygama.pargen.data_cleaning.get_tcm_pulser_ids(tcm_file, channel, multiplicity_threshold)

Identify pulser event indices from a TCM (Time Coincidence Map) file.

Reads the hardware TCM table and returns the indices of events in which the specified channel appears and the total multiplicity exceeds the given threshold — a signature typical of pulser events that fire multiple channels simultaneously.

Parameters:
  • tcm_file – Path to a single TCM LH5 file, or a list of such file paths. When a list is supplied the results are concatenated.

  • channel – Channel identifier. Accepts an integer or a string with an optional "ch" prefix (e.g. "ch34" or "34").

  • multiplicity_threshold – Minimum event multiplicity required for an event to be labelled as a pulser.

Returns:

  • ids – Array of integer indices of pulser events within the channel’s event list.

  • mask – Boolean mask of the same length as the channel event list.

Return type:

tuple

pygama.pargen.data_cleaning.skewed_fit(x, n_sig, mu, sigma, alpha)
pygama.pargen.data_cleaning.skewed_pdf(x, n_sig, mu, sigma, alpha)
pygama.pargen.data_cleaning.tag_pulsers(df, chan_info, window=0.01)

Tag pulser events in a DataFrame using energy and periodicity cuts.

For each channel descriptor in chan_info, events are selected by a narrow energy window around the pulser peak energy. A linear fit to the pulse timestamps is then used to determine the precise period and phase of the pulser train; events that fall within window seconds of an expected pulse are labelled as pulsers via an isPulser column.

Parameters:
  • df – DataFrame containing at minimum a timestamp column and the energy column(s) referenced in chan_info.

  • chan_info – A single tuple (pulser_energy, peak_e_err, period, energy_name) or a list of such tuples, one per channel. pulser_energy and peak_e_err define the energy selection window; period is the approximate repetition period in seconds; energy_name is the column in df to use.

  • window – Half-width (seconds) of the time residual window around each expected pulse used to define the period cut.

Returns:

df – The input DataFrame with an added (or updated) isPulser column set to 1 for pulser events and 0 for physics events.

Return type:

DataFrame

pygama.pargen.dplms_ge_dict module

This module is for creating dplms dictionary for ge processing

pygama.pargen.dplms_ge_dict.dplms_ge_dict(raw_fft, raw_cal, dsp_config, par_dsp, dplms_dict, fom_func, decay_const=0, ene_par='dplmsEmax', ctc_par='dt_eff', display=0)

This function calculates the dplms dictionary for HPGe detectors.

Parameters:
  • raw_fft (Table) – Table with FFT raw data already baseline-selected upstream. All events in this table are used to estimate the noise matrix.

  • raw_cal (Table) – table with cal raw data

  • dsp_config (dict) – dsp config file

  • par_dsp (dict) – Dictionary with db parameters for dsp processing

  • dplms_dict (dict) – Dictionary with various parameters

  • fom_func – Function for peak fits

Returns:

out_dict

Return type:

dict

pygama.pargen.dplms_ge_dict.filter_synthesis(ref, nmat, rmat, za, pmat, fmat, length, size, flip=True)

Synthesise a DPLMS optimal filter for a single channel.

Solves the linear system (N + R + za*I + P + F) x = r for the filter coefficients x, where N, R, P, and F are the noise, reference, pile-up, and flat-top matrices respectively, za is a zero-area regularisation term, and r is the reference pulse window.

Parameters:
  • ref (array) – Mean reference pulse of length size.

  • nmat (array) – Noise covariance matrix, shape (length, length).

  • rmat (array) – Reference signal matrix, shape (length, length).

  • za (int) – Zero-area regularisation coefficient.

  • pmat (array) – Pile-up penalty matrix, shape (length, length).

  • fmat (array) – Flat-top derivative matrix, shape (length, length).

  • length (int) – Filter length (samples).

  • size (int) – Full waveform buffer length (samples); used to centre the reference window.

  • flip (bool) – If True (default), return the time-reversed filter coefficients so that the filter can be applied via convolution.

Returns:

  • x – Filter coefficients of length length.

  • y – Response of the filter convolved with the reference pulse.

  • refy – Reference pulse window aligned to y for comparison.

Return type:

tuple

pygama.pargen.dplms_ge_dict.filter_synthesis_corr(ref, nmat_corr, rmat, za, pmat, fmat, length, size, flip=True)

Synthesise a DPLMS optimal filter exploiting cross-channel noise correlations.

Extends filter_synthesis() to the multi-channel case by building block-extended versions of the signal-shape matrices and solving the combined linear system. Only the primary-channel slice of the solution is returned.

Parameters:
  • ref (array) – Mean reference pulse for the primary channel, length size.

  • nmat_corr (array) – Block noise covariance matrix from noise_matrix_corr(), shape (n_channels * length, n_channels * length).

  • rmat (array) – Reference signal matrix for the primary channel, shape (length, length).

  • za (int) – Zero-area regularisation coefficient.

  • pmat (array) – Pile-up penalty matrix for the primary channel, shape (length, length).

  • fmat (array) – Flat-top derivative matrix for the primary channel, shape (length, length).

  • length (int) – Filter length per channel (samples).

  • size (int) – Full waveform buffer length (samples).

  • flip (bool) – If True (default), return the time-reversed filter coefficients.

Returns:

  • x – Primary-channel filter coefficients of length length.

  • y – Response of the filter convolved with the (extended) reference pulse.

  • refy – Reference pulse window aligned to y for comparison.

Return type:

tuple

pygama.pargen.dplms_ge_dict.is_not_pile_up(peak_pos, peak_pos_neg, thr, lim, size)

Reject pile-up events based on the presence of secondary peaks.

A histogram of positive-peak positions is used to define a single-peak band; events with additional peaks outside that band, or any negative peaks, are flagged as pile-up.

Parameters:
  • peak_pos (array) – 2-D array of positive peak positions per event (samples).

  • peak_pos_neg (array) – 2-D array of negative peak positions per event (samples).

  • thr (int) – Amplitude threshold (percent of histogram maximum) used to define the edges of the primary peak band.

  • lim (int) – Half-width of the search window around the nominal waveform centre (samples).

  • size (int) – Length of the waveform window used for filter synthesis (samples).

Returns:

  • idxs – Per-event mask; True means the event is pile-up free.

  • llow – Lower boundary of the accepted peak band (samples).

  • lupp – Upper boundary of the accepted peak band (samples).

Return type:

tuple

pygama.pargen.dplms_ge_dict.is_valid_centroid(centroid, lim, size, full_size)

Select waveforms whose centroid lies within a valid alignment window.

Parameters:
  • centroid (array) – Per-event centroid positions (samples).

  • lim (int) – Half-width of the allowed centroid displacement from the nominal centre of the waveform window (samples).

  • size (int) – Length of the waveform window used for filter synthesis (samples).

  • full_size (int) – Total waveform buffer length (samples).

Returns:

  • idxs – Boolean mask that is True for valid events.

  • llim – Lower centroid boundary used (samples).

  • hlim – Upper centroid boundary used (samples).

Return type:

tuple

pygama.pargen.dplms_ge_dict.is_valid_risetime(risetime, llim, perc)

Select waveforms within an acceptable risetime range.

Parameters:
  • risetime (array) – Per-event risetime values (samples), e.g. tp_90 - tp_10.

  • llim (int) – Minimum allowed risetime (samples).

  • perc (float) – Upper percentile cutoff applied to the non-NaN risetime distribution.

Returns:

  • idxs – Boolean mask; True for events within the accepted risetime range.

  • llim – Lower risetime boundary used (samples).

  • hlim – Upper risetime boundary used (samples, computed from perc).

Return type:

tuple

pygama.pargen.dplms_ge_dict.noise_matrix(bls, length)

Compute the noise covariance matrix from baseline waveforms.

The baselines are mean-subtracted, the raw covariance is estimated, and then compressed to the filter length via a sliding-window average (convolution with an identity kernel).

Parameters:
  • bls (array) – 2-D array of baseline waveform segments, shape (n_events, n_samples).

  • length (int) – Target filter length (samples). The output matrix has shape (length, length).

Returns:

nmat – Noise covariance matrix of shape (length, length).

Return type:

ndarray

pygama.pargen.dplms_ge_dict.noise_matrix_corr(bls, bls_corr, length)

Compute a block noise covariance matrix including cross-channel correlations.

Extends noise_matrix() to the multi-channel case by building an (n_channels * length, n_channels * length) block matrix where each block is the cross-channel noise covariance between two channels, compressed to length via the same sliding-window average used in noise_matrix().

Parameters:
  • bls (ndarray) – Baseline waveforms for the primary channel, shape (n_events, n_samples).

  • bls_corr (list[ndarray]) – List of baseline waveform arrays for correlated channels, each of shape (n_events, n_samples).

  • length (int) – Target filter length (samples).

Returns:

nmat – Symmetrised block noise covariance matrix of shape (n_channels * length, n_channels * length).

Return type:

ndarray

pygama.pargen.dplms_ge_dict.signal_matrices(wfs, length, decay_const, ff=2)

Compute the signal-shape matrices needed for DPLMS filter synthesis.

Three matrices are constructed from the waveform ensemble:

  • rmat - outer product of the mean reference pulse (shape constraint).

  • pmat - pile-up penalty matrix built from an exponential decay model.

  • fmat - flat-top matrix encoding the derivative of the flat-top region.

Parameters:
  • wfs (array) – 2-D array of aligned signal waveforms, shape (n_events, n_samples).

  • length (int) – Filter length (samples); determines the output matrix dimensions.

  • decay_const (float) – Exponential decay constant used to build the pile-up penalty matrix. Set to 0 to disable the pile-up term.

  • ff (int) – Flat-top length (samples) used to construct the flat-top derivative matrix.

Returns:

  • ref – Mean reference pulse of length n_samples.

  • rmat – Reference signal matrix, shape (length, length).

  • pmat – Pile-up penalty matrix, shape (length, length).

  • fmat – Flat-top derivative matrix, shape (length, length).

Return type:

tuple

pygama.pargen.dplms_ge_dict.signal_selection(dsp_cal, dplms_dict, coeff_values)

Apply centroid, pile-up, and risetime cuts to select clean calibration signals.

Parameters:
  • dsp_cal – DSP output table (lgdo.Table) containing at minimum peak_pos, peak_pos_neg, centroid, tp_90, and tp_10 fields.

  • dplms_dict – Dictionary with DPLMS configuration parameters including rt_low, peak_lim, wsize, bsize, centroid_lim, and dp_def sub-dictionary with rt and pt defaults.

  • coeff_values – Dictionary of per-optimisation-point overrides; may contain rt (risetime percentile) and pt (pile-up threshold) keys.

Returns:

sel_dict – Selection dictionary with keys:

  • idxs - combined boolean mask of passing events

  • ct_ll, ct_hh - centroid window boundaries

  • pp_ll, pp_hh - pile-up peak band boundaries

  • rt_ll, rt_hh - risetime window boundaries

Return type:

dict

pygama.pargen.dsp_optimize module

class pygama.pargen.dsp_optimize.BayesianOptimizer(acq_func, batch_size, kernel=None, sampling_rate=None, fom_value='y_val', fom_error='y_val_err')

Bases: object

Bayesian optimization using a Gaussian Process regression.

This optimizer fits a Gaussian Process (GP) to fit kernel to data, takes in a series of init samples for this fit and then calculates the next point using the acquisition function specified.

The GP model is implemented using sklearn.gaussian_process.GaussianProcessRegressor.

Parameters:
  • acq_func (str) –

    Acquisition function used to select new sampling points.

    Supported values are:
    • "ei" : Expected Improvement

    • "ucb" : Upper Confidence Bound

    • "lcb" : Lower Confidence Bound

  • batch_size (int) – Number of candidate points proposed at each optimization iteration.

  • kernel (Kernel | None) – Optional kernel used by the Gaussian Process regressor. If None, the default kernel from GaussianProcessRegressor is used.

  • sampling_rate (str | pint.Quantity | None) – Optional sampling rate associated with evaluations. If given as a string it will be converted to a pint.Quantity.

  • fom_value (str) – Column name containing the figure-of-merit values in the input data,

  • fom_error (str) – Column name containing the uncertainty associated with the figure-of-merit values.

Variables:
  • gauss_pr – Gaussian Process model used.

  • acq_function – Acquisition function method corresponding to the specified acq_func.

  • batch_size – Batch size used in the Gaussian Process optimization.

  • dims – List of dimensions (parameters) added to the optimizer’s search space.

  • iters – Counter for the number of optimization iterations performed.

  • best_samples – DataFrame storing the best evaluated samples and their acquisition values.

  • distances – List of distances between sampled points, used for diagnostics.

  • x_init – Initial input points provided via add_initial_values.

  • y_init – Observed function values corresponding to x_init.

  • yerr_init – Uncertainty associated with y_init.

  • y_min – Minimum observed value in y_init. Set by get_first_point.

  • optimal_x – Input corresponding to y_min. Set by get_first_point.

  • optimal_ei – Placeholder for expected improvement at optimal_x. Set by get_first_point.

  • current_x – Current optimization point being evaluated. Updated by update_db_dict().

  • current_ei – Acquisition function value for current_x. Updated by update_db_dict().

  • current_iter – Optimization iteration counter. Incremented each time update_db_dict() is called.

Examples

Simple 1D optimization:

import numpy as np

# Create optimizer
optimizer = BayesianOptimizer(acq_func="ei", batch_size=1)
optimizer.add_dimension("x", parameter="x", min_val=-5, max_val=5)

# Add initial observations
optimizer.add_initial_values(
    np.array([[0], [2], [-1]]), np.array([10.0, 5.0, 7.0]), np.array([0.1, 0.2, 0.1])
)

# Update DSP parameters database with the current optimization point
db_dict = optimizer.update_db_dict({})

# Simulate new evaluation and update optimizer
optimizer.update({"x": [1.0], "y_val": 3.5, "y_val_err": 0.1})

# Plot the result
fig = optimizer.plot()
_extend_prior_with_posterior_data(x, y, yerr)

Append a new observation (x, y, yerr) to the internal training arrays.

_get_expected_improvement(x_new)

Compute the Expected Improvement acquisition value at x_new.

_get_lcb(x_new)

Compute the Lower Confidence Bound acquisition value at x_new.

_get_next_probable_point()

Find the next candidate point by minimising the acquisition function.

Samples batch_size random starting points within the search bounds and runs L-BFGS-B from each. Values that coincide with an existing sample are perturbed slightly. Parameters with round_to_samples set are rounded to the nearest multiple of sampling_rate.

Returns:

  • x_optimal – List of parameter values for the next candidate point.

  • min_ei – Acquisition function value at x_optimal.

_get_ucb(x_new)

Compute the Upper Confidence Bound acquisition value at x_new.

add_dimension(name, parameter, min_val, max_val, round_to_samples=False, unit=None)

Add a new dimension (parameter) to the optimizer’s search space.

Each dimension defines a variable that the Bayesian optimizer will vary to explore the objective function.

Parameters:
  • name (str) – Name of the dimension.

  • parameter (float) – Initial value or reference for this parameter.

  • min_val (float) – Minimum allowed value for this parameter.

  • max_val (float) – Maximum allowed value for this parameter.

  • round_to_samples (bool) – If True, sampled values along this dimension will be rounded to multiples of sampling_rate. Requires sampling_rate to be set in the optimizer.

  • unit (str | Quantity | None) – Physical unit of the parameter (e.g., “K”, “mL”). Converted to a pint.Quantity internally if provided.

add_initial_values(x_init, y_init, yerr_init)

Add initial observed data to the optimizer.

These initial samples are used to fit the Gaussian Process model before starting the Bayesian optimization iterations.

Parameters:
  • x_init (Any) – Initial input points. Shape should match the number of dimensions added via add_dimension.

  • y_init (Any) – Observed function values corresponding to x_init.

  • yerr_init (Any) – Uncertainty (error) associated with y_init.

Notes

The lengths of x_init, y_init, and yerr_init must all match.

get_best_vals()

Return the best-observed parameter values as a DSP database dictionary.

Formats optimal_x into the same {name: {parameter: value_str}} structure expected by run_one_dsp().

Returns:

out_dict – DSP parameter database dictionary with the optimal values.

get_first_point()

Get the first optimization point based on initial data.

Selects the input point corresponding to the minimum observed value in the initial samples (y_init). This point can serve as the starting point for the Bayesian optimization process.

Sets the following attributes on the optimizer: - y_min : Minimum observed value in y_init. - optimal_x : Input corresponding to y_min. - optimal_ei : Initialized to None.

Returns:

  • optimal_x – Input point corresponding to the minimum observed y_init.

  • optimal_ei – Placeholder for the expected improvement at optimal_x (initialized to None).

Return type:

tuple[Any, None]

get_n_dimensions()

Return the number of dimensions in the search space.

iterate_values()

Fit the GP to current data and propose the next evaluation point.

Trains the Gaussian Process on all valid (non-NaN) observations accumulated so far, then calls _get_next_probable_point() to select the next candidate.

Returns:

  • x_next – Parameter values for the next candidate point.

  • ei – Acquisition function value at x_next.

plot(init_samples=None)

Plot the Gaussian Process regression and acquisition function.

For 1D and 2D optimization problems, this method visualizes the fitted Gaussian Process and the acquisition function across the search space. Initial samples, failed samples, and the optimal point are highlighted.

Parameters:

init_samples – Optional list of initial sample points to highlight on the plot.

Returns:

fig – Matplotlib Figure object containing the plot of the Gaussian Process regression and acquisition function.

Return type:

Figure

plot_acq(init_samples=None)

Plot the acquisition function across the search space.

update(results)

Update the optimizer with new evaluation results.

This method updates the internal state of the Bayesian optimizer with the latest evaluation at current_x. It extends the prior data used by the Gaussian Process, updates the current best observed point, and tracks distances between consecutive sampled points.

Parameters:

results (Mapping[str, Any]) – Dictionary-like object containing the latest evaluation. Must include keys corresponding to fom_value and fom_error.

update_db_dict(db_dict)

Update a DSP parameters database with the current optimization point.

This method updates db_dict with the values of the current point being evaluated by the optimizer. If it is the first iteration, it retrieves the initial point; otherwise, it computes the next point using iterate_values(). Each dimension’s value is stored in the database, with units appended if specified.

Parameters:

db_dict (Mapping) – Dictionary representing the DSP parameters database. Keys are dimension names, and values are dictionaries mapping parameter identifiers to their corresponding values as strings (including units).

Returns:

Mapping – Updated DSP parameters database including the current optimization point.

Return type:

Mapping

class pygama.pargen.dsp_optimize.OptimiserDimension(name, parameter, min_val, max_val, round, unit)

Bases: tuple

Create new instance of OptimiserDimension(name, parameter, min_val, max_val, round, unit)

_asdict()

Return a new dict which maps field names to their values.

_field_defaults = {}
_fields = ('name', 'parameter', 'min_val', 'max_val', 'round', 'unit')
classmethod _make(iterable)

Make a new OptimiserDimension object from a sequence or iterable

_replace(**kwds)

Return a new OptimiserDimension object replacing specified fields with new values

max_val

Alias for field number 3

min_val

Alias for field number 2

name

Alias for field number 0

parameter

Alias for field number 1

round

Alias for field number 4

unit

Alias for field number 5

class pygama.pargen.dsp_optimize.ParGrid

Bases: object

Parameter Grid class Each ParGrid entry corresponds to a dsp parameter to be varied. The ntuples must follow the pattern: ( name parameter value_strs) : ( str, str, list of str) where name and parameter are the same as ‘db.name.parameter’ in the processing chain, value_strs is the array of strings to set the argument to.

add_dimension(name, parameter, value_strs)

Add a new dimension to the parameter grid.

Parameters:
  • name – Dimension name, matching the db.name prefix in the DSP chain.

  • parameter – Parameter name within name, matching db.name.parameter.

  • value_strs – List of string values to sweep over for this dimension.

get_data(i_dim, i_par)

Return the name, parameter, and value string for a grid cell.

Parameters:
  • i_dim – Dimension index.

  • i_par – Parameter (point) index within that dimension.

Returns:

  • name – Dimension name string.

  • parameter – Parameter name string.

  • value_str – String value at position i_par along dimension i_dim.

get_n_dimensions()

Return the number of dimensions in the grid.

get_n_grid_points()

Return the total number of grid points (product of all axis lengths).

get_n_points_of_dim(i)

Return the number of grid points along dimension i.

get_par_meshgrid(copy=False, sparse=False)

return a meshgrid of parameter values Always uses Matrix indexing (natural for par grid) so that mg[i1][i2][…] corresponds to index order in self.dims Note copy is False by default as opposed to numpy default of True

get_shape()

Return the shape of the N-dimensional grid as a tuple.

get_zero_indices()

Return a zero-initialised index array of length get_n_dimensions().

iterate_indices(indices)

iterate given indices [i1, i2, …] by one. For easier iteration. The convention here is arbitrary, but its the order the arrays would be traversed in a series of nested for loops in the order appearing in dims (first dimension is first for loop, etc): Return False when the grid runs out of indices. Otherwise returns True.

print_data(indices)

Format a human-readable description of a grid point.

Parameters:

indices – Sequence of per-dimension indices identifying the grid point.

Returns:

print_string – Multi-line string listing each name.parameter = value entry.

set_dsp_pars(db_dict, indices)

Populate a DSP parameter database from grid indices.

Parameters:
  • db_dict – Existing database dictionary to update, or None to create a new one.

  • indices – Per-dimension index array identifying the grid point.

Returns:

db_dict – Updated database dictionary with the parameter values for the specified grid point.

class pygama.pargen.dsp_optimize.ParGridDimension(name, parameter, value_strs)

Bases: tuple

Create new instance of ParGridDimension(name, parameter, value_strs)

_asdict()

Return a new dict which maps field names to their values.

_field_defaults = {}
_fields = ('name', 'parameter', 'value_strs')
classmethod _make(iterable)

Make a new ParGridDimension object from a sequence or iterable

_replace(**kwds)

Return a new ParGridDimension object replacing specified fields with new values

name

Alias for field number 0

parameter

Alias for field number 1

value_strs

Alias for field number 2

pygama.pargen.dsp_optimize.get_grid_points(grid)

Generate a list of all grid-point index combinations.

Iterates all grids simultaneously, cycling each grid back to its zero index when it is exhausted. Iteration stops when every grid has completed at least one full cycle.

Parameters:

grid – List of ParGrid objects defining the search space.

Returns:

out – List of [tuple, ...] entries, where each entry contains one index tuple per grid.

pygama.pargen.dsp_optimize.run_bayesian_optimisation(tb_data, dsp_config, fom_function, optimisers, fom_kwargs=None, db_dict=None, nan_val=10, n_iter=10)

Run Bayesian optimization loop to optimize DSP parameters.

Parameters:
  • tb_data (Table) – An input table of the data for which the DSP is performed.

  • dsp_config (str | Mapping) – Configuration for the DSP processing chain, see run_one_dsp()

  • fom_function (list | Callable) – Function(s) to calculate the figure-of-merit from the DSP output table. If a list is given, each function will be applied to the output of each optimiser in order. If a single function is given, it will be applied to the output of all optimisers. This function should return a dictionary containing a keys of name fom_value and fom_error in the optimisers, which will be used to inform the optimization process.

  • optimisers (list | BayesianOptimizer) – A list of BayesianOptimizer instances to run in parallel. Each optimizer will update the DSP parameters database and receive the results of each iteration to inform its next point selection.

  • fom_kwargs (Mapping | None) – Optional keyword arguments for the fom_function. If a list is given, each set of kwargs will be applied to the corresponding function in fom_function. If a single set of kwargs is given, it will be applied to all functions.

  • db_dict (Mapping | None) – Optional initial DSP parameters database to be updated by the optimizers. If not provided, an empty dictionary will be used.

  • nan_val (float | list) – Value to assign to the figure-of-merit if it is NaN. Can be a single float applied to all optimizers or a list of floats corresponding to each optimizer.

  • n_iter (int) – Number of optimization iterations to perform. Each iteration consists of each optimizer proposing a new point, running the DSP with those parameters, evaluating the figure-of-merit, and updating the optimizers with the results.

Returns:

  • out_param_dict – A dictionary containing the best parameter values found by each optimizer, formatted for updating the DSP parameters database.

  • out_results_list – A list of dictionaries containing the optimal results corresponding to the best parameters found by each optimizer.

Return type:

tuple[Mapping, list]

Note

The optimisers need to be initialised with the parameters set, and initial values added.

pygama.pargen.dsp_optimize.run_grid(tb_data, dsp_config, grid, fom_function, db_dict=None, verbosity=1, **fom_kwargs)

Extract a table of optimization values for a grid of DSP parameters The grid argument defines a list of parameters and values over which to run the DSP defined in dsp_config on tb_data. At each point, a scalar figure-of-merit is extracted.

Returns a N-dimensional ndarray of figure-of-merit values, where the array axes are in the order they appear in grid.

Parameters:
  • tb_data – Input LH5 table of waveform data.

  • dsp_config – DSP processing chain configuration.

  • gridParGrid defining the parameter sweep.

  • fom_function – Callable that receives the output table and returns a scalar FOM.

  • db_dict – Optional base DSP parameter database dictionary.

  • verbosity – Verbosity level for the processing chain and FOM callable.

  • **fom_kwargs – Additional keyword arguments forwarded to fom_function.

Returns:

grid_values – N-dimensional ndarray of FOM values; the M-th axis corresponds to the M-th dimension of grid.

pygama.pargen.dsp_optimize.run_grid_point(tb_data, dsp_config, grids, fom_function, iii, db_dict=None, verbosity=1, fom_kwargs=None)

Run DSP and evaluate the figure-of-merit at a single grid point.

Sets the DSP parameter database from grids and iii, runs the DSP chain, and optionally evaluates one or more figure-of-merit functions.

Parameters:
  • tb_data – Input LH5 table of waveform data.

  • dsp_config – DSP processing chain configuration dictionary.

  • grids – A single ParGrid or a list of them.

  • fom_function – A single callable or list of callables; each is called as fom_function(tb_out, verbosity, fom_kwargs[i]). When None, the raw output table is returned.

  • iii – List of per-grid index arrays, one per entry in grids.

  • db_dict – Optional base DSP parameter database to update.

  • verbosity – Verbosity level forwarded to the DSP chain and FOM callables.

  • fom_kwargs – List of keyword-argument dicts, one per FOM callable.

Returns:

out – Dictionary with keys "indexes" (list of grid-point tuples) and "results" (array of FOM values, or the output table when fom_function is None).

pygama.pargen.dsp_optimize.run_one_dsp(tb_data, dsp_config, db_dict=None, fom_function=None, verbosity=0, fom_kwargs=None)

Run one iteration of DSP on tb_data`, based on the DSP specified in ``dsp_config.

Optionally returns a value for optimization

Parameters:
  • tb_data (Table) – Input LH5 table of waveform data. Typically a preselected subset is passed so optimisation does not run over the full dataset.

  • dsp_config (Mapping) – DSP processing chain configuration (see build_processing_chain).

  • db_dict (Mapping | None) – Optional DSP parameter database dictionary.

  • fom_function (Callable | None) – When provided, called on the output table to return a scalar figure-of-merit. Should accept a verbosity argument.

  • verbosity (int) – Verbosity level for the processing chain and FOM callable.

  • fom_kwargs – Optional keyword arguments forwarded to fom_function.

Returns:

  • figure_of_merit – Scalar FOM value returned by fom_function when it is not None.

  • tb_out – Output LH5 table when fom_function is None.

Return type:

float | Table

pygama.pargen.energy_cal module

routines for automatic calibration.

  • hpge_find_energy_peaks (Find uncalibrated E peaks whose E spacing matches the pattern in peaks_kev)

  • hpge_get_energy_peaks (Get uncalibrated E peaks at the energies of peaks_kev)

  • hpge_fit_energy_peaks (fits the energy peals)

  • hpge_E_calibration (main routine – finds and fits peaks specified)

class pygama.pargen.energy_cal.FWHMLinear

Bases: object

Energy-resolution model FWHM(E) = sqrt(a + b·E).

static bounds(ys)

Return parameter bounds [(a_lo, a_hi), (b_lo, b_hi)].

static func(x, a, b)

Evaluate sqrt(a + b*x).

static guess(xs, ys, y_errs)

Return initial parameter guess [a, b].

static string_func(input_param)

Return the expression string for the hit-dict format.

class pygama.pargen.energy_cal.FWHMQuadratic

Bases: object

Energy-resolution model FWHM(E) = sqrt(a + b·E + c·E²).

static bounds(ys)

Return parameter bounds [(a_lo, a_hi), (b_lo, b_hi), (c_lo, c_hi)].

static func(x, a, b, c)

Evaluate sqrt(a + b*x + c*x**2).

static guess(xs, ys, y_errs)

Return initial parameter guess [a, b, c].

static string_func(input_param)

Return the expression string for the hit-dict format.

class pygama.pargen.energy_cal.HPGeCalibration(energy_param, glines, guess_kev, deg=1, uncal_is_int=False, fixed=None, debug_mode=False)

Bases: object

Calibrate HPGe data to a set of known peaks.

Stores the calibration polynomial, peak locations, and intermediate fit results. Each method appends to a results dictionary so that the full calibration history is preserved.

Parameters:
  • energy_param – Name of the uncalibrated energy parameter (used in logging).

  • glines – List of known peak energies in keV to fit to.

  • guess_kev (float) – Rough initial conversion factor from ADC to keV (must be positive).

  • deg (int) – Degree of the calibration polynomial E_kev = poly(e_uncal). deg = 0 is a simple scaling; deg = -1 fixes the constant term. Follows the numpy.polynomial.polynomial convention (lowest to highest order).

  • uncal_is_int (bool) – If True, avoids picket-fence binning artefacts when e_uncal is integer-valued.

  • fixed – Dictionary of fixed polynomial coefficient indices and their values.

  • debug_mode (bool) – If True, exceptions are re-raised instead of being caught and logged.

calibrate_prominent_peak(e_uncal, peak, peak_pars, allowed_p_val=1e-20, tail_weight=0, peak_param='mode', n_events=None)

Calibrate using a single prominent peak (degree-0 calibration).

Fits only the specified peak with update_cal_pars=True, then re-fits all peaks with update_cal_pars=False to populate the full results, and fits both energy-resolution curves. Requires deg == 0 (pure scaling calibration).

Parameters:
  • e_uncal – 1-D array of uncalibrated energy values.

  • peak – Known energy of the prominent peak in keV.

  • peak_pars – List of (peak_kev, (kev_lo, kev_hi), func) tuples.

  • allowed_p_val – Minimum chi-squared p-value for an accepted fit.

  • tail_weight – Weight for the tail-suppression prior.

  • peak_param – Parameter used to extract the peak position.

  • n_events – Maximum number of events to use; None uses all.

fit_calibrated_peaks(e_uncal, peak_pars)

Fit peaks using an existing calibration without updating the polynomial.

Calls hpge_get_energy_peaks() and hpge_fit_energy_peaks() with update_cal_pars=False, then fits the energy resolution curves for both FWHMLinear and FWHMQuadratic.

Parameters:
  • e_uncal – 1-D array of uncalibrated energy values.

  • peak_pars – List of (peak_kev, (kev_lo, kev_hi), func) tuples.

static fit_energy_res_curve(fwhm_func, fwhm_peaks, fwhms, dfwhms)

Fit an energy-resolution model to a set of measured FWHMs.

Performs a weighted least-squares fit of fwhm_func to fwhms at fwhm_peaks and evaluates the chi-squared p-value. Returns a NaN-filled result dictionary when the fit fails.

Parameters:
  • fwhm_func – Energy resolution model class with func, string_func, guess, and bounds static methods (e.g. FWHMLinear).

  • fwhm_peaks – Peak energies in keV at which FWHMs were measured.

  • fwhms – Measured FWHM values in keV.

  • dfwhms – Uncertainties on fwhms.

Returns:

results – Dictionary with keys function, module, expression, parameters, uncertainties, cov, csqr, and p_val.

full_calibration(e_uncal, peak_pars, allowed_p_val=1e-20, tail_weight=0, peak_param='mode', n_events=None)

Run the complete HPGe energy calibration pipeline.

Sequentially calls hpge_find_energy_peaks(), hpge_get_energy_peaks(), hpge_fit_energy_peaks(), and get_energy_res_curve() (for both FWHMLinear and FWHMQuadratic). If peaks are dropped during fitting, a second attempt is made with adjusted fit windows.

Parameters:
  • e_uncal – 1-D array of uncalibrated energy values.

  • peak_pars – List of (peak_kev, (kev_lo, kev_hi), func) tuples specifying the fit window and model for each peak.

  • allowed_p_val – Minimum chi-squared p-value for a peak fit to be accepted.

  • tail_weight – Weight for the tail-suppression prior in the fit cost function.

  • peak_param – Parameter used to extract the peak position ("mode" or "mu").

  • n_events – Maximum number of events to use; None uses all.

gen_pars_dict()

Generate the calibration expression dictionary for use in a hit-dict.

Returns:

pars_dict – Dictionary with keys "expression" (the polynomial string) and "parameters" (mapping coefficient names to their values).

get_energy_res_curve(fwhm_func, interp_energy_kev=None)

Fit the energy-resolution curve and optionally interpolate to target energies.

Collects calibrated FWHMs from the most recent peak-fit results (excluding Doppler-broadened peaks at ~511, 1592.53, and 2103.5 keV), calls fit_energy_res_curve(), and optionally calls interpolate_energy_res(). The result is stored under fwhm_func.__name__ in the most recent results entry.

Parameters:
get_fwhms()

Compute and store calibrated FWHMs for all fitted peaks.

Reads uncalibrated FWHM values from the most recent peak-parameter results, converts them to keV using the derivative of the calibration polynomial, and writes fwhm_in_kev and fwhm_err_in_kev back into each peak’s parameter dictionary.

hpge_cal_energy_peak_tops(e_uncal, n_sigmas=20, peaks_kev=None, default_n_bins=50, n_events=None, allowed_p_val=0.01, update_cal_pars=True, fit_width_bound_kev=15)

Perform energy calibration for HPGe detector using peak fitting.

Args:
e_uncal (array-like):

Uncalibrated energy values.

n_sigmas (float, optional):

Number of standard deviations to use for peak fitting range. Defaults to 1.2.

peaks_kev (array-like, optional):

Known peak positions in keV. If not provided, uses self.peaks_kev. Defaults to None.

default_n_bins (int, optional):

Number of bins for histogram. Defaults to 50.

n_events (int, optional):

Number of events to consider for calibration. Defaults to None which uses all events.

allowed_p_val (float, optional):

Maximum p-value for a fit to be considered valid. Defaults to 0.05.

update_cal_pars (bool, optional):

Whether to update the calibration parameters. Defaults to True.

hpge_find_energy_peaks(e_uncal, peaks_kev=None, n_sigma=5, etol_kev=None, bin_width_kev=1, erange=None, var_zero=1, update_cal_pars=True)

Find uncalibrated energy peaks whose energy spacing matches the pattern in peaks_kev.

Parameters:
  • (array-like) (e_uncal) – Uncalibrated energy values.

  • (array-like (peaks_kev) – Pattern of energy peaks to match. If not provided, the peaks from the object’s attribute peaks_kev will be used.

  • optional) – Pattern of energy peaks to match. If not provided, the peaks from the object’s attribute peaks_kev will be used.

  • (float (var_zero) – Number of standard deviations above the mean to consider a peak significant. Default is 5.

  • optional) – Number of standard deviations above the mean to consider a peak significant. Default is 5.

  • (float – Tolerance in energy units for matching peaks. If not provided, it will be estimated based on the peak widths.

  • optional) – Tolerance in energy units for matching peaks. If not provided, it will be estimated based on the peak widths.

  • (float – Width of the energy bins for initial peak search. Default is 1 keV.

  • optional) – Width of the energy bins for initial peak search. Default is 1 keV.

  • (tuple (erange) – Range of uncalibrated energy values to consider. If not provided, the range will be determined based on the peaks.

  • optional) – Range of uncalibrated energy values to consider. If not provided, the range will be determined based on the peaks.

  • (float – Value to replace zero variance with. Default is 1.

  • optional) – Value to replace zero variance with. Default is 1.

  • (bool (update_cal_pars) – Whether to update the calibration parameters. Default is True.

  • optional) – Whether to update the calibration parameters. Default is True.

  • Returns – None

hpge_fit_energy_peaks(e_uncal, peak_pars=None, peaks_kev=None, bin_width_kev=1, peak_param='mode', method='unbinned', n_events=None, allowed_p_val=0.01, tail_weight=0, update_cal_pars=True, use_bin_width_in_fit=True)

Fit the energy peaks specified using the given function.

Parameters:
  • e_uncal – Unbinned energy data to be fit.

  • peaks_kev – Peak energies in keV to fit. Defaults to self.peaks_kev.

  • peak_pars – List of (peak, range, func) tuples specifying the energy, fit range (keV), and fit function for each peak.

  • bin_width_kev – Default bin width for fit-window histogramming. Default is 1 keV.

  • peak_param – Peak-location parameter to use in fitting. Default is "mode".

  • method – Fitting method: "unbinned" (default) or "binned".

  • n_events – Number of events to use for the unbinned fit.

  • allowed_p_val – Lower limit on the fit p-value.

  • tail_weight – Weight applied to the tail of the fit.

  • update_cal_pars – Whether to update the calibration parameters. Default is True.

Returns:

results_dict – Dictionary containing the fit results for each peak.

Raises:

RuntimeError – If the fit fails.

Notes

This function fits the energy peaks specified using the given function. It calculates the range around each peak to fit, performs the fitting using either unbinned or binned method, and returns the fit results in a dictionary.

hpge_get_energy_peaks(e_uncal, peaks_kev=None, n_sigma=3, etol_kev=5, var_zero=1, bin_width_kev=0.2, update_cal_pars=True, erange=None)

Get uncalibrated E peaks at the energies of peaks_kev

Parameters:
  • e_uncal – Uncalibrated energy values.

  • peaks_kev – Energies of peaks to search for (in keV). Defaults to self.peaks_kev.

  • n_sigma – Threshold for detecting a peak in sigma (i.e. sqrt(var)). Default is 3.

  • etol_kev – Absolute tolerance in energy for matching peaks. Default is 5.

  • var_zero – Replacement value for zeros in var to avoid divide-by-zero in hist/sqrt(var). Default is 1.

  • bin_width_kev – Width of the energy bins for re-binning the histogram. Default is 0.2 keV.

  • update_cal_pars – Whether to update the calibration parameters. Default is True.

  • erange – Range of energy values to consider for the peak search. Determined automatically from peaks_kev if not provided.

Returns:

None

Notes

This method performs the following steps: 1. Re-bins the histogram in ~0.2 keV bins with updated energy scale parameters for peak-top fits. 2. Finds all local maxima in the histogram with significance greater than n_sigma. 3. Matches the calculated peak energies with the expected peak energies. 4. Removes duplicate peak matches. 5. Updates the input peaks, got peaks, and got peak locations in the results dictionary. 6. If update_cal_pars is True, calculates the updated calibration curve using the matched peak energies.

static interpolate_energy_res(fwhm_func, fwhm_peaks, fwhm_results, interp_energy_kev=None, debug_mode=False)

Interpolate the fitted energy-resolution curve to target energies.

For each entry in interp_energy_kev, evaluates the fitted model at the requested energy and estimates the uncertainty via bootstrap resampling of the covariance matrix. Results are added to fwhm_results in-place and the same dict is returned.

Parameters:
  • fwhm_func – Energy resolution model class (e.g. FWHMLinear).

  • fwhm_peaks – Peak energies used in the original fit (used for range checking).

  • fwhm_results – Result dictionary from fit_energy_res_curve(); modified in-place.

  • interp_energy_kev – Dict mapping labels to energies in keV at which to interpolate (e.g. {"Qbb": 2039.0}). None skips interpolation.

  • debug_mode – If True, re-raises exceptions instead of silently returning NaN.

Returns:

fwhm_results – The input dictionary with added interpolation entries.

plot_cal_fit(data, figsize=(12, 8), fontsize=12, erange=(200, 2700))

Plot the calibration curve and residuals.

Produces a two-panel figure: the upper panel shows the ADC peak locations vs. their known keV energies with the fitted polynomial overlaid; the lower panel shows the residuals in keV.

Parameters:
  • data – Unused; kept for API consistency with other plot methods.

  • figsize – Figure size (width, height) in inches.

  • fontsize – Font size for axis labels.

  • erange(low, high) energy range in keV for the x-axis.

Returns:

figmatplotlib.figure.Figure object.

plot_cal_fit_with_errors(data, figsize=(10, 6), fontsize=12, erange=(200, 2700))

Plot the calibration curve with peak-fit uncertainties.

Like plot_cal_fit() but shows error bars on the ADC peak locations derived from the peak-fit parameter uncertainties.

Parameters:
  • data – Unused; kept for API consistency with other plot methods.

  • figsize – Figure size (width, height) in inches.

  • fontsize – Font size for axis labels.

  • erange(low, high) energy range in keV for the x-axis.

Returns:

figmatplotlib.figure.Figure object.

plot_eres_fit(data, erange=(200, 2700), figsize=(12, 8), fontsize=12)

Plot the energy-resolution curve fits.

Shows measured FWHMs with error bars alongside the fitted FWHMLinear and FWHMQuadratic curves, plus the interpolated resolution at Qββ.

Parameters:
  • data – Unused; kept for API consistency with other plot methods.

  • erange(low, high) energy range in keV for the x-axis.

  • figsize – Figure size (width, height) in inches.

  • fontsize – Font size for axis labels.

Returns:

figmatplotlib.figure.Figure object.

plot_fits(energies, figsize=(12, 8), fontsize=12, ncols=3, nrows=3, binning_kev=5)

Plot the peak-shape fits for all fitted calibration peaks.

Produces a grid of subplots, one per fitted peak, showing the histogrammed data and the best-fit model.

Parameters:
  • energies – 1-D array of calibrated energy values.

  • figsize – Figure size (width, height) in inches.

  • fontsize – Font size for axis labels.

  • ncols – Number of subplot columns.

  • nrows – Number of subplot rows.

  • binning_kev – Histogram bin width in keV.

Returns:

figmatplotlib.figure.Figure object.

update_results_dict(results_dict)

Store results_dict in results under the caller’s function name.

If a key with that name already exists, a numeric suffix is appended to avoid overwriting previous results.

Parameters:

results_dict – Dictionary of results to store.

class pygama.pargen.energy_cal.TailPrior(data, model, tail_weight=0)

Bases: object

Iminuit cost-function mixin that adds a logarithmic tail-fraction prior.

When tail_weight > 0, the prior penalises small htail values, discouraging the fitter from driving the tail fraction to zero.

Parameters:
  • data – Data array (unused in the prior term itself; kept for interface compatibility with iminuit cost functions).

  • model – Peak model callable.

  • tail_weight – Strength of the prior; larger values impose a stronger penalty on small htail.

_call(*pars)

Delegate to __call__ (iminuit internal interface).

_value(*pars)

Delegate to __call__ (iminuit internal interface).

errordef = 0.5
verbose = 0
pygama.pargen.energy_cal.average_counts_check(hist, bins, var, threshold=1)
pygama.pargen.energy_cal.get_hpge_energy_bounds(func, parguess)
pygama.pargen.energy_cal.get_hpge_energy_fixed(func)

Get the fixed indexes for fitting and mask for parameters based on the given function.

Parameters:

func – The fit function for which the fixed indexes and mask are to be determined.

Returns:

  • fixed – List of parameter names to hold fixed during fitting.

  • mask – Boolean array; True for free parameters, False for fixed ones.

pygama.pargen.energy_cal.get_hpge_energy_peak_par_guess(energy, func, fit_range=None, bin_width=None, mode_guess=None)

Get parameter guesses for func fit to peak in hist

Parameters:
  • energy – Array of energy values in the range around the peak.

  • func – The fit function to guess parameters for.

  • fit_range – Range (lo, hi) around the peak; defaults to the full extent of energy.

  • bin_width – Width of the histogram bins. Default is 1.

  • mode_guess – Initial guess for the mode (mu) parameter; estimated from the data if not provided.

Returns:

guessesValueView of initial parameter guesses for func.

Notes

This function calculates parameter guesses for fitting a function to a peak in a histogram. It uses various methods to estimate the parameters based on the provided energy values and the selected function.

If the function is ‘gauss_on_step’, the following parameters will be estimated: - n_sig: Number of signal events in the peak. - mu: Mean of the peak. - sigma: Standard deviation of the peak. - n_bkg: Number of background events. - hstep: Height of the step between the peak and the background. - x_lo: Lower bound of the fit range. - x_hi: Upper bound of the fit range.

If the function is ‘hpge_peak’, the following parameters will be estimated: - n_sig: Number of signal events in the peak. - mu: Mean of the peak. - sigma: Standard deviation of the peak. - htail: Height of the tail component. - tau: Decay constant of the tail component. - n_bkg: Number of background events. - hstep: Height of the step between the peak and the background. - x_lo: Lower bound of the fit range. - x_hi: Upper bound of the fit range.

If the provided function is not implemented, an error will be raised.

Examples

>>> energy = [1, 2, 3, 4, 5]
>>> func = pgf.gauss_on_step
>>> fit_range = (2, 4)
>>> bin_width = 0.5
>>> mode_guess = 3.5
>>> get_hpge_energy_peak_par_guess(energy, func, fit_range, bin_width, mode_guess)
{'n_sig': 3, 'mu': 3.5, 'sigma': 0.5, 'n_bkg': 2, 'hstep': 0.5, 'x_lo': 2, 'x_hi': 4}
pygama.pargen.energy_cal.hpge_fit_energy_cal_func(mus, mu_vars, energies_kev, energy_scale_pars, deg=0, fixed=None)

Find best fit of E = poly(mus +/- sqrt(mu_vars)) This is an inversion of hpge_fit_energy_scale. E uncertainties are computed from mu_vars / dmu/dE where mu = poly(E) is the E_scale function

Parameters:
  • mus – Uncalibrated energies.

  • mu_vars – Variances in mus.

  • energies_kev – Reference energies to fit to, in keV.

  • energy_scale_pars – Parameters from the energy-scale fit (keV → ADC) used for computing uncertainties.

  • deg – Degree for the calibration polynomial. deg=0 fits a simple scaling; otherwise follows the np.polyfit() convention.

  • fixed – Dict whose keys are polynomial-parameter indices to fix and values are the fixed values (None fixes at the current guess value).

Returns:

  • pars – Best-fit parameters following the np.polyfit() convention.

  • cov – Covariance matrix for the best-fit parameters.

pygama.pargen.energy_cal.hpge_fit_energy_peak_tops(hist, bins, var, peak_locs, n_to_fit=7, cost_func='Least Squares', inflate_errors=False, gof_method='var', debug_mode=False)

Fit gaussians to the tops of peaks

Parameters:
  • hist – Histogram of uncalibrated energies, see pgh.get_hist().

  • bins – Bin edges corresponding to hist.

  • var – Variances of each histogram bin.

  • peak_locs – Locations of peaks in hist; must be accurate to within ± 2*n_to_fit bins.

  • n_to_fit – Number of histogram bins near the peak top to include in the Gaussian fit.

  • cost_func – Cost function flag passed to gauss_mode_width_max().

  • inflate_errors – Error inflation flag passed to gauss_mode_width_max().

  • gof_method – Goodness-of-fit method flag passed to gauss_mode_width_max().

Returns:

  • pars_list – List of best-fit parameter arrays (mode, sigma, max) for each peak-top fit.

  • cov_list – List of covariance matrices corresponding to each entry in pars_list.

pygama.pargen.energy_cal.hpge_fit_energy_scale(mus, mu_vars, energies_kev, deg=0, fixed=None)

Find best fit of poly(E) = mus +/- sqrt(mu_vars) Compare to hpge_fit_energy_cal_func which fits for E = poly(mu)

Parameters:
  • mus – Uncalibrated energies.

  • mu_vars – Variances in mus.

  • energies_kev – Reference energies to fit to, in keV.

  • deg – Degree for the energy scale fit. deg=0 fits a simple scaling mu = scale * E; otherwise follows the np.polyfit() convention.

  • fixed – Dict whose keys are polynomial-parameter indices to fix and values are the fixed values (None fixes at the current guess value).

Returns:

  • pars – Best-fit parameters following the np.polyfit() convention.

  • errs – Parameter errors (square root of diagonal covariance entries).

  • cov – Covariance matrix for the best-fit parameters.

pygama.pargen.energy_cal.poly_match(xx, yy, deg=-1, rtol=1e-05, atol=1e-08, fixed=None)

Find the polynomial function best matching pol(xx) = yy

Finds the poly fit of xx to yy that obtains the most matches between pol(xx) and yy in the np.isclose() sense. If multiple fits give the same number of matches, the fit with the best gof is used, where gof is computed only among the matches. Assumes that the relationship between xx and yy is monotonic

Parameters:
  • xx – Domain data array. Must be sorted ascending; must satisfy len(xx) >= len(yy).

  • yy – Range data array: values to which poly(xx) will be compared. Must be sorted ascending; must satisfy len(yy) > max(2, deg+2).

  • deg – Degree of the polynomial. deg=0 fits a simple scaling scale * xx = yy; deg=-1 fits a simple shift xx + shift = yy; otherwise equivalent to the deg argument of np.polyfit().

  • rtol – Relative tolerance passed to np.isclose().

  • atol – Absolute tolerance passed to np.isclose() (same units as yy).

Returns:

  • pars – Best-fit parameters of poly(xx) = yy in np.polyfit() convention, or None when inputs are invalid.

  • i_matches – List of indices in xx for the matched values in the best match.

pygama.pargen.energy_cal.poly_wrapper(x, *pars)
pygama.pargen.energy_cal.sum_bins(hist, bins, var, threshold=5)
pygama.pargen.energy_cal.unbinned_staged_energy_fit(energy, func, gof_range=None, fit_range=None, guess=None, guess_func=<function get_hpge_energy_peak_par_guess>, bounds_func=<function get_hpge_energy_bounds>, fixed_func=<function get_hpge_energy_fixed>, guess_kwargs=None, bounds_kwargs=None, fixed_kwargs=None, tol=None, tail_weight=0, allow_tail_drop=True, bin_width=None, lock_guess=False, p_val_threshold=1e-19, display=0)

Unbinned fit to energy. This is different to the default fitting as it will try different fitting methods and choose the best. This is necessary for the lower statistics.

pygama.pargen.energy_optimisation module

This module contains the functions for performing the energy optimisation. This happens in 2 steps, firstly a grid search is performed on each peak separately using the optimiser, then the resulting grids are interpolated to provide the best energy resolution at Qbb

pygama.pargen.energy_optimisation.fom_fwhm_no_alpha_sweep(tb_in, kwarg_dict, ctc_param=None, alpha=0, idxs=None, frac_max=0.5, kev=True, display=0)

Figure-of-merit: FWHM at a fixed (or pre-computed) alpha, no sweep.

Applies a single drift-time correction with the given alpha and fits the peak, returning a comprehensive set of fit quality metrics. Used when the optimal alpha is already known (e.g. from a prior fom_fwhm_with_alpha_fit() call) or when no charge-trapping correction is desired.

Parameters:
  • tb_in – LH5 table containing the energy and optional drift-time columns.

  • kwarg_dict – Per-peak fitting options with at minimum the keys parameter (energy column name), func, peak, and kev_width. Optional keys: alpha (overrides the alpha argument), ctc_param (drift-time column name).

  • ctc_param – Name of the charge-trapping correction column; overridden by kwarg_dict["ctc_param"] if present.

  • alpha – Charge-trapping correction coefficient; overridden by kwarg_dict["alpha"] if present.

  • idxs – Optional index array to select a subset of events.

  • frac_max – Fractional height used to define the FWHM.

  • kev – If True, return the FWHM in keV rather than ADC units.

  • display – Verbosity level; values > 0 produce diagnostic plots.

Returns:

out_dict – Dictionary with keys fwhm, fwhm_o_max, fwhm_err, fwhm_o_max_err, chisquare, n_sig, n_sig_err, mu, mu_err, and fit_pars. All values are np.nan if the input energies contain NaNs.

pygama.pargen.energy_optimisation.fom_fwhm_with_alpha_fit(tb_in, kwarg_dict, ctc_parameter, nsteps=11, idxs=None, frac_max=0.2, display=0)

Figure-of-merit: FWHM minimised over a sweep of charge-trapping correction values.

Scans nsteps values of the charge-trapping coefficient alpha between 0 and 3.5x10^-6, fitting the peak at each step via get_peak_fwhm_with_dt_corr(). A degree-4 polynomial is fit to the valid FWHM/max-ratio values to locate the optimal alpha, and the peak is re-fit at that alpha to obtain the final FWHM in keV. An early termination heuristic stops the sweep when the FWHM curve is clearly rising.

Parameters:
  • tb_in – LH5 table containing the energy and drift-time columns.

  • kwarg_dict – Per-peak fitting options with at minimum the keys parameter (energy column name), func (peak shape), peak (keV), and kev_width (fit window half-widths). Optional key bin_width sets the histogram bin width (default 1 ADC).

  • ctc_parameter – Name of the charge-trapping correction parameter column in tb_in.

  • nsteps – Number of alpha values to scan.

  • idxs – Optional boolean or integer index array to select a subset of events.

  • frac_max – Fractional height used to define the final FWHM.

  • display – Verbosity level; values > 0 produce diagnostic plots.

Returns:

out_dict – Dictionary with keys fwhm, fwhm_err, alpha, alpha_err, chisquare, n_sig, and n_sig_err. All values are np.nan (and alpha is 0) on failure.

pygama.pargen.energy_optimisation.fom_interpolate_energy_res_with_single_peak_alpha_sweep(data, kwarg_dict, display=0)

Figure-of-merit: energy resolution interpolated to a target energy using a multi-peak sweep with a shared alpha from the highest-energy peak.

The charge-trapping correction parameter alpha is determined from the last (highest-energy) peak via fom_fwhm_with_alpha_fit(); this alpha is then applied to all remaining peaks via fom_fwhm_no_alpha_sweep(). The resulting FWHM vs. energy curve is fit with fwhm_func and interpolated to each energy in interp_energy.

Parameters:
  • data – DataFrame containing energy and correction parameters for all events.

  • kwarg_dict

    Dictionary with keys:

    • peaks_kev - list of peak energies in keV, ordered from low to # noqa: RUF002 high. The last entry is used for the alpha sweep.

    • idx_list - list of event-index arrays, one per peak. # noqa: RUF002

    • ctc_param - name of the charge-trapping correction parameter. # noqa: RUF002

    • peak_dicts - list of per-peak fitting dictionaries. # noqa: RUF002

    • interp_energy (optional, default ``{“Qbb”: 2039}``) - dict # noqa: RUF002 mapping energy label to keV value for interpolation.

    • fwhm_func (optional, default pgc.FWHMLinear ) - FWHM # noqa: RUF002 curve model used for the energy-resolution fit.

    • frac_max (optional, default 0.2) - fraction of peak maximum # noqa: RUF002 used to define fit range.

  • display – Verbosity / plotting level passed through to the underlying fits.

Returns:

results – Result dictionary containing interpolated FWHM value(s), e.g. {"Qbb_fwhm": ..., "Qbb_fwhm_err": ..., "alpha": ..., "peaks": ..., "fwhms": ..., "fwhm_errs": ..., "n_sig": ..., "n_sig_err": ...}. Returns (nan, nan, nan) if fewer than two valid FWHM values are available.

Return type:

dict | tuple

pygama.pargen.energy_optimisation.fom_single_peak_alpha_sweep(data, kwarg_dict, display=0)

Figure-of-merit wrapper: FWHM with alpha (charge-trapping correction) sweep for a single calibration peak.

Thin adapter around fom_fwhm_with_alpha_fit() that unpacks the standardised kwarg_dict interface expected by the optimisation framework.

Parameters:
  • data – DataFrame containing the energy and charge-trapping correction parameters for all events.

  • kwarg_dict

    Dictionary with keys:

    • idx_list - list of event-index arrays, one per peak. Only the # noqa: RUF002 first entry (idx_list[0]) is used.

    • ctc_param - name of the charge-trapping correction parameter. # noqa: RUF002

    • peak_dicts - list of per-peak fitting dictionaries. Only the # noqa: RUF002 first entry is used.

    • frac_max (optional, default 0.2) - fraction of the peak # noqa: RUF002 maximum used to define the fit range.

  • display – Verbosity / plotting level passed through to the underlying fit.

Returns:

out_dict – Result dictionary from fom_fwhm_with_alpha_fit() containing at minimum fwhm, fwhm_err, alpha, n_sig, and n_sig_err.

Return type:

dict

pygama.pargen.energy_optimisation.get_peak_fwhm_with_dt_corr(energies, alpha, dt, func, peak, kev_width, guess=None, kev=False, frac_max=0.5, bin_width=1, allow_tail_drop=False, display=0)

Apply a drift-time correction and fit a peak, returning FWHM and fit quality.

Computes ct_energy = energy + alpha * dt * energy, then performs an unbinned staged fit of func to the corrected spectrum inside a window around the peak. Bootstrap resampling is used to estimate the uncertainty on the FWHM. All return values are np.nan / None when the fit fails.

Parameters:
  • energies – 1-D array of raw (uncorrected) energy values.

  • alpha – Charge-trapping correction coefficient.

  • dt – 1-D array of drift-time values, same length as energies.

  • func – Peak-shape distribution to fit.

  • peak – Known peak energy in keV (used for ADC-to-keV conversion).

  • kev_width(low_side, high_side) half-widths in keV that define the fit window relative to the peak centroid.

  • guess – Optional initial parameter guess; forwarded to unbinned_staged_energy_fit().

  • kev – If True, convert the returned FWHM from ADC to keV.

  • frac_max – Fractional height at which to evaluate the peak width (default 0.5 gives the FWHM).

  • bin_width – Histogram bin width in ADC counts.

  • allow_tail_drop – Passed through to the staged fit; allows the tail fraction to drop to zero.

  • display – Verbosity level; values > 0 produce diagnostic plots.

Returns:

  • fwhm – Full-width at frac_max of the maximum in keV (or ADC if kev is False).

  • fwhm_o_max – Ratio of FWHM to peak maximum (shape quality metric).

  • fwhm_err – Bootstrap uncertainty on fwhm.

  • fwhm_o_max_err – Bootstrap uncertainty on fwhm_o_max.

  • chisqr(chi2, ndof) reduced chi-squared tuple from the staged fit.

  • n_sig – Fitted number of signal events.

  • n_sig_err – Uncertainty on n_sig.

  • mu – Fitted peak centroid in ADC.

  • mu_err – Uncertainty on mu.

  • energy_pars – Full best-fit parameter values from the staged fit.

pygama.pargen.energy_optimisation.simple_guess(energy, func, fit_range=None, bin_width=None)

Generate a simple initial parameter guess for a peak fit.

Estimates the peak centroid, width, signal and background counts, and (for HPGe-style models) tail parameters from the histogram of energy. The bin width is chosen adaptively using an inter-quartile-range rule when not provided explicitly.

Parameters:
  • energy – 1-D array of energy values.

  • func – Distribution function to guess for; currently supports hpge_peak and gauss_on_step.

  • fit_range(low, high) energy window. Defaults to the full range of energy.

  • bin_width – Histogram bin width. When None an optimal width is estimated from the data.

Returns:

parguessiminuit.Values of initial parameter values, or the NaN-filled fallback from return_nans() if func is not supported.

pygama.pargen.lq_cal module

Calibration routines for the Late Charge (LQ) parameter in HPGe detectors.

LQ is computed from the ratio of the late-charge integral to the total charge and is sensitive to multi-site events. This module provides functions to

  • determine the fit range and histogram the LQ distribution,

  • fit the distribution with a Gaussian model,

  • perform a drift-time correction for the time-dependent LQ shift,

  • calculate the LQ cut value from the DEP (double-escape peak) survival fraction,

  • compute survival fractions as a function of LQ cut position,

  • produce diagnostic plots.

The main entry point is the LQCal class.

class pygama.pargen.lq_cal.LQCal(cal_dicts, cal_energy_param, dt_param, eres_func, cdf=<pygama.math.functions.gauss.GaussianGen object>, selection_string='is_valid_cal&is_not_pulser', debug_mode=False)

Bases: object

A class for calibrating the LQ parameter and determining the LQ cut value

Parameters:
  • cal_dicts (dict) – Hit-level calibration dictionary of expressions/parameters to apply to the data. May be keyed by run timestamp.

  • cal_energy_param (str) – Name of the calibrated energy parameter.

  • dt_param (str) – Name of the drift-time parameter.

  • eres_func (callable) – Callable that returns the energy resolution (FWHM) in keV for a given energy.

  • cdf (callable) – CDF callable used for the binned Gaussian fits.

  • selection_string (str) – Boolean expression selecting valid, non-pulser events.

  • debug_mode – If True, exceptions are re-raised instead of being caught and logged.

calibrate(df, initial_lq_param)

Run the full LQ calibration pipeline.

Executes the three calibration steps in sequence:

  1. lq_timecorr() — normalise by the time-varying DEP mean.

  2. drift_time_correction() — remove the linear drift-time dependence.

  3. get_cut_lq_dep() — fit the normalised DEP distribution and set the cut value.

Survival fractions at the DEP (1592.5 keV), Qββ (2039 keV), 2103.53 keV, and 2614.5 keV are then computed and stored in low_side_sf and low_side_peak_dfs.

Parameters:
  • df – DataFrame modified in-place; must contain initial_lq_param and all columns referenced by cal_energy_param, dt_param, and selection_string.

  • initial_lq_param – Name of the raw LQ parameter column in df.

drift_time_correction(df, lq_param, cal_energy_param, display=0)

Remove the linear drift-time dependence from the LQ distribution.

Fits a degree-1 polynomial to LQ vs. drift time for DEP events in a 6 keV window around 1592.5 keV. Subtracts the fitted slope and intercept to produce a corrected LQ_Corrected column centred at zero, and writes the calibration expression into cal_dicts.

Parameters:
  • df (pd.DataFrame()) – DataFrame modified in-place; must contain lq_param, the calibrated energy column, and the drift-time column.

  • lq_param – Name of the LQ parameter column to correct.

  • cal_energy_param (str) – Name of the calibrated energy column.

  • display (int) – Verbosity level (currently unused).

get_cut_lq_dep(df, lq_param, cal_energy_param)

Determine the LQ cut value from the DEP distribution.

Fits a Gaussian to the sideband-subtracted LQ distribution at the DEP (1592.5 keV). The LQ values are normalised by the fitted σ to produce LQ_Classifier, and a fixed cut at 3 σ is applied (LQ_Cut). Both columns are added to df and the corresponding calibration expressions are written into cal_dicts.

Parameters:
  • df (pd.DataFrame()) – DataFrame modified in-place; must contain lq_param and cal_energy_param.

  • lq_param (str) – Name of the (drift-time-corrected) LQ parameter column.

  • cal_energy_param (str) – Name of the calibrated energy column.

lq_timecorr(df, lq_param, output_name='LQ_Timecorr', display=0)

Normalise LQ by the time-varying DEP mean.

Fits the LQ distribution at the DEP (1592.5 keV) for each run_timestamp group (or globally when no timestamp column is present) and divides lq_param by the resulting weighted-average mean. The normalised column is stored as output_name in df and the corresponding calibration expression is written into cal_dicts.

Parameters:
  • df – DataFrame modified in-place; must contain lq_param and the calibrated energy column. If a run_timestamp column is present, the correction is applied per timestamp.

  • lq_param – Name of the raw LQ parameter column.

  • output_name – Name of the output time-corrected LQ column.

  • display – Verbosity level (currently unused).

update_cal_dicts(update_dict)

Merge new entries into the calibration dictionary.

If cal_dicts is keyed by run timestamps, each timestamp’s sub-dict is updated individually, using update_dict directly as a fallback when a timestamp is absent. Otherwise update_dict is merged directly.

Parameters:

update_dict – Dictionary of new calibration entries to merge.

pygama.pargen.lq_cal.binned_lq_fit(df, lq_param, cal_energy_param, peak, _cdf=<pygama.math.functions.gauss.GaussianGen object>, sidebands=True)
Function for fitting a distribution of LQ values within a specified

energy peak. Fits a gaussian to the distribution

Parameters:
  • df (DataFrame) – DataFrame containing the LQ parameter and calibrated energy.

  • lq_param (str) – Name of the LQ parameter column.

  • cal_energy_param (str) – Name of the calibrated energy column.

  • peak (float) – Energy value in keV of the peak whose LQ distribution will be fit.

  • cdf – CDF callable used for the binned fit.

  • sidebands (bool) – Whether to apply sideband subtraction when building the histogram.

Returns:

  • values – Best-fit parameter values (mu, sigma) from the Gaussian fit.

  • errors – Parameter uncertainties corresponding to values.

  • hist – Histogram counts used for the binned fit.

  • bins – Bin-edge array used for the binned fit.

pygama.pargen.lq_cal.calculate_time_means(df, lq_param, cal_energy_param, peak, sidebands=True)

Compute the arithmetic mean and standard deviation of LQ at a peak.

Selects events in a ±8/+5 keV window around peak, restricts to the ±2.5σ fit range returned by get_fit_range(), and returns the sample mean and standard deviation together with their statistical uncertainties.

Parameters:
  • df (DataFrame) – DataFrame containing the LQ and calibrated energy columns.

  • lq_param (str) – Name of the LQ parameter column.

  • cal_energy_param (str) – Name of the calibrated energy column.

  • peak (float) – Peak energy in keV.

  • sidebands (bool) – Unused; kept for signature consistency with binned_lq_fit().

Returns:

  • pars – Dictionary {"mu": mean, "sigma": std} of the LQ distribution.

  • errors – Dictionary {"mu": mean_err, "sigma": sigma_err} of the statistical uncertainties.

pygama.pargen.lq_cal.fit_time_means(tstamps, means, reses)

Compute a running weighted-average LQ mean across run timestamps.

Accumulates valid (non-NaN) mean/resolution pairs and stores the weighted-average mean (weighted by inverse resolution) for every accumulated timestamp. Timestamps with NaN values in either means or reses are assigned NaN in the output.

Parameters:
  • tstamps – Ordered sequence of run-timestamp strings.

  • means – Array of per-timestamp LQ mean values; may contain NaN.

  • reses – Array of per-timestamp LQ resolutions (sigma/mu) used as inverse weights; may contain NaN.

Returns:

out_dict – Dictionary mapping each timestamp to the cumulative weighted- average LQ mean up to that point, or np.nan for invalid entries.

pygama.pargen.lq_cal.get_fit_range(lq)

Determine a ±2.5σ fit range around the mode of an LQ distribution.

Uses a 100-bin histogram over the 1st–95th percentile range to locate the peak centroid and estimate the width, then returns a ±2.5σ window suitable for a subsequent Gaussian fit.

Parameters:

lq (np.array) – 1-D array of LQ parameter values.

Returns:

fit_range(left_edge, right_edge) tuple defining the ±2.5σ window around the estimated peak centroid.

Return type:

tuple(float, float)

pygama.pargen.lq_cal.get_lq_hist(df, lq_param, cal_energy_param, peak, sidebands=True)

Histogram the LQ distribution for events near a calibration peak.

Optionally performs sideband subtraction to isolate peak-associated events from the Compton continuum. The peak window is (peak - 8, peak + 5) keV and the sideband window is (peak + 7, peak + 20) keV.

Parameters:
  • df (pd.DataFrame()) – DataFrame containing both the LQ parameter and calibrated energy.

  • lq_param (str) – Name of the LQ parameter column.

  • cal_energy_param (str) – Name of the calibrated energy column.

  • peak (float) – Peak energy in keV around which to select events.

  • sidebands (bool) – If True (default), subtract a sideband histogram to remove the Compton background contribution.

Returns:

  • hist – Counts array (sideband-subtracted when sidebands is True).

  • bins – Bin-edge array.

  • var – Per-bin variance (combined Poisson uncertainty).

pygama.pargen.lq_cal.plot_classifier(lq_class, data, lq_param='LQ_Classifier', xrange=(800, 3000), yrange=(-10, 30), xn_bins=700, yn_bins=500, figsize=(12, 8), fontsize=12)
Return type:

figure

pygama.pargen.lq_cal.plot_drift_time_correction(lq_class, data, lq_param='LQ_Timecorr', figsize=(12, 8), fontsize=12)

Plots a 2D histogram of LQ versus effective drift time in a 6 keV window around the DEP. Additionally plots the fit results for the drift time correction.

Return type:

figure

pygama.pargen.lq_cal.plot_lq_cut_fit(lq_class, data, figsize=(12, 8), fontsize=12)

Plots the final histogram of LQ values for events in the DEP, and the fit results used for determining the cut value

Return type:

figure

pygama.pargen.lq_cal.plot_lq_mean_time(lq_class, data, lq_param='LQ_Timecorr', figsize=(12, 8), fontsize=12)

Plots the mean LQ value calculated for each given timestamp

Return type:

figure

pygama.pargen.lq_cal.plot_sf_vs_energy(lq_class, data, xrange=(900, 3000), n_bins=701, figsize=(12, 8), fontsize=12)

Plots the survival fraction as a function of energy

Return type:

figure

pygama.pargen.lq_cal.plot_spectra(lq_class, data, xrange=(900, 3000), n_bins=2101, xrange_inset=(1580, 1640), n_bins_inset=200, figsize=(12, 8), fontsize=12)

Plots a 2D histogram of the LQ classifier vs calibrated energy

Return type:

figure

pygama.pargen.lq_cal.plot_survival_fraction_curves(lq_class, data, figsize=(12, 8), fontsize=12)

Plots the survival fraction curves as a function of LQ cut values for every peak of interest

Return type:

figure

pygama.pargen.noise_optimization module

This module contains the functions for performing the filter optimisation. This happens with a grid search performed on ENC peak.

pygama.pargen.noise_optimization.calculate_spread(energies, percentile_low, percentile_high, n_samples)

Estimate the inter-percentile spread of an energy distribution via bootstrapping.

Repeatedly resamples the input array with replacement and computes the difference between percentile_high and percentile_low for each bootstrap replicate. The mean and standard error of the bootstrap distribution are returned as the figure-of-merit and its uncertainty.

Parameters:
  • energies – 1-D array of energy values.

  • percentile_low – Lower percentile (0-100) used to define the spread.

  • percentile_high – Upper percentile (0-100) used to define the spread.

  • n_samples – Number of bootstrap resamples to draw.

Returns:

results – Dictionary with keys:

  • fom - mean inter-percentile spread across bootstrap samples.

  • fom_err - standard error of the mean spread.

Return type:

dict

pygama.pargen.noise_optimization.noise_optimization(tb_data, dsp_proc_chain, par_dsp, opt_dict, _lh5_path, display=0)

Calculate the optimal noise-filter parameter.

Parameters:
  • tb_data (Table) – Raw LH5 table to run the DSP macro on.

  • dsp_proc_chain (dict) – Minimal DSP config dictionary.

  • par_dsp (dict) – Dictionary with default DSP parameters.

  • opt_dict (dict) – Dictionary with parameters for the optimisation.

  • _lh5_path (str) – Name of the channel to process (LH5 group name in raw files).

  • display (int) – Plot verbosity level.

Returns:

res_dict – Dictionary of optimisation results.

Return type:

dict

pygama.pargen.noise_optimization.simple_gaussian_fit(energies, dx=1, sigma_thr=4, allowed_p_val=1e-20)

Fit a Gaussian-on-uniform model to an ENC (noise) peak and return the FWHM.

An initial guess from simple_gaussian_guess() is used to define a sigma-clipped fit range. The unbinned fit is accepted if the covariance is well-defined and the goodness-of-fit p-value exceeds allowed_p_val; otherwise the guess values are returned with zero uncertainties.

Parameters:
  • energies – 1-D array of energy (or ENC) values.

  • dx – Histogram bin width used for initial guessing and goodness-of-fit evaluation.

  • sigma_thr – Number of sigma around the estimated mean used to clip the fit range.

  • allowed_p_val – Minimum chi-squared p-value for the fit to be considered successful.

Returns:

results – Result dictionary with keys:

  • pars - fit parameter array.

  • errors - fit parameter uncertainties.

  • covariance - parameter covariance matrix.

  • mu - peak centre.

  • mu_err - uncertainty on the peak centre.

  • fom - FWHM of the peak.

  • fom_err - uncertainty on the FWHM.

  • chisq - reduced chi-squared of the fit.

  • p_val - goodness-of-fit p-value.

Return type:

dict

pygama.pargen.noise_optimization.simple_gaussian_guess(hist, bins, func, toll=0.2)

Generate an initial parameter guess and bounds for a Gaussian-on-uniform fit.

Estimates the peak centre from the histogram maximum and the width from the FWHM of the peak. The signal count is estimated by integrating the histogram over a +-2 sigma window around the peak.

Parameters:
  • hist – 1-D array of histogram counts.

  • bins – 1-D array of bin edges (length len(hist) + 1).

  • func – Probability distribution object whose parameter names are used to construct the output dictionaries (must expose mu, sigma, and n_sig parameters).

  • toll – Fractional tolerance used to set symmetric bounds around each guess value.

Returns:

  • guess – Initial parameter values: {"mu": ..., "sigma": ..., "n_sig": ...}.

  • bounds – Allowed parameter ranges: {"mu": (lo, hi), "sigma": (lo, hi), "n_sig": (lo, hi)}.

Return type:

tuple

pygama.pargen.pz_correct module

This module is for extracting the pole zero constants from the decay tail

class pygama.pargen.pz_correct.PZCorrect(dsp_config, wf_field, debug_mode=False)

Bases: object

Extract pole-zero correction constants from HPGe waveform data.

Provides methods for determining single- and double-pole-zero (DPZ) decay constants by fitting the exponential tail of raw waveforms. Results are accumulated in output_dict in a format suitable for direct use as a DSP parameter database.

Parameters:
  • dsp_config – DSP processing chain configuration dictionary.

  • wf_field – Name of the waveform field in the input lgdo.Table.

  • debug_mode – If True, additional diagnostic information is retained.

get_dpz_decay_constants(tb_data, percent_tau1_fit=0.1, percent_tau2_fit=0.2, offset_from_wf_max=10, superpulse_bl_idx=25, superpulse_window_width=13, display=0)

Gets values for the DPZ time constants in 3 stages:

  1. Perform a linear fit to the start and end of the decaying tail of a superpulse

  2. Use those initial guesses to seed a LSQ fit to a DPZ model of the sum of two decaying exponentials

  3. Use the results of the model fit as initial guesses in a DSP routine that optimizes the flatness of the decaying tail

The first step in this process is to generate a superpulse from high-energy waveforms.

Parameters:
  • tb_data (Table) – Table containing high-energy event waveforms and daqenergy. Pulser events must have already been removed from this table

  • percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.

  • percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.

  • offset_from_wf_max (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.

  • superpulse_bl_idx (int) – The index at which to stop when computing the mean value of a baseline, used for DPZ

  • superpulse_window_width (int) – The window of acceptance for tp100s while selecting waveforms to make a superpulse, with a center set at the median tp100 value. If a tp100 falls outside this window, ignore this waveform for the superpulse

  • display (int) – An integer. If greater than 1, plots and shows the attempts to fit the long and short time constants. If greater than 0, saves the plot to a dictionary.

Returns:

  • tau_dict – dictionary of form {‘dpz’: {‘tau1’: decay_constant1, ‘tau2’: decay_constant2, ‘frac’: fraction}}

  • out_plot_dict – A dictionary containing monitoring plots

Return type:

dict

Notes

tau1 is the shorter time constant, tau2 is the longer, and frac is the amount of the larger time constant present in the sum of the two exponentials

get_single_decay_constant(tb_data, slope_param='tail_slope', display=0)

Determine the single pole-zero decay constant from the tail slope.

Runs the DSP chain on tb_data, extracts the slope_param column, estimates the mode and standard deviation of the slope distribution, and converts the modal value to a time constant tau = -1/mode. The # noqa: RUF002 result is stored in output_dict under the "pz" key.

Parameters:
  • tb_data (Table) – LH5 table containing the raw waveforms and any parameters needed by dsp_config.

  • slope_param – Name of the tail-slope output column produced by the DSP chain.

  • display – If > 0, return a plot dictionary; otherwise return None.

Returns:

out_plot_dict – Dictionary of diagnostic figures (empty if display > 0), or None when display ≤ 0.

plot_slopes(tb_data, slope_field, with_correction=False, display=0, figsize=(8, 6), fontsize=8)

Plot a histogram of waveform tail slopes before or after PZ correction.

Useful for visually assessing whether the fitted decay constant successfully centres the slope distribution at zero.

Parameters:
  • tb_data – Raw waveform table (lgdo.Table) to process.

  • slope_field – Name of the tail-slope field in the DSP output table.

  • with_correction – If True, apply the current output_dict PZ parameters before computing slopes. If False, run the DSP chain without any correction.

  • display – If > 1, call plt.show(); if ≤ 1, close the figure after saving it to the returned dictionary.

  • figsize – Figure size (width, height) in inches.

  • fontsize – Base font size for the plot.

Returns:

out_plot_dict – Dictionary {"slopes": fig} or {"corrected_slope": fig} containing the matplotlib figure.

Return type:

dict

plot_waveforms_after_correction(tb_data, wf_field, xlim=(0, 1024), ylim=None, norm_param=None, display=0, figsize=(8, 6), fontsize=8, n_waveforms=100, downsample=1)

Plot a random selection of waveforms after applying the PZ correction.

Runs the DSP chain with the current output_dict parameters so that the corrected waveform field (wf_field) reflects the fitted pole-zero constants.

Parameters:
  • tb_data – Raw waveform table (lgdo.Table) to process.

  • wf_field – Name of the corrected waveform field in the DSP output table.

  • xlim – x-axis limits (lo, hi) in samples.

  • ylim – y-axis limits (lo, hi) in ADU, or None for auto-scaling.

  • norm_param – If given, each waveform is divided by the value of this DSP output parameter so that traces are displayed on a normalised scale.

  • display – If > 1, call plt.show(); if ≤ 1, close the figure after saving it to the returned dictionary.

  • figsize – Figure size (width, height) in inches.

  • fontsize – Base font size for the plot.

  • n_waveforms – Number of waveforms to draw (sampled randomly).

  • downsample – Plot every downsample-th sample to reduce rendering load.

Returns:

plot_dict – Dictionary {"waveforms": fig} containing the matplotlib figure.

Return type:

dict

pygama.pargen.pz_correct.dpz_model(ts, amp, tau1, tau2, f2)

Models the double pole zero function as A*[(1-f2)*exp(-t/tau1) + f2*exp(-t/tau2)], this is the expected shape of an HPGe waveform. Used in performing fits in dpz_model_fit()

Parameters:
  • ts (array) – Array of time point values, in samples

  • A – The overall amplitude of the waveform

  • tau1 (float) – The short time constant present

  • tau2 (float) – The long time constant present

  • f2 (float) – The fraction of the long time constant present in the overall waveform

Return type:

array

pygama.pargen.pz_correct.dpz_model_fit(waveform, percent_tau1_fit, percent_tau2_fit, idx_shift, plot)
Parameters:
  • waveform (array) – An array containing waveform data that has been baseline subtracted

  • percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.

  • percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.

  • idx_shift (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.

  • plot (int) – An integer, that if greater than 1 plots and shows the fit results, if greater than 0 saves the plots to a dictionary

Returns:

  • tau1 – A guess of the shorter double pole zero time constant

  • tau2 – A guess of the longer double pole zero time constant

  • frac2 – A guess of the fraction in the double pole zero of the longer time constant

  • out_plot_dict – A dictionary containing the output plots if requested

Return type:

tuple[float, float, float, dict]

Notes

Extracts the double pole zero parameters by fitting to an analytic model, utilizing the best-guess from linear fitting. Fits to the function f(t) = A*[(1-f2)*exp(-t/tau1)+f2*exp(-t/tau2)] It fits from the maximum of the waveform onwards to the end of the tail. The waveform must be baseline subtracted in order to get the best fit results

pygama.pargen.pz_correct.linear_dpz_fit(waveform, percent_tau1_fit, percent_tau2_fit, idx_shift, plot)
Parameters:
  • waveform (array) – An array containing waveform data

  • percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.

  • percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.

  • idx_shift (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.

  • plot (int) – An integer, if greater than 1 plots and shows the fit results, if greater than 0 saves the plot to a dictionary.

Returns:

  • tau1 – A guess of the shorter double pole zero time constant

  • tau2 – A guess of the longer double pole zero time constant

  • frac – A guess of the fraction in the double pole zero of the longer time constant

  • out_plot_dict – A dictionary containing the output plots if requested

Return type:

tuple[float, float, float, dict]

Notes

Extracts an initial guess of the double pole zero parameters by performing two linear fits to the log of the waveform. The first log-fit is to long time scales, and fits the long time constant tau2. This piece is subtracted off the waveform, and then the shorter time constant is fit. The average of the fractions returned by the intercepts of these fits is returned.

The waveform must be baseline subtracted to provide accurate fit results!

pygama.pargen.pz_correct.tp100_align(wfs, tp100_window_width, tp100s)

Align provided waveforms at their maximum, so that an average over all the waveforms creates a valid superpulse to fit using dpz_model_fit(). Do this without sacrificing too much of the length of the decaying tail

Parameters:
  • wfs (array) – An array of arrays containing waveforms

  • tp100_window_width (int) – The window of acceptance, with a center set at the median tp100 value. If a tp100 falls outside this window, ignore this waveform for the superpulse

  • tp100s (array) – An array containing the indices of the maximums of the waveforms, in samples

Returns:

time_aligned_wfs – An array of waveforms that are all aligned at their maximal values

Return type:

array

pygama.pargen.survival_fractions module

This module provides functions for calculating survival fractions for a cut.

pygama.pargen.survival_fractions.compton_sf(cut_param, low_cut_val, high_cut_val=None, mode='greater', data_mask=None)

Function for calculating the survival fraction of a cut in a compton region with a simple counting analysis.

Parameters:
  • cut_param – Array of cut parameter values for all events.

  • low_cut_val – Lower threshold value of cut_param.

  • high_cut_val – Upper threshold value; when set, events in (low_cut_val, high_cut_val) are counted as passing.

  • mode – Direction of the cut when high_cut_val is None: "greater" counts events above low_cut_val; "less" counts events below it.

  • data_mask – Boolean mask pre-applied to cut_param. Defaults to all-True.

Returns:

sf – Dictionary with keys low_cut, sf (survival fraction in percent), sf_err (statistical uncertainty in percent), and high_cut.

Return type:

dict

pygama.pargen.survival_fractions.compton_sf_sweep(energy, cut_param, final_cut_value, data_mask=None, cut_range=(-5, 5), n_samples=51, mode='greater')

Function sweeping through cut values and calculating the survival fraction for each value using a simple counting analysis.

Parameters:
  • energy (np.array) – Array of calibrated energies (unused in counting analysis, kept for API consistency with get_sf_sweep()).

  • cut_param (np.array) – Array of cut parameter values; must have the same length as energy.

  • final_cut_value (float) – Cut threshold at which to evaluate and return the final survival fraction.

  • data_mask (np.array) – Boolean mask applied before counting. Defaults to all-True.

  • cut_range(low, high) interval of cut values to sweep over.

  • n_samples – Number of evenly-spaced cut values within cut_range.

  • mode – Direction of the cut: "greater" or "less".

Returns:

  • out_df – DataFrame indexed by cut value with columns sf and sf_err giving the survival fraction and its uncertainty at each sweep point.

  • sf – Survival fraction in percent at final_cut_value.

  • err – Uncertainty on sf in percent.

Return type:

tuple(pd.DataFrame, float, float)

pygama.pargen.survival_fractions.energy_guess(energy, func_i, fit_range=None, bin_width=1, peak=None, eres=None)

Generate an initial parameter guess for a peak fit.

Delegates to pygama.pargen.energy_cal.get_hpge_energy_peak_par_guess() for supported distribution types. Currently only hpge_peak and gauss_on_step are supported.

Parameters:
  • energy – 1-D array of energy values.

  • func_i – Distribution function for which to compute the guess (e.g. hpge_peak or gauss_on_step).

  • fit_range(low, high) energy window for the guess. Defaults to the full range of energy.

  • bin_width – Histogram bin width used internally for peak characterisation.

  • peak – Known peak position in keV. If given, overrides the guessed mu.

  • eres – Known energy resolution (FWHM) in keV. If given, sets sigma = eres / 2.355.

Returns:

parguessiminuit.Values of initial parameter values for func_i, or None if func_i is not supported.

pygama.pargen.survival_fractions.fail_pdf_gos(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, hstep1, hstep2)

Extended PDF for events failing a cut, using a Gaussian-on-step model.

Re-parameterises the standard gauss_on_step PDF in terms of total event counts and cut efficiencies so that epsilon_sig and epsilon_bkg can be fit directly.

Parameters:
  • x – Energy values to evaluate the PDF at.

  • x_lo – Lower edge of the fit range.

  • x_hi – Upper edge of the fit range.

  • n_sig – Total number of signal events (passing + failing).

  • epsilon_sig – Fraction of signal events passing the cut; 1 - epsilon_sig events fail.

  • n_bkg – Total number of background events.

  • epsilon_bkg – Fraction of background events passing the cut; 1 - epsilon_bkg events fail.

  • mu – Peak centroid.

  • sigma – Peak width (standard deviation).

  • hstep1 – Step height parameter for passing events (unused here but kept for a consistent signature shared with pass_pdf_gos()).

  • hstep2 – Step height parameter for failing events.

pygama.pargen.survival_fractions.fail_pdf_hpge(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, htail, tau, hstep1, hstep2)

Extended PDF for events failing a cut, using a HPGe peak model.

Re-parameterises the standard hpge_peak PDF in terms of total event counts and cut efficiencies so that epsilon_sig and epsilon_bkg can be fit directly.

Parameters:
  • x – Energy values to evaluate the PDF at.

  • x_lo – Lower edge of the fit range.

  • x_hi – Upper edge of the fit range.

  • n_sig – Total number of signal events (passing + failing).

  • epsilon_sig – Fraction of signal events passing the cut; 1 - epsilon_sig events fail.

  • n_bkg – Total number of background events.

  • epsilon_bkg – Fraction of background events passing the cut; 1 - epsilon_bkg events fail.

  • mu – Peak centroid.

  • sigma – Peak width (standard deviation).

  • htail – Fraction of events in the low-energy tail.

  • tau – Decay constant of the low-energy tail.

  • hstep1 – Step height parameter for passing events (unused here but kept for a consistent signature shared with pass_pdf_hpge()).

  • hstep2 – Step height parameter for failing events.

pygama.pargen.survival_fractions.fix_all_but_nevents(func)

Return the set of parameters to fix when fitting for cut efficiency.

Produces the list of fixed parameter names and a corresponding boolean mask for func, leaving only the event-count and efficiency parameters free. Used when re-fitting a peak with a known shape to extract the signal and background efficiencies of a data-quality cut.

Parameters:

func – Distribution function; currently supports gauss_on_step and hpge_peak.

Returns:

  • fixed – List of parameter names to be held fixed during the fit, or None if func is not supported.

  • mask – Boolean array of length len(func.required_args()), where True indicates a free (floating) parameter.

Return type:

tuple

pygama.pargen.survival_fractions.get_bounds(func, parguess)

Build a parameter bounds dictionary for a peak fit.

Wraps pygama.pargen.energy_cal.get_hpge_energy_bounds() and applies additional tight constraints on the peak position and event counts.

Parameters:
  • func – Distribution function; currently supports hpge_peak and gauss_on_step.

  • parguess – Initial parameter guess dictionary, as returned by energy_guess() or update_guess().

Returns:

bounds – Dictionary mapping parameter names to (low, high) bound tuples, or None if func is not supported.

Return type:

dict

pygama.pargen.survival_fractions.get_sf_sweep(energy, cut_param, final_cut_value=None, peak=1592.5, eres_pars=None, data_mask=None, cut_range=(-5, 5), n_samples=26, mode='greater', fit_range=None, debug_mode=False)

Function sweeping through cut values and calculating the survival fraction for each value using a fit to the surviving and failing enegry distributions.

Parameters:
  • energy (np.array) – Array of calibrated energies.

  • cut_param (np.array) – Array of the cut parameter values; must have the same length as energy.

  • final_cut_value (float | None) – Cut threshold at which to evaluate and return the final survival fraction. If None, sf and err are returned as None.

  • peak (float) – Known peak position in keV.

  • eres_pars (list | None) – Energy resolution parameter (FWHM) in keV used to seed the fit.

  • data_mask – Boolean mask applied before fitting. Defaults to all-True.

  • cut_range(low, high) interval of cut values to sweep over.

  • n_samples – Number of evenly-spaced cut values to evaluate within cut_range.

  • mode – Direction of the cut: "greater" or "less".

  • fit_range(low, high) energy window for the peak fit.

  • debug_mode – If True, re-raises exceptions encountered during the sweep instead of silently skipping them.

Returns:

  • out_df – DataFrame indexed by cut value with columns sf and sf_err giving the survival fraction and its uncertainty at each sweep point.

  • sf – Survival fraction in percent at final_cut_value, or None.

  • err – Uncertainty on sf in percent, or None.

Return type:

tuple(pd.DataFrame, float, float)

pygama.pargen.survival_fractions.get_survival_fraction(energy, cut_param, cut_val, peak, eres_pars, fit_range=None, high_cut=None, pars=None, data_mask=None, mode='greater', func=<pygama.math.functions.sum_dists.SumDists object>, fix_step=True, display=0)

Function for calculating the survival fraction of a cut using a fit to the surviving and failing energy distributions.

Parameters:
  • energy (ndarray) – Array of calibrated energies.

  • cut_param (ndarray) – Array of the cut parameter values; must have the same length as energy.

  • cut_val (float) – Threshold value of cut_param at which to evaluate the survival fraction.

  • peak (float) – Known peak position in keV.

  • eres_pars (float) – Energy resolution parameter (FWHM) in keV used to seed the fit.

  • fit_range (tuple | None) – (low, high) energy window for the fit. Defaults to the full range of energy.

  • high_cut (float | None) – Upper threshold for cut_param; when set, events in (cut_val, high_cut) are taken as passing.

  • pars (ValueView) – Initial parameter values for the peak fit. If None, an automatic staged fit is performed first.

  • data_mask (ndarray) – Boolean mask applied to energy and cut_param before fitting. Defaults to all-True.

  • mode (str) – Direction of the cut: "greater" keeps events above cut_val; "less" keeps events below it.

  • func – Peak-shape distribution to use; defaults to hpge_peak.

  • fix_step – If True, fix the step-height parameters during the efficiency fit.

  • display – Verbosity level; if greater than 1, plots the passing and failing distributions.

Returns:

  • sf – Survival fraction in percent.

  • err – Statistical uncertainty on sf in percent.

  • values – Best-fit parameter values from the efficiency fit.

  • errors – Parameter uncertainties from the efficiency fit.

pygama.pargen.survival_fractions.pass_pdf_gos(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, hstep1, hstep2)

Extended PDF for events passing a cut, using a Gaussian-on-step model.

Re-parameterises the standard gauss_on_step PDF in terms of total event counts and cut efficiencies so that epsilon_sig and epsilon_bkg can be fit directly.

Parameters:
  • x – Energy values to evaluate the PDF at.

  • x_lo – Lower edge of the fit range.

  • x_hi – Upper edge of the fit range.

  • n_sig – Total number of signal events (passing + failing).

  • epsilon_sig – Fraction of signal events passing the cut (signal efficiency).

  • n_bkg – Total number of background events.

  • epsilon_bkg – Fraction of background events passing the cut.

  • mu – Peak centroid.

  • sigma – Peak width (standard deviation).

  • hstep1 – Step height parameter for passing events.

  • hstep2 – Step height parameter for failing events (unused here but kept for a consistent signature shared with fail_pdf_gos()).

pygama.pargen.survival_fractions.pass_pdf_hpge(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, htail, tau, hstep1, hstep2)

Extended PDF for events passing a cut, using a HPGe peak model.

Re-parameterises the standard hpge_peak PDF in terms of total event counts and cut efficiencies so that epsilon_sig and epsilon_bkg can be fit directly.

Parameters:
  • x – Energy values to evaluate the PDF at.

  • x_lo – Lower edge of the fit range.

  • x_hi – Upper edge of the fit range.

  • n_sig – Total number of signal events (passing + failing).

  • epsilon_sig – Fraction of signal events passing the cut (signal efficiency).

  • n_bkg – Total number of background events.

  • epsilon_bkg – Fraction of background events passing the cut.

  • mu – Peak centroid.

  • sigma – Peak width (standard deviation).

  • htail – Fraction of events in the low-energy tail.

  • tau – Decay constant of the low-energy tail.

  • hstep1 – Step height parameter for passing events.

  • hstep2 – Step height parameter for failing events (unused here but kept for a consistent signature shared with fail_pdf_hpge()).

pygama.pargen.survival_fractions.update_guess(func, parguess, energies)

Update the signal and background event-count guesses from data.

Refines the n_sig and n_bkg entries in parguess by counting events in a ±2*sigma* window around the peak centroid and subtracting a sideband estimate for the background contribution.

Parameters:
  • func – Distribution function; currently supports gauss_on_step and hpge_peak.

  • parguess – Parameter guess dictionary, modified in-place. Must contain mu, sigma, x_lo, x_hi, n_sig, and n_bkg.

  • energies – 1-D array of energy values from which to re-estimate the counts.

Returns:

parguess – The input dictionary with updated n_sig and n_bkg values.

pygama.pargen.utils module

Utility functions for parameter fitting, unit conversion, and LH5 data loading used across the pargen calibration and optimisation modules.

pygama.pargen.utils.convert_to_minuit(pars, func)

Create an iminuit.Minuit instance from a parameter set and a PDF.

Parameters:
  • pars – Initial parameter values. Either a dict mapping parameter names to values, or a sequence of values that will be passed positionally.

  • func – Callable whose signature defines the parameters. If the object exposes a pdf_ext attribute it is used as the cost function (so that extended PDFs are handled correctly); otherwise the callable itself is used.

Returns:

m – Configured Minuit object ready for minimisation.

Return type:

Minuit

pygama.pargen.utils.load_data(files, lh5_path, cal_dict, params, cal_energy_param='cuspEmax_ctc_cal', threshold=None, return_selection_mask=False)

Load parameters from LH5 files and apply calibration expressions.

Reads params from files, evaluates all expressions in cal_dict to produce calibrated columns, and optionally applies a lower energy threshold. When files is a dict keyed by run timestamp, the function recurses over each timestamp and concatenates the results.

Parameters:
  • files (str | list | dict) – A single file path, a list of file paths, or a dict mapping run timestamps to lists of file paths.

  • lh5_path (str) – Path to the LH5 table within each file.

  • cal_dict (dict) – Calibration expressions in hit-dict format: {outname: {"expression": ..., "parameters": {...}}}. When files is a timestamp dict, this may also be keyed by timestamp.

  • params (set) – Set of output column names to include in the returned DataFrame.

  • cal_energy_param (str) – Name of the calibrated energy column used to apply the threshold.

  • threshold – Minimum energy value; events below this are dropped. None keeps all events.

  • return_selection_mask – If True, also return the boolean threshold mask.

Returns:

  • df – DataFrame containing the requested params (plus an optional run_timestamp column when files is a dict).

  • masks – Boolean threshold mask of the same length as df. Only returned when return_selection_mask is True.

Return type:

pd.DataFrame | tuple(pd.DataFrame, np.array)

pygama.pargen.utils.return_nans(input)

Return a NaN-filled result tuple with the same structure as a successful fit.

Useful for propagating fit failures without raising exceptions.

Parameters:

input – Either a plain callable (whose positional arguments after the first define the parameter list) or an object with a required_args() method (e.g. a pygama distribution).

Returns:

  • values – Parameter values, all set to NaN.

  • errors – Parameter uncertainties, all set to NaN.

  • covariance – Square covariance matrix filled with NaN.

Return type:

tuple