pygama.pargen package¶
Utilities to generate and optimize parameters of interest from data (e.g. calibration routines)
Submodules¶
pygama.pargen.AoE_cal module¶
This module provides functions for correcting the a/e energy dependence, determining the cut level and calculating survival fractions.
- class pygama.pargen.AoE_cal.CalAoE(cal_dicts=None, cal_energy_param='cuspEmax_ctc_cal', eres_func=<function CalAoE.<lambda>>, pdf=<pygama.math.functions.sum_dists.SumDists object>, selection_string='index==index', dt_corr=False, dep_correct=False, dt_cut=None, dt_param='dt_eff', high_cut_val=3, mean_func=<class 'pygama.pargen.AoE_cal.Pol1'>, sigma_func=<class 'pygama.pargen.AoE_cal.SigmaFit'>, compt_bands_width=20, debug_mode=False)¶
Bases:
objectClass for calibrating the A/E, this follow 5 steps: the time correction, drift time correction, energy correction, the cut level calculation and the survival fraction calculation.
- Parameters:
cal_dicts (dict | None) – Dictionary of calibration parameters; can be empty/None for a single run, or keyed by timestamp in the format
YYYYMMDDTHHMMSSZfor multiple runs.cal_energy_param (str) – Calibrated energy parameter to use for A/E calibrations and for determining peak events.
eres_func (callable) – Function to determine the energy resolution; should accept a single calibrated energy value.
pdf – PDF to fit to the A/E distribution.
selection_string (str) – Selection string passed as a query to the data DataFrame.
dt_corr (bool) – Whether to apply drift-time correction.
dep_correct (bool) – Whether to correct the double escape peak into the single-site band before cut determination.
dt_cut (dict | None) –
Dictionary of the drift time cut parameters in the form:
{"out_param": "dt_cut", "hard": False}
where
out_paramis the name of the parameter to cut on in the dataframe (should have been precalculated) and"hard"controls whether these events are removed completely for survival fraction calculations, or only for cut determination / A/E calibration steps.dt_param (str) – Name of the drift-time parameter in the dataframe.
high_cut_val (float) – Value to cut the A/E distribution at on the high side.
mean_func (Callable) – Class implementing the energy-dependence model for the A/E mean (e.g.
Pol1).sigma_func (Callable) – Class implementing the energy-dependence model for the A/E sigma (e.g.
SigmaFit).compt_bands_width (float) – Width of the Compton bands in keV used for the energy correction.
debug_mode (bool) – If
True, re-raise exceptions from the A/E calibration steps instead of returning NaN.
- calculate_survival_fractions(data, aoe_param, peaks, fit_widths, mode='greater')¶
Calculate survival fractions for the A/E cut for a list of peaks for the final A/E cut value
- Parameters:
data (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the parameter in the dataframe for the final A/E classifier.
peaks (list) – List of peak energies in keV to calculate survival fractions for.
fit_widths (list[tuple]) – List of
(below, above)tuples in keV defining the fit range around each peak.mode (str) – Whether to keep events with A/E
"greater"or"less"than the cut value.
- calculate_survival_fractions_sweep(data, aoe_param, peaks, fit_widths, n_samples=26, cut_range=(-5, 5), mode='greater')¶
Calculate survival fractions for the A/E cut for a list of peaks by sweeping through values of the A/E cut to show how this varies
- Parameters:
data (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the parameter in the dataframe for the final A/E classifier.
peaks (list) – List of peak energies in keV to calculate survival fractions for.
fit_widths (list[tuple]) – List of
(below, above)tuples in keV defining the fit range around each peak.n_samples – Number of cut values to sample in the sweep.
cut_range – Range of A/E cut values to sweep through.
mode – Whether to keep events with A/E
"greater"or"less"than the cut value.
- calibrate(df, initial_aoe_param, peaks_of_interest=None, fit_widths=None, cut_peak_idx=0, dep_acc=0.9, sf_nsamples=11, sf_cut_range=(-5, 5), timecorr_mode='full', dtcorr_mode='mcdrift', override_dict=None)¶
Main function to run a full A/E calibration with all steps i.e. time correction, drift time correction, energy correction, A/E cut determination and survival fraction calculation
- Parameters:
df (DataFrame) – DataFrame containing the data.
initial_aoe_param (str) – Name of the A/E parameter in the dataframe to use as starting point.
peaks_of_interest (list | None) – List of peak energies in keV to calculate survival fractions for.
fit_widths (list[tuple] | None) – List of
(below, above)tuples in keV for the fit range around each peak.cut_peak_idx (int) – Index into
peaks_of_interestof the peak used for cut determination.dep_acc (float) – Desired survival fraction in the DEP for the final cut value.
sf_nsamples (int) – Number of samples to take in the survival fraction sweep.
sf_cut_range (tuple) – Range of A/E cut values to use for the survival fraction sweep.
timecorr_mode (str) – Mode to use for the time correction; see
time_correction()for details.dtcorr_mode (str) – Mode to use for the drift time correction; see
drift_time_correction()for details.override_dict (dict | None) –
Optional dictionary of pre-computed calibration entries that bypass one or more fit steps. Each key should map to a sub-dict with
"expression"(a string expression) and"parameters"(a dict of named constants used by the expression). Supported keys and the fit step they replace:"AoE_Timecorr"- skipstime_correction(); the expression is evaluated with columns from df plus the stored parameters to populatedf["AoE_Timecorr"]."AoE_DTcorr"- skipsdrift_time_correction()(only relevant whenself.dt_corrisTrue); populatesdf["AoE_DTcorr"]. Silently ignored ifself.dt_corrisFalse."AoE_Corrected"and"AoE_Classifier"– together skipenergy_correction(). Both keys must be present to take the override path; if only one is given a warning is logged and normal energy correction runs instead."_AoE_Classifier_intermediate"is optional: if present it is evaluated and stored before"AoE_Classifier"is evaluated (useful when the"AoE_Classifier"expression references it)."AoE_Low_Cut"- skipsget_aoe_cut_fit(); the expression should evaluate to a boolean array. The numeric cut threshold is read fromparameters["a"](or the sole parameter value if"a"is absent) and stored asself.low_cut_val.
- drift_time_correction(data, aoe_param='AoE_Timecorr', out_param='AoE_DTcorr', mode='mcdrift', display=0)¶
Apply a linear drift-time correction to the A/E distribution.
Estimates a linear correction coefficient to align the A/E values across drift time and writes the corrected parameter to the DataFrame and calibration dictionary.
- Parameters:
data (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the A/E parameter to use as the starting point for correction.
out_param (str) – Name of the output parameter for the drift-time-corrected A/E, stored in the DataFrame and added to the calibration dictionary.
mode (str) – Mode used to estimate the correction coefficient. Options: -
"mcdrift": Minimum Covariance Determinant (MCD) based estimation (improved stability for low statistic and no evident structure) -"bimodal_dt_fit": Fit the position of two drift time peaks (heavily relying on DEP events and clear structure).display (int) – Plot verbosity level.
- energy_correction(data, aoe_param, corrected_param='AoE_Corrected', classifier_param='AoE_Classifier', display=0)¶
Calculates the corrections needed for the energy dependence of the A/E. Does this by fitting the compton continuum in slices and then applies fits to the centroid and variance.
- Parameters:
data (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the A/E parameter to use as starting point.
corrected_param (str) – Name of the output parameter for the energy-mean-corrected A/E to be added in the dataframe and to the calibration dictionary.
classifier_param (str) – Name of the output parameter for the full mean- and sigma-corrected A/E classifier to be added in the dataframe and to the calibration dictionary.
display (int) – Plot verbosity level.
- get_aoe_cut_fit(data, aoe_param, peak, ranges, dep_acc=0.9, output_cut_param='AoE_Low_Cut', _display=1)¶
Determines A/E cut by sweeping through values and for each one fitting the DEP to determine how many events survive. Fits the resulting distribution and interpolates to get cut value at desired DEP survival fraction (typically 90%)
- Parameters:
data (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the A/E parameter to use as starting point.
peak (float) – Energy in keV of the peak to use for cut determination, e.g.
1592.5.ranges (tuple) – Tuple of
(below, above)in keV defining the fit range aroundpeak, e.g.(20, 40).dep_acc (float) – Desired DEP survival fraction for the final cut.
output_cut_param (str) – Name of the output boolean parameter for events passing the A/E cut in the dataframe and to be added to the calibration dictionary.
display – Plot verbosity level.
- time_correction(df, aoe_param, mode='full', output_name='AoE_Timecorr', display=0)¶
Calculates the time correction for the A/E parameter by fitting the 1-1.3 MeV Compton continuum and extracting the centroid. If only a single run is passed will just perform a shift of the A/E parameter to 1 using the centroid otherwise for multiple runs the shift will be determined on the mode given in.
- Parameters:
df (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the A/E parameter to use.
mode (str) –
Mode to use for the time correction, can be
"full","partial", or"none":none: just use the mean of the a/e centroids to shift all the data partial: iterate through the centroids if vary by less than 0.4 sigma then group and take mean otherwise when a run higher than 0.4 sigma is found if it is a single run set to nan otherwise start a new block full : each run will be corrected individually average_consecutive: average the consecutive centroids interpolate_consecutive: interpolate between the consecutive centroids
output_name (str) – Name of the output parameter for the time-corrected A/E in the dataframe and to be added to the calibration dictionary.
display (int) – Plot verbosity level.
- update_cal_dicts(update_dict)¶
Merge new entries into the A/E calibration dictionary.
If
cal_dictsis keyed by run timestamps, each timestamp’s sub-dict is updated individually, using update_dict directly as a fallback when a timestamp is absent. Otherwise update_dict is merged directly.- Parameters:
update_dict – Dictionary of new calibration entries to merge.
- class pygama.pargen.AoE_cal.Pol1¶
Bases:
objectLinear model for the A/E mean as a function of energy: mean(E) = a·E + b.
- static func(x, a, b)¶
Evaluate
a*x + b.
- static guess(_bands, _means, _mean_errs)¶
Return initial parameter guess
[a, b].
- static string_func(input_param)¶
Return the expression string for the hit-dict format.
- class pygama.pargen.AoE_cal.SigmaFit¶
Bases:
objectEnergy-dependent A/E sigma model: sigma(E) = sqrt(a + (b/E)^c).
- static func(x, a, b, c)¶
Evaluate
sqrt(a + (b/x)^c), returning NaN for non-positive arguments.
- static guess(_bands, sigmas, _sigma_errs)¶
Return initial parameter guess
[a, b, c].
- static string_func(input_param)¶
Return the expression string for the hit-dict format.
- class pygama.pargen.AoE_cal.SigmoidFit¶
Bases:
objectSigmoid model for A/E cut survival: sf(E) = (a + b·E) · erfc(c·E + d).
- static func(x, a, b, c, d)¶
Evaluate
(a + b*x) * erfc(c*x + d).
- static guess(_xs, ys, _y_errs)¶
Return initial parameter guess
[a, b, c, d].
- pygama.pargen.AoE_cal.aoe_peak_bounds(func, guess, **kwargs)¶
Build parameter bounds for an A/E peak fit.
- Parameters:
func – PDF to bound; one of
aoe_peak,aoe_peak_with_high_tail,exgauss, orgaussian.guess – Current parameter guess dict (used to derive relative bounds for
muandsigma).**kwargs – Override specific bounds by name (e.g.
tau=(0, 0.5)).
- Returns:
bounds_dict – Dict mapping each parameter name to a
(lower, upper)tuple suitable for passing tolimits.
- pygama.pargen.AoE_cal.aoe_peak_fixed(func, **_kwargs)¶
Return the fixed parameters and free-parameter mask for an A/E peak fit.
- Parameters:
func – PDF to query; one of
aoe_peak,aoe_peak_with_high_tail,exgauss, orgaussian.**kwargs – Unused; reserved for future overrides.
- Returns:
fixed – List of parameter names to hold fixed (typically the fit-range boundaries
x_loandx_hi).mask – Boolean array aligned with
func.required_args();Truefor free parameters,Falsefor fixed ones.
- pygama.pargen.AoE_cal.aoe_peak_guess(func, hist, bins, var, **kwargs)¶
Build initial parameter guesses for an A/E peak fit.
Estimates the peak centroid and width from the histogram using
pgf.gauss_mode_width_max(), falling back to a simple Gaussian guess when that fails. The returned values are packaged viaconvert_to_minuit()so they can be passed directly toMinuit.- Parameters:
func – PDF to guess parameters for; one of
aoe_peak,aoe_peak_with_high_tail,exgauss, orgaussian.hist – Histogram counts.
bins – Histogram bin edges.
var – Histogram bin variances.
**kwargs – Override specific guess values by name (e.g.
mu=0.95).
- Returns:
values –
ValueViewof initial parameter guesses forfunc.
- pygama.pargen.AoE_cal.average_consecutive(tstamps, means)¶
Fit the time dependence of the means of the A/E distribution by average consecutive entries
- Parameters:
tstamps – Timestamps of the data.
means – Means of the A/E distribution.
- Returns:
out_dict – Dictionary mapping each timestamp to the average of its mean and the next.
- pygama.pargen.AoE_cal.bimodal_dt_fit(data, aoe_param, dt_param, fit_selection, cal_energy_param, pdf, debug_mode=False, display=0, **_kwargs)¶
Calculates the correction needed to align the two drift time regions for ICPC detectors. This is done by fitting the two drift time peaks in the drift time spectrum and then fitting the A/E peaks in each of these regions. A simple linear correction is then applied to align these regions.
- Parameters:
data (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the A/E parameter to use as starting point.
dt_param (str) – Name of the drift time parameter.
fit_selection (str) – Selection string for the fit.
cal_energy_param (str) – Name of the calibrated energy parameter.
pdf – PDF to use for the A/E fit.
debug_mode (bool) – If True, re-raises exceptions instead of logging them.
display (int) – Plot verbosity level.
- Returns:
alpha – Linear correction coefficient.
dt_res_dict – Dictionary containing intermediate fit results.
- Return type:
- pygama.pargen.AoE_cal.drifttime_corr_plot(aoe_class, data, aoe_param='AoE_Timecorr', aoe_param_corr='AoE_DTcorr', figsize=(12, 8), fontsize=12)¶
- pygama.pargen.AoE_cal.fit_time_means(tstamps, means, sigmas)¶
Fit the time dependence of the means of the A/E distribution
- Parameters:
tstamps – Timestamps of the data.
means – Means of the A/E distribution.
sigmas – Sigmas of the A/E distribution.
- Returns:
out_dict – Dictionary mapping each timestamp to its corrected mean value.
- pygama.pargen.AoE_cal.interpolate_consecutive(tstamps, means, times, aoe_param, output_name)¶
Fit the time dependence of the means of the A/E distribution by average consecutive entries
- Parameters:
tstamps – Timestamps of the data.
means – Means of the A/E distribution.
times – Unix timestamps of each mean sample.
aoe_param – Name of the A/E parameter in the dataframe.
output_name – Key to use for the correction expression in the output dict.
- Returns:
out_dict – Dictionary mapping each timestamp to a per-output-name correction expression.
- pygama.pargen.AoE_cal.mcdrift(data, aoe_param, dt_param, fit_selection, cal_energy_param, debug_mode=False, **_kwargs)¶
Calculates the drift time correction for the A/E parameter using the Minimum Covariance Determinant (MCD) method and PCA on the (AoE, dt_eff) distribution.
- Parameters:
data (DataFrame) – DataFrame containing the data.
aoe_param (str) – Name of the A/E parameter to use as starting point.
dt_param (str) – Name of the drift time parameter.
fit_selection (str) – Selection string for the fit.
cal_energy_param (str) – Name of the calibrated energy parameter.
debug_mode (bool) – If True, re-raises exceptions instead of logging them.
- Returns:
alpha – Linear correction coefficient.
dt_res_dict – Dictionary containing intermediate fit results.
- Return type:
- pygama.pargen.AoE_cal.plot_aoe_mean_time(aoe_class, _data, time_param='AoE_Timecorr', figsize=(12, 8), fontsize=12)¶
- pygama.pargen.AoE_cal.plot_aoe_res_time(aoe_class, _data, _time_param='AoE_Timecorr', figsize=(12, 8), fontsize=12)¶
- pygama.pargen.AoE_cal.plot_classifier(aoe_class, data, aoe_param='AoE_Classifier', xrange=(900, 3000), yrange=(-50, 10), xn_bins=700, yn_bins=500, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.AoE_cal.plot_compt_bands_overlayed(aoe_class, data, eranges, aoe_param='AoE_Timecorr', aoe_range=None, title='Compton Bands', density=True, n_bins=50, figsize=(12, 8), fontsize=12)¶
Function to plot various compton bands to check energy dependence and corrections
- pygama.pargen.AoE_cal.plot_cut_fit(aoe_class, _data, dep_acc=0.9, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.AoE_cal.plot_dt_dep(aoe_class, data, eranges, titles=None, aoe_param='AoE_Timecorr', bins=(200, 100), dt_max=2000, figsize=(12, 8), fontsize=12)¶
Function to produce 2d histograms of A/E against drift time to check dependencies
- pygama.pargen.AoE_cal.plot_mean_fit(aoe_class, _data, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.AoE_cal.plot_sf_vs_energy(aoe_class, data, xrange=(900, 3000), n_bins=701, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.AoE_cal.plot_sigma_fit(aoe_class, _data, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.AoE_cal.plot_spectra(aoe_class, data, xrange=(900, 3000), n_bins=2101, xrange_inset=(1580, 1640), n_bins_inset=200, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.AoE_cal.plot_survival_fraction_curves(aoe_class, _data, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.AoE_cal.unbinned_aoe_fit(aoe, pdf=<pygama.math.functions.sum_dists.SumDists object>, display=0)¶
Fitting function for A/E, first fits just a Gaussian before using the full pdf to fit if fails will return NaN values
- Parameters:
aoe (np.array) – A/E values.
pdf – PDF to fit to.
display (int) – Verbosity level; 0 = silent, >1 = diagnostic plots.
- Returns:
values – Fitted parameter values.
errors – Fitted parameter errors.
covariance – Covariance matrix.
fit – Full fit info tuple
(values, errors, covariance, gof, pdf, mask, valid, m).
- Return type:
tuple(np.array, np.array)
pygama.pargen.data_cleaning module¶
This module provides routines for calculating and applying quality cuts
- pygama.pargen.data_cleaning.find_pulser_properties(df, energy='daqenergy')¶
Identify pulser peaks in an energy spectrum by their time periodicity.
Locates local maxima in the energy histogram, fits each peak top to estimate its centroid and width, and then checks whether events in each peak arrive with a regular time spacing — a hallmark of pulser signals. Only peaks with a clearly periodic inter-arrival time are returned.
- Parameters:
df – DataFrame with at least an energy column (named by energy) and a
timestampcolumn in seconds.energy – Name of the energy column to analyse.
- Returns:
out_pulsers – List of
(pulser_energy, peak_e_err, period, energy_name)tuples, one per detected pulser peak. pulser_energy is the peak centroid, peak_e_err is its half-width, period is the repetition period in seconds, and energy_name is the value of energy.
- pygama.pargen.data_cleaning.fit_distributions(x_lo, x_hi, norm_par_array, display=0)¶
Fit several distribution models to a parameter distribution and return the best-fitting one.
Candidates (Gaussian, exGauss, skewed Gaussian, Gaussian-on-exGauss, double exGauss) are fitted unbinned over the range
[x_lo, x_hi]. The winner is chosen by p-value (or lowest chi-squared when all p-values are zero).- Parameters:
x_lo – Lower bound of the fitting range.
x_hi – Upper bound of the fitting range.
norm_par_array – 1-D array of (typically normalised) parameter values.
display – Verbosity / plotting level. Values > 0 trigger diagnostic plots.
- Returns:
func – Best-fitting distribution callable.
pars – Parameter array for the winning model.
- Return type:
- pygama.pargen.data_cleaning.generate_cut_classifiers(data, cut_dict, rounding=4, display=0)¶
Finds double sided cut boundaries for a file for the parameters specified
- Parameters:
data (dict[str, ndarray]) – Input data as an LH5 table, a dictionary of arrays, or a
pandas.DataFrame.Dictionary of the form:
{ "output_parameter_name": { "cut_parameter": "parameter_to_cut_on", "cut_level": "number_of_sigmas", "mode": "inclusive/exclusive" } }
cut_level may also be a dict to set asymmetric or one-sided cuts:
{ "output_parameter_name": { "cut_parameter": "parameter_to_cut_on", "cut_level": {"low_side": "3", "high_side": "2"}, "mode": "inclusive/exclusive" } }
Alternatively, pass through an arbitrary expression directly:
{ "is_valid_t0":{ "expression":"(tp_0_est>a)&(tp_0_est<b)", "parameters":{"a":"46000", "b":"52000"} } }
rounding (int) – Number of decimal places to round cut boundaries to.
display (int) – Verbosity level. Values > 0 produce diagnostic histograms.
- Returns:
output_dict – Cut expressions in hit-dict format with an additional
{out_par}_classifiernormalisation entry for each fitted parameter:{ "output_parameter_name": { "expression": "cut_expression", "parameters": {"a": lower_bound, "b": upper_bound}, } }
plot_dict – Dictionary of
matplotlib.figure.Figureobjects keyed by parameter name. Only returned when display > 0.
- Return type:
- pygama.pargen.data_cleaning.generate_cuts(data, cut_dict, rounding=4, display=0)¶
Finds double sided cut boundaries for a file for the parameters specified
- Parameters:
data (dict[str, ndarray]) – Input data as an LH5 table, a dictionary of arrays, or a
pandas.DataFrame.Dictionary of the form:
{ "output_parameter_name": { "cut_parameter": "parameter_to_cut_on", "cut_level": "number_of_sigmas", "mode": "inclusive/exclusive" } }
cut_level may also be a dict to set asymmetric or one-sided cuts:
{ "output_parameter_name": { "cut_parameter": "parameter_to_cut_on", "cut_level": {"low_side": "3", "high_side": "2"}, "mode": "inclusive/exclusive" } }
Alternatively, pass through an arbitrary expression directly:
{ "is_valid_t0":{ "expression":"(tp_0_est>a)&(tp_0_est<b)", "parameters":{"a":"46000", "b":"52000"} } }
rounding (int) – Number of decimal places to round cut boundaries to.
display (int) – Verbosity level. Values > 0 produce diagnostic histograms.
- Returns:
output_dict – Cut expressions in hit-dict format:
{ "output_parameter_name": { "expression": "cut_expression", "parameters": {"a": lower_bound, "b": upper_bound}, } }
plot_dict – Dictionary of
matplotlib.figure.Figureobjects keyed by parameter name. Only returned when display > 0.
- Return type:
- pygama.pargen.data_cleaning.get_cut_indexes(data, cut_parameters)¶
Compute a boolean mask of events that pass all specified cuts.
Derives cut boundaries via
generate_cuts(), evaluates each cut expression, and returns a combined pass/fail mask.- Parameters:
data – Input data as an
lgdo.Tableorpandas.DataFrame.cut_parameters – Cut specification dictionary passed through to
generate_cuts().
- Returns:
ct_mask – Boolean array of length
len(data);Truefor events passing all cuts.
- pygama.pargen.data_cleaning.get_keys(in_data, cut_dict)¶
Extract the data column names referenced by a cut dictionary.
Scans each entry in cut_dict for the
cut_parameterorexpressionkey and returns the subset of those names that actually appear in in_data.- Parameters:
in_data – Source of available column names. Either a mapping (e.g. a dict or LH5
Table) whose.keys()give the column names, or an iterable of name strings.cut_dict – Cut specification dictionary, as used by
generate_cuts().
- Returns:
keys – Sorted list of unique column names from in_data that are referenced by cut_dict.
- pygama.pargen.data_cleaning.get_mode_stdev(par_array)¶
Estimate the mode and standard deviation of a parameter distribution.
Uses a histogram-based approach: the distribution is first clipped to the 1st-99th percentile range, the modal bin is located, and a Gaussian is fit in the neighbourhood of that bin to refine the estimates. Multiple fallback strategies are applied when the primary fit fails.
- Parameters:
par_array – 1-D array of parameter values.
- Returns:
mean – Estimated mode (peak position) of the distribution.
std – Estimated standard deviation corresponding to the FWHM of the peak.
- Return type:
- pygama.pargen.data_cleaning.get_tcm_pulser_ids(tcm_file, channel, multiplicity_threshold)¶
Identify pulser event indices from a TCM (Time Coincidence Map) file.
Reads the hardware TCM table and returns the indices of events in which the specified channel appears and the total multiplicity exceeds the given threshold — a signature typical of pulser events that fire multiple channels simultaneously.
- Parameters:
tcm_file – Path to a single TCM LH5 file, or a list of such file paths. When a list is supplied the results are concatenated.
channel – Channel identifier. Accepts an integer or a string with an optional
"ch"prefix (e.g."ch34"or"34").multiplicity_threshold – Minimum event multiplicity required for an event to be labelled as a pulser.
- Returns:
ids – Array of integer indices of pulser events within the channel’s event list.
mask – Boolean mask of the same length as the channel event list.
- Return type:
- pygama.pargen.data_cleaning.skewed_fit(x, n_sig, mu, sigma, alpha)¶
- pygama.pargen.data_cleaning.skewed_pdf(x, n_sig, mu, sigma, alpha)¶
- pygama.pargen.data_cleaning.tag_pulsers(df, chan_info, window=0.01)¶
Tag pulser events in a DataFrame using energy and periodicity cuts.
For each channel descriptor in chan_info, events are selected by a narrow energy window around the pulser peak energy. A linear fit to the pulse timestamps is then used to determine the precise period and phase of the pulser train; events that fall within window seconds of an expected pulse are labelled as pulsers via an
isPulsercolumn.- Parameters:
df – DataFrame containing at minimum a
timestampcolumn and the energy column(s) referenced in chan_info.chan_info – A single tuple
(pulser_energy, peak_e_err, period, energy_name)or a list of such tuples, one per channel. pulser_energy and peak_e_err define the energy selection window; period is the approximate repetition period in seconds; energy_name is the column in df to use.window – Half-width (seconds) of the time residual window around each expected pulse used to define the period cut.
- Returns:
df – The input DataFrame with an added (or updated)
isPulsercolumn set to 1 for pulser events and 0 for physics events.- Return type:
pygama.pargen.dplms_ge_dict module¶
This module is for creating dplms dictionary for ge processing
- pygama.pargen.dplms_ge_dict.dplms_ge_dict(raw_fft, raw_cal, dsp_config, par_dsp, dplms_dict, fom_func, decay_const=0, ene_par='dplmsEmax', ctc_par='dt_eff', display=0)¶
This function calculates the dplms dictionary for HPGe detectors.
- Parameters:
raw_fft (Table) – Table with FFT raw data already baseline-selected upstream. All events in this table are used to estimate the noise matrix.
raw_cal (Table) – table with cal raw data
dsp_config (dict) – dsp config file
par_dsp (dict) – Dictionary with db parameters for dsp processing
dplms_dict (dict) – Dictionary with various parameters
fom_func – Function for peak fits
- Returns:
out_dict
- Return type:
- pygama.pargen.dplms_ge_dict.filter_synthesis(ref, nmat, rmat, za, pmat, fmat, length, size, flip=True)¶
Synthesise a DPLMS optimal filter for a single channel.
Solves the linear system
(N + R + za*I + P + F) x = rfor the filter coefficients x, where N, R, P, and F are the noise, reference, pile-up, and flat-top matrices respectively, za is a zero-area regularisation term, and r is the reference pulse window.- Parameters:
ref (array) – Mean reference pulse of length size.
nmat (array) – Noise covariance matrix, shape
(length, length).rmat (array) – Reference signal matrix, shape
(length, length).za (int) – Zero-area regularisation coefficient.
pmat (array) – Pile-up penalty matrix, shape
(length, length).fmat (array) – Flat-top derivative matrix, shape
(length, length).length (int) – Filter length (samples).
size (int) – Full waveform buffer length (samples); used to centre the reference window.
flip (bool) – If True (default), return the time-reversed filter coefficients so that the filter can be applied via convolution.
- Returns:
x – Filter coefficients of length length.
y – Response of the filter convolved with the reference pulse.
refy – Reference pulse window aligned to y for comparison.
- Return type:
- pygama.pargen.dplms_ge_dict.filter_synthesis_corr(ref, nmat_corr, rmat, za, pmat, fmat, length, size, flip=True)¶
Synthesise a DPLMS optimal filter exploiting cross-channel noise correlations.
Extends
filter_synthesis()to the multi-channel case by building block-extended versions of the signal-shape matrices and solving the combined linear system. Only the primary-channel slice of the solution is returned.- Parameters:
ref (array) – Mean reference pulse for the primary channel, length size.
nmat_corr (array) – Block noise covariance matrix from
noise_matrix_corr(), shape(n_channels * length, n_channels * length).rmat (array) – Reference signal matrix for the primary channel, shape
(length, length).za (int) – Zero-area regularisation coefficient.
pmat (array) – Pile-up penalty matrix for the primary channel, shape
(length, length).fmat (array) – Flat-top derivative matrix for the primary channel, shape
(length, length).length (int) – Filter length per channel (samples).
size (int) – Full waveform buffer length (samples).
flip (bool) – If True (default), return the time-reversed filter coefficients.
- Returns:
x – Primary-channel filter coefficients of length length.
y – Response of the filter convolved with the (extended) reference pulse.
refy – Reference pulse window aligned to y for comparison.
- Return type:
- pygama.pargen.dplms_ge_dict.is_not_pile_up(peak_pos, peak_pos_neg, thr, lim, size)¶
Reject pile-up events based on the presence of secondary peaks.
A histogram of positive-peak positions is used to define a single-peak band; events with additional peaks outside that band, or any negative peaks, are flagged as pile-up.
- Parameters:
peak_pos (array) – 2-D array of positive peak positions per event (samples).
peak_pos_neg (array) – 2-D array of negative peak positions per event (samples).
thr (int) – Amplitude threshold (percent of histogram maximum) used to define the edges of the primary peak band.
lim (int) – Half-width of the search window around the nominal waveform centre (samples).
size (int) – Length of the waveform window used for filter synthesis (samples).
- Returns:
idxs – Per-event mask; True means the event is pile-up free.
llow – Lower boundary of the accepted peak band (samples).
lupp – Upper boundary of the accepted peak band (samples).
- Return type:
- pygama.pargen.dplms_ge_dict.is_valid_centroid(centroid, lim, size, full_size)¶
Select waveforms whose centroid lies within a valid alignment window.
- Parameters:
centroid (array) – Per-event centroid positions (samples).
lim (int) – Half-width of the allowed centroid displacement from the nominal centre of the waveform window (samples).
size (int) – Length of the waveform window used for filter synthesis (samples).
full_size (int) – Total waveform buffer length (samples).
- Returns:
idxs – Boolean mask that is True for valid events.
llim – Lower centroid boundary used (samples).
hlim – Upper centroid boundary used (samples).
- Return type:
- pygama.pargen.dplms_ge_dict.is_valid_risetime(risetime, llim, perc)¶
Select waveforms within an acceptable risetime range.
- Parameters:
- Returns:
idxs – Boolean mask; True for events within the accepted risetime range.
llim – Lower risetime boundary used (samples).
hlim – Upper risetime boundary used (samples, computed from perc).
- Return type:
- pygama.pargen.dplms_ge_dict.noise_matrix(bls, length)¶
Compute the noise covariance matrix from baseline waveforms.
The baselines are mean-subtracted, the raw covariance is estimated, and then compressed to the filter length via a sliding-window average (convolution with an identity kernel).
- pygama.pargen.dplms_ge_dict.noise_matrix_corr(bls, bls_corr, length)¶
Compute a block noise covariance matrix including cross-channel correlations.
Extends
noise_matrix()to the multi-channel case by building an(n_channels * length, n_channels * length)block matrix where each block is the cross-channel noise covariance between two channels, compressed to length via the same sliding-window average used innoise_matrix().- Parameters:
- Returns:
nmat – Symmetrised block noise covariance matrix of shape
(n_channels * length, n_channels * length).- Return type:
- pygama.pargen.dplms_ge_dict.signal_matrices(wfs, length, decay_const, ff=2)¶
Compute the signal-shape matrices needed for DPLMS filter synthesis.
Three matrices are constructed from the waveform ensemble:
rmat - outer product of the mean reference pulse (shape constraint).
pmat - pile-up penalty matrix built from an exponential decay model.
fmat - flat-top matrix encoding the derivative of the flat-top region.
- Parameters:
wfs (array) – 2-D array of aligned signal waveforms, shape
(n_events, n_samples).length (int) – Filter length (samples); determines the output matrix dimensions.
decay_const (float) – Exponential decay constant used to build the pile-up penalty matrix. Set to 0 to disable the pile-up term.
ff (int) – Flat-top length (samples) used to construct the flat-top derivative matrix.
- Returns:
ref – Mean reference pulse of length n_samples.
rmat – Reference signal matrix, shape
(length, length).pmat – Pile-up penalty matrix, shape
(length, length).fmat – Flat-top derivative matrix, shape
(length, length).
- Return type:
- pygama.pargen.dplms_ge_dict.signal_selection(dsp_cal, dplms_dict, coeff_values)¶
Apply centroid, pile-up, and risetime cuts to select clean calibration signals.
- Parameters:
dsp_cal – DSP output table (lgdo.Table) containing at minimum
peak_pos,peak_pos_neg,centroid,tp_90, andtp_10fields.dplms_dict – Dictionary with DPLMS configuration parameters including
rt_low,peak_lim,wsize,bsize,centroid_lim, anddp_defsub-dictionary withrtandptdefaults.coeff_values – Dictionary of per-optimisation-point overrides; may contain
rt(risetime percentile) andpt(pile-up threshold) keys.
- Returns:
sel_dict – Selection dictionary with keys:
idxs- combined boolean mask of passing eventsct_ll,ct_hh- centroid window boundariespp_ll,pp_hh- pile-up peak band boundariesrt_ll,rt_hh- risetime window boundaries
- Return type:
pygama.pargen.dsp_optimize module¶
- class pygama.pargen.dsp_optimize.BayesianOptimizer(acq_func, batch_size, kernel=None, sampling_rate=None, fom_value='y_val', fom_error='y_val_err')¶
Bases:
objectBayesian optimization using a Gaussian Process regression.
This optimizer fits a Gaussian Process (GP) to fit kernel to data, takes in a series of init samples for this fit and then calculates the next point using the acquisition function specified.
The GP model is implemented using
sklearn.gaussian_process.GaussianProcessRegressor.- Parameters:
acq_func (str) –
Acquisition function used to select new sampling points.
- Supported values are:
"ei": Expected Improvement"ucb": Upper Confidence Bound"lcb": Lower Confidence Bound
batch_size (int) – Number of candidate points proposed at each optimization iteration.
kernel (Kernel | None) – Optional kernel used by the Gaussian Process regressor. If
None, the default kernel fromGaussianProcessRegressoris used.sampling_rate (str | pint.Quantity | None) – Optional sampling rate associated with evaluations. If given as a string it will be converted to a
pint.Quantity.fom_value (str) – Column name containing the figure-of-merit values in the input data,
fom_error (str) – Column name containing the uncertainty associated with the figure-of-merit values.
- Variables:
gauss_pr – Gaussian Process model used.
acq_function – Acquisition function method corresponding to the specified acq_func.
batch_size – Batch size used in the Gaussian Process optimization.
dims – List of dimensions (parameters) added to the optimizer’s search space.
iters – Counter for the number of optimization iterations performed.
best_samples – DataFrame storing the best evaluated samples and their acquisition values.
distances – List of distances between sampled points, used for diagnostics.
x_init – Initial input points provided via add_initial_values.
y_init – Observed function values corresponding to x_init.
yerr_init – Uncertainty associated with y_init.
y_min – Minimum observed value in y_init. Set by get_first_point.
optimal_x – Input corresponding to y_min. Set by get_first_point.
optimal_ei – Placeholder for expected improvement at optimal_x. Set by get_first_point.
current_x – Current optimization point being evaluated. Updated by update_db_dict().
current_ei – Acquisition function value for current_x. Updated by update_db_dict().
current_iter – Optimization iteration counter. Incremented each time update_db_dict() is called.
Examples
Simple 1D optimization:
import numpy as np # Create optimizer optimizer = BayesianOptimizer(acq_func="ei", batch_size=1) optimizer.add_dimension("x", parameter="x", min_val=-5, max_val=5) # Add initial observations optimizer.add_initial_values( np.array([[0], [2], [-1]]), np.array([10.0, 5.0, 7.0]), np.array([0.1, 0.2, 0.1]) ) # Update DSP parameters database with the current optimization point db_dict = optimizer.update_db_dict({}) # Simulate new evaluation and update optimizer optimizer.update({"x": [1.0], "y_val": 3.5, "y_val_err": 0.1}) # Plot the result fig = optimizer.plot()
- _extend_prior_with_posterior_data(x, y, yerr)¶
Append a new observation
(x, y, yerr)to the internal training arrays.
- _get_expected_improvement(x_new)¶
Compute the Expected Improvement acquisition value at x_new.
- _get_lcb(x_new)¶
Compute the Lower Confidence Bound acquisition value at x_new.
- _get_next_probable_point()¶
Find the next candidate point by minimising the acquisition function.
Samples batch_size random starting points within the search bounds and runs L-BFGS-B from each. Values that coincide with an existing sample are perturbed slightly. Parameters with
round_to_samplesset are rounded to the nearest multiple ofsampling_rate.- Returns:
x_optimal – List of parameter values for the next candidate point.
min_ei – Acquisition function value at x_optimal.
- _get_ucb(x_new)¶
Compute the Upper Confidence Bound acquisition value at x_new.
- add_dimension(name, parameter, min_val, max_val, round_to_samples=False, unit=None)¶
Add a new dimension (parameter) to the optimizer’s search space.
Each dimension defines a variable that the Bayesian optimizer will vary to explore the objective function.
- Parameters:
name (str) – Name of the dimension.
parameter (float) – Initial value or reference for this parameter.
min_val (float) – Minimum allowed value for this parameter.
max_val (float) – Maximum allowed value for this parameter.
round_to_samples (bool) – If True, sampled values along this dimension will be rounded to multiples of sampling_rate. Requires sampling_rate to be set in the optimizer.
unit (str | Quantity | None) – Physical unit of the parameter (e.g., “K”, “mL”). Converted to a pint.Quantity internally if provided.
- add_initial_values(x_init, y_init, yerr_init)¶
Add initial observed data to the optimizer.
These initial samples are used to fit the Gaussian Process model before starting the Bayesian optimization iterations.
- Parameters:
Notes
The lengths of x_init, y_init, and yerr_init must all match.
- get_best_vals()¶
Return the best-observed parameter values as a DSP database dictionary.
Formats
optimal_xinto the same{name: {parameter: value_str}}structure expected byrun_one_dsp().- Returns:
out_dict – DSP parameter database dictionary with the optimal values.
- get_first_point()¶
Get the first optimization point based on initial data.
Selects the input point corresponding to the minimum observed value in the initial samples (y_init). This point can serve as the starting point for the Bayesian optimization process.
Sets the following attributes on the optimizer: - y_min : Minimum observed value in y_init. - optimal_x : Input corresponding to y_min. - optimal_ei : Initialized to None.
- get_n_dimensions()¶
Return the number of dimensions in the search space.
- iterate_values()¶
Fit the GP to current data and propose the next evaluation point.
Trains the Gaussian Process on all valid (non-NaN) observations accumulated so far, then calls
_get_next_probable_point()to select the next candidate.- Returns:
x_next – Parameter values for the next candidate point.
ei – Acquisition function value at x_next.
- plot(init_samples=None)¶
Plot the Gaussian Process regression and acquisition function.
For 1D and 2D optimization problems, this method visualizes the fitted Gaussian Process and the acquisition function across the search space. Initial samples, failed samples, and the optimal point are highlighted.
- Parameters:
init_samples – Optional list of initial sample points to highlight on the plot.
- Returns:
fig – Matplotlib Figure object containing the plot of the Gaussian Process regression and acquisition function.
- Return type:
- plot_acq(init_samples=None)¶
Plot the acquisition function across the search space.
- update(results)¶
Update the optimizer with new evaluation results.
This method updates the internal state of the Bayesian optimizer with the latest evaluation at current_x. It extends the prior data used by the Gaussian Process, updates the current best observed point, and tracks distances between consecutive sampled points.
- update_db_dict(db_dict)¶
Update a DSP parameters database with the current optimization point.
This method updates db_dict with the values of the current point being evaluated by the optimizer. If it is the first iteration, it retrieves the initial point; otherwise, it computes the next point using iterate_values(). Each dimension’s value is stored in the database, with units appended if specified.
- Parameters:
db_dict (Mapping) – Dictionary representing the DSP parameters database. Keys are dimension names, and values are dictionaries mapping parameter identifiers to their corresponding values as strings (including units).
- Returns:
Mapping – Updated DSP parameters database including the current optimization point.
- Return type:
- class pygama.pargen.dsp_optimize.OptimiserDimension(name, parameter, min_val, max_val, round, unit)¶
Bases:
tupleCreate new instance of OptimiserDimension(name, parameter, min_val, max_val, round, unit)
- _asdict()¶
Return a new dict which maps field names to their values.
- _field_defaults = {}¶
- _fields = ('name', 'parameter', 'min_val', 'max_val', 'round', 'unit')¶
- classmethod _make(iterable)¶
Make a new OptimiserDimension object from a sequence or iterable
- _replace(**kwds)¶
Return a new OptimiserDimension object replacing specified fields with new values
- max_val¶
Alias for field number 3
- min_val¶
Alias for field number 2
- name¶
Alias for field number 0
- parameter¶
Alias for field number 1
- round¶
Alias for field number 4
- unit¶
Alias for field number 5
- class pygama.pargen.dsp_optimize.ParGrid¶
Bases:
objectParameter Grid class Each ParGrid entry corresponds to a dsp parameter to be varied. The ntuples must follow the pattern: ( name parameter value_strs) : ( str, str, list of str) where name and parameter are the same as ‘db.name.parameter’ in the processing chain, value_strs is the array of strings to set the argument to.
- add_dimension(name, parameter, value_strs)¶
Add a new dimension to the parameter grid.
- Parameters:
name – Dimension name, matching the
db.nameprefix in the DSP chain.parameter – Parameter name within name, matching
db.name.parameter.value_strs – List of string values to sweep over for this dimension.
- get_data(i_dim, i_par)¶
Return the name, parameter, and value string for a grid cell.
- Parameters:
i_dim – Dimension index.
i_par – Parameter (point) index within that dimension.
- Returns:
name – Dimension name string.
parameter – Parameter name string.
value_str – String value at position i_par along dimension i_dim.
- get_n_dimensions()¶
Return the number of dimensions in the grid.
- get_n_grid_points()¶
Return the total number of grid points (product of all axis lengths).
- get_n_points_of_dim(i)¶
Return the number of grid points along dimension i.
- get_par_meshgrid(copy=False, sparse=False)¶
return a meshgrid of parameter values Always uses Matrix indexing (natural for par grid) so that mg[i1][i2][…] corresponds to index order in self.dims Note copy is False by default as opposed to numpy default of True
- get_shape()¶
Return the shape of the N-dimensional grid as a tuple.
- get_zero_indices()¶
Return a zero-initialised index array of length
get_n_dimensions().
- iterate_indices(indices)¶
iterate given indices [i1, i2, …] by one. For easier iteration. The convention here is arbitrary, but its the order the arrays would be traversed in a series of nested for loops in the order appearing in dims (first dimension is first for loop, etc): Return False when the grid runs out of indices. Otherwise returns True.
- print_data(indices)¶
Format a human-readable description of a grid point.
- Parameters:
indices – Sequence of per-dimension indices identifying the grid point.
- Returns:
print_string – Multi-line string listing each
name.parameter = valueentry.
- set_dsp_pars(db_dict, indices)¶
Populate a DSP parameter database from grid indices.
- Parameters:
db_dict – Existing database dictionary to update, or
Noneto create a new one.indices – Per-dimension index array identifying the grid point.
- Returns:
db_dict – Updated database dictionary with the parameter values for the specified grid point.
- class pygama.pargen.dsp_optimize.ParGridDimension(name, parameter, value_strs)¶
Bases:
tupleCreate new instance of ParGridDimension(name, parameter, value_strs)
- _asdict()¶
Return a new dict which maps field names to their values.
- _field_defaults = {}¶
- _fields = ('name', 'parameter', 'value_strs')¶
- classmethod _make(iterable)¶
Make a new ParGridDimension object from a sequence or iterable
- _replace(**kwds)¶
Return a new ParGridDimension object replacing specified fields with new values
- name¶
Alias for field number 0
- parameter¶
Alias for field number 1
- value_strs¶
Alias for field number 2
- pygama.pargen.dsp_optimize.get_grid_points(grid)¶
Generate a list of all grid-point index combinations.
Iterates all grids simultaneously, cycling each grid back to its zero index when it is exhausted. Iteration stops when every grid has completed at least one full cycle.
- Parameters:
grid – List of
ParGridobjects defining the search space.- Returns:
out – List of
[tuple, ...]entries, where each entry contains one index tuple per grid.
- pygama.pargen.dsp_optimize.run_bayesian_optimisation(tb_data, dsp_config, fom_function, optimisers, fom_kwargs=None, db_dict=None, nan_val=10, n_iter=10)¶
Run Bayesian optimization loop to optimize DSP parameters.
- Parameters:
tb_data (Table) – An input table of the data for which the DSP is performed.
dsp_config (str | Mapping) – Configuration for the DSP processing chain, see run_one_dsp()
fom_function (list | Callable) – Function(s) to calculate the figure-of-merit from the DSP output table. If a list is given, each function will be applied to the output of each optimiser in order. If a single function is given, it will be applied to the output of all optimisers. This function should return a dictionary containing a keys of name fom_value and fom_error in the optimisers, which will be used to inform the optimization process.
optimisers (list | BayesianOptimizer) – A list of BayesianOptimizer instances to run in parallel. Each optimizer will update the DSP parameters database and receive the results of each iteration to inform its next point selection.
fom_kwargs (Mapping | None) – Optional keyword arguments for the fom_function. If a list is given, each set of kwargs will be applied to the corresponding function in fom_function. If a single set of kwargs is given, it will be applied to all functions.
db_dict (Mapping | None) – Optional initial DSP parameters database to be updated by the optimizers. If not provided, an empty dictionary will be used.
nan_val (float | list) – Value to assign to the figure-of-merit if it is NaN. Can be a single float applied to all optimizers or a list of floats corresponding to each optimizer.
n_iter (int) – Number of optimization iterations to perform. Each iteration consists of each optimizer proposing a new point, running the DSP with those parameters, evaluating the figure-of-merit, and updating the optimizers with the results.
- Returns:
out_param_dict – A dictionary containing the best parameter values found by each optimizer, formatted for updating the DSP parameters database.
out_results_list – A list of dictionaries containing the optimal results corresponding to the best parameters found by each optimizer.
- Return type:
Note
The optimisers need to be initialised with the parameters set, and initial values added.
- pygama.pargen.dsp_optimize.run_grid(tb_data, dsp_config, grid, fom_function, db_dict=None, verbosity=1, **fom_kwargs)¶
Extract a table of optimization values for a grid of DSP parameters The grid argument defines a list of parameters and values over which to run the DSP defined in dsp_config on tb_data. At each point, a scalar figure-of-merit is extracted.
Returns a N-dimensional ndarray of figure-of-merit values, where the array axes are in the order they appear in grid.
- Parameters:
tb_data – Input LH5 table of waveform data.
dsp_config – DSP processing chain configuration.
grid –
ParGriddefining the parameter sweep.fom_function – Callable that receives the output table and returns a scalar FOM.
db_dict – Optional base DSP parameter database dictionary.
verbosity – Verbosity level for the processing chain and FOM callable.
**fom_kwargs – Additional keyword arguments forwarded to fom_function.
- Returns:
grid_values – N-dimensional
ndarrayof FOM values; the M-th axis corresponds to the M-th dimension of grid.
- pygama.pargen.dsp_optimize.run_grid_point(tb_data, dsp_config, grids, fom_function, iii, db_dict=None, verbosity=1, fom_kwargs=None)¶
Run DSP and evaluate the figure-of-merit at a single grid point.
Sets the DSP parameter database from grids and iii, runs the DSP chain, and optionally evaluates one or more figure-of-merit functions.
- Parameters:
tb_data – Input LH5 table of waveform data.
dsp_config – DSP processing chain configuration dictionary.
grids – A single
ParGridor a list of them.fom_function – A single callable or list of callables; each is called as
fom_function(tb_out, verbosity, fom_kwargs[i]). WhenNone, the raw output table is returned.iii – List of per-grid index arrays, one per entry in grids.
db_dict – Optional base DSP parameter database to update.
verbosity – Verbosity level forwarded to the DSP chain and FOM callables.
fom_kwargs – List of keyword-argument dicts, one per FOM callable.
- Returns:
out – Dictionary with keys
"indexes"(list of grid-point tuples) and"results"(array of FOM values, or the output table when fom_function isNone).
- pygama.pargen.dsp_optimize.run_one_dsp(tb_data, dsp_config, db_dict=None, fom_function=None, verbosity=0, fom_kwargs=None)¶
Run one iteration of DSP on
tb_data`, based on the DSP specified in ``dsp_config.Optionally returns a value for optimization
- Parameters:
tb_data (Table) – Input LH5 table of waveform data. Typically a preselected subset is passed so optimisation does not run over the full dataset.
dsp_config (Mapping) – DSP processing chain configuration (see
build_processing_chain).db_dict (Mapping | None) – Optional DSP parameter database dictionary.
fom_function (Callable | None) – When provided, called on the output table to return a scalar figure-of-merit. Should accept a verbosity argument.
verbosity (int) – Verbosity level for the processing chain and FOM callable.
fom_kwargs – Optional keyword arguments forwarded to fom_function.
- Returns:
figure_of_merit – Scalar FOM value returned by fom_function when it is not
None.tb_out – Output LH5 table when fom_function is
None.
- Return type:
pygama.pargen.energy_cal module¶
routines for automatic calibration.
hpge_find_energy_peaks (Find uncalibrated E peaks whose E spacing matches the pattern in peaks_kev)
hpge_get_energy_peaks (Get uncalibrated E peaks at the energies of peaks_kev)
hpge_fit_energy_peaks (fits the energy peals)
hpge_E_calibration (main routine – finds and fits peaks specified)
- class pygama.pargen.energy_cal.FWHMLinear¶
Bases:
objectEnergy-resolution model FWHM(E) = sqrt(a + b·E).
- static bounds(ys)¶
Return parameter bounds
[(a_lo, a_hi), (b_lo, b_hi)].
- static func(x, a, b)¶
Evaluate
sqrt(a + b*x).
- static guess(xs, ys, y_errs)¶
Return initial parameter guess
[a, b].
- static string_func(input_param)¶
Return the expression string for the hit-dict format.
- class pygama.pargen.energy_cal.FWHMQuadratic¶
Bases:
objectEnergy-resolution model FWHM(E) = sqrt(a + b·E + c·E²).
- static bounds(ys)¶
Return parameter bounds
[(a_lo, a_hi), (b_lo, b_hi), (c_lo, c_hi)].
- static func(x, a, b, c)¶
Evaluate
sqrt(a + b*x + c*x**2).
- static guess(xs, ys, y_errs)¶
Return initial parameter guess
[a, b, c].
- static string_func(input_param)¶
Return the expression string for the hit-dict format.
- class pygama.pargen.energy_cal.HPGeCalibration(energy_param, glines, guess_kev, deg=1, uncal_is_int=False, fixed=None, debug_mode=False)¶
Bases:
objectCalibrate HPGe data to a set of known peaks.
Stores the calibration polynomial, peak locations, and intermediate fit results. Each method appends to a
resultsdictionary so that the full calibration history is preserved.- Parameters:
energy_param – Name of the uncalibrated energy parameter (used in logging).
glines – List of known peak energies in keV to fit to.
guess_kev (float) – Rough initial conversion factor from ADC to keV (must be positive).
deg (int) – Degree of the calibration polynomial
E_kev = poly(e_uncal).deg = 0is a simple scaling;deg = -1fixes the constant term. Follows thenumpy.polynomial.polynomialconvention (lowest to highest order).uncal_is_int (bool) – If
True, avoids picket-fence binning artefacts when e_uncal is integer-valued.fixed – Dictionary of fixed polynomial coefficient indices and their values.
debug_mode (bool) – If
True, exceptions are re-raised instead of being caught and logged.
- calibrate_prominent_peak(e_uncal, peak, peak_pars, allowed_p_val=1e-20, tail_weight=0, peak_param='mode', n_events=None)¶
Calibrate using a single prominent peak (degree-0 calibration).
Fits only the specified peak with
update_cal_pars=True, then re-fits all peaks withupdate_cal_pars=Falseto populate the full results, and fits both energy-resolution curves. Requiresdeg == 0(pure scaling calibration).- Parameters:
e_uncal – 1-D array of uncalibrated energy values.
peak – Known energy of the prominent peak in keV.
peak_pars – List of
(peak_kev, (kev_lo, kev_hi), func)tuples.allowed_p_val – Minimum chi-squared p-value for an accepted fit.
tail_weight – Weight for the tail-suppression prior.
peak_param – Parameter used to extract the peak position.
n_events – Maximum number of events to use;
Noneuses all.
- fit_calibrated_peaks(e_uncal, peak_pars)¶
Fit peaks using an existing calibration without updating the polynomial.
Calls
hpge_get_energy_peaks()andhpge_fit_energy_peaks()withupdate_cal_pars=False, then fits the energy resolution curves for bothFWHMLinearandFWHMQuadratic.- Parameters:
e_uncal – 1-D array of uncalibrated energy values.
peak_pars – List of
(peak_kev, (kev_lo, kev_hi), func)tuples.
- static fit_energy_res_curve(fwhm_func, fwhm_peaks, fwhms, dfwhms)¶
Fit an energy-resolution model to a set of measured FWHMs.
Performs a weighted least-squares fit of fwhm_func to fwhms at fwhm_peaks and evaluates the chi-squared p-value. Returns a NaN-filled result dictionary when the fit fails.
- Parameters:
fwhm_func – Energy resolution model class with
func,string_func,guess, andboundsstatic methods (e.g.FWHMLinear).fwhm_peaks – Peak energies in keV at which FWHMs were measured.
fwhms – Measured FWHM values in keV.
dfwhms – Uncertainties on fwhms.
- Returns:
results – Dictionary with keys
function,module,expression,parameters,uncertainties,cov,csqr, andp_val.
- full_calibration(e_uncal, peak_pars, allowed_p_val=1e-20, tail_weight=0, peak_param='mode', n_events=None)¶
Run the complete HPGe energy calibration pipeline.
Sequentially calls
hpge_find_energy_peaks(),hpge_get_energy_peaks(),hpge_fit_energy_peaks(), andget_energy_res_curve()(for bothFWHMLinearandFWHMQuadratic). If peaks are dropped during fitting, a second attempt is made with adjusted fit windows.- Parameters:
e_uncal – 1-D array of uncalibrated energy values.
peak_pars – List of
(peak_kev, (kev_lo, kev_hi), func)tuples specifying the fit window and model for each peak.allowed_p_val – Minimum chi-squared p-value for a peak fit to be accepted.
tail_weight – Weight for the tail-suppression prior in the fit cost function.
peak_param – Parameter used to extract the peak position (
"mode"or"mu").n_events – Maximum number of events to use;
Noneuses all.
- gen_pars_dict()¶
Generate the calibration expression dictionary for use in a hit-dict.
- Returns:
pars_dict – Dictionary with keys
"expression"(the polynomial string) and"parameters"(mapping coefficient names to their values).
- get_energy_res_curve(fwhm_func, interp_energy_kev=None)¶
Fit the energy-resolution curve and optionally interpolate to target energies.
Collects calibrated FWHMs from the most recent peak-fit results (excluding Doppler-broadened peaks at ~511, 1592.53, and 2103.5 keV), calls
fit_energy_res_curve(), and optionally callsinterpolate_energy_res(). The result is stored underfwhm_func.__name__in the most recent results entry.- Parameters:
fwhm_func – Energy resolution model class (e.g.
FWHMLinearorFWHMQuadratic).interp_energy_kev – Dict mapping labels to energies in keV, forwarded to
interpolate_energy_res().Noneskips interpolation.
- get_fwhms()¶
Compute and store calibrated FWHMs for all fitted peaks.
Reads uncalibrated FWHM values from the most recent peak-parameter results, converts them to keV using the derivative of the calibration polynomial, and writes
fwhm_in_kevandfwhm_err_in_kevback into each peak’s parameter dictionary.
- hpge_cal_energy_peak_tops(e_uncal, n_sigmas=20, peaks_kev=None, default_n_bins=50, n_events=None, allowed_p_val=0.01, update_cal_pars=True, fit_width_bound_kev=15)¶
Perform energy calibration for HPGe detector using peak fitting.
- Args:
- e_uncal (array-like):
Uncalibrated energy values.
- n_sigmas (float, optional):
Number of standard deviations to use for peak fitting range. Defaults to 1.2.
- peaks_kev (array-like, optional):
Known peak positions in keV. If not provided, uses self.peaks_kev. Defaults to None.
- default_n_bins (int, optional):
Number of bins for histogram. Defaults to 50.
- n_events (int, optional):
Number of events to consider for calibration. Defaults to None which uses all events.
- allowed_p_val (float, optional):
Maximum p-value for a fit to be considered valid. Defaults to 0.05.
- update_cal_pars (bool, optional):
Whether to update the calibration parameters. Defaults to True.
- hpge_find_energy_peaks(e_uncal, peaks_kev=None, n_sigma=5, etol_kev=None, bin_width_kev=1, erange=None, var_zero=1, update_cal_pars=True)¶
Find uncalibrated energy peaks whose energy spacing matches the pattern in peaks_kev.
- Parameters:
(array-like) (e_uncal) – Uncalibrated energy values.
(array-like (peaks_kev) – Pattern of energy peaks to match. If not provided, the peaks from the object’s attribute peaks_kev will be used.
optional) – Pattern of energy peaks to match. If not provided, the peaks from the object’s attribute peaks_kev will be used.
(float (var_zero) – Number of standard deviations above the mean to consider a peak significant. Default is 5.
optional) – Number of standard deviations above the mean to consider a peak significant. Default is 5.
(float – Tolerance in energy units for matching peaks. If not provided, it will be estimated based on the peak widths.
optional) – Tolerance in energy units for matching peaks. If not provided, it will be estimated based on the peak widths.
(float – Width of the energy bins for initial peak search. Default is 1 keV.
optional) – Width of the energy bins for initial peak search. Default is 1 keV.
(tuple (erange) – Range of uncalibrated energy values to consider. If not provided, the range will be determined based on the peaks.
optional) – Range of uncalibrated energy values to consider. If not provided, the range will be determined based on the peaks.
(float – Value to replace zero variance with. Default is 1.
optional) – Value to replace zero variance with. Default is 1.
(bool (update_cal_pars) – Whether to update the calibration parameters. Default is True.
optional) – Whether to update the calibration parameters. Default is True.
Returns – None
- hpge_fit_energy_peaks(e_uncal, peak_pars=None, peaks_kev=None, bin_width_kev=1, peak_param='mode', method='unbinned', n_events=None, allowed_p_val=0.01, tail_weight=0, update_cal_pars=True, use_bin_width_in_fit=True)¶
Fit the energy peaks specified using the given function.
- Parameters:
e_uncal – Unbinned energy data to be fit.
peaks_kev – Peak energies in keV to fit. Defaults to
self.peaks_kev.peak_pars – List of
(peak, range, func)tuples specifying the energy, fit range (keV), and fit function for each peak.bin_width_kev – Default bin width for fit-window histogramming. Default is 1 keV.
peak_param – Peak-location parameter to use in fitting. Default is
"mode".method – Fitting method:
"unbinned"(default) or"binned".n_events – Number of events to use for the unbinned fit.
allowed_p_val – Lower limit on the fit p-value.
tail_weight – Weight applied to the tail of the fit.
update_cal_pars – Whether to update the calibration parameters. Default is
True.
- Returns:
results_dict – Dictionary containing the fit results for each peak.
- Raises:
RuntimeError – If the fit fails.
Notes
This function fits the energy peaks specified using the given function. It calculates the range around each peak to fit, performs the fitting using either unbinned or binned method, and returns the fit results in a dictionary.
- hpge_get_energy_peaks(e_uncal, peaks_kev=None, n_sigma=3, etol_kev=5, var_zero=1, bin_width_kev=0.2, update_cal_pars=True, erange=None)¶
Get uncalibrated E peaks at the energies of peaks_kev
- Parameters:
e_uncal – Uncalibrated energy values.
peaks_kev – Energies of peaks to search for (in keV). Defaults to
self.peaks_kev.n_sigma – Threshold for detecting a peak in sigma (i.e. sqrt(var)). Default is 3.
etol_kev – Absolute tolerance in energy for matching peaks. Default is 5.
var_zero – Replacement value for zeros in
varto avoid divide-by-zero inhist/sqrt(var). Default is 1.bin_width_kev – Width of the energy bins for re-binning the histogram. Default is 0.2 keV.
update_cal_pars – Whether to update the calibration parameters. Default is
True.erange – Range of energy values to consider for the peak search. Determined automatically from
peaks_kevif not provided.
- Returns:
None
Notes
This method performs the following steps: 1. Re-bins the histogram in ~0.2 keV bins with updated energy scale parameters for peak-top fits. 2. Finds all local maxima in the histogram with significance greater than n_sigma. 3. Matches the calculated peak energies with the expected peak energies. 4. Removes duplicate peak matches. 5. Updates the input peaks, got peaks, and got peak locations in the results dictionary. 6. If update_cal_pars is True, calculates the updated calibration curve using the matched peak energies.
- static interpolate_energy_res(fwhm_func, fwhm_peaks, fwhm_results, interp_energy_kev=None, debug_mode=False)¶
Interpolate the fitted energy-resolution curve to target energies.
For each entry in interp_energy_kev, evaluates the fitted model at the requested energy and estimates the uncertainty via bootstrap resampling of the covariance matrix. Results are added to fwhm_results in-place and the same dict is returned.
- Parameters:
fwhm_func – Energy resolution model class (e.g.
FWHMLinear).fwhm_peaks – Peak energies used in the original fit (used for range checking).
fwhm_results – Result dictionary from
fit_energy_res_curve(); modified in-place.interp_energy_kev – Dict mapping labels to energies in keV at which to interpolate (e.g.
{"Qbb": 2039.0}).Noneskips interpolation.debug_mode – If
True, re-raises exceptions instead of silently returning NaN.
- Returns:
fwhm_results – The input dictionary with added interpolation entries.
- plot_cal_fit(data, figsize=(12, 8), fontsize=12, erange=(200, 2700))¶
Plot the calibration curve and residuals.
Produces a two-panel figure: the upper panel shows the ADC peak locations vs. their known keV energies with the fitted polynomial overlaid; the lower panel shows the residuals in keV.
- Parameters:
data – Unused; kept for API consistency with other plot methods.
figsize – Figure size
(width, height)in inches.fontsize – Font size for axis labels.
erange –
(low, high)energy range in keV for the x-axis.
- Returns:
fig –
matplotlib.figure.Figureobject.
- plot_cal_fit_with_errors(data, figsize=(10, 6), fontsize=12, erange=(200, 2700))¶
Plot the calibration curve with peak-fit uncertainties.
Like
plot_cal_fit()but shows error bars on the ADC peak locations derived from the peak-fit parameter uncertainties.- Parameters:
data – Unused; kept for API consistency with other plot methods.
figsize – Figure size
(width, height)in inches.fontsize – Font size for axis labels.
erange –
(low, high)energy range in keV for the x-axis.
- Returns:
fig –
matplotlib.figure.Figureobject.
- plot_eres_fit(data, erange=(200, 2700), figsize=(12, 8), fontsize=12)¶
Plot the energy-resolution curve fits.
Shows measured FWHMs with error bars alongside the fitted
FWHMLinearandFWHMQuadraticcurves, plus the interpolated resolution at Qββ.- Parameters:
data – Unused; kept for API consistency with other plot methods.
erange –
(low, high)energy range in keV for the x-axis.figsize – Figure size
(width, height)in inches.fontsize – Font size for axis labels.
- Returns:
fig –
matplotlib.figure.Figureobject.
- plot_fits(energies, figsize=(12, 8), fontsize=12, ncols=3, nrows=3, binning_kev=5)¶
Plot the peak-shape fits for all fitted calibration peaks.
Produces a grid of subplots, one per fitted peak, showing the histogrammed data and the best-fit model.
- Parameters:
energies – 1-D array of calibrated energy values.
figsize – Figure size
(width, height)in inches.fontsize – Font size for axis labels.
ncols – Number of subplot columns.
nrows – Number of subplot rows.
binning_kev – Histogram bin width in keV.
- Returns:
fig –
matplotlib.figure.Figureobject.
- update_results_dict(results_dict)¶
Store results_dict in
resultsunder the caller’s function name.If a key with that name already exists, a numeric suffix is appended to avoid overwriting previous results.
- Parameters:
results_dict – Dictionary of results to store.
- class pygama.pargen.energy_cal.TailPrior(data, model, tail_weight=0)¶
Bases:
objectIminuit cost-function mixin that adds a logarithmic tail-fraction prior.
When tail_weight > 0, the prior penalises small
htailvalues, discouraging the fitter from driving the tail fraction to zero.- Parameters:
data – Data array (unused in the prior term itself; kept for interface compatibility with iminuit cost functions).
model – Peak model callable.
tail_weight – Strength of the prior; larger values impose a stronger penalty on small
htail.
- _call(*pars)¶
Delegate to
__call__(iminuit internal interface).
- _value(*pars)¶
Delegate to
__call__(iminuit internal interface).
- errordef = 0.5¶
- verbose = 0¶
- pygama.pargen.energy_cal.average_counts_check(hist, bins, var, threshold=1)¶
- pygama.pargen.energy_cal.get_hpge_energy_bounds(func, parguess)¶
- pygama.pargen.energy_cal.get_hpge_energy_fixed(func)¶
Get the fixed indexes for fitting and mask for parameters based on the given function.
- Parameters:
func – The fit function for which the fixed indexes and mask are to be determined.
- Returns:
fixed – List of parameter names to hold fixed during fitting.
mask – Boolean array;
Truefor free parameters,Falsefor fixed ones.
- pygama.pargen.energy_cal.get_hpge_energy_peak_par_guess(energy, func, fit_range=None, bin_width=None, mode_guess=None)¶
Get parameter guesses for func fit to peak in hist
- Parameters:
energy – Array of energy values in the range around the peak.
func – The fit function to guess parameters for.
fit_range – Range
(lo, hi)around the peak; defaults to the full extent ofenergy.bin_width – Width of the histogram bins. Default is 1.
mode_guess – Initial guess for the mode (mu) parameter; estimated from the data if not provided.
- Returns:
guesses –
ValueViewof initial parameter guesses forfunc.
Notes
This function calculates parameter guesses for fitting a function to a peak in a histogram. It uses various methods to estimate the parameters based on the provided energy values and the selected function.
If the function is ‘gauss_on_step’, the following parameters will be estimated: - n_sig: Number of signal events in the peak. - mu: Mean of the peak. - sigma: Standard deviation of the peak. - n_bkg: Number of background events. - hstep: Height of the step between the peak and the background. - x_lo: Lower bound of the fit range. - x_hi: Upper bound of the fit range.
If the function is ‘hpge_peak’, the following parameters will be estimated: - n_sig: Number of signal events in the peak. - mu: Mean of the peak. - sigma: Standard deviation of the peak. - htail: Height of the tail component. - tau: Decay constant of the tail component. - n_bkg: Number of background events. - hstep: Height of the step between the peak and the background. - x_lo: Lower bound of the fit range. - x_hi: Upper bound of the fit range.
If the provided function is not implemented, an error will be raised.
Examples
>>> energy = [1, 2, 3, 4, 5] >>> func = pgf.gauss_on_step >>> fit_range = (2, 4) >>> bin_width = 0.5 >>> mode_guess = 3.5 >>> get_hpge_energy_peak_par_guess(energy, func, fit_range, bin_width, mode_guess) {'n_sig': 3, 'mu': 3.5, 'sigma': 0.5, 'n_bkg': 2, 'hstep': 0.5, 'x_lo': 2, 'x_hi': 4}
- pygama.pargen.energy_cal.hpge_fit_energy_cal_func(mus, mu_vars, energies_kev, energy_scale_pars, deg=0, fixed=None)¶
Find best fit of E = poly(mus +/- sqrt(mu_vars)) This is an inversion of hpge_fit_energy_scale. E uncertainties are computed from mu_vars / dmu/dE where mu = poly(E) is the E_scale function
- Parameters:
mus – Uncalibrated energies.
mu_vars – Variances in
mus.energies_kev – Reference energies to fit to, in keV.
energy_scale_pars – Parameters from the energy-scale fit (keV → ADC) used for computing uncertainties.
deg – Degree for the calibration polynomial.
deg=0fits a simple scaling; otherwise follows thenp.polyfit()convention.fixed – Dict whose keys are polynomial-parameter indices to fix and values are the fixed values (
Nonefixes at the current guess value).
- Returns:
pars – Best-fit parameters following the
np.polyfit()convention.cov – Covariance matrix for the best-fit parameters.
- pygama.pargen.energy_cal.hpge_fit_energy_peak_tops(hist, bins, var, peak_locs, n_to_fit=7, cost_func='Least Squares', inflate_errors=False, gof_method='var', debug_mode=False)¶
Fit gaussians to the tops of peaks
- Parameters:
hist – Histogram of uncalibrated energies, see
pgh.get_hist().bins – Bin edges corresponding to
hist.var – Variances of each histogram bin.
peak_locs – Locations of peaks in
hist; must be accurate to within ±2*n_to_fitbins.n_to_fit – Number of histogram bins near the peak top to include in the Gaussian fit.
cost_func – Cost function flag passed to
gauss_mode_width_max().inflate_errors – Error inflation flag passed to
gauss_mode_width_max().gof_method – Goodness-of-fit method flag passed to
gauss_mode_width_max().
- Returns:
pars_list – List of best-fit parameter arrays
(mode, sigma, max)for each peak-top fit.cov_list – List of covariance matrices corresponding to each entry in
pars_list.
- pygama.pargen.energy_cal.hpge_fit_energy_scale(mus, mu_vars, energies_kev, deg=0, fixed=None)¶
Find best fit of poly(E) = mus +/- sqrt(mu_vars) Compare to hpge_fit_energy_cal_func which fits for E = poly(mu)
- Parameters:
mus – Uncalibrated energies.
mu_vars – Variances in
mus.energies_kev – Reference energies to fit to, in keV.
deg – Degree for the energy scale fit.
deg=0fits a simple scalingmu = scale * E; otherwise follows thenp.polyfit()convention.fixed – Dict whose keys are polynomial-parameter indices to fix and values are the fixed values (
Nonefixes at the current guess value).
- Returns:
pars – Best-fit parameters following the
np.polyfit()convention.errs – Parameter errors (square root of diagonal covariance entries).
cov – Covariance matrix for the best-fit parameters.
- pygama.pargen.energy_cal.poly_match(xx, yy, deg=-1, rtol=1e-05, atol=1e-08, fixed=None)¶
Find the polynomial function best matching pol(xx) = yy
Finds the poly fit of xx to yy that obtains the most matches between pol(xx) and yy in the np.isclose() sense. If multiple fits give the same number of matches, the fit with the best gof is used, where gof is computed only among the matches. Assumes that the relationship between xx and yy is monotonic
- Parameters:
xx – Domain data array. Must be sorted ascending; must satisfy
len(xx) >= len(yy).yy – Range data array: values to which
poly(xx)will be compared. Must be sorted ascending; must satisfylen(yy) > max(2, deg+2).deg – Degree of the polynomial.
deg=0fits a simple scalingscale * xx = yy;deg=-1fits a simple shiftxx + shift = yy; otherwise equivalent to thedegargument ofnp.polyfit().rtol – Relative tolerance passed to
np.isclose().atol – Absolute tolerance passed to
np.isclose()(same units asyy).
- Returns:
pars – Best-fit parameters of
poly(xx) = yyinnp.polyfit()convention, orNonewhen inputs are invalid.i_matches – List of indices in
xxfor the matched values in the best match.
- pygama.pargen.energy_cal.poly_wrapper(x, *pars)¶
- pygama.pargen.energy_cal.sum_bins(hist, bins, var, threshold=5)¶
- pygama.pargen.energy_cal.unbinned_staged_energy_fit(energy, func, gof_range=None, fit_range=None, guess=None, guess_func=<function get_hpge_energy_peak_par_guess>, bounds_func=<function get_hpge_energy_bounds>, fixed_func=<function get_hpge_energy_fixed>, guess_kwargs=None, bounds_kwargs=None, fixed_kwargs=None, tol=None, tail_weight=0, allow_tail_drop=True, bin_width=None, lock_guess=False, p_val_threshold=1e-19, display=0)¶
Unbinned fit to energy. This is different to the default fitting as it will try different fitting methods and choose the best. This is necessary for the lower statistics.
pygama.pargen.energy_optimisation module¶
This module contains the functions for performing the energy optimisation. This happens in 2 steps, firstly a grid search is performed on each peak separately using the optimiser, then the resulting grids are interpolated to provide the best energy resolution at Qbb
- pygama.pargen.energy_optimisation.fom_fwhm_no_alpha_sweep(tb_in, kwarg_dict, ctc_param=None, alpha=0, idxs=None, frac_max=0.5, kev=True, display=0)¶
Figure-of-merit: FWHM at a fixed (or pre-computed) alpha, no sweep.
Applies a single drift-time correction with the given alpha and fits the peak, returning a comprehensive set of fit quality metrics. Used when the optimal alpha is already known (e.g. from a prior
fom_fwhm_with_alpha_fit()call) or when no charge-trapping correction is desired.- Parameters:
tb_in – LH5 table containing the energy and optional drift-time columns.
kwarg_dict – Per-peak fitting options with at minimum the keys
parameter(energy column name),func,peak, andkev_width. Optional keys:alpha(overrides the alpha argument),ctc_param(drift-time column name).ctc_param – Name of the charge-trapping correction column; overridden by
kwarg_dict["ctc_param"]if present.alpha – Charge-trapping correction coefficient; overridden by
kwarg_dict["alpha"]if present.idxs – Optional index array to select a subset of events.
frac_max – Fractional height used to define the FWHM.
kev – If
True, return the FWHM in keV rather than ADC units.display – Verbosity level; values > 0 produce diagnostic plots.
- Returns:
out_dict – Dictionary with keys
fwhm,fwhm_o_max,fwhm_err,fwhm_o_max_err,chisquare,n_sig,n_sig_err,mu,mu_err, andfit_pars. All values arenp.nanif the input energies contain NaNs.
- pygama.pargen.energy_optimisation.fom_fwhm_with_alpha_fit(tb_in, kwarg_dict, ctc_parameter, nsteps=11, idxs=None, frac_max=0.2, display=0)¶
Figure-of-merit: FWHM minimised over a sweep of charge-trapping correction values.
Scans nsteps values of the charge-trapping coefficient alpha between 0 and 3.5x10^-6, fitting the peak at each step via
get_peak_fwhm_with_dt_corr(). A degree-4 polynomial is fit to the valid FWHM/max-ratio values to locate the optimal alpha, and the peak is re-fit at that alpha to obtain the final FWHM in keV. An early termination heuristic stops the sweep when the FWHM curve is clearly rising.- Parameters:
tb_in – LH5 table containing the energy and drift-time columns.
kwarg_dict – Per-peak fitting options with at minimum the keys
parameter(energy column name),func(peak shape),peak(keV), andkev_width(fit window half-widths). Optional keybin_widthsets the histogram bin width (default 1 ADC).ctc_parameter – Name of the charge-trapping correction parameter column in tb_in.
nsteps – Number of alpha values to scan.
idxs – Optional boolean or integer index array to select a subset of events.
frac_max – Fractional height used to define the final FWHM.
display – Verbosity level; values > 0 produce diagnostic plots.
- Returns:
out_dict – Dictionary with keys
fwhm,fwhm_err,alpha,alpha_err,chisquare,n_sig, andn_sig_err. All values arenp.nan(andalphais 0) on failure.
- pygama.pargen.energy_optimisation.fom_interpolate_energy_res_with_single_peak_alpha_sweep(data, kwarg_dict, display=0)¶
Figure-of-merit: energy resolution interpolated to a target energy using a multi-peak sweep with a shared alpha from the highest-energy peak.
The charge-trapping correction parameter alpha is determined from the last (highest-energy) peak via
fom_fwhm_with_alpha_fit(); this alpha is then applied to all remaining peaks viafom_fwhm_no_alpha_sweep(). The resulting FWHM vs. energy curve is fit with fwhm_func and interpolated to each energy in interp_energy.- Parameters:
data – DataFrame containing energy and correction parameters for all events.
kwarg_dict –
Dictionary with keys:
peaks_kev- list of peak energies in keV, ordered from low to # noqa: RUF002 high. The last entry is used for the alpha sweep.idx_list- list of event-index arrays, one per peak. # noqa: RUF002ctc_param- name of the charge-trapping correction parameter. # noqa: RUF002peak_dicts- list of per-peak fitting dictionaries. # noqa: RUF002interp_energy(optional, default ``{“Qbb”: 2039}``) - dict # noqa: RUF002 mapping energy label to keV value for interpolation.fwhm_func(optional, defaultpgc.FWHMLinear) - FWHM # noqa: RUF002 curve model used for the energy-resolution fit.frac_max(optional, default 0.2) - fraction of peak maximum # noqa: RUF002 used to define fit range.
display – Verbosity / plotting level passed through to the underlying fits.
- Returns:
results – Result dictionary containing interpolated FWHM value(s), e.g.
{"Qbb_fwhm": ..., "Qbb_fwhm_err": ..., "alpha": ..., "peaks": ..., "fwhms": ..., "fwhm_errs": ..., "n_sig": ..., "n_sig_err": ...}. Returns(nan, nan, nan)if fewer than two valid FWHM values are available.- Return type:
- pygama.pargen.energy_optimisation.fom_single_peak_alpha_sweep(data, kwarg_dict, display=0)¶
Figure-of-merit wrapper: FWHM with alpha (charge-trapping correction) sweep for a single calibration peak.
Thin adapter around
fom_fwhm_with_alpha_fit()that unpacks the standardisedkwarg_dictinterface expected by the optimisation framework.- Parameters:
data – DataFrame containing the energy and charge-trapping correction parameters for all events.
kwarg_dict –
Dictionary with keys:
idx_list- list of event-index arrays, one per peak. Only the # noqa: RUF002 first entry (idx_list[0]) is used.ctc_param- name of the charge-trapping correction parameter. # noqa: RUF002peak_dicts- list of per-peak fitting dictionaries. Only the # noqa: RUF002 first entry is used.frac_max(optional, default 0.2) - fraction of the peak # noqa: RUF002 maximum used to define the fit range.
display – Verbosity / plotting level passed through to the underlying fit.
- Returns:
out_dict – Result dictionary from
fom_fwhm_with_alpha_fit()containing at minimumfwhm,fwhm_err,alpha,n_sig, andn_sig_err.- Return type:
- pygama.pargen.energy_optimisation.get_peak_fwhm_with_dt_corr(energies, alpha, dt, func, peak, kev_width, guess=None, kev=False, frac_max=0.5, bin_width=1, allow_tail_drop=False, display=0)¶
Apply a drift-time correction and fit a peak, returning FWHM and fit quality.
Computes
ct_energy = energy + alpha * dt * energy, then performs an unbinned staged fit of func to the corrected spectrum inside a window around the peak. Bootstrap resampling is used to estimate the uncertainty on the FWHM. All return values arenp.nan/Nonewhen the fit fails.- Parameters:
energies – 1-D array of raw (uncorrected) energy values.
alpha – Charge-trapping correction coefficient.
dt – 1-D array of drift-time values, same length as energies.
func – Peak-shape distribution to fit.
peak – Known peak energy in keV (used for ADC-to-keV conversion).
kev_width –
(low_side, high_side)half-widths in keV that define the fit window relative to the peak centroid.guess – Optional initial parameter guess; forwarded to
unbinned_staged_energy_fit().kev – If
True, convert the returned FWHM from ADC to keV.frac_max – Fractional height at which to evaluate the peak width (default 0.5 gives the FWHM).
bin_width – Histogram bin width in ADC counts.
allow_tail_drop – Passed through to the staged fit; allows the tail fraction to drop to zero.
display – Verbosity level; values > 0 produce diagnostic plots.
- Returns:
fwhm – Full-width at frac_max of the maximum in keV (or ADC if kev is
False).fwhm_o_max – Ratio of FWHM to peak maximum (shape quality metric).
fwhm_err – Bootstrap uncertainty on fwhm.
fwhm_o_max_err – Bootstrap uncertainty on fwhm_o_max.
chisqr –
(chi2, ndof)reduced chi-squared tuple from the staged fit.n_sig – Fitted number of signal events.
n_sig_err – Uncertainty on n_sig.
mu – Fitted peak centroid in ADC.
mu_err – Uncertainty on mu.
energy_pars – Full best-fit parameter values from the staged fit.
- pygama.pargen.energy_optimisation.simple_guess(energy, func, fit_range=None, bin_width=None)¶
Generate a simple initial parameter guess for a peak fit.
Estimates the peak centroid, width, signal and background counts, and (for HPGe-style models) tail parameters from the histogram of energy. The bin width is chosen adaptively using an inter-quartile-range rule when not provided explicitly.
- Parameters:
energy – 1-D array of energy values.
func – Distribution function to guess for; currently supports
hpge_peakandgauss_on_step.fit_range –
(low, high)energy window. Defaults to the full range of energy.bin_width – Histogram bin width. When
Nonean optimal width is estimated from the data.
- Returns:
parguess –
iminuit.Valuesof initial parameter values, or the NaN-filled fallback fromreturn_nans()if func is not supported.
pygama.pargen.lq_cal module¶
Calibration routines for the Late Charge (LQ) parameter in HPGe detectors.
LQ is computed from the ratio of the late-charge integral to the total charge and is sensitive to multi-site events. This module provides functions to
determine the fit range and histogram the LQ distribution,
fit the distribution with a Gaussian model,
perform a drift-time correction for the time-dependent LQ shift,
calculate the LQ cut value from the DEP (double-escape peak) survival fraction,
compute survival fractions as a function of LQ cut position,
produce diagnostic plots.
The main entry point is the LQCal class.
- class pygama.pargen.lq_cal.LQCal(cal_dicts, cal_energy_param, dt_param, eres_func, cdf=<pygama.math.functions.gauss.GaussianGen object>, selection_string='is_valid_cal&is_not_pulser', debug_mode=False)¶
Bases:
objectA class for calibrating the LQ parameter and determining the LQ cut value
- Parameters:
cal_dicts (dict) – Hit-level calibration dictionary of expressions/parameters to apply to the data. May be keyed by run timestamp.
cal_energy_param (str) – Name of the calibrated energy parameter.
dt_param (str) – Name of the drift-time parameter.
eres_func (callable) – Callable that returns the energy resolution (FWHM) in keV for a given energy.
cdf (callable) – CDF callable used for the binned Gaussian fits.
selection_string (str) – Boolean expression selecting valid, non-pulser events.
debug_mode – If
True, exceptions are re-raised instead of being caught and logged.
- calibrate(df, initial_lq_param)¶
Run the full LQ calibration pipeline.
Executes the three calibration steps in sequence:
lq_timecorr()— normalise by the time-varying DEP mean.drift_time_correction()— remove the linear drift-time dependence.get_cut_lq_dep()— fit the normalised DEP distribution and set the cut value.
Survival fractions at the DEP (1592.5 keV), Qββ (2039 keV), 2103.53 keV, and 2614.5 keV are then computed and stored in
low_side_sfandlow_side_peak_dfs.- Parameters:
df – DataFrame modified in-place; must contain initial_lq_param and all columns referenced by
cal_energy_param,dt_param, andselection_string.initial_lq_param – Name of the raw LQ parameter column in df.
- drift_time_correction(df, lq_param, cal_energy_param, display=0)¶
Remove the linear drift-time dependence from the LQ distribution.
Fits a degree-1 polynomial to LQ vs. drift time for DEP events in a 6 keV window around 1592.5 keV. Subtracts the fitted slope and intercept to produce a corrected
LQ_Correctedcolumn centred at zero, and writes the calibration expression intocal_dicts.- Parameters:
df (pd.DataFrame()) – DataFrame modified in-place; must contain lq_param, the calibrated energy column, and the drift-time column.
lq_param – Name of the LQ parameter column to correct.
cal_energy_param (str) – Name of the calibrated energy column.
display (int) – Verbosity level (currently unused).
- get_cut_lq_dep(df, lq_param, cal_energy_param)¶
Determine the LQ cut value from the DEP distribution.
Fits a Gaussian to the sideband-subtracted LQ distribution at the DEP (1592.5 keV). The LQ values are normalised by the fitted σ to produce
LQ_Classifier, and a fixed cut at 3 σ is applied (LQ_Cut). Both columns are added to df and the corresponding calibration expressions are written intocal_dicts.
- lq_timecorr(df, lq_param, output_name='LQ_Timecorr', display=0)¶
Normalise LQ by the time-varying DEP mean.
Fits the LQ distribution at the DEP (1592.5 keV) for each
run_timestampgroup (or globally when no timestamp column is present) and divides lq_param by the resulting weighted-average mean. The normalised column is stored as output_name in df and the corresponding calibration expression is written intocal_dicts.- Parameters:
df – DataFrame modified in-place; must contain lq_param and the calibrated energy column. If a
run_timestampcolumn is present, the correction is applied per timestamp.lq_param – Name of the raw LQ parameter column.
output_name – Name of the output time-corrected LQ column.
display – Verbosity level (currently unused).
- update_cal_dicts(update_dict)¶
Merge new entries into the calibration dictionary.
If
cal_dictsis keyed by run timestamps, each timestamp’s sub-dict is updated individually, using update_dict directly as a fallback when a timestamp is absent. Otherwise update_dict is merged directly.- Parameters:
update_dict – Dictionary of new calibration entries to merge.
- pygama.pargen.lq_cal.binned_lq_fit(df, lq_param, cal_energy_param, peak, _cdf=<pygama.math.functions.gauss.GaussianGen object>, sidebands=True)¶
- Function for fitting a distribution of LQ values within a specified
energy peak. Fits a gaussian to the distribution
- Parameters:
df (DataFrame) – DataFrame containing the LQ parameter and calibrated energy.
lq_param (str) – Name of the LQ parameter column.
cal_energy_param (str) – Name of the calibrated energy column.
peak (float) – Energy value in keV of the peak whose LQ distribution will be fit.
cdf – CDF callable used for the binned fit.
sidebands (bool) – Whether to apply sideband subtraction when building the histogram.
- Returns:
values – Best-fit parameter values
(mu, sigma)from the Gaussian fit.errors – Parameter uncertainties corresponding to values.
hist – Histogram counts used for the binned fit.
bins – Bin-edge array used for the binned fit.
- pygama.pargen.lq_cal.calculate_time_means(df, lq_param, cal_energy_param, peak, sidebands=True)¶
Compute the arithmetic mean and standard deviation of LQ at a peak.
Selects events in a ±8/+5 keV window around peak, restricts to the ±2.5σ fit range returned by
get_fit_range(), and returns the sample mean and standard deviation together with their statistical uncertainties.- Parameters:
df (DataFrame) – DataFrame containing the LQ and calibrated energy columns.
lq_param (str) – Name of the LQ parameter column.
cal_energy_param (str) – Name of the calibrated energy column.
peak (float) – Peak energy in keV.
sidebands (bool) – Unused; kept for signature consistency with
binned_lq_fit().
- Returns:
pars – Dictionary
{"mu": mean, "sigma": std}of the LQ distribution.errors – Dictionary
{"mu": mean_err, "sigma": sigma_err}of the statistical uncertainties.
- pygama.pargen.lq_cal.fit_time_means(tstamps, means, reses)¶
Compute a running weighted-average LQ mean across run timestamps.
Accumulates valid (non-NaN) mean/resolution pairs and stores the weighted-average mean (weighted by inverse resolution) for every accumulated timestamp. Timestamps with NaN values in either means or reses are assigned NaN in the output.
- Parameters:
tstamps – Ordered sequence of run-timestamp strings.
means – Array of per-timestamp LQ mean values; may contain NaN.
reses – Array of per-timestamp LQ resolutions (sigma/mu) used as inverse weights; may contain NaN.
- Returns:
out_dict – Dictionary mapping each timestamp to the cumulative weighted- average LQ mean up to that point, or
np.nanfor invalid entries.
- pygama.pargen.lq_cal.get_fit_range(lq)¶
Determine a ±2.5σ fit range around the mode of an LQ distribution.
Uses a 100-bin histogram over the 1st–95th percentile range to locate the peak centroid and estimate the width, then returns a ±2.5σ window suitable for a subsequent Gaussian fit.
- pygama.pargen.lq_cal.get_lq_hist(df, lq_param, cal_energy_param, peak, sidebands=True)¶
Histogram the LQ distribution for events near a calibration peak.
Optionally performs sideband subtraction to isolate peak-associated events from the Compton continuum. The peak window is
(peak - 8, peak + 5)keV and the sideband window is(peak + 7, peak + 20)keV.- Parameters:
df (pd.DataFrame()) – DataFrame containing both the LQ parameter and calibrated energy.
lq_param (str) – Name of the LQ parameter column.
cal_energy_param (str) – Name of the calibrated energy column.
peak (float) – Peak energy in keV around which to select events.
sidebands (bool) – If
True(default), subtract a sideband histogram to remove the Compton background contribution.
- Returns:
hist – Counts array (sideband-subtracted when sidebands is
True).bins – Bin-edge array.
var – Per-bin variance (combined Poisson uncertainty).
- pygama.pargen.lq_cal.plot_classifier(lq_class, data, lq_param='LQ_Classifier', xrange=(800, 3000), yrange=(-10, 30), xn_bins=700, yn_bins=500, figsize=(12, 8), fontsize=12)¶
- Return type:
figure
- pygama.pargen.lq_cal.plot_drift_time_correction(lq_class, data, lq_param='LQ_Timecorr', figsize=(12, 8), fontsize=12)¶
Plots a 2D histogram of LQ versus effective drift time in a 6 keV window around the DEP. Additionally plots the fit results for the drift time correction.
- Return type:
figure
- pygama.pargen.lq_cal.plot_lq_cut_fit(lq_class, data, figsize=(12, 8), fontsize=12)¶
Plots the final histogram of LQ values for events in the DEP, and the fit results used for determining the cut value
- Return type:
figure
- pygama.pargen.lq_cal.plot_lq_mean_time(lq_class, data, lq_param='LQ_Timecorr', figsize=(12, 8), fontsize=12)¶
Plots the mean LQ value calculated for each given timestamp
- Return type:
figure
- pygama.pargen.lq_cal.plot_sf_vs_energy(lq_class, data, xrange=(900, 3000), n_bins=701, figsize=(12, 8), fontsize=12)¶
Plots the survival fraction as a function of energy
- Return type:
figure
- pygama.pargen.lq_cal.plot_spectra(lq_class, data, xrange=(900, 3000), n_bins=2101, xrange_inset=(1580, 1640), n_bins_inset=200, figsize=(12, 8), fontsize=12)¶
Plots a 2D histogram of the LQ classifier vs calibrated energy
- Return type:
figure
- pygama.pargen.lq_cal.plot_survival_fraction_curves(lq_class, data, figsize=(12, 8), fontsize=12)¶
Plots the survival fraction curves as a function of LQ cut values for every peak of interest
- Return type:
figure
pygama.pargen.noise_optimization module¶
This module contains the functions for performing the filter optimisation. This happens with a grid search performed on ENC peak.
- pygama.pargen.noise_optimization.calculate_spread(energies, percentile_low, percentile_high, n_samples)¶
Estimate the inter-percentile spread of an energy distribution via bootstrapping.
Repeatedly resamples the input array with replacement and computes the difference between percentile_high and percentile_low for each bootstrap replicate. The mean and standard error of the bootstrap distribution are returned as the figure-of-merit and its uncertainty.
- Parameters:
energies – 1-D array of energy values.
percentile_low – Lower percentile (0-100) used to define the spread.
percentile_high – Upper percentile (0-100) used to define the spread.
n_samples – Number of bootstrap resamples to draw.
- Returns:
results – Dictionary with keys:
fom- mean inter-percentile spread across bootstrap samples.fom_err- standard error of the mean spread.
- Return type:
- pygama.pargen.noise_optimization.noise_optimization(tb_data, dsp_proc_chain, par_dsp, opt_dict, _lh5_path, display=0)¶
Calculate the optimal noise-filter parameter.
- Parameters:
tb_data (Table) – Raw LH5 table to run the DSP macro on.
dsp_proc_chain (dict) – Minimal DSP config dictionary.
par_dsp (dict) – Dictionary with default DSP parameters.
opt_dict (dict) – Dictionary with parameters for the optimisation.
_lh5_path (str) – Name of the channel to process (LH5 group name in raw files).
display (int) – Plot verbosity level.
- Returns:
res_dict – Dictionary of optimisation results.
- Return type:
- pygama.pargen.noise_optimization.simple_gaussian_fit(energies, dx=1, sigma_thr=4, allowed_p_val=1e-20)¶
Fit a Gaussian-on-uniform model to an ENC (noise) peak and return the FWHM.
An initial guess from
simple_gaussian_guess()is used to define a sigma-clipped fit range. The unbinned fit is accepted if the covariance is well-defined and the goodness-of-fit p-value exceeds allowed_p_val; otherwise the guess values are returned with zero uncertainties.- Parameters:
energies – 1-D array of energy (or ENC) values.
dx – Histogram bin width used for initial guessing and goodness-of-fit evaluation.
sigma_thr – Number of sigma around the estimated mean used to clip the fit range.
allowed_p_val – Minimum chi-squared p-value for the fit to be considered successful.
- Returns:
results – Result dictionary with keys:
pars- fit parameter array.errors- fit parameter uncertainties.covariance- parameter covariance matrix.mu- peak centre.mu_err- uncertainty on the peak centre.fom- FWHM of the peak.fom_err- uncertainty on the FWHM.chisq- reduced chi-squared of the fit.p_val- goodness-of-fit p-value.
- Return type:
- pygama.pargen.noise_optimization.simple_gaussian_guess(hist, bins, func, toll=0.2)¶
Generate an initial parameter guess and bounds for a Gaussian-on-uniform fit.
Estimates the peak centre from the histogram maximum and the width from the FWHM of the peak. The signal count is estimated by integrating the histogram over a +-2 sigma window around the peak.
- Parameters:
hist – 1-D array of histogram counts.
bins – 1-D array of bin edges (length
len(hist) + 1).func – Probability distribution object whose parameter names are used to construct the output dictionaries (must expose
mu,sigma, andn_sigparameters).toll – Fractional tolerance used to set symmetric bounds around each guess value.
- Returns:
guess – Initial parameter values:
{"mu": ..., "sigma": ..., "n_sig": ...}.bounds – Allowed parameter ranges:
{"mu": (lo, hi), "sigma": (lo, hi), "n_sig": (lo, hi)}.
- Return type:
pygama.pargen.pz_correct module¶
This module is for extracting the pole zero constants from the decay tail
- class pygama.pargen.pz_correct.PZCorrect(dsp_config, wf_field, debug_mode=False)¶
Bases:
objectExtract pole-zero correction constants from HPGe waveform data.
Provides methods for determining single- and double-pole-zero (DPZ) decay constants by fitting the exponential tail of raw waveforms. Results are accumulated in
output_dictin a format suitable for direct use as a DSP parameter database.- Parameters:
dsp_config – DSP processing chain configuration dictionary.
wf_field – Name of the waveform field in the input
lgdo.Table.debug_mode – If True, additional diagnostic information is retained.
- get_dpz_decay_constants(tb_data, percent_tau1_fit=0.1, percent_tau2_fit=0.2, offset_from_wf_max=10, superpulse_bl_idx=25, superpulse_window_width=13, display=0)¶
Gets values for the DPZ time constants in 3 stages:
Perform a linear fit to the start and end of the decaying tail of a superpulse
Use those initial guesses to seed a LSQ fit to a DPZ model of the sum of two decaying exponentials
Use the results of the model fit as initial guesses in a DSP routine that optimizes the flatness of the decaying tail
The first step in this process is to generate a superpulse from high-energy waveforms.
- Parameters:
tb_data (Table) – Table containing high-energy event waveforms and daqenergy. Pulser events must have already been removed from this table
percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.
percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.
offset_from_wf_max (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.
superpulse_bl_idx (int) – The index at which to stop when computing the mean value of a baseline, used for DPZ
superpulse_window_width (int) – The window of acceptance for tp100s while selecting waveforms to make a superpulse, with a center set at the median tp100 value. If a tp100 falls outside this window, ignore this waveform for the superpulse
display (int) – An integer. If greater than 1, plots and shows the attempts to fit the long and short time constants. If greater than 0, saves the plot to a dictionary.
- Returns:
tau_dict – dictionary of form {‘dpz’: {‘tau1’: decay_constant1, ‘tau2’: decay_constant2, ‘frac’: fraction}}
out_plot_dict – A dictionary containing monitoring plots
- Return type:
Notes
tau1 is the shorter time constant, tau2 is the longer, and frac is the amount of the larger time constant present in the sum of the two exponentials
- get_single_decay_constant(tb_data, slope_param='tail_slope', display=0)¶
Determine the single pole-zero decay constant from the tail slope.
Runs the DSP chain on tb_data, extracts the slope_param column, estimates the mode and standard deviation of the slope distribution, and converts the modal value to a time constant tau = -1/mode. The # noqa: RUF002 result is stored in
output_dictunder the"pz"key.- Parameters:
tb_data (Table) – LH5 table containing the raw waveforms and any parameters needed by
dsp_config.slope_param – Name of the tail-slope output column produced by the DSP chain.
display – If > 0, return a plot dictionary; otherwise return
None.
- Returns:
out_plot_dict – Dictionary of diagnostic figures (empty if display > 0), or
Nonewhen display ≤ 0.
- plot_slopes(tb_data, slope_field, with_correction=False, display=0, figsize=(8, 6), fontsize=8)¶
Plot a histogram of waveform tail slopes before or after PZ correction.
Useful for visually assessing whether the fitted decay constant successfully centres the slope distribution at zero.
- Parameters:
tb_data – Raw waveform table (
lgdo.Table) to process.slope_field – Name of the tail-slope field in the DSP output table.
with_correction – If True, apply the current
output_dictPZ parameters before computing slopes. If False, run the DSP chain without any correction.display – If > 1, call
plt.show(); if ≤ 1, close the figure after saving it to the returned dictionary.figsize – Figure size
(width, height)in inches.fontsize – Base font size for the plot.
- Returns:
out_plot_dict – Dictionary
{"slopes": fig}or{"corrected_slope": fig}containing the matplotlib figure.- Return type:
- plot_waveforms_after_correction(tb_data, wf_field, xlim=(0, 1024), ylim=None, norm_param=None, display=0, figsize=(8, 6), fontsize=8, n_waveforms=100, downsample=1)¶
Plot a random selection of waveforms after applying the PZ correction.
Runs the DSP chain with the current
output_dictparameters so that the corrected waveform field (wf_field) reflects the fitted pole-zero constants.- Parameters:
tb_data – Raw waveform table (
lgdo.Table) to process.wf_field – Name of the corrected waveform field in the DSP output table.
xlim – x-axis limits
(lo, hi)in samples.ylim – y-axis limits
(lo, hi)in ADU, or None for auto-scaling.norm_param – If given, each waveform is divided by the value of this DSP output parameter so that traces are displayed on a normalised scale.
display – If > 1, call
plt.show(); if ≤ 1, close the figure after saving it to the returned dictionary.figsize – Figure size
(width, height)in inches.fontsize – Base font size for the plot.
n_waveforms – Number of waveforms to draw (sampled randomly).
downsample – Plot every downsample-th sample to reduce rendering load.
- Returns:
plot_dict – Dictionary
{"waveforms": fig}containing the matplotlib figure.- Return type:
- pygama.pargen.pz_correct.dpz_model(ts, amp, tau1, tau2, f2)¶
Models the double pole zero function as A*[(1-f2)*exp(-t/tau1) + f2*exp(-t/tau2)], this is the expected shape of an HPGe waveform. Used in performing fits in
dpz_model_fit()- Parameters:
- Return type:
array
- pygama.pargen.pz_correct.dpz_model_fit(waveform, percent_tau1_fit, percent_tau2_fit, idx_shift, plot)¶
- Parameters:
waveform (array) – An array containing waveform data that has been baseline subtracted
percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.
percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.
idx_shift (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.
plot (int) – An integer, that if greater than 1 plots and shows the fit results, if greater than 0 saves the plots to a dictionary
- Returns:
tau1 – A guess of the shorter double pole zero time constant
tau2 – A guess of the longer double pole zero time constant
frac2 – A guess of the fraction in the double pole zero of the longer time constant
out_plot_dict – A dictionary containing the output plots if requested
- Return type:
Notes
Extracts the double pole zero parameters by fitting to an analytic model, utilizing the best-guess from linear fitting. Fits to the function f(t) = A*[(1-f2)*exp(-t/tau1)+f2*exp(-t/tau2)] It fits from the maximum of the waveform onwards to the end of the tail. The waveform must be baseline subtracted in order to get the best fit results
- pygama.pargen.pz_correct.linear_dpz_fit(waveform, percent_tau1_fit, percent_tau2_fit, idx_shift, plot)¶
- Parameters:
waveform (array) – An array containing waveform data
percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.
percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.
idx_shift (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.
plot (int) – An integer, if greater than 1 plots and shows the fit results, if greater than 0 saves the plot to a dictionary.
- Returns:
tau1 – A guess of the shorter double pole zero time constant
tau2 – A guess of the longer double pole zero time constant
frac – A guess of the fraction in the double pole zero of the longer time constant
out_plot_dict – A dictionary containing the output plots if requested
- Return type:
Notes
Extracts an initial guess of the double pole zero parameters by performing two linear fits to the log of the waveform. The first log-fit is to long time scales, and fits the long time constant tau2. This piece is subtracted off the waveform, and then the shorter time constant is fit. The average of the fractions returned by the intercepts of these fits is returned.
The waveform must be baseline subtracted to provide accurate fit results!
- pygama.pargen.pz_correct.tp100_align(wfs, tp100_window_width, tp100s)¶
Align provided waveforms at their maximum, so that an average over all the waveforms creates a valid superpulse to fit using
dpz_model_fit(). Do this without sacrificing too much of the length of the decaying tail- Parameters:
wfs (array) – An array of arrays containing waveforms
tp100_window_width (int) – The window of acceptance, with a center set at the median tp100 value. If a tp100 falls outside this window, ignore this waveform for the superpulse
tp100s (array) – An array containing the indices of the maximums of the waveforms, in samples
- Returns:
time_aligned_wfs – An array of waveforms that are all aligned at their maximal values
- Return type:
array
pygama.pargen.survival_fractions module¶
This module provides functions for calculating survival fractions for a cut.
- pygama.pargen.survival_fractions.compton_sf(cut_param, low_cut_val, high_cut_val=None, mode='greater', data_mask=None)¶
Function for calculating the survival fraction of a cut in a compton region with a simple counting analysis.
- Parameters:
cut_param – Array of cut parameter values for all events.
low_cut_val – Lower threshold value of cut_param.
high_cut_val – Upper threshold value; when set, events in
(low_cut_val, high_cut_val)are counted as passing.mode – Direction of the cut when high_cut_val is
None:"greater"counts events above low_cut_val;"less"counts events below it.data_mask – Boolean mask pre-applied to cut_param. Defaults to all-True.
- Returns:
sf – Dictionary with keys
low_cut,sf(survival fraction in percent),sf_err(statistical uncertainty in percent), andhigh_cut.- Return type:
- pygama.pargen.survival_fractions.compton_sf_sweep(energy, cut_param, final_cut_value, data_mask=None, cut_range=(-5, 5), n_samples=51, mode='greater')¶
Function sweeping through cut values and calculating the survival fraction for each value using a simple counting analysis.
- Parameters:
energy (np.array) – Array of calibrated energies (unused in counting analysis, kept for API consistency with
get_sf_sweep()).cut_param (np.array) – Array of cut parameter values; must have the same length as energy.
final_cut_value (float) – Cut threshold at which to evaluate and return the final survival fraction.
data_mask (np.array) – Boolean mask applied before counting. Defaults to all-True.
cut_range –
(low, high)interval of cut values to sweep over.n_samples – Number of evenly-spaced cut values within cut_range.
mode – Direction of the cut:
"greater"or"less".
- Returns:
out_df – DataFrame indexed by cut value with columns
sfandsf_errgiving the survival fraction and its uncertainty at each sweep point.sf – Survival fraction in percent at final_cut_value.
err – Uncertainty on sf in percent.
- Return type:
- pygama.pargen.survival_fractions.energy_guess(energy, func_i, fit_range=None, bin_width=1, peak=None, eres=None)¶
Generate an initial parameter guess for a peak fit.
Delegates to
pygama.pargen.energy_cal.get_hpge_energy_peak_par_guess()for supported distribution types. Currently onlyhpge_peakandgauss_on_stepare supported.- Parameters:
energy – 1-D array of energy values.
func_i – Distribution function for which to compute the guess (e.g.
hpge_peakorgauss_on_step).fit_range –
(low, high)energy window for the guess. Defaults to the full range of energy.bin_width – Histogram bin width used internally for peak characterisation.
peak – Known peak position in keV. If given, overrides the guessed mu.
eres – Known energy resolution (FWHM) in keV. If given, sets sigma = eres / 2.355.
- Returns:
parguess –
iminuit.Valuesof initial parameter values for func_i, orNoneif func_i is not supported.
- pygama.pargen.survival_fractions.fail_pdf_gos(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, hstep1, hstep2)¶
Extended PDF for events failing a cut, using a Gaussian-on-step model.
Re-parameterises the standard
gauss_on_stepPDF in terms of total event counts and cut efficiencies so thatepsilon_sigandepsilon_bkgcan be fit directly.- Parameters:
x – Energy values to evaluate the PDF at.
x_lo – Lower edge of the fit range.
x_hi – Upper edge of the fit range.
n_sig – Total number of signal events (passing + failing).
epsilon_sig – Fraction of signal events passing the cut;
1 - epsilon_sigevents fail.n_bkg – Total number of background events.
epsilon_bkg – Fraction of background events passing the cut;
1 - epsilon_bkgevents fail.mu – Peak centroid.
sigma – Peak width (standard deviation).
hstep1 – Step height parameter for passing events (unused here but kept for a consistent signature shared with
pass_pdf_gos()).hstep2 – Step height parameter for failing events.
- pygama.pargen.survival_fractions.fail_pdf_hpge(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, htail, tau, hstep1, hstep2)¶
Extended PDF for events failing a cut, using a HPGe peak model.
Re-parameterises the standard
hpge_peakPDF in terms of total event counts and cut efficiencies so thatepsilon_sigandepsilon_bkgcan be fit directly.- Parameters:
x – Energy values to evaluate the PDF at.
x_lo – Lower edge of the fit range.
x_hi – Upper edge of the fit range.
n_sig – Total number of signal events (passing + failing).
epsilon_sig – Fraction of signal events passing the cut;
1 - epsilon_sigevents fail.n_bkg – Total number of background events.
epsilon_bkg – Fraction of background events passing the cut;
1 - epsilon_bkgevents fail.mu – Peak centroid.
sigma – Peak width (standard deviation).
htail – Fraction of events in the low-energy tail.
tau – Decay constant of the low-energy tail.
hstep1 – Step height parameter for passing events (unused here but kept for a consistent signature shared with
pass_pdf_hpge()).hstep2 – Step height parameter for failing events.
- pygama.pargen.survival_fractions.fix_all_but_nevents(func)¶
Return the set of parameters to fix when fitting for cut efficiency.
Produces the list of fixed parameter names and a corresponding boolean mask for func, leaving only the event-count and efficiency parameters free. Used when re-fitting a peak with a known shape to extract the signal and background efficiencies of a data-quality cut.
- Parameters:
func – Distribution function; currently supports
gauss_on_stepandhpge_peak.- Returns:
fixed – List of parameter names to be held fixed during the fit, or
Noneif func is not supported.mask – Boolean array of length
len(func.required_args()), whereTrueindicates a free (floating) parameter.
- Return type:
- pygama.pargen.survival_fractions.get_bounds(func, parguess)¶
Build a parameter bounds dictionary for a peak fit.
Wraps
pygama.pargen.energy_cal.get_hpge_energy_bounds()and applies additional tight constraints on the peak position and event counts.- Parameters:
func – Distribution function; currently supports
hpge_peakandgauss_on_step.parguess – Initial parameter guess dictionary, as returned by
energy_guess()orupdate_guess().
- Returns:
bounds – Dictionary mapping parameter names to
(low, high)bound tuples, orNoneif func is not supported.- Return type:
- pygama.pargen.survival_fractions.get_sf_sweep(energy, cut_param, final_cut_value=None, peak=1592.5, eres_pars=None, data_mask=None, cut_range=(-5, 5), n_samples=26, mode='greater', fit_range=None, debug_mode=False)¶
Function sweeping through cut values and calculating the survival fraction for each value using a fit to the surviving and failing enegry distributions.
- Parameters:
energy (np.array) – Array of calibrated energies.
cut_param (np.array) – Array of the cut parameter values; must have the same length as energy.
final_cut_value (float | None) – Cut threshold at which to evaluate and return the final survival fraction. If
None, sf and err are returned asNone.peak (float) – Known peak position in keV.
eres_pars (list | None) – Energy resolution parameter (FWHM) in keV used to seed the fit.
data_mask – Boolean mask applied before fitting. Defaults to all-True.
cut_range –
(low, high)interval of cut values to sweep over.n_samples – Number of evenly-spaced cut values to evaluate within cut_range.
mode – Direction of the cut:
"greater"or"less".fit_range –
(low, high)energy window for the peak fit.debug_mode – If
True, re-raises exceptions encountered during the sweep instead of silently skipping them.
- Returns:
out_df – DataFrame indexed by cut value with columns
sfandsf_errgiving the survival fraction and its uncertainty at each sweep point.sf – Survival fraction in percent at final_cut_value, or
None.err – Uncertainty on sf in percent, or
None.
- Return type:
- pygama.pargen.survival_fractions.get_survival_fraction(energy, cut_param, cut_val, peak, eres_pars, fit_range=None, high_cut=None, pars=None, data_mask=None, mode='greater', func=<pygama.math.functions.sum_dists.SumDists object>, fix_step=True, display=0)¶
Function for calculating the survival fraction of a cut using a fit to the surviving and failing energy distributions.
- Parameters:
energy (ndarray) – Array of calibrated energies.
cut_param (ndarray) – Array of the cut parameter values; must have the same length as energy.
cut_val (float) – Threshold value of cut_param at which to evaluate the survival fraction.
peak (float) – Known peak position in keV.
eres_pars (float) – Energy resolution parameter (FWHM) in keV used to seed the fit.
fit_range (tuple | None) –
(low, high)energy window for the fit. Defaults to the full range of energy.high_cut (float | None) – Upper threshold for cut_param; when set, events in
(cut_val, high_cut)are taken as passing.pars (ValueView) – Initial parameter values for the peak fit. If
None, an automatic staged fit is performed first.data_mask (ndarray) – Boolean mask applied to energy and cut_param before fitting. Defaults to all-True.
mode (str) – Direction of the cut:
"greater"keeps events above cut_val;"less"keeps events below it.func – Peak-shape distribution to use; defaults to
hpge_peak.fix_step – If
True, fix the step-height parameters during the efficiency fit.display – Verbosity level; if greater than 1, plots the passing and failing distributions.
- Returns:
sf – Survival fraction in percent.
err – Statistical uncertainty on sf in percent.
values – Best-fit parameter values from the efficiency fit.
errors – Parameter uncertainties from the efficiency fit.
- pygama.pargen.survival_fractions.pass_pdf_gos(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, hstep1, hstep2)¶
Extended PDF for events passing a cut, using a Gaussian-on-step model.
Re-parameterises the standard
gauss_on_stepPDF in terms of total event counts and cut efficiencies so thatepsilon_sigandepsilon_bkgcan be fit directly.- Parameters:
x – Energy values to evaluate the PDF at.
x_lo – Lower edge of the fit range.
x_hi – Upper edge of the fit range.
n_sig – Total number of signal events (passing + failing).
epsilon_sig – Fraction of signal events passing the cut (signal efficiency).
n_bkg – Total number of background events.
epsilon_bkg – Fraction of background events passing the cut.
mu – Peak centroid.
sigma – Peak width (standard deviation).
hstep1 – Step height parameter for passing events.
hstep2 – Step height parameter for failing events (unused here but kept for a consistent signature shared with
fail_pdf_gos()).
- pygama.pargen.survival_fractions.pass_pdf_hpge(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, htail, tau, hstep1, hstep2)¶
Extended PDF for events passing a cut, using a HPGe peak model.
Re-parameterises the standard
hpge_peakPDF in terms of total event counts and cut efficiencies so thatepsilon_sigandepsilon_bkgcan be fit directly.- Parameters:
x – Energy values to evaluate the PDF at.
x_lo – Lower edge of the fit range.
x_hi – Upper edge of the fit range.
n_sig – Total number of signal events (passing + failing).
epsilon_sig – Fraction of signal events passing the cut (signal efficiency).
n_bkg – Total number of background events.
epsilon_bkg – Fraction of background events passing the cut.
mu – Peak centroid.
sigma – Peak width (standard deviation).
htail – Fraction of events in the low-energy tail.
tau – Decay constant of the low-energy tail.
hstep1 – Step height parameter for passing events.
hstep2 – Step height parameter for failing events (unused here but kept for a consistent signature shared with
fail_pdf_hpge()).
- pygama.pargen.survival_fractions.update_guess(func, parguess, energies)¶
Update the signal and background event-count guesses from data.
Refines the
n_sigandn_bkgentries in parguess by counting events in a ±2*sigma* window around the peak centroid and subtracting a sideband estimate for the background contribution.- Parameters:
func – Distribution function; currently supports
gauss_on_stepandhpge_peak.parguess – Parameter guess dictionary, modified in-place. Must contain
mu,sigma,x_lo,x_hi,n_sig, andn_bkg.energies – 1-D array of energy values from which to re-estimate the counts.
- Returns:
parguess – The input dictionary with updated
n_sigandn_bkgvalues.
pygama.pargen.utils module¶
Utility functions for parameter fitting, unit conversion, and LH5 data loading used across the pargen calibration and optimisation modules.
- pygama.pargen.utils.convert_to_minuit(pars, func)¶
Create an
iminuit.Minuitinstance from a parameter set and a PDF.- Parameters:
pars – Initial parameter values. Either a dict mapping parameter names to values, or a sequence of values that will be passed positionally.
func – Callable whose signature defines the parameters. If the object exposes a
pdf_extattribute it is used as the cost function (so that extended PDFs are handled correctly); otherwise the callable itself is used.
- Returns:
m – Configured Minuit object ready for minimisation.
- Return type:
Minuit
- pygama.pargen.utils.load_data(files, lh5_path, cal_dict, params, cal_energy_param='cuspEmax_ctc_cal', threshold=None, return_selection_mask=False)¶
Load parameters from LH5 files and apply calibration expressions.
Reads params from files, evaluates all expressions in cal_dict to produce calibrated columns, and optionally applies a lower energy threshold. When files is a dict keyed by run timestamp, the function recurses over each timestamp and concatenates the results.
- Parameters:
files (str | list | dict) – A single file path, a list of file paths, or a dict mapping run timestamps to lists of file paths.
lh5_path (str) – Path to the LH5 table within each file.
cal_dict (dict) – Calibration expressions in hit-dict format:
{outname: {"expression": ..., "parameters": {...}}}. When files is a timestamp dict, this may also be keyed by timestamp.params (set) – Set of output column names to include in the returned DataFrame.
cal_energy_param (str) – Name of the calibrated energy column used to apply the threshold.
threshold – Minimum energy value; events below this are dropped.
Nonekeeps all events.return_selection_mask – If
True, also return the boolean threshold mask.
- Returns:
df – DataFrame containing the requested params (plus an optional
run_timestampcolumn when files is a dict).masks – Boolean threshold mask of the same length as df. Only returned when return_selection_mask is
True.
- Return type:
pd.DataFrame | tuple(pd.DataFrame, np.array)
- pygama.pargen.utils.return_nans(input)¶
Return a NaN-filled result tuple with the same structure as a successful fit.
Useful for propagating fit failures without raising exceptions.
- Parameters:
input – Either a plain callable (whose positional arguments after the first define the parameter list) or an object with a
required_args()method (e.g. a pygama distribution).- Returns:
values – Parameter values, all set to NaN.
errors – Parameter uncertainties, all set to NaN.
covariance – Square covariance matrix filled with NaN.
- Return type: