pygama.pargen package¶

Utilities to generate and optimize parameters of interest from data (e.g. calibration routines)

Submodules¶

pygama.pargen.AoE_cal module¶

This module provides functions for correcting the a/e energy dependence, determining the cut level and calculating survival fractions.

class pygama.pargen.AoE_cal.CalAoE(cal_dicts=None, cal_energy_param='cuspEmax_ctc_cal', eres_func=<function CalAoE.<lambda>>, pdf=<pygama.math.functions.sum_dists.SumDists object>, selection_string='index==index', dt_corr=False, dep_correct=False, dt_cut=None, dt_param='dt_eff', high_cut_val=3, mean_func=<class 'pygama.pargen.AoE_cal.Pol1'>, sigma_func=<class 'pygama.pargen.AoE_cal.SigmaFit'>, compt_bands_width=20, debug_mode=False)¶

Bases: object

Class for calibrating the A/E, this follow 5 steps: the time correction, drift time correction, energy correction, the cut level calculation and the survival fraction calculation.

Parameters:

cal_dicts (dict) – Dictionary of calibration parameters can either be empty/None, for a single run or for multiple runs keyed by timestamp in the format YYYYMMDDTHHMMSSZ
cal_energy_param (str) – Calibrated energy parameter to use for A/E calibrations and for determining peak events
eres_func (callable) – Function to determine the energy resolution should take in a single variable the calibrated energy
pdf (PDF) – PDF to fit to the A/E distribution
selection_string (str) – Selection string for the data that will be passed as a query to the data dataframe
dt_corr (bool) – Whether to correct the drift time
dep_correct (bool) – Whether to correct the double escape peak into the single site band before cut determination
dt_cut (dict) –
Dictionary of the drift time cut parameters in the form:
```
{"out_param": "dt_cut", "hard": False}
```
where the out_param is the name of the parameter to cut on in the dataframe (should have been precalculated) and “hard” is whether to remove these events completely for survival fraction calculations or whether they they should only be removed in the cut determination/ A/E calibration steps but the survival fractions should be calculated with them included
high_cut_val (float) – Value to cut the A/E distribution at on the high side
mean_func (Callable) – Function to fit the energy dependence of the A/E mean, should be in the form of a class as above for Pol1
sigma_func (Callable) – Function to fit the energy dependence of the A/E sigma, should be in the form of a class as above for Sigma_Fit
compt_bands_width (float) – Width of the compton bands to use for the energy correction
debug_mode (bool) – If True will raise errors if the A/E calibration fails otherwise just return NaN values

calculate_survival_fractions(data, aoe_param, peaks, fit_widths, mode='greater')¶

Calculate survival fractions for the A/E cut for a list of peaks for the final A/E cut value

Parameters:

data (pd.DataFrame) – Dataframe containing the data
aoe_param (str) – Name of the parameter in the dataframe for the final A/E classifier
peaks (list) – List of peaks to calculate the survival fractions for
fit_widths (list) – List of tuples of the energy range to fit the peak in
mode (str) – mode to use for the cut determination, can be “greater” or “less” i.e. do we want to keep events with A/E greater or less than the cut value

calculate_survival_fractions_sweep(data, aoe_param, peaks, fit_widths, n_samples=26, cut_range=(-5, 5), mode='greater')¶

Calculate survival fractions for the A/E cut for a list of peaks by sweeping through values of the A/E cut to show how this varies

Parameters:

data (pd.DataFrame) – Dataframe containing the data
aoe_param (str) – Name of the parameter in the dataframe for the final A/E classifier
peaks (list) – List of peaks to calculate the survival fractions for
fit_widths (list) – List of tuples of the energy range to fit the peak in
n_samples (int) – Number of samples to take in the sweep
cut_range (tuple) – Range of the cut to sweep through
mode (str) – mode to use for the cut determination, can be “greater” or “less” i.e. do we want to keep events with A/E greater or less than the cut value

calibrate(df, initial_aoe_param, peaks_of_interest=None, fit_widths=None, cut_peak_idx=0, dep_acc=0.9, sf_nsamples=11, sf_cut_range=(-5, 5), timecorr_mode='full')¶

Main function to run a full A/E calibration with all steps i.e. time correction, drift time correction, energy correction, A/E cut determination and survival fraction calculation

Parameters:

df (pd.DataFrame) – Dataframe containing the data
initial_aoe_param (str) – Name of the A/E parameter in dataframe to use as starting point
peaks_of_interest (list) – List of peaks to calculate the survival fractions for
fit_widths (list) – List of tuples of the energy range to fit the peak in for survival fraction determination
cut_peak_idx (int) – Index of the peak in peaks of interest to use for the cut determination
dep_acc (float) – Desired survival fraction in the peak for final cut value
sf_nsamples (int) – Number of samples to take in the survival fraction sweep
sf_cut_range (tuple) – Range to use for the survival fraction sweep
timecorr_mode (str) – Mode to use for the time correction, see time_correction function for details

drift_time_correction(data, aoe_param, out_param='AoE_DTcorr', display=0)¶

Calculates the correction needed to align the two drift time regions for ICPC detectors. This is done by fitting the two drift time peaks in the drift time spectrum and then fitting the A/E peaks in each of these regions. A simple linear correction is then applied to align these regions.

Parameters:

data (pd.DataFrame) – Dataframe containing the data
aoe_param (str) – Name of the A/E parameter to use as starting point
output_name (str) – Name of the output parameter for the drift time corrected A/E in the dataframe and to be added to the calibration dictionary
display (int) – plot level

energy_correction(data, aoe_param, corrected_param='AoE_Corrected', classifier_param='AoE_Classifier', display=0)¶

Calculates the corrections needed for the energy dependence of the A/E. Does this by fitting the compton continuum in slices and then applies fits to the centroid and variance.

Parameters:

data (pd.DataFrame) – Dataframe containing the data
aoe_param (str) – Name of the A/E parameter to use as starting point
corrected_param (str) – Name of the output parameter for the energy mean corrected A/E to be added in the dataframe and to the calibration dictionary
classifier_param (str) – Name of the output parameter for the full mean and sigma energy corrected A/E classifier to be added in the dataframe and to the calibration dictionary
display (int) – plot level

get_aoe_cut_fit(data, aoe_param, peak, ranges, dep_acc=0.9, output_cut_param='AoE_Low_Cut', display=1)¶

Determines A/E cut by sweeping through values and for each one fitting the DEP to determine how many events survive. Fits the resulting distribution and interpolates to get cut value at desired DEP survival fraction (typically 90%)

Parameters:

data (pd.DataFrame) – Dataframe containing the data
aoe_param (str) – Name of the A/E parameter to use as starting point
peak (float) – Energy of the peak to use for the cut determination e.g. 1592.5
ranges (tuple) – Tuple of the range in keV below and above the peak to use for the cut determination e.g. (20,40)
dep_acc (float) – Desired DEP survival fraction for final cut
output_cut_param (str) – Name of the output parameter for the events passing A/E in the dataframe and to be added to the calibration dictionary
display (int) – plot level

time_correction(df, aoe_param, mode='full', output_name='AoE_Timecorr', display=0)¶

Calculates the time correction for the A/E parameter by fitting the 1-1.3 MeV Compton continuum and extracting the centroid. If only a single run is passed will just perform a shift of the A/E parameter to 1 using the centroid otherwise for multiple runs the shift will be determined on the mode given in.

Parameters:

df (pd.DataFrame) – Dataframe containing the data
aoe_param (str) – Name of the A/E parameter to use
mode (str) –
Mode to use for the time correction, can be “full”, “partial” or “none”:

none: just use the mean of the a/e centroids to shift all the data partial: iterate through the centroids if vary by less than 0.4 sigma then group and take mean otherwise when a run higher than 0.4 sigma is found if it is a single run set to nan otherwise start a new block full : each run will be corrected individually average_consecutive: average the consecutive centroids interpolate_consecutive: interpolate between the consecutive centroids
output_name (str) – Name of the output parameter for the time corrected A/E in the dataframe and to be added to the calibration dictionary
display (int) – plot level

update_cal_dicts(update_dict)¶: Util function for updating the calibration dictionaries checks if the dictionary is in the format of a single run or multiple runs and then updates the dictionary accordingly

class pygama.pargen.AoE_cal.Pol1¶

Bases: object

static func(x, a, b)¶

static guess(bands, means, mean_errs)¶

static string_func(input_param)¶

class pygama.pargen.AoE_cal.SigmaFit¶

Bases: object

static func(x, a, b, c)¶

static guess(bands, sigmas, sigma_errs)¶

static string_func(input_param)¶

class pygama.pargen.AoE_cal.SigmoidFit¶

Bases: object

static func(x, a, b, c, d)¶

static guess(xs, ys, y_errs)¶

pygama.pargen.AoE_cal.aoe_peak_bounds(func, guess, **kwargs)¶

pygama.pargen.AoE_cal.aoe_peak_fixed(func, **kwargs)¶

pygama.pargen.AoE_cal.aoe_peak_guess(func, hist, bins, var, **kwargs)¶

pygama.pargen.AoE_cal.average_consecutive(tstamps, means)¶

Fit the time dependence of the means of the A/E distribution by average consecutive entries

Parameters:

tstamps (np.array) – Timestamps of the data
means (np.array) – Means of the A/E distribution
sigmas (np.array) – Sigmas of the A/E distribution
Returns (dict) – Dictionary of the time dependence of the means

pygama.pargen.AoE_cal.drifttime_corr_plot(aoe_class, data, aoe_param='AoE_Timecorr', aoe_param_corr='AoE_DTcorr', figsize=(12, 8), fontsize=12)¶

pygama.pargen.AoE_cal.fit_time_means(tstamps, means, sigmas)¶

Fit the time dependence of the means of the A/E distribution

Parameters:

tstamps (np.array) – Timestamps of the data
means (np.array) – Means of the A/E distribution
sigmas (np.array) – Sigmas of the A/E distribution
Returns (dict) – Dictionary of the time dependence of the means

pygama.pargen.AoE_cal.get_peak_label(peak)¶

Return type:: str

pygama.pargen.AoE_cal.interpolate_consecutive(tstamps, means, times, aoe_param, output_name)¶

Fit the time dependence of the means of the A/E distribution by average consecutive entries

Parameters:

tstamps (np.array) – Timestamps of the data
means (np.array) – Means of the A/E distribution
sigmas (np.array) – Sigmas of the A/E distribution
times (np.array) – Times of the mean samples in unix time format
Returns (dict) – Dictionary of the time dependence of the means

pygama.pargen.AoE_cal.plot_aoe_mean_time(aoe_class, data, time_param='AoE_Timecorr', figsize=(12, 8), fontsize=12)¶

pygama.pargen.AoE_cal.plot_aoe_res_time(aoe_class, data, time_param='AoE_Timecorr', figsize=(12, 8), fontsize=12)¶

pygama.pargen.AoE_cal.plot_classifier(aoe_class, data, aoe_param='AoE_Classifier', xrange=(900, 3000), yrange=(-50, 10), xn_bins=700, yn_bins=500, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.AoE_cal.plot_compt_bands_overlayed(aoe_class, data, eranges, aoe_param='AoE_Timecorr', aoe_range=None, title='Compton Bands', density=True, n_bins=50, figsize=(12, 8), fontsize=12)¶

Function to plot various compton bands to check energy dependence and corrections

pygama.pargen.AoE_cal.plot_cut_fit(aoe_class, data, dep_acc=0.9, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.AoE_cal.plot_dt_dep(aoe_class, data, eranges, titles=None, aoe_param='AoE_Timecorr', bins=(200, 100), dt_max=2000, figsize=(12, 8), fontsize=12)¶

Function to produce 2d histograms of A/E against drift time to check dependencies

pygama.pargen.AoE_cal.plot_mean_fit(aoe_class, data, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.AoE_cal.plot_sf_vs_energy(aoe_class, data, xrange=(900, 3000), n_bins=701, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.AoE_cal.plot_sigma_fit(aoe_class, data, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.AoE_cal.plot_spectra(aoe_class, data, xrange=(900, 3000), n_bins=2101, xrange_inset=(1580, 1640), n_bins_inset=200, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.AoE_cal.plot_survival_fraction_curves(aoe_class, data, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.AoE_cal.unbinned_aoe_fit(aoe, pdf=<pygama.math.functions.sum_dists.SumDists object>, display=0)¶

Fitting function for A/E, first fits just a Gaussian before using the full pdf to fit if fails will return NaN values

Parameters:

aoe (np.array) – A/E values
pdf (PDF) – PDF to fit to
display (int) – Level of display

Returns:

tuple(np.array, np.array, np.ndarray, tuple) – Tuple of fit values, errors, covariance matrix and full fit info

Return type:

tuple(np.array, np.array)

pygama.pargen.data_cleaning module¶

This module provides routines for calculating and applying quality cuts

pygama.pargen.data_cleaning.find_pulser_properties(df, energy='daqenergy')¶: Searches for pulser in the energy spectrum using time between events in peaks

pygama.pargen.data_cleaning.fit_distributions(x_lo, x_hi, norm_par_array, display=0)¶

pygama.pargen.data_cleaning.generate_cut_classifiers(data, cut_dict, rounding=4, display=0)¶

Finds double sided cut boundaries for a file for the parameters specified

Parameters:

data (lh5 table, dictionary of arrays or pandas dataframe) – data to calculate cuts on

parameters (dict) –

dictionary of the form:

{
    "output_parameter_name": {
        "cut_parameter": "parameter_to_cut_on",
        "cut_level": "number_of_sigmas",
         "mode": "inclusive/exclusive"
    }
}

number of sigmas can instead be a dictionary to specify different cut levels for low and high side or to only have a one sided cut only specify one of the low or high side e.g.

{
    "output_parameter_name": {
        "cut_parameter": "parameter_to_cut_on",
        "cut_level": {"low_side": "3", "high_side": "2"},
        "mode": "inclusive/exclusive"
    }
}

alternatively can specify hit dict fields to just copy dict into output dict e.g.

{
    "is_valid_t0":{
        "expression":"(tp_0_est>a)&(tp_0_est<b)",
        "parameters":{"a":"46000", "b":"52000"}
    }
}

{
    "is_valid_cal":{
        "expression":"(~is_pileup_tail)&(~is_pileup_baseline)"
    }
}

rounding (int) – number of decimal places to round to
display (int) – if 1 will display plots of the cuts if 0 will not display plots

Returns:

dict – dictionary of the form (same as hit dicts):

{
    "output_parameter_name": {
        "expression": "cut_expression",
        "parameters": {"a": lower_bound, "b": upper_bound}
    }
}

plot_dict – dictionary of plots

Return type:

dict

pygama.pargen.data_cleaning.generate_cuts(data, cut_dict, rounding=4, display=0)¶

Finds double sided cut boundaries for a file for the parameters specified

Parameters:

data (lh5 table, dictionary of arrays or pandas dataframe) – data to calculate cuts on

parameters (dict) –

dictionary of the form:

{
    "output_parameter_name": {
        "cut_parameter": "parameter_to_cut_on",
        "cut_level": "number_of_sigmas",
        "mode": "inclusive/exclusive"
    }
}

number of sigmas can instead be a dictionary to specify different cut levels for low and high side or to only have a one sided cut only specify one of the low or high side e.g.

{
    "output_parameter_name": {
        "cut_parameter": "parameter_to_cut_on",
        "cut_level": {"low_side": "3", "high_side": "2"},
        "mode": "inclusive/exclusive"
    }
}

alternatively can specify hit dict fields to just copy dict into output dict e.g.

{
    "is_valid_t0":{
        "expression":"(tp_0_est>a)&(tp_0_est<b)",
        "parameters":{"a":"46000", "b":"52000"}
    }
}

{
    "is_valid_cal":{
        "expression":"(~is_pileup_tail)&(~is_pileup_baseline)"
    }
}

rounding (int) – number of decimal places to round to
display (int) – if 1 will display plots of the cuts if 0 will not display plots

Returns:

dict – dictionary of the form (same as hit dicts):

{
    "output_parameter_name": {
        "expression": "cut_expression",
        "parameters": {"a": "lower_bound", "b": "upper_bound"}
    }
}

plot_dict – dictionary of plots

Return type:

dict

pygama.pargen.data_cleaning.get_cut_indexes(data, cut_parameters)¶: Get the indexes of the data that pass the cuts in

pygama.pargen.data_cleaning.get_keys(in_data, cut_dict)¶: Get the keys of the data that are used in the cut dictionary

pygama.pargen.data_cleaning.get_mode_stdev(par_array)¶

pygama.pargen.data_cleaning.get_tcm_pulser_ids(tcm_file, channel, multiplicity_threshold)¶

pygama.pargen.data_cleaning.skewed_fit(x, n_sig, mu, sigma, alpha)¶

pygama.pargen.data_cleaning.skewed_pdf(x, n_sig, mu, sigma, alpha)¶

pygama.pargen.data_cleaning.tag_pulsers(df, chan_info, window=0.01)¶

pygama.pargen.dplms_ge_dict module¶

This module is for creating dplms dictionary for ge processing

pygama.pargen.dplms_ge_dict.dplms_ge_dict(raw_fft, raw_cal, dsp_config, par_dsp, dplms_dict, fom_func, decay_const=0, ene_par='dplmsEmax', ctc_par='dt_eff', display=0)¶

This function calculates the dplms dictionary for HPGe detectors.

Parameters:

raw_fft (Table) – table with fft raw data
raw_cal (Table) – table with cal raw data
dsp_config (dict) – dsp config file
par_dsp (dict) – Dictionary with db parameters for dsp processing
dplms_dict (dict) – Dictionary with various parameters
fom_func – Function for peak fits

Returns:

out_dict

Return type:

dict

pygama.pargen.dplms_ge_dict.filter_synthesis(ref, nmat, rmat, za, pmat, fmat, length, size, flip=True)¶

Return type:: array

pygama.pargen.dplms_ge_dict.filter_synthesis_corr(ref, nmat_corr, rmat, za, pmat, fmat, length, size, flip=True)¶

Return type:: array

pygama.pargen.dplms_ge_dict.is_not_pile_up(peak_pos, peak_pos_neg, thr, lim, size)¶

Return type:: list[bool]

pygama.pargen.dplms_ge_dict.is_valid_centroid(centroid, lim, size, full_size)¶

Return type:: list[bool]

pygama.pargen.dplms_ge_dict.is_valid_risetime(risetime, llim, perc)¶

pygama.pargen.dplms_ge_dict.noise_matrix(bls, length)¶

Return type:: array

pygama.pargen.dplms_ge_dict.noise_matrix_corr(bls, bls_corr, length)¶

Return type:: ndarray

pygama.pargen.dplms_ge_dict.signal_matrices(wfs, length, decay_const, ff=2)¶

Return type:: array

pygama.pargen.dplms_ge_dict.signal_selection(dsp_cal, dplms_dict, coeff_values)¶

pygama.pargen.dsp_optimize module¶

class pygama.pargen.dsp_optimize.BayesianOptimizer(acq_func, batch_size, kernel=None, sampling_rate=None, fom_value='y_val', fom_error='y_val_err')¶

Bases: object

Bayesian optimiser uses Gaussian Process Regressor from sklearn to fit kernel to data, takes in a series of init samples for this fit and then calculates the next point using the acquisition function specified.

_extend_prior_with_posterior_data(x, y, yerr)¶

_get_expected_improvement(x_new)¶

_get_lcb(x_new)¶

_get_next_probable_point()¶

_get_ucb(x_new)¶

add_dimension(name, parameter, min_val, max_val, round_to_samples=False, unit=None)¶

add_initial_values(x_init, y_init, yerr_init)¶

eta_param = 0¶

get_best_vals()¶

get_first_point()¶

get_n_dimensions()¶

iterate_values()¶

lambda_param = 0.01¶

plot(init_samples=None)¶

plot_acq(init_samples=None)¶

update(results)¶

update_db_dict(db_dict)¶

class pygama.pargen.dsp_optimize.OptimiserDimension(name, parameter, min_val, max_val, round, unit)¶

Bases: tuple

Create new instance of OptimiserDimension(name, parameter, min_val, max_val, round, unit)

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults = {}¶

_fields = ('name', 'parameter', 'min_val', 'max_val', 'round', 'unit')¶

classmethod _make(iterable)¶: Make a new OptimiserDimension object from a sequence or iterable

_replace(**kwds)¶: Return a new OptimiserDimension object replacing specified fields with new values

max_val¶: Alias for field number 3

min_val¶: Alias for field number 2

name¶: Alias for field number 0

parameter¶: Alias for field number 1

round¶: Alias for field number 4

unit¶: Alias for field number 5

class pygama.pargen.dsp_optimize.ParGrid¶

Bases: object

Parameter Grid class Each ParGrid entry corresponds to a dsp parameter to be varied. The ntuples must follow the pattern: ( name parameter value_strs) : ( str, str, list of str) where name and parameter are the same as ‘db.name.parameter’ in the processing chain, value_strs is the array of strings to set the argument to.

add_dimension(name, parameter, value_strs)¶

get_data(i_dim, i_par)¶

get_n_dimensions()¶

get_n_grid_points()¶

get_n_points_of_dim(i)¶

get_par_meshgrid(copy=False, sparse=False)¶: return a meshgrid of parameter values Always uses Matrix indexing (natural for par grid) so that mg[i1][i2][…] corresponds to index order in self.dims Note copy is False by default as opposed to numpy default of True

get_shape()¶

get_zero_indices()¶

iterate_indices(indices)¶: iterate given indices [i1, i2, …] by one. For easier iteration. The convention here is arbitrary, but its the order the arrays would be traversed in a series of nested for loops in the order appearing in dims (first dimension is first for loop, etc): Return False when the grid runs out of indices. Otherwise returns True.

print_data(indices)¶

set_dsp_pars(db_dict, indices)¶

class pygama.pargen.dsp_optimize.ParGridDimension(name, parameter, value_strs)¶

Bases: tuple

Create new instance of ParGridDimension(name, parameter, value_strs)

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults = {}¶

_fields = ('name', 'parameter', 'value_strs')¶

classmethod _make(iterable)¶: Make a new ParGridDimension object from a sequence or iterable

_replace(**kwds)¶: Return a new ParGridDimension object replacing specified fields with new values

name¶: Alias for field number 0

parameter¶: Alias for field number 1

value_strs¶: Alias for field number 2

pygama.pargen.dsp_optimize.get_grid_points(grid)¶: Generates a list of the indices of all possible grid points

pygama.pargen.dsp_optimize.run_bayesian_optimisation(tb_data, dsp_config, fom_function, optimisers, fom_kwargs=None, db_dict=None, nan_val=10, n_iter=10)¶

pygama.pargen.dsp_optimize.run_grid(tb_data, dsp_config, grid, fom_function, db_dict=None, verbosity=1, **fom_kwargs)¶

Extract a table of optimization values for a grid of DSP parameters The grid argument defines a list of parameters and values over which to run the DSP defined in dsp_config on tb_data. At each point, a scalar figure-of-merit is extracted.

Returns a N-dimensional ndarray of figure-of-merit values, where the array axes are in the order they appear in grid.

Parameters:

tb_data (lh5 Table) – An input table of lh5 data. Typically a selection is made prior to sending tb_data to this function: optimization typically doesn’t have to run over all data
dsp_config (dict) – Specifies the DSP to be performed (see build_processing_chain()) and the list of output variables to appear in the output table for each grid point
grid (ParGrid) – See ParGrid class for format
fom_function (function) – When given the output lh5 table of this DSP iteration, the fom_function must return a scalar figure-of-merit. Should accept verbosity as a second keyword argument
db_dict (dict (optional)) – DSP parameters database. See build_processing_chain for formatting info
verbosity (int (optional)) – verbosity for the processing chain and fom_function calls
**fom_kwargs – Any keyword arguments for fom_function

Returns:

grid_values (ndarray of floats) – An N-dimensional numpy ndarray whose Mth axis corresponds to the Mth row of the grid argument

pygama.pargen.dsp_optimize.run_grid_point(tb_data, dsp_config, grids, fom_function, iii, db_dict=None, verbosity=1, fom_kwargs=None)¶: Runs a single grid point for the index specified

pygama.pargen.dsp_optimize.run_one_dsp(tb_data, dsp_config, db_dict=None, fom_function=None, verbosity=0, fom_kwargs=None)¶

run one iteration of DSP on tb_data

Optionally returns a value for optimization

Parameters:

tb_data (lh5 Table) – An input table of lh5 data. Typically a selection is made prior to sending tb_data to this function: optimization typically doesn’t have to run over all data
dsp_config (dict) – Specifies the DSP to be performed for this iteration (see build_processing_chain()) and the list of output variables to appear in the output table
db_dict (dict (optional)) – DSP parameters database. See build_processing_chain for formatting info
fom_function (function or None (optional)) – When given the output lh5 table of this DSP iteration, the fom_function must return a scalar figure-of-merit value upon which the optimization will be based. Should accept verbosity as a second argument
verbosity (int (optional)) – verbosity for the processing chain and fom_function calls
fom_kwargs – any keyword arguments to pass to the fom

Returns:

figure_of_merit (float) – If fom_function is not None, returns figure-of-merit value for the DSP iteration
tb_out (lh5 Table) – If fom_function is None, returns the output lh5 table for the DSP iteration

pygama.pargen.energy_cal module¶

routines for automatic calibration.

hpge_find_energy_peaks (Find uncalibrated E peaks whose E spacing matches the pattern in peaks_kev)
hpge_get_energy_peaks (Get uncalibrated E peaks at the energies of peaks_kev)
hpge_fit_energy_peaks (fits the energy peals)
hpge_E_calibration (main routine – finds and fits peaks specified)

class pygama.pargen.energy_cal.FWHMLinear¶

Bases: object

static bounds(ys)¶

static func(x, a, b)¶

static guess(xs, ys, y_errs)¶

static string_func(input_param)¶

class pygama.pargen.energy_cal.FWHMQuadratic¶

Bases: object

static bounds(ys)¶

static func(x, a, b, c)¶

static guess(xs, ys, y_errs)¶

static string_func(input_param)¶

class pygama.pargen.energy_cal.HPGeCalibration(energy_param, glines, guess_kev, deg=1, uncal_is_int=False, fixed=None, debug_mode=False)¶

Bases: object

Calibrate HPGe data to a set of known peaks. Class stores the calibration parameters as well as the peaks locations and energies used. Each function called updates a results dictionary with any additional information which is stored in the class.

Parameters:

e_uncal (array) – uncalibrated energy data
glines (array) – list of peak energies to be fit to. Each must be in the data
guess_kev (float) – a rough initial guess at the conversion factor from e_uncal to kev. Must be positive
deg (non-negative int) – degree of the polynomial for the E_cal function E_kev = poly(e_uncal). deg = 0 corresponds to a simple scaling E_kev = scale * e_uncal. Otherwise follows the convention in np.polynomial.polynomial of lowest order to highest order
uncal_is_int (bool) – if True, attempts will be made to avoid picket-fencing when binning e_uncal
fixed (dict) – dictionary of fixed parameters for the calibration function

calibrate_prominent_peak(e_uncal, peak, peak_pars, allowed_p_val=1e-20, tail_weight=0, peak_param='mode', n_events=None)¶

fit_calibrated_peaks(e_uncal, peak_pars)¶

static fit_energy_res_curve(fwhm_func, fwhm_peaks, fwhms, dfwhms)¶

full_calibration(e_uncal, peak_pars, allowed_p_val=1e-20, tail_weight=0, peak_param='mode', n_events=None)¶

gen_pars_dict()¶

Generate a dictionary containing the expression and parameters used for energy calibration.

Returns:

dict: A dictionary with keys ‘expression’ and ‘parameters’.: ‘expression’ is a string representing the energy calibration expression. ‘parameters’ is a dictionary containing the parameter values used in the expression.

get_energy_res_curve(fwhm_func, interp_energy_kev=None)¶

get_fwhms()¶: Updates last results dictionary with fwhms in kev

hpge_cal_energy_peak_tops(e_uncal, n_sigmas=1.2, peaks_kev=None, default_n_bins=50, n_events=None, allowed_p_val=0.01, update_cal_pars=True)¶

Perform energy calibration for HPGe detector using peak fitting.

Args:

e_uncal (array-like):: Uncalibrated energy values.
n_sigmas (float, optional):: Number of standard deviations to use for peak fitting range. Defaults to 1.2.
peaks_kev (array-like, optional):: Known peak positions in keV. If not provided, uses self.peaks_kev. Defaults to None.
default_n_bins (int, optional):: Number of bins for histogram. Defaults to 50.
n_events (int, optional):: Number of events to consider for calibration. Defaults to None which uses all events.
allowed_p_val (float, optional):: Maximum p-value for a fit to be considered valid. Defaults to 0.05.
update_cal_pars (bool, optional):: Whether to update the calibration parameters. Defaults to True.

hpge_find_energy_peaks(e_uncal, peaks_kev=None, n_sigma=5, etol_kev=None, bin_width_kev=1, erange=None, var_zero=1, update_cal_pars=True)¶

Find uncalibrated energy peaks whose energy spacing matches the pattern in peaks_kev.

Parameters:

(array-like) (e_uncal) – Uncalibrated energy values.
(array-like (peaks_kev) – Pattern of energy peaks to match. If not provided, the peaks from the object’s attribute peaks_kev will be used.
optional) – Pattern of energy peaks to match. If not provided, the peaks from the object’s attribute peaks_kev will be used.
(float (var_zero) – Number of standard deviations above the mean to consider a peak significant. Default is 5.
optional) – Number of standard deviations above the mean to consider a peak significant. Default is 5.
(float – Tolerance in energy units for matching peaks. If not provided, it will be estimated based on the peak widths.
optional) – Tolerance in energy units for matching peaks. If not provided, it will be estimated based on the peak widths.
(float – Width of the energy bins for initial peak search. Default is 1 keV.
optional) – Width of the energy bins for initial peak search. Default is 1 keV.
(tuple (erange) – Range of uncalibrated energy values to consider. If not provided, the range will be determined based on the peaks.
optional) – Range of uncalibrated energy values to consider. If not provided, the range will be determined based on the peaks.
(float – Value to replace zero variance with. Default is 1.
optional) – Value to replace zero variance with. Default is 1.
(bool (update_cal_pars) – Whether to update the calibration parameters. Default is True.
optional) – Whether to update the calibration parameters. Default is True.
Returns – None

hpge_fit_energy_peaks(e_uncal, peak_pars=None, peaks_kev=None, bin_width_kev=1, peak_param='mode', method='unbinned', n_events=None, allowed_p_val=0.01, tail_weight=0, update_cal_pars=True, use_bin_width_in_fit=True)¶

Fit the energy peaks specified using the given function.

Parameters:

e_uncal (array) – Unbinned energy data to be fit.
peaks_kev (array, optional) – Array of energy values for the peaks to fit. If not provided, it uses the peaks_kev attribute of the class.
peak_pars (list of tuples, optional) – List containing tuples of the form (peak, range, func) where peak is the energy of the peak to fit, range is the range in keV to fit, and func is the function to fit.
bin_width_kev (int, optional) – Default binwidth to use for the fit window histogramming. Default is 1 keV.
peak_param (str, optional) – Parameter to use for peak fitting. Default is “mode”.
method (str, optional) – Method to use for fitting. Default is “unbinned”. Can specify to use binned fit method instead.
n_events (int, optional) – Number of events to use for unbinned fit.
allowed_p_val (float, optional) – Lower limit on p-value of fit.
tail_weight (int, optional) – Weight to apply to the tail of the fit.
update_cal_pars (bool, optional) – Whether to update the calibration parameters. Default is True.

Returns:

results_dict (dict) – Dictionary containing the fit results for each peak.

Raises:

RuntimeError – If the fit fails.

Notes

This function fits the energy peaks specified using the given function. It calculates the range around each peak to fit, performs the fitting using either unbinned or binned method, and returns the fit results in a dictionary.

hpge_get_energy_peaks(e_uncal, peaks_kev=None, n_sigma=3, etol_kev=5, var_zero=1, bin_width_kev=0.2, update_cal_pars=True, erange=None)¶

Get uncalibrated E peaks at the energies of peaks_kev

Parameters:

e_uncal (array) – Uncalibrated energy values.
peaks_kev (array, optional) – Energies of peaks to search for (in keV). If not provided, the peaks_kev attribute of the object will be used.
n_sigma (float, optional) – Threshold for detecting a peak in sigma (i.e. sqrt(var)). Default is 3.
etol_kev (float, optional) – Absolute tolerance in energy for matching peaks. Default is 5.
var_zero (float, optional) – Number used to replace zeros of var to avoid divide-by-zero in hist/sqrt(var). Default is 1. Usually when var = 0, it’s because hist = 0, and any value here is fine.
bin_width_kev (float, optional) – Width of the energy bins for re-binning the histogram. Default is 0.2 keV.
update_cal_pars (bool, optional) – Flag indicating whether to update the calibration parameters. Default is True.
erange (tuple, optional) – Range of energy values to consider for peak search. If not provided, the range will be determined automatically based on the peaks_kev values.

Returns:

None

Notes

This method performs the following steps: 1. Re-bins the histogram in ~0.2 keV bins with updated energy scale parameters for peak-top fits. 2. Finds all local maxima in the histogram with significance greater than n_sigma. 3. Matches the calculated peak energies with the expected peak energies. 4. Removes duplicate peak matches. 5. Updates the input peaks, got peaks, and got peak locations in the results dictionary. 6. If update_cal_pars is True, calculates the updated calibration curve using the matched peak energies.

static interpolate_energy_res(fwhm_func, fwhm_peaks, fwhm_results, interp_energy_kev=None, debug_mode=False)¶

plot_cal_fit(data, figsize=(12, 8), fontsize=12, erange=(200, 2700))¶

plot_cal_fit_with_errors(data, figsize=(10, 6), fontsize=12, erange=(200, 2700))¶

plot_eres_fit(data, erange=(200, 2700), figsize=(12, 8), fontsize=12)¶

plot_fits(energies, figsize=(12, 8), fontsize=12, ncols=3, nrows=3, binning_kev=5)¶

update_results_dict(results_dict)¶

class pygama.pargen.energy_cal.TailPrior(data, model, tail_weight=0)¶

Bases: object

Generic least-squares cost function with error.

_call(*pars)¶

_value(*pars)¶

errordef = 0.5¶

verbose = 0¶

pygama.pargen.energy_cal.average_counts_check(hist, bins, var, threshold=1)¶

pygama.pargen.energy_cal.get_hpge_energy_bounds(func, parguess)¶

pygama.pargen.energy_cal.get_hpge_energy_fixed(func)¶

Get the fixed indexes for fitting and mask for parameters based on the given function.

Parameters:

func (function) – The function for which the fixed indexes and mask are to be determined.

Returns:

fixed (list) – A sequence list of fixed indexes for fitting.
mask (ndarray) – A boolean mask indicating which parameters are fixed (False) and which are not fixed (True).

pygama.pargen.energy_cal.get_hpge_energy_peak_par_guess(energy, func, fit_range=None, bin_width=None, mode_guess=None)¶

Get parameter guesses for func fit to peak in hist

Parameters:

energy (array) – An array of energy values in the range around the peak for guessing.
func (function) – The function to be fit to the peak in the histogram.
fit_range (tuple, optional) – A tuple specifying the range around the peak to perform the fit. If not provided, the entire range of energy values will be used.
bin_width (float, optional) – The width of the bins in the histogram. Default is 1.
mode_guess (float, optional) – A guess for the mode (mu) parameter of the function. If not provided, it will be estimated from the data.

Returns:

ValueView – A ValueView object from iminuit containing the parameter guesses for the function fit.

Notes

This function calculates parameter guesses for fitting a function to a peak in a histogram. It uses various methods to estimate the parameters based on the provided energy values and the selected function.

If the function is ‘gauss_on_step’, the following parameters will be estimated: - n_sig: Number of signal events in the peak. - mu: Mean of the peak. - sigma: Standard deviation of the peak. - n_bkg: Number of background events. - hstep: Height of the step between the peak and the background. - x_lo: Lower bound of the fit range. - x_hi: Upper bound of the fit range.

If the function is ‘hpge_peak’, the following parameters will be estimated: - n_sig: Number of signal events in the peak. - mu: Mean of the peak. - sigma: Standard deviation of the peak. - htail: Height of the tail component. - tau: Decay constant of the tail component. - n_bkg: Number of background events. - hstep: Height of the step between the peak and the background. - x_lo: Lower bound of the fit range. - x_hi: Upper bound of the fit range.

If the provided function is not implemented, an error will be raised.

Examples

>>> energy = [1, 2, 3, 4, 5]
>>> func = pgf.gauss_on_step
>>> fit_range = (2, 4)
>>> bin_width = 0.5
>>> mode_guess = 3.5
>>> get_hpge_energy_peak_par_guess(energy, func, fit_range, bin_width, mode_guess)
{'n_sig': 3, 'mu': 3.5, 'sigma': 0.5, 'n_bkg': 2, 'hstep': 0.5, 'x_lo': 2, 'x_hi': 4}

pygama.pargen.energy_cal.hpge_fit_energy_cal_func(mus, mu_vars, energies_kev, energy_scale_pars, deg=0, fixed=None)¶

Find best fit of E = poly(mus +/- sqrt(mu_vars)) This is an inversion of hpge_fit_energy_scale. E uncertainties are computed from mu_vars / dmu/dE where mu = poly(E) is the E_scale function

Parameters:

mus (array) – uncalibrated energies
mu_vars (array) – variances in the mus
energies_kev (array) – energies to fit to, in kev
energy_scale_pars (array) – Parameters from the escale fit (kev to ADC) used for calculating uncertainties
deg (int) – degree for energy scale fit. deg=0 corresponds to a simple scaling mu = scale * E. Otherwise deg follows the definition in np.polyfit
fixed (dict) – dict where keys are index of polyfit pars to fix and vals are the value to fix at, can be None to fix at guess value

Returns:

pars (array) – parameters of the best fit. Follows the convention in np.polyfit
cov (2D array) – covariance matrix for the best fit parameters.

pygama.pargen.energy_cal.hpge_fit_energy_peak_tops(hist, bins, var, peak_locs, n_to_fit=7, cost_func='Least Squares', inflate_errors=False, gof_method='var', debug_mode=False)¶

Fit gaussians to the tops of peaks

Parameters:

hist (array, array, array) – Histogram of uncalibrated energies, see pgh.get_hist()
bins (array, array, array) – Histogram of uncalibrated energies, see pgh.get_hist()
var (array, array, array) – Histogram of uncalibrated energies, see pgh.get_hist()
peak_locs (array) – locations of peaks in hist. Must be accurate two within +/- 2*n_to_fit
n_to_fit (int) – number of hist bins near the peak top to include in the gaussian fit
cost_func (bool (optional)) – Flag passed to gauss_mode_width_max()
inflate_errors (bool (optional)) – Flag passed to gauss_mode_width_max()
gof_method (str (optional)) – method flag passed to gauss_mode_width_max()

Returns:

pars_list (list of array) – a list of best-fit parameters (mode, sigma, max) for each peak-top fit
cov_list (list of 2D arrays) – a list of covariance matrices for each pars

pygama.pargen.energy_cal.hpge_fit_energy_scale(mus, mu_vars, energies_kev, deg=0, fixed=None)¶

Find best fit of poly(E) = mus +/- sqrt(mu_vars) Compare to hpge_fit_energy_cal_func which fits for E = poly(mu)

Parameters:

mus (array) – uncalibrated energies
mu_vars (array) – variances in the mus
energies_kev (array) – energies to fit to, in kev
deg (int) – degree for energy scale fit. deg=0 corresponds to a simple scaling mu = scale * E. Otherwise deg follows the definition in np.polyfit
fixed (dict) – dict where keys are index of polyfit pars to fix and vals are the value to fix at, can be None to fix at guess value

Returns:

pars (array) – parameters of the best fit. Follows the convention in np.polyfit
cov (2D array) – covariance matrix for the best fit parameters.

pygama.pargen.energy_cal.poly_match(xx, yy, deg=-1, rtol=1e-05, atol=1e-08, fixed=None)¶

Find the polynomial function best matching pol(xx) = yy

Finds the poly fit of xx to yy that obtains the most matches between pol(xx) and yy in the np.isclose() sense. If multiple fits give the same number of matches, the fit with the best gof is used, where gof is computed only among the matches. Assumes that the relationship between xx and yy is monotonic

Parameters:

xx (array-like) – domain data array. Must be sorted from least to largest. Must satisfy len(xx) >= len(yy)
yy (array-like) – range data array: the values to which pol(xx) will be compared. Must be sorted from least to largest. Must satisfy len(yy) > max(2, deg+2)
deg (int) – degree of the polynomial to be used. If deg = 0, will fit for a simple scaling: scale * xx = yy. If deg = -1, fits to a simple shift in the data: xx + shift = yy. Otherwise, deg is equivalent to the deg argument of np.polyfit()
rtol (float) – the relative tolerance to be sent to np.isclose()
atol (float) – the absolute tolerance to be sent to np.isclose(). Has the same units as yy.

Returns:

pars (None or array of floats) – The parameters of the best fit of poly(xx) = yy. Follows the convention used for the return value “p” of polyfit. Returns None when the inputs are bad.
i_matches (list of int) – list of indices in xx for the matched values in the best match

pygama.pargen.energy_cal.poly_wrapper(x, *pars)¶

pygama.pargen.energy_cal.sum_bins(hist, bins, var, threshold=5)¶

pygama.pargen.energy_cal.unbinned_staged_energy_fit(energy, func, gof_range=None, fit_range=None, guess=None, guess_func=<function get_hpge_energy_peak_par_guess>, bounds_func=<function get_hpge_energy_bounds>, fixed_func=<function get_hpge_energy_fixed>, guess_kwargs=None, bounds_kwargs=None, fixed_kwargs=None, tol=None, tail_weight=0, allow_tail_drop=True, bin_width=None, lock_guess=False, p_val_threshold=1e-19, display=0)¶: Unbinned fit to energy. This is different to the default fitting as it will try different fitting methods and choose the best. This is necessary for the lower statistics.

pygama.pargen.energy_optimisation module¶

This module contains the functions for performing the energy optimisation. This happens in 2 steps, firstly a grid search is performed on each peak separately using the optimiser, then the resulting grids are interpolated to provide the best energy resolution at Qbb

pygama.pargen.energy_optimisation.fom_fwhm_no_alpha_sweep(tb_in, kwarg_dict, ctc_param=None, alpha=0, idxs=None, frac_max=0.5, kev=True, display=0)¶: FOM with no ctc sweep, used for optimising ftp.

pygama.pargen.energy_optimisation.fom_fwhm_with_alpha_fit(tb_in, kwarg_dict, ctc_parameter, nsteps=11, idxs=None, frac_max=0.2, display=0)¶: FOM for sweeping over ctc values to find the best value, returns the best found fwhm with its error, the corresponding alpha value and the number of events in the fitted peak, also the reduced chisquare of the

pygama.pargen.energy_optimisation.fom_interpolate_energy_res_with_single_peak_alpha_sweep(data, kwarg_dict, display=0)¶

pygama.pargen.energy_optimisation.fom_single_peak_alpha_sweep(data, kwarg_dict, display=0)¶

pygama.pargen.energy_optimisation.get_peak_fwhm_with_dt_corr(energies, alpha, dt, func, peak, kev_width, guess=None, kev=False, frac_max=0.5, bin_width=1, allow_tail_drop=False, display=0)¶: Applies the drift time correction and fits the peak returns the fwhm, fwhm/max and associated errors, along with the number of signal events and the reduced chi square of the fit. Can return result in ADC or keV.

pygama.pargen.energy_optimisation.simple_guess(energy, func, fit_range=None, bin_width=None)¶: Simple guess for peak fitting

pygama.pargen.lq_cal module¶

class pygama.pargen.lq_cal.LQCal(cal_dicts, cal_energy_param, dt_param, eres_func, cdf=<pygama.math.functions.gauss.GaussianGen object>, selection_string='is_valid_cal&is_not_pulser', debug_mode=False)¶

Bases: object

A class for calibrating the LQ parameter and determining the LQ cut value

Parameters:

cal_dicts (dict) – A dictionary containing the hit-level operations to apply to the data.
cal_energy_param (string) – The calibrated energy parameter of choice
dt_param (string) – The drift-time parameter of choice
eres_function (callable) – The energy resolutions function
cdf (callable) – The CDF used for the binned fits
selection_string (string) – A string of flags to apply the data when running the calibration
debug_mode (boolean) – Determines if the class is initialized in debug mode

calibrate(df, initial_lq_param)¶: Run the LQ calibration and calculate the cut value

drift_time_correction(df, lq_param, cal_energy_param, display=0)¶

Deterimines the drift time correction parameters for LQ by fitting a degree 1 polynomial to the LQ vs drift time distribution for DEP events. Corrects for any linear dependence and centers the final LQ distribution to a mean of 0.

get_cut_lq_dep(df, lq_param, cal_energy_param)¶

Determines the cut value for LQ. Value is calculated by fitting the LQ distribution for events in the DEP to a gaussian. The LQ values are normalized by the fitted sigma, and the cut value is set to a value of 3. Sideband subtraction is used to determine the LQ distribution for DEP events. Events greater than the cut value fail the cut.

lq_timecorr(df, lq_param, output_name='LQ_Timecorr', display=0)¶: Calculates the average LQ value for DEP events for each specified run_timestamp. Applies a time normalization based on the average LQ value in the DEP across all run_timestamps.

update_cal_dicts(update_dict)¶

pygama.pargen.lq_cal.binned_lq_fit(df, lq_param, cal_energy_param, peak, cdf=<pygama.math.functions.gauss.GaussianGen object>, sidebands=True)¶

Function for fitting a distribution of LQ values within a specified: energy peak. Fits a gaussian to the distribution

Parameters:

df (pd.DataFrame()) – Dataframe containing the data for fitting. Data must contain the desired lq parameter and the calibrated energy
lq_param (string) – Name of the LQ parameter to fit
cal_energy_param (string) – Name of the calibrated energy parameter of choice
peak (float) – Energy value, in keV, of the peak who’s LQ distribution will be fit
cdf (callable) – Function to be used for the binned fit
sidebands (bool) – Whether or not to perform a sideband subtraction when fitting the LQ distribution

Returns:

m1.values (array-like object) – Resulting parameter values from the peak fit
m1.errors (array-like object) – Resulting parameter errors from the peak fit
hist (array) – Histogram that was used for the binned fit
bins (array) – Array of bin edges used for the binned fit

pygama.pargen.lq_cal.calculate_time_means(df, lq_param, cal_energy_param, peak, sidebands=True)¶

Function for calculating the arithmetic mean and sigma for LQ events in a specified peak

pygama.pargen.lq_cal.fit_time_means(tstamps, means, reses)¶

pygama.pargen.lq_cal.get_fit_range(lq)¶

Function for determining the fit range for a given distribution of lq values

Return type:: tuple(float, float)

pygama.pargen.lq_cal.get_lq_hist(df, lq_param, cal_energy_param, peak, sidebands=True)¶

Function for getting a distribution of LQ values for a given peak. Returns a histogram of the LQ distribution as well as an array of bin edges

pygama.pargen.lq_cal.plot_classifier(lq_class, data, lq_param='LQ_Classifier', xrange=(800, 3000), yrange=(-10, 30), xn_bins=700, yn_bins=500, figsize=(12, 8), fontsize=12)¶

Return type:: figure

pygama.pargen.lq_cal.plot_drift_time_correction(lq_class, data, lq_param='LQ_Timecorr', figsize=(12, 8), fontsize=12)¶

Plots a 2D histogram of LQ versus effective drift time in a 6 keV window around the DEP. Additionally plots the fit results for the drift time correction.

Return type:: figure

pygama.pargen.lq_cal.plot_lq_cut_fit(lq_class, data, figsize=(12, 8), fontsize=12)¶

Plots the final histogram of LQ values for events in the DEP, and the fit results used for determining the cut value

Return type:: figure

pygama.pargen.lq_cal.plot_lq_mean_time(lq_class, data, lq_param='LQ_Timecorr', figsize=(12, 8), fontsize=12)¶

Plots the mean LQ value calculated for each given timestamp

Return type:: figure

pygama.pargen.lq_cal.plot_sf_vs_energy(lq_class, data, xrange=(900, 3000), n_bins=701, figsize=(12, 8), fontsize=12)¶

Plots the survival fraction as a function of energy

Return type:: figure

pygama.pargen.lq_cal.plot_spectra(lq_class, data, xrange=(900, 3000), n_bins=2101, xrange_inset=(1580, 1640), n_bins_inset=200, figsize=(12, 8), fontsize=12)¶

Plots a 2D histogram of the LQ classifier vs calibrated energy

Return type:: figure

pygama.pargen.lq_cal.plot_survival_fraction_curves(lq_class, data, figsize=(12, 8), fontsize=12)¶

Plots the survival fraction curves as a function of LQ cut values for every peak of interest

Return type:: figure

pygama.pargen.noise_optimization module¶

This module contains the functions for performing the filter optimisation. This happens with a grid search performed on ENC peak.

pygama.pargen.noise_optimization.calculate_spread(energies, percentile_low, percentile_high, n_samples)¶

pygama.pargen.noise_optimization.noise_optimization(tb_data, dsp_proc_chain, par_dsp, opt_dict, lh5_path, display=0)¶

This function calculates the optimal filter par. :param tb_data: raw table to run the macro on :type tb_data: str :param dsp_proc_chain: Path to minimal dsp config file :type dsp_proc_chain: str :param par_dsp: Dictionary with default dsp parameters :type par_dsp: str :param opt_dict: Dictionary with parameters for optimization :type opt_dict: str :param lh5_path: Name of channel to process, should be name of lh5 group in raw files :type lh5_path: str

Returns:: res_dict (dict)
Return type:: dict

pygama.pargen.noise_optimization.simple_gaussian_fit(energies, dx=1, sigma_thr=4, allowed_p_val=1e-20)¶

pygama.pargen.noise_optimization.simple_gaussian_guess(hist, bins, func, toll=0.2)¶

pygama.pargen.pz_correct module¶

This module is for extracting the pole zero constants from the decay tail

class pygama.pargen.pz_correct.PZCorrect(dsp_config, wf_field, debug_mode=False)¶

Bases: object

get_dpz_decay_constants(tb_data, percent_tau1_fit=0.1, percent_tau2_fit=0.2, offset_from_wf_max=10, superpulse_bl_idx=25, superpulse_window_width=13, display=0)¶

Gets values for the DPZ time constants in 3 stages:

Perform a linear fit to the start and end of the decaying tail of a superpulse
Use those initial guesses to seed a LSQ fit to a DPZ model of the sum of two decaying exponentials
Use the results of the model fit as initial guesses in a DSP routine that optimizes the flatness of the decaying tail

The first step in this process is to generate a superpulse from high-energy waveforms.

Parameters:

tb_data (Table) – Table containing high-energy event waveforms and daqenergy. Pulser events must have already been removed from this table
percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.
percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.
offset_from_wf_max (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.
superpulse_bl_idx (int) – The index at which to stop when computing the mean value of a baseline, used for DPZ
superpulse_window_width (int) – The window of acceptance for tp100s while selecting waveforms to make a superpulse, with a center set at the median tp100 value. If a tp100 falls outside this window, ignore this waveform for the superpulse
display (int) – An integer. If greater than 1, plots and shows the attempts to fit the long and short time constants. If greater than 0, saves the plot to a dictionary.

Returns:

tau_dict – dictionary of form {‘dpz’: {‘tau1’: decay_constant1, ‘tau2’: decay_constant2, ‘frac’: fraction}}
out_plot_dict – A dictionary containing monitoring plots

Return type:

dict

Notes

tau1 is the shorter time constant, tau2 is the longer, and frac is the amount of the larger time constant present in the sum of the two exponentials

get_single_decay_constant(tb_data, slope_param='tail_slope', display=0)¶

Finds the decay constant from the modal value of the tail slope after cuts and saves it to the specified json. Updates self.output_dict with tau value

Parameters:

slopes (-)
wfs (-)

plot_slopes(tb_data, slope_field, with_correction=False, display=0, figsize=(8, 6), fontsize=8)¶

plot_waveforms_after_correction(tb_data, wf_field, xlim=(0, 1024), ylim=None, norm_param=None, display=0, figsize=(8, 6), fontsize=8, n_waveforms=100, downsample=1)¶

pygama.pargen.pz_correct.dpz_model(ts, amp, tau1, tau2, f2)¶

Models the double pole zero function as A*[(1-f2)*exp(-t/tau1) + f2*exp(-t/tau2)], this is the expected shape of an HPGe waveform. Used in performing fits in dpz_model_fit()

Parameters:

ts (array) – Array of time point values, in samples
A – The overall amplitude of the waveform
tau1 (float) – The short time constant present
tau2 (float) – The long time constant present
f2 (float) – The fraction of the long time constant present in the overall waveform

Return type:

array

pygama.pargen.pz_correct.dpz_model_fit(waveform, percent_tau1_fit, percent_tau2_fit, idx_shift, plot)¶

Parameters:

waveform (array) – An array containing waveform data that has been baseline subtracted
percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.
percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.
idx_shift (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.
plot (int) – An integer, that if greater than 1 plots and shows the fit results, if greater than 0 saves the plots to a dictionary

Returns:

tau1 – A guess of the shorter double pole zero time constant
tau2 – A guess of the longer double pole zero time constant
frac2 – A guess of the fraction in the double pole zero of the longer time constant
out_plot_dict – A dictionary containing the output plots if requested

Return type:

tuple[float, float, float, dict]

Notes

Extracts the double pole zero parameters by fitting to an analytic model, utilizing the best-guess from linear fitting. Fits to the function f(t) = A*[(1-f2)*exp(-t/tau1)+f2*exp(-t/tau2)] It fits from the maximum of the waveform onwards to the end of the tail. The waveform must be baseline subtracted in order to get the best fit results

pygama.pargen.pz_correct.linear_dpz_fit(waveform, percent_tau1_fit, percent_tau2_fit, idx_shift, plot)¶

Parameters:

waveform (array) – An array containing waveform data
percent_tau1_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the start of the tail.
percent_tau2_fit (float) – The fractional percent of the length of the tail of the waveform to fit, used to fit the end of the tail.
idx_shift (int) – The number of indices off the maximum of the waveform to start the fit of the tail. Usually ~100 samples works fine.
plot (int) – An integer, if greater than 1 plots and shows the fit results, if greater than 0 saves the plot to a dictionary.

Returns:

tau1 – A guess of the shorter double pole zero time constant
tau2 – A guess of the longer double pole zero time constant
frac – A guess of the fraction in the double pole zero of the longer time constant
out_plot_dict – A dictionary containing the output plots if requested

Return type:

tuple[float, float, float, dict]

Notes

Extracts an initial guess of the double pole zero parameters by performing two linear fits to the log of the waveform. The first log-fit is to long time scales, and fits the long time constant tau2. This piece is subtracted off the waveform, and then the shorter time constant is fit. The average of the fractions returned by the intercepts of these fits is returned.

The waveform must be baseline subtracted to provide accurate fit results!

pygama.pargen.pz_correct.tp100_align(wfs, tp100_window_width, tp100s)¶

Align provided waveforms at their maximum, so that an average over all the waveforms creates a valid superpulse to fit using dpz_model_fit(). Do this without sacrificing too much of the length of the decaying tail

Parameters:

wfs (array) – An array of arrays containing waveforms
tp100_window_width (int) – The window of acceptance, with a center set at the median tp100 value. If a tp100 falls outside this window, ignore this waveform for the superpulse
tp100s (array) – An array containing the indices of the maximums of the waveforms, in samples

Returns:

time_aligned_wfs – An array of waveforms that are all aligned at their maximal values

Return type:

array

pygama.pargen.survival_fractions module¶

This module provides functions for calculating survival fractions for a cut.

pygama.pargen.survival_fractions.compton_sf(cut_param, low_cut_val, high_cut_val=None, mode='greater', data_mask=None)¶

Function for calculating the survival fraction of a cut in a compton region with a simple counting analysis.

Parameters:

cut_param (array) – array of the cut parameter for the survival fraction calculation, should have the same length as energy
low_cut_val (float) – value of the cut parameter to be used for the survival fraction calculation
high_cut_val (float) – upper value for the cut parameter to have a range in the cut value
mode (str) – mode of the cut, either “greater” or “less”
data_mask (array) – mask for the data to be used in the fit

Returns:

sf (dict) – dictionary containing the low cut value, survival fraction and error on the survival fraction

Return type:

dict

pygama.pargen.survival_fractions.compton_sf_sweep(energy, cut_param, final_cut_value, data_mask=None, cut_range=(-5, 5), n_samples=51, mode='greater')¶

Function sweeping through cut values and calculating the survival fraction for each value using a simple counting analysis.

Parameters:

energy (array) – array of energies
cut_param (array) – array of the cut parameter for the survival fraction calculation, should have the same length as energy
final_cut_val (float) – value of the final cut parameter to be used for the survival fraction calculation
data_mask (array) – mask for the data to be used in the fit
cut_range (tuple) – range of the cut values to be swept through
n_samples (int) – number of samples to be taken in the cut range
mode (str) – mode of the cut, either “greater” or “less”

Returns:

out_df (Dataframe) – Dataframe of cut values with the survival fraction and error
sf (float) – survival fraction
err (float) – error on the survival fraction

Return type:

tuple(pd.DataFrame, float, float)

pygama.pargen.survival_fractions.energy_guess(energy, func_i, fit_range=None, bin_width=1, peak=None, eres=None)¶: Simple guess for peak fitting

pygama.pargen.survival_fractions.fail_pdf_gos(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, hstep1, hstep2)¶: pdf for gauss on step fit reparamerised to calculate the efficiency of the cut this is the cut pdf

pygama.pargen.survival_fractions.fail_pdf_hpge(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, htail, tau, hstep1, hstep2)¶: pdf for hpge peak fit reparamerised to calculate the efficiency of the cut this is the cut pdf

pygama.pargen.survival_fractions.fix_all_but_nevents(func)¶: Fixes all parameters except the number of signal and background events and their efficiencies Returns: Sequence list of fixed indexes for fitting and mask for parameters

pygama.pargen.survival_fractions.get_bounds(func, parguess)¶: Gets bounds for the fit parameters

pygama.pargen.survival_fractions.get_sf_sweep(energy, cut_param, final_cut_value=None, peak=1592.5, eres_pars=None, data_mask=None, cut_range=(-5, 5), n_samples=26, mode='greater', fit_range=None, debug_mode=False)¶

Function sweeping through cut values and calculating the survival fraction for each value using a fit to the surviving and failing enegry distributions.

Parameters:

energy (array) – array of energies
cut_param (array) – array of the cut parameter for the survival fraction calculation, should have the same length as energy
final_cut_val (float) – value of the final cut parameter to be used for the survival fraction calculation
peak (float) – energy of the peak to be fitted
eres_pars (float) – energy resolution parameter for the peak
data_mask (array) – mask for the data to be used in the fit
cut_range (tuple) – range of the cut values to be swept through
n_samples (int) – number of samples to be taken in the cut range
mode (str) – mode of the cut, either “greater” or “less”
fit_range (tuple) – range of the fit in keV
debug_mode (bool) – option to raise an error if there is an issue with

Returns:

out_df (Dataframe) – Dataframe of cut values with the survival fraction and error
sf (float) – survival fraction
err (float) – error on the survival fraction

Return type:

tuple(pd.DataFrame, float, float)

pygama.pargen.survival_fractions.get_survival_fraction(energy, cut_param, cut_val, peak, eres_pars, fit_range=None, high_cut=None, pars=None, data_mask=None, mode='greater', func=<pygama.math.functions.sum_dists.SumDists object>, fix_step=True, display=0)¶

Function for calculating the survival fraction of a cut using a fit to the surviving and failing energy distributions.

Parameters:

energy (array) – array of energies
cut_param (array) – array of the cut parameter for the survival fraction calculation, should have the same length as energy
cut_val (float) – value of the cut parameter to be used for the survival fraction calculation
peak (float) – energy of the peak to be fitted
eres_pars (float) – energy resolution parameter for the peak
fit_range (tuple) – range of the fit in keV
high_cut (float) – upper value for the cut parameter to have a range in the cut value
pars (iMinuit ValueView) – initial parameters for the fit
data_mask (array) – mask for the data to be used in the fit
mode (str) – mode of the cut, either “greater” or “less”
func (function) – function to be used in the fit
fix_step (bool) – option to fix the step parameters in the fit
display (int) – option to display the fit if greater than 1

Returns:

sf (float) – survival fraction
err (float) – error on the survival fraction
values (iMinuit ValueView) – values of the parameters of the fit
errors (iMinuit ValueView) – errors on the parameters of the fit

pygama.pargen.survival_fractions.pass_pdf_gos(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, hstep1, hstep2)¶: pdf for gauss on step fit reparamerised to calculate the efficiency of the cut this is the passing pdf

pygama.pargen.survival_fractions.pass_pdf_hpge(x, x_lo, x_hi, n_sig, epsilon_sig, n_bkg, epsilon_bkg, mu, sigma, htail, tau, hstep1, hstep2)¶: pdf for hpge peak fit reparamerised to calculate the efficiency of the cut. this is the passing pdf

pygama.pargen.survival_fractions.update_guess(func, parguess, energies)¶: Updates guess for the number of signal and background events

pygama.pargen.utils module¶

pygama.pargen.utils.convert_to_minuit(pars, func)¶

pygama.pargen.utils.load_data(files, lh5_path, cal_dict, params, cal_energy_param='cuspEmax_ctc_cal', threshold=None, return_selection_mask=False)¶

Loads parameters from data files. Applies calibration to cal_energy_param and uses this to apply a lower energy threshold.

files: file or list of files or dict pointing from timestamps to lists of files
lh5_path: path to table in files
cal_dict: dictionary with operations used to apply calibration constants
params: list of parameters to load from file
cal_energy_param: name of uncalibrated energy parameter
threshold: lower energy threshold for events to load
return_selection_map: if True, return selection mask for threshold along with data

Return type:: pd.DataFrame | tuple(pd.DataFrame, np.array)

pygama.pargen.utils.return_nans(input)¶