pygama.math package¶

Useful functions for performing quick computations of statistical distributions, as well as helper functions for histogramming data

Subpackages¶

pygama.math.functions package

Submodules¶

pygama.math.binned_fitting module¶

pygama convenience functions for fitting binned data

pygama.math.binned_fitting.fit_binned(func, hist, bins, var=None, guess=None, cost_func='LL', extended=True, simplex=False, bounds=None, fixed=None)¶

Do a binned fit to a histogram.

Default is extended Log Likelihood fit, with option for either Least Squares or other cost function.

Parameters:

func (Callable) – the function to fit, if using LL as method needs to be a cdf
hist (ndarray) – histogrammed data
bins (ndarray) – histogrammed data
var (ndarray) – histogrammed data
guess (ndarray) – initial guess parameters
cost_func (str) – cost function to use
extended (bool) – run extended or non extended fit
simplex (bool) – whether to include a round of simpson minimisation before main minimisation
bounds (tuple[tuple[float, float], ...] | None) – list of tuples with bounds can be None, e.g. [(0,None), (0,10)]
fixed (tuple[int, ...] | None) – list of parameter indices to fix

Returns:

coeff – Returned fit parameters
error – Iminuit errors
cov_matrix – Covariance matrix

Return type:

tuple[ndarray, …]

pygama.math.binned_fitting.gauss_mode(hist, bins, **kwargs)¶

Alias for gauss_mode_max that just returns the mode (position) of a peak

Returns:

mode (the estimated x-position of the maximum)
dmode (the uncertainty in the mode)

Return type:

tuple[float, float]

pygama.math.binned_fitting.gauss_mode_max(hist, bins, **kwargs)¶

Alias for gauss_mode_width_max that just returns the max and mode

See also

gauss_mode_width_max

Returns:

pars (ndarray with the parameters (mode, maximum) of the gaussian fit) – mode : the estimated x-position of the maximum maximum : the estimated maximum value of the peak
cov (2x2 ndarray of floats) – The covariance matrix for the 2 parameters in pars

Return type:

tuple[ndarray, …]

Examples

>>> import pygama.math.histograms as pgh
>>> from numpy.random import normal
>>> import pygama.math.peak_fitting as pgf
>>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5))
>>> pgf.gauss_mode_max(hist, bins, var=var, n_bins=20)

pygama.math.binned_fitting.gauss_mode_width_max(hist, bins, var=None, mode_guess=None, n_bins=5, cost_func='Least Squares', inflate_errors=False, gof_method='var', sigma_limit=None)¶

Get the max, mode, and width of a peak based on gauss fit near the max Returns the parameters of a gaussian fit over n_bins in the vicinity of the maximum of the hist (or the max near mode_guess, if provided). This is equivalent to a Taylor expansion around the peak maximum because near its maximum a Gaussian can be approximated by a 2nd-order polynomial in x:

\[A \exp[ -(x-\mu)^2 / 2 \sigma^2 ] \simeq A [ 1 - (x-\mu)^2 / 2 \sigma^2 ] = A - (1/2!) (A/\sigma^2) (x-\mu)^2\]

The advantage of using a gaussian over a polynomial directly is that the gaussian parameters are the ones we care about most for a peak, whereas for a poly we would have to extract them after the fit, accounting for covariances. The gaussian also better approximates most peaks farther down the peak. However, the gauss fit is nonlinear and thus less stable.

Parameters:

hist (ndarray) – The values of the histogram to be fit
bins (ndarray) – The bin edges of the histogram to be fit
var (ndarray | None) – The variances of the histogram values. If not provided, square-root variances are assumed.
mode_guess (float | None) – An x-value (not a bin index!) near which a peak is expected. The algorithm fits around the maximum within +/- n_bins of the guess. If not provided, the center of the max bin of the histogram is used.
n_bins (int | None) – The number of bins (including the max bin) to be used in the fit. Also used for searching for a max near mode_guess
cost_func (str) – Passed to fit_binned()
inflate_errors (bool | None) – If true, the parameter uncertainties are inflated by sqrt(chi2red) if it is greater than 1
gof_method (str) – method flag for goodness_of_fit
sigma_limit (float | None) – upper bound on sigma

Returns:

pars (ndarray containing the parameters (mode, sigma, maximum) of the gaussian fit) –
- mode : the estimated x-position of the maximum
- sigma : the estimated width of the peak. Equivalent to a gaussian width (sigma), but based only on the curvature within n_bins of the peak. Note that the Taylor-approxiamted curvature of the underlying function in the vicinity of the max is given by max / sigma^2
- maximum : the estimated maximum value of the peak
cov (3x3 ndarray of floats) – The covariance matrix for the 3 parameters in pars

Return type:

tuple[ndarray, …]

pygama.math.binned_fitting.goodness_of_fit(hist, bins, var, func, pars, method='var', scale_bins=False)¶

Compute chisq and dof of fit

Parameters:

hist (ndarray) – histogram data. var can be None if hist is integer counts
bins (ndarray) – histogram data. var can be None if hist is integer counts
var (ndarray) – histogram data. var can be None if hist is integer counts
func (Callable) – the function that was fit to the hist
pars (ndarray) – the best-fit pars of func. Assumes all pars are free parameters
method (str) – Sets the choice of “denominator” in the chi2 sum ‘var’: user passes in the variances in var (must not have zeros) ‘Pearson’: use func (hist must contain integer counts) ‘Neyman’: use hist (hist must contain integer counts and no zeros)

Returns:

chisq – the summed up value of chisquared
dof – the number of degrees of freedom

Return type:

tuple[float, int]

pygama.math.binned_fitting.poisson_gof(pars, func, hist, bins, is_integral=False, **kwargs)¶

Calculate the goodness of fit for the Poisson likelihood.

Parameters:

pars (ndarray) – The parameters of the function, func
func (Callable) – The function that was fit
hist (ndarray) – The data that were fit
bins (ndarray) – The bins of the histogram that was fit to
is_integral (bool) – Tells get_bin_estimates if the function is an integral function

Returns:

Poisson G.O.F.

Return type:

float

Notes

The Poisson likelihood does not give a good GOF until the counts are very high and all the poisson stats are roughly gaussian and you don’t need it anyway. But the G.O.F. is calculable for the Poisson likelihood. So we do it here.

pygama.math.binned_fitting.taylor_mode_max(hist, bins, var=None, mode_guess=None, n_bins=5)¶

Get the max and mode of a peak based on Taylor exp near the max Returns the amplitude and position of a peak based on a poly fit over n_bins in the vicinity of the maximum of the hist (or the max near mode_guess, if provided)

Parameters:

hist (ndarray) – The values of the histogram to be fit. Often: send in a slice around a peak
bins (ndarray) – The bin edges of the histogram to be fit
var (ndarray | None) – The variances of the histogram values. If not provided, square-root variances are assumed.
mode_guess (float | None) – An x-value (not a bin index!) near which a peak is expected. The algorithm fits around the maximum within +/- n_bins of the guess. If not provided, the center of the max bin of the histogram is used.
n_bins (int) – The number of bins (including the max bin) to be used in the fit. Also used for searching for a max near mode_guess

Returns:

pars (2-tuple with the parameters (mode, max) of the fit) – mode : the estimated x-position of the maximum maximum : the estimated maximum value of the peak
cov (2x2 matrix of floats) – The covariance matrix for the 2 parameters in pars

Return type:

tuple[ndarray, …]

Examples

>>> import pygama.analysis.histograms as pgh
>>> from numpy.random import normal
>>> import pygama.analysis.peak_fitting as pgf
>>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5))
>>> pgf.taylor_mode_max(hist, bins, var, n_bins=5)

pygama.math.distributions module¶

Contains a list of distribution functions, all implemented using Numba’s numba.jit() to take advantage of the just-in-time speed boost.

pygama.math.histogram module¶

pygama convenience functions for 1D histograms.

1D hists in pygama require 3 things available from all implementations of 1D histograms of numerical data in python: hist, bins, and var: - hist: an array of histogram values - bins: an array of bin edges - var: an array of variances in each bin If var is not provided, pygama assuems that the hist contains “counts” with variance = counts (Poisson stats)

These are just convenience functions, provided for your convenience. Hopefully they will help you if you need to do something trickier than is provided (e.g. 2D hists).

pygama.math.histogram.better_int_binning(x_lo=0, x_hi=None, dx=None, n_bins=None)¶

Get a good binning for integer data.

Guarantees an integer bin width.

At least two of x_hi, dx, or n_bins must be provided.

Parameters:

x_lo (float) – Desired low x value for the binning
x_hi (float | None) – Desired high x value for the binning
dx (float | None) – Desired bin width
n_bins (float | None) – Desired number of bins

Returns:

x_lo – int values for best x_lo
x_hi – int values for best x_hi, returned if x_hi is not None
dx – best int bin width, returned if arg dx is not None
n_bins – best int n_bins, returned if arg n_bins is not None

Return type:

tuple[int, int, int, int]

pygama.math.histogram.find_bin(x, bins)¶

Returns the index of the bin containing x Returns -1 for underflow, and len(bins) for overflow For uniform bins, jumps directly to the appropriate index. For non-uniform bins, binary search is used.

Parameters:

x (float) – The value to search for amongst the bins
bins (ndarray) – The input array of bin-edges

Returns:

bin_idx – Index of the bin containing x
TODO (replace np.searchsorted with for loops, numba will speed this function up then)

Return type:

int

pygama.math.histogram.get_bin_centers(bins)¶

Returns an array of bin centers from an input array of bin edges. Works for non-uniform binning. Note: a new array is allocated

Parameters:: bins (ndarray) – The input array of bin-edges
Returns:: bin_centers – The array of bin centers
Return type:: ndarray

pygama.math.histogram.get_bin_estimates(pars, func, bins, is_integral=False, **kwargs)¶

Bin expected means are estimated by f(bin_center)*bin_width. Supply an integrating function to compute the integral over the bin instead. TODO: make default integrating function a numerical method that is off by default.

Parameters:

pars (ndarray) – The parameters of the function, func
func (Callable) – The function at which to evaluate the bin centers
bins (ndarray) – Array of histogram bins
is_integral (bool) – if True, then func is an integral function

Returns:

f(bin_center)*bin_width – The expected mean of a bin

Return type:

ndarray

See also

get_bin_widths: Returns the width of the bins
get_bin_centers: Finds the bin centers of the supplied bins

pygama.math.histogram.get_bin_widths(bins)¶

Returns an array of bin widths from an input array of bin edges. Works for non-uniform binning.

Parameters:: bins (ndarray) – The input array of bin-edges
Returns:: bin_widths – An array of bin widths
Return type:: ndarray

pygama.math.histogram.get_fwfm(fraction, hist, bins, var=None, mx=None, dmx=0, bl=0, dbl=0, method='bins_over_f', n_slope=3)¶

Estimate the full width at some fraction of the max of data in a histogram

Typically used by sending slices around a peak. Searches about argmax(hist) for the peak to fall by [fraction] from mx to bl

Parameters:

fraction (float) – The fractional amplitude at which to evaluate the full width
hist (ndarray) – The histogram data array containing the peak
bins (ndarray) – An array of bin edges for the histogram
var (ndarray | None) – An array of histogram variances. Used with the ‘fit_slopes’ method
mx (float | tuple[float, float] | None) – The value to use for the max of the peak. If None, np.amax(hist) is used.
dmx (float | None) – The uncertainty in mx
bl (float | tuple[float, float] | None) – Used to specify an offset from which to estimate the FWFM.
dbl (float | None) – The uncertainty in the bl
method (str) –

‘bins_over_f’the simplest method: just take the difference in the bin
centers that are over [fraction] of max. Only works for high stats and FWFM/bin_width >> 1

’interpolate’interpolate between the bins that cross the [fraction]
line. Works well for high stats and a reasonable number of bins. Uncertainty incorporates var, if provided.

’fit_slopes’fit over n_slope bins in the vicinity of the FWFM and
interpolate to get the fractional crossing point. Works okay even when stats are moderate but requires enough bins that dx traversed by n_slope bins is approximately linear. Incorporates bin variances in fit and final uncertainties if provided.
n_slope (int) – Number of bins in the vicinity of the FWFM used to interpolate the fractional crossing point with the ‘fit_slopes’ method

Returns:

fwfm, dfwfm – fwfm: the full width at [fraction] of the maximum above bl dfwfm: the uncertainty in fwfm

Return type:

tuple[float, float]

Examples

>>> import pygama.analysis.histograms as pgh
>>> from numpy.random import normal
>>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5))
>>> pgh.get_fwfm(0.5, hist, bins, var, method='bins_over_f')
(2.2, 0.15919638684132664) # may vary

>>> pgh.get_fwfm(0.5, hist, bins, var, method='interpolate')
(2.2041666666666666, 0.09790931254396479) # may vary

>>> pgh.get_fwfm(0.5, hist, bins, var, method='fit_slopes')
(2.3083363869003466, 0.10939486522749278) # may vary

pygama.math.histogram.get_fwhm(hist, bins, var=None, mx=None, dmx=0, bl=0, dbl=0, method='bins_over_f', n_slope=3)¶

Convenience function for the FWHM of data in a histogram

Typically used by sending slices around a peak. Searches about argmax(hist) for the peak to fall by [fraction] from mx to bl

Parameters:

fraction – The fractional amplitude at which to evaluate the full width
hist (ndarray) – The histogram data array containing the peak
bins (ndarray) – An array of bin edges for the histogram
var (ndarray | None) – An array of histogram variances. Used with the ‘fit_slopes’ method
mx (float | tuple[float, float] | None) – The value to use for the max of the peak. If None, np.amax(hist) is used.
dmx (float | None) – The uncertainty in mx
bl (float | tuple[float, float] | None) – Used to specify an offset from which to estimate the FWFM.
dbl (float | None) – The uncertainty in the bl
method (str) –

‘bins_over_f’the simplest method: just take the difference in the bin
centers that are over [fraction] of max. Only works for high stats and FWFM/bin_width >> 1

’interpolate’interpolate between the bins that cross the [fraction]
line. Works well for high stats and a reasonable number of bins. Uncertainty incorporates var, if provided.

’fit_slopes’fit over n_slope bins in the vicinity of the FWFM and
interpolate to get the fractional crossing point. Works okay even when stats are moderate but requires enough bins that dx traversed by n_slope bins is approximately linear. Incorporates bin variances in fit and final uncertainties if provided.
n_slope (int) – Number of bins in the vicinity of the FWFM used to interpolate the fractional crossing point with the ‘fit_slopes’ method

Returns:

fwhm, dfwhm – fwfm: the full width at half of the maximum above bl dfwfm: the uncertainty in fwhm

Return type:

tuple[float, float]

See also

get_fwfm: Function that computes the FWFM

pygama.math.histogram.get_gaussian_guess(hist, bins)¶

given a hist, gives guesses for mu, sigma, and amplitude

Parameters:

hist (ndarray) – Array of histogrammed data
bins (ndarray) – Array of histogram bins

Returns:

guess_mu – guess for the mu parameter of a Gaussian
guess_sigma – guess for the sigma parameter of a Gaussian
guess_area – guess for the A parameter of a Gaussian

Return type:

tuple[float, float, float]

pygama.math.histogram.get_hist(data, bins=None, range=None, dx=None, wts=None)¶

return hist, bins, var after binning data

This is just a wrapper for humba.histogram, with optional weights for each element and proper computing of variances.

Note: there are no overflow / underflow bins.

Available binning methods:

Default (no binning arguments) : 100 bins over an auto-detected range

bins=n, range=(x_lo, x_hi) : n bins over the specified range (or leave range=None for auto-detected range)

bins=[str] : use one of np.histogram’s automatic binning algorithms

bins=bin_edges_array : array lower bin edges, supports non-uniform binning

dx=dx, range=(x_lo, x_hi): bins of width dx over the specified range. Note: dx overrides the bins argument!

Parameters:

data (ndarray) – The array of data to be histogrammed
bins (int | ndarray | str | None) – int: the number of bins to be used in the histogram array: an array of bin edges to use str: the name of the np.histogram automatic binning algorithm to use If not provided, humba.histogram’s default auto-binning routine is used
range (tuple[float, float] | None) – (x_lo, x_high) is the tuple of low and high x values to uses for the very ends of the bin range. If not provided, np.histogram chooses the ends based on the data in data
dx (float | None) – Specifies the bin width. Overrides “bins” if both arguments are present
wts (float | ndarray | None) – Array of weights for each bin. For example, if you want to divide all bins by a time T to get the bin contents in count rate, set wts = 1/T. Variances will be computed for each bin that appropriately account for each data point’s weighting.

Returns:

hist – the values in each bin of the histogram
bins – an array of bin edges: bins[i] is the lower edge of the ith bin. Note: it includes the upper edge of the last bin and does not include underflow or overflow bins. So len(bins) = len(hist) + 1
var – array of variances in each bin of the histogram

Return type:

tuple[ndarray, …]

pygama.math.histogram.get_i_local_extrema(data, delta)¶

Get lists of indices of the local maxima and minima of data

The “local” extrema are those maxima / minima that have heights / depths of at least delta. Converted from MATLAB script at: http://billauer.co.il/peakdet.html

Parameters:

data (array-like) – the array of data within which extrema will be found
delta (scalar) – the absolute level by which data must vary (in one direction) about an extremum in order for it to be tagged

Returns:

imaxes, imins (2-tuple ( array, array )) – A 2-tuple containing arrays of variable length that hold the indices of the identified local maxima (first tuple element) and minima (second tuple element)

pygama.math.histogram.get_i_local_maxima(data, delta)¶

pygama.math.histogram.get_i_local_minima(data, delta)¶

pygama.math.histogram.plot_hist(hist, bins, var=None, show_stats=False, stats_hloc=0.75, stats_vloc=0.85, fill=False, fillcolor='r', fillalpha=0.2, **kwargs)¶

Plot a step histogram, with optional error bars

Parameters:

hist (ndarray) – The histogram data array containing the peak
bins (ndarray) – An array of bin edges for the histogram
var (ndarray | None) – An array of histogram variances. Used with the ‘fit_slopes’ method
show_stats (bool) – If True, shows the mean, mean error, and standard deviation of the histogram on the plot
stats_hloc (float) – matplotlib.pyplot horizontal location to place the stats
stats_vloc (float) – matplotlib.pyplot vertical location to place the stats
fill (bool) – If True, fills in a step histogram when plotting
fill_color – Color to fill the step histogram if fill is True
fill_alpha – Alpha amount if fill is True

pygama.math.histogram.range_slice(x_min, x_max, hist, bins, var=None)¶

Get the histogram bins and values for a given slice of the range

Parameters:

x_min (float) – Lower endpoint of the range
x_min – Upper endpoint of the range
hist (ndarray) – The histogrammed data to search through
bins (ndarray) – The histogrammed data to search through
var (ndarray | None) – The histogrammed data to search through

Return type:

tuple[ndarray, …]

See also

find_bin: for parameters and return values

pygama.math.hpge_peak_fitting module¶

pygama convenience functions for fitting hpge peak shape data

pygama.math.hpge_peak_fitting.hpge_peak_fwfm(sigma, htail, tau, frac_max=0.5, cov=None)¶: Return the FWHM of the radford_peak function, ignoring background and step components. If calculating error also need the normalisation for the step function.

pygama.math.hpge_peak_fitting.hpge_peak_fwhm(sigma, htail, tau, cov=None)¶

Return the FWHM of the hpge_peak function, ignoring background and step components. If calculating error also need the normalisation for the step function.

Parameters:

sigma (float) – The width of the hpge_peak
htail (float) – The height of the tail in the hpge_peak
tau (float) – The characteristic scale in the extended Gaussian in the hpge_peak
cov (float | None) – The covariant matrix of the previous parameters
Returns –

FWHM, FWHM_uncertainty
The FWHM of the hpge_peak and its uncertainty

Return type:

tuple[float, float]

pygama.math.hpge_peak_fitting.hpge_peak_mode(mu, sigma, htail, tau, cov=None)¶

pygama.math.hpge_peak_fitting.hpge_peak_parameter_gradient(e, pars, step_norm)¶

Computes the gradient of the hpge_peak parameters

Parameters:

e (float) – The energy of the hpge_peak
pars (ndarray) – The parameters of the hpge_peak fit
step_norm (float) – The normalization of the background step function in the hpge_peak

Returns:

gradient – gradient of the n_sig, mu, sigma, h_tail, tau, n_bkg, and hstep parameters of the HPGe peak

Return type:

ndarray

pygama.math.hpge_peak_fitting.hpge_peak_peakshape_derivative(e, pars, step_norm)¶

Computes the derivative of the hpge_peak peak shape

Parameters:

e (ndarray) – The array of energies of the hpge_peak
pars (ndarray) – The parameters of the hpge_peak fit
step_norm (float) – The normalization of the background step function in the hpge_peak

Returns:

derivative – the derivative of the hpge_peak

Return type:

ndarray

pygama.math.least_squares module¶

pygama convenience functions for linearly fitting data

pygama.math.least_squares.fit_simple_scaling(x, y, var=1)¶

Fast computation of weighted linear least squares fit to a simple scaling

I.e. y = scale * x. Returns the best fit scale parameter and its variance.

Parameters:

x (ndarray) – x values for the fit
y (ndarray) – y values for the fit
var (float | ndarray | None) – The variances for each y-value

Returns:

scale, scale_var – The scale parameter and its variance

Return type:

tuple[float, float]

pygama.math.least_squares.linear_fit_by_sums(x, y, var=1)¶

Fast computation of weighted linear least squares fit to a linear model

Note: doesn’t compute covariances. If you want covariances, just use polyfit

Parameters:

x (ndarray) – x values for the fit
y (ndarray) – y values for the fit
var (float | ndarray | None) – The variances for each y-value

Returns:

(m, b) – The slope (m) and y-intercept (b) of the best fit (in the least-squares sense) of the data to y = mx + b

Return type:

tuple[float, float]

pygama.math.rebin module¶

Adaptive rebinning routines for 1D histograms.

Two adaptive strategies are offered:

Bayesian blocks (Scargle et al. 2013), backed by hepstats.modeling.bayesian_blocks().
Peak-aware: peaks are detected and each peak region is collapsed into a single bin. The continuum is either left at bblocks spacing or rebinned to a uniform width.

Functions taking a raw data array are named hist_*; those taking a finely-binned hist.Hist are named rebin_*. All routines return a Variable-axis Weight-storage hist.Hist.

Public functions¶

hist_bblocks(): histogram raw data using Bayesian-blocks edges.
hist_bblocks_with_peaks(): histogram raw data with bblocks edges on the continuum and one merged bin per detected peak.
hist_uniform_with_peaks(): histogram raw data with a uniform continuum binning and one merged bin per detected peak.
rebin_bblocks(): rebin an existing fine histogram onto Bayesian-blocks edges.
rebin_bblocks_with_peaks(): rebin an existing fine histogram onto bblocks edges and merge each detected peak into a single bin.
rebin_uniform_with_peaks(): rebin an existing fine histogram onto a uniform continuum binning and merge each detected peak into a single bin.

Rebinning a histogram with the edges of another¶

To rebin a fine histogram h_fine onto the edges of another histogram h_ref, use hist.rebin() directly (the edges of h_ref must be a subset of those of h_fine):

>>> import hist as bh
>>> h_out = h_fine[:: bh.rebin(edges=h_ref.axes[0].edges.tolist())]

pygama.math.rebin._bblocks_edges(data, **kwargs)¶

Bblocks edges; last edge nudged by 1 ULP for right-exclusive axes.

Return type:: ndarray

pygama.math.rebin._bblocks_edges_from_hist(h, *, p0=0.05)¶

Bayesian-blocks edges from a fine histogram, snapped to its grid.

Empty bins are dropped to avoid the log(0) warning hepstats raises on zero-weight events.

Return type:: ndarray

pygama.math.rebin._collapse_peaks(h, *, n_sigma=5.0, width_factor=5.0, max_bin_width=None)¶

Merge bins inside each detected peak region into a single bin.

Return type:: Hist

pygama.math.rebin._find_peak_regions(h, *, n_sigma=5.0, width_factor=5.0, max_bin_width=None)¶

Detect peaks and return (n_regions, 2) merged region edges.

Edges are in axis units; the array is empty if no peaks pass the significance threshold. Peaks are strict local maxima of the rate (counts/width), filtered by prominence * w_peak / sqrt(n_peak) against n_sigma. When max_bin_width is given, candidates sitting on a bin wider than the cap are also rejected (line-like features have resolution-bounded widths; wide bins flag Compton edges and smooth-continuum plateaus). Each surviving peak is grown outward via _walk_peak() and overlapping regions are merged.

Return type:: ndarray

pygama.math.rebin._prepare_data(data, prebin_low, prebin_high)¶

Flatten awkward inputs and apply the optional range mask.

Return type:: tuple[ndarray, float, float]

pygama.math.rebin._snap_indices(fine_edges, target_edges)¶

Nearest-edge indices into fine_edges for each target_edges.

Endpoints are clamped to 0 and len(fine_edges) - 1 and duplicates are removed, so the result is a strictly increasing integer array suitable as group boundaries for hist.rebin().

Return type:: ndarray

pygama.math.rebin._uniform_edges_with_peaks(low, high, *, bin_width, peak_regions)¶

Uniform grid over [low, high] with peak regions as single bins.

Any uniform edge strictly inside a peak region, OR within bin_width of one of its boundaries, is dropped: the peak region’s neighboring continuum bins absorb the would-be fragment, making them up to 2 * bin_width wide instead. Uniform edges exactly on a peak boundary are kept. Assumes peak_regions is sorted and non-overlapping.

Return type:: ndarray

pygama.math.rebin._walk_peak(start, step, rate, widths, width_cap, n)¶

Walk outward from a peak bin and return the farthest index reached.

Steps in direction step (-1 or +1), stopping when the rate stops descending, the next bin’s width reaches width_cap, or the next bin is a strict local minimum of rate. At the array boundary the missing neighbor is treated as +inf.

Return type:: int

pygama.math.rebin.hist_bblocks(data, *, prebin_width=None, prebin_low=None, prebin_high=None, p0=0.05)¶

Histogram an unbinned data array using Bayesian-blocks edges.

For large samples, pass prebin_width to first bin the data on a uniform fine grid and then call rebin_bblocks() – this is much faster than running bblocks on the raw data directly. Awkward inputs are flattened before binning.

Parameters:

data (ndarray | Array) – Array of measurement values (numpy or awkward).
prebin_width (float | None) – Width of the uniform pre-binning grid. Should be a few times finer than the narrowest feature you want to resolve (e.g. ~0.5 keV for HPGe lines with ~3 keV FWHM).
prebin_low (float | None) – Histogram range. Default to data.min() / data.max(). Data outside the range is discarded. Only valid together with prebin_width. The upper edge may overshoot prebin_high by up to prebin_width so the range is covered by an integer number of bins.
prebin_high (float | None) – Histogram range. Default to data.min() / data.max(). Data outside the range is discarded. Only valid together with prebin_width. The upper edge may overshoot prebin_high by up to prebin_width so the range is covered by an integer number of bins.
p0 (float) – Bblocks sensitivity: lower values give coarser bins, higher values finer. Typical 0.01-0.1.

Returns:

h – A Variable-axis Weight-storage histogram filled with data.

Return type:

Hist

Examples

>>> from pygama.math.rebin import hist_bblocks
>>> h = hist_bblocks(energies, prebin_width=0.5, prebin_low=0, prebin_high=3000)

pygama.math.rebin.hist_bblocks_with_peaks(data, *, prebin_width=None, prebin_low=None, prebin_high=None, p0=0.05, n_sigma=5.0, width_factor=5.0, max_bin_width=None)¶

Histogram raw data with Bayesian-blocks edges and collapsed peaks.

Equivalent to _collapse_peaks(hist_bblocks(data, ...)).

Parameters:

data (ndarray | Array) – See hist_bblocks().
prebin_width (float | None) – See hist_bblocks().
prebin_low (float | None) – See hist_bblocks().
prebin_high (float | None) – See hist_bblocks().
p0 (float) – See hist_bblocks().
n_sigma (float) – See rebin_bblocks_with_peaks().
width_factor (float) – See rebin_bblocks_with_peaks().
max_bin_width (float | None) – See rebin_bblocks_with_peaks().

Returns:

h – A Variable-axis Weight-storage histogram.

Return type:

Hist

Examples

Starting with a large data array, prebin it for performance and use a maximum peak width of 10.

>>> from pygama.math.rebin import hist_bblocks_with_peaks
>>> data = [...]
>>> h = hist_bblocks_with_peaks(
...     data, prebin_width=0.5, prebin_low=0, prebin_high=3000, max_bin_width=10
... )

pygama.math.rebin.hist_uniform_with_peaks(data, *, bin_width, prebin_width=None, prebin_low=None, prebin_high=None, p0=0.05, n_sigma=5.0, width_factor=5.0, max_bin_width=None)¶

Histogram raw data with uniform continuum bins and collapsed peaks.

Parameters:

data (ndarray | Array) – See hist_bblocks().
prebin_width (float | None) – See hist_bblocks().
p0 (float) – See hist_bblocks().
prebin_low (float | None) – See hist_bblocks(); here they also define the output range.
prebin_high (float | None) – See hist_bblocks(); here they also define the output range.
bin_width (float) – See rebin_uniform_with_peaks().
n_sigma (float) – See rebin_bblocks_with_peaks().
width_factor (float) – See rebin_bblocks_with_peaks().
max_bin_width (float | None) – See rebin_bblocks_with_peaks().

Returns:

h – A Variable-axis Weight-storage histogram filled with data.

Return type:

Hist

Examples

Starting with a large data array, prebin it for performance and use a maximum peak width of 10. In between the peaks use an uniform bin width of 2.

>>> from pygama.math.rebin import hist_uniform_with_peaks
>>> data = [...]
>>> h = hist_uniform_with_peaks(
...     energies,
...     bin_width=2,
...     prebin_width=0.5,
...     prebin_low=0,
...     prebin_high=3000,
...     max_bin_width=10,
... )

pygama.math.rebin.rebin_bblocks(h, *, p0=0.05)¶

Rebin a fine 1D histogram using Bayesian-blocks adaptive binning.

Each fine bin is treated as a weighted event (weight = bin count); the resulting block edges are snapped to the input grid.

Parameters:

h (Hist) – Input finely-binned 1D histogram.
p0 (float) – See hist_bblocks().

Returns:

out – A Variable-axis Weight-storage histogram whose edges are a subset of the input edges.

Return type:

Hist

Examples

>>> from pygama.math.rebin import rebin_bblocks
>>> import hist
>>> h_fine = hist.new.Reg(1000, 0, 1000).Double().fill(...)
>>> h_bb = rebin_bblocks(h_fine)

pygama.math.rebin.rebin_bblocks_with_peaks(h, *, n_sigma=5.0, width_factor=5.0, max_bin_width=None, p0=0.05)¶

Rebin a fine histogram with bblocks edges and collapsed peaks.

First applies rebin_bblocks() to get adaptive continuum bins, then merges each detected gamma peak into a single bin.

Parameters:

h (Hist) – Input finely-binned 1D histogram.
n_sigma (float) – How tall a peak must be above its local baseline to be flagged, in units of Poisson noise. Default 5.0. Lower values pick up weaker peaks but also more spurious detections.
width_factor (float) – How far the merge can extend on each side of a peak, relative to the width of the peak’s own bblocks bin. The walk stops at the first bin wider than width_factor * peak_bin_width. Default 5.0; try 2.0-3.0 for tighter merges on HPGe spectra.
max_bin_width (float | None) –
Maximum allowed width for a peak bin (in axis units). Two effects:
1. Detection: candidates whose bblocks bin is wider than this are rejected – useful for filtering out Compton edges, continuum plateaus, and other broad features that the peak finder cannot tell apart from real gamma lines based on shape alone (see Notes).
2. Walk: the merge walk never extends into a bin wider than this, regardless of width_factor.
Default None (no filter). For HPGe spectra set it to a few times FWHM at the highest energy of interest (5-20 keV is typical) to suppress non-line features.
p0 (float) – See hist_bblocks().

Returns:

out – A Variable-axis Weight-storage histogram whose edges are a subset of the input edges.

Return type:

Hist

Notes

The peak finder only looks at local maxima of the rate (counts/width) and how high they stand above their local baseline; it knows nothing about what a real gamma line looks like. It can therefore flag features that aren’t line-like:

resolution-limited gamma peaks (the intended target);
the upper end of a long Compton-edge ramp;
step transitions;
wide continuum plateaus that bblocks captured in a single wide bin.

Real HPGe peaks live on narrow bblocks bins (FWHM is a few keV even at multi-MeV energies); the other features end up on much wider bins. max_bin_width is the simplest filter to reject everything that isn’t a real line.

Examples

>>> from pygama.math.rebin import rebin_bblocks_with_peaks
>>> h_out = rebin_bblocks_with_peaks(h_fine, max_bin_width=10)

pygama.math.rebin.rebin_uniform_with_peaks(h, *, bin_width, n_sigma=5.0, width_factor=5.0, max_bin_width=None, p0=0.05)¶

Rebin a fine histogram onto uniform continuum bins with collapsed peaks.

Runs an internal rebin_bblocks() pass to find the peaks, then builds the output: uniform-bin_width continuum bins everywhere except inside detected peak regions, which each become a single bin. Output edges are snapped to h.axes[0].edges, so bin_width need not align exactly with the input grid.

Parameters:

h (Hist) – Input finely-binned 1D histogram.
bin_width (float) – Continuum bin width (in axis units). To avoid tiny fragment bins right next to peaks, the continuum bin bordering each peak absorbs the leftover and can be up to 2 * bin_width wide.
n_sigma (float) – See rebin_bblocks_with_peaks().
width_factor (float) – See rebin_bblocks_with_peaks().
max_bin_width (float | None) – See rebin_bblocks_with_peaks().
p0 (float) – See rebin_bblocks_with_peaks().

Returns:

out – A Variable-axis Weight-storage histogram whose edges are a subset of the input edges.

Return type:

Hist

Examples

>>> from pygama.math.rebin import rebin_uniform_with_peaks
>>> import hist
>>> h_fine = hist.new.Reg(1000, 0, 1000).Double().fill(...)
>>> h_out = rebin_uniform_with_peaks(h_fine, bin_width=2, max_bin_width=10)

pygama.math.unbinned_fitting module¶

pygama convenience functions for fitting ubinned data

pygama.math.unbinned_fitting.fit_unbinned(func, data, guess=None, extended=True, cost_func='LL', simplex=False, bounds=None, fixed=None)¶

Do a unbinned fit to data. Default is extended Log Likelihood fit, with option for other cost functions.

Parameters:

func (Callable) – the function to fit
data (ndarray) – the data values to be fit
guess (ndarray) – initial guess parameters
extended (bool) – run extended or non extended fit
cost_func (str) – cost function to use. LL is ExtendedUnbinnedNLL, None is for just UnbinnedNLL
simplex (bool) – whether to include a round of simpson minimisation before main minimisation
bounds (list[tuple[float, float]] | None) – Each tuple is (min, max) for the corresponding parameters. Bounds can be None, e.g. [(0,None), (0,10)]
fixed (list[bool] | None) – list of parameter indices to fix

Returns:

pars, errs, cov – the best-fit parameters and their errors / covariance

Return type:

tuple[ndarray, …]

pygama.math.units module¶

pygama.math.utils module¶

pygama utility functions.

pygama.math.utils.get_formatted_stats(mean, sigma, ndigs=2)¶

convenience function for formatting mean +/- sigma to the right number of significant figures.

Parameters:

mean (float) – The mean value we want to format
sigma (float) – The sigma value we want to format
ndigs (int) – The number of significant digits we want to display

Return type:

tuple[str, str]

pygama.math.utils.get_par_names(func)¶

Return a list containing the names of the arguments of “func” other than the first argument. In pygamaland, those are the function’s “parameters.”

Parameters:: func (Callable) – A function whose parameters we want to return
Return type:: tuple[str, …]

pygama.math.utils.print_fit_results(pars, cov, func=None, title=None, pad=True)¶

Convenience function to write scipy.optimize.curve_fit results to the log

Parameters:

pars (ndarray) – The parameter values of the function func
func (Callable | None) – A function, if passed then the function’s parameters’ names are logged
title (str | None) – A title to log
pad (bool | None) – If True, adds spaces to the log messages

Returns:

None – Writes the curve_fit results to the log

Return type:

None

pygama.math.utils.sizeof_fmt(num, suffix='B')¶

given a file size in bytes, output a human-readable form. :param num: File size, in bytes :param suffix: Desired file size suffix

Return type:: str