pygama.math package

Useful functions for performing quick computations of statistical distributions, as well as helper functions for histogramming data

Subpackages

Submodules

pygama.math.binned_fitting module

pygama convenience functions for fitting binned data

pygama.math.binned_fitting.fit_binned(func, hist, bins, var=None, guess=None, cost_func='LL', extended=True, simplex=False, bounds=None, fixed=None)

Do a binned fit to a histogram.

Default is extended Log Likelihood fit, with option for either Least Squares or other cost function.

Parameters:
  • func (Callable) – the function to fit, if using LL as method needs to be a cdf

  • hist (ndarray) – histogrammed data

  • bins (ndarray) – histogrammed data

  • var (ndarray) – histogrammed data

  • guess (ndarray) – initial guess parameters

  • cost_func (str) – cost function to use

  • extended (bool) – run extended or non extended fit

  • simplex (bool) – whether to include a round of simpson minimisation before main minimisation

  • bounds (tuple[tuple[float, float], ...] | None) – list of tuples with bounds can be None, e.g. [(0,None), (0,10)]

  • fixed (tuple[int, ...] | None) – list of parameter indices to fix

Returns:

  • coeff – Returned fit parameters

  • error – Iminuit errors

  • cov_matrix – Covariance matrix

Return type:

tuple[ndarray, …]

pygama.math.binned_fitting.gauss_mode(hist, bins, **kwargs)

Alias for gauss_mode_max that just returns the mode (position) of a peak

Returns:

  • mode (the estimated x-position of the maximum)

  • dmode (the uncertainty in the mode)

Return type:

tuple[float, float]

pygama.math.binned_fitting.gauss_mode_max(hist, bins, **kwargs)

Alias for gauss_mode_width_max that just returns the max and mode

Returns:

  • pars (ndarray with the parameters (mode, maximum) of the gaussian fit) – mode : the estimated x-position of the maximum maximum : the estimated maximum value of the peak

  • cov (2x2 ndarray of floats) – The covariance matrix for the 2 parameters in pars

Return type:

tuple[ndarray, …]

Examples

>>> import pygama.math.histograms as pgh
>>> from numpy.random import normal
>>> import pygama.math.peak_fitting as pgf
>>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5))
>>> pgf.gauss_mode_max(hist, bins, var=var, n_bins=20)
pygama.math.binned_fitting.gauss_mode_width_max(hist, bins, var=None, mode_guess=None, n_bins=5, cost_func='Least Squares', inflate_errors=False, gof_method='var', sigma_limit=None)

Get the max, mode, and width of a peak based on gauss fit near the max Returns the parameters of a gaussian fit over n_bins in the vicinity of the maximum of the hist (or the max near mode_guess, if provided). This is equivalent to a Taylor expansion around the peak maximum because near its maximum a Gaussian can be approximated by a 2nd-order polynomial in x:

\[A \exp[ -(x-\mu)^2 / 2 \sigma^2 ] \simeq A [ 1 - (x-\mu)^2 / 2 \sigma^2 ] = A - (1/2!) (A/\sigma^2) (x-\mu)^2\]

The advantage of using a gaussian over a polynomial directly is that the gaussian parameters are the ones we care about most for a peak, whereas for a poly we would have to extract them after the fit, accounting for covariances. The gaussian also better approximates most peaks farther down the peak. However, the gauss fit is nonlinear and thus less stable.

Parameters:
  • hist (ndarray) – The values of the histogram to be fit

  • bins (ndarray) – The bin edges of the histogram to be fit

  • var (ndarray | None) – The variances of the histogram values. If not provided, square-root variances are assumed.

  • mode_guess (float | None) – An x-value (not a bin index!) near which a peak is expected. The algorithm fits around the maximum within +/- n_bins of the guess. If not provided, the center of the max bin of the histogram is used.

  • n_bins (int | None) – The number of bins (including the max bin) to be used in the fit. Also used for searching for a max near mode_guess

  • cost_func (str) – Passed to fit_binned()

  • inflate_errors (bool | None) – If true, the parameter uncertainties are inflated by sqrt(chi2red) if it is greater than 1

  • gof_method (str) – method flag for goodness_of_fit

  • sigma_limit (float | None) – upper bound on sigma

Returns:

  • pars (ndarray containing the parameters (mode, sigma, maximum) of the gaussian fit) –

    • mode : the estimated x-position of the maximum

    • sigma : the estimated width of the peak. Equivalent to a gaussian width (sigma), but based only on the curvature within n_bins of the peak. Note that the Taylor-approxiamted curvature of the underlying function in the vicinity of the max is given by max / sigma^2

    • maximum : the estimated maximum value of the peak

  • cov (3x3 ndarray of floats) – The covariance matrix for the 3 parameters in pars

Return type:

tuple[ndarray, …]

pygama.math.binned_fitting.goodness_of_fit(hist, bins, var, func, pars, method='var', scale_bins=False)

Compute chisq and dof of fit

Parameters:
  • hist (ndarray) – histogram data. var can be None if hist is integer counts

  • bins (ndarray) – histogram data. var can be None if hist is integer counts

  • var (ndarray) – histogram data. var can be None if hist is integer counts

  • func (Callable) – the function that was fit to the hist

  • pars (ndarray) – the best-fit pars of func. Assumes all pars are free parameters

  • method (str) – Sets the choice of “denominator” in the chi2 sum ‘var’: user passes in the variances in var (must not have zeros) ‘Pearson’: use func (hist must contain integer counts) ‘Neyman’: use hist (hist must contain integer counts and no zeros)

Returns:

  • chisq – the summed up value of chisquared

  • dof – the number of degrees of freedom

Return type:

tuple[float, int]

pygama.math.binned_fitting.poisson_gof(pars, func, hist, bins, is_integral=False, **kwargs)

Calculate the goodness of fit for the Poisson likelihood.

Parameters:
  • pars (ndarray) – The parameters of the function, func

  • func (Callable) – The function that was fit

  • hist (ndarray) – The data that were fit

  • bins (ndarray) – The bins of the histogram that was fit to

  • is_integral (bool) – Tells get_bin_estimates if the function is an integral function

Returns:

Poisson G.O.F.

Return type:

float

Notes

The Poisson likelihood does not give a good GOF until the counts are very high and all the poisson stats are roughly gaussian and you don’t need it anyway. But the G.O.F. is calculable for the Poisson likelihood. So we do it here.

pygama.math.binned_fitting.taylor_mode_max(hist, bins, var=None, mode_guess=None, n_bins=5)

Get the max and mode of a peak based on Taylor exp near the max Returns the amplitude and position of a peak based on a poly fit over n_bins in the vicinity of the maximum of the hist (or the max near mode_guess, if provided)

Parameters:
  • hist (ndarray) – The values of the histogram to be fit. Often: send in a slice around a peak

  • bins (ndarray) – The bin edges of the histogram to be fit

  • var (ndarray | None) – The variances of the histogram values. If not provided, square-root variances are assumed.

  • mode_guess (float | None) – An x-value (not a bin index!) near which a peak is expected. The algorithm fits around the maximum within +/- n_bins of the guess. If not provided, the center of the max bin of the histogram is used.

  • n_bins (int) – The number of bins (including the max bin) to be used in the fit. Also used for searching for a max near mode_guess

Returns:

  • pars (2-tuple with the parameters (mode, max) of the fit) – mode : the estimated x-position of the maximum maximum : the estimated maximum value of the peak

  • cov (2x2 matrix of floats) – The covariance matrix for the 2 parameters in pars

Return type:

tuple[ndarray, …]

Examples

>>> import pygama.analysis.histograms as pgh
>>> from numpy.random import normal
>>> import pygama.analysis.peak_fitting as pgf
>>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5))
>>> pgf.taylor_mode_max(hist, bins, var, n_bins=5)

pygama.math.distributions module

Contains a list of distribution functions, all implemented using Numba’s numba.jit() to take advantage of the just-in-time speed boost.

pygama.math.histogram module

pygama convenience functions for 1D histograms.

1D hists in pygama require 3 things available from all implementations of 1D histograms of numerical data in python: hist, bins, and var: - hist: an array of histogram values - bins: an array of bin edges - var: an array of variances in each bin If var is not provided, pygama assuems that the hist contains “counts” with variance = counts (Poisson stats)

These are just convenience functions, provided for your convenience. Hopefully they will help you if you need to do something trickier than is provided (e.g. 2D hists).

pygama.math.histogram.better_int_binning(x_lo=0, x_hi=None, dx=None, n_bins=None)

Get a good binning for integer data.

Guarantees an integer bin width.

At least two of x_hi, dx, or n_bins must be provided.

Parameters:
  • x_lo (float) – Desired low x value for the binning

  • x_hi (float | None) – Desired high x value for the binning

  • dx (float | None) – Desired bin width

  • n_bins (float | None) – Desired number of bins

Returns:

  • x_lo – int values for best x_lo

  • x_hi – int values for best x_hi, returned if x_hi is not None

  • dx – best int bin width, returned if arg dx is not None

  • n_bins – best int n_bins, returned if arg n_bins is not None

Return type:

tuple[int, int, int, int]

pygama.math.histogram.find_bin(x, bins)

Returns the index of the bin containing x Returns -1 for underflow, and len(bins) for overflow For uniform bins, jumps directly to the appropriate index. For non-uniform bins, binary search is used.

Parameters:
  • x (float) – The value to search for amongst the bins

  • bins (ndarray) – The input array of bin-edges

Returns:

  • bin_idx – Index of the bin containing x

  • TODO (replace np.searchsorted with for loops, numba will speed this function up then)

Return type:

int

pygama.math.histogram.get_bin_centers(bins)

Returns an array of bin centers from an input array of bin edges. Works for non-uniform binning. Note: a new array is allocated

Parameters:

bins (ndarray) – The input array of bin-edges

Returns:

bin_centers – The array of bin centers

Return type:

ndarray

pygama.math.histogram.get_bin_estimates(pars, func, bins, is_integral=False, **kwargs)

Bin expected means are estimated by f(bin_center)*bin_width. Supply an integrating function to compute the integral over the bin instead. TODO: make default integrating function a numerical method that is off by default.

Parameters:
  • pars (ndarray) – The parameters of the function, func

  • func (Callable) – The function at which to evaluate the bin centers

  • bins (ndarray) – Array of histogram bins

  • is_integral (bool) – if True, then func is an integral function

Returns:

f(bin_center)*bin_width – The expected mean of a bin

Return type:

ndarray

See also

get_bin_widths

Returns the width of the bins

get_bin_centers

Finds the bin centers of the supplied bins

pygama.math.histogram.get_bin_widths(bins)

Returns an array of bin widths from an input array of bin edges. Works for non-uniform binning.

Parameters:

bins (ndarray) – The input array of bin-edges

Returns:

bin_widths – An array of bin widths

Return type:

ndarray

pygama.math.histogram.get_fwfm(fraction, hist, bins, var=None, mx=None, dmx=0, bl=0, dbl=0, method='bins_over_f', n_slope=3)

Estimate the full width at some fraction of the max of data in a histogram

Typically used by sending slices around a peak. Searches about argmax(hist) for the peak to fall by [fraction] from mx to bl

Parameters:
  • fraction (float) – The fractional amplitude at which to evaluate the full width

  • hist (ndarray) – The histogram data array containing the peak

  • bins (ndarray) – An array of bin edges for the histogram

  • var (ndarray | None) – An array of histogram variances. Used with the ‘fit_slopes’ method

  • mx (float | tuple[float, float] | None) – The value to use for the max of the peak. If None, np.amax(hist) is used.

  • dmx (float | None) – The uncertainty in mx

  • bl (float | tuple[float, float] | None) – Used to specify an offset from which to estimate the FWFM.

  • dbl (float | None) – The uncertainty in the bl

  • method (str) –

    ‘bins_over_f’the simplest method: just take the difference in the bin

    centers that are over [fraction] of max. Only works for high stats and FWFM/bin_width >> 1

    ’interpolate’interpolate between the bins that cross the [fraction]

    line. Works well for high stats and a reasonable number of bins. Uncertainty incorporates var, if provided.

    ’fit_slopes’fit over n_slope bins in the vicinity of the FWFM and

    interpolate to get the fractional crossing point. Works okay even when stats are moderate but requires enough bins that dx traversed by n_slope bins is approximately linear. Incorporates bin variances in fit and final uncertainties if provided.

  • n_slope (int) – Number of bins in the vicinity of the FWFM used to interpolate the fractional crossing point with the ‘fit_slopes’ method

Returns:

fwfm, dfwfm – fwfm: the full width at [fraction] of the maximum above bl dfwfm: the uncertainty in fwfm

Return type:

tuple[float, float]

Examples

>>> import pygama.analysis.histograms as pgh
>>> from numpy.random import normal
>>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5))
>>> pgh.get_fwfm(0.5, hist, bins, var, method='bins_over_f')
(2.2, 0.15919638684132664) # may vary
>>> pgh.get_fwfm(0.5, hist, bins, var, method='interpolate')
(2.2041666666666666, 0.09790931254396479) # may vary
>>> pgh.get_fwfm(0.5, hist, bins, var, method='fit_slopes')
(2.3083363869003466, 0.10939486522749278) # may vary
pygama.math.histogram.get_fwhm(hist, bins, var=None, mx=None, dmx=0, bl=0, dbl=0, method='bins_over_f', n_slope=3)

Convenience function for the FWHM of data in a histogram

Typically used by sending slices around a peak. Searches about argmax(hist) for the peak to fall by [fraction] from mx to bl

Parameters:
  • fraction – The fractional amplitude at which to evaluate the full width

  • hist (ndarray) – The histogram data array containing the peak

  • bins (ndarray) – An array of bin edges for the histogram

  • var (ndarray | None) – An array of histogram variances. Used with the ‘fit_slopes’ method

  • mx (float | tuple[float, float] | None) – The value to use for the max of the peak. If None, np.amax(hist) is used.

  • dmx (float | None) – The uncertainty in mx

  • bl (float | tuple[float, float] | None) – Used to specify an offset from which to estimate the FWFM.

  • dbl (float | None) – The uncertainty in the bl

  • method (str) –

    ‘bins_over_f’the simplest method: just take the difference in the bin

    centers that are over [fraction] of max. Only works for high stats and FWFM/bin_width >> 1

    ’interpolate’interpolate between the bins that cross the [fraction]

    line. Works well for high stats and a reasonable number of bins. Uncertainty incorporates var, if provided.

    ’fit_slopes’fit over n_slope bins in the vicinity of the FWFM and

    interpolate to get the fractional crossing point. Works okay even when stats are moderate but requires enough bins that dx traversed by n_slope bins is approximately linear. Incorporates bin variances in fit and final uncertainties if provided.

  • n_slope (int) – Number of bins in the vicinity of the FWFM used to interpolate the fractional crossing point with the ‘fit_slopes’ method

Returns:

fwhm, dfwhm – fwfm: the full width at half of the maximum above bl dfwfm: the uncertainty in fwhm

Return type:

tuple[float, float]

See also

get_fwfm

Function that computes the FWFM

pygama.math.histogram.get_gaussian_guess(hist, bins)

given a hist, gives guesses for mu, sigma, and amplitude

Parameters:
  • hist (ndarray) – Array of histogrammed data

  • bins (ndarray) – Array of histogram bins

Returns:

  • guess_mu – guess for the mu parameter of a Gaussian

  • guess_sigma – guess for the sigma parameter of a Gaussian

  • guess_area – guess for the A parameter of a Gaussian

Return type:

tuple[float, float, float]

pygama.math.histogram.get_hist(data, bins=None, range=None, dx=None, wts=None)

return hist, bins, var after binning data

This is just a wrapper for humba.histogram, with optional weights for each element and proper computing of variances.

Note: there are no overflow / underflow bins.

Available binning methods:

  • Default (no binning arguments) : 100 bins over an auto-detected range

  • bins=n, range=(x_lo, x_hi) : n bins over the specified range (or leave range=None for auto-detected range)

  • bins=[str] : use one of np.histogram’s automatic binning algorithms

  • bins=bin_edges_array : array lower bin edges, supports non-uniform binning

  • dx=dx, range=(x_lo, x_hi): bins of width dx over the specified range. Note: dx overrides the bins argument!

Parameters:
  • data (ndarray) – The array of data to be histogrammed

  • bins (int | ndarray | str | None) – int: the number of bins to be used in the histogram array: an array of bin edges to use str: the name of the np.histogram automatic binning algorithm to use If not provided, humba.histogram’s default auto-binning routine is used

  • range (tuple[float, float] | None) – (x_lo, x_high) is the tuple of low and high x values to uses for the very ends of the bin range. If not provided, np.histogram chooses the ends based on the data in data

  • dx (float | None) – Specifies the bin width. Overrides “bins” if both arguments are present

  • wts (float | ndarray | None) – Array of weights for each bin. For example, if you want to divide all bins by a time T to get the bin contents in count rate, set wts = 1/T. Variances will be computed for each bin that appropriately account for each data point’s weighting.

Returns:

  • hist – the values in each bin of the histogram

  • bins – an array of bin edges: bins[i] is the lower edge of the ith bin. Note: it includes the upper edge of the last bin and does not include underflow or overflow bins. So len(bins) = len(hist) + 1

  • var – array of variances in each bin of the histogram

Return type:

tuple[ndarray, …]

pygama.math.histogram.get_i_local_extrema(data, delta)

Get lists of indices of the local maxima and minima of data

The “local” extrema are those maxima / minima that have heights / depths of at least delta. Converted from MATLAB script at: http://billauer.co.il/peakdet.html

Parameters:
  • data (array-like) – the array of data within which extrema will be found

  • delta (scalar) – the absolute level by which data must vary (in one direction) about an extremum in order for it to be tagged

Returns:

imaxes, imins (2-tuple ( array, array )) – A 2-tuple containing arrays of variable length that hold the indices of the identified local maxima (first tuple element) and minima (second tuple element)

pygama.math.histogram.get_i_local_maxima(data, delta)
pygama.math.histogram.get_i_local_minima(data, delta)
pygama.math.histogram.plot_hist(hist, bins, var=None, show_stats=False, stats_hloc=0.75, stats_vloc=0.85, fill=False, fillcolor='r', fillalpha=0.2, **kwargs)

Plot a step histogram, with optional error bars

Parameters:
  • hist (ndarray) – The histogram data array containing the peak

  • bins (ndarray) – An array of bin edges for the histogram

  • var (ndarray | None) – An array of histogram variances. Used with the ‘fit_slopes’ method

  • show_stats (bool) – If True, shows the mean, mean error, and standard deviation of the histogram on the plot

  • stats_hloc (float) – matplotlib.pyplot horizontal location to place the stats

  • stats_vloc (float) – matplotlib.pyplot vertical location to place the stats

  • fill (bool) – If True, fills in a step histogram when plotting

  • fill_color – Color to fill the step histogram if fill is True

  • fill_alpha – Alpha amount if fill is True

pygama.math.histogram.range_slice(x_min, x_max, hist, bins, var=None)

Get the histogram bins and values for a given slice of the range

Parameters:
  • x_min (float) – Lower endpoint of the range

  • x_min – Upper endpoint of the range

  • hist (ndarray) – The histogrammed data to search through

  • bins (ndarray) – The histogrammed data to search through

  • var (ndarray | None) – The histogrammed data to search through

Return type:

tuple[ndarray, …]

See also

find_bin

for parameters and return values

pygama.math.hpge_peak_fitting module

pygama convenience functions for fitting hpge peak shape data

pygama.math.hpge_peak_fitting.hpge_peak_fwfm(sigma, htail, tau, frac_max=0.5, cov=None)

Return the FWHM of the radford_peak function, ignoring background and step components. If calculating error also need the normalisation for the step function.

pygama.math.hpge_peak_fitting.hpge_peak_fwhm(sigma, htail, tau, cov=None)

Return the FWHM of the hpge_peak function, ignoring background and step components. If calculating error also need the normalisation for the step function.

Parameters:
  • sigma (float) – The width of the hpge_peak

  • htail (float) – The height of the tail in the hpge_peak

  • tau (float) – The characteristic scale in the extended Gaussian in the hpge_peak

  • cov (float | None) – The covariant matrix of the previous parameters

  • Returns

    FWHM, FWHM_uncertainty

    The FWHM of the hpge_peak and its uncertainty

Return type:

tuple[float, float]

pygama.math.hpge_peak_fitting.hpge_peak_mode(mu, sigma, htail, tau, cov=None)
pygama.math.hpge_peak_fitting.hpge_peak_parameter_gradient(e, pars, step_norm)

Computes the gradient of the hpge_peak parameters

Parameters:
  • e (float) – The energy of the hpge_peak

  • pars (ndarray) – The parameters of the hpge_peak fit

  • step_norm (float) – The normalization of the background step function in the hpge_peak

Returns:

gradient – gradient of the n_sig, mu, sigma, h_tail, tau, n_bkg, and hstep parameters of the HPGe peak

Return type:

ndarray

pygama.math.hpge_peak_fitting.hpge_peak_peakshape_derivative(e, pars, step_norm)

Computes the derivative of the hpge_peak peak shape

Parameters:
  • e (ndarray) – The array of energies of the hpge_peak

  • pars (ndarray) – The parameters of the hpge_peak fit

  • step_norm (float) – The normalization of the background step function in the hpge_peak

Returns:

derivative – the derivative of the hpge_peak

Return type:

ndarray

pygama.math.least_squares module

pygama convenience functions for linearly fitting data

pygama.math.least_squares.fit_simple_scaling(x, y, var=1)

Fast computation of weighted linear least squares fit to a simple scaling

I.e. y = scale * x. Returns the best fit scale parameter and its variance.

Parameters:
  • x (ndarray) – x values for the fit

  • y (ndarray) – y values for the fit

  • var (float | ndarray | None) – The variances for each y-value

Returns:

scale, scale_var – The scale parameter and its variance

Return type:

tuple[float, float]

pygama.math.least_squares.linear_fit_by_sums(x, y, var=1)

Fast computation of weighted linear least squares fit to a linear model

Note: doesn’t compute covariances. If you want covariances, just use polyfit

Parameters:
  • x (ndarray) – x values for the fit

  • y (ndarray) – y values for the fit

  • var (float | ndarray | None) – The variances for each y-value

Returns:

(m, b) – The slope (m) and y-intercept (b) of the best fit (in the least-squares sense) of the data to y = mx + b

Return type:

tuple[float, float]

pygama.math.rebin module

Adaptive rebinning routines for 1D histograms.

Two adaptive strategies are offered:

  • Bayesian blocks (Scargle et al. 2013), backed by hepstats.modeling.bayesian_blocks().

  • Peak-aware: peaks are detected and each peak region is collapsed into a single bin. The continuum is either left at bblocks spacing or rebinned to a uniform width.

Functions taking a raw data array are named hist_*; those taking a finely-binned hist.Hist are named rebin_*. All routines return a Variable-axis Weight-storage hist.Hist.

Public functions

Rebinning a histogram with the edges of another

To rebin a fine histogram h_fine onto the edges of another histogram h_ref, use hist.rebin() directly (the edges of h_ref must be a subset of those of h_fine):

>>> import hist as bh
>>> h_out = h_fine[:: bh.rebin(edges=h_ref.axes[0].edges.tolist())]
pygama.math.rebin._bblocks_edges(data, **kwargs)

Bblocks edges; last edge nudged by 1 ULP for right-exclusive axes.

Return type:

ndarray

pygama.math.rebin._bblocks_edges_from_hist(h, *, p0=0.05)

Bayesian-blocks edges from a fine histogram, snapped to its grid.

Empty bins are dropped to avoid the log(0) warning hepstats raises on zero-weight events.

Return type:

ndarray

pygama.math.rebin._collapse_peaks(h, *, n_sigma=5.0, width_factor=5.0, max_bin_width=None)

Merge bins inside each detected peak region into a single bin.

Return type:

Hist

pygama.math.rebin._find_peak_regions(h, *, n_sigma=5.0, width_factor=5.0, max_bin_width=None)

Detect peaks and return (n_regions, 2) merged region edges.

Edges are in axis units; the array is empty if no peaks pass the significance threshold. Peaks are strict local maxima of the rate (counts/width), filtered by prominence * w_peak / sqrt(n_peak) against n_sigma. When max_bin_width is given, candidates sitting on a bin wider than the cap are also rejected (line-like features have resolution-bounded widths; wide bins flag Compton edges and smooth-continuum plateaus). Each surviving peak is grown outward via _walk_peak() and overlapping regions are merged.

Return type:

ndarray

pygama.math.rebin._prepare_data(data, prebin_low, prebin_high)

Flatten awkward inputs and apply the optional range mask.

Return type:

tuple[ndarray, float, float]

pygama.math.rebin._snap_indices(fine_edges, target_edges)

Nearest-edge indices into fine_edges for each target_edges.

Endpoints are clamped to 0 and len(fine_edges) - 1 and duplicates are removed, so the result is a strictly increasing integer array suitable as group boundaries for hist.rebin().

Return type:

ndarray

pygama.math.rebin._uniform_edges_with_peaks(low, high, *, bin_width, peak_regions)

Uniform grid over [low, high] with peak regions as single bins.

Any uniform edge strictly inside a peak region, OR within bin_width of one of its boundaries, is dropped: the peak region’s neighboring continuum bins absorb the would-be fragment, making them up to 2 * bin_width wide instead. Uniform edges exactly on a peak boundary are kept. Assumes peak_regions is sorted and non-overlapping.

Return type:

ndarray

pygama.math.rebin._walk_peak(start, step, rate, widths, width_cap, n)

Walk outward from a peak bin and return the farthest index reached.

Steps in direction step (-1 or +1), stopping when the rate stops descending, the next bin’s width reaches width_cap, or the next bin is a strict local minimum of rate. At the array boundary the missing neighbor is treated as +inf.

Return type:

int

pygama.math.rebin.hist_bblocks(data, *, prebin_width=None, prebin_low=None, prebin_high=None, p0=0.05)

Histogram an unbinned data array using Bayesian-blocks edges.

For large samples, pass prebin_width to first bin the data on a uniform fine grid and then call rebin_bblocks() – this is much faster than running bblocks on the raw data directly. Awkward inputs are flattened before binning.

Parameters:
  • data (ndarray | Array) – Array of measurement values (numpy or awkward).

  • prebin_width (float | None) – Width of the uniform pre-binning grid. Should be a few times finer than the narrowest feature you want to resolve (e.g. ~0.5 keV for HPGe lines with ~3 keV FWHM).

  • prebin_low (float | None) – Histogram range. Default to data.min() / data.max(). Data outside the range is discarded. Only valid together with prebin_width. The upper edge may overshoot prebin_high by up to prebin_width so the range is covered by an integer number of bins.

  • prebin_high (float | None) – Histogram range. Default to data.min() / data.max(). Data outside the range is discarded. Only valid together with prebin_width. The upper edge may overshoot prebin_high by up to prebin_width so the range is covered by an integer number of bins.

  • p0 (float) – Bblocks sensitivity: lower values give coarser bins, higher values finer. Typical 0.01-0.1.

Returns:

h – A Variable-axis Weight-storage histogram filled with data.

Return type:

Hist

Examples

>>> from pygama.math.rebin import hist_bblocks
>>> h = hist_bblocks(energies, prebin_width=0.5, prebin_low=0, prebin_high=3000)
pygama.math.rebin.hist_bblocks_with_peaks(data, *, prebin_width=None, prebin_low=None, prebin_high=None, p0=0.05, n_sigma=5.0, width_factor=5.0, max_bin_width=None)

Histogram raw data with Bayesian-blocks edges and collapsed peaks.

Equivalent to _collapse_peaks(hist_bblocks(data, ...)).

Parameters:
Returns:

h – A Variable-axis Weight-storage histogram.

Return type:

Hist

Examples

Starting with a large data array, prebin it for performance and use a maximum peak width of 10.

>>> from pygama.math.rebin import hist_bblocks_with_peaks
>>> data = [...]
>>> h = hist_bblocks_with_peaks(
...     data, prebin_width=0.5, prebin_low=0, prebin_high=3000, max_bin_width=10
... )
pygama.math.rebin.hist_uniform_with_peaks(data, *, bin_width, prebin_width=None, prebin_low=None, prebin_high=None, p0=0.05, n_sigma=5.0, width_factor=5.0, max_bin_width=None)

Histogram raw data with uniform continuum bins and collapsed peaks.

Parameters:
Returns:

h – A Variable-axis Weight-storage histogram filled with data.

Return type:

Hist

Examples

Starting with a large data array, prebin it for performance and use a maximum peak width of 10. In between the peaks use an uniform bin width of 2.

>>> from pygama.math.rebin import hist_uniform_with_peaks
>>> data = [...]
>>> h = hist_uniform_with_peaks(
...     energies,
...     bin_width=2,
...     prebin_width=0.5,
...     prebin_low=0,
...     prebin_high=3000,
...     max_bin_width=10,
... )
pygama.math.rebin.rebin_bblocks(h, *, p0=0.05)

Rebin a fine 1D histogram using Bayesian-blocks adaptive binning.

Each fine bin is treated as a weighted event (weight = bin count); the resulting block edges are snapped to the input grid.

Parameters:
Returns:

out – A Variable-axis Weight-storage histogram whose edges are a subset of the input edges.

Return type:

Hist

Examples

>>> from pygama.math.rebin import rebin_bblocks
>>> import hist
>>> h_fine = hist.new.Reg(1000, 0, 1000).Double().fill(...)
>>> h_bb = rebin_bblocks(h_fine)
pygama.math.rebin.rebin_bblocks_with_peaks(h, *, n_sigma=5.0, width_factor=5.0, max_bin_width=None, p0=0.05)

Rebin a fine histogram with bblocks edges and collapsed peaks.

First applies rebin_bblocks() to get adaptive continuum bins, then merges each detected gamma peak into a single bin.

Parameters:
  • h (Hist) – Input finely-binned 1D histogram.

  • n_sigma (float) – How tall a peak must be above its local baseline to be flagged, in units of Poisson noise. Default 5.0. Lower values pick up weaker peaks but also more spurious detections.

  • width_factor (float) – How far the merge can extend on each side of a peak, relative to the width of the peak’s own bblocks bin. The walk stops at the first bin wider than width_factor * peak_bin_width. Default 5.0; try 2.0-3.0 for tighter merges on HPGe spectra.

  • max_bin_width (float | None) –

    Maximum allowed width for a peak bin (in axis units). Two effects:

    1. Detection: candidates whose bblocks bin is wider than this are rejected – useful for filtering out Compton edges, continuum plateaus, and other broad features that the peak finder cannot tell apart from real gamma lines based on shape alone (see Notes).

    2. Walk: the merge walk never extends into a bin wider than this, regardless of width_factor.

    Default None (no filter). For HPGe spectra set it to a few times FWHM at the highest energy of interest (5-20 keV is typical) to suppress non-line features.

  • p0 (float) – See hist_bblocks().

Returns:

out – A Variable-axis Weight-storage histogram whose edges are a subset of the input edges.

Return type:

Hist

Notes

The peak finder only looks at local maxima of the rate (counts/width) and how high they stand above their local baseline; it knows nothing about what a real gamma line looks like. It can therefore flag features that aren’t line-like:

  • resolution-limited gamma peaks (the intended target);

  • the upper end of a long Compton-edge ramp;

  • step transitions;

  • wide continuum plateaus that bblocks captured in a single wide bin.

Real HPGe peaks live on narrow bblocks bins (FWHM is a few keV even at multi-MeV energies); the other features end up on much wider bins. max_bin_width is the simplest filter to reject everything that isn’t a real line.

Examples

>>> from pygama.math.rebin import rebin_bblocks_with_peaks
>>> h_out = rebin_bblocks_with_peaks(h_fine, max_bin_width=10)
pygama.math.rebin.rebin_uniform_with_peaks(h, *, bin_width, n_sigma=5.0, width_factor=5.0, max_bin_width=None, p0=0.05)

Rebin a fine histogram onto uniform continuum bins with collapsed peaks.

Runs an internal rebin_bblocks() pass to find the peaks, then builds the output: uniform-bin_width continuum bins everywhere except inside detected peak regions, which each become a single bin. Output edges are snapped to h.axes[0].edges, so bin_width need not align exactly with the input grid.

Parameters:
Returns:

out – A Variable-axis Weight-storage histogram whose edges are a subset of the input edges.

Return type:

Hist

Examples

>>> from pygama.math.rebin import rebin_uniform_with_peaks
>>> import hist
>>> h_fine = hist.new.Reg(1000, 0, 1000).Double().fill(...)
>>> h_out = rebin_uniform_with_peaks(h_fine, bin_width=2, max_bin_width=10)

pygama.math.unbinned_fitting module

pygama convenience functions for fitting ubinned data

pygama.math.unbinned_fitting.fit_unbinned(func, data, guess=None, extended=True, cost_func='LL', simplex=False, bounds=None, fixed=None)

Do a unbinned fit to data. Default is extended Log Likelihood fit, with option for other cost functions.

Parameters:
  • func (Callable) – the function to fit

  • data (ndarray) – the data values to be fit

  • guess (ndarray) – initial guess parameters

  • extended (bool) – run extended or non extended fit

  • cost_func (str) – cost function to use. LL is ExtendedUnbinnedNLL, None is for just UnbinnedNLL

  • simplex (bool) – whether to include a round of simpson minimisation before main minimisation

  • bounds (list[tuple[float, float]] | None) – Each tuple is (min, max) for the corresponding parameters. Bounds can be None, e.g. [(0,None), (0,10)]

  • fixed (list[bool] | None) – list of parameter indices to fix

Returns:

pars, errs, cov – the best-fit parameters and their errors / covariance

Return type:

tuple[ndarray, …]

pygama.math.units module

pygama.math.utils module

pygama utility functions.

pygama.math.utils.get_formatted_stats(mean, sigma, ndigs=2)

convenience function for formatting mean +/- sigma to the right number of significant figures.

Parameters:
  • mean (float) – The mean value we want to format

  • sigma (float) – The sigma value we want to format

  • ndigs (int) – The number of significant digits we want to display

Return type:

tuple[str, str]

pygama.math.utils.get_par_names(func)

Return a list containing the names of the arguments of “func” other than the first argument. In pygamaland, those are the function’s “parameters.”

Parameters:

func (Callable) – A function whose parameters we want to return

Return type:

tuple[str, …]

pygama.math.utils.print_fit_results(pars, cov, func=None, title=None, pad=True)

Convenience function to write scipy.optimize.curve_fit results to the log

Parameters:
  • pars (ndarray) – The parameter values of the function func

  • func (Callable | None) – A function, if passed then the function’s parameters’ names are logged

  • title (str | None) – A title to log

  • pad (bool | None) – If True, adds spaces to the log messages

Returns:

None – Writes the curve_fit results to the log

Return type:

None

pygama.math.utils.sizeof_fmt(num, suffix='B')

given a file size in bytes, output a human-readable form. :param num: File size, in bytes :param suffix: Desired file size suffix

Return type:

str