pygama.math package#
Useful functions for performing quick computations of statistical distributions, as well as helper functions for histogramming data
Subpackages#
- pygama.math.functions package
- Submodules
- pygama.math.functions.crystal_ball module
- pygama.math.functions.error_function module
- pygama.math.functions.exgauss module
- pygama.math.functions.exponential module
- pygama.math.functions.gauss module
- pygama.math.functions.gauss_on_exgauss module
- pygama.math.functions.gauss_on_linear module
- pygama.math.functions.gauss_on_step module
- pygama.math.functions.gauss_on_uniform module
- pygama.math.functions.hpge_peak module
- pygama.math.functions.linear module
- pygama.math.functions.moyal module
- pygama.math.functions.poisson module
- pygama.math.functions.polynomial module
- pygama.math.functions.pygama_continuous module
- pygama.math.functions.step module
- pygama.math.functions.sum_dists module
_precompute_fracs()_precompute_shape_par_idx()get_idx()sum_dist_frozensum_distssum_dists._argcheck()sum_dists._argcheck_rvs()sum_dists._attach_methods()sum_dists._cdf()sum_dists._construct_argparser()sum_dists._link_pars()sum_dists._pdf()sum_dists._ppf()sum_dists._ppf_to_solve()sum_dists._rvs()sum_dists.cdf_ext()sum_dists.draw_cdf()sum_dists.draw_pdf()sum_dists.get_cdf()sum_dists.get_fwhm()sum_dists.get_mu()sum_dists.get_pdf()sum_dists.get_req_args()sum_dists.get_total_events()sum_dists.link_pars()sum_dists.norm_cdf()sum_dists.norm_pdf()sum_dists.pdf_ext()sum_dists.rvs()
- pygama.math.functions.triple_gauss_on_double_step module
- pygama.math.functions.uniform module
Submodules#
pygama.math.binned_fitting module#
pygama convenience functions for fitting binned data
- pygama.math.binned_fitting.fit_binned(func: Callable, hist: ndarray, bins: ndarray, var: Optional[ndarray] = None, guess: Optional[ndarray] = None, cost_func: str = 'LL', Extended: bool = True, simplex: bool = False, bounds: Optional[tuple[tuple[float, float], ...]] = None, fixed: Optional[tuple[int, ...]] = None) tuple[numpy.ndarray, ...]#
Do a binned fit to a histogram.
Default is Extended Log Likelihood fit, with option for either Least Squares or other cost function.
- Parameters:
func (Callable) – the function to fit, if using LL as method needs to be a cdf
hist (ndarray) – histogrammed data
bins (ndarray) – histogrammed data
cost_func (str) – cost function to use
Extended (bool) – run extended or non extended fit
simplex (bool) – whether to include a round of simpson minimisation before main minimisation
bounds (Optional[tuple[tuple[float, float], ...]]) – list of tuples with bounds can be None, e.g. [(0,None), (0,10)]
fixed (Optional[tuple[int, ...]]) – list of parameter indices to fix
- Returns:
coeff – Returned fit parameters
error – Iminuit errors
cov_matrix – Covariance matrix
- Return type:
tuple[numpy.ndarray, …]
- pygama.math.binned_fitting.gauss_mode(hist: ndarray, bins: ndarray, **kwargs) tuple[float, float]#
Alias for gauss_mode_max that just returns the mode (position) of a peak
See also
- pygama.math.binned_fitting.gauss_mode_max(hist: ndarray, bins: ndarray, **kwargs) tuple[numpy.ndarray, ...]#
Alias for gauss_mode_width_max that just returns the max and mode
See also
- Returns:
pars (ndarray with the parameters (mode, maximum) of the gaussian fit) – mode : the estimated x-position of the maximum maximum : the estimated maximum value of the peak
cov (2x2 ndarray of floats) – The covariance matrix for the 2 parameters in pars
- Return type:
tuple[numpy.ndarray, …]
Examples
>>> import pygama.math.histograms as pgh >>> from numpy.random import normal >>> import pygama.math.peak_fitting as pgf >>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5)) >>> pgf.gauss_mode_max(hist, bins, var=var, n_bins=20)
- pygama.math.binned_fitting.gauss_mode_width_max(hist: ndarray, bins: ndarray, var: Optional[ndarray] = None, mode_guess: Optional[float] = None, n_bins: Optional[int] = 5, cost_func: str = 'Least Squares', inflate_errors: Optional[bool] = False, gof_method: Optional[str] = 'var') tuple[numpy.ndarray, ...]#
Get the max, mode, and width of a peak based on gauss fit near the max Returns the parameters of a gaussian fit over n_bins in the vicinity of the maximum of the hist (or the max near mode_guess, if provided). This is equivalent to a Taylor expansion around the peak maximum because near its maximum a Gaussian can be approximated by a 2nd-order polynomial in x:
\[A \exp[ -(x-\mu)^2 / 2 \sigma^2 ] \simeq A [ 1 - (x-\mu)^2 / 2 \sigma^2 ] = A - (1/2!) (A/\sigma^2) (x-\mu)^2\]The advantage of using a gaussian over a polynomial directly is that the gaussian parameters are the ones we care about most for a peak, whereas for a poly we would have to extract them after the fit, accounting for covariances. The gaussian also better approximates most peaks farther down the peak. However, the gauss fit is nonlinear and thus less stable.
- Parameters:
hist (ndarray) – The values of the histogram to be fit
bins (ndarray) – The bin edges of the histogram to be fit
var (Optional[ndarray]) – The variances of the histogram values. If not provided, square-root variances are assumed.
mode_guess (Optional[float]) – An x-value (not a bin index!) near which a peak is expected. The algorithm fits around the maximum within +/- n_bins of the guess. If not provided, the center of the max bin of the histogram is used.
n_bins (Optional[int]) – The number of bins (including the max bin) to be used in the fit. Also used for searching for a max near mode_guess
cost_func (str) – Passed to fit_binned()
inflate_errors (Optional[bool]) – If true, the parameter uncertainties are inflated by sqrt(chi2red) if it is greater than 1
gof_method (Optional[str]) – method flag for goodness_of_fit
- Returns:
pars (ndarray containing the parameters (mode, sigma, maximum) of the gaussian fit) –
mode : the estimated x-position of the maximum
sigma : the estimated width of the peak. Equivalent to a gaussian width (sigma), but based only on the curvature within n_bins of the peak. Note that the Taylor-approxiamted curvature of the underlying function in the vicinity of the max is given by max / sigma^2
maximum : the estimated maximum value of the peak
cov (3x3 ndarray of floats) – The covariance matrix for the 3 parameters in pars
- Return type:
tuple[numpy.ndarray, …]
- pygama.math.binned_fitting.goodness_of_fit(hist: ndarray, bins: ndarray, var: ndarray, func: Callable, pars: ndarray, method: str = 'var', scale_bins: bool = False) tuple[float, int]#
Compute chisq and dof of fit
- Parameters:
hist (ndarray) – histogram data. var can be None if hist is integer counts
bins (ndarray) – histogram data. var can be None if hist is integer counts
var (ndarray) – histogram data. var can be None if hist is integer counts
func (Callable) – the function that was fit to the hist
pars (ndarray) – the best-fit pars of func. Assumes all pars are free parameters
method (str) – Sets the choice of “denominator” in the chi2 sum ‘var’: user passes in the variances in var (must not have zeros) ‘Pearson’: use func (hist must contain integer counts) ‘Neyman’: use hist (hist must contain integer counts and no zeros)
- Returns:
chisq – the summed up value of chisquared
dof – the number of degrees of freedom
- Return type:
- pygama.math.binned_fitting.poisson_gof(pars: ndarray, func: Callable, hist: ndarray, bins: ndarray, is_integral: bool = False, **kwargs) float#
Calculate the goodness of fit for the Poisson likelihood.
- Parameters:
- Returns:
Poisson G.O.F.
- Return type:
Notes
The Poisson likelihood does not give a good GOF until the counts are very high and all the poisson stats are roughly gaussian and you don’t need it anyway. But the G.O.F. is calculable for the Poisson likelihood. So we do it here.
- pygama.math.binned_fitting.taylor_mode_max(hist: ndarray, bins: ndarray, var: Optional[ndarray] = None, mode_guess: Optional[float] = None, n_bins: int = 5) tuple[numpy.ndarray, ...]#
Get the max and mode of a peak based on Taylor exp near the max Returns the amplitude and position of a peak based on a poly fit over n_bins in the vicinity of the maximum of the hist (or the max near mode_guess, if provided)
- Parameters:
hist (ndarray) – The values of the histogram to be fit. Often: send in a slice around a peak
bins (ndarray) – The bin edges of the histogram to be fit
var (Optional[ndarray]) – The variances of the histogram values. If not provided, square-root variances are assumed.
mode_guess (Optional[float]) – An x-value (not a bin index!) near which a peak is expected. The algorithm fits around the maximum within +/- n_bins of the guess. If not provided, the center of the max bin of the histogram is used.
n_bins (int) – The number of bins (including the max bin) to be used in the fit. Also used for searching for a max near mode_guess
- Returns:
pars (2-tuple with the parameters (mode, max) of the fit) – mode : the estimated x-position of the maximum maximum : the estimated maximum value of the peak
cov (2x2 matrix of floats) – The covariance matrix for the 2 parameters in pars
- Return type:
tuple[numpy.ndarray, …]
Examples
>>> import pygama.analysis.histograms as pgh >>> from numpy.random import normal >>> import pygama.analysis.peak_fitting as pgf >>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5)) >>> pgf.taylor_mode_max(hist, bins, var, n_bins=5)
pygama.math.distributions module#
Contains a list of distribution functions, all implemented using Numba’s
numba.jit() to take advantage of the just-in-time speed boost.
pygama.math.histogram module#
pygama convenience functions for 1D histograms.
1D hists in pygama require 3 things available from all implementations of 1D histograms of numerical data in python: hist, bins, and var: - hist: an array of histogram values - bins: an array of bin edges - var: an array of variances in each bin If var is not provided, pygama assuems that the hist contains “counts” with variance = counts (Poisson stats)
These are just convenience functions, provided for your convenience. Hopefully they will help you if you need to do something trickier than is provided (e.g. 2D hists).
- pygama.math.histogram.better_int_binning(x_lo: float = 0, x_hi: Optional[float] = None, dx: Optional[float] = None, n_bins: Optional[float] = None) tuple[int, int, int, int]#
Get a good binning for integer data.
Guarantees an integer bin width.
At least two of x_hi, dx, or n_bins must be provided.
- Parameters:
- Returns:
x_lo – int values for best x_lo
x_hi – int values for best x_hi, returned if x_hi is not None
dx – best int bin width, returned if arg dx is not None
n_bins – best int n_bins, returned if arg n_bins is not None
- Return type:
- pygama.math.histogram.find_bin(x: float, bins: ndarray) int#
Returns the index of the bin containing x Returns -1 for underflow, and len(bins) for overflow For uniform bins, jumps directly to the appropriate index. For non-uniform bins, binary search is used.
- pygama.math.histogram.get_bin_centers(bins: ndarray) ndarray#
Returns an array of bin centers from an input array of bin edges. Works for non-uniform binning. Note: a new array is allocated
- pygama.math.histogram.get_bin_estimates(pars: ndarray, func: Callable, bins: ndarray, is_integral: bool = False, **kwargs) ndarray#
Bin expected means are estimated by f(bin_center)*bin_width. Supply an integrating function to compute the integral over the bin instead. TODO: make default integrating function a numerical method that is off by default.
- Parameters:
- Returns:
f(bin_center)*bin_width – The expected mean of a bin
- Return type:
See also
get_bin_widthsReturns the width of the bins
get_bin_centersFinds the bin centers of the supplied bins
- pygama.math.histogram.get_bin_widths(bins: ndarray) ndarray#
Returns an array of bin widths from an input array of bin edges. Works for non-uniform binning.
- pygama.math.histogram.get_fwfm(fraction: float, hist: ndarray, bins: ndarray, var: Optional[ndarray] = None, mx: Optional[Union[float, tuple[float, float]]] = None, dmx: Optional[float] = 0, bl: Optional[Union[float, tuple[float, float]]] = 0, dbl: Optional[float] = 0, method: str = 'bins_over_f', n_slope: int = 3) tuple[float, float]#
Estimate the full width at some fraction of the max of data in a histogram
Typically used by sending slices around a peak. Searches about argmax(hist) for the peak to fall by [fraction] from mx to bl
- Parameters:
fraction (float) – The fractional amplitude at which to evaluate the full width
hist (ndarray) – The histogram data array containing the peak
bins (ndarray) – An array of bin edges for the histogram
var (Optional[ndarray]) – An array of histogram variances. Used with the ‘fit_slopes’ method
mx (Optional[Union[float, tuple[float, float]]]) – The value to use for the max of the peak. If None, np.amax(hist) is used.
bl (Optional[Union[float, tuple[float, float]]]) – Used to specify an offset from which to estimate the FWFM.
method (str) –
- ‘bins_over_f’the simplest method: just take the difference in the bin
centers that are over [fraction] of max. Only works for high stats and FWFM/bin_width >> 1
- ’interpolate’interpolate between the bins that cross the [fraction]
line. Works well for high stats and a reasonable number of bins. Uncertainty incorporates var, if provided.
- ’fit_slopes’fit over n_slope bins in the vicinity of the FWFM and
interpolate to get the fractional crossing point. Works okay even when stats are moderate but requires enough bins that dx traversed by n_slope bins is approximately linear. Incorporates bin variances in fit and final uncertainties if provided.
n_slope (int) – Number of bins in the vicinity of the FWFM used to interpolate the fractional crossing point with the ‘fit_slopes’ method
- Returns:
fwfm, dfwfm – fwfm: the full width at [fraction] of the maximum above bl dfwfm: the uncertainty in fwfm
- Return type:
Examples
>>> import pygama.analysis.histograms as pgh >>> from numpy.random import normal >>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5)) >>> pgh.get_fwfm(0.5, hist, bins, var, method='bins_over_f') (2.2, 0.15919638684132664) # may vary
>>> pgh.get_fwfm(0.5, hist, bins, var, method='interpolate') (2.2041666666666666, 0.09790931254396479) # may vary
>>> pgh.get_fwfm(0.5, hist, bins, var, method='fit_slopes') (2.3083363869003466, 0.10939486522749278) # may vary
- pygama.math.histogram.get_fwhm(hist: ndarray, bins: ndarray, var: Optional[ndarray] = None, mx: Optional[Union[float, tuple[float, float]]] = None, dmx: Optional[float] = 0, bl: Optional[Union[float, tuple[float, float]]] = 0, dbl: Optional[float] = 0, method: str = 'bins_over_f', n_slope: int = 3) tuple[float, float]#
Convenience function for the FWHM of data in a histogram
Typically used by sending slices around a peak. Searches about argmax(hist) for the peak to fall by [fraction] from mx to bl
- Parameters:
fraction – The fractional amplitude at which to evaluate the full width
hist (ndarray) – The histogram data array containing the peak
bins (ndarray) – An array of bin edges for the histogram
var (Optional[ndarray]) – An array of histogram variances. Used with the ‘fit_slopes’ method
mx (Optional[Union[float, tuple[float, float]]]) – The value to use for the max of the peak. If None, np.amax(hist) is used.
bl (Optional[Union[float, tuple[float, float]]]) – Used to specify an offset from which to estimate the FWFM.
method (str) –
- ‘bins_over_f’the simplest method: just take the difference in the bin
centers that are over [fraction] of max. Only works for high stats and FWFM/bin_width >> 1
- ’interpolate’interpolate between the bins that cross the [fraction]
line. Works well for high stats and a reasonable number of bins. Uncertainty incorporates var, if provided.
- ’fit_slopes’fit over n_slope bins in the vicinity of the FWFM and
interpolate to get the fractional crossing point. Works okay even when stats are moderate but requires enough bins that dx traversed by n_slope bins is approximately linear. Incorporates bin variances in fit and final uncertainties if provided.
n_slope (int) – Number of bins in the vicinity of the FWFM used to interpolate the fractional crossing point with the ‘fit_slopes’ method
- Returns:
fwhm, dfwhm – fwfm: the full width at half of the maximum above bl dfwfm: the uncertainty in fwhm
- Return type:
See also
get_fwfmFunction that computes the FWFM
- pygama.math.histogram.get_gaussian_guess(hist: ndarray, bins: ndarray) tuple[float, float, float]#
given a hist, gives guesses for mu, sigma, and amplitude
- Parameters:
- Returns:
guess_mu – guess for the mu parameter of a Gaussian
guess_sigma – guess for the sigma parameter of a Gaussian
guess_area – guess for the A parameter of a Gaussian
- Return type:
- pygama.math.histogram.get_hist(data: ndarray, bins: Optional[Union[int, ndarray, str]] = None, range: Optional[tuple[float, float]] = None, dx: Optional[float] = None, wts: Optional[Union[float, ndarray]] = None) tuple[numpy.ndarray, ...]#
return hist, bins, var after binning data
This is just a wrapper for humba.histogram, with optional weights for each element and proper computing of variances.
Note: there are no overflow / underflow bins.
Available binning methods:
Default (no binning arguments) : 100 bins over an auto-detected range
bins=N, range=(x_lo, x_hi) : N bins over the specified range (or leave range=None for auto-detected range)
bins=[str] : use one of np.histogram’s automatic binning algorithms
bins=bin_edges_array : array lower bin edges, supports non-uniform binning
dx=dx, range=(x_lo, x_hi): bins of width dx over the specified range. Note: dx overrides the bins argument!
- Parameters:
data (ndarray) – The array of data to be histogrammed
bins (Optional[Union[int, ndarray, str]]) – int: the number of bins to be used in the histogram array: an array of bin edges to use str: the name of the np.histogram automatic binning algorithm to use If not provided, humba.histogram’s default auto-binning routine is used
range (Optional[tuple[float, float]]) – (x_lo, x_high) is the tuple of low and high x values to uses for the very ends of the bin range. If not provided, np.histogram chooses the ends based on the data in data
dx (Optional[float]) – Specifies the bin width. Overrides “bins” if both arguments are present
wts (Optional[Union[float, ndarray]]) – Array of weights for each bin. For example, if you want to divide all bins by a time T to get the bin contents in count rate, set wts = 1/T. Variances will be computed for each bin that appropriately account for each data point’s weighting.
- Returns:
hist – the values in each bin of the histogram
bins – an array of bin edges: bins[i] is the lower edge of the ith bin. Note: it includes the upper edge of the last bin and does not include underflow or overflow bins. So len(bins) = len(hist) + 1
var – array of variances in each bin of the histogram
- Return type:
tuple[numpy.ndarray, …]
- pygama.math.histogram.plot_hist(hist: ndarray, bins: ndarray, var: Optional[ndarray] = None, show_stats: bool = False, stats_hloc: float = 0.75, stats_vloc: float = 0.85, fill: bool = False, fillcolor: str = 'r', fillalpha: float = 0.2, **kwargs) None#
Plot a step histogram, with optional error bars
- Parameters:
hist (ndarray) – The histogram data array containing the peak
bins (ndarray) – An array of bin edges for the histogram
var (Optional[ndarray]) – An array of histogram variances. Used with the ‘fit_slopes’ method
show_stats (bool) – If True, shows the mean, mean error, and standard deviation of the histogram on the plot
stats_hloc (float) – matplotlib.pyplot horizontal location to place the stats
stats_vloc (float) – matplotlib.pyplot vertical location to place the stats
fill (bool) – If True, fills in a step histogram when plotting
fill_color – Color to fill the step histogram if fill is True
fill_alpha – Alpha amount if fill is True
- pygama.math.histogram.range_slice(x_min: float, x_max: float, hist: ndarray, bins: ndarray, var: Optional[ndarray] = None) tuple[numpy.ndarray, ...]#
Get the histogram bins and values for a given slice of the range
- Parameters:
- Return type:
tuple[numpy.ndarray, …]
See also
find_binfor parameters and return values
pygama.math.hpge_peak_fitting module#
pygama convenience functions for fitting hpge peak shape data
- pygama.math.hpge_peak_fitting.hpge_peak_fwhm(sigma: float, htail: float, tau: float, cov: Optional[float] = None) tuple[float, float]#
Return the FWHM of the hpge_peak function, ignoring background and step components. If calculating error also need the normalisation for the step function.
- Parameters:
sigma (float) – The width of the hpge_peak
htail (float) – The height of the tail in the hpge_peak
tau (float) – The characteristic scale in the extended Gaussian in the hpge_peak
cov (Optional[float]) – The covariant matrix of the previous parameters
Returns –
- FWHM, FWHM_uncertainty
The FWHM of the hpge_peak and its uncertainty
- Return type:
- pygama.math.hpge_peak_fitting.hpge_peak_parameter_gradient(E: float, pars: ndarray, step_norm: float) ndarray#
Computes the gradient of the hpge_peak parameters
- Parameters:
- Returns:
gradient – gradient of the n_sig, mu, sigma, h_tail, tau, n_bkg, and hstep parameters of the HPGe peak
- Return type:
pygama.math.least_squares module#
pygama convenience functions for linearly fitting data
- pygama.math.least_squares.fit_simple_scaling(x: ndarray, y: ndarray, var: Optional[Union[float, ndarray]] = 1) tuple[float, float]#
Fast computation of weighted linear least squares fit to a simple scaling
I.e. y = scale * x. Returns the best fit scale parameter and its variance.
- pygama.math.least_squares.linear_fit_by_sums(x: ndarray, y: ndarray, var: Optional[Union[float, ndarray]] = 1) tuple[float, float]#
Fast computation of weighted linear least squares fit to a linear model
Note: doesn’t compute covariances. If you want covariances, just use polyfit
- Parameters:
- Returns:
(m, b) – The slope (m) and y-intercept (b) of the best fit (in the least-squares sense) of the data to y = mx + b
- Return type:
pygama.math.unbinned_fitting module#
pygama convenience functions for fitting ubinned data
- pygama.math.unbinned_fitting.fit_unbinned(func: Callable, data: ndarray, guess: Optional[ndarray] = None, Extended: bool = True, cost_func: str = 'LL', simplex: bool = False, bounds: Optional[list[tuple[float, float], ...]] = None, fixed: Optional[list[bool, ...]] = None) tuple[numpy.ndarray, ...]#
Do a unbinned fit to data. Default is Extended Log Likelihood fit, with option for other cost functions.
- Parameters:
func (Callable) – the function to fit
data (ndarray) – the data values to be fit
Extended (bool) – run extended or non extended fit
cost_func (str) – cost function to use. LL is ExtendedUnbinnedNLL, None is for just UnbinnedNLL
simplex (bool) – whether to include a round of simpson minimisation before main minimisation
bounds (Optional[list[tuple[float, float], ...]]) – Each tuple is (min, max) for the corresponding parameters. Bounds can be None, e.g. [(0,None), (0,10)]
fixed (Optional[list[bool, ...]]) – list of parameter indices to fix
- Returns:
pars, errs, cov – the best-fit parameters and their errors / covariance
- Return type:
tuple[numpy.ndarray, …]
pygama.math.units module#
pygama.math.utils module#
pygama utility functions.
- pygama.math.utils.get_formatted_stats(mean: float, sigma: float, ndigs: int = 2) str#
convenience function for formatting mean +/- sigma to the right number of significant figures.
- pygama.math.utils.get_par_names(func: Callable) tuple[str, ...]#
Return a list containing the names of the arguments of “func” other than the first argument. In pygamaland, those are the function’s “parameters.”
- pygama.math.utils.print_fit_results(pars: ndarray, cov: ndarray, func: Optional[Callable] = None, title: Optional[str] = None, pad: Optional[bool] = True) None#
Convenience function to write scipy.optimize.curve_fit results to the log
- Parameters:
- Returns:
None – Writes the curve_fit results to the log
- Return type:
None