FitMixin

class cait.mixins.FitMixin

Bases: object

A mixin class for the DataHandler class that provides methods for the calculation of fits.

apply_array_fit(type='events', only_channels=None, sample_length=None, max_shift=20, truncation_level=None, processes=4, name_appendix='', group_name_appendix='', first_channel_dominant=False, use_this_array=None, blcomp=4, no_bl_when_sat=True)

Calculates the array fit for all events of the given type (events or testpulses) and stores the results in the HDF5 file.

The stored parameters are (pulse_height, onset_in_ms, bl_offset, bl_linear_coefficient, quadratic, cubic). This method differs from apply_sev_fit in that an arbitrary numerical array can be used as the standard event component of the fit, not only one described by the parameters of the pulse shape model. By default, the numerical SEV is used as the standard event component. Furthermore, the fit is split into a linear and a nonlinear regression part, which provides a significant speedup: only the t0 parameter is subject to a nonlinear optimization, while all other parameters are obtained by matrix inversion.
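The linear/nonlinear split can be illustrated with NumPy. The following is a minimal sketch, not the cait implementation: it assumes a single channel, integer-sample shifts applied circularly with `np.roll`, and a baseline polynomial in a normalized sample index; the function name `array_fit_sketch` is hypothetical. For every candidate shift, the pulse height and baseline coefficients follow from one linear least-squares solve, and only the shift itself is chosen by a search over the RMS (here a brute-force grid, chosen only to keep the sketch short).

```python
import numpy as np

def array_fit_sketch(event, template, max_shift=20, blcomp=4):
    """Hypothetical helper (not part of cait): linear/nonlinear split template fit."""
    n = event.shape[0]
    # Baseline components 1, t, t^2, t^3 (blcomp of them), with t scaled to [0, 1)
    t = np.arange(n) / n
    baseline = np.vstack([t**k for k in range(blcomp)])
    best = None
    for shift in range(-max_shift, max_shift + 1):
        # Nonlinear part: shift the template (circularly, for simplicity)
        shifted = np.roll(template, shift)
        design = np.vstack([shifted, baseline]).T  # shape (n, 1 + blcomp)
        # Linear part: pulse height and baseline coefficients via least squares
        coeffs, _, _, _ = np.linalg.lstsq(design, event, rcond=None)
        rms = np.sqrt(np.mean((event - design @ coeffs) ** 2))
        if best is None or rms < best[2]:
            best = (shift, coeffs, rms)
    return best  # (shift in samples, [pulse_height, bl coefficients...], rms)

# Usage on synthetic data: a known template, shifted by 7 samples,
# scaled by 2.5 and placed on a linear baseline
idx = np.arange(512)
dt = np.clip(idx - 128, 0, None)
template = np.exp(-dt / 50) - np.exp(-dt / 5)
template /= template.max()
event = 2.5 * np.roll(template, 7) + 0.1 + 0.05 * idx / 512

shift, coeffs, rms = array_fit_sketch(event, template)
```

Here the shift is returned in samples; converting it to an onset in ms would use the sample length.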

Parameters:

type (string) – Name of the group in the HDF5 set, either events or testpulses.
only_channels (list of ints) – Only these channels are fitted, the others are left as is or filled with zeros.
sample_length (float) – The length of a sample in milliseconds. If None, this is calculated from the sample frequency.
max_shift (float) – The maximum shift of the t0 value allowed, in ms.
truncation_level (list of nmbr_channel floats) – The pulse height value in V at which the detector saturation starts.
processes (int) – The number of workers for the fit.
name_appendix (string) – This is appended to the dataset name in the HDF5 set.
group_name_appendix (string) – This is appended to the group name of the stdevent in the HDF5 set.
first_channel_dominant (bool) – Take the peak position from the first channel and evaluate the others at the same position.
use_this_array (2D numpy array) – The standard events (array) used for the fit. Shape: (nmbr_channels, record_length).
blcomp (int) – Either 1, 2, 3 or 4: the number of baseline components, i.e. the order of the polynomial assumed for the baseline plus one.

Deprecated since version 1.3.0: This will be removed in 2.0.0. Use DataHandler.apply_template_fit() instead.

apply_sev_fit(type='events', only_channels=None, sample_length=None, down=1, order_bl_polynomial=3, t0_bounds=(-20, 20), truncation_level=None, interval_restriction_factor=None, verb=False, processes=4, name_appendix='', group_name_appendix='', first_channel_dominant=False, use_saturation=False)

Calculates the SEV fit for all events of the given type (events or testpulses) and stores the results in the HDF5 file. The stored parameters are (pulse_height, onset_in_ms, bl_offset, bl_linear_coefficient, quadratic, cubic).

Attention! Since v1.1 it is recommended to use apply_array_fit instead, which provides a more stable fit implementation.

This method was described in “F. Reindl, Exploring Light Dark Matter With CRESST-II Low-Threshold Detector”, available via http://mediatum.ub.tum.de/?id=1294132 (accessed on the 9.7.2021).

Parameters:

type (string) – Name of the group in the HDF5 set, either events or testpulses.
only_channels (list of ints) – Only these channels are fitted, the others are left as is or filled with zeros.
order_bl_polynomial (int) – Either 0, 1, 2 or 3: the order of the polynomial assumed for the baseline.
sample_length (float) – The length of a sample in milliseconds. If None, this is calculated from the sample frequency.
down (int) – The downsampling factor for the fit; it has to be a power of 2.
t0_bounds (2-tuple of ints) – The lower and upper bounds in milliseconds for the onset position.
truncation_level (list of nmbr_channel floats) – The pulse height value in V at which the detector saturation starts.
interval_restriction_factor (float) – A value between 0 and 1; the interval of the event is restricted around 1/4 by this factor.
verb (bool) – If True, feedback about the progress is given.
processes (int) – The number of workers for the fit.
name_appendix (string) – This is appended to the dataset name in the HDF5 set.
group_name_appendix (string) – This is appended to the group name of the stdevent in the HDF5 set.
first_channel_dominant (bool) – Take the peak position from the first channel and evaluate the others at the same position.

Deprecated since version 1.1.0: This will be removed in 2.0.0. Use DataHandler.apply_template_fit() instead.

apply_template_fit(group: str, sev: ndarray, bl_poly_order: int | List[int] = 3, truncation_limit: float | List[float] = None, correlated: bool = False, fit_onset: bool | List[bool] = True, max_shift: int = 50, only_channels: int | List[int] = None, event_flag: ndarray = None, with_processing: Callable | List[Callable] = None, tag: str = '', preview: bool = False, **kwargs)

Perform a (correlated) template fit for events of a specified group, i.e. fit a numeric SEV to the data, with the possibility to also specify a polynomial baseline model and a truncation limit (both for each channel individually). ‘Correlated’ in this context means that you can choose which channels’ onsets are fitted (possibly multiple, see below). See https://edoc.ub.uni-muenchen.de/23762/ and https://mediatum.ub.tum.de/?id=1294132 for details.

See cait.versatile.TemplateFit and cait.versatile.TemplateFitCorrelated for more details.

Parameters:

group (str) – The DataHandler group with the events that should be fitted. The fit parameters will be saved to this group, too.
sev (np.ndarray) – The template (SEV) to use in the fit (with as many channels as you want to fit).
bl_poly_order (Union[int, List[int]], optional) – List of the baseline models to use in the fit (one entry for each channel that you want to fit). Each entry has to be a non-negative integer or None. If 0, a constant offset is fitted; if 1, a linear baseline is assumed, etc. If None, the baseline is assumed to be constantly 0 (in this case, it is the user’s responsibility to remove the baseline accordingly). If a single integer or None is provided, this value is used for all fitted channels. Defaults to 3, i.e. fitting a cubic baseline for all channels.
truncation_limit (Union[float, List[float]], optional) – List with as many entries as there are channels to fit. For each entry that is not None, a truncated fit is performed: all samples between the first and the last sample above ‘truncation_limit’ are ignored in the fit. To determine these samples, the baseline of the event is removed by fitting a linear polynomial to the beginning of the record window. If a single float or None is provided, this value is used for all fitted channels. Defaults to None, i.e. not performing a truncated fit in any channel.
correlated (bool, optional) – If True, a correlated fit is performed, i.e. the SEV is shifted for all channels simultaneously. Depending on ‘fit_onset’ (see below and also examples), different behavior can be achieved. If False, each channel is fitted independently of the others. Defaults to False.
fit_onset (Union[bool, List[bool]], optional) – List with as many entries as there are channels to fit. For each entry that is True, the onset value of the respective channel is fitted. If correlated=False, the onsets are fitted independently. If correlated=True, all channels for which fit_onset=True participate in the onset fit (common minimization): if only one of the entries is True, this channel is the ‘dominant’ one, i.e. its onset is fitted and all other channels are moved (passively) in unison (and only their pulse height and baseline are fitted). If multiple entries are True, their chi-squared values are combined, i.e. the template is still moved in unison, but the minimizer considers all of these channels. If a single boolean is provided, it is used for all fitted channels. Defaults to True, i.e. fitting the onset in all channels.
max_shift (int, optional) – The maximum shift value (in samples) to search for a minimum. The onset fit will search the minimum for shifts in (-max_shift, +max_shift).
only_channels (Union[int, List[int]], optional) – If you only want to fit some of the channels in ‘group’, you can specify the channel index/indices here. Note that the sizes of ‘sev’, ‘bl_poly_order’, ‘truncation_limit’, and ‘fit_onset’ have to match the number of channels to be fitted, i.e. if you fit two channels, ‘sev’ also has to have two channels. Defaults to None, i.e. fitting all channels in ‘group’.
event_flag (np.ndarray, optional) – A boolean flag. If you don’t want to fit all events in ‘group’, you can specify here which events to fit. Events that are not fitted receive an RMS value of -404 in the output dataset. The flag has to have the same length as there are events in ‘group’ and applies to all channels. Defaults to None, i.e. fitting all events.
with_processing (Union[Callable, List[Callable]], optional) – Optional processing to apply to each event before TemplateFit is applied. See add_processing().
tag (str, optional) – A string that is appended to the datasets when they are saved to the DataHandler (e.g. if you want to perform fits for different pulse shapes). This string is appended with a hyphen, i.e. for tag="wafer" this would result in datasets like templatefit_pars-wafer. Defaults to an empty string, i.e. no tag.
preview (bool, optional) – If True, an interactive preview opens that illustrates the fit on the event traces using the current input arguments. Defaults to False.
kwargs (any, optional) – Additional keyword arguments passed to cait.versatile.TemplateFit or cait.versatile.TemplateFitCorrelated (depending on the ‘correlated’ keyword).
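The sample selection described for ‘truncation_limit’ can be sketched as follows. This is a hypothetical helper (`truncation_mask` is not part of cait), and it assumes that “the beginning of the record window” means the first `pre_samples` samples; `pre_samples` is an assumed parameter, and the region cait actually uses may differ. The returned mask could then be used to drop the excluded rows of the design matrix before the least-squares solve.

```python
import numpy as np

def truncation_mask(event, truncation_limit, pre_samples=100):
    """Hypothetical helper (not part of cait): False marks samples excluded from a truncated fit."""
    idx = np.arange(event.shape[0])
    # Remove the baseline: fit a linear polynomial to the start of the record (assumed region)
    slope, offset = np.polyfit(idx[:pre_samples], event[:pre_samples], deg=1)
    corrected = event - (slope * idx + offset)
    above = np.flatnonzero(corrected > truncation_limit)
    mask = np.ones(event.shape[0], dtype=bool)
    if above.size > 0:
        # Ignore everything from the first to the last sample above the limit (inclusive)
        mask[above[0]:above[-1] + 1] = False
    return mask

# Usage: a sloped baseline with a flat, saturated stretch of 20 samples
idx = np.arange(1000)
event = 0.2 + 0.001 * idx
event[400:420] += 5.0
mask = truncation_mask(event, truncation_limit=1.0)
```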

Example:

import cait as ai
import cait.versatile as vai

# You should have a DataHandler (dh) and a SEV by now.
# If you saved your SEV in a file, load it using
# sev = vai.SEV.from_file('path/to/sev')
# if you saved it in the dh, use
# sev = vai.SEV.from_dh(dh)
# if you want to use a numpy array 'sev', that is also fine.

# Example 0: Do not fit anything yet, only show what it would look like
dh.apply_template_fit(
    "events", # 'events' group has two channels
    sev, # must have two channels
    bl_poly_order=[1, 3], # separate baseline polynomial orders for each channel
    correlated=False, # do NOT correlate the channels
    preview=True # No fit is performed but it is shown what it would look like
)

# Example 1: Fit all channels separately with their respective SEV.
# In this case, we assume two channels, but it works with arbitrarily many.
dh.apply_template_fit(
    "events", # 'events' group has two channels
    sev, # must have two channels, too
    bl_poly_order=[1, 3], # separate baseline polynomial orders for each channel
    correlated=False, # do NOT correlate the channels
)

# To run a new fit, we have to delete the old results:
def drop_tf_ds():
    dh.drop("events", "templatefit_pars")
    dh.drop("events", "templatefit_rms")
    dh.drop("events", "templatefit_shift")
drop_tf_ds()
# Alternatively, you could also rename them:
dh.rename("events", templatefit_pars="tf_pars_old", templatefit_rms="tf_rms_old", templatefit_shift="tf_shift_old")

# Example 2: Fit only one of the two channels (the results of the other channel will
# be padded)
dh.apply_template_fit(
    "events",
    sev[0], # select only one of the SEV (corresponding to chosen channel)
    bl_poly_order=3, # specify only one
    only_channels=0, # choose one of the channels
)

drop_tf_ds()

# Example 3: Correlated template fit with first channel being the 'dominant' one
dh.apply_template_fit(
    "events",
    sev, # use all channels again
    bl_poly_order=3, # if we just say '3' here, it is used for both channels
    correlated=True, # here we activate the correlated fit
    fit_onset=[True, False] # Onset of channel 0 is fitted, channel 1 is passively moved along
)

drop_tf_ds()

# Example 4: Correlated template fit with common minimization in onset fit
dh.apply_template_fit(
    "events",
    sev,
    bl_poly_order=3,
    correlated=True,
    fit_onset=[True, True] # Onset of both channels is fitted together
)
drop_tf_ds()

# Example 5: Correct pulses for flux quantum losses before fitting
dh.apply_template_fit(
    "events",
    sev,
    with_processing=vai.FluxQuantumLossCorrection(),
)
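The behavior of ‘fit_onset’ with correlated=True can be illustrated by a brute-force search over one common shift. In this hypothetical sketch (not cait’s implementation; `linear_fit` and `correlated_shift_sketch` are illustrative names), each channel has its own template and a polynomial baseline, the shift is searched over integer samples in [-max_shift, +max_shift], and only channels with fit_onset=True contribute to the summed squared residuals. All channels are afterwards evaluated at the common shift, so passive channels only get their pulse height and baseline fitted. It assumes at least one entry of fit_onset is True.

```python
import numpy as np

def linear_fit(event, shifted_template, bl_order):
    """Least-squares fit of amplitude and polynomial baseline; returns (coeffs, sum of squared residuals)."""
    t = np.arange(event.shape[0]) / event.shape[0]
    design = np.vstack([shifted_template] + [t**k for k in range(bl_order + 1)]).T
    coeffs, _, _, _ = np.linalg.lstsq(design, event, rcond=None)
    return coeffs, np.sum((event - design @ coeffs) ** 2)

def correlated_shift_sketch(events, sevs, fit_onset, bl_order=3, max_shift=50):
    """Hypothetical helper (not part of cait): common-shift search over several channels."""
    best_shift, best_chi2 = None, np.inf
    for shift in range(-max_shift, max_shift + 1):
        # Common minimization: sum residuals only over channels whose onset is fitted
        chi2 = sum(linear_fit(ev, np.roll(sev, shift), bl_order)[1]
                   for ev, sev, fit in zip(events, sevs, fit_onset) if fit)
        if chi2 < best_chi2:
            best_shift, best_chi2 = shift, chi2
    # All channels, including passive ones, are evaluated at the common shift
    results = [linear_fit(ev, np.roll(sev, best_shift), bl_order)[0]
               for ev, sev in zip(events, sevs)]
    return best_shift, results

# Usage on synthetic two-channel data, both channels shifted by 12 samples
idx = np.arange(1024)

def toy_pulse(tau_rise, tau_decay):
    dt = np.clip(idx - 256, 0, None)
    pulse = np.exp(-dt / tau_decay) - np.exp(-dt / tau_rise)
    return pulse / pulse.max()

sevs = [toy_pulse(6, 60), toy_pulse(10, 120)]
events = [1.5 * np.roll(sevs[0], 12) + 0.05, 0.4 * np.roll(sevs[1], 12) - 0.02]
best_shift, results = correlated_shift_sketch(events, sevs, fit_onset=[True, False])
```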

calc_bl_coefficients(type='noise', down=1)

Calculates the coefficients of a cubic polynomial fitted to the noise baselines.

Parameters:

type (string) – The group name in the HDF5 set, should be noise.
down (int) – The baselines are downsampled by this factor before the fit.
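A minimal sketch of fitting a cubic polynomial to downsampled noise baselines, assuming downsampling by averaging non-overlapping blocks of `down` samples (the record length must then be divisible by `down`). The name `bl_coefficients_sketch` is hypothetical, and the exact downsampling and time axis used by cait may differ.

```python
import numpy as np

def bl_coefficients_sketch(baselines, down=1):
    """Hypothetical helper (not part of cait): cubic fit per baseline, shape (n_baselines, 4)."""
    n_bl, record_length = baselines.shape
    # Downsample by averaging non-overlapping blocks of `down` samples (assumed scheme)
    downsampled = baselines.reshape(n_bl, record_length // down, down).mean(axis=2)
    x = np.arange(downsampled.shape[1])
    # np.polyfit accepts a 2D y (one column per baseline); coefficients are highest power first
    coeffs = np.polyfit(x, downsampled.T, deg=3)
    return coeffs.T

# Usage: two baselines that follow known cubic polynomials
x = np.arange(256)
cubic = 1e-6 * x**3 - 2e-4 * x**2 + 0.01 * x + 0.5
baselines = np.vstack([cubic, 2 * cubic])
coeffs = bl_coefficients_sketch(baselines)
# With down=4 the coefficients refer to the downsampled sample index
coeffs_down = bl_coefficients_sketch(baselines, down=4)
```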

calc_parametric_fit(path_h5=None, type='events', processes=4)

Calculates the parametric fit for the events in an HDF5 file.

This method was described in “F. Pröbst et al. (1995), Model for cryogenic particle detectors with superconducting phase transition thermometers”.
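For orientation, the pulse shape model of Pröbst et al. is commonly written, for times t after the onset t0, as the sum of a non-thermal and a thermal component, A_n*(exp(-(t-t0)/tau_n) - exp(-(t-t0)/tau_in)) + A_t*(exp(-(t-t0)/tau_t) - exp(-(t-t0)/tau_n)), and zero before t0. The function below evaluates this form with an added constant offset. It is a sketch from the literature; the parameter names and their order are assumptions and not necessarily those stored by calc_parametric_fit.

```python
import numpy as np

def probst_pulse(t, t0, a_n, a_t, tau_n, tau_in, tau_t, offset=0.0):
    """Hypothetical helper (not part of cait): two-component pulse model, zero before t0."""
    dt = np.asarray(t, dtype=float) - t0
    after = dt > 0
    dt_pos = np.where(after, dt, 0.0)  # clamp to avoid exponent overflow before the onset
    pulse = (a_n * (np.exp(-dt_pos / tau_n) - np.exp(-dt_pos / tau_in))
             + a_t * (np.exp(-dt_pos / tau_t) - np.exp(-dt_pos / tau_n)))
    return offset + np.where(after, pulse, 0.0)

# Usage: evaluate the model on a time grid (in ms) with the onset at t0 = 0
t = np.arange(-100, 3001) * 0.1
y = probst_pulse(t, t0=0.0, a_n=1.0, a_t=0.5, tau_n=10.0, tau_in=1.0, tau_t=50.0)
```

A fit would adjust these parameters (for example with a nonlinear least-squares routine) to match each recorded event.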

Parameters:

path_h5 (string) – Optional, the full path to the hdf5 file, e.g. “data/bck_001.h5”.
type (string) – Either events or testpulses.
processes (int) – The number of processes to use for the calculation.

calc_saturation(channel=0, only_idx=None)

Fits a logistic curve to the relation between the test pulse amplitudes and their measured pulse heights.

This method was used to describe the detector saturation in “M. Stahlberg, Probing low-mass dark matter with CRESST-III : data analysis and first results”, available via https://doi.org/10.34726/hss.2021.45935 (accessed on the 9.7.2021).
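The idea of a saturation fit can be illustrated with SciPy’s `curve_fit`. This sketch uses a simple logistic curve shifted to pass through the origin (linear for small amplitudes, saturating at `height`) purely as a stand-in; the exact logistic model used by calc_saturation is not reproduced here, and the function name and synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturation_curve(tpa, height, scale):
    """Hypothetical stand-in model: logistic curve through the origin, saturating at `height`."""
    return height * (2.0 / (1.0 + np.exp(-tpa / scale)) - 1.0)

# Synthetic test pulse amplitudes and pulse heights (hypothetical values, no noise)
tpa = np.linspace(0.1, 10.0, 50)
pulse_heights = saturation_curve(tpa, 3.0, 2.0)

# Start values close to the expected result help the nonlinear fit converge
popt, pcov = curve_fit(saturation_curve, tpa, pulse_heights, p0=(2.5, 1.5))
height_fit, scale_fit = popt
```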

Parameters:

channel (int) – The channel for which the saturation is calculated.
only_idx (list of ints) – Only these indices are used in the fit of the saturation.

estimate_trigger_threshold(channel, detector_mass, allowed_noise_triggers=1, sigma_x0=2, method='of', bins=200, yran=None, xran=None, xran_hist=None, ul=30, ll=0, cut_flag=None, plot=True, title=None, sample_length=None, record_length=None, interval_restriction=0.75, binned_fit=False, model='gauss', ylog=False, save_path=None, return_plotting_data=False)

Estimate the trigger threshold to obtain a given number of noise triggers per exposure.

The method assumes a Gaussian distribution of the noise samples, following “A method to define the energy threshold depending on noise level for rare event searches” (arXiv:1711.11459). Several extensions are implemented that describe additional Gaussian mixture or non-Gaussian components. A more extensive description can be found in the corresponding tutorial.
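The core of the purely Gaussian model can be sketched as follows, under these assumptions: the (filtered) noise samples are Gaussian with resolution sigma, and each record window contributes `n_samples` independent samples over which the maximum is taken. `n_samples` is an assumed input here; how cait determines the number of searched samples is not reproduced. The probability that a noise-only record exceeds a threshold x is then 1 - Phi(x/sigma)**n_samples, and dividing by the exposure of one record gives the expected noise triggers per kg day; the threshold is where this rate equals allowed_noise_triggers. The function names and example numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

SECONDS_PER_DAY = 86400.0

def noise_trigger_rate(threshold, sigma, n_samples, record_seconds, detector_mass):
    """Hypothetical helper: expected noise triggers per kg day for Gaussian noise."""
    # P(max of n_samples > threshold) = 1 - Phi(threshold / sigma) ** n_samples, computed stably
    p_exceed = -np.expm1(n_samples * norm.logcdf(threshold / sigma))
    exposure_kg_days = detector_mass * record_seconds / SECONDS_PER_DAY
    return p_exceed / exposure_kg_days

def threshold_for_rate(allowed, sigma, n_samples, record_seconds, detector_mass):
    """Hypothetical helper: threshold at which the noise trigger rate equals `allowed`."""
    def excess(x):
        return noise_trigger_rate(x, sigma, n_samples, record_seconds, detector_mass) - allowed
    # The rate falls monotonically with the threshold, so a bracketing root finder suffices
    return brentq(excess, 0.0, 50.0 * sigma)

# Usage with hypothetical numbers: sigma = 1 (arbitrary units), 24 g detector
threshold = threshold_for_rate(allowed=1.0, sigma=1.0, n_samples=1000,
                               record_seconds=0.65536, detector_mass=0.024)
```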

Parameters:

channel (int) – The channel for which the noise trigger threshold is estimated.
detector_mass (float) – The mass of the detector in kg.
allowed_noise_triggers (float) – The number of noise triggers that are allowed per kg day exposure.
sigma_x0 (float) – A start value for the baseline resolution. Is only used for the unbinned fit.
method (string) – Either ‘of’ for estimating the noise triggers after optimal filtering or ‘ph’ for taking the maximum value of the raw data.
bins (int) – The number of bins for the histogram plots.
yran (tuple of two floats) – The range of the y axis on both plots.
xran (tuple of two floats) – The range of the x axis on the noise trigger estimation plot.
xran_hist (tuple of two floats) – The range of the x axis on the histogram plot.
ul (float) – The upper limit of the interval in which a threshold is searched, in mV.
ll (float) – The lower limit of the interval in which a threshold is searched, in mV.
cut_flag (list of bool) – A list of boolean values that determine which events are excluded from the calculation.
plot (bool) – If True, a plot of the fit and the noise trigger estimation are shown.
title (string) – A title for both plots.
sample_length (float) – The length of a sample in seconds. If None, it is calculated from the sample frequency.
record_length (int) – The number of samples within a record window.
interval_restriction
binned_fit (bool) – Not recommended. If True, the model is fitted to the histogram with least squares. Otherwise, an unbinned likelihood fit is performed.
model (string) – Determine which model is fit to the noise:
  - ‘gauss’: Model of purely Gaussian noise.
  - ‘pollution_exponential’: Model of Gaussian noise with one exponentially distributed sample on each baseline.
  - ‘fraction_exponential’: Mixture model of Gaussian and exponentially distributed noise.
  - ‘pollution_gauss’: Model of Gaussian noise and one sample in each baseline that follows another, also Gaussian distribution.
  - ‘fraction_gauss’: Mixture model of two Gaussian noise components.
ylog (bool) – If set, the y axis is plotted logarithmically on the histogram plot.
save_path (string) – A path to save the plots.
return_plotting_data (bool) – Instead of plotting, the data for plotting can be returned, allowing for custom plots.