AnalysisMixin

class cait.mixins.AnalysisMixin

Bases: object

A Mixin to the DataHandler class with methods for the typical raw data analysis.

calc_calibration(starts_saturation: list, cpe_factor: list, max_dist: float = 0.5, exclude_tpas: list = [], plot: bool = False, only_stable: bool = False, cut_flag: list = None, interpolation_method: str = 'linear', poly_order: int = 5, only_channels: list = None, method: str = 'ph', name_appendix_ev: str = '', name_appendix_tp: str = '', return_pulser_models: bool = False, pulser_models: object = None, name_appendix_energy: str = '', rasterized: bool = True, **kwargs)

Calculates the calibrated energies of all events with uncertainties.

The energy calibration is a two-step process.

In the first step, we need a time-continuous estimate of the pulse height for each injected TPA. This is done with either a spline fit, a linear regression or a gradient boosted regression tree. The latter two methods include an uncertainty estimate.

In the second step, for the time stamp of every event, we fit a higher-order polynomial to the TPA/pulse height relation, for which we have value estimates at discrete TPA points. From this we can estimate the TPA value corresponding to the pulse height of the triggered event. The TPA is in a linear relation with the recoil energy, which is determined by the CPE factor. In the polynomial fit, we also include the uncertainties of the estimated test pulse heights by means of an orthogonal distance regression, linear error propagation and the calculation of a prediction interval.
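The second step can be illustrated with a minimal sketch. It is not the cait implementation: it uses a plain least-squares polynomial fit from NumPy, ignores all uncertainties, and the helper name tpa_equivalent as well as the example numbers are hypothetical.

```python
import numpy as np

def tpa_equivalent(tpa_values, tp_heights, event_height, poly_order=3):
    """Hypothetical helper: fit PH(TPA) and invert it for one event pulse height."""
    # Fit a polynomial that maps TPA -> pulse height at one fixed time stamp.
    coeffs = np.polyfit(tpa_values, tp_heights, deg=poly_order)
    # Solve PH(TPA) - event_height = 0 for the TPA.
    shifted = coeffs.copy()
    shifted[-1] -= event_height
    roots = np.roots(shifted)
    # Keep only real roots inside the calibrated TPA range.
    real = roots[np.abs(roots.imag) < 1e-9].real
    valid = real[(real >= 0) & (real <= np.max(tpa_values))]
    if valid.size == 0:
        raise ValueError("pulse height outside of the calibrated range")
    return valid.min()

# Made-up test pulse data with a slightly non-linear (saturating) response.
tpas = np.array([0.1, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0])
heights = 0.5 * tpas - 0.01 * tpas**2 + 0.0001 * tpas**3

cpe_factor = 2.0  # assumed conversion from TPA to recoil energy
tpa_eq = tpa_equivalent(tpas, heights, event_height=0.9608)
recoil_energy = cpe_factor * tpa_eq
```

Inverting the fitted polynomial rather than fitting TPA as a function of pulse height keeps the sketch close to the description above, where the pulse heights are the estimated quantities at known TPA points.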

Additional keyword arguments are passed to the regressor model.

This method was described in M. Stahlberg, Probing low-mass dark matter with CRESST-III : data analysis and first results, available via https://doi.org/10.34726/hss.2021.45935 (accessed on the 9.7.2021).

Parameters

starts_saturation (list of floats) – The pulse heights (V) at which the saturation of the pulses starts, for each channel.
cpe_factor (list of floats) – The CPE factors for all channels.
max_dist (float) – If two test pulses are more than this interval (in hours) apart, a new spline or linear regression model is started for the subsequent region. If the regression tree is used for the time-continuous pulse height estimation, this argument is not used.
exclude_tpas (list of floats) – Test pulses with these TPA values are excluded from the energy calibration. This is useful if there are only very few pulses with a certain TPA value.
plot (bool) – If set, the continuous pulse height estimation and the TPA/PH polynomial fit are plotted.
only_stable (bool) – If set, only stable test pulses are included in the energy calibration.
cut_flag (list of bools) – If provided, this is a list of bool values that determines which test pulses are included in the energy calibration.
interpolation_method (str) – If ‘linear’, we take linear regressions for the continuous pulse height estimation and include an uncertainty estimation. If ‘tree’, we take a gradient boosted regression tree for the continuous pulse height estimation and include an uncertainty estimation.
poly_order (int) – The order of the polynomial that we fit to describe the TPA/PH relation. This should be between 3 and 5.
only_channels (list of ints or None) – If set, the calibration is done only on the channels that are handed here.
method (string) – Either ‘ph’ (main parameter pulse height), ‘of’ (optimum filter), ‘sef’ (standard event fit) or ‘true_ph’ (in case of simulated events - here you probably want to hand pulser models as well). Test pulse heights and event heights are then estimated with this method for the calibration.
name_appendix_ev (string) – This is appended to the event pulse height estimation method, e.g. ‘_down16’.
name_appendix_tp (string) – This is appended to the test pulse height estimation method, e.g. ‘_down16’.
return_pulser_models (bool) – If set to true, a list of the used PulserModels is returned.
pulser_models (list of instances of PulserModel) – Here a list of PulserModels that shall be used can be passed. This is useful in case the Calibration is done on one file with test pulses, but you want to predict the TPA equivalent values of another data set, e.g. the resolution data set, with the same pulser models.
rasterized (bool) – If set, the scatter plot is rasterized (much faster).

>>> dh.calc_calibration(starts_saturation=[1.5, 0.8],
...                     cpe_factor=[1, 1],
...                     exclude_tpas=[0.01],
...                     plot=True,
...                     interpolation_method='tree',
...                     poly_order=3,
...                     )
Energy Calibration for Channel  0
Unique TPAs:  [ 0.02        0.1         0.2         0.40000001  0.60000002  0.80000001
  1.          2.          3.          4.          5.          6.
  7.          8.          9.         10.        ]
Plot Regression Polynomial at 20.8 hours.
Calculating Recoil Energies: 0.0 %
Calculating Recoil Energies: 65.3 %
Energy Calibration for Channel  1
Unique TPAs:  [ 0.02        0.1         0.2         0.40000001  0.60000002  0.80000001
  1.          2.          3.          4.          5.          6.
  7.          8.          9.         10.        ]
Plot Regression Polynomial at 20.8 hours.
Calculating Recoil Energies: 0.0 %
Calculating Recoil Energies: 65.3 %
Finished.

calc_controlpulse_stability(channel: int, significance: float = 3, max_gap: float = 0.5, lb: float = 0, ub: float = 100, instable_iv: list = None)

Do a stability cut on the control pulses.

In the stability cut we assign a boolean value to each event, indicating whether it lies within a stable region. Stable regions are defined as the interval between two stable control pulses. A control pulse is stable if its height is within a certain number of standard deviations of the average control pulse height. Single outlying control pulses are ignored. Control pulse heights higher than a certain maximal value or lower than a certain minimal value are also ignored. If no control pulses appear in the data for longer than a certain interval, the region is automatically counted as unstable.

This method is described in “CRESST Collaboration, First results from the CRESST-III low-mass dark matter program” (10.1103/PhysRevD.100.102002).

Parameters

channel (int) – The number of the channel in the HDF5 file on which we calculate the cut.
significance (float) – Pulse heights further than this factor times the pulse height standard deviation away from the mean pulse height are counted as unstable.
max_gap (float) – Intervals longer than this value (in minutes) without control pulses are automatically counted as unstable.
lb (float) – Pulse heights lower than this value are ignored.
ub (float) – Pulse heights higher than this value are ignored.
instable_iv – A list of the instable intervals. If this is handed, the instable intervals are not calculated and the handed ones are used instead. Useful e.g. for the cut efficiency.

Returns

The instable intervals (instable_iv).

Return type

list
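The logic of the stability cut can be sketched as follows. This is a simplified illustration and not the cait implementation: the helper name stable_intervals is hypothetical, time stamps and max_gap are assumed to share one unit, and the rule that single outlying control pulses are ignored is left out.

```python
import numpy as np

def stable_intervals(times, heights, significance=3.0, max_gap=0.5, lb=0.0, ub=100.0):
    """Hypothetical helper: intervals between two consecutive stable control pulses."""
    times = np.asarray(times, dtype=float)
    heights = np.asarray(heights, dtype=float)
    # Pulses outside (lb, ub) are ignored completely.
    keep = (heights > lb) & (heights < ub)
    times, heights = times[keep], heights[keep]
    mean, std = heights.mean(), heights.std()
    stable = np.abs(heights - mean) < significance * std
    intervals = []
    for i in range(len(times) - 1):
        # A gap longer than max_gap without control pulses counts as unstable.
        gap_ok = times[i + 1] - times[i] <= max_gap
        if stable[i] and stable[i + 1] and gap_ok:
            intervals.append((times[i], times[i + 1]))
    return intervals

# Made-up control pulses: one outlier (9.0) and one long gap (0.4 -> 1.5).
cp_times = [0.0, 0.1, 0.2, 0.3, 0.4, 1.5, 1.6]
cp_heights = [5.0, 5.1, 4.9, 9.0, 5.0, 5.05, 4.95]
intervals = stable_intervals(cp_times, cp_heights, significance=2.0)
```

An event would then be flagged as stable if its time stamp falls into one of the returned intervals.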

calc_light_correction(scintillation_efficiency: float, channels_to_calibrate: list = [0], light_channel: int = 1)

Calculate the correction of the energy estimation that comes from the scintillation of light.

When a recoil happens inside an absorber crystal, both a phonon and a light signal emerge, so the recoil energy is split between them. If we use only the phonon channel to estimate the recoil energy, we make an error that depends on the share of energy that went into scintillation light. It is therefore necessary to correct the energy of the phonon channel with a light-energy-dependent factor.

This method was described in “F. Reindl (2016), Exploring Light Dark Matter With CRESST-II Low-Threshold Detectors”, available via http://mediatum.ub.tum.de/?id=1294132 (accessed on the 9.7.2021).

Parameters

scintillation_efficiency (float >0, <1) – The share of the recoil energy that is turned into scintillation light.
channels_to_calibrate (list) – All channels that scintillate light and are used as energy estimators (typically the phonon channel).
light_channel (int) – The number of the channel that is the light channel, typically 1, but e.g. 2 for Gode modules.
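As an illustration, a correction of this kind can be written as a weighted sum of the light and phonon energy estimates. The exact formula is not stated in this documentation, so the weighting below is an assumption and should be checked against the cited thesis; the function name and the example numbers are hypothetical.

```python
def light_corrected_energy(phonon_energy, light_energy, scintillation_efficiency):
    """Hypothetical helper: weight light and phonon estimates by the scintillation share."""
    if not 0.0 < scintillation_efficiency < 1.0:
        raise ValueError("scintillation_efficiency must be between 0 and 1")
    # Assumed form: the light channel carries the scintillation share of the
    # energy, the phonon channel carries the rest.
    return (scintillation_efficiency * light_energy
            + (1.0 - scintillation_efficiency) * phonon_energy)

# Made-up energy estimates in keV for one event.
corrected = light_corrected_energy(phonon_energy=10.0, light_energy=8.0,
                                   scintillation_efficiency=0.066)
```

With this form, an event whose light and phonon estimates agree is left unchanged, and the correction grows with the difference between the two estimates.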

calc_rate_cut(interval: float = 10, significance: float = 3, min: float = 0, max: float = 60, intervals: list = None, use_poisson=True)

Calculate a rate cut on the events.

The rate cut assigns a bool value to each event that tells whether the event is in a region with normal or anomalous rate. For this, we measure the event rate in every interval (typically 10 minutes) and exclude from the analysis all intervals whose rate is not within a certain number of standard deviations of the average rate per interval. Intervals with rates that exceed a certain maximal value or fall below a certain minimal value are excluded from the calculation of the average rate. This makes the calculation robust against intervals in which the TES is superconducting (no events) or triggers only on noise, e.g. due to warming up of the cryostat.

This method is described in “CRESST Collaboration, First results from the CRESST-III low-mass dark matter program” (10.1103/PhysRevD.100.102002).

Parameters

interval (float) – The interval length in minutes that is compared.
significance (float) – Rates that are by more than this factor times the standard deviation of the rates away from the average rate are excluded.
min (float) – Rates that are lower than this value are excluded from the calculation of the average rate.
max (float) – Rates that are higher than this value are excluded from the calculation of the average rate.
use_poisson (bool) – If this is activated (the default), we use the median and Poisson confidence intervals instead of standard normal statistics.
intervals – A list of the stable intervals, in hours. If this is handed, these intervals are used instead of calculating them from scratch. This is useful e.g. for the cut efficiency.

Returns

The stable intervals (intervals).

Return type

list of 2-tuples
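The rate cut can be sketched with a histogram of the event time stamps. The sketch below is not the cait implementation: it uses mean and standard deviation (roughly the use_poisson=False case), it takes the rate to be the number of events per interval, and the helper name rate_cut_flags is hypothetical.

```python
import numpy as np

def rate_cut_flags(event_hours, interval_min=10.0, significance=3.0,
                   rate_min=0.0, rate_max=60.0):
    """Hypothetical helper: True for events in intervals with a normal rate."""
    event_hours = np.asarray(event_hours, dtype=float)
    width_h = interval_min / 60.0
    n_bins = max(1, int(np.ceil(event_hours.max() / width_h)))
    edges = np.arange(n_bins + 1) * width_h
    counts, _ = np.histogram(event_hours, bins=edges)
    # Only intervals inside [rate_min, rate_max] enter the average.
    usable = (counts >= rate_min) & (counts <= rate_max)
    mean, std = counts[usable].mean(), counts[usable].std()
    accepted = usable & (np.abs(counts - mean) <= significance * std)
    # Map every event to the interval it falls into.
    idx = np.clip(np.digitize(event_hours, edges) - 1, 0, n_bins - 1)
    return accepted[idx]

# Made-up data: six 10-minute intervals, the fifth one has a rate burst.
counts_per_bin = [5, 5, 6, 4, 40, 5]
events = np.concatenate([
    np.linspace(i / 6 + 0.01, (i + 1) / 6 - 0.01, n)
    for i, n in enumerate(counts_per_bin)
])
flags = rate_cut_flags(events, rate_max=20)
```

Excluding extreme intervals before computing the average keeps a single burst from inflating the standard deviation and thereby hiding itself.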

calc_resolution(ph_intervals: list, pec_factors: list = None, fit_gauss: bool = True, of_filter: bool = False, sev_fit: bool = False, use_tp: bool = False)

This function calculates the resolution of a detector from the testpulses or from simulated events.

The pulse height intervals must each include only one low-energetic peak in the spectrum. The calculation can use either a Gaussian fit of the peak or the sample standard deviation. The pulse height can be determined as the raw pulse height, the pulse height after filtering with the optimum filter, or the estimate from a standard event fit. To use the raw pulse height on simulated events, set the bool arguments of_filter, sev_fit and use_tp all to false.

This method is described in “CRESST Collaboration, First results from the CRESST-III low-mass dark matter program” (10.1103/PhysRevD.100.102002).

Parameters

ph_intervals (list of float 2-tuples) – The upper and lower bounds of the peak in the pulse height spectrum to calculate the resolution.
pec_factors (list of floats or None) – The PEC factor in keV for each channel to calculate the resolution in energy. This is the linearization of the energy calibration, calculated by dividing the energy of the calibration peak by its height in Volt. Multiplication of a volt pulse height with this factor gives a rough estimation of the recoil energy. The exact energy calibration uses an energy-dependent PEC factor. If this argument is None, the output is in mV instead.
fit_gauss (bool) – If this argument is true, the width of the peak is estimated with a Gaussian fit rather than with the sample standard deviation.
of_filter (bool) – If this argument is true, the filtered pulse height is taken for the energy estimation. This option takes priority over sev_fit.
sev_fit (bool) – If this argument is true, the standard event fit is taken for the pulse height estimation.
use_tp (bool) – If this argument is true, the raw pulse height of a test pulse is taken for the resolution estimation. This option takes priority over of_filter and sev_fit.

Returns

The calculated resolutions and the mean values of the peaks.

Return type

tuple of two numpy arrays

>>> resolutions, mus = dh.calc_resolution(pec_factors=[6, 12], ph_intervals=[(0.25,0.35), (0.25,0.35)], use_tp=True)
Calculating resolution.
Resolution channel 0: 13.197 eV (mean 1.747 keV, calculated with Testpulses)
Resolution channel 1: 25.183 eV (mean 3.619 keV, calculated with Testpulses)
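The variant with the sample standard deviation (fit_gauss=False) can be sketched in a few lines. This is an illustration on simulated pulse heights, not the cait implementation; the helper name peak_resolution and all numbers are hypothetical.

```python
import numpy as np

def peak_resolution(pulse_heights, ph_interval, pec_factor=None):
    """Hypothetical helper: width and mean of the peak inside one pulse height interval."""
    lower, upper = ph_interval
    ph = np.asarray(pulse_heights, dtype=float)
    peak = ph[(ph > lower) & (ph < upper)]
    mu, sigma = peak.mean(), peak.std(ddof=1)
    if pec_factor is not None:
        # Linearized calibration: pulse height (V) times PEC factor (keV/V).
        mu, sigma = mu * pec_factor, sigma * pec_factor
    return sigma, mu

# Simulated peak at 0.3 V with a width of 2 mV.
rng = np.random.default_rng(0)
simulated_heights = rng.normal(loc=0.3, scale=0.002, size=5000)
sigma_kev, mu_kev = peak_resolution(simulated_heights, (0.25, 0.35), pec_factor=6.0)
```

A Gaussian fit is less sensitive to background events inside the interval than the sample standard deviation, which is presumably why it is the default.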

calc_testpulse_stability(channel: int, significance: float = 3, noise_level: float = 0.005, max_gap: float = 0.5, ub: float = None, lb: float = None)

Do a stability cut on the test pulses.

The stability for the test pulses is similar to the stability cut for the control pulses, with the difference that we calculate average pulse height values for each individual test pulse amplitude value for the declaration of unstable testpulses. This cut is especially needed for the energy calibration, where single outlying test pulses can disturb the fit curve between TPA values and pulse heights.

In the stability cut we assign a boolean value to each event, indicating whether it lies within a stable region. Stable regions are defined as the interval between two stable test pulses. A test pulse is stable if its height is within a certain number of standard deviations of the average height of the test pulses with the same TPA. Single outlying test pulses are ignored. Test pulse heights higher than a certain maximal value or lower than a certain minimal value are also ignored. If no test pulses appear in the data for longer than a certain interval, the region is automatically counted as unstable.

Parameters

channel (int) – The number of the channel in the HDF5 file on which we calculate the cut.
significance (float) – Pulse heights further than this factor times the pulse height standard deviation away from the mean pulse height are counted as unstable.
noise_level (float) – Test pulses lower than this value are generally ignored, as they are probably triggered noise instead of actual pulses.
max_gap (float) – Intervals longer than this value (in minutes) without test pulses are automatically counted as unstable.
lb (float) – Pulse heights lower than this value are ignored.
ub (float) – Pulse heights higher than this value are ignored.
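The difference to the control pulse cut, namely that mean and standard deviation are calculated separately for each TPA value, can be sketched as follows. This is a simplified illustration, not the cait implementation: the helper name stable_testpulses is hypothetical, and the max_gap, lb and ub handling is left out.

```python
import numpy as np

def stable_testpulses(tpas, heights, significance=3.0, noise_level=0.005):
    """Hypothetical helper: flag test pulses that are stable within their TPA group."""
    tpas = np.asarray(tpas, dtype=float)
    heights = np.asarray(heights, dtype=float)
    stable = np.zeros(len(heights), dtype=bool)
    # Pulses below the noise level are never counted as stable.
    above_noise = heights > noise_level
    for tpa in np.unique(tpas):
        group = (tpas == tpa) & above_noise
        if not group.any():
            continue
        # Mean and spread are computed per TPA value, not globally.
        mean, std = heights[group].mean(), heights[group].std()
        stable[group] = np.abs(heights[group] - mean) <= significance * std
    return stable

# Made-up data: an outlier (0.90) at TPA 1 and triggered noise (0.001) at TPA 2.
tp_tpas = [1.0] * 6 + [2.0] * 6
tp_heights = [0.50, 0.51, 0.49, 0.50, 0.51, 0.90,
              1.00, 1.01, 0.99, 1.00, 0.001, 1.00]
stable = stable_testpulses(tp_tpas, tp_heights, significance=2.0)
```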

exposure(detector_mass=None, max_dist=0.1, tp_exclusion_interval=1, return_values=False, exclude_instable=True)

Calculate the exposure in the data set.

This method calculates the live time of the detector by excluding all test pulses, instable intervals and non-measurement times. If a detector mass is handed, the exposure is calculated as well. An event is excluded if any of the control pulses from all channels is unstable around it, or if the rate cut excluded it.

Parameters

detector_mass (float) – The mass of the detector in kg.
max_dist (float) – The maximal distance between two test pulses in hours, such that the interval in between is still counted as measurement time.
tp_exclusion_interval (float) – The time in seconds that has to be excluded for every test pulse. Typically this is 1.5*length of record window, i.e. ~ a second for a window of length 16384 samples and 25 kHz sample frequency.
return_values (bool) – If this is set to True, a tuple of (exposure, live_time) is returned.
exclude_instable (bool) – If True, all intervals in between unstable control pulses, or which are excluded by the rate cut, are counted as dead time.
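The arithmetic behind the live time and the exposure can be sketched as follows. This is an illustration, not the cait implementation: the helper name exposure_kg_days, the choice of kg days as the exposure unit and all example numbers are assumptions.

```python
def exposure_kg_days(measurement_hours, unstable_hours, n_testpulses,
                     tp_exclusion_interval=1.0, detector_mass=None):
    """Hypothetical helper: return (exposure in kg days or None, live time in hours)."""
    # Every test pulse removes tp_exclusion_interval seconds of live time.
    testpulse_hours = n_testpulses * tp_exclusion_interval / 3600.0
    live_hours = measurement_hours - unstable_hours - testpulse_hours
    if detector_mass is None:
        return None, live_hours
    return detector_mass * live_hours / 24.0, live_hours

# 1.5 record windows of 16384 samples at 25 kHz, i.e. roughly one second.
tp_exclusion = 1.5 * 16384 / 25_000

# Made-up run: 100 h of data, 4 h unstable, 3600 test pulses, 24 g detector.
exposure_value, live_time = exposure_kg_days(
    measurement_hours=100.0, unstable_hours=4.0, n_testpulses=3600,
    tp_exclusion_interval=tp_exclusion, detector_mass=0.024,
)
```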