FeaturesMixin

class cait.mixins.FeaturesMixin

Bases: object

A mixin class for the DataHandler class, with methods for calculating features of the data.

apply_logical_cut(cut_flag: list, naming: str, channel: int, type: str = 'events', delete_old: bool = False)

Save the cut flag of a logical cut within the HDF5 file.

Parameters
  • cut_flag (list of bools) – The cut flag that we want to save.

  • naming (string) – The name of the dataset to save.

  • channel (int) – The channel for which the cut flag is meant.

  • type (string) – The name of the group in the HDF5 file in which the cut flag is saved, e.g. ‘events’.

  • delete_old (bool) – If True, an existing dataset of this name in the group ‘type’ is deleted.
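A cut flag is simply one bool per event of the channel. The following minimal sketch builds such a flag from a list of pulse heights with an interval cut; the helper interval_cut_flag, the bounds, the DataHandler instance dh and the dataset name "ph_cut" are hypothetical and only for illustration.

```python
def interval_cut_flag(values, lower, upper):
    """Return one bool per event: True if lower <= value <= upper."""
    return [lower <= v <= upper for v in values]


# hypothetical pulse heights of five events in one channel
pulse_heights = [0.02, 0.35, 1.7, 12.0, 0.9]
cut_flag = interval_cut_flag(pulse_heights, lower=0.05, upper=10.0)

# With a DataHandler instance `dh` (hypothetical here), the flag could then be
# stored using the signature documented above, e.g.:
# dh.apply_logical_cut(cut_flag=cut_flag, naming="ph_cut", channel=0)
```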

apply_of(type='events', name_appendix_group: str = '', name_appendix_set: str = '', chunk_size=10000, hard_restrict=False, down=1, window=True, first_channel_dominant=False, baseline_model='constant', pretrigger_samples=500, onset_to_dominant_channel=None, flexibility=1)

Calculates the height of events or testpulses after applying the optimum filter.

Parameters
  • type (string) – The group name in the HDF5 set, either events or testpulses.

  • name_appendix_group (string) – A string that is appended to the stdevent group in the HDF5 set.

  • name_appendix_set (string) – A string that is appended to the of_ph set in the HDF5 set.

  • chunk_size (int) – The number of events that are processed simultaneously, to avoid memory errors.

  • hard_restrict (bool) – If True, the maximum search is restricted to 20-30% of the record window.

  • down (int) – The events get downsampled with this factor before application of the filter.

  • window (bool) – If True, a window function is applied to the record window before filtering. This is recommended to avoid artifacts from differences between the left and right baseline offsets.

  • first_channel_dominant (bool) – If True, take the maximum position from the first channel and evaluate the other channels at the same position.

  • baseline_model (str) – Either ‘constant’, ‘linear’ or ‘exponential’. The baseline model that is subtracted from all events.

  • pretrigger_samples (int) – The number of samples from the start of the record window that are considered the pre-trigger region.

  • onset_to_dominant_channel (list of ints) – The difference of the onset value to that of the dominant channel. For example, if the second channel typically has a max_pos value of 4000 and the first channel one of 4100, the onset for the second channel is -100.

  • flexibility (int) – If a peak position is provided, the maximum search can still deviate from it by this number of samples.
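Three of the steps described above (subtracting a constant baseline estimated from the pre-trigger region, filtering in the frequency domain, and a maximum search optionally restricted to 20-30% of the record window) can be sketched for a single record as follows. This is a minimal NumPy sketch, not the library's implementation: the function of_pulse_height is hypothetical, windowing, downsampling and the multi-channel options are omitted, and the demo uses a trivial all-pass transfer function (all ones) instead of a real optimum filter, only to exercise the code path.

```python
import numpy as np


def of_pulse_height(event, transfer, pretrigger_samples=500, hard_restrict=False):
    """Return (height, position) of the maximum of one filtered record."""
    event = np.asarray(event, dtype=float)
    # 'constant' baseline model: subtract the mean of the pre-trigger region
    event = event - event[:pretrigger_samples].mean()
    # filter in the frequency domain with the given transfer function
    filtered = np.fft.irfft(np.fft.rfft(event) * transfer, n=len(event))
    lo, hi = 0, len(event)
    if hard_restrict:
        # restrict the maximum search to 20-30 % of the record window
        lo, hi = int(0.2 * len(event)), int(0.3 * len(event))
    pos = lo + int(np.argmax(filtered[lo:hi]))
    return filtered[pos], pos


# demo: pulse of height 5 on an offset of 2, all-pass "filter"
t = np.arange(1000)
dt = np.clip(t - 240, 0, None)            # zero before the onset at sample 240
pulse = np.exp(-dt / 50) - np.exp(-dt / 3)
pulse /= pulse.max()
event = 2.0 + 5.0 * pulse
transfer = np.ones(len(event) // 2 + 1)    # length of the rfft output
height, pos = of_pulse_height(event, transfer, pretrigger_samples=200,
                              hard_restrict=True)
```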

calc_additional_mp(type='events', path_h5=None, down=1, no_of=False)

Calculate the additional Main Parameters for the Events in an HDF5 File.

Parameters
  • type (string) – The group name within the HDF5 file, either events or testpulses.

  • path_h5 (string) – An alternative full path to the hdf5 file, e.g. “data/bck_001.h5”.

  • down (int) – The downsampling factor applied before calculating the parameters.

  • no_of (bool) – If True, the optimum filter is not used and the corresponding quantities are filled with zeros instead.

calc_exceptional_sev(naming, channel=0, type='events', use_prediction_instead_label=False, model=None, correct_label=None, use_idx=None, pulse_height_interval=[0, 10], left_right_cutoff=None, rise_time_interval=None, decay_time_interval=None, onset_interval=None, remove_offset=True, verb=True, scale_fit_height=True, sample_length=None)

Calculate an exceptional Standard Event for a Class in the HDF5 File, for only one specific channel.

Attention! Since v1.0, the regular calc_sev method can also be used to calculate SEVs of different pulse shapes. This method is therefore no longer maintained.

Parameters
  • naming (string) – Pick a name for the type of event, e.g. ‘carrier’.

  • channel (int) – The number of the channel in the hdf5 file.

  • type (string) – The group name in the HDF5 set, either “events” or “testpulses”.

  • use_prediction_instead_label (bool) – If True, the predictions are used instead of the labels.

  • model (string or None) – If set, this is the name of the model whose predictions are stored in the HDF5 file, e.g. “RF” → look for “RF_predictions”.

  • correct_label (int or None) – Use only events with this label.

  • use_idx (list of ints or None) – If set, only these indices are used for the SEV creation.

  • pulse_height_interval (list of length 2 (interval)) – The lower and upper bound of the pulse heights of events included in the creation of the SEV.

  • left_right_cutoff (float) – The maximal absolute value of the linear slope of events included in the SEV calculation, with the sample index as x-values.

  • rise_time_interval (list of length 2 (interval)) – The lower and upper bound of the rise time of events included in the creation of the SEV, with the sample index as x-values.

  • decay_time_interval (list of length 2 (interval)) – The lower and upper bound of the decay time of events included in the creation of the SEV, with the sample index as x-values.

  • onset_interval (list of length 2 (interval)) – The lower and upper bound of the onset time of events included in the creation of the SEV, with the sample index as x-values.

  • remove_offset (bool) – If True, the offset is removed before the events are superposed for the SEV calculation. Highly recommended!

  • verb (bool) – If True, progress messages about the method are printed.

  • scale_fit_height (bool) – If True, the parametric fit to the SEV is normalized to height 1 after the fit is done.

  • sample_length (float) – The length of one sample in milliseconds. If None, this is calculated from the sample frequency.

calc_mp(type='events', path_h5=None, processes=4, down=1, max_bounds=None)

Calculate the Main Parameters for the Events in an HDF5 File.

This method is described in “CRESST Collaboration, First results from the CRESST-III low-mass dark matter program” (10.1103/PhysRevD.100.102002).

Parameters
  • type (string) – The group in the HDF5 set, either events or testpulses.

  • path_h5 (string or None) – An alternative full path to an HDF5 file, e.g. “data/bck_001.h5”.

  • processes (int) – The number of processes to use for the calculation.

  • down (int) – The events get downsampled by this factor for the calculation of main parameters.

  • max_bounds (tuple of two ints) – The interval of indices to which we restrict the maximum search for the pulse height.
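As a rough illustration of what main parameters are, the sketch below extracts a pulse height, the position of the maximum and a 10-90% rise time from a single record, with optional downsampling by block averaging and a restricted maximum search (max_bounds, assumed here to be given in original sample indices). The function main_parameters, the offset estimate from the first eighth of the record and the 10%/90% levels are assumptions of this sketch; the exact definitions used by calc_mp follow the publication cited above and may differ.

```python
import numpy as np


def main_parameters(event, down=1, max_bounds=None):
    """Sketch: pulse height, maximum position and 10-90 % rise time of one record."""
    event = np.asarray(event, dtype=float)
    if down > 1:
        # downsample by averaging blocks of `down` samples
        event = event[: len(event) // down * down].reshape(-1, down).mean(axis=1)
    n = len(event)
    # assumption: the offset is the mean of the first eighth of the record
    offset = event[: n // 8].mean()
    if max_bounds is None:
        lo, hi = 0, n
    else:
        lo, hi = max_bounds[0] // down, max_bounds[1] // down
    t_max = lo + int(np.argmax(event[lo:hi]))
    ph = event[t_max] - offset
    rising = event[: t_max + 1] - offset
    # first samples that reach 10 % and 90 % of the pulse height
    t10 = int(np.argmax(rising >= 0.1 * ph))
    t90 = int(np.argmax(rising >= 0.9 * ph))
    # positions and times are converted back to original sample units
    return {"pulse_height": ph, "t_max": t_max * down,
            "rise_time": (t90 - t10) * down}


t = np.arange(1024)
dt = np.clip(t - 256, 0, None)             # zero before the onset at sample 256
pulse = np.exp(-dt / 60) - np.exp(-dt / 5)
pulse /= pulse.max()
params = main_parameters(1.0 + 4.0 * pulse, max_bounds=(200, 400))
```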

calc_nps(use_labels=False, down=1, percentile=None, rms_cutoff=None, cut_flag=None, window=True, force_zero=True)

Calculates the mean Noise Power Spectrum with option to use only the baselines that are labeled as noise (label == 3).

Parameters
  • use_labels (bool) – If True only baselines that are labeled as noise are included.

  • down (int) – The factor by which the baselines are downsampled before the calculation; must be a power of 2.

  • percentile (int) – Only baselines whose fit error lies below this percentile of all fit errors are included in the calculation.

  • rms_cutoff (list of nmbr_channels floats) – Only baselines with a fit RMS below these values are included in the NPS calculation. If not None, this overrides the percentile argument.

  • cut_flag (1d bool array) – Only the noise baselines for which the value in this array is True are used for the calculation.

  • window (bool) – If True, a window function is applied to the noise baselines before the calculation of the NPS.

  • force_zero (bool) – Force the zero coefficient (constant offset) of the NPS to zero.
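A mean NPS is the average squared magnitude of the Fourier transform of many noise baselines. The sketch below, assuming NumPy and an array of baselines of shape (n_baselines, record_length), shows how the down, cut_flag, window and force_zero options could enter such a calculation. The function mean_nps is hypothetical, the per-baseline mean subtraction and the Hann window are assumptions, the percentile and rms_cutoff selections are omitted, and the result is not normalized to physical units.

```python
import numpy as np


def mean_nps(baselines, down=1, cut_flag=None, window=True, force_zero=True):
    """Sketch: mean NPS of an array of shape (n_baselines, record_length)."""
    baselines = np.asarray(baselines, dtype=float)
    if cut_flag is not None:
        baselines = baselines[np.asarray(cut_flag, dtype=bool)]
    if down > 1:
        # down must divide the record length (e.g. a power of 2)
        n_bl, length = baselines.shape
        baselines = baselines.reshape(n_bl, length // down, down).mean(axis=2)
    # assumption: remove each baseline's mean before the transform
    baselines = baselines - baselines.mean(axis=1, keepdims=True)
    if window:
        # assumption: a Hann window; the library may use a different window
        baselines = baselines * np.hanning(baselines.shape[1])
    nps = np.mean(np.abs(np.fft.rfft(baselines, axis=1)) ** 2, axis=0)
    if force_zero:
        nps[0] = 0.0  # zero coefficient (constant offset)
    return nps


rng = np.random.default_rng(0)
noise = rng.normal(size=(50, 1024))   # synthetic white-noise baselines
nps = mean_nps(noise, down=2, cut_flag=np.arange(50) % 2 == 0)
```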

calc_of(down: int = 1, name_appendix: str = '', window: bool = True, use_this_sev: Optional[list] = None)

Calculate the Optimum Filter from the NPS and the SEV.

The data format and method were described in N. Ferreiro Iachellini (2018), “Increasing the sensitivity to low mass dark matter in CRESST-III with a new DAQ and signal processing”, doi 10.5282/edoc.23762.

Parameters
  • down (int) – The downsample factor of the optimal filter transfer function.

  • name_appendix (string) – A string that is appended to the group names stdevent and optimumfilter.

  • window (bool) – If True, a window function is applied to the standard event before building the filter.

  • use_this_sev (list) – An alternative list of standard events for all channels, in case you do not want to use the one stored in the HDF5 set.
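For orientation, an optimum filter transfer function is commonly built as the complex conjugate of the SEV spectrum divided by the NPS, normalized so that filtering the SEV yields a maximum of 1. The sketch below follows that textbook construction and adds a phase factor that places the filtered maximum at the SEV's maximum position. It is an assumption-based illustration, not necessarily the exact convention of calc_of: the function optimum_filter, the Hann window and the made-up noise spectrum in the demo are hypothetical. Frequencies with zero noise power (for example the zero coefficient after force_zero) are set to zero to avoid division by zero.

```python
import numpy as np


def optimum_filter(sev, nps, window=True):
    """Sketch of an optimum-filter transfer function from a SEV and an NPS."""
    sev = np.asarray(sev, dtype=float)
    nps = np.asarray(nps, dtype=float)
    n = len(sev)
    if window:
        # assumption: a Hann window; the library may use a different window
        sev = sev * np.hanning(n)
    s = np.fft.rfft(sev)
    t_max = int(np.argmax(sev))
    omega = 2 * np.pi * np.fft.rfftfreq(n)  # angular frequency in rad/sample
    # conj(S) / NPS, with a phase factor that delays the filtered peak to t_max
    num = np.conj(s) * np.exp(-1j * omega * t_max)
    h = np.divide(num, nps, out=np.zeros_like(num), where=nps > 0)
    # normalize so that the filtered standard event has a maximum of 1
    h /= np.fft.irfft(h * s, n=n).max()
    return h


t = np.arange(2048)
dt = np.clip(t - 512, 0, None)             # zero before the onset at sample 512
sev = np.exp(-dt / 100) - np.exp(-dt / 10)
sev /= sev.max()
freqs = np.fft.rfftfreq(len(sev))
nps = 1.0 + 1.0 / (1.0 + 100 * freqs)      # made-up coloured noise spectrum
nps[0] = 0.0                               # as after force_zero
h = optimum_filter(sev, nps, window=False)
filtered = np.fft.irfft(np.fft.rfft(3.0 * sev) * h, n=len(sev))
```

Because all coefficients of |S|²/NPS are non-negative, the filtered SEV peaks exactly at the shifted position, so filtering a scaled copy of the SEV returns its amplitude as the filtered maximum.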

calc_peakdet(type='events', lag=1024, threshold=5, look_ahead=1024)

Calculate the number of prominent peaks within the record window. A number > 1 points towards pile-up events.

Based on https://stackoverflow.com/a/22640362/15216821.

Parameters
  • type (str) – The group name of the HDF5 set.

  • lag (int) – The lag value of the algorithm, i.e. the number of samples that are taken to calculate the moving mean and standard deviation.

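  • threshold (int) – The number of moving standard deviations by which a sample must deviate from the moving mean to trigger.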

  • look_ahead (int) – When a sample triggers, we look for even higher samples in the subsequent look_ahead number of samples.
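The linked answer describes a moving z-score trigger: a sample triggers when it exceeds the mean of the preceding lag samples by more than threshold standard deviations. Below is a minimal NumPy sketch of that idea, combined with the look_ahead search for the highest sample after each trigger. The function detect_peaks is hypothetical, the influence parameter comes from the linked answer and is not part of calc_peakdet's signature, only positive excursions are counted, and the demo record with two pulses is synthetic.

```python
import numpy as np


def detect_peaks(x, lag=1024, threshold=5.0, look_ahead=1024, influence=0.0):
    """Return positions of positive peaks found with a moving z-score trigger."""
    x = np.asarray(x, dtype=float)
    filtered = x.copy()  # series used for the moving statistics
    mean, std = filtered[:lag].mean(), filtered[:lag].std()
    peaks, blocked_until = [], -1
    for i in range(lag, len(x)):
        if x[i] - mean > threshold * std:
            # triggered: damp this sample's effect on the moving statistics
            filtered[i] = influence * x[i] + (1 - influence) * filtered[i - 1]
            if i > blocked_until:
                # the peak is the highest sample within the look-ahead region
                peaks.append(i + int(np.argmax(x[i:i + look_ahead])))
                blocked_until = i + look_ahead
        else:
            filtered[i] = x[i]
        recent = filtered[i - lag + 1:i + 1]
        mean, std = recent.mean(), recent.std()
    return peaks


# synthetic record: unit-variance noise plus two pulses of height 50
rng = np.random.default_rng(1)
t = np.arange(4000)
record = rng.normal(size=t.size)
for onset in (1000, 2500):
    dt = np.clip(t - onset, 0, None)
    shape = np.exp(-dt / 20) - np.exp(-dt / 3)
    record += 50 * shape / shape.max()

peaks = detect_peaks(record, lag=300, threshold=6, look_ahead=200)
```

Here len(peaks) > 1 would flag the record as a pile-up candidate.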

calc_ph_correlated(type='events', dominant_channel=0, offset_to_dominant_channel=None, max_search_range=50)

Calculate the correlated pulse heights of the channels.

Parameters
  • type (string) – The group name in the HDF5 set, e.g. “events”.

  • dominant_channel (int) – The channel used for the primary maximum search.

  • offset_to_dominant_channel (list of ints) – The expected offsets of the pulse peaks in each channel relative to the peak of the dominant channel.

  • max_search_range (int) – The number of samples that are included in the search range of the maximum search in the non-dominant channels.

>>> import cait as ai
>>> path_data = '../CRESST_DATA/run36/run36_Gode1/'
>>> fname = 'stream_bck_003'
>>> dh_stream = ai.DataHandler(channels=[9, 10, 11, ])
>>> dh_stream.set_filepath(path_h5=path_data, fname=fname, appendix=False)
>>> dh_stream.calc_ph_correlated()
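The idea behind the correlated pulse heights can be sketched as follows: the maximum is searched in the dominant channel only, and in every other channel the maximum is taken within a window of max_search_range samples around the dominant peak position shifted by the expected offset. This is a hypothetical NumPy sketch (the function correlated_pulse_heights is not part of the library) operating on an in-memory array of baseline-subtracted events of shape (nmbr_channels, record_length).

```python
import numpy as np


def correlated_pulse_heights(events, dominant_channel=0,
                             offset_to_dominant_channel=None, max_search_range=50):
    """Sketch: pulse heights of all channels, searched around the dominant peak."""
    events = np.asarray(events, dtype=float)
    n_channels, length = events.shape
    if offset_to_dominant_channel is None:
        offset_to_dominant_channel = [0] * n_channels
    # primary maximum search in the dominant channel only
    t_dom = int(np.argmax(events[dominant_channel]))
    heights = np.empty(n_channels)
    for c in range(n_channels):
        if c == dominant_channel:
            heights[c] = events[c, t_dom]
            continue
        # window of max_search_range samples around the expected peak position
        center = t_dom + offset_to_dominant_channel[c]
        lo = min(max(center - max_search_range // 2, 0), length - 1)
        hi = min(lo + max_search_range, length)
        heights[c] = events[c, lo:hi].max()
    return heights


t = np.arange(1000)


def pulse(onset, amplitude):
    dt = np.clip(t - onset, 0, None)
    shape = np.exp(-dt / 40) - np.exp(-dt / 4)
    return amplitude * shape / shape.max()


events = np.stack([pulse(250, 2.0), pulse(240, 0.5)])
events[1, 700] = 3.0   # a spike outside the search window
heights = correlated_pulse_heights(events, dominant_channel=0,
                                   offset_to_dominant_channel=[0, -10],
                                   max_search_range=50)
```

The spike in the second channel at sample 700 is larger than its pulse, but it lies outside the search window and is therefore ignored.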

calc_sev(type='events', use_labels=False, correct_label=None, use_idx=None, name_appendix='', pulse_height_interval=None, left_right_cutoff=None, rise_time_interval=None, decay_time_interval=None, onset_interval=None, remove_offset=True, baseline_model='constant', verb=True, scale_fit_height=True, scale_to_unit=None, sample_length=None, t0_start=None, opt_start=False, memsafe=True, batch_size=1000, lower_bound_tau=None, upper_bound_tau=None, pretrigger_samples=500)

Calculate the Standard Event for the Events in the HDF5 File.

This method is described in “CRESST Collaboration, First results from the CRESST-III low-mass dark matter program” (10.1103/PhysRevD.100.102002).

Parameters
  • type (string) – The group name in the HDF5 set, either “events” or “testpulses”.

  • use_labels (bool) – If True, a labels file must be included in the HDF5 file, and only the events labeled as events or testpulses are included in the calculation.

  • correct_label (int) – The label to be used for the sev generation.

  • use_idx (list of ints) – Only these indices are included for the sev generation.

  • name_appendix (string) – This gets appended to the group name stdevent in the HDF5 set.

  • pulse_height_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The lower and upper bound of the pulse heights of events included in the creation of the SEV.

  • left_right_cutoff (list of NMBR_CHANNELS floats) – The maximal absolute value of the R-L baseline difference of events included in the SEV calculation.

  • rise_time_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The lower and upper bound of the rise time in ms of events included in the creation of the SEV.

  • decay_time_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The lower and upper bound of the decay time in ms of events included in the creation of the SEV.

  • onset_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The lower and upper bound of the onset time in ms of events included in the creation of the SEV.

  • remove_offset (bool) – If True, the offset is removed before the events are superposed for the SEV calculation. Highly recommended!

  • baseline_model (str) – Either ‘constant’, ‘linear’ or ‘exponential’. The baseline model that is subtracted from all events.

  • verb (bool) – If True, progress messages about the method are printed.

  • scale_fit_height (bool) – If True, the parametric fit to the SEV is normalized to height 1 after the fit is done.

  • scale_to_unit (list of nmbr_channels bools, or None) – If the entry for a channel is True, the standard event of that channel is scaled to height 1 (the default is True). If it is False, the parametric fit is not applied; instead, its parameters are set to values that produce an empty array, and scale_fit_height is not applied for this channel either.

  • sample_length (float) – The length of one sample in milliseconds. If None, this is calculated from the sample frequency.

  • t0_start (2-tuple of floats) – The start values for t0 in the fit.

  • opt_start (bool) – If True, a pre-fit is applied to find optimal start values.

  • memsafe (bool) – Recommended! If True, not all events are loaded into memory at once.

  • batch_size (int) – The batch size for the calculation of the SEV.

  • lower_bound_tau (float) – The lower bound for all tau values in the fit.

  • upper_bound_tau (float) – The upper bound for all tau values in the fit.

  • pretrigger_samples (int) – The number of samples from the start of the record window that are considered the pre-trigger region.
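Stripped of the parametric fit and most of the selection options, the core of a standard event calculation is an average of offset-corrected, selected events. The sketch below assumes NumPy, a single channel and already known pulse heights; it applies a pulse_height_interval selection and the 'constant' baseline model with pretrigger_samples, then scales the mean to height 1. The function standard_event is hypothetical and is not the library's implementation.

```python
import numpy as np


def standard_event(events, pulse_heights, pulse_height_interval=None,
                   pretrigger_samples=500, remove_offset=True):
    """Sketch: average the selected events of one channel, scaled to height 1."""
    events = np.asarray(events, dtype=float)
    pulse_heights = np.asarray(pulse_heights, dtype=float)
    mask = np.ones(len(events), dtype=bool)
    if pulse_height_interval is not None:
        lower, upper = pulse_height_interval
        mask &= (pulse_heights > lower) & (pulse_heights < upper)
    selected = events[mask]
    if remove_offset:
        # 'constant' baseline model: subtract each event's pre-trigger mean
        selected = selected - selected[:, :pretrigger_samples].mean(axis=1, keepdims=True)
    sev = selected.mean(axis=0)
    return sev / sev.max()


t = np.arange(2048)
dt = np.clip(t - 600, 0, None)              # zero before the onset at sample 600
shape = np.exp(-dt / 80) - np.exp(-dt / 8)
shape /= shape.max()
amplitudes = np.array([1.0, 2.5, 4.0, 40.0])   # the last one lies outside the interval
offsets = np.array([0.3, -1.2, 0.8, 2.0])
events = offsets[:, None] + amplitudes[:, None] * shape
sev = standard_event(events, amplitudes, pulse_height_interval=(0.5, 10.0))
```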

include_values(values: list, naming: str, channel: int, type: str = 'events', delete_old: bool = False)

Include values as a data set in the HDF5 file.

Typically this is used to store values of cuts or calibrated energies.

Parameters
  • values (list of floats) – The values that we want to include in the file.

  • naming (string) – The name of the data set in the HDF5 file.

  • channel (int) – The channel number for which the values are included.

  • type (string) – The group name in the HDF5 set.

  • delete_old (bool) – If a set by this name exists already, it gets deleted first.

Deprecated: This method is deprecated. Use DataHandler.set() instead.