FeaturesMixin

class cait.mixins.FeaturesMixin

Bases: object

A mixin class for the DataHandler class, providing methods to calculate features of the data.

apply_logical_cut(cut_flag: list, naming: str, channel: int, type: str = 'events', delete_old: bool = False)

Save the cut flag of a logical cut within the HDF5 file.

Parameters:
  • cut_flag (list of bools) – The cut flag that we want to save.

  • naming (string) – The naming of the dataset to save.

  • channel (int) – The channel for which the cut flag is meant.

  • type (string) – The name of the group in the HDF5 file in which we want to save the cut flag, e.g. ‘events’.

  • delete_old (bool) – If True, an existing dataset of this name in the group ‘type’ is deleted first.
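
A logical cut is simply one boolean per event. As a minimal sketch, assuming the per-event quantities are already available as arrays (the helper build_cut_flag, the inputs and the thresholds below are hypothetical), such a flag could be assembled with NumPy and converted to the list of bools that apply_logical_cut expects:

```python
import numpy as np


def build_cut_flag(pulse_heights, rise_times, ph_min, rt_max):
    """Combine simple per-event conditions into one boolean cut flag.

    Hypothetical helper: quantities and thresholds are illustrative only.
    """
    pulse_heights = np.asarray(pulse_heights, dtype=float)
    rise_times = np.asarray(rise_times, dtype=float)
    # An event survives the cut only if every condition holds
    flag = np.logical_and(pulse_heights >= ph_min, rise_times <= rt_max)
    # apply_logical_cut documents cut_flag as a list of bools
    return flag.tolist()


cut_flag = build_cut_flag([0.5, 1.2, 3.0], [2.0, 5.5, 1.0], ph_min=1.0, rt_max=4.0)
print(cut_flag)  # [False, False, True]
# The flag could then be stored with, e.g.:
# dh.apply_logical_cut(cut_flag=cut_flag, naming="my_cut", channel=0)
```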

apply_of(type: str = 'events', name_appendix_group: str = '', name_appendix_set: str = '', chunk_size: int = 100, hard_restrict: bool = False, down: int = 1, window: bool = True, first_channel_dominant: bool = False, baseline_model: str = 'constant', pretrigger_samples: int = 500, onset_to_dominant_channel: List[int] = None, flexibility: int = 1, calc_rms: bool = False, processes: int = -1)

Calculates the height of events or testpulses after applying the optimum filter.

Parameters:
  • type (string) – The group name in the HDF5 set, either events or testpulses.

  • name_appendix_group (string) – A string that is appended to the stdevent group in the HDF5 set.

  • name_appendix_set (string) – A string that is appended to the of_ph set in the HDF5 set.

  • chunk_size (int) – The number of events that are processed simultaneously, to avoid memory errors.

  • hard_restrict (bool) – If True, the maximum search is restricted to 20-30% of the record window.

  • down (int) – The events get downsampled with this factor before application of the filter.

  • window (bool) – If True, a window function is applied to the record window before filtering. This is recommended to avoid artifacts from differences between the left and right baseline offsets.

  • first_channel_dominant (bool) – Take the maximum position from the first channel and evaluate the others at the same position.

  • baseline_model (str) – Either ‘constant’, ‘linear’ or ‘exponential’. The baseline model subtracted from all events.

  • pretrigger_samples (int) – The number of samples from the start of the record window that are considered the pre-trigger region.

  • onset_to_dominant_channel (list of ints) – The difference in the onset value to the dominant channel. If e.g. the second channel has a typical max_pos value of 4000, but the first of 4100, then the onset for this would be -100.

  • flexibility (int) – In case a peak position is provided, the maximum search can still deviate by this amount of samples.

  • calc_rms (bool) – If True, the RMS of the filtered pulses is also calculated.

  • processes (int) – The number of processes to use for the calculation. If -1, all available resources are used.

Deprecated since version 1.3.0: This will be removed in 2.0.0. Use dh.apply_ofilter instead.
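
The baseline_model and pretrigger_samples arguments suggest that a baseline is estimated from the pre-trigger region and subtracted before filtering. The following sketch works under that assumption and covers only the 'constant' and 'linear' models (the 'exponential' model is omitted). It is not cait's implementation, and the function name subtract_baseline is hypothetical:

```python
import numpy as np


def subtract_baseline(event, pretrigger_samples=500, model="constant"):
    """Subtract a baseline estimated from the pre-trigger region of one trace."""
    event = np.asarray(event, dtype=float)
    pre = event[:pretrigger_samples]
    if model == "constant":
        # Mean of the pre-trigger samples, applied to the whole record window
        baseline = np.full_like(event, pre.mean())
    elif model == "linear":
        # Straight-line fit to the pre-trigger region, extrapolated over the window
        x_pre = np.arange(pretrigger_samples)
        slope, intercept = np.polyfit(x_pre, pre, deg=1)
        baseline = slope * np.arange(event.size) + intercept
    else:
        raise ValueError(f"unsupported baseline model in this sketch: {model!r}")
    return event - baseline
```

A linear model removes a slow drift that a constant offset would leave in, which is why the choice of baseline model matters for traces that are not flat before the trigger.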

apply_ofilter(group: str, of: ndarray, sev: ndarray, *, only_channels: int | List[int] = None, with_processing: Callable | List[Callable] = None, on_stream: bool = False, tag: str = '', batch_size: int = 100, preview: bool = False, **kwargs)

Calculate optimum filter pulse heights.

See below for some common pitfalls and refer to the documentation of cait.versatile.OFPulseHeight for more details.

Parameters:
  • group (str) – The DataHandler group with the events to which we want to apply the filter. The results will be saved to this group, too (see cait.versatile.OFPulseHeight for a description of the outputs).

  • of (np.ndarray) – The optimum filter(s) to use, one for each channel, i.e. of shape (N, M//2+1) where N is the number of channels and M is the record length. This also holds for 2D filters: they have to be supplied as arrays of shape (N, M//2+1).

  • sev (np.ndarray) – The standard events corresponding to the filters. They are used as reference to calculate the filter (peak) RMS. One standard event per entry in of is needed, i.e. sev has to be of shape (N, M).

  • only_channels (Union[int, List[int]], optional) – If you only want to filter some of the channels in ‘group’, you can specify the channel index/indices here. Note that the sizes of sev and of have to match the number of channels to be filtered, i.e. if you filter two channels, the of also has to have two channels. Defaults to None, i.e. filtering all channels in ‘group’. Note that if you select a subset of channels here, the arguments filter_groups, max_search and relative_to that you can pass as keyword arguments to cait.versatile.OFPulseHeight (see below) refer to the channel indices of the remaining channels.

  • with_processing (Union[Callable, List[Callable]], optional) – Optional processing to be applied to the event traces before calculating the optimum filter heights. See cait.versatile.iterators.IteratorBaseClass.with_processing() for details. A processing worth mentioning is cait.versatile.TukeyWindow.

  • on_stream (bool, optional) – If True, the event traces are extended on the original stream, adding samples outside the record window. This allows for filtering without edge effects. Of course, this only works if the reference to the stream is saved in the DataHandler, which happens automatically if you use cait.mixins.TriggerCollectionMixin.trigger_of() or cait.mixins.TriggerCollectionMixin.trigger_zscore() for triggering. Adding the reference manually after triggering is painful and is not recommended. If you nevertheless want to perform such a calculation, you can use cait.versatile.OFPulseHeight() and cait.versatile.iterators.StreamIterator.with_extended_window to achieve the same. Defaults to False.

  • tag (str, optional) – A string that is appended to the datasets when they are saved to the DataHandler (e.g. if you want to apply different filters). This string is appended with a hyphen, i.e. for tag="wafer" this would result in datasets like of_ph-wafer. Defaults to an empty string, i.e. no tag.

  • batch_size (int, optional) – The number of events to process at once.

  • preview (bool, optional) – If True, an interactive preview illustrating the filter evaluation using the current input arguments on the event traces opens up. Defaults to False.

  • kwargs (any, optional) – Additional keyword arguments passed to cait.versatile.OFPulseHeight. Notable arguments are max_search (to specify where to search for maxima), relative_to (to specify relative to which channels the filtered traces should be evaluated), peak_rms_width (number of samples for peak RMS calculation) and filter_groups (used to specify which channels should be treated together using a 2D filter). Refer to its documentation page for more details.

Note

If you have two channels and you want to search the maximum of the first channel around 1/4th of the record window and evaluate the second channel k samples before the maximum found in the first channel, you have to pass the arguments max_search=[(0.2, 0.4), -k] and relative_to=[None, 0].

Note

To apply a 2D optimum filter to channels 0 and 1 and search for the maximum in channel 2 at most k samples away from the 2D filter maximum of the first two channels, pass arguments filter_groups=[(0, 1), 2], max_search=[(0.2, 0.4), (-k, k)], and relative_to=[None, 0]. Notice here that the elements of relative_to refer to the entries in filter_groups.
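
The two notes describe three kinds of max_search entries: a pair of fractions of the record window (when relative_to is None), a single integer offset from the maximum of a reference channel, and a pair of integer offsets around that maximum. The hypothetical helper below translates one entry into a half-open sample range under exactly this reading; whether cait treats the offset bounds as inclusive is an assumption of this sketch:

```python
def resolve_search_window(entry, reference, record_length, ref_max_pos=None):
    """Translate one max_search entry into a half-open sample range [start, stop).

    Hypothetical helper illustrating the semantics described in the notes:
    - reference is None: entry is a (start, stop) pair of fractions of the window
    - reference is a channel index: entry is a single offset (evaluate at exactly
      one sample) or a (low, high) pair of offsets around ref_max_pos
    """
    if reference is None:
        lo, hi = entry
        start, stop = int(lo * record_length), int(hi * record_length)
    elif ref_max_pos is None:
        raise ValueError("a reference maximum position is needed for relative entries")
    elif isinstance(entry, int):
        start, stop = ref_max_pos + entry, ref_max_pos + entry + 1
    else:
        lo, hi = entry
        # Both offsets are treated as inclusive in this sketch
        start, stop = ref_max_pos + lo, ref_max_pos + hi + 1
    # Keep the range inside the record window
    return max(0, start), min(record_length, stop)


print(resolve_search_window((0.2, 0.4), None, 2**14))              # (3276, 6553)
print(resolve_search_window((-50, 50), 0, 2**14, ref_max_pos=4100))  # (4050, 4151)
```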

Note

If you see weird edge effects in your filtered traces, it might be a good idea to apply the filter on an extended window (on the initial stream) using on_stream=True. This only works if a reference to the initial stream is still available in the DataHandler. If it is not, you can also try adding with_processing=[vai.RemoveBaseline(), vai.TukeyWindow()] which should reduce edge effects (especially for events which do not fully decay within the record window).
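
To illustrate why removing the baseline and tapering the trace reduces edge effects, here is a small NumPy sketch of a Tukey (tapered cosine) window applied after subtracting the pre-trigger mean. The function names and the default alpha are assumptions for illustration; vai.RemoveBaseline and vai.TukeyWindow may differ in detail:

```python
import numpy as np


def tukey_window(n, alpha=0.25):
    """Tapered cosine window: cosine ramps over a fraction alpha, flat in between."""
    if alpha <= 0:
        return np.ones(n)
    alpha = min(alpha, 1.0)
    x = np.linspace(0.0, 1.0, n)
    w = np.ones(n)
    # Rising cosine ramp at the start of the window
    head = x < alpha / 2
    w[head] = 0.5 * (1 - np.cos(2 * np.pi * x[head] / alpha))
    # Mirror-image falling ramp at the end of the window
    tail = x > 1 - alpha / 2
    w[tail] = 0.5 * (1 - np.cos(2 * np.pi * (1 - x[tail]) / alpha))
    return w


def taper_event(event, pretrigger_samples=500, alpha=0.25):
    """Remove the pre-trigger offset, then force both ends of the trace to zero."""
    event = np.asarray(event, dtype=float)
    event = event - event[:pretrigger_samples].mean()
    return event * tukey_window(event.size, alpha)
```

Because the tapered trace starts and ends at zero, the implicit periodic continuation assumed by an FFT-based filter no longer contains a jump between the left and right ends of the record window.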

Warning

If you start using only_channels, relative_to, max_search, and filter_groups together, you have to be careful with what the indices are referring to: once you specify only_channels, the entries of the remaining arguments refer to the indices of the remaining channels. I.e. if only_channels=[0, 2], the first entries in relative_to and max_search refer to channel 0, whereas the second entries refer to (the old) channel 2. Likewise, if you already chose only_channels=[0, 2], a valid 2D filter group would be filter_groups=[(0, 1)] (not [(0, 2)]!).
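
The index bookkeeping in this warning can be made explicit with a small hypothetical helper (not part of cait) that maps an original channel index to its position among only_channels:

```python
def to_local_index(only_channels, channel):
    """Map an original channel index to its position among only_channels.

    Hypothetical helper: after selecting only_channels, arguments such as
    relative_to, max_search and filter_groups refer to these local positions.
    """
    if only_channels is None:
        return channel
    if isinstance(only_channels, int):
        only_channels = [only_channels]
    channels = list(only_channels)
    if channel not in channels:
        raise ValueError(f"channel {channel} is not among only_channels={channels}")
    return channels.index(channel)


# Translating a 2D filter group written in original indices into local ones
only_channels = [0, 2]
local_groups = [tuple(to_local_index(only_channels, c) for c in (0, 2))]
print(local_groups)  # [(0, 1)]
```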

Warning

The output arrays always have as many channels as the events in the specified group. If you select a subset of channels using only_channels, the remaining channels will be filled with -404, indicating missing values. However, if you also set filter_groups (in order to apply a 2D filter), you inherently lose the channel-index correspondence. To keep things consistent, the output values of any 2D filters are duplicated into the channels that were filtered together. E.g. if you choose filter_groups=[(0, 1), 2], the first and second row of the output arrays will have identical data (results of the 2D filtering), while the third row will have the data from channel 2.
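
The output layout described in this warning can be reproduced with a short NumPy sketch. The helper assemble_output is hypothetical; it assumes one result array per entry of filter_groups, with filter_groups given in the local indices of only_channels:

```python
import numpy as np

MISSING = -404  # placeholder documented for channels that were not filtered


def assemble_output(n_channels, only_channels, filter_groups, group_results):
    """Build an (n_channels, n_events) array from per-filter-group results."""
    group_results = [np.asarray(r, dtype=float) for r in group_results]
    n_events = group_results[0].size
    out = np.full((n_channels, n_events), MISSING, dtype=float)
    # Local index -> original channel index
    channels = list(range(n_channels)) if only_channels is None else list(only_channels)
    for group, result in zip(filter_groups, group_results):
        members = group if isinstance(group, (tuple, list)) else (group,)
        for local in members:
            # A 2D-filter result is duplicated into every channel of its group
            out[channels[local]] = result
    return out


print(assemble_output(3, None, [(0, 1), 2], [[1.0, 2.0], [5.0, 6.0]]))
```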

Example:

import cait as ai
import cait.versatile as vai

record_length = 2**14

# Generate mock data (two channels)
md = vai.MockData(record_length=record_length)
it = md.get_event_iterator()
sev, of = md.sev, md.of

# Generate mock stream (two channels)
s = vai.MockStream(rate_Hz=1, seed=137)

# Initialize DataHandler
dh = ai.DataHandler(record_length=record_length,
                    nmbr_channels=2,
                    sample_frequency=s.sample_frequency)
dh.set_filepath(path_h5="",
                fname="apply_ofilter_test",
                appendix=False)
dh.init_empty()

# Trigger to get traces in DataHandler
dh.trigger_zscore(s, trigger_channels=["Ch0"], passive_channels=["Ch1"])

# Calculate filter pulse heights (evaluate channel 1 relative
# to channel 0). For more examples, see vai.OFPulseHeight.
dh.apply_ofilter(
    "events",
    of=of,
    sev=sev,
    max_search=[(0.2, 0.4), (-1000, 0)],
    relative_to=[None, 0],
    # preview=True, # set to True to get a preview before committing
)

calc_additional_mp(type: str = 'events', path_h5: str = None, down: int = 1, no_of: bool = False, processes: int = -1)

Calculate the additional Main Parameters for the Events in an HDF5 File.

Parameters:
  • type (string) – The group name within the HDF5 file, either events or testpulses.

  • path_h5 (string) – An alternative full path to the hdf5 file, e.g. “data/bck_001.h5”.

  • down (int) – The downsampling factor applied before calculating the parameters.

  • no_of (bool) – If True, the optimum filter is not used and the corresponding quantities are filled with zeros instead.

  • processes (int) – The number of processes to use for the calculation. If -1, all available resources are used.

Deprecated since version 1.3.0: This will be removed in 2.0.0. Use DataHandler.cmp() instead.

calc_exceptional_sev(naming, channel=0, type='events', use_prediction_instead_label=False, model=None, correct_label=None, use_idx=None, pulse_height_interval=[0, 10], left_right_cutoff=None, rise_time_interval=None, decay_time_interval=None, onset_interval=None, remove_offset=True, verb=True, scale_fit_height=True, sample_length=None)

Calculate an exceptional Standard Event for a Class in the HDF5 File, for only one specific channel.

Attention! Since v1.0, the regular calc_sev method can also be used to calculate SEVs of different pulse shapes. This method is therefore no longer maintained.

Parameters:
  • naming (string) – Pick a name for the type of event, e.g. ‘carrier’.

  • channel (int) – The number of the channel in the hdf5 file.

  • type (string) – The group name in the HDF5 set, either “events” or “testpulses”.

  • use_prediction_instead_label (bool) – If True, the predictions are used instead of the labels.

  • model (string or None) – If set, this is the name of the model whose predictions are stored in the HDF5 file, e.g. “RF” –> look for “RF_predictions”.

  • correct_label (int or None) – Use only events with this label.

  • use_idx (list of ints or None) – If set then only these indices are used for the sev creation.

  • pulse_height_interval (list of length 2 (interval)) – The upper and lower bound for the pulse heights to include into the creation of the SEV.

  • left_right_cutoff (float) – The maximal absolute value of the linear slope of events to be included in the SEV calculation. Based on the sample index as x-values.

  • rise_time_interval (list of length 2 (interval)) – The upper and lower bound for the rise time to include into the creation of the SEV. Based on the sample index as x-values.

  • decay_time_interval (list of length 2 (interval)) – The upper and lower bound for the decay time to include into the creation of the SEV. Based on the sample index as x-values.

  • onset_interval (list of length 2 (interval)) – The upper and lower bound for the onset time to include into the creation of the SEV. Based on the sample index as x-values.

  • remove_offset (bool) – If True the offset is removed before the events are superposed for the sev calculation. Highly recommended!

  • verb (bool) – If True, some verbal feedback is output about the progress of the method.

  • scale_fit_height (bool) – If True the parametric fit to the sev is normalized to height 1 after the fit is done.

  • sample_length (float) – The length of one sample in milliseconds. If None, this is calculated from the sample frequency.
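
The interval arguments above all act as selection criteria on precomputed per-event quantities. A minimal sketch of such a selection, assuming inclusive bounds and quantities stored as arrays keyed by hypothetical names (select_events and the dictionary keys are not cait API), could look like this:

```python
import numpy as np


def select_events(values_by_name, intervals, labels=None, correct_label=None, use_idx=None):
    """Return indices of events that pass all interval (and label) conditions."""
    n = len(next(iter(values_by_name.values())))
    mask = np.ones(n, dtype=bool)
    for name, interval in intervals.items():
        if interval is None:
            continue  # None disables a condition, as for the optional arguments above
        lo, hi = interval
        v = np.asarray(values_by_name[name], dtype=float)
        mask &= (v >= lo) & (v <= hi)
    if correct_label is not None:
        mask &= np.asarray(labels) == correct_label
    if use_idx is not None:
        # Restrict to an explicit list of event indices
        keep = np.zeros(n, dtype=bool)
        keep[np.asarray(use_idx, dtype=int)] = True
        mask &= keep
    return np.flatnonzero(mask)


values = {"pulse_height": [0.5, 2.0, 12.0, 4.0], "rise_time": [1.0, 3.0, 1.5, 0.8]}
intervals = {"pulse_height": [0, 10], "rise_time": [0.5, 2.0]}
print(select_events(values, intervals))  # [0 3]
```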

calc_mp(type: str = 'events', path_h5: str = None, processes: int = -1, down: int = 1, max_bounds: Tuple[int] = None)

Calculate the Main Parameters for the Events in an HDF5 File.

This method is described in “CRESST Collaboration, First results from the CRESST-III low-mass dark matter program” (10.1103/PhysRevD.100.102002).

Parameters:
  • type (string) – The group in the HDF5 set, either events or testpulses.

  • path_h5 (string or None) – An alternative full path to a hdf5 file, e.g. “data/bck_001.h5”.

  • processes (int) – The number of processes to use for the calculation. If -1, all available resources are used.

  • down (int) – The events get downsampled by this factor for the calculation of main parameters.

  • max_bounds (tuple of two ints) – The interval of indices to which we restrict the maximum search for the pulse height.

Deprecated since version 1.3.0: This will be removed in 2.0.0. Use DataHandler.cmp() instead.
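
To give an idea of what "main parameters" are, the sketch below computes a few of them from a single trace. The definitions are assumptions chosen for illustration (offset from the pre-trigger mean, rise time between the 10 % and 90 % crossings, decay time until the trace falls below 1/e of the height, all in samples) and may differ from the ones used by cait and the cited paper; main_parameters is a hypothetical name:

```python
import numpy as np


def main_parameters(event, pretrigger_samples=500, max_bounds=None):
    """Simplified pulse parameters of one trace, in units of samples."""
    event = np.asarray(event, dtype=float)
    offset = event[:pretrigger_samples].mean()
    trace = event - offset
    # Restrict the maximum search, analogous to max_bounds
    lo, hi = (0, trace.size) if max_bounds is None else max_bounds
    max_pos = lo + int(np.argmax(trace[lo:hi]))
    height = trace[max_pos]
    # Rise: last samples before the maximum that are still below 10 % and 90 %
    before = trace[: max_pos + 1]
    t10 = np.flatnonzero(before < 0.1 * height)
    t90 = np.flatnonzero(before < 0.9 * height)
    rise = t90[-1] - t10[-1] if t10.size and t90.size else np.nan
    # Decay: first sample after the maximum below height / e
    below = np.flatnonzero(trace[max_pos:] < height / np.e)
    decay = below[0] if below.size else np.nan
    return {"offset": offset, "pulse_height": height, "max_pos": max_pos,
            "rise_time": rise, "decay_time": decay}
```

Multiplying the sample counts by the sample length converts them to milliseconds.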

calc_nps(use_labels=False, down=1, percentile=None, rms_cutoff=None, cut_flag=None, window=True, force_zero=True)

Calculates the mean Noise Power Spectrum with option to use only the baselines that are labeled as noise (label == 3).

Parameters:
  • use_labels (bool) – If True only baselines that are labeled as noise are included.

  • down (int) – The factor by which the baselines are downsampled before the calculation; must be a power of 2.

  • percentile (int) – The lower percentile of the fit errors of the baselines that we include in the calculation.

  • rms_cutoff (list of nmbr_channels floats) – Only baselines with a fit RMS below these values are included in the NPS calculation. This overrides the percentile argument if it is not None.

  • cut_flag (1d bool array) – Only the noise baselines for which the value in this array is True are used for the calculation.

  • window (bool) – If True, a window function is applied to the noise baselines before the calculation of the NPS.

  • force_zero (bool) – Force the zero coefficient (constant offset) of the NPS to zero.
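
The core of a mean NPS calculation for one channel can be sketched in a few lines of NumPy. This is an illustration under assumptions, not cait's code: a Hann window stands in for whatever window cait uses, the standard deviation after offset removal stands in for the fit RMS, and normalization constants are omitted:

```python
import numpy as np


def mean_nps(baselines, window=True, force_zero=True, rms_cutoff=None):
    """Mean noise power spectrum of one channel from an (n_baselines, M) array."""
    baselines = np.asarray(baselines, dtype=float)
    # Remove the constant offset of every baseline
    baselines = baselines - baselines.mean(axis=1, keepdims=True)
    if rms_cutoff is not None:
        # Proxy for the fit RMS: standard deviation of each baseline
        rms = baselines.std(axis=1)
        baselines = baselines[rms < rms_cutoff]
    if window:
        baselines = baselines * np.hanning(baselines.shape[1])
    # Power per frequency bin (M // 2 + 1 bins for a real signal), averaged
    spectra = np.abs(np.fft.rfft(baselines, axis=1)) ** 2
    nps = spectra.mean(axis=0)
    if force_zero:
        nps[0] = 0.0  # drop the constant-offset (zero-frequency) coefficient
    return nps
```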

calc_of(down: int = 1, name_appendix: str = '', window: bool = True, use_this_sev: list = None)

Calculate the Optimum Filter from the NPS and the SEV.

The data format and method were described in “(2018) N. Ferreiro Iachellini, Increasing the sensitivity to low mass dark matter in cresst-iii with a new daq and signal processing”, doi 10.5282/edoc.23762.

Parameters:
  • down (int) – The downsample factor of the optimal filter transfer function.

  • name_appendix (string) – A string that is appended to the group name stdevent and optimumfilter.

  • window (bool) – Apply a window function to the standard event before building the filter.

  • use_this_sev (list) – An alternative list of standard events for all channels, in case you do not want to use the one stored in the HDF5 set.
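
The textbook frequency-domain optimum filter weights every frequency by signal over noise, H(f) ∝ conj(S(f)) / N(f), and is normalized so that filtering the standard event itself yields height 1. The sketch below follows that textbook form (one channel, no downsampling or windowing) and is not necessarily identical to cait's calc_of; its transfer function has M//2+1 entries, matching the (N, M//2+1) filter shape described for apply_ofilter. The names optimum_filter and filtered_trace are hypothetical:

```python
import numpy as np


def optimum_filter(sev, nps):
    """Frequency-domain optimum filter from a standard event and a noise spectrum."""
    sev = np.asarray(sev, dtype=float)
    nps = np.asarray(nps, dtype=float)
    s = np.fft.rfft(sev)
    h = np.zeros_like(s)
    ok = nps > 0
    # Signal-over-noise weighting; bins without noise information stay zero
    h[ok] = np.conj(s[ok]) / nps[ok]
    # Normalize so that the filtered standard event peaks at exactly 1
    peak = np.max(np.fft.irfft(s * h, n=sev.size))
    return h / peak


def filtered_trace(event, h):
    """Apply a transfer function h (length M // 2 + 1) to a trace of length M."""
    event = np.asarray(event, dtype=float)
    return np.fft.irfft(np.fft.rfft(event) * h, n=event.size)
```

With this form the output is a cross-correlation, so its maximum equals the pulse height in units of the standard event and sits at the lag between event and standard event (index 0 when they are aligned), not at the pulse position.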

calc_peakdet(type='events', lag=1024, threshold=5, look_ahead=1024)

Calculate the number of prominent peaks within the record window. A number > 1 points towards pile up events.

Based on https://stackoverflow.com/a/22640362/15216821.

Parameters:
  • type (str) – The group name of the HDF5 set.

  • lag (int) – The lag value of the algorithm, i.e. the number of samples that are taken to calculate the moving mean and standard deviation.

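  • threshold (int) – The number of moving standard deviations by which a sample has to exceed the moving mean in order to trigger.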

  • look_ahead (int) – When a sample triggers, we look for even higher samples in the subsequent look_ahead number of samples.
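
A simplified moving z-score trigger in the spirit of the linked answer can be sketched as follows. It only considers upward excursions and skips the look-ahead region after each trigger; these choices, and the name detect_peaks, are assumptions of this sketch rather than cait's implementation. Its cost grows with the record length times lag, so it is meant for illustration only:

```python
import numpy as np


def detect_peaks(trace, lag=1024, threshold=5, look_ahead=1024):
    """Return peak positions found with a moving mean / standard deviation trigger."""
    trace = np.asarray(trace, dtype=float)
    peaks = []
    i = lag
    while i < trace.size:
        window = trace[i - lag:i]
        mean, std = window.mean(), window.std()
        if std > 0 and trace[i] - mean > threshold * std:
            # Triggered: take the highest sample within the look-ahead region
            stop = min(i + look_ahead, trace.size)
            peaks.append(i + int(np.argmax(trace[i:stop])))
            # Resume the search after the look-ahead region
            i = stop
        else:
            i += 1
    return peaks


x = np.sin(np.arange(3000) * 0.3)  # stand-in for noise
x[1000] += 50.0
x[1001] += 80.0
x[2000] += 60.0
print(len(detect_peaks(x, lag=200, threshold=5, look_ahead=200)))  # 2
```

More than one peak in a record window would then point towards a pile-up event.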

calc_ph_correlated(type='events', dominant_channel=0, offset_to_dominant_channel=None, max_search_range=50)

Calculate the correlated pulse heights of the channels.

Parameters:
  • type (string) – The group name in the HDF5 set, e.g. “events”.

  • dominant_channel (int) – Which channel is the one for the primary max search.

  • offset_to_dominant_channel (list of ints) – The expected offsets of the pulse peaks relative to the peak of the dominant channel.

  • max_search_range (int) – The number of samples that are included in the search range of the maximum search in the non-dominant channels.

Example:

import cait as ai

path_data = '../CRESST_DATA/run36/run36_Gode1/'
fname = 'stream_bck_003'

dh_stream = ai.DataHandler(channels=[9, 10, 11])
dh_stream.set_filepath(path_h5=path_data, fname=fname, appendix=False)

dh_stream.calc_ph_correlated()
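
The idea behind correlated pulse heights can be sketched for one event: find the maximum in the dominant channel, then search the other channels only in a short range around the expected position. This sketch assumes offset-corrected traces and a search range centred on the expected peak; both assumptions, and the name correlated_pulse_heights, are illustrative and not taken from cait:

```python
import numpy as np


def correlated_pulse_heights(events, dominant_channel=0,
                             offset_to_dominant_channel=None, max_search_range=50):
    """Pulse heights of all channels of one event, anchored to the dominant channel."""
    events = np.asarray(events, dtype=float)
    n_channels, record_length = events.shape
    if offset_to_dominant_channel is None:
        offset_to_dominant_channel = [0] * n_channels
    # Unrestricted maximum search in the dominant channel
    dom_pos = int(np.argmax(events[dominant_channel]))
    heights = np.empty(n_channels)
    for c in range(n_channels):
        if c == dominant_channel:
            heights[c] = events[c, dom_pos]
            continue
        # Restricted search around the expected peak of channel c
        centre = dom_pos + offset_to_dominant_channel[c]
        lo = max(0, centre - max_search_range // 2)
        hi = min(record_length, centre - max_search_range // 2 + max_search_range)
        heights[c] = events[c, lo:hi].max() if lo < hi else np.nan
    return heights
```

Restricting the search keeps unrelated features elsewhere in the non-dominant channels (e.g. a second pulse) from being picked up as the pulse height.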

calc_sev(type='events', use_labels=False, correct_label=None, use_idx=None, name_appendix='', pulse_height_interval=None, left_right_cutoff=None, rise_time_interval=None, decay_time_interval=None, onset_interval=None, remove_offset=True, baseline_model='constant', verb=True, scale_fit_height=True, scale_to_unit=None, sample_length=None, t0_start=None, opt_start=False, memsafe=True, batch_size=1000, lower_bound_tau=None, upper_bound_tau=None, pretrigger_samples=500)

Calculate the Standard Event for the Events in the HDF5 File.

This method is described in “CRESST Collaboration, First results from the CRESST-III low-mass dark matter program” (10.1103/PhysRevD.100.102002).

Parameters:
  • type (string) – The group name in the HDF5 set, either “events” or “testpulses”.

  • use_labels (bool) – If True, a labels file must be included in the HDF5 file, and only the events labeled as events or testpulses are included in the calculation.

  • correct_label (int) – The label to be used for the sev generation.

  • use_idx (list of ints) – Only these indices are included for the sev generation.

  • name_appendix (string) – This gets appended to the group name stdevent in the HDF5 set.

  • pulse_height_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The upper and lower bound for the pulse heights to include into the creation of the SEV.

  • left_right_cutoff (list of NMBR_CHANNELS floats) – The maximal abs value of the R-L baseline difference of events to be included in the SEV calculation.

  • rise_time_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The upper and lower bound for the rise time in ms to include into the creation of the SEV.

  • decay_time_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The upper and lower bound for the decay time in ms to include into the creation of the SEV.

  • onset_interval (list of NMBR_CHANNELS lists of length 2 (intervals)) – The upper and lower bound for the onset time in ms to include into the creation of the SEV.

  • remove_offset (bool) – If True, the offset is removed before the events are superposed for the SEV calculation. Highly recommended!

  • baseline_model (str) – Either ‘constant’, ‘linear’ or ‘exponential’. The baseline model subtracted from all events.

  • verb (bool) – If True some verbal feedback is output about the progress of the method.

  • scale_fit_height (bool) – If True the parametric fit to the sev is normalized to height 1 after the fit is done.

  • scale_to_unit (bool list of length nmbr_channels or None) – If True for a channel, the standard event of that channel is scaled to 1. Default True. If False for a channel, the parametric fit is not applied; its parameters are automatically set to values that produce an empty array. In this case, scale_fit_height is also not applied for this channel.

  • sample_length (float) – The length of one sample in milliseconds. If None, this is calculated from the sample frequency.

  • t0_start (2-tuple of floats) – The start values for t0 in the fit.

  • opt_start (bool) – If true, a pre-fit is applied to find optimal start values.

  • memsafe (bool) – Recommended! If activated, not all events get loaded into memory.

  • batch_size (int) – The batch size for the calculation of the SEV.

  • lower_bound_tau (float) – The lower bound for all tau values in the fit.

  • upper_bound_tau (float) – The upper bound for all tau values in the fit.

  • pretrigger_samples (int) – The number of samples from the start of the record window that are considered the pre-trigger region.
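
Once the events are selected, the averaging step itself is short. The sketch below assumes a plain mean of the selected traces after subtracting each trace's pre-trigger mean (the remove_offset option), followed by scaling to height 1 as described for scale_to_unit; cait's actual procedure, including the parametric fit, may differ, and standard_event is a hypothetical name:

```python
import numpy as np


def standard_event(events, pretrigger_samples=500, remove_offset=True):
    """Average selected traces of one channel and scale the result to height 1."""
    events = np.asarray(events, dtype=float)
    if remove_offset:
        # Subtract each trace's pre-trigger mean so that offsets do not add up
        events = events - events[:, :pretrigger_samples].mean(axis=1, keepdims=True)
    sev = events.mean(axis=0)
    return sev / sev.max()
```

Without the offset removal, events with different baseline levels would shift the average, which is why the option is recommended.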

include_values(values: list, naming: str, channel: int, type: str = 'events', delete_old: bool = False)

Include values as a data set in the HDF5 file.

Typically this is used to store values of cuts or calibrated energies.

Parameters:
  • values (list of floats) – The values that we want to include in the file.

  • naming (string) – The name of the data set in the HDF5 file.

  • channel (int) – The channel number for which the values are included.

  • type (string) – The group name in the HDF5 set.

  • delete_old (bool) – If a set by this name exists already, it gets deleted first.

Deprecated since version 1.2.0: This will be removed in 2.0.0. Use DataHandler.set() instead.