DataHandler

class cait.DataHandler(record_length: int = 16384, sample_frequency: int = 25000, channels: list = None, nmbr_channels: int = None, run: str = None, module: str = None)[source]

Bases: cait.mixins._data_handler_simulate.SimulateMixin, cait.mixins._data_handler_rdt.RdtMixin, cait.mixins._data_handler_plot.PlotMixin, cait.mixins._data_handler_features.FeaturesMixin, cait.mixins._data_handler_analysis.AnalysisMixin, cait.mixins._data_handler_fit.FitMixin, cait.mixins._data_handler_csmpl.CsmplMixin, cait.mixins._data_handler_ml.MachineLearningMixin

A class for the processing of raw data events.

The DataHandler class is one of the core parts of the cait package. An instance of the class is bound to an HDF5 file and stores all data from the recorded binary files (*.rdt, …), as well as the calculated features (main parameters, standard events, …), in that file.

Parameters
  • record_length (int) – The number of samples in one record window. To ensure performance of all features, this should be a power of 2.

  • sample_frequency (int) – The sampling frequency of the recording.

  • channels (list of integers or None) – The channels in the *.rdt file that belong to the detector module. Attention - the channel number written in the *.par file starts counting from 1, while Cait, CCS and other common software frameworks start counting from 0.

  • nmbr_channels (int or None) – The total number of channels.

  • run (string or None) – The number of the measurement run. This is an optional argument that uniquely identifies a measurement with a given module. Providing it has no effect on the processing, but it can help you stay organized when you work with multiple DataHandlers at once.

  • module (string or None) – The name of the detector module. This is an optional argument for unique identification of the physics data. Providing it has no effect on the processing, but it can help you stay organized when you work with multiple DataHandlers at once.
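As a quick sanity check on the constructor arguments, the sketch below verifies that a record length is a power of 2, computes the duration of one record window from the default values, and converts *.par channel numbers (counted from 1) to zero-based channel numbers. The helpers is_power_of_two and par_to_cait_channels are illustrative names and not part of cait, and the sampling frequency is assumed to be given in Hz.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of 2 (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def par_to_cait_channels(par_channels):
    """Convert 1-based *.par channel numbers to 0-based channel numbers."""
    return [c - 1 for c in par_channels]


record_length = 16384     # default from the signature above
sample_frequency = 25000  # default from the signature above, assumed to be in Hz

# Duration of one record window in seconds
window_seconds = record_length / sample_frequency

print(is_power_of_two(record_length))  # True, since 16384 == 2**14
print(window_seconds)                  # 0.65536
print(par_to_cait_channels([1, 2]))    # [0, 1]
```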

Example for the generation of an HDF5 file with events from a test *.rdt file:

>>> import cait as ai
>>> test_data = ai.data.TestData(filepath='test_001')
>>> test_data.generate(source='hw')
Rdt file written.
Con file written.
Par file written.
>>> dh = ai.DataHandler()
DataHandler instance created.
>>> dh.convert_dataset(path_rdt='./', fname='test_001', path_h5='./')
Start converting.
READ EVENTS FROM RDT FILE.
Total Records in File:  40
Event Counts:  19
WORKING ON EVENTS WITH TPA = 0.
CREATE DATASET WITH EVENTS.
CALCULATE MAIN PARAMETERS.
WORKING ON EVENTS WITH TPA = -1.
CREATE DATASET WITH NOISE.
WORKING ON EVENTS WITH TPA > 0.
CREATE DATASET WITH TESTPULSES.
CALCULATE MP.
Hdf5 dataset created in  ./
Filepath and -name saved.

Most of the methods are included via parent mixin classes (see folder cait/mixins).

content()[source]

Print the whole content of the HDF5 and all derived properties.

downsample_raw_data(type: str = 'events', down: int = 16, dtype: str = 'float32', name_appendix: str = '', delete_old: bool = True)[source]

Downsample the dataset “event” from a specified group in the HDF5 file.

For large-scale analyses with limited server space, the converted HDF5 datasets can exceed the available storage capacity. In this scenario, the raw data events can be downsampled by a given factor. Downsampling to sample frequencies below 1 kHz is in many situations sufficient for viewing events and for most feature calculations.

Parameters
  • type (string) – The group in the HDF5 set from which the events are downsampled, typically ‘events’, ‘testpulses’ or ‘noise’.

  • down (int) – The factor by which the data is downsampled. This should be a power of 2.

  • dtype (string) – The data type of the new stored events, typically you want this to be float32.

  • name_appendix (string) – An appendix to the dataset event in order to keep the old and the new events.

  • delete_old (bool) – If true, the old events are deleted. Deactivate this only if a unique name_appendix is chosen.

>>> dh.downsample_raw_data()
Old Dataset Event deleted from group events.
New Dataset Event with downsample rate 16 created in group events.
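What downsampling by a factor means can be sketched in plain Python. The block below assumes block averaging, where each group of down consecutive samples is replaced by its mean. This is a common choice, but this page does not confirm that cait uses it. The helper downsample is illustrative and not a cait function.

```python
def downsample(event, down):
    """Average each block of `down` consecutive samples (assumed method)."""
    if len(event) % down != 0:
        raise ValueError("record length must be divisible by the downsample factor")
    return [sum(event[i:i + down]) / down for i in range(0, len(event), down)]


event = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]  # toy record window
print(downsample(event, down=4))  # [3.0, 11.0]

# Effect of the default factor on the default record window
record_length, sample_frequency, down = 16384, 25000, 16
print(record_length // down)    # 1024 samples per record window
print(sample_frequency / down)  # 1562.5, the effective sampling frequency
```

With the default values, the effective sampling frequency stays above 1 kHz, so a larger factor would be needed to reach the sub-kHz range mentioned above.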

drop(group: str, dataset: str = None)[source]

Delete a dataset from a specified group in the HDF5 file.

Parameters
  • group (string) – The name of the group in the HDF5 file.

  • dataset (string) – The name of the dataset in the HDF5 file. If None, the whole group is deleted.

drop_raw_data(type: str = 'events')[source]

Delete the dataset “event” from a specified group in the HDF5 file.

For large-scale analyses with limited server space, the converted HDF5 datasets can exceed the available storage capacity. In this scenario, the raw data events can be deleted after all useful features have been calculated. The events can be included again at a later point if needed.

Parameters

type (string) – The group in the HDF5 set from which the events are deleted, typically ‘events’, ‘testpulses’ or ‘noise’.

>>> dh.drop_raw_data()
Dataset Event deleted from group events.
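To judge whether dropping the raw events is worthwhile, a rough size estimate can help. The sketch below assumes that the events occupy one value per channel, event and sample, stored with 4 bytes per sample (float32). The storage layout, the event count and the helper raw_data_bytes are assumptions for illustration and are not taken from cait.

```python
def raw_data_bytes(n_channels, n_events, record_length, bytes_per_sample=4):
    """Estimate the size of the raw events in bytes (float32 assumed)."""
    return n_channels * n_events * record_length * bytes_per_sample


# Hypothetical measurement: 2 channels, 100,000 events, default record length
size = raw_data_bytes(n_channels=2, n_events=100_000, record_length=16384)
print(size)        # 13107200000 bytes
print(size / 1e9)  # 13.1072, i.e. roughly 13 GB
```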

generate_startstop()[source]

Generate a startstop data set in the metainfo group from the testpulses time stamps.

get(group: str, dataset: str)[source]

Get a dataset from the HDF5 file with safe closing of the file stream.

Parameters
  • group (string) – The name of the group in the HDF5 set.

  • dataset (string) – The name of the dataset in the HDF5 set. There are special keywords for properties calculated from the main parameters, namely ‘pulse_height’, ‘onset’, ‘rise_time’, ‘decay_time’ and ‘slope’. These are consistent with those used in the cut when generating a standard event.

Returns

The dataset from the HDF5 file

Return type

numpy array

get_filehandle(path=None)[source]

Get the opened filestream to the HDF5 file.

This is usually needed for individual feature calculations, plots or cuts that are not available as ready-made methods.

Parameters

path (string or None) – An alternative full path to the HDF5 file whose file stream should be opened.

Returns

The opened file stream. Please look into the h5py Python library for details about the file stream.

Return type

h5py file stream

>>> with dh.get_filehandle() as f:
...     f.keys()
...
<KeysViewHDF5 ['events', 'noise', 'testpulses']>

import_labels(path_labels: str, type: str = 'events', path_h5=None)[source]

Include the *.csv file with the labels into the HDF5 File.

Parameters
  • path_labels (string) – Path to the folder that contains the csv file. E.g. “data/” looks for labels in “data/labels_bck_0XX_<type>”.

  • type (string) – The group name in the HDF5 file of the events, typically “events” or “testpulses”.

  • path_h5 (string or None) – Provide an alternative full path to the HDF5 file to include the labels, e.g. “data/hdf5s/bck_001[…].h5”.

>>> ei = ai.EventInterface()
Event Interface Instance created.
>>> ei.load_h5(path='./',fname='test_001',channels=[0,1])
Nmbr triggered events:  4
Nmbr testpulses:  11
Nmbr noise:  4
Bck File loaded.
>>> ei.create_labels_csv(path='./')
>>> dh.import_labels(path_labels='./')
Added Labels.

import_predictions(model: str, path_predictions: str, type: str = 'events', only_channel: int = None, path_h5: str = None)[source]

Include the *.csv file with the predictions from a machine learning model into the HDF5 File.

Parameters
  • model (string) – The naming for the type of model, e.g. Random Forest –> “RF”.

  • path_predictions (string) – Path to the folder that contains the csv file. E.g. “data/” –> look for predictions in “data/<model>_predictions_<self.fname>_<type>”. If the argument only_channel is not None, then “_Ch<only_channel>” is additionally appended to the name of the file that is looked for.

  • type (string) – The name of the group in the HDF5 file, typically “events” or “testpulses”.

  • only_channel (int or None) – If the predictions are only for a specific channel, define here for which channel.

  • path_h5 (string or None) – Provide an alternative (full) path to the HDF5 file, e.g. “data/hdf5s/bck_001[…].h5”.

>>> dh.import_predictions(model='RF', path_predictions='./')
Added RF Predictions.
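The file name pattern described for path_predictions can be sketched as follows. The helper predictions_filename is hypothetical and not part of cait, the “.csv” extension is an assumption based on the description above, and “bck_001” stands in for the value of self.fname.

```python
def predictions_filename(model, path_predictions, fname, group="events", only_channel=None):
    """Build the file name that is looked for (pattern from the description above)."""
    name = f"{path_predictions}{model}_predictions_{fname}_{group}"
    if only_channel is not None:
        # Predictions that belong to a single channel get an extra suffix
        name += f"_Ch{only_channel}"
    return name + ".csv"  # extension is an assumption


print(predictions_filename("RF", "data/", "bck_001"))
# data/RF_predictions_bck_001_events.csv
print(predictions_filename("RF", "data/", "bck_001", only_channel=0))
# data/RF_predictions_bck_001_events_Ch0.csv
```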

keys(group: str = None)[source]

Print the keys of the HDF5 file or a group within it.

Parameters

group (string or None) – The name of a group in the HDF5 file whose keys are printed.

set_filepath(path_h5: str, fname: str, appendix: bool = True, channels: list = None)[source]

Set the path to the *.h5 file for further processing.

This function is usually called right after the initialization of a new object. If the instance has already done the conversion from *.rdt to *.h5, the path has been set automatically and the call is not needed.

Parameters
  • path_h5 (string) – The path to the directory that contains the H5 file, e.g. “data/” –> file name “data/bck_001-P_Ch01-L_Ch02.h5”.

  • fname (string) – The name of the H5 file.

  • appendix (bool) – If true, an appendix like “-P_ChX-[…]” is automatically appended to the path_h5 string.

  • channels (list of integers or None) – The channels in the *.rdt file that belong to the detector module. Attention - the channel number written in the *.par file starts counting from 1, while Cait, CCS and other common software frameworks start counting from 0.

>>> dh.set_filepath(path_h5='./', fname='test_001')

truncate_raw_data(type: str, truncated_idx_low: int, truncated_idx_up: int, dtype: str = 'float32', name_appendix: str = '', delete_old: bool = True)[source]

Truncate the record window of the dataset “event” from a specified group in the HDF5 file.

For measurements with a high event rate (above ground, …), a long record window might be counterproductive, because more pile-up events fall inside the window. For this reason, you can truncate the length of the record window with this function.

Parameters
  • type (string) – The group in the HDF5 set from which the events are truncated, typically ‘events’, ‘testpulses’ or ‘noise’.

  • truncated_idx_low (int) – The lower index within the old record window, that becomes the first index in the truncated record window.

  • truncated_idx_up (int) – The upper index within the old record window, that becomes the last index in the truncated record window.

  • dtype (string) – The data type of the new stored events, typically you want this to be float32.

  • name_appendix (string) – An appendix to the dataset event in order to keep the old and the new events.

  • delete_old (bool) – If true, the old events are deleted. Deactivate this only if a unique name_appendix is chosen.
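The effect of the two indices can be sketched in plain Python. The block assumes Python slice semantics, in which the upper index is exclusive. Whether cait treats truncated_idx_up as inclusive or exclusive is not stated on this page. The helper truncate is illustrative and not a cait function.

```python
def truncate(event, truncated_idx_low, truncated_idx_up):
    """Keep the samples low .. up - 1 (upper bound exclusive, an assumption)."""
    if not 0 <= truncated_idx_low < truncated_idx_up <= len(event):
        raise ValueError("indices must satisfy 0 <= low < up <= record length")
    return event[truncated_idx_low:truncated_idx_up]


event = list(range(16))  # toy record window, sample value == sample index
short = truncate(event, 4, 12)
print(short[0], short[-1], len(short))  # 4 11 8

# Default 16384-sample window at 25000 Hz (assumed unit), keeping the middle half
new_length = 12288 - 4096
print(new_length, new_length / 25000)  # 8192 0.32768
```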