Conversion of Hardware-triggered Data

In this tutorial, we generate mock hardware-triggered data to test all functionality of Cait. The generated data is similar in all properties to data from the CRESST and COSINUS data acquisitions that work with the program CSS. The only exception is that the noise and pulses are not measured but generated from parametric descriptions of the pulse shape and normally distributed noise.

We start by importing the library.

import cait as ai

Generate Test Data

The TestData class handles the generation of data.

test_data = ai.data.TestData(filepath='test_data/mock_001', duration=1800)

First we generate an RDT file that holds all triggered events, test pulses and noise events. Right after the generation, we call a check function that prints the content of the first event to verify that the file was written properly.

test_data._generate_rdt_file()
dh = ai.DataHandler(nmbr_channels=2)
dh.checkout_rdt(path_rdt='test_data/mock_001.rdt', read_events=1, verb=True)
Rdt file written.
DataHandler Instance created.
#############################################################
EVENT NUMBER:  0
detector number (starting at 0):  0
number of coincident pulses in digitizer module:  0
module trigger counter (starts at 0, when TRA or WRITE starts):  1
channel trigger delay relative to time stamp [µs]:  0
absolute time [s] (computer time timeval.tv_sec):  1602879726
absolute time [us] (computer time timeval.tv_us):  0
Delay of channel trigger to testpulse [us]:  0
time stamp of module trigger low word (10 MHz clock, 0 @ START WRITE ):  0
time stamp of module trigger high word (10 MHz clock, 0 @ START WRITE ):  6
number of qdc events accumulated until digitizer trigger:  0
measuring hours (0 @ START WRITE):  0.0016666667070239782
accumulated dead time of channel [s] (0 @ START WRITE):  0.0
test pulse amplitude (0. for pulses, (0.,10.] for test pulses, >10. for control pulses):  0.10000000149011612
DAC output of control program (proportional to heater power):  0.0
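
The test pulse amplitude (TPA) field in the printout above doubles as an event type label. The following minimal sketch spells out the ranges given there; the function name is made up for illustration and is not part of Cait, and the value -1 for noise events is an assumption taken from the tpa_list argument and the log output of the conversion further down.

```python
def classify_tpa(tpa):
    """Map a test pulse amplitude (TPA) to the event type it encodes."""
    if tpa == -1:
        return "noise"         # assumption, taken from the conversion log
    if tpa == 0:
        return "event"         # 0. for triggered pulses
    if 0 < tpa <= 10:
        return "testpulse"     # (0., 10.] for test pulses
    if tpa > 10:
        return "controlpulse"  # > 10. for control pulses
    raise ValueError(f"unexpected TPA value: {tpa}")


for tpa in (0.0, 0.10000000149011612, 12.0, -1.0):
    print(tpa, "->", classify_tpa(tpa))
```

The first event printed above has a TPA of about 0.1 and would therefore count as a test pulse.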

The CON file holds the time stamps and pulse heights of the control pulses. We again call a check function after the data generation.

test_data._generate_con_file()
dh = ai.DataHandler(nmbr_channels=2)
dh.checkout_con(path_con='test_data/mock_001.con', read_events=5)
Con file written.
DataHandler Instance created.
5 control pulses read from CON file.
 	detector_nmbr,	 	pulse_height, 	time_stamp_low, 	time_stamp_high, 	dead_time, 	mus_since_last_tp
1	0		6.56		30000000		0			0.0	[0]
2	1		3.85		30000000		0			0.0	[0]
3	0		6.29		120000000		0			0.0	[0]
4	1		4.1		120000000		0			0.0	[0]
5	0		6.37		210000000		0			0.0	[0]
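
The time stamps in the table are clock ticks. As a rough sketch, they can be converted to seconds under the assumption that the CON file uses the same 10 MHz clock that the RDT printout mentions, and that the high word is zero, as it is for all rows shown. The helper name is hypothetical.

```python
CLOCK_HZ = 10_000_000  # 10 MHz clock, as stated in the RDT header printout


def ticks_to_seconds(time_stamp_low, clock_hz=CLOCK_HZ):
    """Convert a low-word time stamp (clock ticks) to seconds."""
    return time_stamp_low / clock_hz


# time_stamp_low values of the five control pulses printed above
low_words = [30000000, 30000000, 120000000, 120000000, 210000000]
seconds = [ticks_to_seconds(t) for t in low_words]
print(seconds)  # [3.0, 3.0, 12.0, 12.0, 21.0]

# spacing between successive control pulses of detector 0 (rows 1, 3, 5)
channel_0 = seconds[0::2]
spacing = [b - a for a, b in zip(channel_0, channel_0[1:])]
print(spacing)  # [9.0, 9.0]
```

Under these assumptions, the mock control pulses arrive every 9 seconds, which over the 1800 second duration of the file gives 200 control pulses per channel.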

Every RDT file has an accompanying PAR file, which is a text file with additional information. The generated PAR file can be checked by opening it with a text editor or with “vim FILE_NAME” on the command line.

test_data._generate_par_file()
# test by looking at the text file!
Par file written.

We repeat the data generation for a second file. This time we call a pre-implemented method that performs all of the steps above at once. Notice that we specify the gap in measuring time between the two files.

test_data.update_filepath(file_path='test_data/mock_002')
test_data.generate(start_offset=1.5 * 3600, source='hw')
Rdt file written.
Con file written.
Par file written.

Data Conversion

The Cait library accesses and stores its data in HDF5 files, a structured file format that is convenient for high-level applications. We are aware that saving the data twice might be inefficient in terms of storage space. As a solution, we propose to keep the raw data events in the HDF5 files only until all needed high-level features of the raw data are calculated. We show below how this is done.

But first, we generate an HDF5 file from the events that are contained in the RDT file and the control pulses from the CON file. For this, the PAR file must be in the same directory as the RDT file.

path_data = 'test_data/'
file_names = ['mock_001',
              'mock_002']
# Conversion from Rdt to HDF5

for file in file_names:
    # --------------------------------------------------
    # Convert Rdt to H5
    # --------------------------------------------------

    dh = ai.DataHandler(channels=[0,1],
                        record_length=16384,
                        sample_frequency=25000)
    
    dh.convert_dataset(
        path_rdt=path_data,
        fname=file,
        path_h5=path_data,
        tpa_list=[0, 1, -1],
        calc_mp=False,
        calc_sev=False,
        calc_nps=False,
        lazy_loading=True,
        event_dtype='float32',
        ints_in_header=7,
        memsafe=True,
        dvm_channels=0,
        batch_size=1000,
        trace=False,
    )

    # --------------------------------------------------
    # Include con file
    # --------------------------------------------------

    dh.include_con_file(path_con_file=path_data + file + '.con')
DataHandler Instance created.
Start converting.

READ EVENTS FROM RDT FILE.
Total Records in File:  800
Getting good idx. (Depending on OS and drive reading speed, this might take some minutes!)
Event Counts Channel 0: 400
Event Counts Channel 1: 400
Getting good tpas.
Good consecutive counts: 400

WORKING ON EVENTS WITH TPA = 0.
CREATE DATASET WITH EVENTS.
WORKING ON EVENTS WITH TPA = -1.
CREATE DATASET WITH NOISE.
WORKING ON EVENTS WITH TPA > 0.
CREATE DATASET WITH TESTPULSES.
Hdf5 dataset created in  test_data/
Filepath and -name saved.
Accessing CON File...
200 Control Pulses for channel 0 in file.
CON File included.
DataHandler Instance created.
Start converting.

READ EVENTS FROM RDT FILE.
Total Records in File:  800
Getting good idx. (Depending on OS and drive reading speed, this might take some minutes!)
Event Counts Channel 0: 400
Event Counts Channel 1: 400
Getting good tpas.
Good consecutive counts: 400

WORKING ON EVENTS WITH TPA = 0.
CREATE DATASET WITH EVENTS.
WORKING ON EVENTS WITH TPA = -1.
CREATE DATASET WITH NOISE.
WORKING ON EVENTS WITH TPA > 0.
CREATE DATASET WITH TESTPULSES.
Hdf5 dataset created in  test_data/
Filepath and -name saved.
Accessing CON File...
200 Control Pulses for channel 0 in file.
CON File included.

Above we used the DataHandler class to convert the data. This is a heavy class that handles all the feature calculations of the raw data. It stores the path to the HDF5 file as an attribute and saves all calculated properties there.

We often want to process data from multiple consecutive measurements together. For this, we can merge two converted files and specify whether we want to keep the individual files. For large-scale data processing, where events from more than a hundred RDT files are often processed, we propose to call the conversion and merge functions in a loop, always deleting the files that have already been merged.

ai.data.merge_h5_sets(path_h5_a=path_data + file_names[0] + '-P_Ch0-L_Ch1.h5', 
                      path_h5_b=path_data + file_names[1] + '-P_Ch0-L_Ch1.h5', 
                      path_h5_merged=path_data + 'test_001.h5', 
                      continue_hours=True,
                      keep_original_files=True,
                      a_name='mock_001',
                      b_name='mock_002',
                     )
Merge done.
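
The loop over many files mentioned above could be organized as in the following sketch. It only plans and triggers the merges and takes the actual merge function as an argument, so it runs without Cait. The helper name, the “.partN” intermediate file names, the third file and the output name test_all.h5 are all made up for illustration. In practice, merge_fn would presumably be a thin wrapper around ai.data.merge_h5_sets with keep_original_files=False.

```python
def merge_incrementally(h5_paths, merged_path, merge_fn):
    """Fold a list of HDF5 files into one by merging them one at a time.

    merge_fn(path_a, path_b, path_merged) performs a single merge.
    Returns the list of (path_a, path_b, path_merged) steps carried out.
    """
    if len(h5_paths) < 2:
        raise ValueError("need at least two files to merge")
    steps = []
    current = h5_paths[0]
    for i, nxt in enumerate(h5_paths[1:], start=1):
        is_last = i == len(h5_paths) - 1
        # hypothetical naming scheme for the intermediate files
        target = merged_path if is_last else f"{merged_path}.part{i}"
        merge_fn(current, nxt, target)
        steps.append((current, nxt, target))
        current = target
    return steps


# Dry run with a stand-in merge function that does nothing.
paths = [f"test_data/mock_{i:03d}-P_Ch0-L_Ch1.h5" for i in (1, 2, 3)]
steps = merge_incrementally(paths, "test_data/test_all.h5",
                            merge_fn=lambda a, b, out: None)
for step in steps:
    print(step)
```

Each step merges the result of the previous step with the next file, so at most one intermediate file needs to exist at any time.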

We kept the two original files, but we no longer need the raw events in both of them. So we delete them from one of the HDF5 files.

dh = ai.DataHandler(
    channels = [0, 1],
    record_length = 16384,
    sample_frequency = 25000,
    )
dh.set_filepath(path_h5=path_data, 
                fname=file_names[1], 
                appendix=True)
dh.drop_raw_data(type="events")
DataHandler Instance created.
Dataset Event deleted from group events.

Due to the tree structure of HDF5 files, the dropped/deleted data is still stored in the file but is no longer accessible. To really delete the data and free the storage space (i.e. reduce the size of the HDF5 file), we need to call the “h5repack” tool of the HDF5 Tools, see https://support.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Repack.

For this we repack the dataset with a different filename, and then change the filename back to the original one.

! ls test_data/test_*
! h5repack test_data/test_001.h5 test_data/test_001_copy.h5
! ls test_data/test_*
! rm test_data/test_001.h5
! ls test_data/test_*
! mv test_data/test_001_copy.h5 test_data/test_001.h5
! ls test_data/test_*
test_data/test_001.h5
test_data/test_001.h5      test_data/test_001_copy.h5
test_data/test_001_copy.h5
test_data/test_001.h5
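
The same repack, remove and rename sequence can also be driven from Python. This is a minimal sketch, assuming the h5repack command line tool is installed and on the PATH; the helper names and the temporary file suffix are made up for illustration.

```python
import os
import subprocess


def h5repack(src, dst):
    """Run the h5repack command line tool (must be installed and on PATH)."""
    subprocess.run(["h5repack", src, dst], check=True)


def repack_in_place(path, repack=h5repack):
    """Repack an HDF5 file to a temporary name, then replace the original."""
    tmp = path + ".repack_tmp"  # hypothetical temporary file name
    repack(path, tmp)
    # os.replace overwrites the original, so no separate rm step is needed
    os.replace(tmp, path)


# Usage, not executed here:
# repack_in_place("test_data/test_001.h5")
```

Because the original is only replaced after the repack has succeeded, a failing h5repack call leaves the original file untouched.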

When displaying the content of the file, we see that there is no “event” data set in the “events” group.

dh.content()
The following properties are in the HDF5 sets can be accessed through the get(group, dataset) methode.
The following data sets are contained in the the group controlpulses:
dataset: hours, shape: (200,)
dataset: pulse_height, shape: (2, 200)
The following data sets are contained in the the group events:
dataset: hours, shape: (80,)
dataset: time_mus, shape: (80,)
dataset: time_s, shape: (80,)
The following data sets are contained in the the group noise:
dataset: event, shape: (2, 80, 16384)
dataset: hours, shape: (80,)
dataset: time_mus, shape: (80,)
dataset: time_s, shape: (80,)
The following data sets are contained in the the group testpulses:
dataset: event, shape: (2, 240, 16384)
dataset: hours, shape: (240,)
dataset: testpulseamplitude, shape: (240,)
dataset: time_mus, shape: (240,)
dataset: time_s, shape: (240,)
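
The shapes listed above give a rough idea of how much space the raw events occupy. The following estimate assumes the events are stored uncompressed as float32 (4 bytes per sample, matching event_dtype='float32' in the conversion) and ignores any HDF5 overhead; the helper name is made up for illustration.

```python
from math import prod

BYTES_PER_SAMPLE = 4  # float32


def raw_size_mib(shape):
    """Uncompressed size in MiB of a float32 array with the given shape."""
    return prod(shape) * BYTES_PER_SAMPLE / 2**20


noise_mib = raw_size_mib((2, 80, 16384))       # (channels, events, samples)
testpulse_mib = raw_size_mib((2, 240, 16384))
print(noise_mib, testpulse_mib)  # 10.0 30.0
```

The dropped “event” data set of the “events” group had the same shape as the noise events before it was deleted, so under these assumptions dropping it frees about 10 MiB once the file is repacked.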

If we need the raw events again at a later point, we can include them again from the RDT file.

dh.include_rdt(
    path_data=path_data, 
    fname=file_names[1], 
    ints_in_header=7,
    tpa_list=[0, 1, -1],
    event_dtype='float32',
    lazy_loading=True,
    origin=None,
    )
Accessing RDT File ...
Total Records in File:  800
Event Counts:  400
Adding 80 triggered Events.
Adding 80 Noise Events.
Adding 240 Testpulse Events.
Done.

The same works for data sets that are merged from multiple RDT files. For this we need the origin data set.

path_data = 'test_data/'
fname = 'test_001'
channels_rdt = [0,1]
dh = ai.DataHandler(channels=channels_rdt)
dh.set_filepath(path_h5=path_data,
                fname=fname,
                appendix=False)
DataHandler Instance created.
dh.drop_raw_data()
Dataset Event deleted from group events.
for file in file_names:
    dh.include_rdt(path_data=path_data, 
                   fname=file, 
                   channels=[0, 1], 
                   origin=file)
Accessing RDT File ...
Total Records in File:  800
Event Counts:  400
Adding 80 triggered Events.
Adding 80 Noise Events.
Adding 240 Testpulse Events.
Done.
Accessing RDT File ...
Total Records in File:  800
Event Counts:  400
Adding 80 triggered Events.
Adding 80 Noise Events.
Adding 240 Testpulse Events.
Done.
dh.content()
The following properties are in the HDF5 sets can be accessed through the get(group, dataset) methode.
The following data sets are contained in the the group controlpulses:
dataset: hours, shape: (400,)
dataset: pulse_height, shape: (2, 400)
The following data sets are contained in the the group events:
dataset: event, shape: (2, 160, 16384)
dataset: hours, shape: (160,)
dataset: origin, shape: (160,)
dataset: time_mus, shape: (160,)
dataset: time_s, shape: (160,)
The following data sets are contained in the the group noise:
dataset: event, shape: (2, 160, 16384)
dataset: hours, shape: (160,)
dataset: origin, shape: (160,)
dataset: time_mus, shape: (160,)
dataset: time_s, shape: (160,)
The following data sets are contained in the the group testpulses:
dataset: event, shape: (2, 480, 16384)
dataset: hours, shape: (480,)
dataset: origin, shape: (480,)
dataset: testpulseamplitude, shape: (480,)
dataset: time_mus, shape: (480,)
dataset: time_s, shape: (480,)
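
The origin data set has one entry per event, so it can be used to pick out the events that belong to one RDT file. The sketch below uses plain lists with made-up values and assumes that each origin entry is the file name that was passed as origin when the events were included; the helper name is hypothetical.

```python
def select_by_origin(values, origin, name):
    """Return the entries of values whose origin entry equals name."""
    return [v for v, o in zip(values, origin) if o == name]


# stand-ins for the origin and hours data sets (made-up values)
origin = ["mock_001", "mock_001", "mock_001", "mock_002", "mock_002"]
hours = [0.1, 0.2, 0.3, 1.6, 1.7]

hours_second_file = select_by_origin(hours, origin, "mock_002")
print(hours_second_file)  # [1.6, 1.7]
```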

Please forward questions and correspondence about this notebook to felix.wagner(at)oeaw.ac.at.