Conversion of Hardware-triggered Data

Warning

Note that this tutorial describes features and workflows that have since become outdated. We are working on updated tutorials. Coming soon!

In this tutorial, we generate mock hardware-triggered data to test all functionality of Cait. The generated data resembles, in all its properties, data from the CRESST and COSINUS data acquisitions that work with the program CSS. The only exception is that the noise and pulses are not measured but generated from parametric descriptions of the pulse shape and normally distributed noise.

We start by importing the library.

import cait as ai

Generate Test Data

(You can skip this step if you already have hardware-triggered data, for example from your experiment.)

The TestData class handles the generation of data.

test_data = ai.data.TestData(filepath='test_data/mock_001', duration=1800)

First we generate an RDT file, which holds all triggered events, test pulses and noise events. Right after the generation we call a check function that prints the content of the first event, to verify that the file was written properly.

test_data._generate_rdt_file()
dh = ai.DataHandler(nmbr_channels=2)
dh.checkout_rdt(path_rdt='test_data/mock_001.rdt', read_events=1, verb=True)

Rdt file written.
DataHandler Instance created.
#############################################################
EVENT NUMBER:  0
detector number (starting at 0):  0
number of coincident pulses in digitizer module:  0
module trigger counter (starts at 0, when TRA or WRITE starts):  1
channel trigger delay relative to time stamp [µs]:  0
absolute time [s] (computer time timeval.tv_sec):  1602879726
absolute time [us] (computer time timeval.tv_us):  0
Delay of channel trigger to testpulse [us]:  0
time stamp of module trigger low word (10 MHz clock, 0 @ START WRITE ):  0
time stamp of module trigger high word (10 MHz clock, 0 @ START WRITE ):  6
number of qdc events accumulated until digitizer trigger:  0
measuring hours (0 @ START WRITE):  0.0016666667070239782
accumulated dead time of channel [s] (0 @ START WRITE):  0.0
test pulse amplitude (0. for pulses, (0.,10.] for test pulses, >10. for control pulses):  0.10000000149011612
DAC output of control program (proportional to heater power):  0.0
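The header stores the absolute time as two integers: whole seconds (tv_sec) and a microsecond remainder. As a minimal sketch, assuming these follow the usual Unix timeval convention (seconds since 1970-01-01 UTC), they can be combined into a timezone-aware datetime. The helper name timeval_to_datetime is ours and not part of Cait.

```python
from datetime import datetime, timedelta, timezone


def timeval_to_datetime(tv_sec: int, tv_usec: int) -> datetime:
    """Combine Unix seconds and a microsecond remainder into a UTC datetime."""
    # Adding a timedelta keeps the arithmetic in integers, so the
    # microseconds are not lost to floating-point rounding.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=tv_sec, microseconds=tv_usec)


# Values printed for event 0 above (assumed to be Unix time).
stamp = timeval_to_datetime(1602879726, 0)
print(stamp.isoformat())
```

The same helper applies to the time_s and time_mus datasets that appear in the HDF5 files later on, under the same assumption.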

The CON file holds the time stamps and pulse heights of the control pulses. As with the RDT file, we call a check function right after generating it.

test_data._generate_con_file()
dh = ai.DataHandler(nmbr_channels=2)
dh.checkout_con(path_con='test_data/mock_001.con', read_events=5)

Con file written.
DataHandler Instance created.
5 control pulses read from CON file.
 	detector_nmbr,	 	pulse_height, 	time_stamp_low, 	time_stamp_high, 	dead_time, 	mus_since_last_tp
1	0		6.27		30000000		0			0.0	[0]
2	1		4.06		30000000		0			0.0	[0]
3	0		6.21		120000000		0			0.0	[0]
4	1		3.96		120000000		0			0.0	[0]
5	0		5.71		210000000		0			0.0	[0]
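The time stamps in this table are clock ticks. Assuming the CON file uses the same 10 MHz clock that the RDT header printout names for its module trigger time stamps, the low word converts to seconds as sketched below. The sketch only covers rows whose high word is 0, as in the five rows above, and the helper name is hypothetical.

```python
CLOCK_HZ = 10_000_000  # 10 MHz clock, as named in the RDT header printout


def ticks_to_seconds(time_stamp_low: int) -> float:
    """Convert a low-word tick count to seconds (valid while the high word is 0)."""
    return time_stamp_low / CLOCK_HZ


# time_stamp_low values of the five control pulses listed above
low_words = [30000000, 30000000, 120000000, 120000000, 210000000]
seconds = [ticks_to_seconds(t) for t in low_words]
print(seconds)  # [3.0, 3.0, 12.0, 12.0, 21.0]
```

Under this assumption each detector receives a control pulse every 9 s, which is consistent with 200 control pulses per channel in a file of 1800 s duration.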

Every RDT file comes with a PAR file, a text file with additional information. You can check the generated PAR file by opening it in a text editor, for example with “vim FILE_NAME” on the command line.

test_data._generate_par_file()
# test by looking at the text file!

Par file written.

We repeat the data generation for a second file. This time we call a pre-implemented method that performs all of the steps above at once. Notice that we specify the gap in measuring time between the two files.

test_data.update_filepath(file_path='test_data/mock_002')
test_data.generate(start_offset=1.5 * 3600, source='hw')

Rdt file written.
Con file written.
Par file written.

Data Conversion

The Cait library accesses and stores its data in HDF5 files, a structured file format that is convenient for high-level applications. We are aware that saving the data twice can be inefficient in terms of storage space. As a solution, we propose keeping the raw data events in the HDF5 files only until all required high-level features of the raw data have been calculated. We show below how this is done.

But first, we generate an HDF5 file from the events contained in the RDT file and the control pulses from the CON file. For this, the PAR file must be in the same directory as the RDT file.

path_data = 'test_data/'
file_names = ['mock_001',
              'mock_002']

# Conversion from Rdt to HDF5
for file in file_names:
    dh = ai.DataHandler(channels=[0,1],
                        record_length=16384,
                        sample_frequency=25000)
    
    dh.convert_dataset(
        path_rdt=path_data,
        fname=file,
        path_h5=path_data,
        tpa_list=[0, 1, -1],
        calc_mp=False,
        calc_sev=False,
        calc_nps=False,
        lazy_loading=True,
        event_dtype='float32',
        ints_in_header=7,
        memsafe=True,
        dvm_channels=0,
        batch_size=1000,
        trace=False,
    )

    dh.include_con_file(path_con_file=path_data + file + '.con')

DataHandler Instance created.
Start converting.

READ EVENTS FROM RDT FILE.
Total Records in File:  800
Getting good idx. (Depending on OS and drive reading speed, this might take some minutes!)

Event Counts Channel 0: 400
Event Counts Channel 1: 400
Getting good tpas.
Good consecutive counts: 400

WORKING ON EVENTS WITH TPA = 0.
CREATE DATASET WITH EVENTS.

WORKING ON EVENTS WITH TPA = -1.
CREATE DATASET WITH NOISE.

WORKING ON EVENTS WITH TPA > 0.
CREATE DATASET WITH TESTPULSES.

Hdf5 dataset created in  test_data/
Filepath and -name saved.
Accessing CON File...
200 Control Pulses for channel 0 in file.
CON File included.
DataHandler Instance created.
Start converting.

READ EVENTS FROM RDT FILE.
Total Records in File:  800
Getting good idx. (Depending on OS and drive reading speed, this might take some minutes!)

Event Counts Channel 0: 400
Event Counts Channel 1: 400
Getting good tpas.
Good consecutive counts: 400

WORKING ON EVENTS WITH TPA = 0.
CREATE DATASET WITH EVENTS.

WORKING ON EVENTS WITH TPA = -1.
CREATE DATASET WITH NOISE.

WORKING ON EVENTS WITH TPA > 0.
CREATE DATASET WITH TESTPULSES.

Hdf5 dataset created in  test_data/
Filepath and -name saved.
Accessing CON File...
200 Control Pulses for channel 0 in file.
CON File included.
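The log shows how the conversion sorts records by their test pulse amplitude (TPA): 0 marks triggered events, -1 marks noise, and positive values mark test pulses. The RDT header printout earlier adds that values above 10 denote control pulses. The following is a minimal sketch of that convention; the function name is hypothetical and this is not Cait's internal code.

```python
def classify_tpa(tpa: float) -> str:
    """Map a test pulse amplitude to the HDF5 group named in the conversion log."""
    if tpa == -1:
        return "noise"
    if tpa == 0:
        return "events"
    if 0 < tpa <= 10:
        return "testpulses"
    if tpa > 10:
        return "controlpulses"
    # Other negative values are not described in this tutorial.
    raise ValueError(f"unexpected test pulse amplitude: {tpa}")


# The first RDT record printed earlier had a TPA of about 0.1.
print(classify_tpa(0.10000000149011612))  # testpulses
```

Read this way, the entries of tpa_list=[0, 1, -1] appear to select events, test pulses and noise, respectively.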

Above we used the DataHandler class for the conversion. This is a heavy class that handles all feature calculations on the raw data. It stores the path to the HDF5 file as an attribute and saves all calculated properties there. You can get an overview of the stored data by calling its content method:

dh.content()

controlpulses
  hours                       (200,)    float64
  pulse_height                (2, 200)  float64
events
  dac_output                  (80,)           float64
  event                       (2, 80, 16384)  float32
  hours                       (80,)           float64
  time_mus                    (80,)           int32
  time_s                      (80,)           int32
noise
  dac_output                  (80,)           float64
  event                       (2, 80, 16384)  float32
  hours                       (80,)           float64
  time_mus                    (80,)           int32
  time_s                      (80,)           int32
testpulses
  dac_output                  (240,)           float64
  event                       (2, 240, 16384)  float32
  hours                       (240,)           float64
  testpulseamplitude          (240,)           float64
  time_mus                    (240,)           int32
  time_s                      (240,)           int32
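The shapes in this listing can be read against the parameters passed to the DataHandler. As a small worked check, assuming the event arrays are laid out as (channels, records, samples), the record window length and the number of records per channel follow directly from the numbers above:

```python
record_length = 16384     # samples per record, as passed to DataHandler
sample_frequency = 25000  # Hz, as passed to DataHandler

# Duration of one recorded voltage trace in seconds
window_s = record_length / sample_frequency
print(window_s)  # 0.65536

# Records per group, taken from the second axis of each 'event' dataset
records = {"events": 80, "noise": 80, "testpulses": 240}
total_records = sum(records.values())
print(total_records)  # 400
```

The total of 400 matches the "Event Counts Channel 0: 400" line in the conversion log.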

Combine multiple files

We often want to process data from several consecutive measurements together. For this, we can merge the converted files and specify whether we want to keep the individual files. For large-scale data processing, where events from more than a hundred RDT files are often processed, it makes sense to only link the files virtually without copying any data. This can be achieved as follows:

ai.data.combine_h5(fname="combined_file-P_Ch0-L_Ch1",
                   files=[fn+'-P_Ch0-L_Ch1' for fn in file_names],
                   src_dir=path_data,
                   out_dir=path_data,
                   groups_combine=["events", "testpulses", "controlpulses", "noise"]
                  )

Overwriting existing file 'test_data/combined_file-P_Ch0-L_Ch1.h5'.
Successfully combined files ['mock_001-P_Ch0-L_Ch1', 'mock_002-P_Ch0-L_Ch1'] into 'test_data/combined_file-P_Ch0-L_Ch1.h5' (18.0 KiB).
Calculating extended hours for all groups with datasets event, hours, time_s, time_mus:
DataHandler Instance created.
Successfully written hours with shape (160,) and dtype 'float32' to group events.

Successfully written hours with shape (160,) and dtype 'float32' to group noise.

Successfully written hours with shape (480,) and dtype 'float32' to group testpulses.

Here, we keep the original files and only create a third one (with very little disk space usage) that links to the original ones. If you really want to merge the files, i.e. copy all the original data into one file, use ai.data.merge_h5 instead.
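The log line "Calculating extended hours" indicates that combine_h5 writes a new hours dataset that spans all files. One way such a common time axis can be built from time_s and time_mus is sketched below, assuming hours are counted from the earliest time stamp. This is an illustration with made-up time stamps and a hypothetical function name, not Cait's implementation.

```python
def extended_hours(time_s, time_mus):
    """Return hours since the earliest stamp, given seconds and microsecond parts."""
    # Work in integer microseconds first to avoid losing precision.
    stamps = [s * 1_000_000 + mus for s, mus in zip(time_s, time_mus)]
    t0 = min(stamps)
    microseconds_per_hour = 3600 * 1_000_000
    return [(t - t0) / microseconds_per_hour for t in stamps]


# Made-up stamps: two records from a first file and one record from a
# second file that starts 1.5 hours later.
start = 1602879726
time_s = [start, start + 900, start + 5400]
time_mus = [0, 500000, 0]
hours = extended_hours(time_s, time_mus)
print(hours[0], hours[2])  # 0.0 1.5
```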

Now we create a DataHandler for the combined file:

dh_combined = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
dh_combined.set_filepath(path_h5=path_data, fname="combined_file")
dh_combined.content()

DataHandler Instance created.
controlpulses
  hours                   (v) (400,)    float64
  pulse_height            (v) (2, 400)  float64
events
  dac_output              (v) (160,)           float64
  event                   (v) (2, 160, 16384)  float32
  hours                       (160,)           float32
  time_mus                (v) (160,)           int32
  time_s                  (v) (160,)           int32
noise
  dac_output              (v) (160,)           float64
  event                   (v) (2, 160, 16384)  float32
  hours                       (160,)           float32
  time_mus                (v) (160,)           int32
  time_s                  (v) (160,)           int32
testpulses
  dac_output              (v) (480,)           float64
  event                   (v) (2, 480, 16384)  float32
  hours                       (480,)           float32
  testpulseamplitude      (v) (480,)           float64
  time_mus                (v) (480,)           int32
  time_s                  (v) (480,)           int32

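Notice that the datasets now carry a (v) marker, which tells us that we are looking at a virtual dataset, i.e. only a reference to the original data. The hours datasets of the events, noise and testpulses groups have no marker because combine_h5 recalculated them and wrote them to the combined file, as its log shows.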
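Note also that set_filepath was called with fname="combined_file", although the file on disk is named combined_file-P_Ch0-L_Ch1.h5. Judging from the file names in the logs, the DataHandler appends a channel suffix in the two-channel case. The sketch below reproduces that naming pattern; it is inferred from this tutorial's output rather than taken from Cait's source, and the helper name is hypothetical.

```python
from pathlib import Path


def h5_path(directory: str, fname: str, channels: list) -> Path:
    """Build the HDF5 path with the two-channel suffix seen in the logs."""
    first, second = channels  # this sketch only covers exactly two channels
    return Path(directory) / f"{fname}-P_Ch{first}-L_Ch{second}.h5"


combined = h5_path("test_data/", "combined_file", [0, 1])
print(combined.as_posix())  # test_data/combined_file-P_Ch0-L_Ch1.h5
```

This also explains why combine_h5 is given the full names with the suffix, while set_filepath takes only the base name.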

Delete event traces

Once you are done with your raw data analysis (usually once you have your final energy spectra), you might want to delete the raw event voltage traces because they take up a lot of disk space. You can do this as follows (in this example for all files at once):

for fn in file_names:
    dh = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
    dh.set_filepath(path_h5=path_data, fname=fn)
    dh.drop_raw_data("events")
    dh.drop_raw_data("testpulses")
    dh.drop_raw_data("noise")
    # Dropping datasets does NOT decrease the size of an HDF5 file on disk because
    # of its file structure. To actually reduce the size, you have to repackage it
    dh.repackage()

DataHandler Instance created.
Dataset event deleted from group events.
Dataset event deleted from group testpulses.
Dataset event deleted from group noise.
Successfully repackaged 'test_data/mock_001-P_Ch0-L_Ch1.h5'. Memory saved: 50.0 MiB
DataHandler Instance created.
Dataset event deleted from group events.
Dataset event deleted from group testpulses.
Dataset event deleted from group noise.
Successfully repackaged 'test_data/mock_002-P_Ch0-L_Ch1.h5'. Memory saved: 50.0 MiB
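The 50.0 MiB reported per file can be cross-checked against the shapes and dtype that dh.content() listed for the event datasets earlier. The sketch below only counts the raw array payload (float32, 4 bytes per sample) and ignores any HDF5 overhead; the helper name is ours.

```python
import math

BYTES_PER_FLOAT32 = 4

# Shapes of the 'event' dataset in each group, as listed by dh.content()
event_shapes = {
    "events": (2, 80, 16384),
    "noise": (2, 80, 16384),
    "testpulses": (2, 240, 16384),
}


def size_mib(shape, itemsize=BYTES_PER_FLOAT32):
    """Raw array size in MiB for a given shape and item size in bytes."""
    return math.prod(shape) * itemsize / 2**20


sizes = {group: size_mib(shape) for group, shape in event_shapes.items()}
total = sum(sizes.values())
print(sizes)  # {'events': 10.0, 'noise': 10.0, 'testpulses': 30.0}
print(total)  # 50.0
```

The raw traces therefore account for essentially all of the space that repackaging frees.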

If you have a file that combines the original files, you need to run the cait.data.combine_h5 function again so that it reflects the changed source files:

ai.data.combine_h5(fname="combined_file-P_Ch0-L_Ch1",
                   files=[fn+'-P_Ch0-L_Ch1' for fn in file_names],
                   src_dir=path_data,
                   out_dir=path_data,
                   groups_combine=["events", "testpulses", "controlpulses", "noise"]
                  )
dh_combined = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
dh_combined.set_filepath(path_h5=path_data, fname="combined_file")

Overwriting existing file 'test_data/combined_file-P_Ch0-L_Ch1.h5'.
Successfully combined files ['mock_001-P_Ch0-L_Ch1', 'mock_002-P_Ch0-L_Ch1'] into 'test_data/combined_file-P_Ch0-L_Ch1.h5' (13.2 KiB).
Calculating extended hours for all groups with datasets event, hours, time_s, time_mus:
DataHandler Instance created.
Successfully written hours with shape (160,) and dtype 'float32' to group events.

Successfully written hours with shape (160,) and dtype 'float32' to group noise.

Successfully written hours with shape (480,) and dtype 'float32' to group testpulses.

DataHandler Instance created.

If we need the events again at a later point, we can include them again from the RDT files:

for fn in file_names:
    dh = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
    dh.set_filepath(path_h5=path_data, fname=fn)
    dh.include_rdt(
        path_data=path_data, 
        fname=fn, 
        ints_in_header=7,
        tpa_list=[0, 1, -1],
        event_dtype='float32',
        lazy_loading=True,
        origin=None,
        )

DataHandler Instance created.
Accessing RDT File ...
Total Records in File:  800
Event Counts:  400
Adding 80 triggered Events.
Adding 80 Noise Events.
Adding 240 Testpulse Events.
Done.
DataHandler Instance created.
Accessing RDT File ...
Total Records in File:  800
Event Counts:  400
Adding 80 triggered Events.
Adding 80 Noise Events.
Adding 240 Testpulse Events.
Done.

Again, you also have to update the combined file:

ai.data.combine_h5(fname="combined_file-P_Ch0-L_Ch1",
                   files=[fn+'-P_Ch0-L_Ch1' for fn in file_names],
                   src_dir=path_data,
                   out_dir=path_data,
                   groups_combine=["events", "testpulses", "controlpulses", "noise"]
                  )
dh_combined = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
dh_combined.set_filepath(path_h5=path_data, fname="combined_file")

Overwriting existing file 'test_data/combined_file-P_Ch0-L_Ch1.h5'.
Successfully combined files ['mock_001-P_Ch0-L_Ch1', 'mock_002-P_Ch0-L_Ch1'] into 'test_data/combined_file-P_Ch0-L_Ch1.h5' (18.0 KiB).
Calculating extended hours for all groups with datasets event, hours, time_s, time_mus:
DataHandler Instance created.
Successfully written hours with shape (160,) and dtype 'float32' to group events.

Successfully written hours with shape (160,) and dtype 'float32' to group noise.

Successfully written hours with shape (480,) and dtype 'float32' to group testpulses.

DataHandler Instance created.

Please forward questions and correspondence about this notebook to felix.wagner(at)oeaw.ac.at.