Conversion of Hardware-triggered Data
Warning
Note that this tutorial describes features and workflows that have since become outdated. We are working on updated tutorials. Coming soon!
In this tutorial, we generate mock hardware-triggered data to test all functionality of Cait. The generated data is similar in all respects to data from the CRESST and COSINUS data acquisitions, which work with the program CSS. The only exception is that the noise and pulses are not measured, but generated from parametric descriptions of the pulse shape and normally distributed noise.
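As a rough illustration of what "parametric pulse shape plus normally distributed noise" can mean, here is a minimal sketch. It is not Cait's actual generator: it assumes a common two-exponential pulse model (rise and decay time), white Gaussian noise, and a pulse onset at one quarter of the record window. The functions mock_record and pulse_shape and all default time constants are hypothetical; only record_length=16384 and sample_frequency=25000 are taken from the conversion settings used later in this tutorial.

```python
import math
import random


def pulse_shape(t, amplitude, tau_rise, tau_decay):
    """Two-exponential pulse model, evaluated t seconds after onset (t >= 0)."""
    return amplitude * (math.exp(-t / tau_decay) - math.exp(-t / tau_rise))


def mock_record(record_length=16384, sample_frequency=25000, amplitude=1.0,
                tau_rise=0.5e-3, tau_decay=20e-3, noise_sigma=0.01, seed=None):
    """Return one mock voltage trace: parametric pulse plus white Gaussian noise.

    All defaults except record_length and sample_frequency are hypothetical.
    """
    rng = random.Random(seed)  # seeded generator for reproducible traces
    onset = record_length // 4  # assumed pulse onset: end of the first quarter
    record = []
    for i in range(record_length):
        if i < onset:
            signal = 0.0  # flat baseline before the pulse
        else:
            signal = pulse_shape((i - onset) / sample_frequency,
                                 amplitude, tau_rise, tau_decay)
        record.append(signal + rng.gauss(0.0, noise_sigma))
    return record


trace = mock_record(seed=42)
print(len(trace), max(trace))
```

Real detector noise is usually colored rather than white, so a more faithful generator would shape the noise with a noise power spectrum; white noise keeps the sketch short.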
We start by importing the library.
import cait as ai
Generate Test Data
(You can skip this step if you already have hardware-triggered data, e.g. from your experiment.)
The TestData class handles the generation of data.
test_data = ai.data.TestData(filepath='test_data/mock_001', duration=1800)
First, we generate an RDT file that holds all triggered events, test pulses and noise events. Right after the generation, we call a check function that prints the content of the first event, to verify that the file was written properly.
test_data._generate_rdt_file()
dh = ai.DataHandler(nmbr_channels=2)
dh.checkout_rdt(path_rdt='test_data/mock_001.rdt', read_events=1, verb=True)
Rdt file written.
DataHandler Instance created.
#############################################################
EVENT NUMBER: 0
detector number (starting at 0): 0
number of coincident pulses in digitizer module: 0
module trigger counter (starts at 0, when TRA or WRITE starts): 1
channel trigger delay relative to time stamp [µs]: 0
absolute time [s] (computer time timeval.tv_sec): 1602879726
absolute time [us] (computer time timeval.tv_us): 0
Delay of channel trigger to testpulse [us]: 0
time stamp of module trigger low word (10 MHz clock, 0 @ START WRITE ): 0
time stamp of module trigger high word (10 MHz clock, 0 @ START WRITE ): 6
number of qdc events accumulated until digitizer trigger: 0
measuring hours (0 @ START WRITE): 0.0016666667070239782
accumulated dead time of channel [s] (0 @ START WRITE): 0.0
test pulse amplitude (0. for pulses, (0.,10.] for test pulses, >10. for control pulses): 0.10000000149011612
DAC output of control program (proportional to heater power): 0.0
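The test pulse amplitude (TPA) field above encodes the record type. A minimal sketch of that mapping, assuming the ranges printed in the header plus TPA = -1 for noise (as the conversion log further below reports); classify_tpa is a hypothetical helper, not part of Cait:

```python
def classify_tpa(tpa):
    """Map a test pulse amplitude (TPA) to a record type.

    Ranges follow the RDT header printout; TPA = -1 for noise follows
    the conversion log. Hypothetical helper, not part of Cait.
    """
    if tpa == -1:
        return "noise"
    if tpa == 0:
        return "event"
    if 0 < tpa <= 10:
        return "testpulse"
    if tpa > 10:
        return "controlpulse"
    raise ValueError(f"unexpected TPA value: {tpa}")


print(classify_tpa(0.10000000149011612))  # testpulse
```

So the first record of the mock file, with a TPA of about 0.1, is a test pulse.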
The CON file holds the time stamps and pulse heights of the control pulses. As with the RDT file, we call a check function after generating it.
test_data._generate_con_file()
dh = ai.DataHandler(nmbr_channels=2)
dh.checkout_con(path_con='test_data/mock_001.con', read_events=5)
Con file written.
DataHandler Instance created.
5 control pulses read from CON file.
detector_nmbr, pulse_height, time_stamp_low, time_stamp_high, dead_time, mus_since_last_tp
1 0 6.27 30000000 0 0.0 [0]
2 1 4.06 30000000 0 0.0 [0]
3 0 6.21 120000000 0 0.0 [0]
4 1 3.96 120000000 0 0.0 [0]
5 0 5.71 210000000 0 0.0 [0]
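If we assume that the CON time stamps use the same 10 MHz clock as the module trigger time stamps in the RDT header above, the rows can be converted to seconds. The sketch below (helper names hypothetical) does this for the five rows shown; the high word is 0 in all of them, so it is ignored rather than guessing how the two words are combined. Assuming duration=1800 is in seconds, the resulting 9 s spacing starting at 3 s gives 200 control pulses per channel, consistent with the "200 Control Pulses for channel 0" reported during the conversion below.

```python
CLOCK_HZ = 10_000_000  # assumed: same 10 MHz clock as the RDT module trigger time stamps

# (detector_nmbr, pulse_height, time_stamp_low, time_stamp_high), copied from the rows above
CON_ROWS = [
    (0, 6.27, 30000000, 0),
    (1, 4.06, 30000000, 0),
    (0, 6.21, 120000000, 0),
    (1, 3.96, 120000000, 0),
    (0, 5.71, 210000000, 0),
]


def control_pulse_seconds(rows, detector_nmbr):
    """Time stamps in seconds for one detector, using only the low word (high word is 0 here)."""
    return [low / CLOCK_HZ for det, _height, low, _high in rows if det == detector_nmbr]


def pulse_intervals(times):
    """Spacing in seconds between consecutive control pulses."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


times_ch0 = control_pulse_seconds(CON_ROWS, 0)
print(times_ch0)                   # [3.0, 12.0, 21.0]
print(pulse_intervals(times_ch0))  # [9.0, 9.0]
# Pulses at 3 s, 12 s, ... within an 1800 s file:
print(len(range(3, 1800, 9)))      # 200
```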
Every RDT file has an accompanying PAR file, a text file with additional information. You can check the generated PAR file by opening it in a text editor or with “vim FILE_NAME” on the command line.
test_data._generate_par_file()
# test by looking at the text file!
Par file written.
We repeat the data generation for a second file. This time we call a pre-implemented method that performs all of the above steps at once. Notice that the start_offset argument specifies the gap in measuring time between the two files.
test_data.update_filepath(file_path='test_data/mock_002')
test_data.generate(start_offset=1.5 * 3600, source='hw')
Rdt file written.
Con file written.
Par file written.
Data Conversion
The Cait library accesses and stores its data in HDF5 files, a structured file format that is convenient for high-level applications. We are aware that storing the data twice (in the RDT and in the HDF5 files) is inefficient in terms of disk space. As a solution, we propose keeping the raw event traces in the HDF5 files only until all needed high-level features of the raw data have been calculated. We show below how this is done.
But first, we generate an HDF5 file from the events contained in the RDT file and the control pulses from the CON file. For this, the PAR file must be in the same directory as the RDT file.
path_data = 'test_data/'
file_names = ['mock_001',
              'mock_002']
# Conversion from RDT to HDF5
for file in file_names:
    dh = ai.DataHandler(channels=[0, 1],
                        record_length=16384,
                        sample_frequency=25000)
    dh.convert_dataset(
        path_rdt=path_data,
        fname=file,
        path_h5=path_data,
        tpa_list=[0, 1, -1],
        calc_mp=False,
        calc_sev=False,
        calc_nps=False,
        lazy_loading=True,
        event_dtype='float32',
        ints_in_header=7,
        memsafe=True,
        dvm_channels=0,
        batch_size=1000,
        trace=False,
    )
    dh.include_con_file(path_con_file=path_data + file + '.con')
DataHandler Instance created.
Start converting.
READ EVENTS FROM RDT FILE.
Total Records in File: 800
Getting good idx. (Depending on OS and drive reading speed, this might take some minutes!)
Event Counts Channel 0: 400
Event Counts Channel 1: 400
Getting good tpas.
Good consecutive counts: 400
WORKING ON EVENTS WITH TPA = 0.
CREATE DATASET WITH EVENTS.
WORKING ON EVENTS WITH TPA = -1.
CREATE DATASET WITH NOISE.
WORKING ON EVENTS WITH TPA > 0.
CREATE DATASET WITH TESTPULSES.
Hdf5 dataset created in test_data/
Filepath and -name saved.
Accessing CON File...
200 Control Pulses for channel 0 in file.
CON File included.
DataHandler Instance created.
Start converting.
READ EVENTS FROM RDT FILE.
Total Records in File: 800
Getting good idx. (Depending on OS and drive reading speed, this might take some minutes!)
Event Counts Channel 0: 400
Event Counts Channel 1: 400
Getting good tpas.
Good consecutive counts: 400
WORKING ON EVENTS WITH TPA = 0.
CREATE DATASET WITH EVENTS.
WORKING ON EVENTS WITH TPA = -1.
CREATE DATASET WITH NOISE.
WORKING ON EVENTS WITH TPA > 0.
CREATE DATASET WITH TESTPULSES.
Hdf5 dataset created in test_data/
Filepath and -name saved.
Accessing CON File...
200 Control Pulses for channel 0 in file.
CON File included.
Above, we used the DataHandler class for the conversion. It is the central (and rather heavy) class that handles all feature calculations on the raw data. It stores the path to the HDF5 file as an attribute and saves all calculated properties in that file. You can get an overview of the data stored in the file by calling the content method:
dh.content()
controlpulses
hours (200,) float64
pulse_height (2, 200) float64
events
dac_output (80,) float64
event (2, 80, 16384) float32
hours (80,) float64
time_mus (80,) int32
time_s (80,) int32
noise
dac_output (80,) float64
event (2, 80, 16384) float32
hours (80,) float64
time_mus (80,) int32
time_s (80,) int32
testpulses
dac_output (240,) float64
event (2, 240, 16384) float32
hours (240,) float64
testpulseamplitude (240,) float64
time_mus (240,) int32
time_s (240,) int32
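The listing above shows how the 400 records per channel were split into the events, noise and testpulses groups (80 + 80 + 240). Judging from the conversion log (TPA = 0, TPA = -1 and TPA > 0), the entries of tpa_list=[0, 1, -1] select these three groups, with 1 standing for all test pulses. A minimal sketch of that selection logic, as an assumption about the behavior rather than Cait's code; split_by_tpa_list is a hypothetical name and the TPA values are made up to match the group sizes:

```python
def split_by_tpa_list(tpas, tpa_list=(0, 1, -1)):
    """Group record indices by TPA, mirroring the three groups in the conversion log.

    Assumed semantics: 0 -> events (TPA == 0), -1 -> noise (TPA == -1),
    1 -> testpulses (all TPA > 0). Hypothetical helper, not Cait's implementation.
    """
    selectors = {
        0: ("events", lambda tpa: tpa == 0),
        -1: ("noise", lambda tpa: tpa == -1),
        1: ("testpulses", lambda tpa: tpa > 0),
    }
    groups = {}
    for key in tpa_list:
        name, matches = selectors[key]
        groups[name] = [i for i, tpa in enumerate(tpas) if matches(tpa)]
    return groups


# Made-up TPA values with the same composition as one channel of mock_001
tpas = [0.0] * 80 + [-1.0] * 80 + [0.1] * 240
counts = {name: len(idx) for name, idx in split_by_tpa_list(tpas).items()}
print(counts)  # {'events': 80, 'testpulses': 240, 'noise': 80}
```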
Combine multiple files
We often want to process data from several consecutive measurements together. For this, we can combine converted files and specify whether we want to keep the individual files. For large-scale data processing, where events from more than a hundred RDT files are often processed, it makes sense to link the files only virtually, without copying any data. This can be achieved as follows:
ai.data.combine_h5(fname="combined_file-P_Ch0-L_Ch1",
                   files=[fn + '-P_Ch0-L_Ch1' for fn in file_names],
                   src_dir=path_data,
                   out_dir=path_data,
                   groups_combine=["events", "testpulses", "controlpulses", "noise"],
                   )
Overwriting existing file 'test_data/combined_file-P_Ch0-L_Ch1.h5'.
Successfully combined files ['mock_001-P_Ch0-L_Ch1', 'mock_002-P_Ch0-L_Ch1'] into 'test_data/combined_file-P_Ch0-L_Ch1.h5' (18.0 KiB).
Calculating extended hours for all groups with datasets event, hours, time_s, time_mus:
DataHandler Instance created.
Successfully written hours with shape (160,) and dtype 'float32' to group events.
Successfully written hours with shape (160,) and dtype 'float32' to group noise.
Successfully written hours with shape (480,) and dtype 'float32' to group testpulses.
Here, we keep the original files and only create a third one (with very little disk space usage) that links to the original ones. If you really want to merge the files, i.e. copy all the original data into one file, use ai.data.merge_h5 instead.
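To make the idea of linking without copying concrete, here is a minimal pure-Python sketch of a concatenated, read-only view. A global event index is mapped to a (file, local index) pair, and the data is only looked up in the source when accessed. This is merely an analogy for HDF5 virtual datasets, not how combine_h5 or the HDF5 library implement them; the class VirtualConcat and the example lists are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualConcat:
    """Read-only concatenation of several sequences along the event axis, without copying.

    Conceptual analogy only; hypothetical class, not part of Cait or h5py.
    """

    def __init__(self, sources):
        self.sources = list(sources)  # keeps references to the sources, not copies
        # offsets[k] = global index of the first event of source k
        self.offsets = [0] + list(accumulate(len(s) for s in self.sources))

    def __len__(self):
        return self.offsets[-1]

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("event index out of range")
        # Find the source whose index range contains the global index
        k = bisect_right(self.offsets, index) - 1
        return self.sources[k][index - self.offsets[k]]


events_file_1 = ["file1_event_0", "file1_event_1"]
events_file_2 = ["file2_event_0"]
combined = VirtualConcat([events_file_1, events_file_2])
print(len(combined), combined[2])  # 3 file2_event_0
events_file_1[0] = "modified"
print(combined[0])  # modified
```

Since the view only stores references and offsets, its own memory footprint stays small no matter how large the sources are, and changes to a source are immediately visible through it.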
Now we create a DataHandler for the combined file:
dh_combined = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
dh_combined.set_filepath(path_h5=path_data, fname="combined_file")
dh_combined.content()
DataHandler Instance created.
controlpulses
hours (v) (400,) float64
pulse_height (v) (2, 400) float64
events
dac_output (v) (160,) float64
event (v) (2, 160, 16384) float32
hours (160,) float32
time_mus (v) (160,) int32
time_s (v) (160,) int32
noise
dac_output (v) (160,) float64
event (v) (2, 160, 16384) float32
hours (160,) float32
time_mus (v) (160,) int32
time_s (v) (160,) int32
testpulses
dac_output (v) (480,) float64
event (v) (2, 480, 16384) float32
hours (480,) float32
testpulseamplitude (v) (480,) float64
time_mus (v) (480,) int32
time_s (v) (480,) int32
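Notice that the datasets now carry a (v) marker, which tells us that we are looking at a virtual dataset, i.e. only a reference to the original data. The hours datasets are the exception: combine_h5 recalculated them and wrote them directly into the combined file.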
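The combine_h5 log reports "Calculating extended hours" from the time_s and time_mus datasets, which yields one continuous time axis across all combined files. A minimal sketch of such a computation, assuming hours are counted from a common reference time (for example the start of the first file); the exact reference point Cait uses is not shown in this tutorial, extended_hours is a hypothetical helper, and the time stamps in the usage example are made up:

```python
def extended_hours(files, reference_s):
    """Continuous hours across files, counted from reference_s.

    files: list of (time_s, time_mus) pairs, one pair per file, in time order.
    reference_s is an assumption (e.g. the start of the first file).
    Hypothetical helper, not Cait's implementation.
    """
    hours = []
    for time_s, time_mus in files:
        for s, mus in zip(time_s, time_mus):
            # whole seconds plus microseconds, relative to the reference, in hours
            hours.append((s - reference_s + mus * 1e-6) / 3600.0)
    return hours


# Made-up time stamps: two events in a first file, one event in a second file
files = [([1602879726, 1602881526], [0, 0]),
         ([1602887126], [500000])]
print(extended_hours(files, reference_s=1602879726))
```

Working from the absolute time stamps rather than the per-file hours keeps the gap between measurements intact in the combined time axis.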
Delete event traces
Once you are done with your raw data analysis (usually once you have your final energy spectra), you might wish to delete the raw event voltage traces because they take up a lot of disk space. The following loop does this for all files of this example:
for fn in file_names:
    dh = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
    dh.set_filepath(path_h5=path_data, fname=fn)
    dh.drop_raw_data("events")
    dh.drop_raw_data("testpulses")
    dh.drop_raw_data("noise")
    # Dropping datasets does NOT decrease the size of an HDF5 file on disk because
    # of its file structure. To actually reduce the size, you have to repackage it.
    dh.repackage()
DataHandler Instance created.
Dataset event deleted from group events.
Dataset event deleted from group testpulses.
Dataset event deleted from group noise.
Successfully repackaged 'test_data/mock_001-P_Ch0-L_Ch1.h5'. Memory saved: 50.0 MiB
DataHandler Instance created.
Dataset event deleted from group events.
Dataset event deleted from group testpulses.
Dataset event deleted from group noise.
Successfully repackaged 'test_data/mock_002-P_Ch0-L_Ch1.h5'. Memory saved: 50.0 MiB
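The 50.0 MiB saved per file can be checked against the dataset shapes listed by dh.content() earlier: the raw traces are stored as float32 (4 bytes per sample, from event_dtype='float32'), so each dataset's size is its number of samples times 4 bytes. This back-of-the-envelope sketch ignores HDF5 metadata and any compression:

```python
from math import prod

FLOAT32_BYTES = 4  # event_dtype='float32' in convert_dataset

# Shapes of the "event" datasets as listed by dh.content() for one file
EVENT_SHAPES = {
    "events": (2, 80, 16384),
    "noise": (2, 80, 16384),
    "testpulses": (2, 240, 16384),
}


def raw_trace_bytes(shape, itemsize=FLOAT32_BYTES):
    """Uncompressed size of a dataset: number of elements times bytes per element."""
    return prod(shape) * itemsize


total_bytes = sum(raw_trace_bytes(shape) for shape in EVENT_SHAPES.values())
print(f"{total_bytes / 2**20:.1f} MiB")  # 50.0 MiB
```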
If you have a file that combines the original files, you need to run the ai.data.combine_h5 function again afterwards:
ai.data.combine_h5(fname="combined_file-P_Ch0-L_Ch1",
                   files=[fn + '-P_Ch0-L_Ch1' for fn in file_names],
                   src_dir=path_data,
                   out_dir=path_data,
                   groups_combine=["events", "testpulses", "controlpulses", "noise"],
                   )
dh_combined = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
dh_combined.set_filepath(path_h5=path_data, fname="combined_file")
Overwriting existing file 'test_data/combined_file-P_Ch0-L_Ch1.h5'.
Successfully combined files ['mock_001-P_Ch0-L_Ch1', 'mock_002-P_Ch0-L_Ch1'] into 'test_data/combined_file-P_Ch0-L_Ch1.h5' (13.2 KiB).
Calculating extended hours for all groups with datasets event, hours, time_s, time_mus:
DataHandler Instance created.
Successfully written hours with shape (160,) and dtype 'float32' to group events.
Successfully written hours with shape (160,) and dtype 'float32' to group noise.
Successfully written hours with shape (480,) and dtype 'float32' to group testpulses.
DataHandler Instance created.
If we need the event traces again at a later point, we can re-include them from the RDT files:
for fn in file_names:
    dh = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
    dh.set_filepath(path_h5=path_data, fname=fn)
    dh.include_rdt(
        path_data=path_data,
        fname=fn,
        ints_in_header=7,
        tpa_list=[0, 1, -1],
        event_dtype='float32',
        lazy_loading=True,
        origin=None,
    )
DataHandler Instance created.
Accessing RDT File ...
Total Records in File: 800
Event Counts: 400
Adding 80 triggered Events.
Adding 80 Noise Events.
Adding 240 Testpulse Events.
Done.
DataHandler Instance created.
Accessing RDT File ...
Total Records in File: 800
Event Counts: 400
Adding 80 triggered Events.
Adding 80 Noise Events.
Adding 240 Testpulse Events.
Done.
Again, you also have to update the combined file:
ai.data.combine_h5(fname="combined_file-P_Ch0-L_Ch1",
                   files=[fn + '-P_Ch0-L_Ch1' for fn in file_names],
                   src_dir=path_data,
                   out_dir=path_data,
                   groups_combine=["events", "testpulses", "controlpulses", "noise"],
                   )
dh_combined = ai.DataHandler(channels=[0, 1], record_length=16384, sample_frequency=25000)
dh_combined.set_filepath(path_h5=path_data, fname="combined_file")
Overwriting existing file 'test_data/combined_file-P_Ch0-L_Ch1.h5'.
Successfully combined files ['mock_001-P_Ch0-L_Ch1', 'mock_002-P_Ch0-L_Ch1'] into 'test_data/combined_file-P_Ch0-L_Ch1.h5' (18.0 KiB).
Calculating extended hours for all groups with datasets event, hours, time_s, time_mus:
DataHandler Instance created.
Successfully written hours with shape (160,) and dtype 'float32' to group events.
Successfully written hours with shape (160,) and dtype 'float32' to group noise.
Successfully written hours with shape (480,) and dtype 'float32' to group testpulses.
DataHandler Instance created.
Please forward questions and correspondence about this notebook to felix.wagner(at)oeaw.ac.at.