cait.data

class cait.data.TestData(filepath: str, duration: float = 92, pulser_interval: float = 3, sample_frequency: int = 25000, channels: list = [0, 1], tpas: list = [20, 0.1, -1.0, 20, 0.3, 0, 20, 0.5, 1, 20, 3, -1, 20, 0, 10], event_tpa: list = [1, 1], baseline_resolution: list = [0.002, 0.003], slopes: list = [1, 1.5], scales: list = [12, 8], record_length: int = 16384, dvm_channels: int = 0, start_s: int = 1602879720, offset: list = [0, 0], fitpar: list = [[-1.11, 4, 0.02, 4.15, 2.1, 53.06], [0.77, 51.76, 50.81, 37.24, 8.33, 8.59]], fitpar_carrier: list = [[-2.38, 1.73, 1.65, 136, 0.38, 2.13], [0, 0, 0, 1, 1, 1]], include_carriers: bool = True, relative_ph_sigma=0.1, eventsize: int = 2081024, samplesdiv: int = 2, cdaq_offset: int = 30000, types: list = [0, 1], clock: int = 10000000)

Bases: object

A class for the generation of *.rdt, *.par, *.con, *.csmpl, *.db, *.dig_stamps and *.test_stamps files for the testing of all data processing routines.

Parameters

filepath (string) – The path to the location of the generated data, including the file name without extension, e.g. “../data/run01_Test/mock_001”.
duration (float, > 0) – The duration of the generated measurement time in seconds.
pulser_interval (float, > 0) – The interval in which test pulses are sent.
sample_frequency (int, > 0) – The sample frequency of the measurement in Hz.
channels (list of integers > 0) – A list of the channel numbers, corresponding to channels in RDT or CSMPL files.
tpas (list of floats) – A list of the test pulse amplitudes (TPAs), sent in this order. A TPA > 10 is set to 10 and counted as a control pulse. A TPA < 0 is set to 0 and counted as a noise baseline recording. A TPA of 0 is counted as a triggered event and set to the height event_tpa.
event_tpa (list of nmbr_channels floats) – The height of a TPA 0 event is sampled from a uniform distribution with minimum zero and maximum event_tpa, before application of the saturation curve.
baseline_resolution (list of nmbr_channels floats) – The standard deviations of the noise, before application of the saturation function.
slopes (list of nmbr_channels floats) – The slope parameter k of the logistic function that is used to describe the saturation.
scales (list of nmbr_channels floats) – The maximal height l of the logistic function that is used to describe the saturation.
record_length (int > 0, should be power of 2) – The number of samples in a record window.
dvm_channels (just put this to 0) – The number of DVM channels in the RDT file. This feature is currently not implemented, please stick with the standard value of 0.
start_s (int > 0, default value: 16.10.2020 22:22:00, the time of the first cait commit) – The Unix time stamp in seconds of the start of the measurement.
offset (list of two floats) – The baseline offset of the channels.
fitpar (list of nmbr_channels 1D numpy arrays, containing the 6 fit parameters, consistent with the fit_pulse_model) – The parameters of the Proebst-pulse shape for the alternative events of all channels.
fitpar_carrier (list of nmbr_channels 1D numpy arrays, containing the 6 fit parameters, consistent with the fit_pulse_model) – The parameters of the Proebst-pulse shape for the events of all channels.
include_carriers (bool) – If true, every second event is a carrier event.
relative_ph_sigma (float > 0 and < 1) – The relative variation of the pulse height.
eventsize (int > 0) – The number of samples per bankswitch at 50 kHz.
samplesdiv (int > 0) – The factor by which eventsize has to be divided to match it with the sample_frequency.
cdaq_offset (int, > 0 but < eventsize/samplesdiv) – The offset of the cdaq to the hardware daq, in samples of length 1/sample_frequency.
types (list of ints) – One entry per channel: 0 for a phonon channel, 1 for a light channel. Other types are ambiguous.
clock (int > 0) – The frequency of the clock for the cdaq; for CRESST it is 10 MHz.
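
The rules given for the tpas parameter can be sketched in plain Python. The helper classify_tpa below is hypothetical and not part of cait; it only restates the thresholds from the parameter description, applied to the default tpas list from the signature.

```python
def classify_tpa(tpa):
    """Return (label, effective_amplitude) for one entry of `tpas`.

    Hypothetical helper, not part of cait. It mirrors the documented rules:
    TPA > 10 -> control pulse (clipped to 10), TPA < 0 -> noise baseline,
    TPA == 0 -> triggered event, anything else -> regular test pulse.
    """
    if tpa > 10:
        return "control_pulse", 10.0
    if tpa < 0:
        return "noise", 0.0
    if tpa == 0:
        # The height of an event is drawn later from event_tpa, so it is unknown here.
        return "event", None
    return "test_pulse", float(tpa)


# Default `tpas` list, copied from the TestData signature.
default_tpas = [20, 0.1, -1.0, 20, 0.3, 0, 20, 0.5, 1, 20, 3, -1, 20, 0, 10]
labels = [classify_tpa(t)[0] for t in default_tpas]
print(labels)
```

With the default list, one cycle of 15 pulses contains 5 control pulses, 6 test pulses, 2 noise baselines and 2 triggered events.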

>>> import cait as ai
>>> test_data = ai.data.TestData(filepath='test_001')
>>> test_data.generate()
Rdt file written.
Con file written.
Par file written.
Csmpl Files for all Channels written.
Sql file written.
Dig_stamps file written.
Test_stamps file written.

generate(start_offset: int = 0, source: bool = None)

Generate all files from a measurement file (rdt, con, par, csmpl, sql, dig, test).

Take care when generating and merging two test data files: set the start_offset of the second file such that the record times of both files are well separated (> 1 minute). During triggering and the determination of trigger times, the start time of a file is extracted from the time stamps of the test pulses and might therefore be off by several seconds. As this error applies consistently to all time stamps in the second file, it does not influence the analysis. However, if the start_offset of the simulated data is too close to the end of the previous file, the events will overlap.

Parameters

start_offset (float >= 0) – The time elapsed from start of measurement to start of this file in seconds.
source (string or None) – If this argument is passed, it must be either ‘hw’ to simulate the files from a hardware data acquisition (RDT, PAR, CON) or ‘stream’ to simulate the files from stream data (CSMPL, SQL, DIG_STAMPS, TEST_STAMPS).
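
The separation rule for start_offset can be sketched as a small calculation. The helper second_file_start_offset and its default gap of 120 seconds are assumptions made for illustration; the only requirement taken from the text above is that the gap exceeds one minute.

```python
def second_file_start_offset(first_duration_s, first_start_offset_s=0.0, gap_s=120.0):
    """Start offset in seconds for a second generated file.

    Hypothetical helper, not part of cait. It places the second file
    `gap_s` seconds after the end of the first one.
    """
    if gap_s <= 60:
        raise ValueError("the gap between the two files should exceed one minute")
    return first_start_offset_s + first_duration_s + gap_s


# The first file uses the default duration of 92 s and starts at offset 0.
offset = second_file_start_offset(92)
print(offset)  # 212.0
```

The resulting value would then be passed as start_offset when generating the second file, after switching the output location with update_filepath.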

update_duration(new_duration: float)

Update the duration of a measurement file for the next data generation.

Parameters: new_duration (float) – The new duration in seconds.

update_filepath(file_path: str)

Update the file path of a measurement file for the next generation.

Parameters: file_path (string) – The new file path.

cait.data.convert_h5_to_root(path_h5, path_root, nmbr_channels)

Convert a HDF5 file to a ROOT file.

Parameters

path_h5 (string) – The path to the hdf5 file that is read.
path_root (string) – The path to the root file that is created.
nmbr_channels (int) – The number of channels of the module.

cait.data.convert_to_V(event, bits=16, max=10, min=-10, offset=0)

Converts an event from int to volt.

Parameters

event (1D array) – The event we want to convert.
bits (int) – Number of bits in each sample.
max (int) – The max volt value.
min (int) – The min volt value.
offset (int) – The offset of the volt signal.

Returns

The converted event array.

Return type

1D array
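
A conversion of this kind can be sketched as a linear mapping of the integer range onto the voltage range. The function counts_to_volts below is hypothetical and is not the cait implementation: it assumes signed integer samples centered on zero, and it leaves out the offset parameter because the description does not state how the offset enters.

```python
def counts_to_volts(samples, bits=16, v_max=10.0, v_min=-10.0):
    """Map signed integer ADC samples linearly onto [v_min, v_max).

    Hypothetical sketch, assuming samples lie in
    [-2**(bits-1), 2**(bits-1) - 1]. Not the cait implementation.
    """
    lsb = (v_max - v_min) / 2 ** bits   # voltage step of one count
    half_range = 2 ** (bits - 1)        # shifts signed counts to start at 0
    return [v_min + (s + half_range) * lsb for s in samples]


volts = counts_to_volts([-32768, 0, 32767])
print(volts)
```

Under these assumptions the most negative count maps to -10 V, zero maps to 0 V, and the largest count ends one step below +10 V.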

cait.data.gen_dataset_from_rdt(path_rdt, fname, path_h5, channels, tpa_list=[0.0], calc_mp=True, calc_fit=False, calc_sev=False, calc_nps=True, processes=4, event_dtype='float32', ints_in_header=7, sample_frequency=25000, lazy_loading=True)

Generates an HDF5 file from an RDT file, optionally with the calculation of main parameters (MP), fit, and standard event (SEV).

Parameters

path_rdt (string) – Path to the rdt file e.g. “data/bcks/”.
fname (string) – Name of the file e.g. “bck_001”.
path_h5 (string) – Path where the h5 file is saved e.g. “data/hdf5s%”.
channels (list) – The numbers of the channels in the RDT file that we want to include in the HDF5 file.
tpa_list (list) – The test pulse amplitudes to save, if 1 is in the list, all positive values are included.
calc_mp (bool) – If True the main parameters for all events are calculated and stored.
calc_fit (bool) – Not recommended! If True the parametric fit for all events is calculated and stored.
calc_sev (bool) – Not recommended! If True the standard event for all event channels is calculated.
calc_nps (bool) – If True the noise power spectrum is calculated and stored.
processes (int) – The number of processes that is used for the code execution.
event_dtype (string) – Datatype to save the events with.
ints_in_header (int) – The number of ints in the header of the events in the RDT file. This should be either 7 or 6!
sample_frequency (int) – The sample frequency of the records.
lazy_loading (bool) – Recommended! If true, the data is loaded with memory mapping to avoid memory overflows.

cait.data.gen_dataset_from_rdt_memsafe(path_rdt, fname, path_h5, channels, tpa_list=[0.0, 1.0, -1.0], event_dtype='float32', ints_in_header=7, dvm_channels=0, record_length=16384, batch_size=1000, trace=False)

Generates an HDF5 file from an RDT file with a memory-safe implementation. This is recommended when the RDT file is large or the available RAM is small.

Parameters

path_rdt (string) – Path to the rdt file e.g. “data/bcks/”.
fname (string) – Name of the file e.g. “bck_001”.
path_h5 (string) – Path where the h5 file is saved e.g. “data/hdf5s%”.
channels (list) – The numbers of the channels in the RDT file that we want to include in the HDF5 file.
tpa_list (list) – The test pulse amplitudes to save, if 1 is in the list, all positive values are included.
event_dtype (string) – Datatype to save the events with.
ints_in_header (int) – The number of ints in the header of the events in the RDT file. This should be either 7 or 6!
dvm_channels (int) – The number of DVM channels, this can be read in the PAR file.
record_length (int) – The number of samples in one record window.
batch_size (int) – The batch size for loading the samples from disk. Usually 1000 is a good value and produces RAM usage around 250 MB.
trace (bool) – If True, the runtime and memory consumption are traced.

cait.data.get_cc_noise(nmbr_noise, nps, lamb=0.01)

Simulation of a noise baseline, according to Carrettoni and Cremonesi: arXiv:1006.3289

Parameters

nmbr_noise (int > 0) – Number of noise baselines to simulate.
nps (1D array of odd size) – Noise power spectrum of the baselines, e.g. generated with scipy.fft.rfft().
lamb (float > 0) – Overlap parameter of the method.

Returns

The simulated baselines.

Return type

2D array of size (nmbr_noise, 2*(len(nps)-1))

cait.data.get_nps(x)

Calculates the Noise Power Spectrum (NPS) of a given array.

Parameters: x (1D numpy array of size N) – The time series.
Returns: The noise power spectrum.
Return type: 1D numpy array of size N/2 + 1
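
The relation between the input size N and the output size N/2 + 1 can be illustrated with a one-sided power spectrum. The sketch below is not the cait implementation: it assumes the spectrum is the squared magnitude of the one-sided discrete Fourier transform without any normalization, and it uses a slow pure-Python transform where real code would rather call numpy.fft.rfft.

```python
import cmath
import math


def one_sided_power_spectrum(x):
    """Squared magnitude of the one-sided DFT of a real series.

    Illustrative O(N^2) sketch with no normalization, not cait's get_nps.
    Returns N // 2 + 1 values for an input of length N.
    """
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient of the series
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# A pure tone with two full periods in eight samples: cos(2*pi*2*t/8).
series = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
nps = one_sided_power_spectrum(series)
print(len(nps))  # 5
```

For this series all power sits in frequency bin 2, and the eight input samples yield five spectrum values.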

cait.data.merge_h5_sets(path_h5_a, path_h5_b, path_h5_merged, groups_to_merge=['events', 'testpulses', 'noise', 'controlpulses', 'stream'], sets_to_merge=['event', 'mainpar', 'true_ph', 'true_onset', 'of_ph', 'sev_fit_par', 'sev_fit_rms', 'hours', 'labels', 'testpulseamplitude', 'time_s', 'time_mus', 'pulse_height', 'pca_error', 'pca_projection', 'tp_hours', 'tp_time_mus', 'tp_time_s', 'tpa', 'trigger_hours', 'trigger_time_mus', 'trigger_time_s'], concatenate_axis=[1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], groups_from_a=[], groups_from_b=[], a_name='keep', b_name='keep', continue_hours=True, second_file_start=None, keep_original_files=True, verb=False, trace=False)

Merges two HDF5 files.

Parameters

path_h5_a (string) – Path to the first file to merge.
path_h5_b (string) – Path to the other file to merge.
path_h5_merged (string) – Path where the merged file is saved.
groups_to_merge (list of strings) – The groups that hold the arrays that we want to concatenate.
sets_to_merge (list of strings) – The sets that hold the arrays we want to concatenate, same sets for all groups.
concatenate_axis (list of ints) – The axis along which the arrays are concatenated. Each n’th index in this list corresponds to the n’th string in the sets_to_merge list.
groups_from_a (list of strings) – Which groups are copied from the first HDF5 set.
groups_from_b (list of strings) – Which groups are copied from the second HDF5 set.
a_name (string) – Type a name for the first HDF5 set to identify the data later on with the original data set. This name is stored in the origin data set in the corresponding group. If ‘keep’, the content of the origin data set from the HDF5 set is copied.
b_name (string) – Type a name for the second HDF5 set to identify the data later on with the original data set. This name is stored in the origin data set in the corresponding group. If ‘keep’, the content of the origin data set from the HDF5 set is copied.
continue_hours (bool) – If True, the value of the last hours in a is added to the hours in b.
second_file_start (float or None) – The hours value at which the second file starts. If this is not handed and continue_hours is activated, the value is extracted from the test pulses.
keep_original_files (bool) – If False, the original files are deleted after the merge.
verb (bool) – If True, verbal feedback about the process of the merge is given.
trace (bool) – Traces the memory and runtime consumption.
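
The pairing between sets_to_merge and concatenate_axis, and the effect of continue_hours, can be sketched without touching any HDF5 file. The two lists are the defaults from the signature; the helper continue_hours_sketch is hypothetical and only restates the rule from the parameter description, using plain lists instead of HDF5 data sets.

```python
# Default arguments, copied from the merge_h5_sets signature.
sets_to_merge = ['event', 'mainpar', 'true_ph', 'true_onset', 'of_ph',
                 'sev_fit_par', 'sev_fit_rms', 'hours', 'labels',
                 'testpulseamplitude', 'time_s', 'time_mus', 'pulse_height',
                 'pca_error', 'pca_projection', 'tp_hours', 'tp_time_mus',
                 'tp_time_s', 'tpa', 'trigger_hours', 'trigger_time_mus',
                 'trigger_time_s']
concatenate_axis = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1,
                    0, 0, 0, 0, 0, 0, 0]

# The n-th axis belongs to the n-th set name, so both lists must match in length.
if len(sets_to_merge) != len(concatenate_axis):
    raise ValueError("sets_to_merge and concatenate_axis differ in length")
axis_for = dict(zip(sets_to_merge, concatenate_axis))


def continue_hours_sketch(hours_a, hours_b):
    """Add the last hours value of file a to every hours value of file b.

    Hypothetical helper on plain lists, not part of cait.
    """
    last = hours_a[-1]
    return [h + last for h in hours_b]


print(axis_for['event'], axis_for['hours'])
print(continue_hours_sketch([0.0, 0.5, 1.0], [0.0, 0.25]))
```

When passing custom sets_to_merge, a check like the length comparison above helps to catch a concatenate_axis list that was not extended accordingly.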

cait.data.noise_function(nps)

Simulates the function f from the CC-Noise algorithm with a given Noise Power Spectrum (NPS); see arXiv:1006.3289, eq. (4) and (5).

Parameters: nps (real valued 1D array of size N/2 + 1) – The noise power spectrum.
Returns: Noise Baselines.
Return type: real valued 1D array of size N

cait.data.read_xy_file(filepath, skip_lines=4, separator='\t')

Reads a txt file in the XY format.

The XY format is intended for plotting and data sharing. The first line holds the title of the plot, followed by one line per axis with the axis label. The data follows, with one data point per line and one axis per column, separated by tabs.

Parameters

filepath (string) – The path from where we read the file.
skip_lines (int) – The number of lines at beginning of the file that contain no data (title, axis labels, …).
separator (string) – The column separator character; the default is a tab.

Returns

The data array.

Return type

array

cait.data.write_xy_file(filepath, data, title, axis)

Writes a txt file in the XY format.

The XY format is intended for plotting and data sharing. The first line holds the title of the plot, followed by one line per axis with the axis label. The data follows, with one data point per line and one axis per column, separated by tabs.

Parameters

filepath (string) – The path where we write the file.
data (array of shape (nmbr dimensions, nmbr data points)) – The data that we want to plot.
title (string) – The title of the data plot.
axis (list of strings) – The axis labels.
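
The XY layout described for read_xy_file and write_xy_file can be sketched as a round trip in plain Python. The functions write_xy and read_xy below are hypothetical stand-ins, not the cait functions: they assume a header of exactly one title line plus one label line per axis, and they return nested lists of shape (nmbr dimensions, nmbr data points) rather than arrays.

```python
import os
import tempfile


def write_xy(filepath, data, title, axis, separator="\t"):
    """Write title, one label line per axis, then one data point per line."""
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(title + "\n")
        for label in axis:
            f.write(label + "\n")
        # data has shape (dimensions, points); zip(*data) iterates over points.
        for point in zip(*data):
            f.write(separator.join(repr(v) for v in point) + "\n")


def read_xy(filepath, skip_lines, separator="\t"):
    """Read the data block back, skipping the header lines."""
    with open(filepath, encoding="utf-8") as f:
        lines = f.read().splitlines()[skip_lines:]
    rows = [[float(v) for v in line.split(separator)] for line in lines if line]
    # Transpose back to shape (dimensions, points).
    return [list(column) for column in zip(*rows)]


data = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5]]
path = os.path.join(tempfile.mkdtemp(), "example.xy")
write_xy(path, data, "Example plot", ["Time (s)", "Amplitude (V)"])
restored = read_xy(path, skip_lines=3)  # 1 title line + 2 axis label lines
print(restored == data)
```

In this sketch the number of skipped lines equals the title line plus the number of axis labels, so a file with a different header length needs a matching skip_lines value.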