cait.data

class cait.data.TestData(filepath: str, duration: float = 92, pulser_interval: float = 3, sample_frequency: int = 25000, channels: list = [0, 1], tpas: list = [20, 0.1, -1.0, 20, 0.3, 0, 20, 0.5, 1, 20, 3, -1, 20, 0, 10], event_tpa: list = [1, 1], baseline_resolution: list = [0.002, 0.003], slopes: list = [1, 1.5], scales: list = [12, 8], record_length: int = 16384, dvm_channels: int = 0, start_s: int = 1602879720, offset: list = [0, 0], fitpar: list = [[-1.11, 4, 0.02, 4.15, 2.1, 53.06], [0.77, 51.76, 50.81, 37.24, 8.33, 8.59]], fitpar_carrier: list = [[-2.38, 1.73, 1.65, 136, 0.38, 2.13], [0, 0, 0, 1, 1, 1]], include_carriers: bool = True, relative_ph_sigma=0.1, eventsize: int = 2081024, samplesdiv: int = 2, cdaq_offset: int = 30000, types: list = [0, 1], clock: int = 10000000)[source]

Bases: object

A class for the generation of *.rdt, *.par, *.con, *.csmpl, *.db, *.dig_stamps and *.test_stamps files for the testing of all data processing routines.

Parameters:

filepath (string) – The path to the location of the generated data, including the file name without extension, e.g. “../data/run01_Test/mock_001”.
duration (float, > 0) – The duration of the generated measurement time in seconds.
pulser_interval (float, > 0) – The interval in which test pulses are sent.
sample_frequency (int, > 0) – The sample frequency of the measurement in Hz.
channels (list of integers >= 0) – A list of the channel numbers, corresponding to channels in RDT or CSMPL files.
tpas (list of floats) – A list of the Test Pulse Amplitudes that are sent in this order. A TPA > 10 is set to 10 and counted as control pulse. A TPA < 0 is set to 0 and counted as noise baseline recording. A TPA of 0 is counted as triggered event and set to the height event_tpa.
event_tpa (list of nmbr_channels floats) – The height of a TPA 0 event is sampled from a uniform distribution between zero and this maximum, before application of the saturation curve.
baseline_resolution (list of nmbr_channels floats) – The standard deviations of the noise, before application of the saturation function.
slopes (list of nmbr_channels floats) – The slope parameter k of the logistic function that is used to describe the saturation.
scales (list of nmbr_channels floats) – The maximal height l of the logistic function that is used to describe the saturation.
record_length (int > 0, should be power of 2) – The number of samples in a record window.
dvm_channels (just put this to 0) – The number of DVM channels in the RDT file. This feature is currently not implemented, please stick with the standard value of 0.
start_s (int > 0, standard value: 16.10.2020 22:22:00, the time of the first cait commit) – The Unix time stamp in seconds of the start of the measurement.
offset (list of two floats) – The baseline offset of the channels.
fitpar (list of nmbr_channels 1D numpy arrays, containing the 6 fit parameters, consistent with the fit_pulse_model) – The parameters of the Proebst-pulse shape for the alternative events of all channels.
fitpar_carrier (list of nmbr_channels 1D numpy arrays, containing the 6 fit parameters, consistent with the fit_pulse_model) – The parameters of the Proebst-pulse shape for the events of all channels.
include_carriers (bool) – If true, every second event is a carrier event.
relative_ph_sigma (float > 0 and < 1) – The relative variation of the pulse height.
eventsize (int > 0) – The number of samples per bankswitch at 50 kHz.
samplesdiv (int > 0) – The factor by which eventsize must be divided to match the sample_frequency.
cdaq_offset (int, > 0 but < eventsize/samplesdiv) – The offset of the cdaq relative to the hardware daq, in samples of length 1/sample_frequency.
types (list of ints) – The type of each channel: 0 for a phonon channel, 1 for a light channel. Other values are ambiguous.
clock (int > 0) – The frequency of the clock for the cdaq, for CRESST it is 10MHz.

>>> import cait as ai
>>> test_data = ai.data.TestData(filepath='test_001')
>>> test_data.generate()
Rdt file written.
Con file written.
Par file written.
Csmpl Files for all Channels written.
Sql file written.
Dig_stamps file written.
Test_stamps file written.
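
The rules for the tpas list can be illustrated with a small standalone sketch. The helper classify_tpa is hypothetical and not part of cait; it only restates the thresholds given in the parameter description. The event height is passed in as a fixed number here, whereas the description above says it is sampled from a uniform distribution.

```python
def classify_tpa(tpa, event_height=1.0):
    """Return (amplitude, label) for one entry of the tpas list.

    Hypothetical helper that mirrors the documented rules only.
    """
    if tpa > 10:
        # Values above 10 are clipped and counted as control pulses.
        return 10.0, "control pulse"
    if tpa < 0:
        # Negative values mark noise baseline recordings.
        return 0.0, "noise"
    if tpa == 0:
        # Zero marks a triggered event with its own height.
        return float(event_height), "event"
    return float(tpa), "test pulse"


# The default tpas list from the class signature.
default_tpas = [20, 0.1, -1.0, 20, 0.3, 0, 20, 0.5, 1, 20, 3, -1, 20, 0, 10]
labels = [classify_tpa(t)[1] for t in default_tpas]
print(labels[:6])
```

With the default list, every block of three entries starts with a control pulse, followed by a test pulse and then either a noise baseline or an event.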

generate(start_offset: int = 0, source: bool = None)[source]

Generate all files from a measurement file (rdt, con, par, csmpl, sql, dig, test).

Be careful when generating and merging two test data files: set the start_offset of the second file such that the record times of both files are well separated (> 1 minute). During triggering and the determination of trigger times, the start time of a file is extracted from the time stamps of the test pulses and may therefore be off by several seconds. Because this error applies consistently to all time stamps in the second file, it does not influence the analysis. However, if the start_offset of the simulated data is too close to the end of the previous file, the events will overlap.

Parameters:

start_offset (float >= 0) – The time elapsed from start of measurement to start of this file in seconds.
source (string or None) – If this argument is passed, it must be either ‘hw’ to simulate the files from a hardware data acquisition (RDT, PAR, CON) or ‘stream’ to simulate the files from stream data (CSMPL, SQL, DIG_STAMPS, TEST_STAMPS).
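
A minimal sketch of how a start_offset for a second file could be chosen so that both files stay well separated. The helper next_start_offset and the 120 s gap are assumptions for illustration; the cait calls appear only as comments, using the methods documented in this section.

```python
def next_start_offset(prev_start_offset, prev_duration, gap=120.0):
    """Return a start offset (in seconds) that leaves a gap after the previous file.

    Hypothetical helper; the default gap is chosen to exceed the one minute
    separation recommended above.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    return prev_start_offset + prev_duration + gap


first_offset = 0.0
second_offset = next_start_offset(first_offset, prev_duration=92.0)
print(second_offset)

# With cait installed, the offsets could be used like this (not executed here):
#   test_data.generate(start_offset=first_offset)
#   test_data.update_filepath('test_002')
#   test_data.generate(start_offset=second_offset)
```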

update_duration(new_duration: float)[source]

Update the duration of a measurement file for the next data generation.

Parameters: new_duration (float) – The new duration in seconds.

update_filepath(file_path: str)[source]

Update the file path of a measurement file for the next generation.

Parameters: file_path (string) – The new file path.

cait.data.combine_h5(fname: str, files: List[str], src_dir: str = '', out_dir: str = '', groups_combine: List[str] = ['events', 'testpulses', 'noise'], groups_include: List[str] = [], extend_hours: bool = True)[source]

Combines multiple HDF5 files into a single file using virtual datasets, i.e. none of the data is actually copied, yet it can be accessed as if it were stored in the same file. It is important that the initial HDF5 files have the same structure, at least for the data groups handed by groups_combine. Otherwise the function might crash or, even worse, yield nonsensical data combinations.

Be aware that the files are combined in the order in which they are specified in ‘files’. If you want to make sure that they are in (temporally) increasing order, you have to sort the list accordingly.

Parameters:

fname (str) – The name of the output file (without the .h5 extension). If it already exists, the file content is overwritten.
files (List[str]) – List of HDF5 files to be combined (without the .h5 extension)
src_dir (str) – The directory of the HDF5 files you wish to combine. Default: current directory
out_dir (str) – The directory where the output HDF5 file will be saved. Default: current directory
groups_combine (List[str]) – Groups in the HDF5 files you wish to combine. The function will loop through all the datasets within that group and combine them along the first dimension (for 1-dimensional data), or along the second dimension (for 2- and 3-dimensional data). This is because the datasets have shape (events, data) or (channels, events, data) and we want to append them along the events-dimension.
groups_include (List[str]) – Groups you just wish to copy from one representative file, i.e. the data will not be appended. This can be useful for SEVs or optimum filter, etc.
extend_hours (bool) – If True, the hours dataset of all groups is updated in the final file such that it does not restart at 0 after every file but continuously increases. This requires the existence of datasets event, time_s, and time_mus in the respective groups.
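
The axis rule described for groups_combine can be sketched with in-memory NumPy arrays. This only illustrates which dimension is extended; it does not use virtual datasets, and the helper concat_along_events is hypothetical.

```python
import numpy as np


def concat_along_events(arrays):
    """Concatenate same-named datasets from several files along the events axis.

    Hypothetical helper: 1-dimensional data grows along axis 0,
    2- and 3-dimensional data along axis 1, as stated above.
    """
    axis = 0 if arrays[0].ndim == 1 else 1
    return np.concatenate(arrays, axis=axis)


# Two mock files with 5 and 3 events, 2 channels and 16 samples per record.
hours_a, hours_b = np.zeros(5), np.zeros(3)
events_a, events_b = np.zeros((2, 5, 16)), np.zeros((2, 3, 16))

print(concat_along_events([hours_a, hours_b]).shape)    # (8,)
print(concat_along_events([events_a, events_b]).shape)  # (2, 8, 16)
```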

cait.data.convert_h5_to_root(path_h5, path_root, nmbr_channels)[source]

Convert a HDF5 file to a ROOT file.

Parameters:

path_h5 (string) – The path to the HDF5 file that is read.
path_root (string) – The path to the root file that is created.
nmbr_channels (int) – The number of channels of the module.

cait.data.convert_to_V(event, bits=16, max=10, min=-10, offset=0)[source]

Converts an event from int to volt.

Parameters:

event (1D array) – The event we want to convert.
bits (int) – Number of bits in each sample.
max (int) – The max volt value.
min (int) – The min volt value.
offset (int) – The offset of the volt signal.

Returns:

The converted event array.

Return type:

1D array
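
A minimal sketch of such a conversion, assuming a linear mapping of the integer range onto the voltage range. The function counts_to_volts is hypothetical, and the way the offset enters (subtracted here) is an assumption rather than the confirmed behaviour of convert_to_V.

```python
import numpy as np


def counts_to_volts(event, bits=16, v_max=10.0, v_min=-10.0, offset=0.0):
    """Map integer ADC counts linearly onto the interval [v_min, v_max).

    Hypothetical stand-in for illustration; the sign convention of the
    offset is an assumption.
    """
    event = np.asarray(event, dtype=float)
    volts_per_count = (v_max - v_min) / 2 ** bits
    return v_min + volts_per_count * event - offset


counts = np.array([0, 32768, 65535])
print(counts_to_volts(counts))
```

With 16 bits and a range of -10 V to 10 V, one count corresponds to 20 / 65536 V, so the mid-scale count 32768 maps to 0 V.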

cait.data.gen_dataset_from_rdt(path_rdt, fname, path_h5, channels, tpa_list=[0.0], calc_mp=True, calc_fit=False, calc_sev=False, calc_nps=True, processes=4, event_dtype='float32', ints_in_header=7, sample_frequency=25000, lazy_loading=True)[source]

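Generates an HDF5 file from an RDT file, optionally with calculation of the main parameters (MP), the parametric fit, the standard event (SEV) and the noise power spectrum (NPS).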

Parameters:

path_rdt (string) – Path to the rdt file e.g. “data/bcks/”.
fname (string) – Name of the file e.g. “bck_001”.
path_h5 (string) – Path where the h5 file is saved e.g. “data/hdf5s/”.
channels (list) – The numbers of the channels in the RDT file that we want to include in the HDF5 file.
tpa_list (list) – The test pulse amplitudes to save, if 1 is in the list, all positive values are included.
calc_mp (bool) – If True the main parameters for all events are calculated and stored.
calc_fit (bool) – Not recommended! If True the parametric fit for all events is calculated and stored.
calc_sev (bool) – Not recommended! If True the standard event for all event channels is calculated.
calc_nps (bool) – If True the noise power spectrum is calculated and stored.
processes (int) – The number of processes that is used for the code execution.
event_dtype (string) – Datatype to save the events with.
ints_in_header (int) – The number of ints in the header of the events in the RDT file. This should be either 7 or 6!
sample_frequency (int) – The sample frequency of the records.
lazy_loading (bool) – Recommended! If true, the data is loaded with memory mapping to avoid memory overflows.

cait.data.gen_dataset_from_rdt_memsafe(path_rdt, fname, path_h5, channels, tpa_list=[0.0, 1.0, -1.0], event_dtype='float32', ints_in_header=7, dvm_channels=0, record_length=16384, batch_size=1000, trace=False, indiv_tpas=False)[source]

Generates an HDF5 file from an RDT file with a memory-safe implementation. This is recommended when the RDT file is large or the available RAM is small.

Parameters:

path_rdt (string) – Path to the rdt file e.g. “data/bcks/”.
fname (string) – Name of the file e.g. “bck_001”.
path_h5 (string) – Path where the h5 file is saved e.g. “data/hdf5s/”.
channels (list) – The numbers of the channels in the RDT file that we want to include in the HDF5 file.
tpa_list (list) – The test pulse amplitudes to save, if 1 is in the list, all positive values are included.
event_dtype (string) – Datatype to save the events with.
ints_in_header (int) – The number of ints in the header of the events in the RDT file. This should be either 7 or 6!
dvm_channels (int) – The number of DVM channels, this can be read in the PAR file.
record_length (int) – The number of samples in one record window.
batch_size (int) – The batch size for loading the samples from disk. Usually 1000 is a good value and produces RAM usage around 250 MB.
trace (bool) – Trace the runtime and memory consumption.
indiv_tpas (bool) – Write individual TPAs for all channels. This results in a testpulseamplitude dataset of shape (nmbr_channels, nmbr_testpulses). Otherwise we have (nmbr_testpulses).
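
The idea behind batch_size can be sketched as a loop over index ranges, so that only one batch of records is held in memory at a time. The generator batch_ranges is hypothetical and shows the batching pattern only, not the actual reading of the RDT file.

```python
def batch_ranges(nmbr_records, batch_size=1000):
    """Yield (start, stop) index pairs that cover all records in batches.

    Hypothetical helper; the last batch may be shorter than batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, nmbr_records, batch_size):
        yield start, min(start + batch_size, nmbr_records)


for start, stop in batch_ranges(2500):
    # A real implementation would load records[start:stop] from disk here.
    print(start, stop)
```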

cait.data.get_cc_noise(nmbr_noise, nps, lamb=0.01, force_zero=True, **kwargs)[source]

Simulation of a noise baseline, according to Carrettoni and Cremonesi: arXiv:1006.3289

Parameters:

nmbr_noise (int > 0) – Number of noise baselines to simulate.
nps (1D array of odd size) – Noise power spectrum of the baselines, e.g. generated with scipy.fft.rfft().
lamb (float > 0) – Parameter of the method (overlap between ).
force_zero (bool) – Force the zero coefficient (constant offset) of the NPS to zero.

Returns:

The simulated baselines.

Return type:

2D array of size (nmbr_noise, 2*(len(nps)-1))

cait.data.get_metainfo(path_par)[source]

Read the metainfo from the PAR file.

Parameters: path_par (string) – The path of the PAR file.
Returns: The metadata.
Return type: dict

cait.data.get_nps(x)[source]

Calculates the Noise Power Spectrum (NPS) of a given array.

Parameters: x (1D numpy array of size N) – The time series.
Returns: The noise power spectrum.
Return type: 1D numpy array of size N/2 + 1
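
A minimal sketch of a noise power spectrum computed with a real-valued FFT, which yields the N/2 + 1 frequency bins stated above. The function nps_sketch is hypothetical, and its normalization (the plain squared magnitude) is an assumption that may differ from get_nps.

```python
import numpy as np


def nps_sketch(x):
    """Return the squared magnitude of the real FFT of a time series.

    Hypothetical stand-in; the normalization is an assumption.
    """
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.rfft(x)) ** 2


rng = np.random.default_rng(0)
baseline = rng.normal(size=16384)  # one mock record window
nps = nps_sketch(baseline)
print(nps.shape)  # (8193,)
```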

cait.data.merge_h5(fname: str, files: List[str], src_dir: str = '', out_dir: str = '', groups_merge: List[str] = ['events', 'testpulses', 'noise'], groups_include: List[str] = [], extend_hours: bool = True)[source]

Merges multiple HDF5 files into a single one just like combine_h5() but it actually copies the data.

Be aware that the files are combined in the order in which they are specified in ‘files’. If you want to make sure that they are in (temporally) increasing order, you have to sort the list accordingly.

Parameters:

fname (str) – The name of the output file (without the .h5 extension)
files (List[str]) – List of HDF5 files to be combined (without the .h5 extension)
src_dir (str) – The directory of the HDF5 files you wish to combine. Default: current directory
out_dir (str) – The directory where the output HDF5 file will be saved. Default: current directory
groups_merge (List[str]) – Groups in the HDF5 files you wish to combine. The function will loop through all the datasets within that group and combine them along the first dimension (for 1-dimensional data), or along the second dimension (for 2- and 3-dimensional data). This is because the datasets have shape (events, data) or (channels, events, data) and we want to append them along the events-dimension.
groups_include (List[str]) – Groups you just wish to copy from one representative file, i.e. the data will not be appended. This can be useful for SEVs or optimum filter, etc.
extend_hours (bool) – If True, the hours dataset of all groups is updated in the final file such that it does not restart at 0 after every file but continuously increases. This requires the existence of datasets event, time_s, and time_mus in the respective groups.
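
The effect of extend_hours can be sketched by recomputing the hours from the absolute time stamps relative to one common start time. The helper continuous_hours and the choice of the first time stamp as reference are assumptions for illustration; the actual reference used by merge_h5 may differ.

```python
def continuous_hours(time_s, time_mus, start_s):
    """Return hours since start_s for absolute time stamps (seconds, microseconds).

    Hypothetical helper, not the cait implementation.
    """
    return [((s - start_s) + mus * 1e-6) / 3600.0 for s, mus in zip(time_s, time_mus)]


# Mock time stamps of two files; the second file starts one hour after the first.
file_a = {"time_s": [1602879720, 1602879780], "time_mus": [0, 500000]}
file_b = {"time_s": [1602883320], "time_mus": [0]}

start_s = file_a["time_s"][0]  # assumed reference: first time stamp of the first file
hours = continuous_hours(
    file_a["time_s"] + file_b["time_s"],
    file_a["time_mus"] + file_b["time_mus"],
    start_s,
)
print(hours)
```

The hours of the second file continue after those of the first instead of restarting at 0.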

cait.data.merge_h5_sets(path_h5_a, path_h5_b, path_h5_merged, groups_to_merge=['events', 'testpulses', 'noise', 'controlpulses', 'stream'], sets_to_merge=['event', 'mainpar', 'true_ph', 'true_onset', 'of_ph', 'sev_fit_par', 'sev_fit_rms', 'hours', 'labels', 'testpulseamplitude', 'time_s', 'time_mus', 'pulse_height', 'pca_error', 'pca_projection', 'tp_hours', 'tp_time_mus', 'tp_time_s', 'tpa', 'trigger_hours', 'trigger_time_mus', 'trigger_time_s'], concatenate_axis=[1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], groups_from_a=[], groups_from_b=[], a_name='keep', b_name='keep', continue_hours=True, second_file_start=None, keep_original_files=True, verb=False, trace=False)[source]

Merges two HDF5 files. This function is deprecated! Use combine_h5() or merge_h5() instead.

Parameters:

path_h5_a (string) – Path to the first file to merge.
path_h5_b (string) – Path to the other file to merge.
path_h5_merged (string) – Path where the merged file is saved.
groups_to_merge (list of strings) – The groups that hold the arrays that we want to concatenate.
sets_to_merge (list of strings) – The sets that hold the arrays we want to concatenate, same sets for all groups.
concatenate_axis (list of ints) – The axis along which the arrays are concatenated. Each n’th index in this list corresponds to the n’th string in the sets_to_merge list. If -1, the set is originally a scalar and gets reshaped to a 1D array after merge.
groups_from_a (list of strings) – Which groups are copied from the first HDF5 set.
groups_from_b (list of strings) – Which groups are copied from the second HDF5 set.
a_name (string) – Type a name for the first HDF5 set to identify the data later on with the original data set. This name is stored in the origin data set in the corresponding group. If ‘keep’, the content of the origin data set from the HDF5 set is copied.
b_name (string) – Type a name for the second HDF5 set to identify the data later on with the original data set. This name is stored in the origin data set in the corresponding group. If ‘keep’, the content of the origin data set from the HDF5 set is copied.
continue_hours (bool) – If True, the value of the last hours in a is added to the hours in b.
second_file_start (float or None) – The hours value at which the second file starts. If this is not handed and continue_hours is activated, the value is extracted from the test pulses.
keep_original_files (bool) – If False, the original files are deleted after the merge.
verb (bool) – If True, verbal feedback about the process of the merge is given.
trace (bool) – Traces the memory and runtime consumption.

Deprecated since version 1.3.0: This will be removed in 2.0.0. Use ‘cait.data.combine_h5’ or ‘cait.data.merge_h5’, which let you combine more than two files at a time.

cait.data.noise_function(nps, force_zero=True, size=None)[source]

Simulates the function f from the CC noise algorithm with a given Noise Power Spectrum (NPS); see arXiv:1006.3289, eq. (4) and (5).

Parameters:

nps (real valued 1D array of size N/2 + 1) – The noise power spectrum.
force_zero (bool) – Force the zero coefficient (constant offset) of the NPS to zero.
size (int) – The number of baselines to simulate. If None, only one is simulated.

Returns:

Noise Baselines.

Return type:

real valued 1D array of size N, if size is None; else 2D
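
For orientation, the sketch below generates baselines with a prescribed power spectrum using plain random-phase spectral synthesis. This is a simpler, swapped-in technique and not the algorithm of arXiv:1006.3289; the function random_phase_noise is hypothetical and only mirrors the argument names and output shapes documented above.

```python
import numpy as np


def random_phase_noise(nps, force_zero=True, size=None, rng=None):
    """Draw baselines whose squared rFFT magnitude matches nps.

    Hypothetical helper using random Fourier phases; not the CC algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    nps = np.array(nps, dtype=float)  # copy, so the input is not modified
    if force_zero:
        nps[0] = 0.0  # remove the constant offset
    n_rows = 1 if size is None else size
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_rows, nps.size))
    spectrum = np.sqrt(nps) * np.exp(1j * phases)
    # An NPS with N/2 + 1 bins corresponds to a record of N samples.
    noise = np.fft.irfft(spectrum, n=2 * (nps.size - 1), axis=-1)
    return noise[0] if size is None else noise


flat_nps = np.ones(513)
single = random_phase_noise(flat_nps, rng=np.random.default_rng(1))
batch = random_phase_noise(flat_nps, size=4, rng=np.random.default_rng(1))
print(single.shape, batch.shape)  # (1024,) (4, 1024)
```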

cait.data.read_xy_file(filepath, skip_lines=4, separator='\t')[source]

Reads a txt file in the XY format.

The XY format is intended for plotting and data sharing. The first line contains the title of the plot, followed by one line per axis with the axis label. Then the data follows, with one data point per line and one axis per column, separated by tabs.

Parameters:

filepath (string) – The path from where we read the file.
skip_lines (int) – The number of lines at beginning of the file that contain no data (title, axis labels, …).
separator (string) – The column separator character; the default is a tab.

Returns:

The data array.

Return type:

array

cait.data.write_xy_file(filepath, data, title, axis)[source]

Writes a txt file in the XY format.

The XY format is intended for plotting and data sharing. The first line contains the title of the plot, followed by one line per axis with the axis label. Then the data follows, with one data point per line and one axis per column, separated by tabs.

Parameters:

filepath (string) – The path where we write the file.
data (array of shape (nmbr dimensions, nmbr data points)) – The data that we want to plot
title (string) – The title of the data plot.
axis (list of strings) – The axis labels.
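
The XY layout described above can be sketched with the standard library alone. The functions write_xy_sketch and read_xy_sketch are hypothetical stand-ins that follow the prose description (title, one label line per axis, tab-separated data); files written by cait may carry a different number of header lines, which is why skip_lines is passed explicitly.

```python
import os
import tempfile


def write_xy_sketch(filepath, data, title, axis):
    """Write data of shape (nmbr dimensions, nmbr data points) in the XY layout."""
    with open(filepath, "w") as f:
        f.write(title + "\n")
        for label in axis:
            f.write(label + "\n")
        # One data point per line, one axis per column.
        for point in zip(*data):
            f.write("\t".join(repr(float(v)) for v in point) + "\n")


def read_xy_sketch(filepath, skip_lines, separator="\t"):
    """Read the data rows of an XY file, skipping the header lines."""
    with open(filepath) as f:
        lines = f.read().splitlines()[skip_lines:]
    return [[float(v) for v in line.split(separator)] for line in lines if line]


data = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5]]
path = os.path.join(tempfile.mkdtemp(), "example.xy")
write_xy_sketch(path, data, "Example", ["x (ms)", "y (V)"])

# In this sketch the header has one title line plus one line per axis.
rows = read_xy_sketch(path, skip_lines=1 + 2)
print(rows)  # [[0.0, 0.5], [1.0, 1.5], [2.0, 2.5]]
```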