cait.EvaluationTools

class cait.EvaluationTools

Bases: object

The class EvaluationTools provides an easier way to handle the data, especially the HDF5 files that are used widely in this library. It also provides an easy way to organize predictions and to visualize the results via confusion matrices and TSNE or PCA plots, which depict high-dimensional data. How it is used is best seen in the tutorial ‘Machine Learning-based Event Selection’.

add_events_from_file(file, channel, which_data='mainpar', all_labeled=False, only_idx=None, force_add=False, verb=False)

Reads the labels and data from a channel of a given HDF5 file and adds them to the properties.

Parameters

file (string) – Path to the HDF5 file from which the data should be read.
channel (int) – The number of the channel.
which_data (string or 2D array) – Default ‘mainpar’. Selects which data is used as data (e.g. main parameters, additional main parameters, time series). If set to None, data is kept empty. This parameter can also be set to an array with the same length as the labels, which is then stored in data.
all_labeled (boolean) – Default False. If set, exactly the events that are labeled are included.
only_idx (list of int) – If given, only the events with these indices are included in the dataset.
force_add (boolean) – Default False. If True, the same file can be added twice.
verb (boolean) – Default False, if True additional messages are printed.

add_prediction(pred_method, pred, true_labels=False, verb=False)

Adds a new prediction method with labels to the predictions property.

Parameters

pred_method (string) – The name of the model that made the predictions.
pred (list of int) – Contains the predicted labels for the events.
true_labels (boolean) – Default False. Set to True when the predicted labels correspond to actual label numbers (as in supervised learning methods).
verb (boolean) – Default False, if True additional output is printed to the console.

confusion_matrix_pred(pred_method, what='all', rotation_xticklabels=0, force_xlabelnbr=False, figsize=None, fig_title=False, verb=False)

Plots a confusion matrix to visualize which labels are predicted well by a certain prediction method. The (i, j) entry holds the number of events with label i that are predicted as j. Clicking on a matrix element prints the event number and the file it comes from to the console.

Parameters

pred_method (str) – Required. Name of the prediction method.
what (str) – Optional, default ‘all’. Which data is used: ‘test’, ‘train’ or ‘all’.
rotation_xticklabels (int) – Optional, default 0. Lets you rotate the x tick labels.
force_xlabelnbr (bool) – Optional, default False. Uses the number instead of the labels for better readability.
figsize (tuple) – Optional, default None. Changes the overall figure size.
verb (bool) – Optional, default False. If True additional information is printed on the console.
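The (i, j) convention above can be illustrated by counting such a matrix by hand. The sketch below is plain Python, uses made-up label numbers, and is not cait's implementation; it only shows how the entries are counted.

```python
def confusion_counts(true_labels, pred_labels):
    """Return a dict of dicts where counts[i][j] is the number of events
    with true label i that were predicted as j."""
    classes = sorted(set(true_labels) | set(pred_labels))
    counts = {i: {j: 0 for j in classes} for i in classes}
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    return counts

# Hypothetical label numbers for six events
true = [1, 1, 2, 2, 2, 3]
pred = [1, 2, 2, 2, 3, 3]
cm = confusion_counts(true, pred)
print(cm[2][2])  # 2: events with label 2 that were predicted as 2
```

The diagonal holds the correctly predicted events; everything off the diagonal is a misclassification.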

convert_to_colors(label_nbrs, verb=False)

Converts given label numbers into colors for matplotlib.

Parameters

label_nbrs (list of int) – Contains the label numbers.
verb (boolean) – Default False, if True additional output is printed to the console.

Returns

A list of colors in which the same label always gets the same color.

Return type

list of colors
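A minimal sketch of how label numbers could be mapped to matplotlib colors so that equal labels share a color. The palette and the sorted-order assignment are assumptions for illustration, not cait's actual color table.

```python
# Hypothetical palette; matplotlib accepts these named and hex colors
PALETTE = ["red", "blue", "green", "orange", "purple", "#8c564b"]

def label_nbrs_to_colors(label_nbrs):
    """Assign one palette color per distinct label number (in sorted order),
    cycling through the palette if there are more labels than colors."""
    unique = sorted(set(label_nbrs))
    color_of = {nbr: PALETTE[k % len(PALETTE)] for k, nbr in enumerate(unique)}
    return [color_of[nbr] for nbr in label_nbrs]

colors = label_nbrs_to_colors([3, 1, 3, 2])
print(colors)  # ['green', 'red', 'green', 'blue']
```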

convert_to_labels(label_nbrs, verb=False)

Converts given label numbers to the corresponding label.

Parameters

label_nbrs (list of int) – Contains the label numbers.
verb (boolean) – Default False, if True additional output is printed to the console.

Returns

Labels which correspond to the label numbers.

Return type

list

convert_to_labels_colors(label_nbrs, return_legend=False, verb=False)

Converts given label numbers into colors for matplotlib.

Parameters

label_nbrs (list of int) – Contains the label numbers.
return_legend (boolean) – Default False. If True, a legend in matplotlib format is also returned.
verb (boolean) – Default False, if True additional output is printed to the console.

Returns

List of colors, optional legend for matplotlib.

Return type

list of labels

correctly_labeled_events_per_pulse_height(pred_method, what='all', bin_size=4, ncols=2, extend_plot=False, figsize=None, verb=False)

Plots the number of correctly predicted labels over the pulse height (in volts) for events.

Parameters

pred_method (str) – Required. Name of the prediction method.
what (str) – Optional, default ‘all’. Which data is used: ‘test’, ‘train’ or ‘all’.
bin_size (int) – Optional, default 4. Bin size for calculating the average.
ncols (int) – Optional, default 2. Number of plots side by side.
extend_plot (bool) – Optional, default False. If True, the x limits are the same for all subplots.
figsize (tuple) – Optional, default None. Changes the overall figure size.
verb (bool) – Optional, default False. If True additional information is printed on the console.

correctly_labeled_per_v(pred_method, what='all', bin_size=4, ncols=2, figsize=None, extend_plot=False, verb=False)

Plots the number of correctly predicted labels over the pulse height (in volts) for every label.

Parameters

pred_method (str) – Required. Name of the prediction method.
what (str) – Optional, default ‘all’. Which data is used: ‘test’, ‘train’ or ‘all’.
bin_size (int) – Optional, default 4. Bin size for calculating the average.
ncols (int) – Optional, default 2. Number of plots side by side.
figsize (tuple) – Optional, default None. Size of the figure for matplotlib.
extend_plot (bool) – Optional, default False. If True, the x limits are the same for all subplots.
verb (bool) – Optional, default False. If True additional information is printed on the console.
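The averaging over pulse height can be sketched as follows. Reading bin_size as the number of events per bin, after sorting the events by pulse height, is an assumption; cait may bin differently. The function name and its inputs are hypothetical.

```python
def fraction_correct_per_bin(pulse_heights, true_labels, pred_labels, bin_size=4):
    """Sort events by pulse height, group consecutive events into bins of
    `bin_size` events and return (mean pulse height, fraction correct) per bin.
    Interpreting bin_size as 'events per bin' is an assumption."""
    events = sorted(zip(pulse_heights, true_labels, pred_labels))
    result = []
    for start in range(0, len(events), bin_size):
        chunk = events[start:start + bin_size]
        mean_ph = sum(ph for ph, _, _ in chunk) / len(chunk)
        correct = sum(1 for _, t, p in chunk if t == p)
        result.append((mean_ph, correct / len(chunk)))
    return result

# Hypothetical pulse heights (V) with true and predicted label numbers
bins = fraction_correct_per_bin(
    [0.1, 0.2, 0.3, 0.4, 1.0, 1.2],
    [1, 1, 1, 1, 2, 2],
    [1, 1, 2, 1, 2, 1],
    bin_size=4,
)
print(bins)  # two bins: about (0.25, 0.75) and (1.1, 0.5)
```

The last bin may hold fewer than bin_size events, so its fraction is noisier than the others.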

events_saturated_histogram(figsize=None, bins='auto', verb=False, ylog=False)

Plots a histogram for all event pulses and strongly saturated event pulses in a single plot.

Parameters

figsize (tuple) – Optional, default None. Changes the overall figure size.
bins (int) – Optional, default auto. Bins for the histograms.
ylog (bool) – Optional, default False. If True the y axis is in log scale.

gen_features()

Normalizes the data and saves it into features.

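Assuming the default StandardScaler (see set_scaler) is used, normalizing means shifting every data column to zero mean and scaling it to unit variance. A NumPy sketch of that transformation, not cait's code:

```python
import numpy as np

def standardize(data):
    """Column-wise (x - mean) / std, as sklearn's StandardScaler does by default."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # population std (ddof=0), matching StandardScaler
    std[std == 0] = 1.0     # leave constant columns unscaled instead of dividing by zero
    return (data - mean) / std

features = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(features.mean(axis=0), features.std(axis=0))  # approximately [0 0] and [1 1]
```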
get_data(what='all', verb=False)

Getter-function which returns the data.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all data, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the data selected by the parameter what

Return type

array
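The what parameter shared by the getters can be pictured as selecting rows with a boolean train/test mask. The sketch below assumes such a mask; the names select and is_train are hypothetical and not part of cait.

```python
import numpy as np

def select(array, is_train, what="all"):
    """Return all rows, only the training rows, or only the test rows."""
    array = np.asarray(array)
    is_train = np.asarray(is_train, dtype=bool)
    if what == "all":
        return array
    if what == "train":
        return array[is_train]
    if what == "test":
        return array[~is_train]
    raise ValueError("what must be 'all', 'train' or 'test'")

event_nbrs = np.array([10, 11, 12, 13])
is_train = np.array([True, False, True, False])  # hypothetical split mask
print(select(event_nbrs, is_train, "test"))  # [11 13]
```

Keeping one mask and applying it to every stored array keeps event_nbrs, data, features and labels aligned row by row.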

get_event_nbrs(what='all', verb=False)

Getter-function which returns the event_nbrs.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all event_nbrs, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the event_nbrs selected by the parameter what

Return type

array

get_events(what='all', verb=False)

Getter-function which returns the events.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all events, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the events selected by the parameter what

Return type

array

get_features(what='all', verb=False)

Getter-function which returns the features.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all features, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the features selected by the parameter what

Return type

array

get_file_nbrs(what='all', verb=False)

Getter-function which returns the file_nbrs.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all file_nbrs, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the file_nbrs selected by the parameter what

Return type

array

get_filepaths(what='all', verb=False)

Getter-function which returns the filepaths for every event.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all filepaths, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the filepaths selected by the parameter what

Return type

array

get_label_nbrs(what='all', verb=False)

Getter-function which returns the label_nbrs.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all label_nbrs, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the label_nbrs selected by the parameter what

Return type

array

get_labels_in_color(what='all', verb=False)

Getter-function which returns the labels_in_color, which can be useful for plotting.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all labels_in_color, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the labels_in_color selected by the parameter what

Return type

array

get_mainpar(what='all', verb=False)

Getter-function which returns the mainpar.

Parameters

what (str, optional) – Defines what should be returned: ‘all’ for all mainpar, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the mainpar selected by the parameter what

Return type

array

get_pred(pred_method, what='all', verb=False)

Getter-function which returns the prediction of a method. The prediction has to be added first; it is then selected via the pred_method parameter, using the same name that was given when adding it.

Parameters

pred_method (str) – Selects the prediction of a certain prediction method; must be the same string as when the prediction was added.
what (str, optional) – Defines what should be returned: ‘all’ for all predictions, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the prediction of the chosen prediction method (pred_method) selected by the parameter what

Return type

array

get_pred_in_color(pred_method, what='all', verb=False)

Getter-function which returns the color-coded prediction of a method. The prediction has to be added first; it is then selected via the pred_method parameter, using the same name that was given when adding it.

Parameters

pred_method (str) – Selects the prediction of a certain prediction method to be color coded; must be the same string as when the prediction was added.
what (str, optional) – Defines what should be returned: ‘all’ for all color-coded predictions, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

Returns

The part of the color-coded predictions of the chosen prediction method (pred_method) selected by the parameter what

Return type

array

get_pred_true_labels(pred_method)

Returns whether the labels in the prediction correspond to the actual labels, as in the dict self.labels.

Parameters: pred_method (str) – Abbreviation of the chosen prediction method
Returns: True if the labels of the chosen prediction method correspond to the actual labels, otherwise False
Return type: bool

get_test(verb=False)

Getter-function which returns the data from the test set in the following order: 1. event_nbrs 2. data 3. features 4. file_nbrs 5. label_nbrs

Parameters: verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False
Returns: tuple of size 6, where every entry in this tuple is an array
Return type: tuple

get_train(verb=False)

Getter-function which returns the data from the training set in the following order: 1. event_nbrs 2. data 3. features 4. file_nbrs 5. label_nbrs

Parameters: verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False
Returns: tuple of size 6, where every entry in this tuple is an array
Return type: tuple

plot_event(index, what='all', plot_mainpar=False, text=None, verb=False)

Plots a single event with the given index.

Parameters

index (int) – The index of the event to plot, with respect to the what parameter
what (str, optional) – Defines which set the index refers to: ‘all’ for all events, ‘test’ for the test set and ‘train’ for the training set, defaults to ‘all’
plot_mainpar (bool, optional) – If True, the main parameters are added to the plot, defaults to False
text (str, optional) – Text that is added to the plot, defaults to None
verb (bool, optional) – Enables additional output which can be useful for debugging, defaults to False

plot_labels_distribution(figsize=None)

Uses a bar graph to visualize how often each label occurs in the dataset.

Parameters: figsize (tuple) – Optional, default None. Changes the overall figure size.
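Counting label occurrences, the quantity shown in the bar graph, is a one-liner with collections.Counter. The label names below are made up for illustration.

```python
from collections import Counter

def label_counts(labels):
    """Count how often each label occurs, sorted by label."""
    return dict(sorted(Counter(labels).items()))

counts = label_counts(["event", "noise", "event", "pile-up", "event"])
print(counts)  # {'event': 3, 'noise': 1, 'pile-up': 1}
```

The result could then be drawn with matplotlib as plt.bar(list(counts), list(counts.values())).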

plt_pred_with_pca(pred_methods, xy_comp=(1, 2), what='all', plt_labels=True, figsize=None, as_cols=False, rdseed=None, dot_size=5, verb=False)

Plots the data with PCA for one or a list of prediction methods to compare the different labels.

Parameters

pred_methods (list of str) – The prediction method that should be used.
xy_comp (tuple) – Optional, default (1,2). Selects which principal components are used for the x and y axes.
what (str) – Optional, default ‘all’. Which data is plotted: ‘all’, ‘test’ or ‘train’.
plt_labels (bool) – Optional, default True. If True, a subplot with the labels is added.
figsize (tuple) – Sets figure size of plot.
as_cols (bool) – Optional, default False. If True subplots are arranged in columns.
rdseed (int) – Optional, default None. Random seed for numpy random.
dot_size (int) – Optional, default 5. Size of the point in the scatter plot.
verb (bool) – Optional, default False. Additional output is printed.

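The projection onto the components selected by xy_comp can be sketched with a singular value decomposition of the centered features. This is a generic PCA sketch in NumPy, assuming xy_comp is 1-indexed as its (1, 2) default suggests; cait may use a library implementation instead.

```python
import numpy as np

def pca_project(features, xy_comp=(1, 2)):
    """Project centered features onto two principal components.
    xy_comp is treated as 1-indexed (an assumption based on the default)."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    axes = vt[[xy_comp[0] - 1, xy_comp[1] - 1]]
    return x @ axes.T  # shape (n_events, 2)

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
coords = pca_project(data, xy_comp=(1, 2))
print(coords.shape)  # (100, 2)
```

Coloring the scatter of coords once by true labels and once per prediction method gives the side-by-side comparison described above.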
plt_pred_with_pca_plotly(pred_methods, xy_comp=(1, 2), what='all', rdseed=None, verb=False)

Plots the data with PCA for one or a list of prediction methods to compare the different labels.

Parameters

pred_methods (str) – Required. Prediction method that should be used.
xy_comp (tuple) – Optional, default (1,2). Selects which principal components are used for the x and y axes.
what (str) – Optional, default ‘all’. Which data is plotted.
rdseed (int) – Optional, default None. Random seed for numpy random.
verb (bool) – Optional, default False. Additional output is printed.

plt_pred_with_tsne(pred_methods, what='all', plt_labels=True, figsize=None, perplexity=30, as_cols=False, rdseed=None, dot_size=5, verb=False)

Plots the data with TSNE for one or a list of prediction methods to compare the different labels.

Parameters

pred_methods (list of str) – The prediction method that should be used.
what (str) – Optional, default ‘all’. Which data is plotted: ‘all’, ‘test’ or ‘train’.
plt_labels (bool) – Optional, default True. If True, a subplot with the labels is added.
figsize (tuple) – Sets figure size of plot.
perplexity (int) – Optional, default 30. The perplexity parameter for TSNE.
as_cols (bool) – Optional, default False. If True subplots are arranged in columns.
rdseed (int) – Optional, default None. Random seed for numpy random.
dot_size (int) – Optional, default 5. Size of the point in the scatter plot.
verb (bool) – Optional, default False. Additional output is printed.

plt_pred_with_tsne_plotly(pred_methods, what='all', perplexity=30, rdseed=None, verb=False)

Plots the data with TSNE for one or a list of prediction methods to compare the different labels.

Parameters

pred_methods (str) – Required. Prediction method that should be used.
what (str) – Optional, default ‘all’. Which data is plotted: ‘all’, ‘test’ or ‘train’.
perplexity (int) – Optional, default 30. Perplexity parameter for TSNE.
rdseed (int) – Optional, default None. Random seed for numpy random.
verb (bool) – Optional, default False. Additional output is printed.

pulse_height_histogram(ncols=2, extend_plot=False, figsize=None, bins='auto', verb=False)

Plots a histogram of the pulse heights for every label in separate subplots.

Parameters

ncols (int) – Optional, default 2. Number of plots side by side.
extend_plot (bool) – Optional, default False. Sets the x axis of all histograms to the same limits.
figsize (tuple) – Optional, default None. Changes the overall figure size.
bins (int) – Optional, default auto. Bins for the histograms.
verb (bool) – Optional, default False. Outputs additional information.

save_prediction(pred_method, path, fname, channel)

Saves the predictions as a CSV file.

Parameters

pred_method (string) – The name of the model that made the predictions.
path (string) – Path to the folder that should contain the predictions, e.g. ‘predictions/’.
fname (string) – The name of the file, e.g. “bck_001”.
channel (int) – The number of the channel in the module, e.g. Phonon 0, Light 1.
verb (boolean) – Default False, if True additional output is printed.

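A sketch of writing predictions to CSV with the standard csv module. The two-column layout (event_nbr, prediction) is hypothetical; the actual file layout, and how path, fname and channel combine into a file name, are not specified here. Writing to an io.StringIO keeps the example free of disk access.

```python
import csv
import io

def write_predictions_csv(stream, event_nbrs, predictions):
    """Write a header row, then one (event_nbr, prediction) row per event."""
    writer = csv.writer(stream)
    writer.writerow(["event_nbr", "prediction"])
    for nbr, pred in zip(event_nbrs, predictions):
        writer.writerow([nbr, pred])

buffer = io.StringIO()
write_predictions_csv(buffer, [0, 1, 2], [1, 1, 3])
print(buffer.getvalue())
```

For a real file, open it with open(filename, "w", newline="") so the csv module controls the line endings.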
set_data(data)

Replaces the main parameters or time series with a chosen data set.

Parameters: data (array) – Dataset which is analysed.

set_features(features)

If the StandardScaler is not used, the features have to be set manually, e.g. with this function.

Parameters: features (array) – Manually generated features.

set_scaler(scaler)

Sets the scaler used to generate the features from the data set. By default, sklearn.preprocessing.StandardScaler() is used.

Parameters: scaler (object) – Scaler for normalizing the data.
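Presumably any object following scikit-learn's fit/transform convention can serve as a scaler, as StandardScaler does; that interface requirement is an assumption. The hypothetical class below rescales each column to [0, 1]; scikit-learn's own sklearn.preprocessing.MinMaxScaler does the same.

```python
import numpy as np

class MinMaxScalerSketch:
    """Hypothetical scaler with scikit-learn style fit/transform methods,
    mapping every feature column to the range [0, 1]."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.min_ = x.min(axis=0)
        span = x.max(axis=0) - self.min_
        span[span == 0] = 1.0  # avoid division by zero for constant columns
        self.span_ = span
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.min_) / self.span_

    def fit_transform(self, x):
        return self.fit(x).transform(x)

scaled = MinMaxScalerSketch().fit_transform([[0.0, 5.0], [2.0, 5.0], [4.0, 5.0]])
print(scaled)
```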

split_test_train(test_size, verb=False)

Separates the dataset into a training set and a test set; test_size gives the size of the test set as a fraction of the whole dataset.

Parameters

test_size (float in (0,1)) – Size of the test set.
verb (boolean) – Default False, if True additional output is printed.
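A random split with test_size given as a fraction can be sketched as a boolean training mask. This is an illustration under that assumption, not cait's code; sklearn.model_selection.train_test_split is a ready-made alternative.

```python
import numpy as np

def random_split_mask(n_events, test_size, seed=None):
    """Return a boolean mask that is True for training events and False for
    test events, with round(test_size * n_events) events in the test set."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be in (0, 1)")
    rng = np.random.default_rng(seed)
    n_test = round(test_size * n_events)
    is_train = np.ones(n_events, dtype=bool)
    # Draw distinct test indices without replacement
    is_train[rng.choice(n_events, size=n_test, replace=False)] = False
    return is_train

mask = random_split_mask(10, test_size=0.3, seed=42)
print(mask.sum(), (~mask).sum())  # 7 3
```

Fixing the seed makes the split reproducible between runs.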