cait.EvaluationTools
-
class cait.EvaluationTools
Bases: object
The class EvaluationTools provides an easier way to handle data, especially data from HDF5 files, which are widely used in this library. It also provides an easy way to organize predictions and to visualize the results with a confusion matrix and with TSNE or PCA plots that depict the high-dimensional data. How it is used is best seen in the tutorial ‘Machine Learning-based Event Selection’.
-
add_events_from_file(file, channel, which_data='mainpar', all_labeled=False, only_idx=None, force_add=False, verb=False) Reads the labels and data of a channel from a given HDF5 file and adds them to the properties.
- Parameters
file (string) – Path to the HDF5 file from which the data should be read.
channel (int) – The number of the channel.
which_data (string or 2D array) – Default ‘mainpar’. Selects which data is used as data (e.g. main parameters, additional main parameters, time series). If set to None, data is kept empty. This parameter can also be set to an array with the same length as the labels, which is then stored as data.
all_labeled (boolean) – Default False. If True, exactly the events that are labeled are included.
only_idx (list of int) – Default None. If given, only the events with these indices are included in the dataset.
force_add (boolean) – Default False, lets you add a file twice when set to True.
verb (boolean) – Default False, if True additional messages are printed.
-
add_prediction(pred_method, pred, true_labels=False, verb=False) Adds a new prediction method with its labels to the predictions property.
- Parameters
pred_method (string) – The name of the model that made the predictions.
pred (list of int) – Contains the predicted labels for the events.
true_labels (boolean) – Default False. Set to True when the predicted labels correspond to actual label numbers (as in supervised learning methods).
verb (boolean) – Default False, if True additional output is printed to the console.
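The bookkeeping implied by add_prediction, get_pred and get_pred_true_labels can be sketched in plain Python. The class PredictionStore and its attributes are hypothetical and not cait's internals; the sketch only shows that a prediction is stored under its name together with the true_labels flag and is looked up again by that same name.

```python
class PredictionStore:
    """Hypothetical stand-in for the predictions property; not cait's implementation."""

    def __init__(self, n_events):
        self.n_events = n_events
        # pred_method -> (true_labels flag, list of predicted labels)
        self.predictions = {}

    def add_prediction(self, pred_method, pred, true_labels=False):
        # Exactly one predicted label per event is expected.
        if len(pred) != self.n_events:
            raise ValueError("need exactly one prediction per event")
        self.predictions[pred_method] = (true_labels, list(pred))

    def get_pred(self, pred_method):
        # The prediction is selected by the same name used when it was added.
        return self.predictions[pred_method][1]

    def get_pred_true_labels(self, pred_method):
        # True if the predicted numbers are actual label numbers (supervised methods),
        # False if they are arbitrary ids, e.g. clusters from an unsupervised method.
        return self.predictions[pred_method][0]


store = PredictionStore(n_events=4)
store.add_prediction("rf", [1, 2, 1, 3], true_labels=True)
store.add_prediction("kmeans", [0, 0, 1, 1])
print(store.get_pred("rf"), store.get_pred_true_labels("kmeans"))  # [1, 2, 1, 3] False
```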
-
confusion_matrix_pred(pred_method, what='all', rotation_xticklabels=0, force_xlabelnbr=False, figsize=None, fig_title=False, verb=False) Plots a confusion matrix to visualize which labels a given prediction method predicts well. Entry (i, j) is the number of events with label i that are predicted as label j. Clicking on a matrix element prints the event numbers and the files they come from to the console.
- Parameters
pred_method (str) – Required. Name of the prediction method.
what (str) – Optional, default ‘all’. Which data is used: ‘all’, ‘test’ or ‘train’.
rotation_xticklabels (int) – Optional, default 0. Lets you rotate the x tick labels.
force_xlabelnbr (bool) – Optional, default False. If True, the label numbers are shown instead of the label names for better readability.
figsize (tuple) – Optional, default None. Changes the overall figure size.
verb (bool) – Optional, default False. If True additional information is printed on the console.
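The matrix plotted by confusion_matrix_pred can be computed in plain Python. This is a minimal sketch, assuming true and predicted labels are given as equal-length lists of label numbers; the helper names are hypothetical, and the second helper only mimics what clicking a matrix element reports.

```python
def confusion_matrix(true_labels, pred_labels, label_nbrs):
    """Entry [i][j] counts events whose true label is label_nbrs[i]
    and whose predicted label is label_nbrs[j]."""
    index = {lab: k for k, lab in enumerate(label_nbrs)}
    matrix = [[0] * len(label_nbrs) for _ in label_nbrs]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def events_in_cell(true_labels, pred_labels, event_nbrs, t, p):
    """Event numbers behind one matrix element (true label t, predicted label p)."""
    return [e for e, tl, pl in zip(event_nbrs, true_labels, pred_labels)
            if tl == t and pl == p]


true = [1, 1, 2, 2, 2, 3]
pred = [1, 2, 2, 2, 3, 3]
cm = confusion_matrix(true, pred, label_nbrs=[1, 2, 3])
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 1]
# [0, 0, 1]
print(events_in_cell(true, pred, [10, 11, 12, 13, 14, 15], t=2, p=3))  # [14]
```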
-
convert_to_colors(label_nbrs, verb=False) Converts given label numbers into colors for matplotlib.
- Parameters
label_nbrs (list of int) – Contains the label numbers.
verb (boolean) – Default False, if True additional output is printed to the console.
- Returns
A list of colors in which identical labels get identical colors.
- Return type
list of colors
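The property that identical labels get identical colors can be sketched as a dictionary lookup. This is an assumption-laden illustration, not cait's mapping: the palette of matplotlib named colors and the rule of assigning colors in order of first appearance are both chosen here for demonstration.

```python
def convert_to_colors(label_nbrs,
                      palette=("tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple")):
    """Hypothetical sketch: give every distinct label number one color,
    so identical labels always get identical colors."""
    color_of = {}
    for nbr in label_nbrs:
        if nbr not in color_of:
            # Assign palette entries in order of first appearance; wrap around if needed.
            color_of[nbr] = palette[len(color_of) % len(palette)]
    return [color_of[nbr] for nbr in label_nbrs]


print(convert_to_colors([3, 1, 3, 2]))
# ['tab:blue', 'tab:orange', 'tab:blue', 'tab:green']
```

The returned list can be passed as the c argument of a matplotlib scatter plot.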
-
convert_to_labels(label_nbrs, verb=False) Converts given label numbers to the corresponding labels.
- Parameters
label_nbrs (list of int) – Contains the label numbers.
verb (boolean) – Default False, if True additional output is printed to the console.
- Returns
Labels which correspond to the label numbers.
- Return type
list
-
convert_to_labels_colors(label_nbrs, return_legend=False, verb=False) Converts given label numbers into colors for matplotlib.
- Parameters
label_nbrs (list of int) – Contains the label numbers.
return_legend (boolean) – Default False. If True, a legend in matplotlib format is also returned.
verb (boolean) – Default False, if True additional output is printed to the console.
- Returns
List of colors, optional legend for matplotlib.
- Return type
list of labels
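A sketch of how label numbers could be turned into colors plus optional legend entries. The label dictionary, its placeholder names, the palette and the (name, color) legend format are all hypothetical; the actual names come from the object's label dictionary and the actual legend format is not specified here.

```python
# Hypothetical label dictionary with placeholder names.
LABELS = {1: "class A", 2: "class B", 3: "class C"}
PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red"]


def convert_to_labels_colors(label_nbrs, return_legend=False):
    """Map label numbers to colors; optionally also return legend entries."""
    names = [LABELS[nbr] for nbr in label_nbrs]
    # A fixed color per label number, so the same label always gets the same color.
    colors = [PALETTE[(nbr - 1) % len(PALETTE)] for nbr in label_nbrs]
    if not return_legend:
        return colors
    # One (name, color) entry per distinct label, in order of first appearance.
    legend = list(dict.fromkeys(zip(names, colors)))
    return colors, legend


colors, legend = convert_to_labels_colors([2, 1, 2], return_legend=True)
print(colors)  # ['tab:orange', 'tab:blue', 'tab:orange']
print(legend)  # [('class B', 'tab:orange'), ('class A', 'tab:blue')]
```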
-
correctly_labeled_events_per_pulse_height(pred_method, what='all', bin_size=4, ncols=2, extend_plot=False, figsize=None, verb=False) Plots the number of correctly predicted labels over the pulse height (in volts) of the events.
- Parameters
pred_method (str) – Required. Name of the prediction method.
what (str) – Optional, default ‘all’. Which data is used: ‘all’, ‘test’ or ‘train’.
bin_size (int) – Optional, default 4. Bin size for calculating the average.
ncols (int) – Optional, default 2. Number of plots side by side.
extend_plot (bool) – Optional, default False. If True, the x limits are the same for all subplots.
figsize (tuple) – Optional, default None. Changes the overall figure size.
verb (bool) – Optional, default False. If True additional information is printed on the console.
-
correctly_labeled_per_v(pred_method, what='all', bin_size=4, ncols=2, figsize=None, extend_plot=False, verb=False) Plots the number of correctly predicted labels over the pulse height (in volts), separately for every label.
- Parameters
pred_method (str) – Required. Name of the prediction method.
what (str) – Optional, default ‘all’. Which data is used: ‘all’, ‘test’ or ‘train’.
bin_size (int) – Optional, default 4. Bin size for calculating the average.
ncols (int) – Optional, default 2. Number of plots side by side.
figsize (tuple) – Optional, default None. Size of the figure for matplotlib.
extend_plot (bool) – Optional, default False. If True, the x limits are the same for all subplots.
verb (bool) – Optional, default False. If True additional information is printed on the console.
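The quantity shown by the two plots above can be sketched as follows. This assumes that bin_size counts consecutive events, sorted by pulse height, that are averaged together; the page does not state whether bin_size is a number of events or a width in volts, so treat this as one plausible reading. correctly_labeled_per_v would apply the same computation to the events of each label separately.

```python
def accuracy_per_pulse_height(pulse_heights, true_labels, pred_labels, bin_size=4):
    """Sort events by pulse height, group every bin_size consecutive events and
    return (mean pulse height, fraction correctly predicted) for each group."""
    events = sorted(zip(pulse_heights, true_labels, pred_labels))
    points = []
    for start in range(0, len(events), bin_size):
        chunk = events[start:start + bin_size]
        mean_ph = sum(ph for ph, _, _ in chunk) / len(chunk)
        frac_correct = sum(t == p for _, t, p in chunk) / len(chunk)
        points.append((mean_ph, frac_correct))
    return points


ph = [0.1, 0.5, 0.2, 0.9, 0.4, 0.3, 0.8, 0.7]  # pulse heights in volts
true = [1, 1, 1, 2, 1, 1, 2, 2]
pred = [1, 2, 1, 2, 1, 1, 1, 2]
print(accuracy_per_pulse_height(ph, true, pred))
```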
-
events_saturated_histogram(figsize=None, bins='auto', verb=False, ylog=False) Plots histograms of all event pulses and of the strongly saturated event pulses in a single plot.
- Parameters
figsize (tuple) – Optional, default None. Changes the overall figure size.
bins (int or str) – Optional, default ‘auto’. Bins for the histograms.
ylog (bool) – Optional, default False. If True the y axis is in log scale.
-
get_data(what='all', verb=False) Getter function that returns the data.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all data, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of the data selected by the parameter what.
- Return type
array
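All getter functions share the what parameter. Assuming the train/test split is stored as two lists of indices (an assumption; this page does not say how the split is stored), the selection can be sketched with a hypothetical helper select. Applying the same indices to every property keeps event numbers, data and labels aligned.

```python
def select(values, what="all", train_idx=(), test_idx=()):
    """Hypothetical helper mirroring the what parameter of the getters:
    'all' returns everything, 'train'/'test' return the entries at the stored indices."""
    if what == "all":
        return list(values)
    if what == "train":
        return [values[i] for i in train_idx]
    if what == "test":
        return [values[i] for i in test_idx]
    raise ValueError("what must be 'all', 'train' or 'test'")


event_nbrs = [100, 101, 102, 103, 104]
train_idx, test_idx = [0, 2, 4], [1, 3]  # hypothetical split
print(select(event_nbrs, "test", train_idx=train_idx, test_idx=test_idx))  # [101, 103]
```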
-
get_event_nbrs(what='all', verb=False) Getter function that returns event_nbrs.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all event_nbrs, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of event_nbrs selected by the parameter what.
- Return type
array
-
get_events(what='all', verb=False) Getter function that returns the events.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all events, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of the events selected by the parameter what.
- Return type
array
-
get_features(what='all', verb=False) Getter function that returns the features.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all features, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of the features selected by the parameter what.
- Return type
array
-
get_file_nbrs(what='all', verb=False) Getter function that returns file_nbrs.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all file_nbrs, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of file_nbrs selected by the parameter what.
- Return type
array
-
get_filepaths(what='all', verb=False) Getter function that returns the file path of every event.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all file paths, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of the file paths selected by the parameter what.
- Return type
array
-
get_label_nbrs(what='all', verb=False) Getter function that returns label_nbrs.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all label_nbrs, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of label_nbrs selected by the parameter what.
- Return type
array
-
get_labels_in_color(what='all', verb=False) Getter function that returns labels_in_color, which can be useful for plotting.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all labels_in_color, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of labels_in_color selected by the parameter what.
- Return type
array
-
get_mainpar(what='all', verb=False) Getter function that returns mainpar.
- Parameters
what (str, optional) – Defines what is returned: ‘all’ for all mainpar, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of mainpar selected by the parameter what.
- Return type
array
-
get_pred(pred_method, what='all', verb=False) Getter function that returns the prediction of a method. The prediction must first be added and is then selected via the pred_method parameter, using the same name that was given when it was added.
- Parameters
pred_method (str) – Selects the prediction of a certain prediction method; must be the same string as when the prediction was added.
what (str, optional) – Defines what is returned: ‘all’ for all predictions, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of the prediction selected by the chosen prediction method (pred_method) and the parameter what.
- Return type
array
-
get_pred_in_color(pred_method, what='all', verb=False) Getter function that returns the color-coded prediction of a method. The prediction must first be added and is then selected via the pred_method parameter, using the same name that was given when it was added.
- Parameters
pred_method (str) – Selects the prediction method whose prediction is color coded; must be the same string as when the prediction was added.
what (str, optional) – Defines what is returned: ‘all’ for all color-coded predictions, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
The part of the color-coded predictions selected by the chosen prediction method (pred_method) and the parameter what.
- Return type
array
-
get_pred_true_labels(pred_method) Returns whether the labels in the prediction correspond to the actual labels, as in the dict self.labels.
- Parameters
pred_method (str) – Abbreviation of the chosen prediction method.
- Returns
True if the labels of the chosen prediction method correspond to the actual labels, otherwise False.
- Return type
bool
-
get_test(verb=False) Getter function that returns the data of the test set in the following order: 1. event_nbrs 2. data 3. features 4. file_nbrs 5. label_nbrs
- Parameters
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
tuple of size 6, where every entry in this tuple is an array
- Return type
tuple
-
get_train(verb=False) Getter function that returns the data of the training set in the following order: 1. event_nbrs 2. data 3. features 4. file_nbrs 5. label_nbrs
- Parameters
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
- Returns
tuple of size 6, where every entry in this tuple is an array
- Return type
tuple
-
plot_event(index, what='all', plot_mainpar=False, text=None, verb=False) Plots the single event at the given index.
- Parameters
index (int) – Index of the event to plot, with respect to the set chosen by the what parameter.
what (str, optional) – Defines which set the index refers to: ‘all’ for all events, ‘test’ for the test set and ‘train’ for the training set; defaults to ‘all’.
plot_mainpar (bool, optional) – If True, the main parameters are added to the plot; defaults to False.
text (str, optional) – Text that is added to the plot; defaults to None.
verb (bool, optional) – Enables additional output, which can be useful for debugging; defaults to False.
-
plot_labels_distribution(figsize=None) Uses a bar graph to visualize how often each label occurs in the dataset.
- Parameters
figsize (tuple) – Optional, default None. Changes the overall figure size.
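The bar heights of such a label distribution are simple counts. A minimal sketch with collections.Counter, assuming the labels are available as a list of label numbers; the helper name is hypothetical.

```python
from collections import Counter


def label_distribution(label_nbrs):
    """Count how often each label number occurs, sorted by label number.
    These counts are the bar heights of a label-distribution bar graph."""
    return sorted(Counter(label_nbrs).items())


print(label_distribution([2, 1, 2, 3, 2, 1]))  # [(1, 2), (2, 3), (3, 1)]
```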
-
plt_pred_with_pca(pred_methods, xy_comp=(1, 2), what='all', plt_labels=True, figsize=None, as_cols=False, rdseed=None, dot_size=5, verb=False) Plots the data with PCA for one or a list of prediction methods, to compare the different labels.
- Parameters
pred_methods (list of str) – The prediction methods that should be used.
xy_comp (tuple) – Optional, default (1, 2). Selects which principal components are used for the x and y axes.
what (str) – Optional, default ‘all’. Which data is plotted: ‘all’, ‘test’ or ‘train’.
plt_labels (bool) – Optional, default True. Adds a subplot with the labels.
figsize (tuple) – Optional, default None. Sets the figure size of the plot.
as_cols (bool) – Optional, default False. If True subplots are arranged in columns.
rdseed (int) – Optional, default None. Random seed for numpy random.
dot_size (int) – Optional, default 5. Size of the points in the scatter plot.
verb (bool) – Optional, default False. If True, additional output is printed.
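The projection behind a PCA plot can be sketched with NumPy's SVD. This is a standard PCA, not necessarily the implementation used here (which may, for example, rely on scikit-learn), and it assumes that xy_comp counts components starting at 1, as the default (1, 2) suggests. Each point would then be colored by the prediction of each method.

```python
import numpy as np


def pca_projection(data, xy_comp=(1, 2)):
    """Project data (n_events x n_features) onto two principal components.
    xy_comp is 1-based, so (1, 2) selects the first and second component."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[[xy_comp[0] - 1, xy_comp[1] - 1]]
    # Shape (n_events, 2): x and y coordinates for the scatter plot.
    return centered @ axes.T


rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
data[:, 0] *= 10.0  # give the first feature by far the largest variance
xy = pca_projection(data)
print(xy.shape)  # (100, 2)
```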
-
plt_pred_with_pca_plotly(pred_methods, xy_comp=(1, 2), what='all', rdseed=None, verb=False) Plots the data with PCA, using plotly, for one or a list of prediction methods, to compare the different labels.
- Parameters
pred_methods (str) – Required. Prediction method that should be used.
xy_comp (tuple) – Optional, default (1, 2). Selects which principal components are used for the x and y axes.
what (str) – Optional, default ‘all’. Which data is plotted.
rdseed (int) – Optional, default None. Random seed for numpy random.
verb (bool) – Optional, default False. Additional output is printed.
-
plt_pred_with_tsne(pred_methods, what='all', plt_labels=True, figsize=None, perplexity=30, as_cols=False, rdseed=None, dot_size=5, verb=False) Plots the data with TSNE for one or a list of prediction methods, to compare the different labels.
- Parameters
pred_methods (list of str) – The prediction methods that should be used.
what (str) – Optional, default ‘all’. Which data is plotted: ‘all’, ‘test’ or ‘train’.
plt_labels (bool) – Optional, default True. Adds a subplot with the labels.
figsize (tuple) – Optional, default None. Sets the figure size of the plot.
perplexity (int) – Optional, default 30. The perplexity parameter for TSNE.
as_cols (bool) – Optional, default False. If True subplots are arranged in columns.
rdseed (int) – Optional, default None. Random seed for numpy random.
dot_size (int) – Optional, default 5. Size of the points in the scatter plot.
verb (bool) – Optional, default False. If True, additional output is printed.
-
plt_pred_with_tsne_plotly(pred_methods, what='all', perplexity=30, rdseed=None, verb=False) Plots the data with TSNE, using plotly, for one or a list of prediction methods, to compare the different labels.
- Parameters
pred_methods (str) – Required. Prediction method that should be used.
what (str) – Optional, default ‘all’. Which data is plotted: ‘all’, ‘test’ or ‘train’.
perplexity (int) – Optional, default 30. Perplexity parameter for TSNE.
rdseed (int) – Optional, default None. Random seed for numpy random.
verb (bool) – Optional, default False. Additional output is printed.
-
pulse_height_histogram(ncols=2, extend_plot=False, figsize=None, bins='auto', verb=False) Plots histograms of the pulse heights, one subplot per label.
- Parameters
ncols (int) – Optional, default 2. Number of plots side by side.
extend_plot (bool) – Optional, default False. Sets the x axis of all histograms to the same limits.
figsize (tuple) – Optional, default None. Changes the overall figure size.
bins (int or str) – Optional, default ‘auto’. Bins for the histograms.
verb (bool) – Optional, default False. Outputs additional information.
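The effect of extend_plot can be sketched by grouping the pulse heights by label and computing the x limits each subplot would use. The helper per_label_ranges is hypothetical; it only illustrates the difference between per-label limits and one shared range.

```python
from collections import defaultdict


def per_label_ranges(pulse_heights, label_nbrs, extend_plot=False):
    """Group pulse heights by label and return the x limits of each label's
    histogram subplot. With extend_plot=True all subplots share the overall range."""
    groups = defaultdict(list)
    for ph, lab in zip(pulse_heights, label_nbrs):
        groups[lab].append(ph)
    if extend_plot:
        shared = (min(pulse_heights), max(pulse_heights))
        return {lab: shared for lab in sorted(groups)}
    return {lab: (min(v), max(v)) for lab, v in sorted(groups.items())}


ph = [0.2, 1.5, 0.4, 3.0]
labels = [1, 2, 1, 2]
print(per_label_ranges(ph, labels))                    # {1: (0.2, 0.4), 2: (1.5, 3.0)}
print(per_label_ranges(ph, labels, extend_plot=True))  # {1: (0.2, 3.0), 2: (0.2, 3.0)}
```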
-
save_prediction(pred_method, path, fname, channel) Saves the predictions as a CSV file.
- Parameters
pred_method (string) – The name of the model that made the predictions.
path (string) – Path to the folder that should contain the predictions, e.g. ‘predictions/’.
fname (string) – The name of the file, e.g. “bck_001”.
channel (int) – The number of the channel in the module, e.g. 0 for the phonon channel and 1 for the light channel.
verb (boolean) – Default False, if True additional output is printed.
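Writing predictions to CSV can be sketched with the standard csv module. The helper save_prediction_csv, the file-name pattern built from fname and channel, and the two-column layout are all hypothetical; the actual file name and columns written by save_prediction are not documented on this page.

```python
import csv
import os
import tempfile


def save_prediction_csv(pred, event_nbrs, path, fname, channel):
    """Write one row per event (event number, predicted label) to a CSV file.
    File-name pattern and column layout are assumptions for illustration."""
    os.makedirs(path, exist_ok=True)
    out = os.path.join(path, f"{fname}_ch{channel}_predictions.csv")
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["event_nbr", "pred_label"])
        writer.writerows(zip(event_nbrs, pred))
    return out


with tempfile.TemporaryDirectory() as tmp:
    out_file = save_prediction_csv([1, 3, 1], [0, 1, 2],
                                   os.path.join(tmp, "predictions"), "bck_001", channel=0)
    with open(out_file, newline="") as f:
        rows = list(csv.reader(f))
print(rows)  # [['event_nbr', 'pred_label'], ['0', '1'], ['1', '3'], ['2', '1']]
```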
-
set_data(data) Replaces the main parameters or time series with a chosen dataset.
- Parameters
data (array) – Dataset which is analysed.
-
set_features(features) If the StandardScaler is not used, the features have to be set manually, e.g. with this function.
- Parameters
features (array) – Manually generated features.
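When features are generated manually, one common choice is the same standardization that scikit-learn's StandardScaler performs: every feature column is shifted to zero mean and divided by its population standard deviation. A stdlib sketch, assuming the features are given as a list of rows (events) by columns (features); the helper name is hypothetical, and set_features itself accepts any feature array.

```python
from statistics import fmean, pstdev


def standardize(rows):
    """Scale every feature column to zero mean and unit variance, as
    StandardScaler does (population standard deviation, ddof=0)."""
    columns = list(zip(*rows))
    scaled_cols = []
    for col in columns:
        mu, sigma = fmean(col), pstdev(col)
        # Constant columns are only centered, to avoid division by zero.
        scaled_cols.append([(x - mu) / sigma if sigma > 0 else x - mu for x in col])
    # Transpose back to one row per event.
    return [list(r) for r in zip(*scaled_cols)]


features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
print(standardize(features))
```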
-