Analysis of the optimization results¶

We can analyse the data stored in optimization_results.csv and gp_model.pkl to find the similarities and differences between optimization methods, as well as check their performances in terms of metric improvements, exploration and exploitation of the hyperparameter space, rate of convergence, improvement trends, etc.

In this example we have performed the optimization of the MST-NectarCam telescope using three different optimization algorithms: random search, tree parzen estimators and gaussian processes. This way, we have obtained from CTLearn Optimizer three optimization_results files, that have been renamed to mstn_random.csv, mstn_tpe.csv and mstn_gp.csv; and one gp_model.pkl file. The main configuration of the optimization runs is show below:

Metric optimized: auc
Number of random evaluations: 20
Model optimized: single_tel
Iterations performed: 100
Space of hyperparameters to optimize:
- layer1_filters: [16, 64]
- layer2_filters: [16, 128]
- layer3_filters: [16, 256]
- layer4_filters: [16, 512]
- layer1_kernel: [2,10]
- layer2_kernel: [2,10]
- layer3_kernel: [2,10]
- layer4_kernel: [2,10].

Data loading and preprocessing¶

On this section we are going to import the required packages and load and preprocess the data for further analysis.

[3]:

# import the required packages
import pickle
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from skopt.plots import plot_evaluations, plot_objective
from ctlearn_optimizer.auxiliar_functions import df2result, plot_convergence


# load the optimization_results files as dataframes
mstn_random = pd.read_csv('mstn_random.csv')
mstn_tpe = pd.read_csv('mstn_tpe.csv')
mstn_gp = pd.read_csv('mstn_gp.csv')

# load the gp_model
gp_model = pickle.load(open('gp_model.pkl', 'rb'))


# prepare the data for plotting by converting dataframes to scipy.optimize.OptimizeResult format
# metric_col (str) – Name of the metric to optimize column
# param_cols (list) – Names of the hyperparameter columns

mstn_random_result = df2result(mstn_random,
                       metric_col = 'auc_val',
                       param_cols = ['layer1_filters', 'layer1_kernel', 'layer2_filters', 'layer2_kernel',
                                    'layer3_filters', 'layer3_kernel', 'layer4_filters', 'layer4_kernel'])

mstn_tpe_result = df2result(mstn_tpe,
                       metric_col = 'auc_val',
                       param_cols = ['layer1_filters', 'layer1_kernel', 'layer2_filters', 'layer2_kernel',
                                    'layer3_filters', 'layer3_kernel', 'layer4_filters', 'layer4_kernel'])
mstn_gp_result = df2result(mstn_gp,
                       metric_col = 'auc_val',
                       param_cols = ['layer1_filters', 'layer1_kernel', 'layer2_filters', 'layer2_kernel',
                                    'layer3_filters', 'layer3_kernel', 'layer4_filters', 'layer4_kernel'])

Fast review of the results¶

We can print the first rows of the dataframes sorted by the metric values in order to check the best results obtained by CTLearn Optimizer

[4]:

# show all the columns
pd.set_option('display.max_columns', None)

[5]:

# random search
mstn_random_sorted = mstn_random.sort_values('auc_val', ascending = False)
mstn_random_sorted.head(5)

[5]:

	loss	iteration	layer1_filters	layer2_filters	layer3_filters	layer4_filters	layer1_kernel	layer2_kernel	layer3_kernel	layer4_kernel	auc_val	acc_val	acc_gamma_val	acc_proton_val	loss_val	run_time
15	0.107249	16	43.0	121.0	249.0	494.0	5.0	6.0	6.0	9.0	0.892751	0.803882	0.813764	0.794211	0.817045	4713.880727
65	0.109556	66	46.0	111.0	130.0	328.0	7.0	3.0	10.0	9.0	0.890444	0.798864	0.769322	0.827776	0.833634	3441.692499
19	0.109655	20	61.0	87.0	233.0	459.0	5.0	9.0	5.0	5.0	0.890345	0.801940	0.804662	0.799276	0.827184	3829.715509
20	0.110015	21	35.0	72.0	166.0	436.0	7.0	7.0	10.0	5.0	0.889985	0.801536	0.815217	0.788147	0.827923	3150.214954
70	0.110091	71	54.0	116.0	136.0	342.0	7.0	9.0	4.0	8.0	0.889909	0.800413	0.859001	0.743075	0.831390	4065.117532

[6]:

# tree parzen estimators
mstn_tpe_sorted = mstn_tpe.sort_values('auc_val', ascending = False)
mstn_tpe_sorted.head(5)

[6]:

	iteration	loss	layer1_filters	layer1_kernel	layer2_filters	layer2_kernel	layer3_filters	layer3_kernel	layer4_filters	layer4_kernel	auc_val	acc_val	acc_gamma_val	acc_proton_val	loss_val	run_time
83	84	0.107111	58.0	5.0	109.0	8.0	183.0	8.0	315.0	4.0	0.892889	0.801051	0.858612	0.747094	0.821363	3167.990778
82	83	0.107507	59.0	6.0	124.0	7.0	189.0	10.0	207.0	4.0	0.892493	0.797940	0.868293	0.731991	0.827550	3159.622981
71	72	0.107807	62.0	6.0	94.0	6.0	204.0	10.0	130.0	4.0	0.892193	0.802589	0.827305	0.779421	0.819639	3043.263056
63	64	0.108042	49.0	5.0	109.0	5.0	204.0	6.0	414.0	4.0	0.891958	0.802729	0.824535	0.782288	0.819942	6947.555477
33	34	0.108045	44.0	10.0	120.0	6.0	194.0	10.0	133.0	10.0	0.891955	0.802461	0.792506	0.811793	0.825479	5820.490232

[7]:

# gaussian processes
mstn_gp_sorted = mstn_gp.sort_values('auc_val', ascending = False)
mstn_gp_sorted.head(5)

[7]:

	loss	iteration	layer1_filters	layer2_filters	layer3_filters	layer4_filters	layer1_kernel	layer2_kernel	layer3_kernel	layer4_kernel	auc_val	acc_val	acc_gamma_val	acc_proton_val	loss_val	run_time
92	0.107388	93	64.0	128.0	256.0	239.0	10.0	5.0	10.0	10.0	0.892612	0.800930	0.777606	0.823755	0.825392	3585.451714
49	0.108346	50	30.0	122.0	253.0	292.0	6.0	6.0	7.0	10.0	0.891654	0.802131	0.800667	0.803563	0.820788	3474.704604
84	0.108458	85	64.0	128.0	256.0	512.0	2.0	6.0	10.0	8.0	0.891542	0.802950	0.844383	0.762401	0.822343	4077.138633
93	0.108585	94	64.0	128.0	256.0	512.0	10.0	6.0	10.0	8.0	0.891415	0.802636	0.828790	0.777040	0.819367	4140.540171
20	0.108707	21	64.0	128.0	256.0	512.0	10.0	10.0	10.0	10.0	0.891293	0.800099	0.774065	0.825577	0.831538	4840.163612

Visualization utilities¶

The following plots are useful to compare the behaviour of the optimization algorithms

Plot metric values versus iteration¶

In the graphic below we can see that the trend of the metric values obtained by the algorithms is positive in the tree parzen estimators and gaussian processes cases, while it is negative for the random search algorithm. This could be because the first two algorithms are taking advantage of an improved surrogate that gets closer to the actual objective function as the optimization process moves on; while the third method is randomly sampling the search space.

[18]:

sns.regplot(mstn_random['iteration'], mstn_random['auc_val'], label = 'mstn_random')
sns.regplot(mstn_tpe['iteration'], mstn_tpe['auc_val'], label = 'mstn_tpe')
sns.regplot(mstn_gp['iteration'], mstn_gp['auc_val'], label = 'mstn_gp')

plt.xlabel('Iteration');
plt.ylabel('ROC AUC' );
plt.title('Validation ROC AUC versus iteration'), plt.legend(loc="best");

Plot convergence¶

It is interesting to visualize the progress of the optimization algorithms by showing the best to date result at each iteration, this way we can know which algorithm is the best in terms of rate of convergence.

[8]:

# results = list of (name, results) tuples
results = [('mstn_random', mstn_random_result), ('mstn_tpe', mstn_tpe_result),('mstn_gp', mstn_gp_result)]
plt.figure(figsize=(14,7))
plot_convergence(*results)
plt.show()

Plot search evolution¶

By using this plot we can see the evolution of the search performed by the optimization algorithm, we see the histograms of explored values and, for each pair of hyperparameters, the scatter plot of sampled values is plotted with the evolution represented by color, from blue to yellow.

In the graphics below we can see that each optimization method converge to certain parts of space that are considered more promising by the algorithm and, therefore, are explored further. These regions are different for each algorithm.

[27]:

# random search
plot_evaluations(mstn_random_result)
plt.show()

[28]:

# tree parzen estimators
plot_evaluations(mstn_tpe_result)
plt.show()

[29]:

# gaussian processes
plot_evaluations(mstn_gp_result)
plt.show()

Plot objective¶

The gaussian processes algorithm allows to plot the pairwise partial dependece of the objective function for each dimension of the space of hyperparameters.

By making these graphs we can gain intuition into the objective function sensitivity with respect to hyperparameters. This way we can decide which parts of the space may require more fine-grained search and which hyperparameters barely affect the score and can potentially be dropped from the search.

For example, in the graphic below we can see from the charts that the changes in the kernel size of the first layer don’t affect the metric score.

[11]:

# dimensions = list of hyperparameter names
plot_objective(gp_model, dimensions = ['layer1_filters', 'layer2_filters', 'layer3_filters', 'layer4_filters',
                                       'layer1_kernel','layer2_kernel','layer3_kernel','layer4_kernel'])
plt.show()