Table 3: For each method (default, CMA-ES, Genetransferer
or ANN), the percentage of instances on which this
method gave the best parameter set. Each cell shows two figures:
the first considers all occurrences of a method, no
matter whether another method also led to an equivalent parameter
set as good as the first one. The second figure considers
only the first method (from left to right) that discovered the
best parameter set.
ations of CMA-ES, followed by one ANN training and one
Genetransferer. Due to time constraints, only a few iterations
of LaO were run; for example, in domain Grid only
10, and CMA-ES was called 50 times in total.
The ANN had 3 fully connected layers; each layer had
12 neurons, corresponding to the number of parameters and
features, respectively. The standard back-propagation algorithm
was used for learning (the default in FANN). In one iteration
of LaO, the ANN was trained for only 50 iterations (i.e.
epochs) without resetting the weights, in order to (i) avoid
over-training, and (ii) make a gradual transition from the
previous best parameter set to the new best one, eventually
trying some intermediate values. Hence, in domain Grid,
over the 10 iterations of LaO, 500 epochs of ANN training
were carried out in total. Note, however, that the best
parameters were trained with far fewer iterations, depending
on the time of their discovery. In the worst case, if the
best parameter set was found in the last iteration of LaO, it was
trained for only 50 epochs (and not used anymore). This explains
why retraining is needed at the end.
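The incremental-training idea above can be illustrated with a minimal numpy sketch (not the actual FANN implementation; network sizes match the text, but data, learning rate and class names are illustrative assumptions): the network's weights persist across calls, so each LaO iteration continues training from the previous state.

```python
import numpy as np

class TinyMLP:
    """Minimal 3-layer (12-12-12) fully connected network trained by plain
    back-propagation; a hypothetical stand-in for the FANN network."""
    def __init__(self, n_in=12, n_hidden=12, n_out=12, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))

    def forward(self, X):
        self.H = np.tanh(X @ self.W1)   # hidden activations, cached for backprop
        return self.H @ self.W2         # linear output layer

    def train(self, X, Y, epochs=50, lr=0.01):
        # Weights are NOT reset between calls: each LaO iteration resumes
        # from the previous state, giving the gradual transition described.
        for _ in range(epochs):
            err = self.forward(X) - Y
            dW2 = self.H.T @ err
            dH = (err @ self.W2.T) * (1 - self.H ** 2)  # tanh derivative
            dW1 = X.T @ dH
            self.W1 -= lr * dW1 / len(X)
            self.W2 -= lr * dW2 / len(X)
        return float(np.mean((self.forward(X) - Y) ** 2))

# 10 LaO iterations x 50 epochs = 500 epochs in total, as in domain Grid.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 12))          # instance features (toy data)
Y = rng.normal(size=(20, 12)) * 0.1    # best parameter sets (toy data)
net = TinyMLP()
mse_history = [net.train(X, Y, epochs=50) for _ in range(10)]
```

After the 10 calls, `mse_history` shows the training error shrinking across LaO iterations even though each call performs only 50 epochs.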
A parameter set in LaO may come from different sources:
it can be the default parameter set, it can come from
CMA-ES or the Genetransferer, or it can result from applying
the trained ANN to the instance features. Table 3 shows
how each source contributes to the best overall parameter settings.
For each possible source, the first number is the
ratio with which the source contributed to the best result when
ties are taken into account; the second number shows the same
when only the first best parameter set is taken into account.
Note that the order in which the "sources" are called in LaO
is the same as in the table: for example, if CMA-ES
found a different parameter setting with the same fitness
as the default, this case is not included in the first ratio,
but is in the second. Analyzing both numbers leads to
the following conclusions. For domain Mprime, the default
parameter setting was optimal for 45% of the instances;
however, only in 2% of the instances was there no other
parameter setting found with the same quality. The reason
is that makespan values for Mprime are mostly single-digit
numbers. Consequently, there is no possibility for a
small improvement: an improvement is rarer (55%), but
those improvements are naturally high in ratio. In the domain
Freecell, the share of the ANN is quite high (18%); moreover,
in most cases the other sources did
not find a parameter set with the same performance (17%).
The Genetransferer in Freecell takes an equal share (18%) of
all the best parameters, but only a part of them (8%) were
unique. Note that CMA-ES was returning the first hint in
each iteration and had 5 times more opportunities than the
ANN. Taking this into account, it is clear that both the
ANN and the Genetransferer made an important contribution
to the optimization.
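One plausible reading of the two figures, following the caption of Table 3 (the first figure counts every source that achieved the best fitness, ties included; the second credits only the leftmost such source), can be sketched as follows; the data and function names are purely illustrative:

```python
from collections import Counter

# Left-to-right calling order of the sources in LaO, as in Table 3.
SOURCES = ["default", "CMA-ES", "Genetransferer", "ANN"]

def table3_counts(instances):
    """instances: list of dicts mapping source -> best fitness achieved
    on that instance (lower is better). Returns two dicts of percentages:
    (all occurrences including ties, leftmost winner only)."""
    with_ties, first_only = Counter(), Counter()
    for results in instances:
        best = min(results.values())
        winners = [s for s in SOURCES if results.get(s) == best]
        for s in winners:                # first figure: every tying source
            with_ties[s] += 1
        first_only[winners[0]] += 1      # second figure: leftmost source only
    n = len(instances)
    pct = lambda c: {s: 100.0 * c[s] / n for s in SOURCES}
    return pct(with_ties), pct(first_only)

# Toy example: on 2 of 4 instances CMA-ES merely ties the default.
data = [
    {"default": 10, "CMA-ES": 10, "Genetransferer": 12, "ANN": 11},
    {"default": 10, "CMA-ES": 10, "Genetransferer": 12, "ANN": 11},
    {"default": 10, "CMA-ES": 8,  "Genetransferer": 9,  "ANN": 8},
    {"default": 10, "CMA-ES": 9,  "Genetransferer": 9,  "ANN": 7},
]
ties, first = table3_counts(data)
```

In the toy data, CMA-ES appears among the best on 3 of 4 instances (75%) but is the leftmost winner on only 1 (25%), showing how ties inflate the first figure relative to the second.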
LaO ran for several weeks on a cluster. However, this
cluster was not dedicated to our experiments, i.e. only
a small number of 4- or 8-core processors were available for
each domain on average. After stopping LaO, retraining was
performed with 300 ANN epochs on the best data, because
the ANNs saved directly from LaO may be under-trained.
The MSE of the ANN did not decrease with more
epochs, which indicates that 300 iterations are enough, at
least for this amount of data and for this size of ANN.
Tests with 1000 iterations did not produce better results,
and neither did training the ANN only with the first-found
best parameters.
The controlled parameters of DaE are described in Table 2.
For a detailed description of these parameters, see [4]. The
feature set consists of 12 features. The first 5 features are
computed from the domain file, after the initial grounding
of YAHSP: the numbers of fluents, goals, predicates, objects and
types. One further feature, which we think could be even more
important, is called mutex-density: the number of
mutexes divided by the number of all fluent-pairs. Since
mutexes are a kind of obstacle in planning, a higher density
indicates more difficulty in finding the solution. We also
kept 6 less important features: the number of lines, words and
bytes (obtained with the Linux command "wc") of the
instance and the domain file. These features were kept only
for historical reasons: they were used in the beginning as
some "dummy" features.
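The mutex-density feature is simple to state in code. A minimal sketch, assuming the grounding step yields the set of fluents and the set of mutex fluent-pairs (function and variable names are illustrative):

```python
def mutex_density(fluents, mutexes):
    """Number of mutex pairs divided by the number of all fluent pairs,
    i.e. C(n, 2) for n fluents."""
    n = len(fluents)
    all_pairs = n * (n - 1) // 2
    if all_pairs == 0:
        return 0.0
    return len(mutexes) / all_pairs

# Toy grounding: 4 fluents, 2 of the C(4,2) = 6 pairs are mutex.
fluents = ["at-a", "at-b", "at-c", "holding-x"]
mutexes = {frozenset(p) for p in [("at-a", "at-b"), ("at-a", "at-c")]}
density = mutex_density(fluents, mutexes)  # 2 / 6
```

A density near 1 would mean almost every pair of fluents is mutually exclusive, which, per the argument above, signals a harder instance.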
Since testing was also carried out on the cluster, the termination
criterion for testing was likewise the number of evaluations
for each instance. To evaluate the quality improvement,
the quality-ratio metric defined in the IPC competitions was
used. The baseline qualities come from the default parameter
setting: the ratio of the fitness value for the default parameters
and the tuned parameters was computed, and the average
was taken over the instances in the training or test set.
Q = Fitness_baseline / Fitness_tuned    (2)
Note that there were no unsolved instances in the training set,
because instances not solved with the default parameters
were dropped from the experiment.
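The metric of Eq. (2) amounts to a per-instance ratio followed by an average. A minimal sketch (lower fitness is better, so Q > 1 means the tuned setting improved on the default; the names are illustrative):

```python
def quality_ratio(fitness_baseline, fitness_tuned):
    """Average over instances of Fitness_baseline / Fitness_tuned, Eq. (2)."""
    ratios = [b / t for b, t in zip(fitness_baseline, fitness_tuned)]
    return sum(ratios) / len(ratios)

# Toy data: three instances; the tuned setting improves two of them.
baseline = [10.0, 8.0, 6.0]  # fitness with the default parameter setting
tuned    = [ 8.0, 8.0, 5.0]  # fitness with the tuned parameter setting
q = quality_ratio(baseline, tuned)  # (1.25 + 1.0 + 1.2) / 3 = 1.15
```

Dropping instances unsolved under the default parameters, as described above, guarantees every baseline fitness is defined, so the ratio always exists.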
Table 1 presents several quality-improvement ratios. The label
"in LaO" means that the best found parameters are compared
to the default. By definition, this ratio can never be less than
1, because the default values are the starting point of the
optimization. The improvement indicated by a high quality-ratio
is already useful if the very same instances used in
training have to be optimized. Quality-improvement ratios
for the retrained ANN on both the training set and the test set
are also presented. In these latter cases, numbers less than
1 are possible (the parameters produced by the retrained
ANN can perform worse than the
default parameters), but were rare. As can be seen in Table
1, some quality gain in training was consistently achieved,
but the transfer of this improvement to the ANN model was
only partial. This phenomenon can appear because (i) the
mapping is ambiguous, or (ii) the ANN is not
complex enough for the mapping, or, most probably,
because the feature set is not representative enough.