To obtain statistical confidence we performed 250 independent
runs for each experiment. In each run a random initial population
was generated and supplied to all MuGA versions. Because the
problems vary in difficulty, each run was stopped after 30,000 function
evaluations in problems F3 and F4, 50,000 in Htrap1, and 100,000
in F3S. For each experiment we computed the average of the best
value found and the average number of evaluations needed to find the
optimum. For the latter, a run that did not find the optimum was
counted at its maximum number of evaluations. We also report the
success rate, that is, the percentage of runs that reached the
optimum.
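
The three statistics described above can be illustrated with a minimal sketch. It is not the code used in the experiments: the names `RunResult`, `summarize` and `max_evals` are hypothetical, and each run is assumed to record its best value and, if the optimum was reached, the number of evaluations spent up to that point.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class RunResult:
    """Outcome of one independent run (hypothetical record layout)."""
    best_value: float                # best value found during the run
    evals_to_optimum: Optional[int]  # None if the optimum was never reached


def summarize(runs: Sequence[RunResult], max_evals: int) -> dict:
    """Aggregate per-run outcomes into the three reported statistics."""
    if not runs:
        raise ValueError("at least one run is required")

    # A run that missed the optimum is counted at the full evaluation budget.
    evals = [
        r.evals_to_optimum if r.evals_to_optimum is not None else max_evals
        for r in runs
    ]
    successes = sum(1 for r in runs if r.evals_to_optimum is not None)

    return {
        "mean_best": mean([r.best_value for r in runs]),
        "mean_evals": mean(evals),
        "success_rate": 100.0 * successes / len(runs),  # percentage of runs
    }


if __name__ == "__main__":
    # Illustrative values only, not experimental data.
    demo = [
        RunResult(10.0, 1200),
        RunResult(8.0, None),
        RunResult(10.0, 800),
        RunResult(9.0, None),
    ]
    print(summarize(demo, max_evals=30_000))
```

Counting failed runs at the full budget makes the average number of evaluations a lower bound on the true cost of reaching the optimum, which is why the success rate is worth reading alongside it.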