Worse yet, it was not possible to reproduce locally the same results reported from an automated test run. It became evident that the harness itself performed calculations (it would drop the highest value per page, then report the average of the remaining cycles), and that Graph Server did as well (drop the highest page value, then average the pages together). The end result was that no historical data existed that could provide much value, and nobody understood the tests we were running.
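To make the effect concrete, here is a minimal sketch of how those two hidden aggregation steps would combine; the function names and the raw timings are hypothetical, not the actual harness or Graph Server code:

```python
def harness_page_average(cycle_times):
    """Drop the single highest cycle time for a page, average the rest."""
    remaining = sorted(cycle_times)[:-1]
    return sum(remaining) / len(remaining)

def graph_server_run_average(page_averages):
    """Drop the highest per-page average, average the remaining pages."""
    remaining = sorted(page_averages)[:-1]
    return sum(remaining) / len(remaining)

# Hypothetical raw timings (seconds) for three pages over four cycles.
raw = {
    "home":     [1.2, 1.4, 3.9, 1.3],
    "search":   [2.1, 2.0, 2.2, 5.5],
    "checkout": [3.0, 2.8, 2.9, 3.1],
}

page_avgs = [harness_page_average(times) for times in raw.values()]
reported = graph_server_run_average(page_avgs)

# The single number that ends up on the report bears no obvious relation
# to the raw timings, which is why local runs never matched it.
print(f"per-page averages: {page_avgs}")
print(f"reported value:    {reported:.2f}")
```

With each layer silently discarding an outlier before averaging, the final reported number could not be recomputed from the raw data without knowing both rules, which is exactly why local runs never matched the automated reports.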