We considered sample sizes from 20,000 to 80,000.
Samples of these sizes are large enough to give good
approximations and small enough to be handled in
main memory. Since our approach is probabilistic, we
repeated every experiment 100 times for each parameter
combination. Altogether, over 10,000 trials were
run. We did not experiment with all the frequency
thresholds used in the literature; the repeated trials
would have taken too long. The tests were run on a
PC with a 90 MHz Pentium processor and 32 MB of main
memory, running the Linux operating system.
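
The repeated-trial protocol described above can be sketched as follows. This is a minimal illustration, not the code used in our experiments: the toy database, the function names `sample_frequency` and `run_trials`, the fixed seed, and the choice to sample without replacement are all assumptions made for the example.

```python
import random

def sample_frequency(database, itemset, sample_size, rng):
    """Estimate the frequency of `itemset` from one random sample of rows."""
    # Assumption: rows are drawn without replacement.
    sample = rng.sample(database, sample_size)
    hits = sum(1 for row in sample if itemset <= row)
    return hits / sample_size

def run_trials(database, itemset, sample_sizes, repetitions, seed=0):
    """Repeat the sampling experiment `repetitions` times per sample size."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    results = {}
    for size in sample_sizes:
        results[size] = [
            sample_frequency(database, itemset, size, rng)
            for _ in range(repetitions)
        ]
    return results

# Hypothetical toy database: 1,000 rows, {A, B} occurs in 30% of them.
database = [frozenset("AB") if i % 10 < 3 else frozenset("C")
            for i in range(1000)]

results = run_trials(database, frozenset("AB"),
                     sample_sizes=[100, 200], repetitions=100)
total_trials = sum(len(estimates) for estimates in results.values())
```

Each entry of `results` holds one frequency estimate per trial, so the spread of the estimates for a given sample size shows how much a single sample may deviate from the true frequency.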