Now the issue becomes selecting an appropriate
value for k. A large k is seemingly desirable, since
with a larger k (i) there are more performance estimates,
and (ii) the training set size is closer to the full
data size, thus increasing the likelihood that any conclusion
made about the learning algorithm(s) under
test will generalize to the case where all the data is used
to train the learning model. As k increases, however,
the overlap between training sets also increases. For
example, with 5-fold cross-validation, each training
set shares only 3/4 of its instances with each of the
other four training sets, whereas with 10-fold cross-validation,
each training set shares 8/9 of its instances
with each of the other nine training sets.
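The shared fraction follows directly from the fold structure: any two training sets differ only in the single fold each one holds out, so they share k - 2 of their k - 1 folds. The following Python sketch (purely illustrative, not tied to any particular toolkit) reproduces the 3/4 and 8/9 figures:

```python
# Fraction of instances that two k-fold training sets have in common.
# Each training set consists of k - 1 folds, and any two training sets
# differ only in the single fold each one holds out, so they share
# k - 2 of their k - 1 folds.
def shared_fraction(k):
    return (k - 2) / (k - 1)

for k in (5, 10):
    print(f"k = {k}: shared fraction = {shared_fraction(k):.3f}")
# k = 5:  0.750  (3/4)
# k = 10: 0.889  (8/9)
```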
Furthermore, increasing k shrinks the size of the test set, leading
to less precise, less fine-grained measurements of the
performance metric. For example, with a test set size of
10 instances, one can only measure accuracy to the
nearest 10%, whereas with 20 instances the accuracy
can be measured to the nearest 5%.
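The granularity limit is simply that each additional correct prediction on a test fold of n instances moves the measured accuracy by 1/n. A minimal sketch (with fold sizes matching the example above) makes the step size explicit:

```python
# Resolution of the accuracy measurement on a single test fold:
# one more correct prediction changes accuracy by 1 / test_size.
for test_size in (10, 20):
    print(f"test set of {test_size} instances: "
          f"accuracy resolution {1 / test_size:.0%}")
# test set of 10 instances: accuracy resolution 10%
# test set of 20 instances: accuracy resolution 5%
```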
These competing factors have all been considered, and the general consensus
in the data mining community seems to be
that k = 10 is a good compromise. This value of k is
particularly attractive because each model is trained on
90% of the data, making conclusions about the learning
algorithm more likely to generalize to the full data set.
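As a concrete illustration of this conventional setup, the following scikit-learn sketch runs 10-fold cross-validation; the synthetic data set and decision-tree classifier are placeholders chosen for the example, not prescribed by the text:

```python
# Sketch: 10-fold cross-validation with scikit-learn.
# The data set and classifier below are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# k = 10: each of the ten models is trained on 90% of the data and
# evaluated on the remaining 10%, giving ten accuracy estimates.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print("per-fold accuracy:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```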