Splits into fewer sets are preferable to splits into many sets, since they lead
to simpler and more meaningful decision trees. The number of elements in each
of the sets Si may also be taken into account; otherwise, whether a set Si has
0 elements or 1 element would change the count of sets substantially, even though
the split is the same for almost all the elements. The information content of a
particular split can be defined in terms of entropy as: