The dataset was extracted from a community hospital in the Midwest for the year 1996. Our
experiments used 10,000 care plans as the training set and 5,000 care plans as the test set.
We used two evaluation mechanisms, which we call random selection and greedy
selection. For random selection, we randomly select one of the items remaining in the
care plan and record its rank in the ordered list. For greedy selection, we always select the
remaining care-plan item that is ranked highest in the list. Both methods can be seen as
simulations of human behavior. When all the required items are near the top of the list, human
selection behaves like greedy selection. When the required items are ranked low in the list, users
are unlikely to be patient enough to scroll through it and would instead look up each needed item
in an alphabetical list. In this case, human selection behaves more like random selection. Actual
human performance is therefore likely to fall between the results of these two methods.
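The two evaluation mechanisms can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The ranker interface, the choice to drop already-selected items from the list before ranking, the 1-based ranks, and the toy popularity list and item names are all hypothetical. The code shows how random and greedy selection differ only in which remaining care-plan item is picked at each step.

```python
import random
from statistics import mean
from typing import Callable, FrozenSet, Iterable, List, Optional, Sequence

# Hypothetical ranker interface (an assumption, not the paper's API): given the
# items already selected, return every candidate item ordered from most to
# least recommended.
Ranker = Callable[[FrozenSet[str]], Sequence[str]]


def simulate_selection(care_plan: Iterable[str], ranker: Ranker, mode: str,
                       rng: Optional[random.Random] = None) -> List[int]:
    """Select every care-plan item one at a time and return the 1-based rank
    each item had in the ordered list at the moment it was chosen."""
    if mode not in ("random", "greedy"):
        raise ValueError(f"unknown mode: {mode!r}")
    rng = rng if rng is not None else random.Random()
    remaining = set(care_plan)
    selected: set = set()
    ranks: List[int] = []
    while remaining:
        # Assumption: items already selected are removed from the displayed list.
        ordered = [item for item in ranker(frozenset(selected)) if item not in selected]
        position = {item: i + 1 for i, item in enumerate(ordered)}
        if mode == "greedy":
            # Greedy: take the remaining care-plan item ranked highest (smallest rank).
            choice = min(remaining, key=position.__getitem__)
        else:
            # Random: take any remaining care-plan item with equal probability.
            choice = rng.choice(sorted(remaining))
        ranks.append(position[choice])
        remaining.discard(choice)
        selected.add(choice)
    return ranks


def mean_rank(test_plans, ranker: Ranker, mode: str, seed: int = 0) -> float:
    """Average rank over every selection made across a collection of test care plans."""
    rng = random.Random(seed)
    return mean(r for plan in test_plans
                for r in simulate_selection(plan, ranker, mode, rng))


# Toy usage with a hypothetical ranker that ignores context and always returns
# the same popularity order.
POPULARITY = ["vitals", "pain_assessment", "fall_risk", "skin_care", "diet", "wound_care"]


def popularity_ranker(selected: FrozenSet[str]) -> Sequence[str]:
    return POPULARITY


plan = {"vitals", "fall_risk", "wound_care"}
print(simulate_selection(plan, popularity_ranker, "greedy"))  # [1, 2, 4]
```

Running both modes over the same test care plans gives two results that, by the argument above, bracket the range in which actual human selection is likely to fall. Seeding the random generator makes the random-selection results reproducible across runs.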