1. Input the data set. Our goal is to predict whether a new student will like Data Mining (i.e., to predict LIKEDM). What should be done with the DATAMINE field?
2. Explore the relationship of LIKEDM to each individual field. What effect does each field seem to have on LIKEDM? Use histogram nodes with LIKEDM as an overlay. For the non-integer fields, you may wish to convert them temporarily to integer representations so that histograms can be drawn, or use the distribution node instead.
3. Create the default C5.0 tree. How many leaves does it have? What are the major predictors of LIKEDM?
4. For each of the following, use the tree from part (3) to predict whether the person will like Data Mining:
5. Create two alternative decision trees (build each independently) by: (a) selecting "generality" in the simple options; (b) decreasing the pruning severity to 10 in the expert options. Contrast each of these two trees with the tree from part (3) and comment on the differences.
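
As a cross-check on parts (2) and (3), the overlay counts and a rough ranking of predictors can be sketched outside the modeling tool. The Python below is only an illustration under stated assumptions: the records and the field names MAJOR and YEAR are hypothetical (only LIKEDM comes from the exercise), and plain information gain is used as a simple stand-in for the entropy-based splitting criterion of C5.0-style trees, not as a reproduction of what the tool computes.

```python
from collections import Counter, defaultdict
from math import log2


def overlay_counts(rows, field, target="LIKEDM"):
    """Count target values within each value of `field` (a text-mode overlay)."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[field]][row[target]] += 1
    return {value: dict(counts) for value, counts in table.items()}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, field, target="LIKEDM"):
    """Reduction in entropy of `target` obtained by splitting on `field`."""
    base = entropy([row[target] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy records; MAJOR and YEAR are made-up field names.
rows = [
    {"MAJOR": "CS", "YEAR": 1, "LIKEDM": "yes"},
    {"MAJOR": "CS", "YEAR": 2, "LIKEDM": "yes"},
    {"MAJOR": "ART", "YEAR": 1, "LIKEDM": "no"},
    {"MAJOR": "ART", "YEAR": 2, "LIKEDM": "no"},
]

for field in ("MAJOR", "YEAR"):
    print(field, overlay_counts(rows, field), round(information_gain(rows, field), 3))
```

In this toy data MAJOR separates the LIKEDM classes completely while YEAR carries no information, so MAJOR would be the field to expect near the root of a tree. On the real data set, the fields with the highest gain should broadly agree with the major predictors reported in part (3), although the tool's own criterion and pruning may order them differently.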