Assignment 2
Consider the data collected by a hypothetical video store for 50 regular customers. For each customer, the data table records the following attributes: Gender, Income, Age, Rentals (total number of video rentals in the past year), Avg. per visit (average number of video rentals per visit during the past year), Incidentals (whether the customer tends to buy incidental items such as refreshments when renting a video), and Genre (the customer's preferred movie genre). The data is available as an Excel spreadsheet.
Perform each of the following data preparation tasks:
a. Use smoothing by bin means to smooth the values of the Age attribute. Use a bin depth of 4.
b. Use min-max normalization to transform the values of the Income attribute onto the range [0.0, 1.0].
c. Use z-score normalization to standardize the values of the Rentals attribute.
d. Discretize the (original) Income attribute into the following categories: High = $60K+; Mid = $25K-$59K; Low = less than $25K.
e. Convert the original data (not the results of parts a-d) into the standard spreadsheet format. This requires creating, for every categorical attribute, one additional attribute for each value of that categorical attribute; numerical attributes in the original data remain unchanged.
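For part a, the following Python sketch shows one way to carry out equal-depth (equal-frequency) binning followed by smoothing by bin means. It assumes the Age values have already been read into a plain list; the function name `smooth_by_bin_means` and the sample ages are hypothetical, not taken from the spreadsheet. Because 50 is not a multiple of 4, the sketch also assumes the last bin simply holds the 2 leftover values.

```python
def smooth_by_bin_means(values, depth=4):
    """Equal-depth binning followed by smoothing by bin means.

    The values are sorted and split into consecutive bins of `depth` items
    (the last bin holds any remainder, an assumption). Every value is then
    replaced by the mean of its bin. Results are returned in the original
    record order so they can be pasted back next to each customer.
    """
    # Indices of the records, ordered by their value (stable for ties).
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), depth):
        bin_indices = order[start:start + depth]
        bin_mean = sum(values[i] for i in bin_indices) / len(bin_indices)
        for i in bin_indices:
            smoothed[i] = bin_mean
    return smoothed


# Hypothetical ages for 10 customers (not the assignment data).
ages = [25, 33, 19, 41, 52, 27, 30, 45, 22, 60]
print(smooth_by_bin_means(ages))
# [23.25, 37.25, 23.25, 37.25, 56.0, 23.25, 37.25, 37.25, 23.25, 56.0]
```

Here the sorted ages 19, 22, 25, 27 form the first bin, so each of them is replaced by their mean, 23.25.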
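For part b, min-max normalization maps each value v of Income to v' = (v - min) / (max - min) * (new_max - new_min) + new_min, which with the target range [0.0, 1.0] reduces to (v - min) / (max - min). The sketch below assumes the incomes are stored as plain numbers in dollars; the function name and the sample incomes are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so the denominator (max - min) would be zero.
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical incomes (not the assignment data).
incomes = [20000, 45000, 70000]
print(min_max_normalize(incomes))  # [0.0, 0.5, 1.0]
```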
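For part c, z-score normalization replaces each Rentals value v with (v - mean) / σ. The assignment does not say which standard deviation to use, so this sketch assumes the population standard deviation by default (`statistics.pstdev`) and offers the sample version (`statistics.stdev`, dividing by n - 1) as an option. The function name and sample values are hypothetical.

```python
from statistics import mean, pstdev, stdev


def z_score_normalize(values, sample=False):
    """Standardize values to mean 0 and standard deviation 1.

    By default the population standard deviation is used (an assumption);
    pass sample=True to use the sample standard deviation instead.
    """
    mu = mean(values)
    sigma = stdev(values) if sample else pstdev(values)
    if sigma == 0:
        raise ValueError("z-score normalization is undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


# Hypothetical rental counts (not the assignment data): mean 5, population sigma 2.
rentals = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score_normalize(rentals))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Whichever standard deviation is chosen, the standardized values always have mean 0; only their spread differs slightly.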
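For part d, discretization is a simple threshold test on the original Income values. This sketch assumes incomes are recorded in dollars (so $60K is 60,000). The stated categories leave incomes between $59K and $60K unassigned, so the sketch assumes anything from $25,000 up to, but not including, $60,000 is Mid. The function name and sample incomes are hypothetical.

```python
def discretize_income(income):
    """Map a dollar income to the category labels used in part d."""
    if income >= 60_000:
        return "High"
    if income >= 25_000:  # assumption: covers the $59K-$60K gap as well
        return "Mid"
    return "Low"


# Hypothetical incomes (not the assignment data).
incomes = [15000, 25000, 59999, 60000, 82000]
print([discretize_income(x) for x in incomes])
# ['Low', 'Mid', 'Mid', 'High', 'High']
```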
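For part e, the conversion amounts to one-hot (dummy) encoding: each categorical attribute is replaced by one 0/1 attribute per distinct value, while numerical attributes are copied unchanged. The sketch below assumes each customer is a dict keyed by attribute name and that the original categorical column is dropped once its 0/1 columns exist. The function name, the `Attribute=value` column-naming scheme, and the sample records are hypothetical.

```python
def to_standard_spreadsheet(records, categorical):
    """One-hot encode the attributes listed in `categorical`; keep all others as-is."""
    # Distinct values of each categorical attribute, in sorted order.
    levels = {attr: sorted({r[attr] for r in records}) for attr in categorical}
    rows = []
    for r in records:
        row = {}
        for attr, value in r.items():
            if attr in levels:
                # One 0/1 column per distinct value, e.g. "Genre=Drama".
                for level in levels[attr]:
                    row[f"{attr}={level}"] = 1 if value == level else 0
            else:
                row[attr] = value  # numerical attributes pass through unchanged
        rows.append(row)
    return rows


# Hypothetical records (not the assignment data).
customers = [
    {"Gender": "F", "Income": 45000, "Genre": "Drama"},
    {"Gender": "M", "Income": 30000, "Genre": "Action"},
]
print(to_standard_spreadsheet(customers, ["Gender", "Genre"])[0])
# {'Gender=F': 1, 'Gender=M': 0, 'Income': 45000, 'Genre=Action': 0, 'Genre=Drama': 1}
```

If pandas is available, `pandas.get_dummies(df, columns=["Gender", "Incidentals", "Genre"])` produces a comparable layout, with column names of the form `Genre_Drama`.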