Data Preparation
During data collection, the relevant data is
gathered and its quality must be verified. The
assembled data commonly contains missing or
incomplete attribute values, noise (errors, or
outlier values that deviate from what is
expected), and inconsistencies. Therefore, the
collected data must be cleaned and transformed
before it can be used in a data mining system,
because mining clean data produces better,
higher-quality results. Data cleaning involves
several processes, such as filling in missing
values, smoothing noisy data, identifying or
removing outliers, and resolving inconsistencies.
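The cleaning processes listed above can be sketched in Python as follows. This is a minimal, illustrative sketch, not the method used in this work. The records, the attribute names (`age`, `income`, `gender`) and the label map are hypothetical. Each step uses one common technique chosen as an assumption: inconsistent labels are mapped to canonical codes, outliers are found with the 1.5 × IQR rule, missing values are filled with the attribute mean, and noisy values are smoothed by bin means (equal-frequency binning).

```python
import statistics

# Hypothetical raw records: None marks a missing value, the gender labels are
# written inconsistently, and one income value looks like a data-entry error.
RAW = [
    {"age": 25, "income": 3000.0, "gender": "M"},
    {"age": None, "income": 3200.0, "gender": "male"},
    {"age": 31, "income": None, "gender": "F"},
    {"age": 29, "income": 2900.0, "gender": "Female"},
    {"age": 27, "income": 3100.0, "gender": "f"},
    {"age": 30, "income": 95000.0, "gender": "M"},
]
NUMERIC = ["age", "income"]
# Assumed canonical code for each inconsistent spelling.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def resolve_inconsistencies(records):
    """Map every gender label to one canonical code."""
    return [{**r, "gender": GENDER_MAP[r["gender"].strip().lower()]} for r in records]


def remove_outliers(records, attrs):
    """Drop records with a value outside the 1.5 * IQR fences (missing values are ignored)."""
    fences = {}
    for a in attrs:
        values = [r[a] for r in records if r[a] is not None]
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        fences[a] = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return [
        r for r in records
        if all(r[a] is None or fences[a][0] <= r[a] <= fences[a][1] for a in attrs)
    ]


def fill_missing(records, attrs):
    """Replace None with the mean of the observed values of that attribute."""
    means = {a: statistics.mean(r[a] for r in records if r[a] is not None) for a in attrs}
    return [{**r, **{a: means[a] for a in attrs if r[a] is None}} for r in records]


def smooth_by_bin_means(records, attr, bin_size):
    """Sort by attr, cut into bins of bin_size records, replace each value by its bin mean."""
    order = sorted(range(len(records)), key=lambda i: records[i][attr])
    smoothed = [dict(r) for r in records]
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        bin_mean = statistics.mean(records[i][attr] for i in idx)
        for i in idx:
            smoothed[i][attr] = bin_mean
    return smoothed


cleaned = resolve_inconsistencies(RAW)
cleaned = remove_outliers(cleaned, NUMERIC)
cleaned = fill_missing(cleaned, NUMERIC)
cleaned = smooth_by_bin_means(cleaned, "income", bin_size=2)
for row in cleaned:
    print(row)
```

The steps run in a different order from the list in the text. Outliers are removed before missing values are filled, so the extreme income of 95000 does not distort the mean used for imputation. The bin size of 2 is arbitrary and is chosen only to keep the example small.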
Then, the cleaned data is transformed into a
table format suitable for the data mining model.
This data is divided into two sets: training (or
learning) data, which makes up 60% of the records,
and validation data, which makes up the remaining
40%. The training data is used to develop the
model, while the validation data is used to verify
the chosen model.
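A minimal Python sketch of the last two steps follows: turning records into a table and making the 60/40 split. It rests on several assumptions. The records and column names are hypothetical. `to_table` and `train_validation_split` are illustrative helpers, not part of any library. The rows are shuffled before the split, using a fixed seed so the result can be reproduced. The text does not say how records are assigned to each set.

```python
import random

# Hypothetical cleaned records; in practice these come out of the cleaning step.
cleaned = [
    {"age": 20 + i, "income": 2500.0 + 50 * i, "gender": "M" if i % 2 else "F"}
    for i in range(10)
]
COLUMNS = ["age", "income", "gender"]  # assumed fixed column order of the table


def to_table(records, columns):
    """Turn dict records into rows holding one value per column, in a fixed order."""
    return [[r[c] for c in columns] for r in records]


def train_validation_split(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of the rows, then cut it into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


table = to_table(cleaned, COLUMNS)
train, validation = train_validation_split(table)
print(len(train), len(validation))  # 6 4
```

Shuffling before the cut helps when the records are stored in some meaningful order, such as by collection date. Without it, the validation set could differ systematically from the training set.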