based on the MapReduce-based procedure outperforms the baseline
and distributed procedures, which differs from the previous
results. In addition, its classification accuracy is the same (i.e. 85.71%)
no matter how many computer nodes are used. On the
other hand, the SVM based on the distributed procedure demonstrates
unstable performance when using different numbers of
computer nodes.
Fig. 11 shows the computational costs of training and testing
the SVM for the distributed and MapReduce-based procedures. On
this dataset, the baseline procedure takes 173,911 s (i.e. around
48 h), while the distributed procedure requires about 1–5 h for
the classifier training and testing steps. By contrast, the
MapReduce-based procedure requires the least training and testing
time, needing only 76 s when 10 computer nodes are used.
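To put these figures on a common scale, the short sketch below converts the baseline time to hours and computes the speedup of the 10-node MapReduce-based run over the baseline. Only the two timings come from the text; the function and variable names are illustrative assumptions.

```python
# Timings quoted in the text, in seconds (training + testing).
BASELINE_SECONDS = 173_911          # baseline procedure
MAPREDUCE_10_NODES_SECONDS = 76     # MapReduce-based procedure, 10 nodes


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


def speedup(reference_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the reference run."""
    return reference_seconds / new_seconds


baseline_hours = seconds_to_hours(BASELINE_SECONDS)
ratio = speedup(BASELINE_SECONDS, MAPREDUCE_10_NODES_SECONDS)

print(f"Baseline: {baseline_hours:.1f} h")
print(f"Speedup of the 10-node MapReduce run over the baseline: {ratio:.0f}x")
```

On these numbers the baseline corresponds to about 48.3 h, and the 10-node MapReduce-based run is roughly 2,300 times faster.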
Although the computational costs of the MapReduce-based procedure
are similar across different numbers of computer nodes
(i.e. about 1–2 min), the processing time gradually
increases as the number of computer nodes increases from
10 to 50.
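One way to picture a trend of this kind is a toy cost model in which the parallelisable work is divided among the nodes while a coordination cost grows with the node count. The sketch below is only an illustration under that assumption: the model form and the constants WORK and OVERHEAD_PER_NODE are hypothetical, use arbitrary time units, and are not fitted to the measurements reported here.

```python
# Toy model (an assumption, not taken from the paper):
#   time(n) = WORK / n + OVERHEAD_PER_NODE * n
# The first term shrinks as nodes are added; the second term grows.
WORK = 400.0               # hypothetical parallelisable work, arbitrary units
OVERHEAD_PER_NODE = 4.0    # hypothetical coordination cost per node


def modelled_time(nodes: int) -> float:
    """Return the modelled processing time for a given number of nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return WORK / nodes + OVERHEAD_PER_NODE * nodes


def best_node_count(candidates) -> int:
    """Return the candidate node count with the smallest modelled time."""
    return min(candidates, key=modelled_time)


for n in (10, 20, 30, 40, 50):
    print(f"{n:2d} nodes -> {modelled_time(n):6.1f} time units")

print("Best node count in 1..50:", best_node_count(range(1, 51)))
```

With these hypothetical constants the modelled time is lowest at 10 nodes and rises steadily towards 50 nodes, because the coordination term dominates once the per-node share of the work becomes small.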
These results indicate that increasing the number of computer
nodes does not necessarily reduce the processing time.
This is because, in the MapReduce framework, using a larger
number of computer nodes requires the training set to be
allocated to more workers (i.e. computer nodes) during the computation.
Therefore, more communication between different workers is
needed