The HDFS in the CloudBook system has built-in fault tolerance and
routes user requests to different servers to balance the load on the
cloud platform. However, a poor job scheduling algorithm introduces a
new problem on the cloud platform. If a user job is dispatched to the
server that already stores its required files, the performance of the
vector graphic converter improves. Otherwise, the required files must
be transmitted from other data nodes to the computing node, which
incurs overhead from the resulting network delay. Fig. 8 shows the
data locality problem in HDFS.
In Fig. 8, if the number of computational slots equals one, the
optimal scheduler achieves higher data locality in the resource
assignment. Hence, this study proposes a locality-aware scheduler and
applies it to the JobTracker.
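
To make the data locality problem concrete, the following Python sketch compares a locality-unaware FIFO assignment with a simple greedy locality-aware assignment. It is an illustration only, not the scheduler proposed in this study and not the Hadoop JobTracker API: the function names (fifo_assign, locality_aware_assign, count_local), the node and task labels, and the replica placement are all hypothetical. The greedy heuristic places the tasks with the fewest replica locations first; it is not guaranteed to be optimal in general.

```python
from typing import Dict, Set


def fifo_assign(tasks: Dict[str, Set[str]], free_slots: Dict[str, int]) -> Dict[str, str]:
    """Locality-unaware baseline: give each task the first node with a free slot."""
    slots = dict(free_slots)  # copy so the caller's slot counts are not modified
    assignment: Dict[str, str] = {}
    for task in sorted(tasks):
        for node in sorted(slots):
            if slots[node] > 0:
                assignment[task] = node
                slots[node] -= 1
                break
    return assignment


def locality_aware_assign(tasks: Dict[str, Set[str]], free_slots: Dict[str, int]) -> Dict[str, str]:
    """Greedy heuristic (an assumption, not the paper's algorithm).

    tasks maps a task name to the set of nodes holding a replica of its input.
    free_slots maps a node name to its number of free computational slots.
    """
    slots = dict(free_slots)
    assignment: Dict[str, str] = {}

    # Pass 1: try to run each task on a node that stores its input.
    # Tasks with the fewest replica locations are placed first.
    for task in sorted(tasks, key=lambda t: (len(tasks[t]), t)):
        for node in sorted(tasks[task]):
            if slots.get(node, 0) > 0:
                assignment[task] = node
                slots[node] -= 1
                break

    # Pass 2: any task still unplaced takes a remaining free slot (remote read).
    for task in sorted(tasks):
        if task in assignment:
            continue
        for node in sorted(slots):
            if slots[node] > 0:
                assignment[task] = node
                slots[node] -= 1
                break
    return assignment


def count_local(assignment: Dict[str, str], tasks: Dict[str, Set[str]]) -> int:
    """Number of tasks that run on a node holding their input data."""
    return sum(1 for task, node in assignment.items() if node in tasks[task])


# Hypothetical cluster: three nodes with one computational slot each.
example_slots = {"A": 1, "B": 1, "C": 1}
# Hypothetical replica placement for three tasks.
example_tasks = {"t1": {"A", "B"}, "t2": {"A"}, "t3": {"C"}}

fifo_local = count_local(fifo_assign(example_tasks, example_slots), example_tasks)
aware_local = count_local(locality_aware_assign(example_tasks, example_slots), example_tasks)
print(fifo_local, aware_local)
```

In this hypothetical placement, the FIFO baseline runs t2 on node B, which does not hold its input, so the file must be fetched over the network. The locality-aware pass assigns t2 to node A first and moves t1 to its other replica on node B, so every task reads its input locally.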