We found that Hadoop application execution can be further optimized on our system by controlling the number
and placement of DataNodes. Execution time
can be reduced by configuring the DataNodes only on cluster nodes closer to the storage-FPGA (S-FPGA). There are
three reasons for the reduction in execution time with this
optimization: (a) Running DataNodes on only a few nodes
reduces the total cluster memory consumed by DataNode daemons and leaves more of it available to the application. (b) With
fewer DataNodes, fewer interfering requests reach the virtualized disk
[27]. (c) Unlike traditional clusters, where network bandwidth is limited, our system has ample fabric bandwidth, which provides efficient data transport among cluster nodes
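
The placement policy described above can be illustrated with a minimal sketch. It assumes a hypothetical table of hop counts from each cluster node to the S-FPGA (the node names and distances below are invented for illustration, not measurements from our system) and picks the closest nodes as DataNode hosts. It then prints one hostname per line, the format of the host list that the Hadoop start scripts read (`workers` in Hadoop 3, `slaves` in earlier releases). Whether that file is the right mechanism for a given deployment is an assumption.

```python
def select_datanode_hosts(hops_to_sfpga, count):
    """Return the `count` hosts with the fewest hops to the S-FPGA.

    `hops_to_sfpga` maps hostname -> hop count (hypothetical input).
    Ties are broken by hostname so the result is deterministic.
    """
    if count < 1:
        raise ValueError("at least one DataNode host is required")
    ranked = sorted(hops_to_sfpga, key=lambda host: (hops_to_sfpga[host], host))
    return ranked[:count]


def render_host_list(hosts):
    """Render a host list with one hostname per line."""
    return "\n".join(hosts) + "\n"


if __name__ == "__main__":
    # Hypothetical topology: hop count from each node to the S-FPGA.
    hops = {"node0": 1, "node1": 1, "node2": 2, "node3": 3}
    chosen = select_datanode_hosts(hops, count=2)
    print(render_host_list(chosen), end="")
```

The number of DataNodes is left as a parameter because the best value depends on the workload: too few DataNodes can make the storage path a bottleneck, while too many reintroduce the memory overhead and disk interference listed above.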