• To the best of our knowledge, we are the first to consider the cost minimization problem of big data processing with joint consideration of data placement, task assignment and data routing. To describe the rate-constrained computation and transmission in big data processing, we propose a two-dimensional Markov chain and derive the expected task completion time in closed form.
• Based on the closed-form expression, we formulate the cost minimization problem as a mixed-integer nonlinear programming (MINLP) problem to answer the following questions: 1) how to place data chunks in the servers, 2) how to distribute tasks onto servers without violating the resource constraints, and 3) how to resize data centers to minimize the operation cost.
• To deal with the high computational complexity of solving the MINLP, we linearize it into a mixed-integer linear programming (MILP) problem, which can be solved using a commercial solver. Through extensive numerical studies, we show the high efficiency of our proposed joint-optimization based algorithm.

The rest of the paper is organized as follows. Section II summarizes the related work. Section III introduces our system model. The cost optimization is formulated as an MINLP problem in Section IV and then linearized in Section V. The theoretical findings are verified by experiments in Section VI. Finally, Section VII concludes our work.
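
As a generic illustration of the kind of step that turns an MINLP into an MILP, consider a product of a binary variable and a bounded continuous variable. The symbols x, y, z and U below are illustrative assumptions and are not taken from our formulation; the linearization actually used in this paper is the one given in Section V. A bilinear term z = xy with x binary and 0 ≤ y ≤ U can be replaced exactly by four linear constraints:

```latex
% Standard exact linearization of z = x y, assuming x \in \{0,1\} and 0 \le y \le U.
% x, y, z, U are hypothetical symbols used only for this illustration.
\begin{align}
  z &\le U x,            \\
  z &\le y,              \\
  z &\ge y - U (1 - x),  \\
  z &\ge 0.
\end{align}
% If x = 0: the first and last constraints force z = 0.
% If x = 1: the second and third constraints force z = y.
```

In both cases z equals xy, so the nonlinear term can be removed at the cost of one auxiliary variable and four linear constraints.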