In this paper, we jointly study data placement, task assignment,
data center resizing, and routing to minimize the overall
operational cost in large-scale geo-distributed data centers for
big data applications. We first model the data processing
procedure using a two-dimensional Markov chain and derive
the expected completion time in closed form, based on which
the joint optimization is formulated as a mixed-integer nonlinear programming (MINLP) problem.
To reduce the high computational complexity of solving this
MINLP, we linearize it into a mixed-integer linear programming (MILP) problem. Through
extensive experiments, we show that our joint optimization
solution substantially outperforms the two-step separate
optimization approach. The experimental results also reveal
several interesting phenomena.
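The MINLP-to-MILP step summarized above relies on linearization. This section does not say which linearization is used, so the sketch below shows only one standard, generic technique and should not be read as the exact reformulation in this paper. The technique replaces a product z = x·y of a binary variable x and a continuous variable 0 ≤ y ≤ U with a new variable z and four linear (big-M) constraints. The function name `linearized_bounds` and the bound `upper` are illustrative assumptions. The code computes the interval of z that those constraints allow and shows that the interval collapses to exactly x·y.

```python
def linearized_bounds(x: int, y: float, upper: float) -> tuple[float, float]:
    """Return the interval of z permitted by the linear constraints that
    replace the bilinear term z = x * y (hypothetical helper, generic technique).

    Assumes x is binary and 0 <= y <= upper.
    """
    if x not in (0, 1):
        raise ValueError("x must be binary")
    if not 0 <= y <= upper:
        raise ValueError("y must lie in [0, upper]")
    # Lower bounds: z >= 0 and z >= y - upper * (1 - x)
    lo = max(0.0, y - upper * (1 - x))
    # Upper bounds: z <= upper * x and z <= y
    hi = min(upper * x, y)
    return lo, hi


if __name__ == "__main__":
    U = 10.0  # illustrative upper bound on y
    for x in (0, 1):
        for y in (0.0, 2.5, 7.0, 10.0):
            lo, hi = linearized_bounds(x, y, U)
            # For x = 0 the interval is [0, 0]; for x = 1 it is [y, y].
            print(f"x={x}, y={y}: z in [{lo}, {hi}], x*y={x * y}")
```

Because the feasible interval for z always shrinks to the single point x·y, the nonlinear product can be dropped without changing the feasible set. This is what makes such a linearized model solvable by standard MILP solvers.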