The conventional wisdom for large-scale databases is to always
send the computation to the data, rather than the other way around.
In other words, one should send a small program over the network
to a node, rather than importing a large amount of data from the
node. Parallel DBMSs use knowledge of data distribution and location to their advantage: a parallel query optimizer strives to balance
computational workloads while minimizing the amount of data transmitted over the network connecting the nodes of the cluster.
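
The trade-off described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Node` class and the functions `pull_then_sum` and `push_then_sum` are hypothetical names invented for illustration, network cost is approximated by counting the values that cross the wire, and no real DBMS or optimizer is modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    """Hypothetical cluster node holding one partition of the data."""
    rows: List[int]

    def ship_all(self) -> List[int]:
        # Data-to-computation: every row crosses the network.
        return list(self.rows)

    def run_local(self, fn: Callable[[List[int]], int]) -> int:
        # Computation-to-data: only the small result of fn crosses the network.
        return fn(self.rows)


def pull_then_sum(nodes: List[Node]) -> Tuple[int, int]:
    """Import every partition, then aggregate centrally."""
    total, transferred = 0, 0
    for node in nodes:
        rows = node.ship_all()
        transferred += len(rows)  # one unit per row shipped
        total += sum(rows)
    return total, transferred


def push_then_sum(nodes: List[Node]) -> Tuple[int, int]:
    """Send the aggregate function to each node, then combine partial results."""
    total, transferred = 0, 0
    for node in nodes:
        partial = node.run_local(sum)
        transferred += 1  # one partial sum shipped per node
        total += partial
    return total, transferred


# Four nodes, each holding 1000 consecutive integers (made-up data).
nodes = [Node(list(range(i * 1000, (i + 1) * 1000))) for i in range(4)]
pull_total, pull_cost = pull_then_sum(nodes)
push_total, push_cost = push_then_sum(nodes)
print(pull_total == push_total, pull_cost, push_cost)
```

Both strategies compute the same sum, but in this model the first ships one value per row while the second ships one value per node. A real parallel optimizer weighs many more factors, such as data skew and per-node load.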