, and use this partial
execution to estimate the application's global behavior. The
choice of v is a trade-off between profiling accuracy and cost:
if v is lowered below a platform-dependent minimum threshold,
the estimation quality is no longer acceptable.
Assuming the profiled workload is $W_V$ ($W_V = n \times v$, with
$v = w$ in full profiling), and the profiled data-transfer size is
$O_V$ (the full data-transfer size, as we run all the computation
on the GPU), we can estimate $P_G/P_C$ and $P_G/Q$ as: