the developer’s efforts while supporting
dynamic, fine-grained parallelism.
Much can be learned from Web and
cloud services where abstraction layers
and domain-specific toolkits allow
developers to deploy custom execution
environments (virtual machines) and
leverage high-level services for reducing
complex data. The scientific
computing challenge is retaining expressivity
and productivity while also
delivering high performance.
Reformulation of science problems
and refactoring solution algorithms.
Many thousands of person-years have
been invested in current scientific and
engineering codes and in data mining
and learning software. Adapting scientific
codes to billion-way parallelism
will require redesigning, or even reinventing,
the algorithms and potentially
reformulating the science problems.
Integrating data-analytics software
and tools with computation is equally
daunting; programming languages
and models differ, as do the communities
and cultures. Understanding how
to do these things efficiently and effectively
will be key to solving mission-critical
science problems;
Ensuring correctness in the face of
faults, reproducibility, and algorithm
verification. With frequent transient
and permanent faults, lack of reproducibility
in collective communication,
and new mathematical algorithms
with limited verification, computation
validation and correctness assurance
will be much more important for the
next generation of massively parallel
systems, whether optimized for scientific
computing, data analysis, or both;
Mathematical optimization and uncertainty
quantification for discovery,
design, and decision. Large-scale computations
are themselves experiments
that probe the sample space of numerical
models. Understanding the sensitivity
of computational predictions
to model inputs and assumptions,
particularly when they involve complex,
multidisciplinary applications,
requires new tools and techniques
for application validation and assessment.
The equally important analogs
in large-scale data analytics and machine
learning are precision (the fraction
of retrieved data that is relevant)
and recall (the fraction of relevant data
retrieved); and
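The precision and recall definitions above can be sketched in a few lines of Python; the sets and function name here are illustrative, not from the text:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a retrieval task.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of the 4 retrieved items are relevant (precision 0.5);
# 2 of the 3 relevant items were retrieved (recall 2/3).
p, r = precision_recall({"a", "b", "c", "d"}, {"b", "d", "e"})
```

As with sensitivity analysis for simulations, these two measures trade off against each other: retrieving more data tends to raise recall while lowering precision.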