We present Resilient Distributed Datasets (RDDs), a distributed
memory abstraction that lets programmers perform
in-memory computations on large clusters in a
fault-tolerant manner. RDDs are motivated by two types
of applications that current computing frameworks handle
inefficiently: iterative algorithms and interactive data
mining tools. In both cases, keeping data in memory
can improve performance by an order of magnitude.
To achieve fault tolerance efficiently, RDDs provide a
restricted form of shared memory, based on coarse-grained
transformations rather than fine-grained updates
to shared state. However, we show that RDDs are expressive
enough to capture a wide class of computations, including
recent specialized programming models for iterative
jobs, such as Pregel, and new applications that these
models do not capture. We have implemented RDDs in a
system called Spark, which we evaluate through a variety
of user applications and benchmarks.
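
The idea of recovering lost data from coarse-grained transformations, rather than from logged fine-grained updates, can be illustrated with a minimal single-process sketch. Everything here is an assumption made for illustration: `ToyRDD`, `from_list`, `lose_partition`, and the eager per-partition cache are hypothetical names and choices, not Spark's API or implementation. The sketch only shows that a dataset which remembers how it was derived from its parent can rebuild a missing partition by re-running that derivation.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API)."""

    def __init__(self, num_partitions: int,
                 compute: Callable[[int], List],
                 parent: Optional["ToyRDD"] = None) -> None:
        self.num_partitions = num_partitions
        self._compute = compute            # recipe that builds partition i
        self.parent = parent               # lineage pointer to the source dataset
        self._cache: Dict[int, List] = {}  # partitions currently held in memory

    @staticmethod
    def from_list(data: List, num_partitions: int) -> "ToyRDD":
        # Split the input round-robin; this stands in for stable storage.
        chunks = [data[i::num_partitions] for i in range(num_partitions)]
        return ToyRDD(num_partitions, lambda i: list(chunks[i]))

    def map(self, f: Callable) -> "ToyRDD":
        # Coarse-grained: the same function is applied to every element.
        return ToyRDD(self.num_partitions,
                      lambda i: [f(x) for x in self.partition(i)],
                      parent=self)

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self.num_partitions,
                      lambda i: [x for x in self.partition(i) if pred(x)],
                      parent=self)

    def partition(self, i: int) -> List:
        # Serve from memory if present; otherwise rebuild from the recipe.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def collect(self) -> List:
        out: List = []
        for i in range(self.num_partitions):
            out.extend(self.partition(i))
        return out

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one in-memory partition.
        self._cache.pop(i, None)


numbers = ToyRDD.from_list(list(range(10)), num_partitions=2)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

before = even_squares.collect()
even_squares.lose_partition(0)    # the cached result for partition 0 is gone
after = even_squares.collect()    # partition 0 is recomputed via its lineage
```

In this toy, only the recipe for each dataset is retained, so recovery needs no replicated copy of the data and no log of individual writes; the trade-off is that only bulk operations such as `map` and `filter` can be expressed.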
