3.3 Data ingestion, distribution, and lifetime
Figure 2 shows the ingestion path of data into Scuba. Facebook’s
code base contains logging calls to import data into Scuba. As
events occur, these calls are executed and (after weeding out entries based on an optional sampling rate) log entries are written to
Scribe. Scribe is an open-source distributed messaging system for
collecting, aggregating, and delivering high volumes of log data
with low latency. It was developed by and is used extensively at
Facebook [5]. A tailer process then subscribes to the Scribe categories intended for Scuba and sends each batch of new rows to
Scuba via Scuba’s Thrift API. (Thrift [7] is a software library that
implements cross-language RPC communication for any interfaces
defined using it.) These incoming rows completely describe themselves, including their schema.
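The self-describing batches described above can be sketched as follows. This is an illustrative assumption about the batch layout, not Scuba's actual Thrift interface; the field names (columns, types, rows) and the helper make_batch are hypothetical.

```python
# Illustrative sketch of a self-describing row batch, as a tailer might
# send it to a leaf. Each batch carries its own schema alongside the
# rows, so the receiver needs no external metadata. All names here are
# assumptions for illustration, not Scuba's real Thrift definitions.

def make_batch(columns, types, rows):
    """Bundle rows together with the schema that describes them."""
    assert len(columns) == len(types)
    for row in rows:
        assert len(row) == len(columns), "row does not match schema"
    return {"columns": columns, "types": types, "rows": rows}

batch = make_batch(
    columns=["time", "server", "latency_ms"],
    types=["int", "string", "int"],
    rows=[
        [1357000000, "web001", 42],
        [1357000001, "web002", 17],
    ],
)
```

Because the schema travels with every batch, tables can gain or drop columns at any time without coordination between the loggers and the leaves.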
For each batch of incoming rows, Scuba chooses two leaves at
random and sends the batch to the leaf with more free memory.
The rows for each table thus end up partitioned randomly across all
leaves in the cluster. There are no indexes over any table, although
the rows in each batch have timestamps in a very short time window. (These time windows may overlap between batches, however,
since data is generated on many servers.)
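The placement rule above (sample two leaves, prefer the one with more free memory) is an instance of the "power of two choices" load-balancing scheme; a minimal sketch follows. The leaf representation and the free_bytes field are assumptions for illustration.

```python
import random

# Sketch of Scuba's batch placement: pick two leaves uniformly at
# random and send the batch to whichever has more free memory.
# The leaf dicts and the "free_bytes" field are illustrative assumptions.

def place_batch(leaves, rng=random):
    """Return (chosen, other): two randomly sampled leaves,
    with the batch going to the one with more free memory."""
    a, b = rng.sample(leaves, 2)
    return (a, b) if a["free_bytes"] >= b["free_bytes"] else (b, a)

leaves = [
    {"name": "leaf0", "free_bytes": 4 << 30},
    {"name": "leaf1", "free_bytes": 1 << 30},
    {"name": "leaf2", "free_bytes": 8 << 30},
]
chosen, other = place_batch(leaves)
```

Sampling two candidates rather than one keeps placement cheap while still steering batches away from the most loaded leaves, which is what spreads each table roughly evenly across the cluster.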
The leaf receiving the batch stores a gzip compressed copy of the
batch file to disk for persistence. It then reads the data for the new
rows, compresses each column, and adds the rows to the table in
memory. The elapsed time from an event occurring until it is stored
in memory and available for user queries is usually within a minute.
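The leaf-side steps above can be sketched as follows: persist a compressed copy of the batch to disk, then transpose the rows into columns and compress each column independently for the in-memory table. The file layout (gzip-compressed JSON) and the per-column codec (zlib) are illustrative assumptions, not Scuba's actual formats.

```python
import gzip
import json
import os
import tempfile
import zlib

# Sketch of a leaf handling an incoming batch: write a gzip-compressed
# copy to disk for persistence, then hold the rows column by column,
# compressing each column separately. The JSON file layout and the
# zlib per-column codec are assumptions made for this illustration.

def persist_batch(path, batch):
    """Write the raw batch to disk as gzip-compressed JSON."""
    with gzip.open(path, "wt") as f:
        json.dump(batch, f)

def to_compressed_columns(columns, rows):
    """Transpose rows into columns and compress each column separately."""
    return {
        name: zlib.compress(json.dumps([row[i] for row in rows]).encode())
        for i, name in enumerate(columns)
    }

batch = {"columns": ["time", "latency_ms"],
         "rows": [[1357000000, 42], [1357000001, 17]]}

path = os.path.join(tempfile.mkdtemp(), "batch.json.gz")
persist_batch(path, batch)

cols = to_compressed_columns(batch["columns"], batch["rows"])
```

Compressing per column rather than per row pays off because values within a column (timestamps, repeated server names) are far more similar to each other than values within a row.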
Memory (not CPU) is the scarce resource in Scuba. We currently