5. QUERY LANGUAGE
Dremel’s query language is based on SQL and is designed
to be efficiently implementable on columnar nested storage.
Defining the language formally is out of the scope of this
paper; instead, we illustrate its flavor. Each SQL statement
(and the algebraic operators it translates to) takes as input
one or multiple nested tables and their schemas and
produces a nested table and its output schema. Figure 8
depicts a sample query that performs projection, selection,
and within-record aggregation. The query is evaluated
over the table t = {r1, r2} from Figure 2. The fields are
referenced using path expressions. The query produces a
nested result although no record constructors are present
in the query.
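To make the labeled-tree view and path expressions concrete, the following Python sketch (an illustration only, not Dremel's implementation) models a nested record as nested dicts, with a list for each repeated field and an omitted key for an undefined optional field. It resolves a dotted path expression to every value that the path reaches. The function `resolve` and the record `r` are hypothetical, and `r` is not the record from Figure 2.

```python
from typing import Any


def resolve(node: Any, path: str) -> list[Any]:
    """Return every value reachable from `node` via a dotted path expression.

    Dict keys act as the labels of a labeled tree. A repeated field is a list,
    so each of its occurrences is visited; an undefined field is skipped.
    """
    values = [node]
    for label in path.split("."):
        nxt = []
        for v in values:
            if not isinstance(v, dict) or label not in v:
                continue  # field not defined along this branch
            child = v[label]
            # Flatten repeated fields so every occurrence is followed.
            nxt.extend(child if isinstance(child, list) else [child])
        values = nxt
    return values


# Hypothetical record: Name and Language are repeated, Url is optional.
r = {
    "DocId": 10,
    "Name": [
        {"Url": "http://A", "Language": [{"Code": "en-us"}, {"Code": "en"}]},
        {"Url": "ftp://B"},
        {"Language": [{"Code": "en-gb"}]},
    ],
}

print(resolve(r, "Name.Language.Code"))  # ['en-us', 'en', 'en-gb']
print(resolve(r, "Name.Url"))            # ['http://A', 'ftp://B']
```

A path can yield zero, one, or many values depending on how many repeated and optional fields it crosses, which is why query results naturally stay nested.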
To explain what the query does, consider the selection
operation (the WHERE clause). Think of a nested record
as a labeled tree, where each label corresponds to a field
name. The selection operator prunes away the branches of
the tree that do not satisfy the specified conditions. Thus,
only those nested records in which Name.Url is defined and
starts with http are retained. Next, consider projection. Each
scalar expression in the SELECT clause emits a value at the
same level of nesting as the most repeated input field used
in that expression. So, the string concatenation expression
emits Str values at the level of Name.Language.Code in the
input schema. The COUNT expression illustrates within-record
aggregation. The aggregation is done WITHIN each
Name subrecord and emits the number of occurrences of
Name.Language.Code for each Name as a nonnegative 64-bit
integer (uint64).
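The selection, projection, and within-record aggregation described above can be sketched in Python on a dict-and-list record model. This is a hedged illustration under stated assumptions: the helper names `select` and `project`, the hypothetical record `r`, and the choice to concatenate Name.Url and Name.Language.Code with a comma are illustrative and are not taken from Figure 8.

```python
def select(record: dict) -> dict:
    """Selection as branch pruning: keep only the Name subrecords whose Url
    is defined and starts with 'http' (a simplified WHERE clause)."""
    kept = [n for n in record.get("Name", [])
            if "Url" in n and n["Url"].startswith("http")]
    return {**record, "Name": kept}


def project(record: dict) -> dict:
    """Projection plus within-record aggregation over the pruned record."""
    out_names = []
    for name in record["Name"]:
        codes = [lang["Code"] for lang in name.get("Language", [])
                 if "Code" in lang]
        out_names.append({
            # COUNT(...) WITHIN Name: counts Codes inside this Name only.
            "Cnt": len(codes),
            # The concatenation's most repeated input is Name.Language.Code,
            # so it emits one Str per Code, at the Language level.
            "Language": [{"Str": name["Url"] + "," + code} for code in codes],
        })
    return {"Name": out_names}


# Hypothetical record: the last two Name branches fail the selection.
r = {
    "DocId": 10,
    "Name": [
        {"Url": "http://A", "Language": [{"Code": "en-us"}, {"Code": "en"}]},
        {"Url": "http://B"},
        {"Url": "ftp://C", "Language": [{"Code": "fr"}]},
        {"Language": [{"Code": "en-gb"}]},
    ],
}

result = project(select(r))
for name in result["Name"]:
    print(name["Cnt"], [lang["Str"] for lang in name["Language"]])
# 2 ['http://A,en-us', 'http://A,en']
# 0 []
```

In this sketch pruning runs before projection, so a Name with no Url, or with a non-http Url, contributes neither a Cnt nor any Str. A retained Name with no Language still yields a Cnt of 0.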
The language supports nested subqueries, inter- and
intra-record aggregation, top-k, joins, user-defined functions,
etc.; some of these