We say that a student scores at the t-th quantile of a standardized exam if
he performs better than the proportion t of the reference group of
students and worse than the proportion (1 - t). Thus, half of students
perform better than the median student and half perform worse. Similarly, the
quartiles divide the population into four segments with equal proportions of the
reference population in each segment. The quintiles divide the population into five
parts; the deciles into ten parts. The quantiles, or percentiles, or occasionally
fractiles, refer to the general case. Quantile regression as introduced by Koenker
and Bassett (1978) seeks to extend these ideas to the estimation of conditional
quantile functions—models in which quantiles of the conditional distribution of the
response variable are expressed as functions of observed covariates.
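
As a small illustration of the unconditional definitions above, the following sketch computes sample quantiles and the share of a reference group scoring below a given student. The exam scores are hypothetical, the function names are our own, and the order-statistic rule used here is only one of several common conventions for sample quantiles.

```python
import math


def sample_quantile(values, t):
    """Smallest observation y such that at least a fraction t of the
    sample is <= y (one of several common sample-quantile conventions)."""
    if not 0 < t < 1:
        raise ValueError("t must lie strictly between 0 and 1")
    ordered = sorted(values)
    # 1-based index of the order statistic: ceil(n * t).
    k = max(1, math.ceil(len(ordered) * t))
    return ordered[k - 1]


def proportion_below(score, reference):
    """Share of the reference group scoring strictly below `score`."""
    return sum(1 for r in reference if r < score) / len(reference)


# Hypothetical exam scores for a reference group of ten students.
scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 96]

median = sample_quantile(scores, 0.5)
quartiles = [sample_quantile(scores, t) for t in (0.25, 0.5, 0.75)]
share_below_85 = proportion_below(85, scores)

print("median:", median)
print("quartiles:", quartiles)
print("share scoring below 85:", share_below_85)
```

The quartiles, quintiles and deciles are obtained in the same way by choosing t on the corresponding grid.
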
In Figure 1, we illustrate one approach to this task based on Tukey’s boxplot
(as in McGill, Tukey and Larsen, 1978). Annual compensation for the chief
executive officer (CEO) is plotted as a function of the firm’s market value of equity. A
sample of 1,660 firms was split into ten groups of equal size according to their
market capitalization. For each group of 166 firms, we compute the three quartiles
of CEO compensation, defined here as salary, bonus and other compensation, including stock
options (as valued by the Black-Scholes formula at the time of the grant). For each
group, the bow-tie-like box represents the middle half of the salary distribution
lying between the first and third quartiles. The horizontal line near the middle of
each box represents the median compensation for each group of CEOs, and the
notches represent an estimated confidence interval for each median estimate. The
full range of the observed salaries in each group is represented by the horizontal
bars at the end of the dashed “whiskers.” In cases where the whiskers would
extend more than three times the interquartile range, they are truncated and the
remaining outlying points are indicated by open circles. The mean compensation
for each group is also plotted: the geometric mean as a + and the arithmetic mean
as a *.
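
The grouping step behind such a plot can be sketched as follows. This is a minimal illustration on simulated data, not the CEO sample: the data-generating numbers, the variable names and the helper `grouped_quartiles` are all assumptions of ours, and `statistics.quantiles` applies its default interpolation rule, which may differ from the one used to draw the figure.

```python
import random
import statistics


def grouped_quartiles(pairs, n_groups=10):
    """Sort (x, y) pairs by x, split them into n_groups groups of equal
    size, and return the three quartiles of y within each group."""
    ordered = sorted(pairs, key=lambda p: p[0])
    size = len(ordered) // n_groups  # any leftover observations are dropped
    summary = []
    for g in range(n_groups):
        ys = [y for _, y in ordered[g * size:(g + 1) * size]]
        q1, q2, q3 = statistics.quantiles(ys, n=4)
        summary.append((q1, q2, q3))
    return summary


# Simulated stand-in data (hypothetical): log market value and log
# compensation, with spread that widens as firm size grows.
rng = random.Random(0)
pairs = []
for _ in range(1660):
    log_value = rng.uniform(4.0, 11.0)
    log_pay = 5.0 + 0.3 * log_value + rng.gauss(0.0, 0.1 * log_value)
    pairs.append((log_value, log_pay))

summary = grouped_quartiles(pairs)
for q1, q2, q3 in summary:
    print(f"Q1={q1:.2f}  median={q2:.2f}  Q3={q3:.2f}  IQR={q3 - q1:.2f}")
```

Each printed row corresponds to one box: the first and third quartiles give the ends of the box and the median gives the line inside it.
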
There is a clear tendency for compensation to rise with firm size, but one can
also discern several other features from the plot. Even on the log scale, there is a
tendency for dispersion, as measured by the interquartile range of log compensation,
to increase with firm size. This effect is accentuated if we consider the upper
and lower tails of the salary distribution. By characterizing the entire distribution of
annual compensation for each group, the plot provides a much more complete
picture than would be offered by simply plotting the group means or medians.
Here we have the luxury of a moderately large sample size in each group. Had we
had several covariates, grouping observations into homogeneous cells, each with a
sufficiently large number of observations, would become increasingly difficult.
In classical linear regression, we also abandon the idea of estimating separate
means for grouped data as in Figure 1. Instead, we assume that these means fall on a line
or some linear surface, and we estimate the parameters of this linear model.
Least squares estimation provides a convenient method of estimating such conditional
mean models. Quantile regression provides an equally convenient method
for estimating models for conditional quantile functions.
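
To make the parallel concrete, the sketch below fits a least squares line and three quantile regression lines to a small constructed data set. It assumes the standard formulation of Koenker and Bassett (1978), in which the fit minimizes an asymmetrically weighted sum of absolute residuals, and it finds the minimizer by exhaustive search over lines through pairs of observations. That search is only workable for tiny samples and is not how the estimator is computed in practice; the data and all function names are hypothetical.

```python
from itertools import combinations


def check_loss(residual, t):
    """Asymmetric absolute loss: weight t on positive residuals and
    weight (1 - t) on negative ones."""
    return residual * (t - (1 if residual < 0 else 0))


def least_squares_line(xs, ys):
    """Closed-form least squares fit of y = a + b * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def quantile_line(xs, ys, t):
    """Brute-force quantile regression fit of y = a + b * x.

    With one covariate, some minimizer of the total check loss passes
    through two observations, so we try every such line and keep the best.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a candidate
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(check_loss(y - a - b * x, t) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Constructed data (hypothetical): at each x in 1..5 the responses are
# x, 2x, 3x, 4x, 5x, so dispersion grows with x.
xs = [x for x in range(1, 6) for k in range(1, 6)]
ys = [x * k for x in range(1, 6) for k in range(1, 6)]

mean_fit = least_squares_line(xs, ys)
quantile_fits = {t: quantile_line(xs, ys, t) for t in (0.1, 0.5, 0.9)}

print("least squares (intercept, slope):", mean_fit)
for t, fit in quantile_fits.items():
    print(f"quantile t={t} (intercept, slope):", fit)
```

On this constructed data the least squares line and the median line coincide, with slope 3, while the fits for t = 0.1 and t = 0.9 have slopes 1 and 5. The fanning out of the quantile lines reflects the growing dispersion that a conditional mean model alone would not reveal.
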