The empirical asset pricing literature has documented many firm characteristics
that predict future stock returns. When not accounted for by standard asset pricing
models, such patterns are often interpreted as anomalous. It is challenging to develop
meaningful theoretical explanations of the observed patterns in returns. In contrast, the
long-short portfolios constructed by sorting firms on various characteristics – the “c-factors”,
often named after the sorting variable – provide readily available inputs into empirical factor
models. By searching through the firm characteristics known to be associated with large
spreads in stock returns, it is relatively easy to construct seemingly successful empirical factor
pricing models.
When we hear of a new c-factor model with N factors that “explains” M of the well-known
anomalies, how should we evaluate such a result? Is there a quantitative threshold
for the M-to-N ratio above which such a result strongly points to an economically important
source of systematic risk, even without a solid theoretical foundation? The ease of construction
of c-factor models and virtually unlimited freedom in selecting test assets provide fertile
ground for data mining. In this paper, we quantify just how easy it is to generate seemingly
successful empirical c-factor models. Our findings imply that it is extremely difficult
to evaluate factor pricing models based solely on their pricing performance, and one must
emphasize the theoretical and empirical foundation for their economic mechanism.
We systematically mine the 1971-2011 historical sample under a specific set of rules
designed to be representative of the commonly used empirical procedures. We consider 27
firm characteristics proposed in the literature as predictive variables for stock returns (see
Section 2 and Appendix A for the list of the characteristics, with references to the relevant
literature). Some of these characteristics have been proposed as candidate empirical proxies
for systematic risk exposures, others as likely proxies for mispricing – we do not discriminate
based on the merits of the original motivation. To qualify as a contender for our data-mining
exercise, a firm characteristic simply needs to be the subject of an academic publication.
We rank firms into ten portfolios based on each of the 27 characteristics and define the
associated return factors as return differences between the tenth and the first decile portfolios.
We then tabulate the pricing performance of all possible three- and four-factor models, each
consisting of the market portfolio and two or three factors respectively, chosen out of the set
of 27. We thus consider a total of 351 alternative three-factor models, and 2,925 four-factor
models.
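The model counts follow from simple combinatorics: choosing two of the 27 candidate factors gives 351 three-factor models, and choosing three gives 2,925 four-factor models. A minimal sketch of the enumeration is given here, in which the labels c1, ..., c27, the market label "MKT", and the helper name enumerate_models are placeholders introduced for illustration and are not names used in the paper:

```python
from itertools import combinations
from math import comb

N_CHARACTERISTICS = 27  # number of firm characteristics considered in the text

# Placeholder labels; the actual characteristics are listed in Appendix A.
characteristics = [f"c{i}" for i in range(1, N_CHARACTERISTICS + 1)]


def enumerate_models(n_cfactors):
    """Return every model made of the market factor plus n_cfactors c-factors."""
    return [("MKT",) + combo for combo in combinations(characteristics, n_cfactors)]


three_factor_models = enumerate_models(2)  # market + two c-factors
four_factor_models = enumerate_models(3)   # market + three c-factors

print(len(three_factor_models), comb(N_CHARACTERISTICS, 2))  # 351 351
print(len(four_factor_models), comb(N_CHARACTERISTICS, 3))   # 2925 2925
```

Each tuple identifies one candidate model, so any pricing test can be run in a loop over these lists.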
If a pricing model is not rejected by testing it against a cross-section of portfolios sorted
on a particular firm characteristic, we say that this model matches such a cross-section. We
find that it is relatively easy to construct a three-factor model that matches more than half
of the 25 target cross-sections of returns over the full sample (we exclude the cross-sections
used to form the model factors from the set of target cross-sections).
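The bookkeeping behind the match count can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the function name count_matches, the 5% significance level, and the p-values are all hypothetical, and the sketch takes the outcome of whatever pricing test is used as given.

```python
def count_matches(model_factors, pvalues, alpha=0.05):
    """Count target cross-sections on which the model is not rejected.

    model_factors: characteristics used to build the model's c-factors.
    pvalues: maps each characteristic to the p-value of a (hypothetical)
             test of the model on portfolios sorted on that characteristic.
    alpha: assumed significance level; chosen here only for illustration.
    """
    # Cross-sections used to form the model's own factors are excluded.
    targets = [c for c in pvalues if c not in model_factors]
    # A cross-section is "matched" when the model is not rejected on it.
    matched = [c for c in targets if pvalues[c] >= alpha]
    return len(matched), len(targets)


# Made-up p-values for five characteristics, for illustration only.
pvalues = {"c1": 0.40, "c2": 0.01, "c3": 0.20, "c4": 0.03, "c5": 0.75}
matched, n_targets = count_matches({"c1", "c2"}, pvalues)
print(matched, n_targets)  # 2 3
```

With 27 characteristics and two c-factors in the model, the same exclusion rule leaves 25 target cross-sections.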
The best-performing model over the entire sample, by the total number of matched cross-sections,
includes the factors based on unexpected earnings and the cash flow-to-price ratio.
It matches 15 out of 25 return cross-sections. Each of the top-twenty models reported in
Table 5 matches return cross-sections based on each of 12 or more different characteristics.
Four-factor models achieve slightly better coverage, with the top model matching 16 out
of 24 cross-sections, and the worst of the top-twenty models matching 14. For comparison,
the CAPM and the Fama and French (1993) three-factor model both match eight out of
27 return cross-sections (we do not exclude any test assets when evaluating these reference
models).
As expected in a data mining exercise, performance of the c-factor models tends to be
fragile. It is highly sensitive to the sample period choice and the details of the factor construction.
In particular, there is virtually no correlation between the relative model performance
in the first and the second halves of the 1971-2011 sample period. Likewise, constructing
the model return factors from a two-way sort on firm stock market capitalization (size) and
characteristics, an often-used empirical procedure, scrambles the relative model rankings.
Such lack of stability suggests that our data-snooping algorithm tends to pick spurious winners
among the set of all possible models without revealing a robust underlying risk structure
in returns. This does not mean that all of the better-performing models in our analysis are
spurious and theoretically unjustifiable. Some of the many models we enumerate in this
study are likely to capture economically meaningful sources of risk – we just cannot identify
which of them do, based solely on the models’ pricing performance.
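One way to gauge the stability of model rankings across subsamples is a rank correlation between the match counts that each model achieves in the two halves of the sample. The sketch below uses the Spearman rank correlation, which is an assumption made for illustration (the specific correlation measure is not named above), and the match counts are made-up numbers, not results from the paper.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical match counts of five models in each half of the sample.
first_half = [8, 9, 10, 12, 14]
second_half = [9, 14, 10, 7, 12]
print(spearman(first_half, second_half))  # 0.0 for these made-up counts
```

A value near zero indicates that a model's ranking in one half of the sample carries little information about its ranking in the other half.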
This paper is organized as follows. Section 2 describes the data and methodology. Section
3 examines the overall factor structure of characteristic-sorted portfolios and the ability
of c-factor models to capture cross-sectional differences in average returns on various
characteristic-sorted portfolios. Section 4 concludes.