The web provides an unprecedented opportunity to evaluate ideas
quickly using controlled experiments, also called randomized
experiments (single-factor or factorial designs), A/B tests (and
their generalizations), split tests, Control/Treatment tests, and
parallel flights. Controlled experiments embody the best
scientific design for establishing a causal relationship between
changes and their influence on user-observable behavior. We
provide a practical guide to conducting online experiments, where
end-users can help guide the development of features. Our
experience indicates that significant learning and return-on-investment
(ROI) are seen when development teams listen to their
customers, not to the Highest Paid Person’s Opinion (HiPPO). We
provide several examples of controlled experiments with
surprising results. We review the important ingredients of
running controlled experiments, and discuss their limitations (both
technical and organizational). We focus on several areas that are
critical to experimentation, including statistical power, sample
size, and techniques for variance reduction. We describe
common architectures for experimentation systems and analyze
their advantages and disadvantages. We evaluate randomization
and hashing techniques, which we show are not as simple in
practice as is often assumed. Controlled experiments typically
generate large amounts of data, which can be analyzed using data
mining techniques to gain a deeper understanding of the factors
influencing the outcome of interest, leading to new hypotheses
and creating a virtuous cycle of improvements. Organizations that
embrace controlled experiments with clear evaluation criteria can
evolve their systems with automated optimizations and real-time
analyses. Based on our extensive practical experience with
multiple systems and organizations, we share key lessons that will
help practitioners in running trustworthy controlled experiments.