Data pedigree

Many statistics textbooks discuss the importance of sample size in statistical analyses and provide formulas for determining an appropriate sample size. Unfortunately, very little is typically said about data quality. The quote from Box, Hunter and Hunter noted earlier is a welcome but rare exception. The assumption seems to be that “all data are created equal.” If only that were true.

Practitioners who have had to collect their own data know how challenging it can be to collect good data. Missing values and variables, poor measurement processes and collinearity between the independent variables are just a few of the problems typically encountered. No amount of data or sophisticated data mining algorithms will salvage a bad data set.

The key point is that rather than jumping into analyses, practitioners should always carefully consider data quality first: where the data came from, how they were collected, who collected them, over what time frame, the measurement system used and the associated science and engineering. This type of information is called the data pedigree because it describes the background and history of the data, much like a show dog’s pedigree documents its credentials.

Data should always be considered guilty until proven innocent. In many cases, the data are sufficient to answer some questions but not to solve the overall problem. More data, collected differently, are often required, based on analysis of the original data. Further, the sophistication of any models developed should be based on the needs of the problem and the data pedigree, and should never be more complex than the current data can adequately support.
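To make the point concrete, the sketch below shows one way a practitioner might screen a data set for two of the problems mentioned above, missing values and collinearity among the independent variables, before any modeling. It is not a method from the article; the function name, the example data and the 0.9 correlation threshold are illustrative assumptions.

```python
# A minimal sketch, assuming pandas/numpy are available: count missing values
# and flag near-collinear predictor pairs before fitting any model.
import numpy as np
import pandas as pd

def screen_data(df: pd.DataFrame, corr_threshold: float = 0.9) -> None:
    """Report missing values and highly correlated predictor pairs."""
    # Missing values per column: a cheap first check of data quality.
    missing = df.isna().sum()
    print("Missing values per column:")
    print(missing[missing > 0] if missing.any() else "  none")

    # Pairwise correlations among numeric columns; large absolute values
    # suggest collinearity that no modeling algorithm will undo.
    corr = df.select_dtypes(include=[np.number]).corr().abs()
    for i, col_a in enumerate(corr.columns):
        for col_b in corr.columns[i + 1:]:
            if corr.loc[col_a, col_b] > corr_threshold:
                print(f"High correlation: {col_a} vs {col_b} "
                      f"(r = {corr.loc[col_a, col_b]:.2f})")

# Illustrative (made-up) data: x2 is nearly collinear with x1,
# and x3 has one missing value.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
example = pd.DataFrame({
    "x1": x1,
    "x2": x1 * 0.98 + rng.normal(scale=0.05, size=50),
    "x3": rng.normal(size=50),
})
example.loc[3, "x3"] = np.nan
screen_data(example)
```

A screen like this only flags symptoms; judging whether the data pedigree supports the intended analysis still requires knowing how, by whom and over what time frame the data were collected.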