Abstract
Proteomics technologies can be effective for the discovery and assay of protein forms altered with disease. However, few examples of successful biomarker discovery yet exist. Critical to addressing this is the widespread implementation of appropriate QC (quality control) methodology. Such QC should combine the rigour of clinical laboratory assays with a suitable treatment of the complexity of the proteome by targeting separate assignable causes of variation. We demonstrate an approach, metric and example workflow for users to develop such targeted QC rules systematically and objectively, using a publicly available plasma DIGE data set. Hierarchical clustering analysis of standard channels is first used to discover correlated groups of features corresponding to specific assignable sources of technical variation. These effects are then quantified using a statistical distance metric, and followed on control charts. This allows measurement of process drift and the detection of runs that outlie for any given effect. A known technical issue on originally rejected gels was detected validating this approach, and relevant novel effects were also detected and classified effectively.
Our approach was effective for 2-DE QC. Whilst we demonstrated this in a retrospective DIGE experiment, the principles would apply to ongoing QC and other proteomic technologies.