At least 2 reviewers independently extracted information on patients, methods, and results of all included studies by using a pretested form. The Cochrane Collaboration tool was used to assess risk of bias in the included studies.22 Generation of the allocation sequence, concealment of allocation, blinding, incomplete outcome data, and selective outcome reporting were rated as having low, unclear, or high risk of bias. For assessing overall risk of bias in a study, we did not include the blinding item because complete blinding of those providing and receiving psychological treatments is rarely feasible. Even if clinical assessors are blinded, they depend strongly on what patients report. Taking this a priori limitation into account, we considered included trials to have a low risk of bias if none of the remaining 4 items were considered at high risk of bias and not more than 1 item was unclear. If 1 or more items were considered at high risk of bias, the overall risk was considered high. In the remaining studies, risk of bias was considered unclear.
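The overall rating rule described above can be written as a short decision function. This is only an illustrative sketch: the item keys, the function name `overall_risk_of_bias`, and the rating strings are assumptions made for this example and are not taken from the extraction form used in the review.

```python
from typing import Dict

# Items that count toward the overall rating. Blinding is rated per study
# but deliberately left out of the overall judgment. Key names are hypothetical.
OVERALL_ITEMS = (
    "sequence_generation",
    "allocation_concealment",
    "incomplete_outcome_data",
    "selective_reporting",
)


def overall_risk_of_bias(ratings: Dict[str, str]) -> str:
    """Return 'low', 'unclear', or 'high' from per-item ratings.

    `ratings` maps each item name to 'low', 'unclear', or 'high'.
    Any 'blinding' entry is ignored.
    """
    considered = [ratings[item] for item in OVERALL_ITEMS]

    # One or more of the 4 items at high risk: overall high.
    if any(r == "high" for r in considered):
        return "high"

    # No item at high risk and at most 1 item unclear: overall low.
    if sum(r == "unclear" for r in considered) <= 1:
        return "low"

    # Everything else: overall unclear.
    return "unclear"
```

Under this sketch, a trial with high risk on blinding alone and low risk on the 4 remaining items is still rated low overall, which mirrors the a priori decision to exclude the blinding item.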
Because the included studies reported results on efficacy in a highly diverse and often incomplete manner, we performed an additional extraction round using a standardized preference approach for extracting or imputing outcome data22,23 for meta-analysis. This additional extraction was done by 1 reviewer (K.L.), while a second (K.S. or K.M.) cross-checked all extracted data against the original publications and recalculated imputations. Even though the prespecified primary efficacy endpoint for our overall multitreatment systematic review was a response defined as at least a 50% score reduction on a depression scale,17 we chose to report posttreatment scores in more detail in this article to make our review more comparable with the available reviews.8,14,15 Whenever possible, we extracted data for the Beck Depression Inventory for effect size calculation (because this instrument was most widely used). If these data were not available, we used other patient-reported depression scores as a second preference. If patient-reported outcome data were not measured, we used data from observer-rated scales (Hamilton Rating Scale for Depression as a third preference, Montgomery-Åsberg Depression Rating Scale as a fourth preference, other scales as last options). We also performed analyses on remission (defined as having a symptom score below a fixed threshold). Study discontinuation was used as an indicator of acceptability. If available, we also extracted the number of patients dropping out for adverse events or adverse effects, as well as the number of patients reporting adverse events or adverse effects.
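The preference order for choosing a depression scale for effect size calculation can be sketched as a simple lookup. This is a hedged illustration only: the function name `select_outcome_scale`, the short scale labels, and the 'patient'/'observer' tags are assumptions introduced here. The text does not say how to choose among several "other" scales in the same tier, so the sketch simply takes the first one listed, which is also an assumption.

```python
from typing import Dict, Optional


def select_outcome_scale(available: Dict[str, str]) -> Optional[str]:
    """Pick the scale to extract for one study.

    `available` maps a scale label (hypothetical, e.g. 'BDI', 'HAMD',
    'MADRS') to 'patient' or 'observer', in the order the study reports them.
    Returns None if the study reports no depression scale.
    """
    # First preference: Beck Depression Inventory.
    if "BDI" in available:
        return "BDI"

    # Second preference: any other patient-reported depression score.
    # Assumption: take the first one listed when several are reported.
    for name, rater in available.items():
        if rater == "patient":
            return name

    # Third and fourth preferences: Hamilton, then Montgomery-Asberg.
    for preferred in ("HAMD", "MADRS"):
        if preferred in available:
            return preferred

    # Last option: any remaining observer-rated scale (first listed).
    for name in available:
        return name

    return None
```

For example, a study reporting only the Hamilton and Montgomery-Åsberg scales would contribute Hamilton data, whereas a study that also reported a patient-rated questionnaire other than the Beck Depression Inventory would contribute that questionnaire instead.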