Cumulative meta-analysis (MA) of patient-oriented outcomes is commonly used to determine whether adding a new trial to a series of existing trials results in a statistically significant change in the overall treatment effect [1]. Because this approach involves repeated significance testing, the risk of random error and false-positive (FP) results increases as the number of studies grows. Recent studies have suggested that apparently conclusive evidence from MAs may in fact be inconclusive [2], [3], [4] and [5].
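To make the multiplicity problem concrete, the following minimal sketch simulates cumulative fixed-effect meta-analyses in which the true treatment effect is zero and every trial contributes an estimate with the same standard error. The number of trials (10), the standard error (0.2), and the function name `cumulative_fp_rate` are arbitrary illustrative assumptions. Testing at the nominal two-sided 5% level after every update flags well over 5% of these null meta-analyses as significant at least once, whereas a single test after the last trial stays close to 5%.

```python
import random
from statistics import NormalDist


def cumulative_fp_rate(n_trials=10, se=0.2, n_sim=20000, alpha=0.05, seed=1):
    """Simulate cumulative fixed-effect MAs under the null (true effect = 0).

    Returns (rate_any_look, rate_final_only): the proportion of simulated MAs
    declared significant at ANY cumulative update versus only at the final one.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    w = 1 / se**2  # inverse-variance weight, identical for every trial here
    any_hits = 0
    final_hits = 0
    for _ in range(n_sim):
        sum_w = 0.0
        sum_wy = 0.0
        hit = False
        z = 0.0
        for _k in range(n_trials):
            y = rng.gauss(0.0, se)  # trial estimate drawn under the null
            sum_w += w
            sum_wy += w * y
            # pooled estimate divided by its SE (SE = 1 / sqrt(sum of weights))
            z = (sum_wy / sum_w) * sum_w**0.5
            if abs(z) > z_crit:
                hit = True
        any_hits += hit
        final_hits += abs(z) > z_crit
    return any_hits / n_sim, final_hits / n_sim
```

Calling `cumulative_fp_rate()` returns an "any look" FP rate several times larger than the "final look only" rate, which illustrates why unadjusted repeated testing in cumulative MA is problematic.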
For evidence obtained from MAs to be categorized as conclusive, the total number of participants should be at least as large as the sample (or information) size of a single optimally powered randomized controlled trial (RCT) [2] and [6]. The optimal information size is calculated from a prespecified control-group event rate, a minimum intervention effect, and the maximum acceptable risks of type I and type II errors. When the accrued information size is smaller than the optimal information size, both the imprecision of the treatment-effect estimate and the risk of an FP result (type I error) increase. Accordingly, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group recommends that if the total number of patients included in a systematic review is less than the optimal information size, the quality of the evidence should be rated down for imprecision [7].
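For a binary outcome, the calculation described above can be sketched with the conventional sample-size formula for comparing two proportions with 1:1 allocation, N = 4(z_{1−α/2} + z_{1−β})² p̄(1 − p̄)/δ², where δ is the minimum absolute risk difference and p̄ is the average of the control and intervention event proportions. Treating this formula as the one behind the optimal information size, expressing the minimum intervention effect as a relative risk reduction, and the function name and defaults (α = 0.05, β = 0.20) are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def optimal_information_size(p_control, rrr, alpha=0.05, beta=0.20):
    """Hypothetical helper: total participants (both arms, 1:1 allocation)
    needed to detect a relative risk reduction `rrr` from a control-group
    event rate `p_control`, with two-sided type I error `alpha` and
    type II error `beta`."""
    p_exp = p_control * (1 - rrr)      # anticipated intervention-group event rate
    delta = p_control - p_exp          # minimum absolute risk difference
    p_bar = (p_control + p_exp) / 2    # average event proportion
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(1 - beta)
    n = 4 * z_sum**2 * p_bar * (1 - p_bar) / delta**2
    return math.ceil(n)                # round up to whole participants
```

For example, `optimal_information_size(0.20, 0.20)` (20% control event rate, 20% relative risk reduction) gives roughly 2,900 participants; a meta-analysis with fewer participants would fall short of this information size.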
Trial sequential analysis (TSA) has been proposed as a method to determine whether the results of MAs are conclusive; it provides an optimal information size and monitoring boundaries analogous to the interim monitoring boundaries constructed for individual RCTs [2]. In TSA, the optimal information size can also be calculated with the minimum intervention effect estimated from trials at low risk of bias [2], and it can be adjusted for between-trial diversity, defined as the relative reduction in the variance of the pooled estimate when switching from a random-effects model to a fixed-effect model [8].
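The diversity adjustment and the O’Brien-Fleming-type spending function can be written down directly. The sketch below assumes that the variances of the pooled estimate under the random-effects model (V_R) and the fixed-effect model (V_F) are already available; it computes the diversity D² = (V_R − V_F)/V_R, inflates a required information size by 1/(1 − D²), and evaluates the Lan-DeMets O’Brien-Fleming-type spending function α(t) = 2 − 2Φ(z_{1−α/2}/√t) at an information fraction t. All function names are hypothetical, and turning α(t) into actual monitoring boundaries requires recursive numerical integration, which this sketch does not attempt.

```python
from statistics import NormalDist


def diversity(var_random, var_fixed):
    """D^2 = (V_R - V_F) / V_R: relative reduction in the variance of the pooled
    estimate when switching from a random-effects to a fixed-effect model."""
    return (var_random - var_fixed) / var_random


def diversity_adjusted_is(info_size, d2):
    """Inflate a required information size by the factor 1 / (1 - D^2)."""
    if not 0 <= d2 < 1:
        raise ValueError("D^2 must lie in [0, 1)")
    return info_size / (1 - d2)


def obf_alpha_spent(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function (two-sided):
    cumulative type I error spent at information fraction t in (0, 1]."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must lie in (0, 1]")
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / t**0.5))
```

With V_R = 0.04 and V_F = 0.03, D² = 0.25, so a required information size of 3,000 becomes 4,000. The spending function releases almost no type I error early (under 0.0001 at t = 0.25) and reaches the full 0.05 only at t = 1, which is why O’Brien-Fleming-type boundaries are very strict for small information fractions.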
The TSA methods have been incorporated into publicly available software [9]. However, as currently implemented, they handle binary and continuous outcomes with a single spending function (O’Brien-Fleming) and cannot handle time-to-event outcomes. Time-to-event outcomes, such as overall survival and progression-free survival, are patient-oriented outcomes central to studies of many diseases, including cancer and HIV. Therefore, TSA methods are needed that can incorporate hazard ratios (HRs) for time-to-event outcomes as well as the relative risk, risk difference, or odds ratio for binary outcomes. The Cochrane Handbook recommends that the effect measure for time-to-event outcomes be expressed as an HR [10], and methods exist to extract approximate HRs from published studies that did not report them explicitly [11], [12], [13] and [14]. Time-to-event outcomes can be analyzed as binary outcomes, but it is generally accepted that this practice leads to a loss of power and should be avoided [11]. The pooled estimates and heterogeneity are also likely to differ between an analysis of time-to-event data and an analysis of the same outcome dichotomized as binary, which would yield different TSA results.
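As an illustration of how an HR and its standard error can be recovered from published summary statistics, the sketch below implements two simple, widely used approximations of the kind cited above: the log HR and its standard error from a reported HR with a confidence interval that is symmetric on the log scale, and the Peto-type estimate ln HR ≈ (O − E)/V with SE = 1/√V from log-rank observed-minus-expected events (O − E) and their variance (V). The function names are hypothetical, and the approaches described in [11], [12], [13] and [14] are broader than these two.

```python
import math
from statistics import NormalDist


def log_hr_from_ci(hr, lower, upper, level=0.95):
    """Log HR and its SE from a reported HR and confidence interval,
    assuming the interval is symmetric on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.log(hr), se


def log_hr_from_peto(o_minus_e, variance):
    """Peto-type log HR and its SE from log-rank O - E and its variance V."""
    return o_minus_e / variance, 1 / math.sqrt(variance)
```

For a reported HR of 0.75 (95% CI 0.60 to 0.94), `log_hr_from_ci(0.75, 0.60, 0.94)` returns a log HR of about −0.288 with a standard error of about 0.115; these (log HR, SE) pairs are the inputs an inverse-variance meta-analysis of time-to-event outcomes needs.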
Our objective was to present a method for performing TSA of time-to-event outcomes when HRs and their standard errors are available. To date, no study has addressed the conclusiveness of evidence in MAs of time-to-event outcomes, which are prevalent in cancer trials. We applied these methods to assess the conclusiveness of MAs of treatments for multiple myeloma.
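To show how the pieces could fit together for HRs, the following is a minimal sketch of one possible TSA of time-to-event outcomes; it is not necessarily the method developed in this paper. It assumes fixed-effect inverse-variance pooling of log HRs, measures information as the sum of inverse variances, and sets the required information to (z_{1−α/2} + z_{1−β})²/(ln HR_min)², which with 1:1 allocation corresponds to about 4 times that many events (the Schoenfeld approximation). The boundary z_{1−α/2}/√t is only a crude approximation to the shape of O’Brien-Fleming boundaries and slightly inflates type I error; exact Lan-DeMets boundaries require recursive numerical integration. No diversity (D²) adjustment is applied, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def required_information(hr_min, alpha=0.05, beta=0.20):
    """Required information (inverse variance of the pooled log HR) to detect
    hr_min with two-sided alpha and power 1 - beta. With 1:1 allocation,
    Var(log HR) is roughly 4 / events, so events needed ~ 4 * this value."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(1 - beta)
    return (z_sum / math.log(hr_min)) ** 2


def tsa_time_to_event(trials, hr_min, alpha=0.05, beta=0.20):
    """Cumulative fixed-effect TSA sketch on chronologically ordered
    (log_hr, se) pairs. Returns one row per update:
    (information fraction, pooled HR, cumulative Z, approximate boundary)."""
    i_req = required_information(hr_min, alpha, beta)
    z_final = NormalDist().inv_cdf(1 - alpha / 2)
    sum_w = 0.0
    sum_wy = 0.0
    rows = []
    for log_hr, se in trials:
        w = 1 / se**2                      # inverse-variance weight
        sum_w += w
        sum_wy += w * log_hr
        pooled = sum_wy / sum_w            # pooled log HR so far
        z = pooled * math.sqrt(sum_w)      # cumulative Z statistic
        t = sum_w / i_req                  # information fraction
        # crude O'Brien-Fleming-shaped boundary, capped at z_final once t >= 1
        boundary = z_final / math.sqrt(min(t, 1.0))
        rows.append((t, math.exp(pooled), z, boundary))
    return rows
```

For a minimum clinically relevant HR of 0.8, `required_information(0.8)` implies roughly 630 events. Evidence would be treated as conclusive at the first update where the absolute cumulative Z exceeds the boundary, and the cap at t = 1 keeps the boundary at the conventional threshold once the required information has been reached.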