2.2. Randomization test (RT)
RT is a variable-selection method that uses the statistics of the
regression coefficients of models built after permuting the dependent
variable y in the training set [27]. To compute RT, a regular PLSDA
model relating y to X is first built as a reference. Then M random
PLSDA models are built by randomization, i.e., by randomly scrambling
the order of the entries of y while keeping the order of the rows of X
unchanged. In this study, M = 1000 permutations were used, as discussed
in our previous work [27].
Each random model yields one regression coefficient for each gene.
Because the permutation destroys any real relationship between y and X,
the coefficients of a gene in the random models can only arise by
chance; they are therefore referred to as ‘noise values’.
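To make the procedure concrete, the following is a minimal Python/NumPy sketch, not the authors' implementation. It assumes that PLSDA is realized as a PLS1 regression on class labels coded 0/1, fitted with the NIPALS algorithm on mean-centred data. The number of latent components (2 here), the function names `pls1_coefficients` and `randomization_noise`, and the synthetic data are hypothetical choices for illustration. Each row of the returned `noise` matrix holds the gene-wise regression coefficients of one random model, i.e., one set of ‘noise values’.

```python
import numpy as np


def pls1_coefficients(X, y, n_components=2):
    """Regression coefficients of a PLS1 model fitted by NIPALS.

    Assumption: PLSDA is treated as PLS1 on a 0/1-coded class vector,
    with X and y mean-centred (no scaling).
    """
    X = X - X.mean(axis=0)          # centred copy; caller's X is not modified
    y = y - y.mean()
    p = X.shape[1]
    W = np.zeros((p, n_components))  # weight vectors
    P = np.zeros((p, n_components))  # X loadings
    q = np.zeros(n_components)       # y loadings
    Xk, yk = X, y
    for k in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                   # score vector
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = yk @ t / tt
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])   # deflate X
        yk = yk - q[k] * t               # deflate y
    # Coefficients in the original (centred) X space: b = W (P^T W)^-1 q
    return W @ np.linalg.solve(P.T @ W, q)


def randomization_noise(X, y, n_perm=1000, n_components=2, seed=0):
    """Regular-model coefficients plus an (n_perm, n_genes) matrix of noise values."""
    rng = np.random.default_rng(seed)
    b_regular = pls1_coefficients(X, y, n_components)
    noise = np.empty((n_perm, X.shape[1]))
    for m in range(n_perm):
        # Scramble the order of y only; the rows of X stay in place.
        y_perm = y[rng.permutation(len(y))]
        noise[m] = pls1_coefficients(X, y_perm, n_components)
    return b_regular, noise


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 200))   # synthetic: 40 samples, 200 genes
    y = np.repeat([0.0, 1.0], 20)    # two classes coded 0/1
    b, noise = randomization_noise(X, y, n_perm=100)
    print(noise.shape)               # (100, 200)
```

When as many components are kept as there are genes (and samples outnumber genes), PLS1 reduces to ordinary least squares on the centred data, which gives a convenient sanity check for the coefficient function.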
A statistic, P, is defined as the fraction of the ‘noise values’
exceeding the regression coefficient in the regular PLSDA model,