Fig. 1 shows a scatter plot of these mock data, relating recovery to hours of treatment per week. Although not particularly
interesting in itself, this figure highlights several important reasons why logistic regression is preferred over other
regression procedures in these contexts. Three features of these data are immediately apparent on inspection. First, the
outcome chosen for analysis in this example can take only one of two values (0 = not recovered and 1 = recovered), so any
regression technique that can predict any other value is clearly inappropriate for such data. (The outcome does not always
have to be divided into two categories; sometimes it will be a continuous variable, and whether to divide it into two will
depend on the design chosen.) Second, the relationship between the predictor (hours per week) and the outcome (recovery)
cannot be termed linear, but is best described by an S-shaped (‘sigmoidal’) curve. Third, the variance in the outcome
(recovery) is much smaller at the extreme values of the predictor (intervention time per week) than at its central values.
This tendency can be seen more easily in Fig. 2, which plots the mean recovery rate
at each level of treatment intensity (not a particularly appropriate statistic) but, more importantly, shows the confidence
intervals around those means. These intervals, and hence the variance, are much wider at the middle values of intervention
time per week than at the extreme values. These features, especially the last one (unequal variance in the outcome across
values of the predictor), typically make such data unsuitable for simple linear regression analyses (see Howell, 1997, and
the section on alternative techniques below).
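The unequal variance described above follows from the binary outcome itself. A 0/1 variable that equals 1 with probability p has variance p(1 - p), which peaks at 0.25 when p = 0.5 and shrinks towards 0 as p approaches 0 or 1. The sketch below computes this variance and a simple normal-approximation (Wald) 95% confidence interval for a recovery rate. The counts are hypothetical (1, 10 and 19 recoveries out of 20 patients at a low, middle and high treatment level) and are not the values shown in Fig. 2. The function names are illustrative.

```python
import math


def bernoulli_variance(p):
    """Variance of a 0/1 outcome that equals 1 with probability p: p * (1 - p)."""
    return p * (1.0 - p)


def wald_interval(successes, n, z=1.96):
    """Approximate 95% CI for a recovery rate (normal/Wald approximation)."""
    p = successes / n
    half_width = z * math.sqrt(bernoulli_variance(p) / n)
    return p - half_width, p + half_width


# Variance is largest at p = 0.5 and shrinks as p moves towards 0 or 1.
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"p = {p:.2f}  variance = {bernoulli_variance(p):.4f}")

# Hypothetical counts (recovered out of 20) at a low, middle and high treatment level.
for label, k in (("low", 1), ("middle", 10), ("high", 19)):
    lower, upper = wald_interval(k, 20)
    print(f"{label:>6}: rate = {k / 20:.2f}, 95% CI = ({lower:.3f}, {upper:.3f}), "
          f"width = {upper - lower:.3f}")
```

The middle level produces the widest interval, which mirrors the pattern in Fig. 2. The Wald interval is used here only for simplicity. Near the extremes it can extend below 0 or above 1, which is the same boundedness problem that affects linear models of a binary outcome.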