DISCUSSION
Parametric modeling usually involves making assumptions about the shape of data, or the shape of residuals from a regression fit. Verifying such assumptions can take many forms, but an exploration of the shape using histograms and q-q plots is very effective. The q-q plot does not have any design parameters such as the number of bins for a histogram.
In an advanced treatment, the q-q plot can be used to formally test the null hypothesis that the data are normal. This is done by computing the correlation coefficient of the n points in the q-q plot. Depending upon n, the null hypothesis is rejected if the correlation coefficient is less than a threshold. The threshold is already quite close to 0.95 for modest sample sizes.
We have seen that the q-q plot for uniform data is very closely related to the empirical cumulative distribution function. For general density functions, the so-called probability integral transform takes a random variable X and maps it to the interval (0, 1) through the CDF of X itself, that is,
Y = FX(X)
which has been shown to be a uniform density. This explains why the q-q plot on standardized data is always close to the line y = x when the model is correct.
Finally, scientists have used special graph paper for years to make relationships linear (straight lines). The most common example used to be semi-log paper, on which points following the formula y = aebx appear linear. This follows of course since log(y) = log(a) + bx, which is the equation for a straight line. The q-q plots may be thought of as being “probability graph paper” that makes a plot of the ordered data values into a straight line. Every density has its own special probability graph paper.
DISCUSSION
Parametric modeling usually involves making assumptions about the shape of data, or the shape of residuals from a regression fit. Verifying such assumptions can take many forms, but an exploration of the shape using histograms and q-q plots is very effective. The q-q plot does not have any design parameters such as the number of bins for a histogram.
In an advanced treatment, the q-q plot can be used to formally test the null hypothesis that the data are normal. This is done by computing the correlation coefficient of the n points in the q-q plot. Depending upon n, the null hypothesis is rejected if the correlation coefficient is less than a threshold. The threshold is already quite close to 0.95 for modest sample sizes.
We have seen that the q-q plot for uniform data is very closely related to the empirical cumulative distribution function. For general density functions, the so-called probability integral transform takes a random variable X and maps it to the interval (0, 1) through the CDF of X itself, that is,
Y = FX(X)
which has been shown to be a uniform density. This explains why the q-q plot on standardized data is always close to the line y = x when the model is correct.
Finally, scientists have used special graph paper for years to make relationships linear (straight lines). The most common example used to be semi-log paper, on which points following the formula y = aebx appear linear. This follows of course since log(y) = log(a) + bx, which is the equation for a straight line. The q-q plots may be thought of as being “probability graph paper” that makes a plot of the ordered data values into a straight line. Every density has its own special probability graph paper.
การแปล กรุณารอสักครู่..
