DISCUSSION
Parametric modeling usually involves making assumptions about the shape of the data, or about the shape of the residuals from a regression fit. Such assumptions can be verified in many ways, but exploring the shape with histograms and q-q plots is very effective. Unlike the histogram, which requires a choice of the number of bins, the q-q plot has no design parameters.
In an advanced treatment, the q-q plot can be used to formally test the null hypothesis that the data are normal. This is done by computing the correlation coefficient of the n points in the q-q plot. Depending upon n, the null hypothesis is rejected if the correlation coefficient is less than a threshold. The threshold is already quite close to 0.95 for modest sample sizes.
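The correlation computation described above can be sketched in a few lines of Python. This is a minimal illustration, not a formal test: the plotting positions (i - 0.5)/n are one common convention and are an assumption here, the sample sizes and distribution parameters are arbitrary, and no critical values are supplied, since those depend on n and must be taken from a published table.

```python
import math
import random
from statistics import NormalDist


def qq_correlation(data):
    """Correlation between the ordered data and standard normal quantiles."""
    n = len(data)
    ordered = sorted(data)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_x = sum(theo) / n
    mean_y = sum(ordered) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(theo, ordered))
    sxx = sum((x - mean_x) ** 2 for x in theo)
    syy = sum((y - mean_y) ** 2 for y in ordered)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(500)]   # hypothetical normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]   # hypothetical skewed data

r_normal = qq_correlation(normal_sample)
r_skewed = qq_correlation(skewed_sample)
print(f"normal data: r = {r_normal:.4f}")
print(f"skewed data: r = {r_skewed:.4f}")
```

Because the correlation coefficient is unchanged by shifting or rescaling the data, the sample does not need to be standardized first. The normal sample should give a value very close to 1, while the skewed sample gives a visibly smaller one.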
We have seen that the q-q plot for uniform data is very closely related to the empirical cumulative distribution function. For general density functions, the so-called probability integral transform takes a random variable X and maps it to the interval (0, 1) through the CDF of X itself, that is,
Y = F_X(X)
which can be shown to follow a uniform distribution on (0, 1) when F_X is continuous. This explains why the q-q plot of standardized data tends to lie close to the line y = x when the model is correct.
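The probability integral transform can be checked numerically. The sketch below assumes an exponential model with a hypothetical rate of 0.5, applies its CDF to simulated data, and compares the ordered results with uniform plotting positions (i - 0.5)/n (again an assumed convention). If the transformed values are uniform, the two sequences should track each other closely.

```python
import math
import random


def exponential_cdf(x, rate):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
rate = 0.5              # hypothetical rate chosen for illustration
n = 2000
x = [rng.expovariate(rate) for _ in range(n)]

# Probability integral transform: Y = F_X(X), then order the values.
y = sorted(exponential_cdf(v, rate) for v in x)

# Compare the ordered Y values with uniform plotting positions.
positions = [(i - 0.5) / n for i in range(1, n + 1)]
max_gap = max(abs(u - p) for u, p in zip(y, positions))
mean_y = sum(y) / n
print(f"mean of Y = {mean_y:.3f}, largest gap from the line y = x: {max_gap:.3f}")
```

The mean of Y should be near 0.5 and the largest gap should be small. Repeating the experiment with the wrong CDF (for example, a different rate) makes the gap grow, which is the numerical counterpart of a q-q plot bending away from the line.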
Finally, scientists have for years used special graph paper to make relationships appear linear (as straight lines). The most common example used to be semi-log paper, on which points following the formula y = a e^{bx} fall on a straight line. This follows, of course, since log(y) = log(a) + bx, which is the equation of a straight line in x. The q-q plot may be thought of as “probability graph paper” that turns a plot of the ordered data values into a straight line. Every density has its own special probability graph paper.
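The semi-log linearization mentioned above can be illustrated as follows. The values of a and b are hypothetical and the data are noise-free, so a least-squares line fitted to (x, log y) should recover them up to rounding error.

```python
import math

a_true, b_true = 2.0, 0.3  # hypothetical parameters for illustration
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [a_true * math.exp(b_true * x) for x in xs]

# Taking logs turns y = a * exp(b * x) into log(y) = log(a) + b * x.
log_ys = [math.log(y) for y in ys]

# Ordinary least-squares line through the points (x, log y).
n = len(xs)
mean_x = sum(xs) / n
mean_ly = sum(log_ys) / n
slope = (sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_ly - slope * mean_x

b_hat = slope                # slope of the line estimates b
a_hat = math.exp(intercept)  # intercept of the line estimates log(a)
print(f"a = {a_hat:.6f}, b = {b_hat:.6f}")
```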