2 Poisson regression
Poisson regression is generally used to model count data. It assumes that the response variable has a Poisson distribution and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters. Let $y_i$ be the response variable. We assume that $y_i$ follows a Poisson distribution with mean $\lambda_i$, defined as a function of the covariates $x_i$. Thus, the Poisson regression is given by the equation below:

\[ P(y_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!} \]

where the conditional mean is specified by $\lambda_i = E(y_i \mid x_i) = \exp(x_i'\beta)$. The vector $x_i' = (x_{i,1}, x_{i,2}, \ldots, x_{i,P})$ contains the covariates and $\beta' = (\beta_1, \beta_2, \ldots, \beta_P)$ is the vector of unknown parameters. The number $P$ is the dimension of the covariate vector included in the model.

Maximum likelihood techniques may be used to estimate the parameters of the Poisson regression. Given the assumption that the observations $(y_i \mid x_i)$ are independent, the log-likelihood function is given by:

\[ \ln L(\beta) = \sum_{i=1}^{n} \left[ y_i x_i'\beta - \exp(x_i'\beta) - \ln(y_i!) \right] \]

The likelihood equations are:

\[ \frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} (y_i - \lambda_i) x_i = 0 \]

Therefore, the Hessian is:

\[ \frac{\partial^2 \ln L}{\partial \beta \, \partial \beta'} = -\sum_{i=1}^{n} \lambda_i x_i x_i' \]

The Hessian is negative definite for all $x$ and $\beta$ (provided the covariates are not perfectly collinear), so the log-likelihood function is globally concave. Hence, the Newton–Raphson iterative algorithm usually converges rapidly and yields a unique parameter estimate. The estimator of the asymptotic covariance matrix is given by:

\[ \widehat{\operatorname{Var}}(\hat{\beta}) = \left[ \sum_{i=1}^{n} \hat{\lambda}_i x_i x_i' \right]^{-1} \]
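The estimation steps above can be illustrated with a minimal sketch in plain Python. It is not taken from the source: the function names (`solve`, `information`, `fit_poisson`, `covariance`), the zero starting values, the tolerance and the toy data are all assumptions made for illustration. Each Newton–Raphson step solves $\left(\sum_i \lambda_i x_i x_i'\right)\Delta = \sum_i (y_i - \lambda_i) x_i$ and updates $\beta \leftarrow \beta + \Delta$.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def means(X, beta):
    """Fitted means lambda_i = exp(x_i' beta)."""
    return [math.exp(sum(xj * bj for xj, bj in zip(x, beta))) for x in X]

def information(X, lam):
    """Negative Hessian: sum_i lambda_i x_i x_i'."""
    p = len(X[0])
    return [[sum(li * x[j] * x[k] for x, li in zip(X, lam)) for k in range(p)]
            for j in range(p)]

def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Newton-Raphson iterations started from beta = 0 (an arbitrary choice)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        lam = means(X, beta)
        # Score vector: sum_i (y_i - lambda_i) x_i
        score = [sum((yi - li) * x[j] for x, yi, li in zip(X, y, lam))
                 for j in range(p)]
        step = solve(information(X, lam), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def covariance(X, beta):
    """Estimated asymptotic covariance: inverse of sum_i lambda_i x_i x_i'."""
    p = len(beta)
    info = information(X, means(X, beta))
    # Invert by solving against each unit vector, one column at a time.
    cols = [solve(info, [1.0 if r == c else 0.0 for r in range(p)])
            for c in range(p)]
    return [[cols[c][r] for c in range(p)] for r in range(p)]

# Toy data, made up for illustration: an intercept plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1, 2, 2, 3, 4]
beta_hat = fit_poisson(X, y)
cov_hat = covariance(X, beta_hat)
print("beta_hat:", beta_hat)
print("standard errors:", [math.sqrt(cov_hat[j][j]) for j in range(len(beta_hat))])
```

Because the model contains an intercept, the likelihood equations imply that the fitted means sum to the observed total, which gives a quick check that the iterations have converged.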
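Hypothesis tests that a single parameter, or a set of parameters jointly, is equal to zero can be carried out using the Wald test, the Lagrange Multiplier test or the Likelihood Ratio test. The Wald statistic is given by:

\[ W = \hat{\beta}' \left[ \widehat{\operatorname{Var}}(\hat{\beta}) \right]^{-1} \hat{\beta} = \hat{\beta}' \left[ \sum_{i=1}^{n} \hat{\lambda}_i x_i x_i' \right] \hat{\beta} \]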
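Under the null hypothesis, the Wald statistic follows a chi-square distribution with degrees of freedom equal to the number of restrictions tested (one degree of freedom for a single parameter). The Likelihood Ratio statistic is given by:

\[ LR = 2 \sum_{i=1}^{n} \ln\left( \frac{\hat{P}_i}{\hat{P}_{\text{Restricted},i}} \right) \]

where $\hat{P}_i$ and $\hat{P}_{\text{Restricted},i}$ are the fitted probabilities of $y_i$ under the unrestricted and the restricted model, respectively.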
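As a hedged illustration of the Wald and Likelihood Ratio statistics, the sketch below computes both for a single restriction in plain Python. The function names and the toy data are assumptions made for this example, not part of the source. The p-value helper relies on the identity $P(\chi^2_1 > w) = \operatorname{erfc}(\sqrt{w/2})$, so it applies only to tests with one degree of freedom.

```python
import math

def poisson_logpmf(y, lam):
    """ln P(y) = -lam + y ln(lam) - ln(y!)."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

def wald_single(beta_k, var_kk):
    """Wald statistic for H0: beta_k = 0, given the estimated variance of beta_k."""
    return beta_k ** 2 / var_kk

def lr_statistic(y, lam_unrestricted, lam_restricted):
    """LR = 2 * sum_i ln(P_i / P_restricted_i), from the fitted means of both models."""
    return 2.0 * sum(
        poisson_logpmf(yi, lu) - poisson_logpmf(yi, lr)
        for yi, lu, lr in zip(y, lam_unrestricted, lam_restricted)
    )

def chi2_pvalue_1df(stat):
    """Upper-tail probability of a chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Toy example with made-up data: a binary covariate splits the sample in two groups.
# With an intercept and a dummy, the fitted means equal the group means; the
# restricted (intercept-only) model assigns the overall mean to every observation.
y = [0, 1, 2, 3]
lam_u = [0.5, 0.5, 2.5, 2.5]
lam_r = [1.5, 1.5, 1.5, 1.5]

lr = lr_statistic(y, lam_u, lam_r)
p_lr = chi2_pvalue_1df(lr)

# For this design the dummy coefficient is ln(2.5 / 0.5) and its estimated
# variance is 1.2, obtained by inverting sum_i lambda_i x_i x_i' = [[6, 5], [5, 5]].
w = wald_single(math.log(5.0), 1.2)
p_w = chi2_pvalue_1df(w)

print(f"LR = {lr:.4f}, p = {p_lr:.4f}")
print(f"W  = {w:.4f}, p = {p_w:.4f}")
```

The two statistics are asymptotically equivalent under the null hypothesis but can differ noticeably in a sample as small as this one.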
The Lagrange Multiplier statistic of the Poisson model is given by:

\[ LM = \left[ \sum_{i=1}^{n} x_i (y_i - \hat{\lambda}_i) \right]' \left[ \sum_{i=1}^{n} x_i x_i' (y_i - \hat{\lambda}_i)^2 \right]^{-1} \left[ \sum_{i=1}^{n} x_i (y_i - \hat{\lambda}_i) \right] \]

where $\hat{\lambda}_i$ is computed using the restricted model. The LM statistic is compared with a chi-square distribution with degrees of freedom equal to the number of restrictions tested (one degree of freedom for a single parameter).

The interpretation of a Poisson model differs according to the goals of the study. A researcher can be interested in the expected counts or in the distribution of counts. When the expected value is the focus of the study, several measures, namely the partial effects, the factor change and the percentage change, can be computed to assess the change in the expected value for a change in an independent variable (i.e. a covariate), keeping the other variables constant. If the interest is in the distribution of counts or just the probability of a specific count, the probability of a count for a given level of the independent variables can be computed [17].

The partial effect of $E(y \mid x)$ with respect to $x_k$ is given by:

\[ \frac{\partial E(y \mid x)}{\partial x_k} = \beta_k \exp(x'\beta) = E(y \mid x)\, \beta_k \]

The partial effects in Poisson models therefore depend on both the coefficient of $x_k$, that is $\beta_k$, and the expected value of $y$ given $x$. Consequently, partial effects in nonlinear models cannot be interpreted as the change in the expected value for a unit change in $x_k$, as they can in linear models.

The factor change in $E(y \mid x)$ for a change $\delta$ in $x_k$, holding all other factors constant, is given by:

\[ \frac{E(y \mid x, x_k + \delta)}{E(y \mid x, x_k)} = \exp(\beta_k \delta) \]

Therefore, the expected value of $y$ given $x$ is multiplied by the factor $\exp(\beta_k \delta)$ following a change $\delta$ in $x_k$, keeping the other variables constant. When $\delta = 1$, the expected count is multiplied by the factor $\exp(\beta_k)$ following a unit change in $x_k$.

The percentage change in the expected value of $y$ given $x$ following a change $\delta$ in $x_k$, holding other variables constant [17], is given by:

\[ 100 \times \frac{E(y \mid x, x_k + \delta) - E(y \mid x, x_k)}{E(y \mid x, x_k)} = 100 \left( \exp(\beta_k \delta) - 1 \right) \]

Another way to interpret a count model is to compute the predicted probability:

\[ \widehat{\Pr}(y = m \mid x) = \frac{\exp(-\hat{\lambda})\, \hat{\lambda}^m}{m!} \]

where $\hat{\lambda} = \exp(x'\hat{\beta})$.
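The interpretation measures above can be sketched in a few lines of Python. The coefficient values and the covariate profile below are hypothetical, and the function names are chosen for this illustration only.

```python
import math

def expected_count(x, beta):
    """E(y | x) = exp(x'beta)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def partial_effect(x, beta, k):
    """dE(y|x)/dx_k = beta_k * E(y|x); it depends on where x is evaluated."""
    return beta[k] * expected_count(x, beta)

def factor_change(beta_k, delta=1.0):
    """Multiplicative change in E(y|x) when x_k increases by delta."""
    return math.exp(beta_k * delta)

def percent_change(beta_k, delta=1.0):
    """Percentage change in E(y|x) when x_k increases by delta."""
    return 100.0 * (math.exp(beta_k * delta) - 1.0)

def predicted_probability(m, x, beta):
    """Pr(y = m | x) under a Poisson distribution with mean exp(x'beta)."""
    lam = expected_count(x, beta)
    return math.exp(-lam) * lam ** m / math.factorial(m)

# Hypothetical estimates: intercept 0.2 and slope 0.5, evaluated at x_1 = 2.
beta = [0.2, 0.5]
x = [1.0, 2.0]

print("expected count:", expected_count(x, beta))
print("partial effect of x_1:", partial_effect(x, beta, 1))
print("factor change for a unit increase:", factor_change(beta[1]))
print("percentage change for a unit increase:", percent_change(beta[1]))
print("Pr(y = 0 | x):", predicted_probability(0, x, beta))
```

The factor change and percentage change do not depend on the covariate values, whereas the partial effect and the predicted probabilities must be evaluated at a chosen profile `x`.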
The mean predicted probability for each count $m$ is:

\[ \overline{\Pr}(y = m) = \frac{1}{N} \sum_{i=1}^{N} \frac{\exp(-\hat{\lambda}_i)\, \hat{\lambda}_i^m}{m!} \]
This measure is compared with the observed proportions of the sample at each count. Large differences between the mean probabilities and the observed proportions suggest that the model is inappropriate. However, small differences do not imply that the model is appropriate, because an incorrect model can provide predictions close to the observed proportions [17].
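A minimal sketch of this comparison is given below, assuming the fitted means $\hat{\lambda}_i$ are already available. The counts and fitted means are invented for illustration, and the function names are not from the source.

```python
import math
from collections import Counter

def mean_predicted_probability(m, lam_hat):
    """Average over observations of the Poisson probability of count m."""
    return sum(math.exp(-l) * l ** m / math.factorial(m) for l in lam_hat) / len(lam_hat)

def observed_proportion(m, y):
    """Share of observations whose count equals m."""
    return Counter(y)[m] / len(y)

# Made-up sample: observed counts and hypothetical fitted means.
y = [0, 1, 1, 2, 0, 3, 1, 0]
lam_hat = [0.6, 0.9, 1.1, 1.8, 0.5, 2.2, 1.0, 0.7]

table = [
    (m, observed_proportion(m, y), mean_predicted_probability(m, lam_hat))
    for m in range(max(y) + 1)
]
for m, obs, pred in table:
    print(f"count {m}: observed = {obs:.3f}, predicted = {pred:.3f}, "
          f"difference = {obs - pred:+.3f}")
```

In line with the caveat above, small differences in such a table are not evidence that the model is correctly specified.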
