ECON 5360 Class Notes
Qualitative Dependent Variable Models
Here we consider models where the dependent variable is discrete in nature.
1 Linear Probability Model
Consider the linear probability (LP) model

$$y_i = \beta'x_i + \varepsilon_i$$

where $E(\varepsilon_i) = 0$. The conditional expectation

$$E(y_i \mid x_i) = \beta'x_i$$

is interpreted as the probability of an event occurring given $x_i$. There are a couple of drawbacks to the LP model that limit its use:
1. Heteroscedasticity. Given that $y_i \in \{0, 1\}$, the error term can take on only two values, with probabilities

$$\begin{array}{cc}
\varepsilon_i & f(\varepsilon_i) \\ \hline
1 - \beta'x_i & \beta'x_i \\
-\beta'x_i & 1 - \beta'x_i
\end{array}$$
so that the variance is

$$\operatorname{var}(\varepsilon_i) = \beta'x_i(1 - \beta'x_i)^2 + (1 - \beta'x_i)(\beta'x_i)^2 = \beta'x_i(1 - \beta'x_i) = E(y_i)[1 - E(y_i)].$$
2. Predictions outside $[0, 1]$. The predicted probabilities from the LP model, $\hat{y}_i = \hat{\beta}'x_i$, can be less than zero or greater than one.
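Both drawbacks are easy to see in a small simulation. The sketch below is a minimal illustration with made-up data (the coefficient values and sample size are arbitrary assumptions): it fits the LP model by OLS to a simulated binary outcome and checks the fitted probabilities against the $[0, 1]$ bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: binary outcome driven by a single regressor
# (illustrative setup; coefficients 0.5 and 2.0 are assumptions).
n = 200
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))       # true probabilities
y = (rng.uniform(size=n) < p_true).astype(float)

# Linear probability model: OLS of y on a constant and x.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# Fitted "probabilities" are not confined to [0, 1].
print("min fitted prob:", y_hat.min())
print("max fitted prob:", y_hat.max())
print("share outside [0,1]:", np.mean((y_hat < 0) | (y_hat > 1)))
```

With a steep enough underlying relationship, the fitted line exits the unit interval at extreme values of $x$, illustrating drawback 2 directly.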
2 Binomial Probit and Logit Models
The drawbacks of the LP model are solved by letting the probability of an event (i.e., $y = 1$) be given by a well-defined cumulative distribution function

$$\operatorname{Prob}(y_i = 1 \mid x) = \int_{-\infty}^{x'\beta} f(t)\,dt = F(x'\beta). \qquad (1)$$

In this manner, the predicted probabilities will always be bounded between zero and one. If $F(x'\beta)$ is the cdf for a standard normal random variable, we get the probit model. If

$$F(x'\beta) = \frac{e^{x'\beta}}{1 + e^{x'\beta}},$$

then we get the logit model. Estimates from the logit and probit models often give similar results. The logit model is less computationally intense because $F(x'\beta)$ has a closed form; however, the logistic pdf $f(\cdot)$ has fatter tails than the standard normal pdf. Because $y_i \in \{0, 1\}$ is discrete, while (1) implies continuity, we replace $y_i$ with the latent variable $y_i^*$. This produces

$$y_i^* = \beta'x_i + \varepsilon_i.$$
$y_i^*$ can be interpreted as an unobservable index function that measures individual $i$'s propensity to choose $y = 1$. For example, $y_i^*$ could be the net benefits (benefits less costs) of selecting option A. Alternatively, $y_i^*$ could be interpreted as the difference in utility derived from choosing option A less the utility of choosing option B. Therefore, we assume

$$\begin{array}{ll}
\text{if } y_i^* > 0 & \text{then } y_i = 1 \\
\text{if } y_i^* \le 0 & \text{then } y_i = 0.
\end{array}$$

The choice of zero as a threshold is innocuous if the vector $x_i$ includes a constant term.
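The latent-index setup above can be simulated directly. This sketch uses the probit case (standard normal errors) with assumed, purely illustrative coefficient values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Latent-index data-generating process: y* = beta'x + eps with eps ~ N(0, 1),
# and y = 1 exactly when the latent index is positive.
n = 1000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.3, 1.0])              # illustrative (assumed) coefficients

y_star = x @ beta + rng.normal(size=n)   # unobserved index
y = (y_star > 0).astype(int)             # observed binary choice

print("observed share of y = 1:", y.mean())
```

By construction, only the sign of $y_i^*$ is observed through $y_i$; the index itself is never seen by the econometrician.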
2.1 Estimation
The parameters of the model are estimated via maximum likelihood. The relevant probability can be written
as
$$\operatorname{Prob}(y_i = 1 \mid x) = \operatorname{Prob}(y_i^* > 0 \mid x) = \operatorname{Prob}(\beta'x_i + \varepsilon_i > 0 \mid x) = \operatorname{Prob}(\varepsilon_i > -\beta'x_i \mid x).$$
Assuming a symmetric, mean-zero pdf for $\varepsilon_i$, we have

$$\operatorname{Prob}(\varepsilon_i > -\beta'x_i \mid x) = \operatorname{Prob}(\varepsilon_i < \beta'x_i \mid x).$$
It will be convenient to standardize $\varepsilon_i$, which gives

$$\operatorname{Prob}\left(\frac{\varepsilon_i}{\sigma} < \left(\frac{\beta}{\sigma}\right)'x_i \,\middle|\, x\right) = \Phi\left(\left(\frac{\beta}{\sigma}\right)'x_i\right),$$

where $\Phi(\cdot)$ and $\sigma$ are the cdf and standard deviation for $\varepsilon_i$, respectively. Therefore, the parameters are only identifiable up to the scalar $\sigma$, which is commonly set to unity (i.e., $\sigma = 1$). The likelihood function is given by
$$L = \prod_{i=1}^{n} \Phi_i^{y_i} (1 - \Phi_i)^{1 - y_i}$$
and the log-likelihood function is given by

$$\ln L(\beta) = \sum_{i=1}^{n} \left\{ y_i \ln(\Phi_i) + (1 - y_i) \ln(1 - \Phi_i) \right\}. \qquad (2)$$
Maximization of (2) will require nonlinear optimization methods, such as Newton's algorithm.
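As a concrete example, Newton's algorithm has a particularly clean form in the logit case: the gradient of (2) is $X'(y - p)$ and the Hessian is $-X'WX$ with $W = \operatorname{diag}(p_i(1 - p_i))$. The sketch below (simulated data; the coefficient values are arbitrary assumptions) iterates Newton steps to the ML estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Newton's algorithm for the logit log-likelihood (2): with
# p_i = e^{x'b} / (1 + e^{x'b}), the gradient is X'(y - p) and the
# Hessian is -X'WX, W = diag(p_i(1 - p_i)).
def logit_mle(X, y, tol=1e-8, max_iter=50):
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = -X.T @ (X * (p * (1 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta - step                 # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated check: recover the (assumed) coefficients.
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
p = 1 / (1 + np.exp(-X @ beta_true))
y = (rng.uniform(size=n) < p).astype(float)

beta_hat = logit_mle(X, y)
print("beta_hat:", beta_hat)   # should be close to (0.5, -1.0)
```

For the probit case the gradient and Hessian involve the normal pdf and cdf instead, but the same iteration applies.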
2.2 Marginal Effects
The estimated coefficients, $\hat{\beta}_{ML}$, are problematic in two senses:

1. The true $\beta$s are not identified. Recall that all we can really estimate is $\beta/\sigma$.

2. Aside from problem #1, we know that

$$\hat{\beta}_k = \frac{\partial y_i^*}{\partial x_{i,k}}.$$

Because $y_i^*$ is an unobservable index function, it is difficult to interpret this derivative.
A simple solution is to calculate

$$\hat{\gamma}_{i,k} = \frac{\partial \operatorname{Prob}(y_i = 1)}{\partial x_{i,k}} = \phi\left(\left(\frac{\beta}{\sigma}\right)'x_i\right)\frac{\beta_k}{\sigma} \qquad (3)$$

where $\phi(\cdot)$ is the pdf for $\varepsilon_i$. The advantage of the estimated marginal effect, $\hat{\gamma}_{i,k}$, is that it only depends on $\beta/\sigma$ (so that it is identifiable) and it is easy to interpret. Note that $\hat{\gamma}_{i,k}$ depends on the entire vectors $x_i$ and $\beta$. The standard errors for $\hat{\gamma}_{i,k}$ can be calculated using the delta method, which is based on a first-order Taylor approximation. We have
$$\operatorname{asy.var}(\hat{\gamma}) = \left(\frac{\partial \hat{\gamma}}{\partial \hat{\beta}'}\right) V \left(\frac{\partial \hat{\gamma}}{\partial \hat{\beta}'}\right)'$$

where $V$ is the variance-covariance matrix for $\hat{\beta}_{ML}$.
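In the logit case, where the pdf satisfies $f(z) = F(z)(1 - F(z))$, both (3) and its delta-method standard error have closed forms. A minimal sketch, assuming illustrative values for $\hat{\beta}_{ML}$, its variance-covariance matrix $V$, and a representative $x$ (all made up for the example; the function name is hypothetical):

```python
import numpy as np

# Marginal effect (3) and its delta-method standard error for the logit
# case. Inputs beta_hat and V (ML estimates and their variance-covariance
# matrix) are assumed given.
def logit_marginal_effects(beta_hat, V, x):
    F = 1 / (1 + np.exp(-(x @ beta_hat)))   # Prob(y = 1 | x)
    f = F * (1 - F)                          # logistic pdf at beta'x
    gamma = f * beta_hat                     # marginal effects at x

    # Jacobian of gamma w.r.t. beta: f*I + f*(1 - 2F) * beta x'
    k = beta_hat.size
    J = f * np.eye(k) + f * (1 - 2 * F) * np.outer(beta_hat, x)
    se = np.sqrt(np.diag(J @ V @ J.T))       # delta-method standard errors
    return gamma, se

# Illustrative numbers (assumed, not estimated from real data).
beta_hat = np.array([0.5, -1.0])
V = np.array([[0.02, 0.00],
              [0.00, 0.03]])
x_bar = np.array([1.0, 0.2])                 # a representative x vector

gamma, se = logit_marginal_effects(beta_hat, V, x_bar)
print("marginal effects:", gamma)
print("delta-method s.e.:", se)
```

Note that the Jacobian term $f(1 - 2F)\,\beta x'$ captures how the pdf itself shifts as $\beta$ changes, which is why the marginal effect depends on the entire vectors $x_i$ and $\beta$.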