As the outcomes analysed in this paper are binary (0/1) indicator variables, we
model them as probits and estimate the parameters by maximum likelihood. We
estimate a separate equation for each year rather than pooling the data across years. The data are,
however, pooled across states, with state fixed effects included to allow for all
state-level unobservables. These include political-economic variables, historically
determined attitudes to education and initial conditions. The socio-economic status of
the household is captured by wealth indicators, adult education and demographics. For
rural areas, we include indicators of the supply of schooling at the village level. Since
no comparable information is available for urban areas, these variables enter only in
interaction with a dummy for whether the household lives in a rural area. If a variable
has a sufficiently large number of missing values then, rather than discarding every
observation for which it is missing, we create a dummy indicating that the value is
missing and include it in the model as an additional regressor; this is the case for caste
and religion.
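
For concreteness, the specification described above can be sketched as follows. The notation is ours and is introduced only for illustration: i indexes the individual, h the household and s the state; \alpha_s is the state fixed effect; x_h collects the wealth, adult education and demographic variables; R_h is the rural dummy; v_h collects the village-level indicators of school supply; and m_h collects the missing-value dummies (here, for caste and religion). For a given year, the model would then take the form

\[
\Pr(y_{ihs} = 1 \mid x_h, v_h, m_h, R_h) = \Phi\left( \alpha_s + x_h'\beta + R_h \, v_h'\gamma + m_h'\delta \right),
\]

where \Phi is the standard normal distribution function. Because a separate equation is estimated for each year, every parameter, including \alpha_s, is year-specific. Writing \Phi_{ihs} for the right-hand side, the maximum likelihood estimates maximise the usual probit log-likelihood

\[
\ln L = \sum_{i} \left[ y_{ihs} \ln \Phi_{ihs} + (1 - y_{ihs}) \ln\left( 1 - \Phi_{ihs} \right) \right].
\]

The term R_h v_h' \gamma is zero for urban households, so the school supply variables contribute only to the rural part of the sample.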