2008. The measure of income inequality employed is the Gini index. As
scaled in the WIID, the Gini index has a theoretical range from zero, which
indicates that each reference unit receives an equal share of income, to one
hundred, indicating that a single reference unit receives all income and all
others receive nothing. Next, two series of inequality observations—providing
information about inequality in gross and net income, respectively—from the
LIS are added to the dataset. As the quality and comparability of these data
are unparalleled, these observations serve as the baseline to which the WIID
data are standardized.
Once the LIS data are added, the first step in standardizing the inequality
data is to eliminate those observations that do not provide coverage of all or
nearly all of a country’s population. Many of the WIID observations cover
only urban or rural residents or otherwise omit significant parts of the population.
These observations were generally excluded. However, in the absence
of any WIID observations with complete coverage for Argentina or Uruguay,
and in light of their very high rates of urbanization (approximately 90%),
I follow Babones and Alvarez-Rivadulla (2007) in retaining the urban-only
observations for these two countries. Historical inequality data predating
1960, which are often based on unreliable surveys, were also removed from
the sample.
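
The exclusion rules in this step can be sketched as a simple filter. This is only an illustration under stated assumptions: the field names (`country`, `year`, `coverage`) and the coverage labels (`"all"`, `"urban"`, `"rural"`) are hypothetical and do not correspond to the actual WIID variable codes.

```python
# Hypothetical field names and coverage labels; the real WIID codes differ.
URBAN_EXCEPTIONS = {"Argentina", "Uruguay"}  # urban-only data retained
FIRST_YEAR = 1960                            # observations before this are dropped


def keep_observation(obs):
    """Return True if an observation survives the coverage and date rules."""
    if obs["year"] < FIRST_YEAR:
        return False
    if obs["coverage"] == "all":
        return True
    # Urban-only observations are kept for the two highly urbanized exceptions.
    return obs["coverage"] == "urban" and obs["country"] in URBAN_EXCEPTIONS


def filter_observations(observations):
    """Keep only the observations that pass keep_observation."""
    return [obs for obs in observations if keep_observation(obs)]
```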
Next, the data were sorted according to their reference unit and income
definition. The WIID dataset contains over two dozen different reference-unit
codes, but, as previous researchers have noted (e.g., Babones and Alvarez-Rivadulla
2007, 11), many of these are essentially equivalent. Five distinct
reference units can be identified: (1) household per capita, (2) household
adult equivalent, (3) household without adjustment, (4) employee, and (5)
person.5 Similarly, although the WIID data are classified into 26 income
definitions, these are easily grouped into just four: (1) net income, (2) gross
income, (3) expenditures, and (4) unidentified. Rather than assume constant
differences across reference units for various income definitions and
vice versa, the data were classified according to the combination of reference
unit and income definition. This yields nineteen categories (no observations
provide information about the distribution of consumption per employee).
Due to their superior quality, the two series of LIS data, which are based
on household adult-equivalent net and gross income, are considered separate
categories, bringing the total number of categories of data to twenty-one.
Rather than choose among sources, when more than one observation was
available within a category for a particular country and year, these observations
were averaged.
5 Observations using undocumented country-specific reference units, such as the social
assistance household or national scale household equivalent, were disregarded. It is also
worth mentioning here that several different definitions of “household adult equivalent”
appear in the WIID dataset, including the square root of household size (the definition
preferred by the LIS) and the OECD scale. The differences in the Gini indices based
on these different definitions of adult equivalent, however, are typically quite small, less
than one point on the zero to one hundred scale. I have therefore opted to treat them
as a single group to facilitate the standardization process, although at the cost of slightly
greater uncertainty.
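
The category count and the averaging rule can be checked with a short sketch. The labels below are shorthand chosen for illustration, and the record layout (`country`, `year`, `category`, `gini`) is an assumption rather than the structure of the actual dataset.

```python
from collections import defaultdict
from itertools import product

# Shorthand labels for the five reference units and four income definitions.
REFERENCE_UNITS = ["household per capita", "household adult equivalent",
                   "household", "employee", "person"]
INCOME_DEFINITIONS = ["net", "gross", "expenditure", "unidentified"]


def build_categories():
    """List the (reference unit, income definition, source) categories."""
    wiid = [(unit, income, "WIID")
            for unit, income in product(REFERENCE_UNITS, INCOME_DEFINITIONS)
            # No observations exist for expenditure per employee.
            if (unit, income) != ("employee", "expenditure")]
    lis = [("household adult equivalent", "net", "LIS"),
           ("household adult equivalent", "gross", "LIS")]
    return wiid + lis


def average_within_category(observations):
    """Average Gini values sharing the same country, year, and category."""
    grouped = defaultdict(list)
    for obs in observations:
        key = (obs["country"], obs["year"], obs["category"])
        grouped[key].append(obs["gini"])
    return {key: sum(values) / len(values) for key, values in grouped.items()}
```

Here `len(build_categories())` is twenty-one: nineteen combinations from the WIID plus the two LIS series.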
This provides a dataset of country-year observations, each of which has
data on inequality in one or more of the twenty-one categories. Generating a
series with data on all countries and years from these incomplete inequality
variables requires the ratios between each pair of variables.
If the ratio ρab between the Gini index data in
categories a and b were known, missing observations in a could be replaced
simply by multiplying available data in b by ρab. But as noted previously, the
relationship between Gini indices with different reference units and income
definitions will vary considerably from country to country and also over time
depending on the extent of redistributive policies, details of tax law, patterns
of consumption and savings, family structure, and other factors. In
other words, ρab is not constant but varies across countries i and years t.
Further, ρabit is only directly calculable for those pairs of categories in those
countries and years for which it is not immediately useful, that is, only when
data are already available in both categories for that observation.
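
The definition of the ratio can be illustrated with a minimal sketch. It covers only the directly calculable case and the simple multiplication described above, not the estimation of missing ratios. The data layout (a dictionary keyed by country and year, mapping category names to Gini values) and the function names are assumptions made for illustration.

```python
def observed_ratios(data, a, b):
    """Compute rho_abit = G_a / G_b wherever both categories are observed.

    `data` maps (country, year) to a dict of {category: gini}.
    """
    ratios = {}
    for (country, year), ginis in data.items():
        if a in ginis and b in ginis:
            ratios[(country, year)] = ginis[a] / ginis[b]
    return ratios


def impute(gini_b, rho_ab):
    """Fill a missing value in category a from category b and a known ratio."""
    return gini_b * rho_ab


# Made-up values for a hypothetical country "A", for illustration only.
example = {
    ("A", 2000): {"net": 30.0, "gross": 40.0},  # ratio is calculable here
    ("A", 2001): {"gross": 42.0},               # net is missing here
}
rho = observed_ratios(example, "net", "gross")
```

Note that the ratio is calculable only for the year 2000 in this example, where both values already exist; applying it to 2001 would require assuming that it carries over from one year to the next.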
Those ratios ρabit that are directly calculable are valuable nevertheless
because they provide information about what the ratios that are missing
are likely to be. Because the factors that affect these ratios—redistributive
policies, patterns of consumption, and so on—tend to change only slowly over