Log-Linear Regression Model for Pneumonia Incidence of Children
Aged under Five Years in Surat Thani, Thailand 1999-2007
1. INTRODUCTION
Pneumonia is estimated to be the leading cause of mortality in the world among children
less than 5 years of age, with more than 95 % of all clinically-diagnosed episodes occurring
in developing countries [1]. It is caused by viruses, bacteria or other infective agents entering
the respiratory tract. Although originally regarded as an infectious disease, pneumonia is now
classified by ICD-10 as a disease of the respiratory system. Respiratory tract infections are
not only more prevalent but more severe, accounting for more than 4 million deaths annually.
Pneumonia is the number one killer of children in developing societies [2].
KONGCHOUY N ET AL.
1.1. Disease etiology and severity in Thailand
In Thailand, all hospital-diagnosed infectious disease cases are routinely recorded by the
Ministry of Public Health in each of its 12 administrative zones, and these records include
pneumonia. In the seven provinces of the upper southern zone, pneumonia accounted for 6 %
of all disease cases over the nine-year period 1999-2007, and was thus the fourth most common
disease reported after diarrhea (51.2 % of cases), pyrexia of unknown origin (10.4 %), and
conjunctivitis (6.4 %). Among the diseases reported, pneumonia was by far the most lethal,
accounting for 47.7 % of all deaths from hospital-diagnosed cases of infectious diseases in the
region during the same period. However, while 59 % of these pneumonia cases occurred among
children aged less than 5 years, 89 % of the deaths occurred among older persons.
Of the seven provinces in the zone, Surat Thani province recorded the highest average
incidence rate of pneumonia cases (8.3 %) during the nine years. This province is the largest
in area and the second largest in population.
Previous publications on pneumonia mortality and morbidity in Thailand are not extensive.
They include a study by Brady et al. [3] of pneumonia cases reported in 1999-2001 by the
Ministry of Public Health surveillance system in Sakaeo province near the Cambodian border.
They found that pneumonia deaths were under-reported compared with data available from
death certificates. Suwanjutha et al. [4] studied risk factors associated with mortality and
morbidity of community-acquired pneumonia in Thai children younger than 5 years of age.
Using a logistic regression model, they found that the factors associated with severe pneumonia
were underlying heart disease, enlarged liver and cyanosis, and recommended that these findings
should be recognised by physicians treating young children with pneumonia. Reechaipichitkul
and Tantiwong [5] studied clinical features of community-acquired pneumonia among patients
treated at Srinagarind Hospital in Khon Kaen province in the north-eastern region.
1.2. Objectives
While it is important to identify risk factors for pneumonia disease and thus provide a scientific
basis for setting up more effective prevention programs, our scientific objective in this study was
to identify a method to better understand the extent and patterns of temporal (seasonal and
trend) and regional variation for the disease incidence among young children in a province of
Thailand. Such knowledge can provide an effective basis for prevention when limited available
resources need to be allocated to places and in periods of increased risk. Our statistical
objective was to develop appropriate methods for the data analysis of such disease incidence.
Disease counts in individual cells, defined by period and district of illness, are mostly small
and often zero, so Poisson and negative binomial generalized linear models are often considered
most statistically appropriate, and can be used to identify cells with unusually high disease
occurrences [3], [6], [7], [8]. However, other models that assume normally distributed errors
after a simple logarithmic transformation have also been used, particularly for modeling
biological counts (see, for example, [9], [10]), and these models have the advantage that
software for handling spatial and time series correlations is more readily available (see, for
example, the recent reviews [11], [12]).
In this study the methods used were based on logarithmic transformations of incidence rates
and negative binomial generalized linear models [13] and we compared results obtained from
applying these methods. We examined the quarterly incidence rates of childhood pneumonia
by age group and gender in districts of Surat Thani province of Thailand over the period
1999-2007.
2. METHODS
2.1. Data management
Data used in the current study were taken from a registry of hospital-diagnosed infectious
disease cases collected routinely in each of Thailand’s 76 provinces by the Ministry of Public
Health. Data for each year are available in computer files with records for individual disease
cases and fields comprising characteristics of the subject and the disease, including dates of
sickness and disease diagnosis, the subject’s age, gender, and address, and the severity of the
illness including date of death for mortality cases. After extensive cleaning to correct or impute
data entry errors, the records for Surat Thani province for the nine years from 1999 to 2007
were stored in an SQL database. Pneumonia disease counts aggregated over age group (less
than 1 or 1-4), month and district were then obtained. Surat Thani province is divided into
19 districts. Incidence rates were computed as the number of cases per 1000 residents in each
demographic group and district according to the 2000 Thai Population and Housing Census.
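The aggregation step can be illustrated with a minimal Python sketch. It is not the authors' SQL workflow: the record layout (`age_years`, `district`, `year`, `month`), the helper names, and the use of calendar quarters as the analysis period are assumptions made for illustration, and the records are invented.

```python
from collections import Counter


def age_group(age_years):
    """Return the study's age group label, or None for ages of 5 and over."""
    if age_years < 1:
        return "<1"
    if age_years < 5:
        return "1-4"
    return None


def quarter(month):
    """Map a calendar month (1-12) to a calendar quarter (1-4)."""
    return (month - 1) // 3 + 1


def aggregate_cases(records):
    """Count cases per (age group, district, year, quarter) cell."""
    counts = Counter()
    for rec in records:
        group = age_group(rec["age_years"])
        if group is None:
            continue  # outside the under-five study population
        key = (group, rec["district"], rec["year"], quarter(rec["month"]))
        counts[key] += 1
    return counts


# Hypothetical case records, invented for illustration only.
records = [
    {"age_years": 0.5, "district": "A", "year": 1999, "month": 2},
    {"age_years": 3, "district": "A", "year": 1999, "month": 3},
    {"age_years": 0.2, "district": "A", "year": 1999, "month": 1},
    {"age_years": 7, "district": "B", "year": 2000, "month": 11},
]
cell_counts = aggregate_cases(records)
```

Cells with no recorded cases do not appear in `cell_counts`; a `Counter` returns 0 for them when queried, which matters because many cells in data of this kind are empty.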
2.2. Statistical methods
We first calculated disease incidence in children aged less than five years in cells defined by
demographic group $i$, region $j$, period $q$ and year $t$ as the ratio of the number of reported
cases $n_{ijqt}$ to $P_{ij}$, the corresponding population at risk in 1000s.
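As a hedged illustration of this calculation, the sketch below divides each cell count by the matching population expressed in thousands. The function name, the dictionary layout, and all numbers are hypothetical and are not taken from the registry or the census.

```python
def cell_rates(counts, populations):
    """Incidence per 1000 residents for each (group, district, year, quarter) cell.

    counts      -- dict mapping (group, district, year, quarter) to case counts
    populations -- dict mapping (group, district) to resident head counts
    """
    rates = {}
    for (group, district, year, qtr), n_cases in counts.items():
        pop_thousands = populations[(group, district)] / 1000.0
        if pop_thousands <= 0:
            raise ValueError("population at risk must be positive")
        rates[(group, district, year, qtr)] = n_cases / pop_thousands
    return rates


# Hypothetical counts and populations, invented for illustration only.
counts = {
    ("<1", "A", 1999, 1): 6,
    ("1-4", "A", 1999, 1): 10,
}
populations = {
    ("<1", "A"): 1200,
    ("1-4", "A"): 5000,
}
rates = cell_rates(counts, populations)
```

The population depends only on demographic group and district, so the same denominator is reused for every period and year, as in the definition above.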
The negative binomial GLM [13] is an extension of the Poisson regression model that allows
for over-dispersion. If $\lambda_{ijqt}$ denotes the mean number of reported cases in demographic
group $i$, region $j$, period $q$ and year $t$, an additive model with this distribution is expressed as
$$\ln(\lambda_{ijqt}) = \ln(P_{ij}) + \mu + \alpha_i + \beta_j + \eta_q + \gamma_t. \qquad (1)$$
The terms $\alpha_i$, $\beta_j$, $\eta_q$ and $\gamma_t$ represent demographic group, region, period and year effects,
respectively, and are centred at 0, so that $\mu$ is a constant encapsulating the overall incidence.
The variance of this distribution is $\lambda_{ijqt}(1 + \lambda_{ijqt}/\theta)$, with the Poisson model arising in the limit
as $\theta \to \infty$. The model fit is assessed by comparing deviance residuals with normal quantiles, and
it is also informative to plot observed counts and appropriately scaled incidence rates against
corresponding fitted values based on the model. The model also gives adjusted incidence rates
for each factor of interest, obtained by suppressing the subscripts in Equation (1) corresponding
to the other factors and replacing these terms with a constant satisfying the condition that
the sum of the disease counts based on the adjusted incidence rates matches the total. Sum
contrasts [13] were used to obtain confidence intervals for comparing the adjusted incidence
rates within each factor with the overall incidence rate. An advantage of these confidence
intervals is that they provide a simple criterion for classifying levels of a factor into three
groups according to whether each corresponding confidence interval exceeds, crosses, or is
below the overall mean.
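The quantities described in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: no model is fitted, the effect values and populations are invented, and `adjusted_rates` encodes one reading of the adjustment, in which the rate for level k of a factor is exp(c + effect_k) and the constant c is chosen so that the implied case counts sum to the observed total.

```python
import math


def expected_count(pop_thousands, mu, alpha, beta, eta, gamma):
    """Mean count under Equation (1), where ln(P) enters as an offset."""
    return pop_thousands * math.exp(mu + alpha + beta + eta + gamma)


def nb_variance(mean, theta):
    """Negative binomial variance; approaches the Poisson variance as theta grows."""
    return mean * (1.0 + mean / theta)


def adjusted_rates(effects, pop_thousands, total_cases):
    """Adjusted rates per 1000 for the levels of a single factor.

    Assumed reading: rate_k = exp(c + effect_k), with c chosen so that
    sum_k pop_k * rate_k equals total_cases.
    """
    weighted = sum(pop_thousands[k] * math.exp(effects[k]) for k in effects)
    c = math.log(total_cases / weighted)
    return {k: math.exp(c + effects[k]) for k in effects}


def classify(lower, upper, overall):
    """Place a confidence interval above, below, or across the overall mean."""
    if lower > overall:
        return "above"
    if upper < overall:
        return "below"
    return "crosses"


# Hypothetical effects and populations (in 1000s), invented for illustration.
effects = {"A": 0.2, "B": -0.2}
pops = {"A": 10.0, "B": 30.0}
total = 100
rates = adjusted_rates(effects, pops, total)
overall = total / sum(pops.values())  # overall rate per 1000

# Hypothetical confidence limits for three levels of a factor.
labels = [classify(3.0, 4.1, overall), classify(2.1, 2.9, overall),
          classify(1.5, 2.2, overall)]
```

The three-way classification depends only on the interval limits and the overall mean, which is why it gives a simple criterion for grouping the levels of a factor.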
The alternative additive log-linear model for the incidence rates with normally distributed
errors is