2.7. Multivariate modeling
We built multivariate linear regression models, using a manual forward-stepwise procedure, to identify the contributions of outdoor infiltration and indoor sources to indoor pollutant concentrations. Observations more than 3 standard deviations from the mean were removed as outliers prior to model building; one outlier home was removed in the summer, where PM2.5 exceeded 121 μg/m³ and residents reported smoking. Covariates significant at p ≤ 0.20 in the bivariate analysis were considered candidate covariates and were incorporated individually into each model. Given our interest in assessing the impact of outdoor concentrations in industrial communities on indoor exposures, we first incorporated the location- and week-specific (LUR-based) outdoor concentration estimate into each model and examined effect modification by the ventilation proxies (I/O sulfur ratio and percent of time windows were open). We then tested each additional candidate outdoor source term, in descending order of bivariate correlation strength, and tested significant source terms for effect modification by ventilation. Finally, we tested indoor source terms, in descending order of bivariate correlation strength, and tested each term for effect modification by ventilation (Baxter et al., 2007a).
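The screening and ordering steps described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the tuple layout of the bivariate results, the use of the sample standard deviation, and the ranking of correlation strength by absolute value are all assumptions made for illustration.

```python
from statistics import mean, stdev


def screen_outliers(values, k=3.0):
    """Split values into (kept, removed) using a mean +/- k SD window."""
    center = mean(values)
    spread = stdev(values)  # assumption: sample SD (n - 1 denominator)
    lower, upper = center - k * spread, center + k * spread
    kept = [v for v in values if lower <= v <= upper]
    removed = [v for v in values if v < lower or v > upper]
    return kept, removed


def order_candidates(bivariate, alpha=0.20):
    """Return names of candidate covariates, strongest correlation first.

    bivariate: list of (name, r, p) tuples (hypothetical layout).
    Terms with p <= alpha are kept; strength is ranked by |r| (assumption).
    """
    eligible = [row for row in bivariate if row[2] <= alpha]
    ranked = sorted(eligible, key=lambda row: abs(row[1]), reverse=True)
    return [name for name, _r, _p in ranked]


# Made-up numbers for illustration only, not study data.
concentrations = [9.0, 11.0] * 10 + [130.0]
kept, removed = screen_outliers(concentrations)

results = [("term_a", 0.20, 0.15), ("term_b", -0.50, 0.01), ("term_c", 0.40, 0.30)]
candidates = order_candidates(results)
```

With very small samples, a mean ± 3 SD rule cannot flag any observation, because a single extreme value also inflates the standard deviation; the sketch therefore uses 21 made-up values.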
Model fit was assessed at each stage using the coefficient of determination (R²) and root mean square error (RMSE). For a covariate to be retained at each stage, we required a p-value < 0.10, an increase in R² of at least 0.01, a decrease in RMSE, and a variance inflation factor (VIF) < 2.0 for all model terms. At each stage, non-significant covariates were removed one at a time, in descending order of p-value, and the model was re-fit. In Table 3 and Table 4, we report the increase in pollutant concentration associated with an interquartile range (IQR) increase in each source indicator (β × IQR) from each multivariate model.
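As a hedged illustration of the retention rule and the β × IQR scaling, the sketch below encodes the four stated criteria and computes an IQR-scaled effect. The function names and inputs are hypothetical, and the quartile definition (the inclusive method of Python's `statistics.quantiles`) is an assumption, since the text does not specify how quartiles were computed.

```python
from statistics import quantiles


def retain_covariate(p_value, r2_before, r2_after, rmse_before, rmse_after, vifs):
    """Apply the four stated criteria for keeping a newly added covariate.

    vifs: VIF values for all terms in the model after adding the covariate.
    """
    return (
        p_value < 0.10                      # term is significant
        and (r2_after - r2_before) >= 0.01  # R2 improves by at least 0.01
        and rmse_after < rmse_before        # RMSE decreases
        and all(v < 2.0 for v in vifs)      # no term shows collinearity
    )


def iqr_scaled_effect(beta, indicator_values):
    """Return beta x IQR for one source indicator.

    Assumption: quartiles use the 'inclusive' method of statistics.quantiles.
    """
    q1, _median, q3 = quantiles(indicator_values, n=4, method="inclusive")
    return beta * (q3 - q1)


# Made-up numbers for illustration only, not study results.
keep = retain_covariate(0.04, 0.50, 0.53, 2.1, 1.9, [1.2, 1.5])
effect = iqr_scaled_effect(1.5, [1.0, 2.0, 3.0, 4.0, 5.0])
```

Scaling each coefficient by the IQR of its indicator puts the reported effects on a comparable footing across source terms measured in different units.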