Abstract
Selecting an appropriate pathlength is a challenging task for accurate monitoring of wastewater
chemical oxygen demand (COD) by UV–vis spectroscopy in wastewater treatment processes. To address it,
a variable pathlength approach combined with partial least squares regression (PLSR) was developed in
this study. Two new strategies were proposed to extract relevant information from UV–vis spectral data
acquired at variable pathlengths. The first strategy was data fusion at two
levels: low-level data fusion (LLDF) and mid-level data fusion (MLDF). Predictive accuracy was found to
improve, as indicated by lower root-mean-square errors of prediction (RMSEP) than those
obtained for single-pathlength measurements. Both fusion levels delivered very robust PLSR
models with residual predictive deviations (RPD) greater than 3 (3.22 and 3.29, respectively). The
second strategy involved calculating the slopes of absorbance against pathlength at each wavelength to
generate slope-derived spectra. Without the need to select an optimal pathlength, the predictive
accuracy (RMSEP) was improved by 20–43% compared with single-pathlength spectroscopy. Compared
with the nine-factor models from the fusion strategy, the PLSR model from slope-derived spectroscopy
was more parsimonious, with only five factors, and more robust, with an RPD
of 3.72. It also offered excellent correlation between predicted and measured COD values, with an R² of 0.936.
In sum, variable pathlength spectroscopy with the two proposed data analysis strategies successfully
enhanced the prediction of COD in wastewater and showed high potential for application in on-line
water quality monitoring.
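
The two data-handling ideas summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pathlengths, and all numeric values are hypothetical, and the abstract does not specify the fitting procedure. The sketch assumes an ordinary least-squares line of absorbance against pathlength at each wavelength for the slope-derived spectrum, simple concatenation of the single-pathlength spectra for low-level fusion, and the commonly used definitions of RMSEP and of RPD as the standard deviation of the reference values divided by RMSEP. Mid-level fusion and the PLSR step itself are not shown.

```python
from math import sqrt
from statistics import mean, stdev


def slope_spectrum(pathlengths, absorbances):
    """Least-squares slope of absorbance vs. pathlength at each wavelength.

    pathlengths: list of P pathlengths.
    absorbances: P rows (one per pathlength), each holding W absorbance values.
    Returns a list of W slopes (the slope-derived spectrum).
    """
    l_bar = mean(pathlengths)
    sxx = sum((l - l_bar) ** 2 for l in pathlengths)
    slopes = []
    for column in zip(*absorbances):  # one column per wavelength
        a_bar = mean(column)
        sxy = sum((l - l_bar) * (a - a_bar) for l, a in zip(pathlengths, column))
        slopes.append(sxy / sxx)
    return slopes


def low_level_fusion(absorbances):
    """Concatenate the spectra from all pathlengths into one feature vector."""
    return [a for row in absorbances for a in row]


def rmsep(measured, predicted):
    """Root-mean-square error of prediction."""
    return sqrt(mean([(m - p) ** 2 for m, p in zip(measured, predicted)]))


def rpd(measured, predicted):
    """Assumed definition: sample SD of reference values divided by RMSEP."""
    return stdev(measured) / rmsep(measured, predicted)


# Hypothetical data: three wavelengths, four pathlengths, ideal linear response.
true_slopes = [0.5, 0.2, 0.1]          # absorbance per unit pathlength (made up)
pathlengths = [1.0, 2.0, 5.0, 10.0]    # made-up pathlengths
absorbances = [[k * l for k in true_slopes] for l in pathlengths]

slopes = slope_spectrum(pathlengths, absorbances)
fused = low_level_fusion(absorbances)

# Hypothetical measured vs. predicted COD values, for the metrics only.
measured_cod = [100, 200, 300]
predicted_cod = [110, 190, 310]
print(slopes, len(fused), rmsep(measured_cod, predicted_cod), rpd(measured_cod, predicted_cod))
```

In this sketch the slope-derived spectrum has one value per wavelength regardless of how many pathlengths were measured, whereas the low-level fused vector grows with the number of pathlengths. Either vector could then serve as the predictor row for a PLSR model.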