Abstract
BACKGROUND & OBJECTIVE: The use of machine learning for air pollution modelling is rapidly increasing. We conducted a systematic review of studies comparing statistical and machine learning models predicting the spatiotemporal variation of ambient nitrogen dioxide (NO₂), ultrafine particles (UFPs) and black carbon (BC) to determine whether, and in which scenarios, machine learning generates more accurate predictions.
METHODS: Web of Science and Scopus were searched up to June 13, 2024. All records were screened by two independent reviewers. Differences in the coefficient of determination (R²) and root mean square error (RMSE) between the best statistical and the best machine learning model in each comparison were compared across categories of methodological elements.
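The two performance metrics can be illustrated with a minimal sketch. It assumes both models are evaluated on the same set of observed concentrations, and it expresses the RMSE difference as a percentage reduction relative to the statistical model's RMSE; that convention is an assumption, and the review's exact definition may differ. The function names are hypothetical.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of the predictions."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


def compare_best_models(observed, best_statistical, best_ml):
    """Return (R² gain, % RMSE reduction) of the ML model over the statistical one.

    Assumption: the RMSE difference is taken relative to the statistical
    model's RMSE, so a positive value means the ML model has lower error.
    """
    delta_r2 = r_squared(observed, best_ml) - r_squared(observed, best_statistical)
    rmse_stat = rmse(observed, best_statistical)
    rmse_ml = rmse(observed, best_ml)
    rel_rmse_reduction = 100.0 * (rmse_stat - rmse_ml) / rmse_stat
    return delta_r2, rel_rmse_reduction


# Illustrative (made-up) NO₂ concentrations, not data from the review.
observed = [20.0, 35.0, 28.0, 42.0, 15.0]
statistical_pred = [24.0, 30.0, 30.0, 36.0, 20.0]
ml_pred = [21.0, 33.0, 29.0, 40.0, 17.0]
delta_r2, rmse_gain = compare_best_models(observed, statistical_pred, ml_pred)
```

A positive `delta_r2` and a positive `rmse_gain` both indicate that the machine learning model predicted the observations more accurately in that comparison.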
RESULTS: A total of 38 studies with 46 model comparisons (30 for NO₂, 8 for UFPs and 8 for BC) were included. Linear non-regularized methods and Random Forest were the most frequently used methods. Machine learning outperformed statistical models in 34 of the 46 comparisons. Mean differences (95% confidence intervals) in R² and RMSE between the best machine learning and best statistical models were 0.12 (0.08, 0.17) and 20% (11%, 29%), respectively. Tree-based methods performed best in 12 of 17 multi-model comparisons. Nonlinear or regularized regression methods were used in only 12 comparisons and performed similarly to machine learning methods.
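Pooling per-comparison differences into a mean with a 95% confidence interval can be sketched as follows. This is a minimal illustration that assumes a normal-approximation interval (mean ± z · SE); the review may have used a t-based or other interval, and the function name and input values are hypothetical.

```python
import math
import statistics


def mean_difference_ci(differences, confidence=0.95):
    """Mean of per-comparison differences with a normal-approximation CI.

    Assumption: z-based interval; with many comparisons a t-based
    interval would be very similar.
    """
    n = len(differences)
    mean = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean, mean - z * se, mean + z * se


# Illustrative R² differences (ML minus statistical), not values from the review.
r2_diffs = [0.05, 0.10, 0.20, 0.15, 0.08]
mean_r2, lower, upper = mean_difference_ci(r2_diffs)
```

An interval lying entirely above zero, as reported for both R² and RMSE, indicates that the average advantage of machine learning is unlikely to be due to chance alone under this approximation.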
CONCLUSION: This systematic review suggests that machine learning methods, especially tree-based methods, may be superior to linear non-regularized methods for predicting ambient concentrations of NO₂, UFPs and BC. Additional comparison studies using nonlinear, regularized and a wider array of machine learning methods are needed to confirm their relative performance. Future air pollution studies would also benefit from more explicit and standardized reporting of methodologies and results.
Original language | English |
---|---|
Article number | 119751 |
Number of pages | 13 |
Journal | Environmental Research |
Volume | 262 |
Issue number | Pt 2 |
Early online date | 6 Aug 2024 |
DOIs | |
Publication status | Published - Dec 2024 |
Bibliographical note
Publisher Copyright: © 2024 The Author(s)
Funding
This work was supported by a scholarship from the Canadian Institutes of Health Research [Funding Reference Number 181362].
Funders | Funder number |
---|---|
Canadian Institutes of Health Research | 181362 |
Keywords
- BC
- Exposure assessment
- Machine learning
- NO₂
- Spatial-temporal prediction
- UFPs