Do Machine Learning Methods Improve Prediction of Ambient Air Pollutants with High Spatial Contrast? A Systematic Review

Julien Vachon, Jules Kerckhoffs, Stéphane Buteau, Audrey Smargiassi*

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

Abstract

BACKGROUND & OBJECTIVE: The use of machine learning for air pollution modelling is rapidly increasing. We conducted a systematic review of studies comparing statistical and machine learning models predicting the spatiotemporal variation of ambient nitrogen dioxide (NO₂), ultrafine particles (UFPs) and black carbon (BC) to determine whether and in which scenarios machine learning generates more accurate predictions.

METHODS: Web of Science and Scopus were searched up to June 13, 2024. All records were screened by two independent reviewers. Differences in the coefficient of determination (R²) and Root Mean Square Error (RMSE) between best statistical and machine learning methods were compared across categories of methodological elements.
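
As a minimal sketch of the kind of comparison described in the methods, the Python below computes a mean paired difference in R² with a confidence interval, and a relative RMSE reduction. The abstract does not state how the intervals or the percentage RMSE differences were derived, so the normal-approximation interval, the relative-reduction formula, the function names and the input values are all assumptions made for illustration and are not the authors' procedure or data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_ci(ml_scores, stat_scores, level=0.95):
    """Mean paired difference (ML minus statistical) with a normal-approximation CI.

    Assumption: a z-based interval; the review may have used another method.
    """
    if len(ml_scores) != len(stat_scores) or len(ml_scores) < 2:
        raise ValueError("need two equal-length lists with at least two comparisons")
    diffs = [m - s for m, s in zip(ml_scores, stat_scores)]
    centre = mean(diffs)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(diffs) / sqrt(len(diffs))
    return centre, centre - half_width, centre + half_width


def relative_rmse_reduction(rmse_stat, rmse_ml):
    """Percent reduction in RMSE of the ML model relative to the statistical model.

    Assumption: one plausible way to express an RMSE difference as a percentage.
    """
    return 100.0 * (rmse_stat - rmse_ml) / rmse_stat


# Made-up R² values for three hypothetical model comparisons (illustration only).
r2_ml = [0.70, 0.65, 0.80]
r2_stat = [0.60, 0.50, 0.75]
centre, low, high = mean_difference_ci(r2_ml, r2_stat)
print(f"mean R² difference: {centre:.2f} ({low:.2f}, {high:.2f})")
```

With only a handful of comparisons, a t-based interval would be wider than this normal approximation; the choice here only keeps the sketch dependency-free.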

RESULTS: A total of 38 studies with 46 model comparisons (30 for NO₂, 8 for UFPs and 8 for BC) were included. Linear non-regularized methods and Random Forest were most frequently used. Machine learning outperformed statistical models in 34 comparisons. Mean differences (95% confidence intervals) in R² and RMSE between best machine learning and statistical models were 0.12 (0.08, 0.17) and 20% (11%, 29%), respectively. Tree-based methods performed best in 12 of 17 multi-model comparisons. Nonlinear or regularization regression methods were used in only 12 comparisons and provided similar performance to machine learning methods.

CONCLUSION: This systematic review suggests that machine learning methods, especially tree-based methods, may be superior to linear non-regularized methods for predicting ambient concentrations of NO₂, UFPs and BC. Additional comparison studies using nonlinear, regularized and a wider array of machine learning methods are needed to confirm their relative performance. Future air pollution studies would also benefit from more explicit and standardized reporting of methodologies and results.

Original language: English
Article number: 119751
Number of pages: 13
Journal: Environmental Research
Volume: 262
Issue number: Pt 2
Early online date: 6 Aug 2024
Publication status: Published - Dec 2024

Bibliographical note

Publisher Copyright:
© 2024 The Author(s)

Funding

This work was supported by a scholarship from the Canadian Institutes of Health Research [Funding Reference Number 181362].

Funder: Canadian Institutes of Health Research
Funder number: 181362

Keywords

• BC
• Exposure assessment
• Machine learning
• NO₂
• Spatial-temporal prediction
• UFPs
