TY - JOUR
T1 - Associations between the urban exposome and type 2 diabetes
T2 - Results from penalised regression by least absolute shrinkage and selection operator and random forest models
AU - Ohanyan, Haykanush
AU - Portengen, Lützen
AU - Kaplani, Oriana
AU - Huss, Anke
AU - Hoek, Gerard
AU - Beulens, Joline W J
AU - Lakerveld, Jeroen
AU - Vermeulen, Roel
N1 - Funding Information:
This work is supported by EXPOSOME-NL. EXPOSOME-NL is funded through the Gravitation program of the Dutch Ministry of Education, Culture, and Science and the Netherlands Organization for Scientific Research (NWO grant number 024.004.017).
Publisher Copyright:
© 2022 The Authors
PY - 2022/12
Y1 - 2022/12
AB - BACKGROUND: Type 2 diabetes (T2D) is thought to be influenced by environmental stressors such as air pollution and noise. Although environmental factors are interrelated, studies considering the exposome are lacking. We simultaneously assessed a variety of exposures in their association with prevalent T2D by applying penalised regression (Least Absolute Shrinkage and Selection Operator, LASSO), Random Forest (RF), and Artificial Neural Network (ANN) approaches. We contrasted the findings with single-exposure models including consistently associated risk factors reported by previous studies. METHODS: Baseline data (n = 14,829) of the Occupational and Environmental Health Cohort study (AMIGO) were enriched with 85 exposome factors (air pollution, noise, built environment, neighbourhood socio-economic factors, etc.) using the home addresses of participants. Questionnaires were used to identify participants with T2D (n = 676; 4.6 %). Models in all applied statistical approaches were adjusted for individual-level socio-demographic variables. RESULTS: Lower average home values, a higher share of non-Western immigrants and higher surface temperatures were related to a higher risk of T2D in the multivariable models (LASSO, RF). Selected variables differed between the two multivariable approaches, especially for weaker predictors. Some established risk factors (air pollutants) appeared in univariate analysis but were not among the most important factors in multivariable analysis. Other established factors (green space) did not appear in univariate analysis but did appear in multivariable analysis (RF). Average estimates of the prediction error (logLoss) from nested cross-validation showed that LASSO outperformed both the RF and ANN approaches. CONCLUSIONS: Neighbourhood socio-economic and socio-demographic characteristics and surface temperature were consistently associated with the risk of T2D. For other physical-chemical factors, associations differed by analytical approach.
KW - Deep learning
KW - Machine learning
KW - Neighbourhood socio-demographic characteristics
KW - Neighbourhood socio-economic position
KW - Temperature
UR - http://www.scopus.com/inward/record.url?scp=85140730197&partnerID=8YFLogxK
U2 - 10.1016/j.envint.2022.107592
DO - 10.1016/j.envint.2022.107592
M3 - Article
C2 - 36306550
SN - 0160-4120
VL - 170
JO - Environment International
JF - Environment International
M1 - 107592
ER -