Scientific journal publication

Evaluating the role of low-cost sensors in machine learning based European PM2.5 monitoring

Shetty, Shobitha; Hassani, Amir; Hamer, Paul David; Stebel, Kerstin; Salamalikis, Vasileios; Berntsen, Terje Koren; Castell, Núria Balaguer; Schneider, Philipp

Publication details

Journal: Environmental Research, vol. 291, 123558, 2026

DOI: doi.org/10.1016/j.envres.2025.123558
Archive: nva.sikt.no/registration/019b69e39445-471dd000-4962-4c5a-a70c-029e94819080

Summary:
We evaluate the added value of integrating validated Low-Cost Sensor (LCS) data into a Machine Learning (ML) framework for providing surface PM2.5 estimates over Central Europe at 1 km spatial resolution. The synergistic ML-based S-MESH (Satellite and ML-based Estimation of Surface air quality at High resolution) approach is extended to incorporate LCS data through two strategies: using validated LCS data as a target variable (LCST) and as an input feature via an inverse distance weighted spatial convolution layer (LCSI). Both strategies are implemented within a stacked XGBoost model that ingests satellite-derived aerosol optical depth, meteorological variables, and CAMS (Copernicus Atmosphere Monitoring Service) regional forecasts. Model performance for 2021–2022 is evaluated against a baseline trained on air quality monitoring stations without any form of LCS integration. Our results indicate that the LCSI approach consistently outperforms both the baseline and LCST models, particularly in urban areas, with RMSE reductions of up to 15–20%. It also exhibits higher accuracy than the CAMS regional interim reanalysis, with a lower annual mean absolute error (MAE) of 2.68 μg/m³ compared to 3.32 μg/m³. Analysis based on SHapley Additive exPlanations (SHAP) indicates that LCSI information improves both spatial and temporal representativeness, with the LCSI strategy better capturing localized pollution dynamics. However, the LCSI strategy's dependency on the spatial LCS layer limits its ability to capture inter-urban pollution transport in regions with sparse or no LCS data. These findings highlight the value of large-scale sensor networks in addressing spatial coverage gaps in official air quality monitoring stations and advancing high-resolution air quality modeling.
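
To make the idea of an inverse distance weighted LCS input feature more concrete, the following is a minimal Python sketch of how sensor readings near a grid cell could be condensed into a single value. It is an illustration only, not the S-MESH implementation: the function name `idw_feature`, the search radius, the distance power, and the planar kilometre coordinates are all assumptions that do not come from the summary above.

```python
import math

def idw_feature(cell_xy, sensors, radius_km=5.0, power=2.0, eps=1e-6):
    """Inverse-distance-weighted mean of sensor PM2.5 around one grid cell.

    cell_xy : (x_km, y_km) of the grid cell centre (planar coordinates, assumed)
    sensors : iterable of (x_km, y_km, pm25) tuples
    Returns None when no sensor lies within radius_km (hypothetical convention).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, pm25 in sensors:
        distance = math.hypot(cell_xy[0] - x, cell_xy[1] - y)
        if distance > radius_km:
            continue  # sensor is outside the assumed search radius
        # eps avoids division by zero when a sensor sits on the cell centre
        weight = 1.0 / (distance + eps) ** power
        weighted_sum += weight * pm25
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total

# Two sensors at equal distance contribute equally to the cell value.
example_sensors = [(1.0, 0.0, 10.0), (-1.0, 0.0, 20.0)]
centre_value = idw_feature((0.0, 0.0), example_sensors)
remote_value = idw_feature((100.0, 100.0), example_sensors)
```

In this sketch a cell with no sensor inside the radius has no LCS value at all, which mirrors the limitation noted in the summary for regions with sparse or no LCS data.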