Long-term evaluation of commercial air quality sensors: an overview from the QUANT (Quantification of Utility of Atmospheric Network Technologies) study
Leveraging Low-Cost Sensor Technology for Enhanced Air Quality Monitoring
In times of growing concern about the impacts of air pollution across the globe, lower-cost sensor technology is taking its first steps towards enhancing our understanding of, and ability to manage, air quality issues, particularly in regions without established monitoring networks. While the benefits of greater spatial coverage and real-time measurements that these systems offer are evident, challenges still need to be addressed regarding sensor reliability and data quality.
Given the limitations imposed by intellectual property, commercial implementations are often ‘black boxes’, which represents an extra challenge as it limits end users’ understanding of the data production process. In this article, we present an overview of the QUANT (Quantification of Utility of Atmospheric Network Technologies) study, a comprehensive 3-year assessment across a range of urban environments in the United Kingdom, evaluating 43 sensor devices, including 119 gas sensors and 118 particulate matter (PM) sensors, from multiple companies.
QUANT stands out as one of the most comprehensive studies of commercial air quality sensor systems carried out to date, encompassing a wide variety of companies in a single evaluation and including two generations of sensor technologies. Integrated into an extensive dataset open to the public, it was designed to provide a long-term evaluation of the precision, accuracy and stability of commercially available sensor systems.
To attain a nuanced understanding of sensor performance, we have complemented commonly used single-value metrics (e.g. coefficient of determination, R²; root mean square error, RMSE; mean absolute error, MAE) with visual tools. These include regression plots, relative expanded uncertainty (REU) plots and target plots, enhancing our analysis beyond traditional metrics.
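As a minimal sketch of the single-value metrics named above, the following standard-library Python function computes R², RMSE and MAE for paired sensor and reference readings. The function name `single_value_metrics` is hypothetical, and R² is taken here as the squared Pearson correlation, which is an assumption: the overview does not state which definition the study uses.

```python
import math


def single_value_metrics(reference, sensor):
    """Return R2, RMSE and MAE for paired reference/sensor readings.

    Pairs in which either value is None (missing) are skipped.
    R2 is computed as the squared Pearson correlation (an assumption).
    """
    pairs = [(r, s) for r, s in zip(reference, sensor)
             if r is not None and s is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    mean_r = sum(r for r, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    sxy = sum((r - mean_r) * (s - mean_s) for r, s in pairs)
    sxx = sum((r - mean_r) ** 2 for r, _ in pairs)
    syy = sum((s - mean_s) ** 2 for _, s in pairs)

    r2 = (sxy ** 2) / (sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    rmse = math.sqrt(sum((s - r) ** 2 for r, s in pairs) / n)
    mae = sum(abs(s - r) for r, s in pairs) / n
    return {"r2": r2, "rmse": rmse, "mae": mae}


# Illustrative (made-up) data: a sensor that reads exactly twice the reference.
metrics = single_value_metrics([10, 20, 30, 40], [20, 40, 60, 80])
print(metrics)  # r2 is 1.0 and mae is 25.0; rmse is about 27.4
```

The made-up example shows why one metric alone can mislead: a sensor that always reads double the reference is perfectly correlated (R² of 1) yet carries a large absolute error, which is the kind of behaviour the visual tools are meant to expose.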
This overview discusses the assessment methodology and key findings showcasing the significance of the study. While more comprehensive analyses are reserved for future detailed publications, the results shown here highlight the significant variation between systems, the incidence of corrections made by manufacturers, the effects of relocation to different environments and the long-term behaviour of the systems.
Additionally, the importance of accounting for uncertainties associated with reference instruments in sensor evaluations is emphasised. Practical considerations in the application of these sensors in real-world scenarios are also discussed, and potential solutions to end-user data challenges are presented.
Offering key information about the sensor systems’ capabilities, the QUANT study will serve as a valuable resource for those seeking to implement commercial solutions as complementary tools to tackle air pollution.
Addressing the Challenges of Lower-Cost Air Quality Sensors
Emerging lower-cost sensor systems offer a promising alternative to the more expensive and complex monitoring equipment traditionally used for measuring air pollutants such as PM2.5, NO2 and O3 (Okure et al., 2022). These innovative devices hold the potential to expand spatial coverage (Malings et al., 2020) and deliver real-time air pollution measurements (Tanzer-Gruener et al., 2020). However, concerns regarding the variable quality of the data they provide still hinder their acceptance as reliable measurement technologies (Karagulian et al., 2019; Zamora et al., 2020).
Sensors face key challenges such as cross-sensitivities (Bittner et al., 2022; Cross et al., 2017; Levy Zamora et al., 2022; Pang et al., 2018), internal consistency (Feenstra et al., 2019; Ripoll et al., 2019), signal drift (Miech et al., 2023; Li et al., 2021; Sayahi et al., 2019), long-term performance (Bulot et al., 2019; Liu et al., 2020) and data coverage (Brown and Martin, 2023; Duvall et al., 2021; Feinberg et al., 2018). Additionally, environmental factors such as temperature and humidity (Bittner et al., 2022; Farquhar et al., 2021; Crilley et al., 2018; Williams, 2020) can significantly influence sensor signals.
In recent years, manufacturers of both sensing elements (Han et al., 2021; Nazemi et al., 2019) and sensor systems have made significant technological advances (Chojer et al., 2020). For example, there are now commercial and non-commercial systems equipped with multiple detectors to measure distinct pollutants (Buehler et al., 2021; Hagan et al., 2019; Pang et al., 2021), helping to mitigate the effects of cross-interference. Additionally, enhancements in electrochemical OEMs have been demonstrated in terms of their specificity (Baron and Saffell, 2017; Ouyang, 2020).
However, the complex nature of their responses, coupled with their dependence on local conditions, means sensor performance can be inconsistent (Bi et al., 2020). This makes it difficult to compare results across different studies or to anticipate future sensor performance. Moreover, assessments of sensor performance found in the academic literature often rely on a range of protocols (e.g. CEN, 2021, and Duvall et al., 2021) and data quality metrics (e.g. Spinelle et al., 2017, and Zimmerman et al., 2018), with many studies limited to a single-site co-location and/or short-term evaluations that do not fully account for broader environmental variations (Karagulian et al., 2019).
Calibration: A Critical Component for Sensor Reliability
The calibration of any instrument used to measure atmospheric composition is fundamental to guaranteeing its accuracy (Alam et al., 2020; Long et al., 2021; Wu et al., 2022). Using out-of-the-box sensor data without fit-for-purpose calibration can produce misleading results (Liang and Daniels, 2022). An effective calibration involves not only identifying but also compensating for estimated systematic effects in the sensor readings, a process defined as a correction (for a detailed definition and differentiation of calibration and correction, see JCGM, 2012).
For standard air pollution measurement techniques, calibration is often performed in a controlled laboratory environment (Liang, 2021). For example, for gases, a known concentration is sampled from a certified standard. Similarly, for particulate matter (PM), particles of known density and size are generated. Both gas and PM calibrations are conducted under controlled airflow conditions. Yet, the aforementioned challenges with lower-cost sensor-based devices suggest that such calibrations may not always accurately reflect real-world conditions (Giordano et al., 2021).
A frequent approach involves co-locating sensors alongside regulatory instruments in their intended deployment areas and/or conditions and using data-driven methods to match the reference data (Liang and Daniels, 2022). Numerous studies have investigated the effectiveness of calibration methods for sensors (e.g. Bigi et al., 2018; Bittner et al., 2022; Malings et al., 2020; Spinelle et al., 2017; Zimmerman et al., 2018), including selecting appropriate reference instruments (Kelly et al., 2017), the need for regular calibration to maintain accuracy (Gamboa et al., 2023), the necessity of rigorous calibration protocols to ensure consistency (Kang et al., 2022) and transferability (Nowack et al., 2021) of results.
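To make the co-location idea concrete, the sketch below fits the simplest possible data-driven correction, a straight line mapping raw sensor readings onto reference readings by ordinary least squares, and then applies it to new sensor data. This is an illustrative assumption only: the function names are hypothetical, the numbers are made up, and the cited studies generally use richer models (for example, including temperature and humidity as predictors). It is not the method of any manufacturer in this study.

```python
def fit_linear_correction(sensor, reference):
    """Ordinary least-squares fit of: reference ~ slope * sensor + intercept."""
    n = len(sensor)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired co-location readings")

    mean_x = sum(sensor) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in sensor)
    if sxx == 0:
        raise ValueError("sensor readings are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sensor, reference))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_correction(sensor, slope, intercept):
    """Apply a previously fitted linear correction to raw sensor readings."""
    return [slope * x + intercept for x in sensor]


# Made-up co-location period: the raw sensor reads 2 * reference + 5.
raw_colocated = [25.0, 45.0, 65.0, 85.0]
ref_colocated = [10.0, 20.0, 30.0, 40.0]
slope, intercept = fit_linear_correction(raw_colocated, ref_colocated)

# Correct a later raw reading using the fitted coefficients.
corrected = apply_correction([105.0], slope, intercept)
print(slope, intercept, corrected)  # 0.5 -2.5 [50.0]
```

A correction fitted in this way is only as good as the co-location period it was trained on, which is why the transferability and regular re-calibration issues raised in the literature above matter in practice.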
Ultimately, the reliability and associated uncertainty of any applied calibration will influence the final sensor data quality. For end users to make informed decisions on the applicability of air pollution sensors, a realistic understanding of the expected performance in their chosen application is necessary (Rai et al., 2017).
The QUANT Study: Comprehensive Evaluation of Commercial Sensor Systems
Despite this, there has been relatively little progress in clarifying the performance of sensors for air pollution measurements outside of the academic arena. This is largely due to the significant variability in both the number of sensors and the variety of applications tested, compounded by the proliferation of commercially available sensors/sensor systems with different configurations. Furthermore, access to highly accurate measurement instrumentation and/or regulatory networks remains limited for those outside of the atmospheric measurement academic field (e.g. Lewis and Edwards, 2016, and Popoola et al., 2018).
From a UK clean air perspective, this ambiguity represents a major problem. The lack of a consistent message undermines the exploitation of these devices’ unique strengths, notably their capability to form spatially dense networks with rapid time resolution. Consequently, there is potential for a mismatch in users’ expectations of what sensor systems can deliver and their actual operating characteristics, eroding trust and reliability.
In this work, as part of the QUANT project funded by the UK Clean Air programme, we deployed a variety of sensor technologies (43 commercial devices, 119 gas and 118 PM measurements) at three representative UK urban sites – Manchester, London and York – alongside extensive reference measurements to generate the data for a comprehensive in-depth performance assessment.
This project aims to not only evaluate the performance of sensor devices in a UK urban climatological context but also provide critical information for the successful application of these technologies in various environmental settings. To our knowledge, QUANT is the most extensive and longest-running evaluation of commercial sensor systems globally to date.
Furthermore, we tested multiple manufacturers’ data products, such as out-of-the-box data versus locally calibrated data, for a significant number of these sensors to understand the implications of local calibration. This comprehensive approach offers unprecedented insights into the operational capabilities and limitations of these sensors in real-world conditions.
Significantly, some of the insights gathered during QUANT have contributed to the development of the Publicly Available Specification (PAS 4023, 2024), which provides guidelines for the selection, deployment, maintenance and quality assurance of air quality sensor systems.
While this paper serves as an initial overview, detailed analyses of the measured pollutants and study phases, offering a more comprehensive perspective on sensor performance, are planned for future publications.
Methodology: Capturing Urban Air Quality Dynamics
To capture the variability in UK urban environments, identical units were installed at three carefully selected field sites. Two of these sites are highly instrumented urban background measurement supersites: the London Air Quality Supersite (LAQS) and the Manchester Air Quality Supersite (MAQS), located in densely populated urban areas with unique air quality challenges. The third site is a roadside monitoring site in York, which is part of the Automatic Urban and Rural Network (AURN), representing an urban environment more influenced by traffic.
This selection strategy ensures that the QUANT study’s findings reflect the dynamics of urban air quality across different UK settings while providing comprehensive reference measurements. Further details about each site can be found in Sect. S1 in the Supplement.
The main QUANT assessment study aimed to perform a transparent long-term (19 December 2019–31 October 2022) evaluation of commercially available sensor technologies for outdoor air pollution monitoring in UK urban environments. Four units of five different commercial sensor devices (Table 1) were purchased in September 2019 for inclusion in the study, with the selection criteria being market penetration and/or previous performance reported in the literature, ability to measure pollutants of interest (e.g. NO2, NO, O3 and PM2.5), and capacity to run continuously reporting high-time-resolution data (1–15-min data) ideally in near-real time (i.e. available within minutes of measurement) with data accessible via an application programming interface (API).
Table 1: Main QUANT devices description.
Initially, all the sensors were deployed in Manchester for approximately 3 months (mid-December 2019 to mid-March 2020) before being split up amongst the three sites (Fig. 1). At least one unit per brand was re-deployed to the other two sites (mid-March 2020 to early July 2022), leaving two devices per company in Manchester to assess inter-device consistency. In the final 4 months of the study, all the sensor systems were relocated back to Manchester (early July 2022 to the end of October 2022).
Figure 1: Main QUANT and Wider Participation Study (WPS) timeline.
The Wider Participation Study (WPS) was a no-cost complementary extension of the QUANT assessment, specifically designed to foster innovation within the air pollution sensors domain. This segment of the study took place entirely at the MAQS from 10 June 2021 to 31 October 2022 (Fig. 1). It included a wider array of commercial platforms (nine different sensor system brands) and offered manufacturers the opportunity to engage in a free-of-charge impartial evaluation process.
Although participation criteria matched those of the main QUANT study, a key distinction lay in the voluntary nature of participation: manufacturers were invited to contribute multiple sensor devices throughout the WPS study (see Table 2). Participants were able to demonstrate their systems’ performance against co-located high-resolution (1-min) reference data at a state-of-the-art measurement site such as the Manchester supersite.
Table 2: The 23 WPS devices deployed at the Manchester supersite.
All sensor devices were installed at the measurement sites as per manufacturer recommendations, adhering strictly to manufacturers’ guidelines for electrical setup, mounting, cleaning and maintenance. Since all deployed systems were designed for outdoor use, no additional protective measures were necessary.
Each system was mounted on poles acquired specifically for the project or on rails at the co-location sites, without the need for special protections. Following the manufacturers’ suggestions, sensors were positioned within 3 m of the reference instruments’ inlets.
Custom electrical setups were developed for each sensor type, incorporating local energy sources and weather-resistant safety features, alongside security measures to deter vandalism and ensure uninterrupted operation. Routine maintenance was conducted monthly, although the COVID-19 pandemic necessitated longer intervals between visits.
Despite these obstacles, efforts to maintain sensor security and functionality continued unabated, employing both physical safeguards and remote monitoring to preserve data integrity. In addition to the device supplier’s own cloud storage (accessed on-demand via each supplier’s web portals), an automated daily scraping of each company’s API was performed to save data onto a secure server at the University of York to ensure data integrity.
Unlike other brands that utilise mobile data connections, PurpleAir sensors rely on Wi-Fi for data transmission. Due to the poor internet signal at the sites, we locally collected and manually uploaded readings for these units. Minor pre-processing was applied at this stage: temporal harmonisation so that all measurements had a minimum sampling period of 1 min, harmonisation of measurement units and labels, and coercion into a common format to allow full compatibility across sensor units.
No additional modifications to the original measurements were applied; missing values were kept as missing, and no additional flags were created based on the measurements beyond those provided by the manufacturers. For an overview of the sensor measurands and their corresponding data time resolutions as provided by the companies participating in the main QUANT study and the WPS, please see Sects. S3 and S4 (Tables S4 and S5 in the Supplement) respectively.
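The pre-processing described above can be sketched as follows, under stated assumptions. The function name `harmonise_to_minute` and the record layout (timestamp, value) are hypothetical, and simple averaging within each minute is an assumed aggregation rule, since the overview does not specify one. Consistent with the text, missing values stay missing: a minute containing only missing readings is kept as `None`, and minutes with no records at all are not filled in.

```python
from collections import defaultdict
from datetime import datetime


def harmonise_to_minute(records):
    """Average sub-minute readings into 1-min bins.

    records: iterable of (datetime, value) pairs, where value may be None.
    Returns a dict mapping each minute (datetime floored to the minute)
    to the mean of its non-missing readings, or None if all were missing.
    Data already coarser than 1 min pass through on their own timestamps.
    """
    bins = defaultdict(list)
    for timestamp, value in records:
        minute = timestamp.replace(second=0, microsecond=0)
        readings = bins[minute]  # registers the minute even if value is None
        if value is not None:
            readings.append(value)

    return {
        minute: (sum(values) / len(values) if values else None)
        for minute, values in sorted(bins.items())
    }


# Made-up 30-second readings, with one missing value in the second minute.
raw = [
    (datetime(2020, 1, 1, 0, 0, 10), 4.0),
    (datetime(2020, 1, 1, 0, 0, 40), 6.0),
    (datetime(2020, 1, 1, 0, 1, 5), None),
]
harmonised = harmonise_to_minute(raw)
print(harmonised[datetime(2020, 1, 1, 0, 0)])  # 5.0
print(harmonised[datetime(2020, 1, 1, 0, 1)])  # None
```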
In addition to providing an independent assessment of sensor performance, QUANT also aimed to collaborate with device manufacturers to help advance the field of air pollution sensors. During QUANT, device calibrations were performed solely at the discretion of the manufacturers without any intervention from our team, thus limiting the involvement of manufacturers to the provision of standard sensor outputs and unit maintenance, as would be required by any standard customer.
This approach enabled manufacturers to independently assess and benchmark their sensors’ performance, using provided reference data to potentially develop calibrated data products. It is noteworthy that not all manufacturers chose to utilise these data for corrections or enhancements. However, those who did were expected to create and submit calibrated data products; the data products are hereafter named ‘out-of-box’ (initial data product), ‘cal1’ (first calibrated product) and ‘cal2’ (second calibrated product).
This differentiation highlighted the varying degrees of engagement and application of the reference data by different manufacturers. Figures S2 and S3 (Sects. S3 and S4 respectively) show a timeline of the different data products. To this end, three separate 1-month periods of reference data, spaced every 6 months, were shared with each supplier, including provisional data soon after each period and ratified data when available.
All reference data were embargoed until they were released to all manufacturers simultaneously to ensure consistency across manufacturers. For an overview of reference and equivalent-to-reference instrumentation, as defined in the European Union Air Quality Directive 2008/50/EC (hereafter referred to as EU AQ Directive), at each site, please refer to Sect. S2 (Table S1). For details on the quality assurance procedures applied to the reference instruments, see Table S2. For the dates and periods of the shared reference data, refer to Table S3.
Evaluating Sensor Performance: Beyond Traditional Metrics
A key challenge in sensor performance evaluation is that the errors affecting sensor readings vary strongly in both space and time, which makes laboratory-derived corrections harder to apply. Furthermore, the overreliance on global performance metrics is a significant concern in sensor assessment.
The coefficient of determination (R²), root mean squared error (RMSE) and mean absolute error (MAE) are among the most popular single-value metrics for evaluating sensor performance, alongside others (e.g. the bias, the slope and the intercept of the regression fit). However, while single-value metrics offer an overview of performance, they can be limiting or