Technical note: An assessment of the performance of … – Atmospheric Chemistry and Physics

Introduction

State-of-the-art chemistry-climate models (CCMs) still show biases compared to ground-level ozone observations, illustrating the difficulties and challenges remaining in the simulation of atmospheric processes governing ozone production and loss. Therefore, CCM output is frequently bias-corrected in studies seeking to explore the health or environmental impacts from changing air quality burdens.

Here, we assess four statistical bias correction techniques of varying complexities and their application to surface ozone fields simulated with four CCMs and evaluate their performance against gridded observations in the EU and US. We focus on two time periods (2005–2009 and 2010–2014), where the first period is used for development and training and the second to evaluate the performance of techniques when applied to model projections.

We find that all methods are capable of significantly reducing the model bias. However, biases are lowest when we apply more complex approaches such as quantile mapping and delta functions. We also highlight the sensitivity of the correction techniques to individual CCM skill at reproducing the observed distributional change in surface ozone. Ensemble simulations available for one CCM indicate that model ozone biases are likely more sensitive to the process representation embedded in chemical mechanisms than to meteorology.

Data and Methods

Observational ozone dataset

Surface ozone (O3) is both an air pollutant and a greenhouse gas, formed in photochemical reactions involving precursor substances such as nitrogen oxides (NOx) and volatile organic compounds (VOCs) of both anthropogenic and non-anthropogenic origin (e.g., Checa-Garcia et al., 2018; Lelieveld and Dentener, 2000; Monks et al., 2015). In addition to the availability of precursor gases, the NOx-to-VOC ratio, solar radiation and ambient air temperature, which control emissions of biogenic VOCs (BVOCs) and chemical reaction rates, play a crucial role in O3 formation (Chameides et al., 1988; Sillman, 1999; Sillman et al., 1990).

Tropospheric O3 abundance is also substantially influenced by stratospheric intrusions, which can, in certain regions or during specific events, alter concentrations significantly (Akritidis et al., 2010; Lin et al., 2015; Stohl et al., 2003). O3 is associated with a variety of detrimental human health effects, especially in the context of the respiratory and cardiovascular systems, resulting in about 5–20% of premature deaths attributable to ambient air pollution (Gu et al., 2023; Malashock et al., 2022; Monks et al., 2015; Murray et al., 2020; Pozzer et al., 2023; Zhang et al., 2019).

For the analysis here, we obtain observed MDA8 O3 with a spatial resolution of 1°×1° per grid cell for both the European domain and the US domain using an extended dataset constructed using the methods of Schnell et al. (2014, 2015) and Schnell and Prather (2017), which was designed specifically to compare against gridded CCMs. The dataset was constructed using an inverse distance weighted interpolation method that includes a de-clustering component similar to kriging; i.e., clustered (within 100 km) observation weights are reduced such that those stations (often located around urban centers) are not disproportionately used in the interpolation.

Model data

The O3 datasets explored in our analysis are hourly surface O3 outputs from three CCMs (GFDL-ESM4, UKESM1-0-LL and EC-Earth3) contributing to CMIP6 and a 13-member ensemble simulation created with CESM2-WACCM6. For most of our study, we use only the first ensemble member of CESM2-WACCM6 to be analogous with the other CCMs, given the overall heterogeneity in the number of members available per model. In Sect. 3.4, we focus on the chemical vs. meteorological driving of model biases and utilize the entire CESM2-WACCM6 ensemble.

To allow for an optimal comparison, the model data are re-gridded using an ordinary inverse distance weighting algorithm to match the spatial extent of the observations. In addition, all datasets are harmonized regarding their temporal resolution by removing days not included in any of the other datasets, resulting in a 358-d calendar (30-d per month except for February). MDA8 O3 is derived for each dataset and time step according to the European nomenclature.
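The MDA8 derivation can be illustrated with a minimal sketch. It assumes that "European nomenclature" refers to the convention in which each 8-h running mean is assigned to the day on which it ends, and that the input is a gap-free hourly series starting at the first hour of a day. The function name `mda8` and the example values are hypothetical and not taken from the study.

```python
import numpy as np

def mda8(hourly, window=8):
    """Maximum daily 8-h average from a gap-free hourly series.

    Assumption: each running mean is assigned to the day on which it ends,
    so the first window of a day reaches back into the previous day.
    """
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.size // 24
    hourly = hourly[: n_days * 24]
    # Running means via cumulative sums; means[i] covers hours i..i+window-1.
    csum = np.concatenate(([0.0], np.cumsum(hourly)))
    means = (csum[window:] - csum[:-window]) / window
    # Store each mean at the index of the hour in which its window ends.
    run = np.full(hourly.size, np.nan)
    run[window - 1:] = means
    # Incomplete windows at the start of the series stay NaN and are ignored.
    return np.nanmax(run.reshape(n_days, 24), axis=1)

# Two synthetic days: background 10 ppb, with 50 ppb during hours 10-17 of day 1.
hourly_example = np.full(48, 10.0)
hourly_example[10:18] = 50.0
mda8_example = mda8(hourly_example)
```

In this example, the first window of day 2 still contains one elevated hour from day 1, which is why the second value lies above the 10 ppb background.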

Bias correction techniques

For statistical bias correction, we apply four different techniques that are detailed below. Here, Mq and Oq denote quantiles (q = 1, …, N, where 1 = min and N = max) of the model and observational distributions, respectively. The running index j marks individual MDA8 O3 model values. Additionally, we use the indices ‘hist’ and ‘proj’ to differentiate between historical and projected data. Primed terms indicate the bias-corrected model outputs.

Mean Bias (MB):
The MB is a commonly used approach assuming a constant offset between the model and observations. As an initial step, we derive the average difference of the historical model and observational percentiles. Alternatively, the difference between the mean values of both empirical cumulative distribution functions (ECDFs) can be computed. Subsequently, we subtract the result of Eq. (1) from each quantile of the projected model distribution to retrieve a bias-corrected model ECDF (Eq. 2).
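As a minimal sketch of the MB correction described above (not the authors' code), the following computes the mean quantile offset for the historical period and subtracts it from the projected quantiles. The number of quantiles (`n_q=101`), the function name and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def mean_bias_correct(m_hist, o_hist, m_proj, n_q=101):
    """Subtract the mean historical quantile offset from projected quantiles."""
    p = np.linspace(0.0, 1.0, n_q)  # first quantile = min, last = max
    mq_hist = np.quantile(m_hist, p)
    oq_hist = np.quantile(o_hist, p)
    mq_proj = np.quantile(m_proj, p)
    mb = np.mean(mq_hist - oq_hist)   # constant offset (cf. Eq. 1)
    return mq_proj - mb, mb           # corrected quantiles (cf. Eq. 2)

# Synthetic example: the model is 8 ppb too high in the historical period.
rng = np.random.default_rng(0)
o_hist_mb = rng.normal(40.0, 10.0, 1500)
m_hist_mb = o_hist_mb + 8.0
m_proj_mb = o_hist_mb + 6.0
corrected_mb, mb_value = mean_bias_correct(m_hist_mb, o_hist_mb, m_proj_mb)
```

Because the offset is a single number, the shape of the projected distribution is left unchanged; only its location shifts.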

Relative Bias (RB):
In contrast to the MB correction, the RB method derives the average of the relative deviation of the historic model and observational percentiles (Eq. 3). The bias-corrected model projection (Eq. 4) is then calculated as the difference between the raw model and the observed quantiles times the correction term established in Eq. (3).
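The RB description above can be read in more than one way, since Eqs. (3) and (4) are not reproduced here. The sketch below follows one literal reading, which is an assumption: the correction term is the mean of (model − observation) / observation over the historical quantiles, and the corrected projection is the projected model quantile minus the observed quantile multiplied by that term. Observed quantiles are assumed to be strictly positive.

```python
import numpy as np

def relative_bias_correct(m_hist, o_hist, m_proj, n_q=101):
    """One possible reading of the RB method (assumed, not confirmed)."""
    p = np.linspace(0.0, 1.0, n_q)
    mq_hist = np.quantile(m_hist, p)
    oq_hist = np.quantile(o_hist, p)
    mq_proj = np.quantile(m_proj, p)
    # Mean relative deviation of the historical quantiles (cf. Eq. 3).
    rb = np.mean((mq_hist - oq_hist) / oq_hist)
    # Assumed form of Eq. 4: model quantile minus (observed quantile x RB).
    return mq_proj - rb * oq_hist, rb

# Synthetic example: the model is 20 % too high and does not change in time.
rng = np.random.default_rng(1)
o_hist_rb = rng.uniform(20.0, 80.0, 1000)
m_hist_rb = 1.2 * o_hist_rb
m_proj_rb = m_hist_rb.copy()
corrected_rb, rb_value = relative_bias_correct(m_hist_rb, o_hist_rb, m_proj_rb)
```

Under this reading, a purely multiplicative bias with no change between periods is removed exactly, which is the case the example constructs.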

Delta Correction (DC):
The DC approach follows the methodology detailed in Rieder et al. (2018). In contrast to the MB and RB methods, it is assumed that while the individual model values may be biased, the system response (i.e., change between two time periods) is represented adequately by the model. Therefore the deviation between future and base period model data is calculated for all quantiles individually (Eq. 5). Finally, the corrected model projection is derived as the observed distribution plus the initially computed model change (Eq. 6).
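The two DC steps described above can be sketched as follows. This is an illustration under stated assumptions (101 quantiles, synthetic data, hypothetical function name), not the implementation of Rieder et al. (2018).

```python
import numpy as np

def delta_correct(m_hist, o_hist, m_proj, n_q=101):
    """Add the per-quantile model change to the observed historical quantiles."""
    p = np.linspace(0.0, 1.0, n_q)
    mq_hist = np.quantile(m_hist, p)
    oq_hist = np.quantile(o_hist, p)
    mq_proj = np.quantile(m_proj, p)
    delta = mq_proj - mq_hist   # model change per quantile (cf. Eq. 5)
    return oq_hist + delta      # observed distribution plus change (cf. Eq. 6)

# Synthetic example: a strongly biased model that simulates a 3 ppb decrease.
rng = np.random.default_rng(2)
o_hist_dc = rng.normal(40.0, 10.0, 1500)
m_hist_dc = 1.3 * o_hist_dc + 5.0
m_proj_dc = m_hist_dc - 3.0
corrected_dc = delta_correct(m_hist_dc, o_hist_dc, m_proj_dc)
```

The historical model bias cancels entirely in this construction, so the quality of the result rests on the simulated change alone.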

Quantile Mapping (QM):
The QM is a multistep approach. The first steps, illustrated in Eqs. (7) to (9), consist of the computation of a bias-corrected historic model distribution. Next, the result is used to create a bias-corrected future ECDF, similar to the DC method (Eqs. 10 and 11), which is then employed to derive the bias-corrected future model data (Eqs. 12 to 14).
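Since Eqs. (7) to (14) are not reproduced here, the following is only a generic empirical quantile-mapping sketch arranged in the three stages named above; it should not be taken as the authors' exact formulation. The use of linear interpolation between quantiles (`np.interp`), the function name and the synthetic data are all assumptions.

```python
import numpy as np

def quantile_map_correct(m_hist, o_hist, m_proj, n_q=101):
    """Generic three-stage empirical quantile mapping (illustrative only)."""
    p = np.linspace(0.0, 1.0, n_q)
    mq_hist = np.quantile(m_hist, p)
    oq_hist = np.quantile(o_hist, p)
    mq_proj = np.quantile(m_proj, p)
    # Stage 1: map historical model values onto the observed distribution.
    m_hist_corr = np.interp(m_hist, mq_hist, oq_hist)
    # Stage 2: corrected future quantiles = corrected historical quantiles
    # plus the model change per quantile (analogous to the DC method).
    target_q = np.quantile(m_hist_corr, p) + (mq_proj - mq_hist)
    # Stage 3: map each projected value j onto the corrected future ECDF.
    return np.interp(m_proj, mq_proj, target_q)

# Synthetic example: biased model with a uniform 2 ppb increase.
rng = np.random.default_rng(3)
o_hist_qm = rng.normal(40.0, 10.0, 1500)
m_hist_qm = 1.3 * o_hist_qm + 5.0
m_proj_qm = m_hist_qm + 2.0
corrected_qm = quantile_map_correct(m_hist_qm, o_hist_qm, m_proj_qm)
```

Unlike the quantile-only sketches of the other methods, this returns a corrected value for every individual model value. Note that `np.interp` clamps values outside the range of the fitted quantiles, which is a simplification at the distribution tails.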

All four methods are applied to the ECDFs of the individual CCM datasets (1) on a monthly basis within the base time interval, (2) for each grid cell individually, and (3) for both the EU domain and US domain.

Results

Evaluation of raw model performance

We start by evaluating the performance of the global models in representing the MDA8 O3 burden for the historical time period (2005–2009). Figure 1a and b show the pooled MDA8 O3 probability density function for the models and gridded observations for the EU and US domains. Pronounced differences emerge between the individual models and observations for both domains. Generally, the models show a high bias compared to observations, and the amplitude of the bias varies substantially among models.

One exception in this regard is the EC-Earth3 model, which shows a high bias compared to observations across the majority of the MDA8 O3 distribution but in contrast to other models has a low bias at the upper tail. We further investigate the magnitude of the model biases in Fig. 1c and d by contrasting the annual average number (and seasonal partitioning) of days above the target value to protect human health, defined as 60 and 70 ppb for the EU and US domains, respectively.
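The exceedance metric can be sketched for a single grid cell as below. The array layout (years × days), the function name and the sample values are assumptions for illustration; "above the target value" is interpreted as strictly greater than the threshold.

```python
import numpy as np

def annual_exceedance_days(mda8_by_year, threshold):
    """Average number of days per year with MDA8 O3 above the threshold.

    mda8_by_year: array of shape (n_years, n_days) for one grid cell.
    """
    above = np.asarray(mda8_by_year, dtype=float) > threshold
    return above.sum(axis=1).mean()

# Hypothetical MDA8 values (ppb) for two short "years" of four days each.
sample = np.array([[55.0, 61.0, 60.0, 72.0],
                   [70.5, 40.0, 65.0, 59.0]])
days_eu = annual_exceedance_days(sample, 60.0)  # EU target value
days_us = annual_exceedance_days(sample, 70.0)  # US target value
```

The same series yields fewer exceedance days under the higher US threshold, which is why exceedance counts are not directly comparable between the two domains.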

Our findings indicate slightly better agreement in CCMs regarding the policy-relevant metrics in the US than in the EU, a result that should be interpreted with caution given the regional difference in the MDA8 O3 target value. Assuming the same target threshold as for Europe, we find that the number of exceedance days ranges between 20 and 174.

The spatial distribution of differences confirms the biases detailed above, showing regionally varying but distinct biases of the models examined. Of the models examined, the EC-Earth3 model performs best in both domains, with a domain average bias of +7 (EU) and +3 d (US). While pronounced differences in the magnitude of the bias between individual models occur, the spatial patterns in biases are quite similar.

Bias correction performance

To investigate the consistency of the spatial bias in models compared to observations, we expand the analysis to the 2010–2014 time period. Although slight variations are found for individual seasons, overall the result for this time period resembles the results obtained for 2005–2009 in both the US domain and the EU domain.

All methods reduce the bias substantially. The MB and RB methods yield similar results. Both methods tend to overcorrect the bias, yielding residual biases for individual grid cells varying between −22 and +8 d (EU) and between −10 and +6 d (US), with MB performing slightly better. In contrast, the QM method yields almost perfect agreement (comparable to the DC method as detailed above) with observations. Residual biases are between −2 and +1 d for Europe and 0 d for the US.

Spatial distributions of the anomaly on exceedance days are illustrated in Figs. S3 and S4. We find that the application of a particular method yields similar spatial patterns of improvement independent of the model to which it is applied and independent of the initial model bias.

Applying the adjustment methods to the MDA8 O3 outputs of the individual models during the evaluation time period (2010–2014) yields a larger residual bias in the European domain, ranging between −17 and +11 exceedance days, than in the US (−5 to +5 d) across grid cells. Furthermore, contrasting the performance of the individual bias correction techniques yields a curious result, as we no longer identify an individual correction technique as optimal across models and spatial domains.

Our findings show that the correction approach yielding the lowest residual bias varies strongly across models and spatial domains. These results are supported by the analysis of the PDFs of the bias-corrected model output (Fig. S8). While conformity with observations remains widely similar for the majority of the distribution, the adjustment of the high tail yields slightly better results in the context of the MB and RB methods when compared to the base period. Contrarily, the distributions of both the DC and QM methods show good agreement with the low tail and the midsection of the observational PDF. The performance, however, deteriorates towards the high tail, partially resulting in an overestimation of the monitored distribution, especially in the European domain.

Understanding the error sources

To further investigate this curious result, we examine, on a quantile basis across the MDA8 O3 distributions, (i) the error resulting from the initial bias correction of the base period (EB) and (ii) the error resulting from the deviation of the model change between the base and evaluation periods when compared to observations (EΔ).
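To make the two error terms concrete, the sketch below computes them per quantile for the MB correction. The exact definitions and sign conventions are assumptions: EB is taken as the corrected historical model quantile minus the observed one, and EΔ as the modeled change minus the observed change. For an additive correction such as MB, the two terms then sum exactly to the residual error in the evaluation period.

```python
import numpy as np

def error_terms_mb(m_hist, o_hist, m_proj, o_proj, n_q=101):
    """Per-quantile E_B, E_delta and residual for the MB correction."""
    p = np.linspace(0.0, 1.0, n_q)
    mq_hist, oq_hist = np.quantile(m_hist, p), np.quantile(o_hist, p)
    mq_proj, oq_proj = np.quantile(m_proj, p), np.quantile(o_proj, p)
    mb = np.mean(mq_hist - oq_hist)
    e_b = (mq_hist - mb) - oq_hist                       # base-period error
    e_delta = (mq_proj - mq_hist) - (oq_proj - oq_hist)  # change error
    residual = (mq_proj - mb) - oq_proj                  # evaluation-period error
    return e_b, e_delta, residual

# Synthetic example: the model misses part of an observed decrease.
rng = np.random.default_rng(4)
o_hist_err = rng.normal(40.0, 10.0, 1500)
o_proj_err = rng.normal(37.0, 9.0, 1500)
m_hist_err = rng.normal(48.0, 12.0, 1500)
m_proj_err = rng.normal(47.0, 12.0, 1500)
e_b, e_delta, residual = error_terms_mb(m_hist_err, o_hist_err, m_proj_err, o_proj_err)
```

With these definitions, terms of opposite sign can partly cancel, so a small residual does not by itself imply that both the base-period correction and the simulated change are accurate.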

For the base period, it is apparent that the QM correction technique, in contrast to the RB and MB corrections, yields only minor differences across the MDA8 O3 distribution when compared to the observations in both spatial domains. For the evaluation period, we see that the difference in response between models and observations dominates the raw performance of the individual correction techniques and that the residual bias depends strongly on the region and model concerned.

Given this result, we assume that the correction performance depends strongly on models being able to represent precursor emission changes over time as seen in observations. All models show distinct biases in reproducing observed ozone changes between the two time periods, with a particularly pronounced magnitude in the tails of the distributions. Although both error terms and the resulting net error are found to be rather small in the domain average (roughly ±5 ppb), they might have a strong influence on the individual grid-cell level.

Especially for the MB and RB techniques, the individual errors might compensate for each other, as illustrated by the improved results relative to the base period. The DC and QM approaches, on the other hand, strongly depend on the quality of the model response in time. Here, we find that pronounced errors in the model change offset (at least in part) the benefits illustrated for the base period.

To shed light on the underlying cause of biased MDA8 O3 model outputs, we analyze the 13 members of the CESM2-WACCM6 ensemble in more detail. Our analysis showed only small variations within the CESM2-WACCM6 ensemble for core meteorological drivers (and chemical covariates) of surface ozone. This suggests, given that emissions are consistent across models, a dominant influence of the chemical mechanism on the bias in the O3 fields rather than a prominent role of model meteorology.

Conclusions

We have evaluated the bias in surface ozone burdens for four global CCMs contributing to CMIP6 and presented a comprehensive comparison of the performance of four different statistical bias correction techniques. While all models show biases when compared to observations, the bias magnitude of the raw, uncorrected MDA8 O3 outputs differs strongly within the pool of models analyzed.

Our results illustrate that technique performance depends strongly on the model selected and its MDA8 O3 evolution over time and thus on the response to boundary condition changes. This initially surprising result can be explained by examining the composition of the residual model error, which consists of two parts: (1) the residual error of the base period, EB, and (2) the error attributable to the model response to changes in boundary conditions (emissions, climate, etc.) between the two time periods, EΔ. The magnitude of EΔ was found to exert a dominant influence on the overall correction performance, which raises some concerns regarding the robustness of model responses and thus the reliability of model projections.

Ensemble simulations available for one CCM indicate the ozone bias arises from sensitivities in chemical mechanisms or emissions rather than driving meteorology. Future work should confirm that this finding holds for other global models, and thus an ensemble strategy for model experiments is recommended for future model intercomparison activities.
