Networking Support

Combining neural networks and data assimilation to enhance … – OS

November 10, 2024

1. Introduction

Biogeochemical-Argo (BGC-Argo) float profiles provide substantial information on key vertical biogeochemical dynamics and have been successfully integrated in biogeochemical models via data assimilation (DA) approaches. Although BGC-Argo assimilation results have been encouraging, data scarcity remains a limitation with respect to their effective use in operational oceanography.

To address availability gaps in the BGC-Argo profiles, an observing system experiment (OSE) that combines a neural network (NN) and data assimilation (DA) was performed. A NN was used to reconstruct nitrate profiles, starting from oxygen profiles and associated Argo variables (pressure, temperature, and salinity), while a variational data assimilation scheme (3DVarBio) was upgraded to integrate BGC-Argo and reconstructed observations in the Copernicus Mediterranean operational forecast system (MedBFM).

To ensure the high quality of oxygen data, a post-deployment quality control method was developed with the aim of detecting and eventually correcting potential sensors drift. The Mediterranean OSE features three different set-ups: a control run without assimilation; a multivariate run with assimilation of BGC-Argo chlorophyll, nitrate, and oxygen; and a multivariate run that also assimilates reconstructed observations.

The general improvement in the skill performance metrics demonstrated the feasibility of integrating new variables (oxygen and reconstructed nitrate). Major benefits have been observed with respect to reproducing specific biogeochemical-process-based dynamics such as the nitracline dynamics, primary production, and oxygen vertical dynamics.

The assimilation of BGC-Argo nitrate corrects a generally positive bias of the model in most of the Mediterranean areas, and the addition of reconstructed profiles makes the corrections even stronger. The impact of enlarged nitrate assimilation propagates to ecosystem processes (e.g. primary production) at a basin-wide scale, demonstrating the importance of the assimilation of BGC-Argo profiles in forecasting the biogeochemical ocean state.

2. Materials and methods

2.1 The MedBFM system

The MedBFM consists of the tracer transport OGS Transport Model (OGSTM), based on the OPA 8.1 system (Foujols et al., 2000) and updated according to the Lazzari et al. (2012) and Lazzari et al. (2016) versions; the Biogeochemical Flux Model (BFM) described in Vichi et al. (2007a) and Vichi et al. (2007b); and the 3DVarBio variational assimilation scheme as in Teruzzi et al. (2014) and Teruzzi et al. (2018).

OGSTM solves for advection, diffusion, and sinking terms as well as considering the effects of the free surface and variable volume-layer effects on tracer transport (Salon et al., 2019). It is forced by output variables such as current, temperature (T), salinity (S), and sea surface height from the NEMO3.6 model (Clementi et al., 2017).

BFM is a biomass- and functional-group-based marine ecosystem model. It solves governing equations for nine living organic state variables, diatoms, autotrophic nanoflagellates, picophytoplankton, dinoflagellates, carnivorous and omnivorous mesozooplankton, bacteria, heterotrophic nanoflagellates, and microzooplankton; macro-nutrients (nitrate, phosphate, silicate, and ammonium); and labile, semi-labile, and refractory organic matter and oxygen.

Based on 3DVarBio (Teruzzi et al., 2014, 2018; Cossarini et al., 2019; Teruzzi et al., 2021), the assimilation module adopted in the present work integrates oxygen, chlorophyll, and nitrate to update all of the assimilated variables as well as all of the phytoplankton biomass and phosphate.

The 3DVarBio is a variational DA scheme (Teruzzi et al., 2014) based on the minimization of a cost function (J). This function comprises two terms: (i) the misfit between the model background (xb) and the model control state variable or analysis (i.e., the assimilation result xa) and (ii) the mismatch between the observations (y) and the analysis (xa). Both terms are weighted by their respective error covariance matrices (B and R) as follows:

J(x) = 1/2 (x – xb)T B-1 (x – xb) + 1/2 (y – Hx)T R-1 (y – Hx)

Here, the observation operator (H) maps the values of the model background state in the observation space. Following Dobricic et al. (2007), the background error covariance matrix, B, is factorized as B=VVT with V=VVVHVB. The V operators describe different aspects of the error covariances: the vertical error covariance (VV), the horizontal error covariance (VH), and the state variable error covariance (VB).

VV is defined by a set of reconstructed profiles evaluated by means of an empirical orthogonal function (EOF) decomposition applied to a validated multi-year (1998-2015) run (Teruzzi et al., 2018). EOFs are computed for 12 months and 30 coastal and open-sea subregions in order to account for the variability in BGC anomaly fields. VH is built using a Gaussian filter whose correlation radius modulates the smoothing intensity. As in Cossarini et al. (2019), the correlation radius in this work is non-uniform, direction-dependent, and ranges between 12 and 20 km (16 km on average).

The VB operator consists of prescribed monthly and subregion varying covariances among the BGC variables (e.g., nitrate to phosphate). Specifically, for the assimilation of chlorophyll, the VB operator includes a balance scheme that maintains the ratio among the phytoplankton groups and preserves the physiological status of the phytoplankton cells (i.e., preserves the internal ratios between the chlorophyll, carbon, and nutrients, as described in Teruzzi et al., 2014).

The operators VV and VB of 3DVarBio have been updated for the assimilation of oxygen. VV involved the calculation of specific EOF profiles for oxygen, including a localization function to avoid unrealistic corrections due to possible spurious error covariances in the deepest part of the water column. VB included only a new direct relation for oxygen (i.e., oxygen assimilation updated only the oxygen itself), given that it has been shown that it barely affects other variables (Skakala et al., 2021).

2.2 Assimilation of BGC-Argo data

The assimilated observations consist of the quality-controlled BGC-Argo dataset listed in Table 1. Oxygen and nitrate profiles in the 0-600 m layer are used in the assimilation, while chlorophyll is assimilated in the 0-200 m layer.

The observation error covariance matrix R is diagonal with a monthly varying error in chlorophyll (Cossarini et al., 2019). In both the nitrate BGC-Argo profiles and the reconstructed nitrate profiles, the observation error remains constant over time and increases along the vertical direction. Within the 0-450 m layer, the error is set at 0.24 mmol m-3, as in Mignot et al. (2019), and the linearly then increases up to 0.35 mmol m-3 between 450 and 600 m (the maximum assimilation depth). This adjustment aims to prevent inconsistencies between the lower part of the assimilated layer (450-600 m) and the deeper layer of the water column (below 600 m).

Although the accuracy of the reconstruction of profiles is 0.87 mmol m-3 (Pietropolli et al., 2023), we decided to not use different values of error for the two nitrate subsets in order to show the highest potential impact of the OSE. Observation error for oxygen is set to 5 mmol m-3 in the upper 200 m of depth and gradually increases to 20 mmol m-3 in correspondence with the maximum assimilation depth. These values correspond to the uncertainty associated with the oxygen dataset described in Feudale et al. (2022).

2.3 Neural network for nitrate reconstruction

NN-MLP-MED (Pietropolli et al., 2023) is the evolution of previous MLP architectures developed to predict variables sampled with low frequency (e.g., nutrients) starting from variables sampled with high frequency (e.g., temperature) (Sauzède et al., 2017; Bittig et al., 2018b; Fourrier et al., 2020). NN-MLP-MED is a deterministic feed-forward neural network based on an MLP structure. It consists of the merging of 10 different MLP architectures, each one with the same input and output features, composed by two hidden layers with varying numbers of neurons per layer. The final prediction resulting from NN-MLP-MED is the mean of all of the predictions of these components.

In our OSE experiment, the trained NN-MLP-MED reconstructs nitrate profiles from sets of temperature, salinity, oxygen, date, latitude, and longitude BGC-Argo profiles.

2.4 BGC-Argo data and post-deployment oxygen quality control

The BGC-Argo profiles from 2017 to 2018 were downloaded from the Coriolis GDAC (Argo, 2022; last visited in July 2022). We collected both adjusted and delayed mode data for oxygen and chlorophyll. For nitrate, we selected delayed mode data, while adjusted mode data were incorporated after undergoing correction via the CANYON-b NN method or using the World Ocean Atlas (WOA18) collection (Garcia et al., 2019), as explained in Johnson et al. (2021).

Oxygen sensors may drift and lose accuracy over time, and the accurate determination of dissolved oxygen is typically more challenging and requires some form of correction (Johnson et al., 2015). Here, the optode drift is evaluated using nonparametric methods (the random sample consensus, RANSAC, and Theil–Sen methods) at two different depths (600 and 800 m) to avoid possible fake drift detection because of changes in the water masses. Tests are applied when the life of a float is longer than 1 year. Conversely, if the available float time series is less than 1 year, the profiles are not corrected because the float lifetime is considered to be too short to account for in situ sensor drift.

The presence of a drift is established when all four drift estimates (RANSAC at 600 and 800 m and Theil–Sen at 600 and 800 m) agree with respect to their sign and their average value (D_avg) exceeds 1 mmol m-3 yr-1. This threshold is chosen on the basis of results in Bittig et al. (2018a). Subsequently, the identified drift is removed from the oxygen profiles. This is achieved by setting the D_avg at 600 m and linearly interpolating toward the surface, where drift is set equal to zero.

2.5 Experimental setup

Three numerical experiments are performed to analyse the impact of different assimilation set-ups. The simulated period is 1 January 2017–31 December 2018, and the MedBFM module set-up mostly corresponds to the standard adopted in the Mediterranean Analysis and Forecast biogeochemical system of the Copernicus Marine Service.

This set-up includes the following: open boundary conditions in the Atlantic; climatological input of nutrients, carbon, and alkalinity for 39 rivers and the Dardanelles Strait; initial conditions from the EMODnet dataset (details are provided in Salon et al., 2019); and a 3-year spin-up using the 2017 forcings in perpetual mode.

The three simulations, which share the same set-up except for the assimilated datasets, are as follows: (1) control run without assimilation (HIND); (2) assimilation of BGC-Argo chlorophyll, nitrate, and oxygen (DAfl); and (3) assimilation of additional reconstructed nitrate profiles used to enhance the DAfl assimilative set-up (DAnn).

Before integrating data in 3DVarBio, the same pre-assimilation assessment described in Teruzzi et al. (2021) is applied to the chlorophyll profiles. Nitrate profiles are rejected if the concentration at the surface is higher than 3 mmol m-3. At the surface, the oxygen profile exclusion is evaluated by calculating the difference between the uppermost oxygen measurement and the oxygen saturation (derived from temperature and salinity data from the Argo dataset, as in Garcia et al., 2019). Profiles are excluded when this difference reaches the threshold of 10 mmol m-3. At 600 m, the difference between oxygen and a climatological reference oxygen at depth is calculated. Profiles are excluded when the difference reaches the threshold of 2 times the standard deviation of the same reference dataset.

3. Results

3.1 Oxygen quality control

The QC O2 module enabled the automatic correction of in situ sensor drifts. Of the 40 floats available between 2017 and 2018, we performed the drift analysis on 16 floats, while 24 floats remained unanalysed due to the limited length of the time series. Of these 16 floats, we found a drift in 13: 4 with a positive drift and 9 with a negative drift. For the remaining three floats, the drift values were below the prescribed threshold. At a depth of 600 m, the absolute average correction for the 13 floats is approximately 4.3 mmol m-3 yr-1. This value aligns with the ranges expressed in terms of sensor drift percentage in Bittig et al. (2018a) (1%-1.5%).

3.2 Assimilation skill

The performance skill of the simulations listed in Table 1 is evaluated by comparing model results with (i) the satellite Copernicus Marine Service OC product (i.e., non-gap-filled L3 product OCEANCOLOUR_MED_BGC_L3_MY_009_143 from https://marine.copernicus.eu, last access: 17 July 2023) of chlorophyll and (ii) BGC-Argo profiles of chlorophyll, nitrate, and oxygen (Argo, 2022).

The winter RMSE concerning the OC chlorophyll in HIND spans between approximately 0.09 and 0.21 mg m-3 with a maximum in the alb region (Fig. 5). The inclusion of multivariate DA (in DAfl) positively impacts the model performance, reducing surface errors by 6.5%, as mainly observed in the eastern subbasins. A further reduction in the RMSE (up to 10%) with respect to HIND is then obtained with DAnn, highlighting that enlarging the nitrate float network leads to improvements in the reproduction of surface phytoplankton dynamics.

The RMSE metrics based on BGC-Argo are computed for the six selected aggregated macro-basins and in selected layers (0-10, 10-30, 30-60, 60-100, 100-150, 150-300, and 300-600 m), and they are shown for nitrate (Fig. 6a, b), chlorophyll (Fig. 6c, d), and oxygen (Fig. 6e, f).

The assimilation of in situ BGC-Argo considerably improves the quality of modelled nitrate with respect to the HIND run. During winter, the average RMSE reduction is 40% in DAfl and increases to 46% in DAnn, whereas the average reduction reaches 59% in DAfl and 63% in DAnn in summer (Fig. 6a, b).

Assimilating oxygen profiles enables the reduction of the model–BGC float RMSE by about 30% during winter and summer. In winter, the correction involves the whole water column in the east (Lev and Ion) and deeper layers (150-600 m) in the west (Swm, Nwm) and Adr (Fig. 6e, f). In summer, the impact is mainly observed in Tyr, Ion, and Lev.