A 10 km daily-level ultraviolet-radiation-predicting dataset … – ESSD

Overview of Ultraviolet Radiation Measurement and Prediction

Ultraviolet (UV) radiation is a crucial environmental factor closely linked to human health. Previous studies have highlighted its role as a risk factor for skin cancer, whereas studies of UV exposure and eye diseases have produced inconsistent results. Further research is needed to fully understand the impacts of UV radiation on public health, but the lack of accurate and comprehensive UV radiation data has hindered such investigations.

Existing methods for assessing UV radiation exposure have limitations. The UV index, a commonly used proxy, provides a general scale but loses numerical detail. Satellite remote sensing products, such as erythemal UV irradiance from the Total Ozone Mapping Spectrometer (TOMS) and erythemal daily dose (EDD) from the Ozone Monitoring Instrument (OMI), offer better spatial and temporal resolution but have their own drawbacks: lower accuracy for TOMS and missing data for OMI. Personal dosimeters provide high-quality data but are costly, making them impractical for large-scale population studies.

The development of machine learning algorithms has enabled the integration of various data sources to predict environmental factors with high accuracy. While empirical or statistical models have been widely used for UV radiation prediction, recent studies have started to explore the application of machine learning techniques in this field. However, these studies have limitations, such as relatively low spatial resolution or significant missing data in key predictors.

To address these gaps, this study aimed to develop a random forest model to predict daily UV radiation in mainland China at a 10 km spatial resolution from 2005 to 2020. The model incorporated multiple predictors, including satellite-based UV radiation data, UV radiation simulations, and meteorological parameters. The missing satellite-based UV radiation data were filled to improve the spatial coverage of the final UV radiation predictions. This comprehensive dataset can support future health-related research on the effects of UV radiation exposure.

Data Sources and Preprocessing

Ground-based UV Radiation Measurements

The Chinese Ecosystem Research Network (CERN) has been monitoring UV radiation since 2004. This study collected hourly UV radiation data from 40 ground-based stations between 2005 and 2015, and from 36 stations between 2016 and 2020, covering various land-cover types across China. Daily UV radiation values were calculated by summing the 24 hourly measurements for each day; days with 2 consecutive hours of missing data were excluded.
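The daily aggregation rule described above can be sketched as follows. This is a minimal illustration, not the authors' code. The function names, the use of `None` to mark a missing hour, the reading of the exclusion rule as "two or more consecutive missing hours", and the choice to skip isolated missing hours in the sum are all assumptions.

```python
from typing import Optional, Sequence


def has_consecutive_gap(hourly: Sequence[Optional[float]], min_run: int = 2) -> bool:
    """Return True if `hourly` contains at least `min_run` consecutive missing values."""
    run = 0
    for value in hourly:
        run = run + 1 if value is None else 0
        if run >= min_run:
            return True
    return False


def daily_uv_total(hourly: Sequence[Optional[float]]) -> Optional[float]:
    """Sum 24 hourly UV values into a daily total, or return None if the day is excluded."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    if has_consecutive_gap(hourly, min_run=2):
        return None  # day excluded from the dataset
    # Assumption: an isolated missing hour is skipped rather than imputed.
    return sum(v for v in hourly if v is not None)
```

A day whose only gap is a single missing hour is still summed in this sketch; how the original study treated such days is not stated in the text.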

Satellite-based and Reanalysis Data

The main predictors used in this study were:

  1. OMI EDD: Level-2 OMI EDD (v.003) data, with a daily temporal resolution and a 0.25°×0.25° spatial resolution, were utilized as a direct measurement of UV radiation from satellites.

  2. ERA5 UV: Downward UV radiation at the surface from the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (ERA5), with an hourly temporal resolution and a 0.25°×0.25° spatial resolution, was also included.

OMI EDD and ERA5 UV data were interpolated to 10 km grid cells using the inverse distance weighting (IDW) method.
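As an illustration of inverse distance weighting, the sketch below estimates a value at one target location from surrounding source points. It is not the study's implementation: the function name `idw_interpolate`, the power of 2, the planar Euclidean distance, and the use of all source points (no search radius or neighbor limit) are assumptions.

```python
import math
from typing import Sequence, Tuple


def idw_interpolate(
    target: Tuple[float, float],
    points: Sequence[Tuple[float, float]],
    values: Sequence[float],
    power: float = 2.0,
) -> float:
    """Estimate the value at `target` as a distance-weighted mean of known values."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(points, values):
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a source point
        weight = 1.0 / distance ** power  # closer points get larger weights
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one source point is required")
    return weighted_sum / weight_total
```

For a target that is equidistant from all source points, the estimate reduces to their plain average; as the target approaches one source point, that point's value dominates.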

Meteorological and Auxiliary Predictors

Other predictors incorporated in the model included:

  • Meteorological parameters from ERA5 and ERA5-Land products, such as total cloud cover, total column water vapor, relative humidity, total precipitation, and temperature at 2 m.
  • Elevation data from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Map.
  • Solar zenith angle (SZA) data from the Aqua satellite.
  • Ground-level ozone (O3) concentrations from a random forest model.
  • Aerosol optical depth (AOD) data from the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm based on the Moderate Resolution Imaging Spectroradiometer (MODIS).

All meteorological and auxiliary predictors were preprocessed to match the 10 km spatial resolution.

Filling Missing Satellite-based UV Radiation Data

OMI EDD data had non-random missing values caused by cloud cover and by a technical issue that has affected the instrument since 2008; the average missing rate over the study period was 23.04%. To address this, the study used a 3-day moving average to fill missing OMI EDD values for grid-cell days on which data from the 2 preceding days were available. This reduced the average missing rate to 0.62%.
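One way to read the gap-filling rule is sketched below for the time series of a single grid cell. This is a hedged illustration rather than the authors' procedure: the function name is hypothetical, `None` marks a missing day, and the sketch assumes that a gap is filled with the mean of the two preceding days only when both were originally observed (previously filled values are not reused).

```python
from typing import List, Optional, Sequence


def fill_with_preceding_mean(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill a missing day with the mean of the two preceding observed days."""
    filled = list(series)
    for t in range(2, len(series)):
        if series[t] is None:
            # Look only at the original observations, not at filled values.
            previous = (series[t - 1], series[t - 2])
            if all(v is not None for v in previous):
                filled[t] = sum(previous) / 2.0
    return filled
```

Under this reading, runs of consecutive missing days stay missing after the first filled day is skipped, which is consistent with a small residual missing rate remaining after filling.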

The accuracy of the gap-filling method was evaluated through 10-fold cross-validation (CV), which yielded R2 values ranging from 0.85 to 0.90 across 2005-2020, indicating relatively high accuracy.

Model Development and Evaluation

This study utilized the random forest algorithm to develop a model for predicting daily UV radiation in China from 2005 to 2020. The ground-measured UV radiation data were used as the dependent variable, while the satellite-based, reanalysis, and auxiliary data were used as independent variables.

To assess the model performance, several cross-validation methods were employed:

  1. Overall 10-fold CV: Randomly dividing the dataset into 10 parts, with 9 parts used for model training and 1 part for testing.
  2. Temporal 10-fold CV: Randomly dividing the dataset by days, with 90% of the days used for training and 10% for testing.
  3. Spatial 10-fold CV: Randomly dividing the dataset by monitoring station locations, with 90% of the sites used for training and 10% for testing.
  4. By-year temporal CV: Leaving an entire year of data as the testing dataset while using the remaining years for training.
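The four validation schemes differ only in the unit that is held out together. The sketch below expresses that idea with a single grouped splitter. It is an assumption-laden illustration, not the study's code: `grouped_kfold`, the sample record lists, and the seed are hypothetical.

```python
import random
from typing import Hashable, Iterator, List, Sequence, Tuple


def grouped_kfold(
    groups: Sequence[Hashable], n_splits: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; records sharing a group never straddle a split."""
    unique = list(dict.fromkeys(groups))  # distinct groups, first-seen order
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    random.Random(seed).shuffle(unique)
    fold_of = {group: i % n_splits for i, group in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Hypothetical records: one entry per station-day.
stations = ["S01", "S01", "S02", "S02", "S03", "S03"]
dates = ["2005-01-01", "2006-01-01", "2005-01-01", "2006-01-01", "2005-01-01", "2007-01-01"]
years = [d[:4] for d in dates]

overall_cv = grouped_kfold(range(len(stations)), n_splits=3)  # each record is its own group
temporal_cv = grouped_kfold(dates, n_splits=3)                # hold out whole days
spatial_cv = grouped_kfold(stations, n_splits=3)              # hold out whole stations
by_year_cv = grouped_kfold(years, n_splits=len(set(years)))   # leave one year out
```

Spatial CV is the strictest test of the four for a gridded product, because the model must predict at locations it has never seen.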

The model performance was evaluated using the coefficient of determination (R2) and root mean square error (RMSE) between the measured and predicted UV radiation.
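The two metrics can be computed as in the sketch below. One caveat is an assumption on my part: the text does not say whether R2 is defined as 1 minus the ratio of residual to total sum of squares or as the squared Pearson correlation, and this sketch uses the former. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot
```

RMSE is expressed in the same units as the measurements, so it complements the unitless R2 when comparing validation schemes.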

Results and Discussion

In model development (fitting), the R2 and RMSE between measured and predicted daily UV radiation were 0.97 and 15.64 W/m2, respectively. In cross-validation at the daily level, the R2 (RMSE) values were 0.83 (37.44 W/m2) for overall CV, 0.75 (45.56 W/m2) for spatial CV, 0.83 (37.48 W/m2) for temporal CV, and 0.82 (38.86 W/m2) for by-year temporal CV.

The inclusion of OMI EDD as a predictor improved the model’s predictive accuracy compared to a model without it. The random forest feature importance analysis and the SHAP method also identified ERA5 UV, OMI EDD, and aerosol optical depth (MAIAC AOD) as the most important predictors for UV radiation predictions.

The spatial distribution of annual average UV radiation based on model predictions showed an uneven pattern across China, with higher levels in the southern and western regions due to factors such as latitude, elevation, and meteorological conditions. The eastern areas of China, with high population density, were identified as potential hotspots for UV radiation exposure.

Temporal analysis revealed that UV radiation experienced slight fluctuations from 2005 to 2014 but then exhibited a clear increasing trend from 2015 to 2020, with a nationwide increase of 4.20% compared to 2013. This trend was accompanied by a 48.51% decrease in fine particulate matter (PM2.5) and a 22.70% increase in ground-level ozone (O3) during the same period, suggesting potential correlations among these environmental factors.

Conclusion

This study developed a random forest model to predict daily UV radiation in mainland China at a 10 km spatial resolution from 2005 to 2020, incorporating satellite-based, reanalysis, and auxiliary data. The model demonstrated high predictive accuracy, with R2 values ranging from 0.75 to 0.83 in cross-validation. The inclusion of satellite-based UV radiation data, such as OMI EDD, was found to improve the model’s performance.

The resulting dataset provides comprehensive and high-resolution UV radiation predictions, which can support further health-related research on the effects of UV exposure. The identified spatial and temporal patterns, as well as the potential correlations with air pollution, offer valuable insights for understanding the environmental factors influencing UV radiation and their implications for public health.

The UV radiation dataset generated in this study is freely available at https://doi.org/10.5281/zenodo.10884591 for use by the research community.

Acknowledgments

This research has been supported by the National Key Research and Development Program of China (grant nos. 2023YFC3708304 and 2022YFC3700705) and the National Natural Science Foundation of China (grant no. 82030103).
