Open Access
Issue
J. Space Weather Space Clim.
Volume 14, 2024
Article Number 25
Number of page(s) 16
DOI https://doi.org/10.1051/swsc/2024021
Published online 09 September 2024

© X. Sun et al., Published by EDP Sciences 2024

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

The ≥2 MeV electron flux at geostationary (GEO) orbit is a critical parameter for GEO satellites and is used as a significant indicator of increased risk of internal charging (Wrenn & Sims, 1996; Gubby & Evans, 2002; Wrenn et al., 2002; Romanova et al., 2005; Pilipenko et al., 2006; Horne et al., 2013; Lai et al., 2018). Energetic electrons can cause significant anomalies, leading to temporary or permanent loss of satellite functions, such as interruption of communications and degradation of navigation precision (Reagan et al., 1983; Baker et al., 1987; Violet & Frederickson, 1993; Lanzerotti et al., 1998; Lucci et al., 2005; Ryden et al., 2008; Lohmeyer et al., 2015; Singh et al., 2021). The prediction of ≥2 MeV electron daily fluences for the next 3 days is an indispensable part of space environment forecasting. Alerts for relativistic electron enhancement events are triggered when ≥2 MeV electron daily fluences at GEO orbit exceed 10^8 cm−2 · sr−1 · day−1.

It is essential to understand the distribution of high-energy electrons and make reliable predictions of the radiation environment around spacecraft. With the increasing number of satellites and their growing importance to our lives, considerable effort has been devoted to both goals over the last two decades. Several physics-based models have been developed to study the acceleration of relativistic electrons in the radiation belt, such as the ONERA Salammbô model (Beutier & Boscher, 1995; Varotsou et al., 2005, 2008; Maget et al., 2007; Bourdarie & Maget, 2012), the British Antarctic Survey (BAS) Radiation Belt model (Glauert et al., 2014a, b; Kersten et al., 2014; Allison et al., 2019), the Versatile Electron Radiation Belt (VERB) model (Shprits et al., 2008a, b, 2009, 2013, 2015; Subbotin & Shprits, 2009; Kim et al., 2011, 2012; Subbotin et al., 2010, 2011; Pakhotin et al., 2014; Drozdov et al., 2017, 2021; Wang & Shprits, 2019; Wang et al., 2020) and the Dynamic Radiation Belt Environment Assimilation (DREAM-3D) Model (Reeves et al., 2012; Tu et al., 2013). Radial diffusion processes driven by ULF (Ultra-low Frequency) waves and localized electron acceleration due to resonant interactions with whistler mode chorus waves are important mechanisms for electron acceleration (Horne & Thorne, 1998; Summers et al., 1998; Brautigam & Albert, 2000; Friedel et al., 2002; Meredith et al., 2002; Li et al., 2001; Li, 2004; Li et al., 2006, 2007, 2011; Miyoshi et al., 2003; Horne et al., 2005; Shprits et al., 2006a, b, 2008a, b, 2009, 2018, 2022; Albert, 2007, 2008; Millan & Thorne, 2007; Anderson et al., 2015; Li et al., 2016; Ma et al., 2018).

In addition to the physics-based models, empirical methods have also been employed for the prediction of energetic electrons at GEO orbit, such as Paulikas & Blake (1979), Baker et al. (1990), O’Brien et al. (2001), Burin des Roziers & Li (2006), Li et al. (2001), Rigler et al. (2004), Turner & Li (2008), Reeves et al. (2011), He et al. (2013), Sakaguchi et al. (2013), Potapov et al. (2014), Li et al. (2017), Qian et al. (2020), and Landis et al. (2022). The empirical models often require solar wind parameters, geomagnetic indices, and low-energy or medium-energy electron fluxes in the previous days to predict ≥2 MeV electron daily fluences for the next 1–3 days.

With the rapid progress of artificial intelligence, many machine-learning models have been applied to predict high-energy electron fluxes at GEO orbit, such as Fukata et al. (2002), Ukhorskiy et al. (2004), Xue & Ye (2004), Ling et al. (2010), Balikhin et al. (2011), Balikhin et al. (2016), Wei et al. (2011), Wang & Shi (2012), Boynton et al. (2013), Boynton et al. (2015), Guo et al. (2013), Pakhotin et al. (2014), Ganushkina et al. (2014), Ganushkina et al. (2015), Shin et al. (2016), Wei et al. (2018), Zhang et al. (2020), Katsavrias et al. (2022), Landis et al. (2022), Saikin et al. (2021), and Son et al. (2022). Wei et al. (2018) and Sun et al. (2023) used the Long Short-Term Memory (LSTM) network to predict the ≥2 MeV electron daily fluence at GEO orbit for the next day and the next 3 days, respectively.

The previous prediction models are mainly focused on the prediction of ≥2 MeV electron daily fluences. Limited studies have been focused on the prediction of ≥2 MeV electron fluxes with a 5-minute resolution. Li et al. (2017) developed the model by the Empirical Orthogonal Function (EOF) to give ≥2 MeV electron fluxes with 5-minute resolution on the following day. The EOF coefficients are fitted by the solar wind parameters and geomagnetic indices. The prediction efficiency (PE) from January 2003 to June 2006 is 0.67. Landis et al. (2022) created a NARX (nonlinear autoregressive with exogenous input) neural network to model ≥0.8 MeV electron fluxes with 5-minute resolution from GOES-15 satellite. The NARX model performs well from June 2013 to June 2016, with a linear correlation (LC) coefficient of 0.68 and a PE value of 0.39.

Satellite observations of ≥2 MeV electron fluxes often contain data gaps. However, continuous data sets are useful not only for analyzing the dynamic distribution of electrons in the radiation belt but also for model validation. Unfortunately, little attention has been paid to filling the data gaps of ≥2 MeV electron fluxes; instead, research has focused on filling the data gaps in solar wind parameters and geomagnetic indices. Qin et al. (2007) developed a decorrelation-time-based approach to interpolate the solar-wind characteristics across data gaps and to evaluate parameters needed for global empirical magnetic models (Tsyganenko & Sitnov, 2005). Kondrashov et al. (2005) and Kondrashov & Ghil (2006) developed a gap-filling method based on the Singular Spectrum Analysis (SSA) method, which relies mainly on the presence of significant oscillatory modes in the time series. Kondrashov et al. (2010, 2011, 2014) and Shprits et al. (2012, 2013) also applied the SSA method to reconstruct the solar wind data set covering the long-term time interval of the Combined Release and Radiation Effects Satellite (CRRES) by using the combination of solar wind factors, IMF data, and geomagnetic indices as inputs, and the method was used in several data assimilation and statistical studies of the radiation belts.

In this study, we develop models to predict ≥2 MeV electron fluxes with 5-minute resolution using the LSTM and transformer networks. Various combinations of ≥2 MeV electron fluxes, solar wind parameters, and geomagnetic indices as inputs are discussed, and the model performances are evaluated with different prediction time scales in 2005. These models can fill the data gaps of ≥2 MeV electron fluxes at GEO orbit. LSTM and transformer networks are two commonly used deep learning methods in processing sequential data. LSTM, a type of recurrent neural network, uses gate units to control the flow of information, thereby effectively avoiding the problem of vanishing or exploding gradients. In contrast, the transformer is a neural network that can achieve encoding and decoding of sequences based on attention mechanisms. When processing long sequences, the performance of the transformer model is usually better than that of the LSTM model. Considering the excellent performances of LSTM and transformer networks in processing sequential data, we used the two methods to develop the models.

This paper is structured as follows: Section 2 introduces the data, the LSTM and transformer networks, and the indices for model evaluation. In Section 3.1, we evaluate the performance of models with different offset times as inputs and determine the best offset time for modeling. Sections 3.2 and 3.3 focus on the performances of the LSTM and the transformer models. We evaluate the performances of the models with different prediction time scales and with different parameters as inputs and compare our models with other models in Section 3.4. Section 4 discusses the performances of the prediction models in different situations. The summary and conclusions are given in Section 5.

2 Data and methods

2.1 Data

The data used in this study include ≥2 MeV electron fluxes from GOES satellites, solar wind parameters, geomagnetic disturbance indices, magnetopause subsolar distance (R0), L-shell (Lm), and magnetic local time (MLT) values between 2002 and 2005. The GOES data with 5-minute resolution were provided by the National Centers for Environmental Information (NCEI) (https://www.ngdc.noaa.gov/stp/satellite/goes/). Solar wind parameters and geomagnetic disturbance indices were provided by OMNI (https://cdaweb.gsfc.nasa.gov/). The R0 values were calculated using the Lin et al. (2010) model. The Lm and MLT values were calculated using the international radiation environment modeling software library (IRBEM) (https://github.com/PRBEM/IRBEM).

The GOES satellites are a series of geostationary environmental satellites, continuously monitoring the energetic electron fluxes from 1974 (Grubb, 1975), which are managed by the National Oceanic and Atmospheric Administration (NOAA). Based on the ≥2 MeV electron fluxes with 5-minute resolution from GOES satellites, the proportion of missing data in different GOES satellites is analyzed. The data missing ratios of ≥2 MeV electron fluxes from GOES-8, GOES-9, GOES-10, GOES-11, and GOES-12 satellites are 6.6%, 6.9%, 5.4%, 4.7%, and 13.9%, respectively. The data missing ratios for eastward (westward) detectors from GOES-13 to GOES-15 satellites are 0.53% (0.45%), 1.26% (0.72%), and 0.43% (0.43%), respectively. The data qualities of GOES-13 to GOES-15 satellites are significantly better than those of previous GOES satellites. Considering that the model will also be used to fill the data gaps, the data will be from GOES-8 to GOES-12 satellites. We also considered the satellite location and the variation in ≥2 MeV electron fluxes at different longitudes. It is shown in Figure 1 of Sun et al. (2021) that most GOES satellites adjusted their locations during operation. For instance, the GOES-10 satellite operated from July 1998 to December 2009, and it shifted from around 135°W to about 60°W between July 2006 and November 2006. Sun et al. (2021) also showed that the ratios of ≥2 MeV electron daily fluences (cm−2 · sr−1 · day−1) from GOES-10 at about 135°W to those from GOES-12 at about 75°W are mainly in the range of 1.0–4.0, with an average of 1.92. Due to the long duration of the GOES-10 satellite operating at the same fixed longitude (135°W), we ultimately chose the GOES-10 satellite.

Solar wind parameters include solar wind speed (Vsw), density (N), dynamic pressure (Pd), the total magnitude of interplanetary magnetic field (Bt), Bx, By, and Bz components of interplanetary magnetic field (IMF) in the GSM coordinates, electric field (E), temperature (T), and plasma beta (beta). Geomagnetic disturbance indices consist of kp, AE, SYM-H, and Dst. Solar wind parameters and AE index are with 5-minute resolution. The Dst index is with 1-hour resolution, and the kp index is with 3-hour resolution. Dst and kp indices are converted to 5-minute resolution by keeping the same values within the 1-hour or 3-hour interval. The solar wind data for calculating R0 are described above. The Lm and MLT values are calculated by International Reference Geomagnetic Field (IGRF) + Tsyganenko (T89) models using IRBEM. The IGRF (Macmillan & Finlay, 2010; Thébault et al., 2015) is the internal geomagnetic field model and the T89 model (Tsyganenko, 1989) is the external geomagnetic field model. Missing values are filled by the linear interpolation method.
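The two resampling steps described above (holding each Dst or kp value constant over its 1-hour or 3-hour interval, and linearly interpolating missing values) can be sketched in plain Python. This is only an illustration: the function names `hold_to_5min` and `fill_linear` are hypothetical, missing samples are assumed to be marked with `None`, and gaps at the very start or end of a series are simply left unfilled.

```python
def hold_to_5min(values, hours_per_value):
    """Repeat each coarse value so it covers its whole interval at 5-minute cadence."""
    points_per_value = hours_per_value * 12  # 12 five-minute steps per hour
    out = []
    for v in values:
        out.extend([v] * points_per_value)
    return out


def fill_linear(series):
    """Linearly interpolate interior runs of None; edge gaps are left untouched."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for k in range(1, span):
            frac = k / span
            filled[left + k] = filled[left] + frac * (filled[right] - filled[left])
    return filled


# Two hourly Dst values become 24 five-minute values.
dst_5min = hold_to_5min([-20, -35], hours_per_value=1)
print(len(dst_5min))                     # 24
print(fill_linear([1.0, None, 3.0]))     # [1.0, 2.0, 3.0]
```

A 3-hour kp value would be expanded the same way with `hours_per_value=3`, giving 36 identical 5-minute values.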

The distribution of the missing data of the GOES-10 satellite is shown in Figure 1. There are 2432 data gaps during the operation period of the GOES-10 satellite, and 61.7% of these are 5-minute data gaps, meaning that only one individual value is absent. 94.0% of the data gaps are under an hour, 96.5% are under 3 h, and 99.6% are under 24 h. Therefore, a model for a 1-day prediction is sufficient to fill most of the gaps in the GOES data, and it is more important to improve the performance of the model for 5-minute or 1-hour predictions.
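A gap-length census of this kind can be reproduced with a short script. The sketch below assumes a 5-minute series in which missing samples are marked with `None`; the function names are hypothetical and the toy series is illustrative, not GOES data.

```python
def gap_lengths(series):
    """Return the length (in samples) of every run of consecutive None values."""
    lengths, run = [], 0
    for v in series:
        if v is None:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # series ends inside a gap
        lengths.append(run)
    return lengths


def share_shorter_than(lengths, minutes, step_minutes=5):
    """Fraction of gaps whose duration is strictly below `minutes`."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n * step_minutes < minutes) / len(lengths)


toy = [1.0, None, 2.0, None, None, None, 3.0, None]
lengths = gap_lengths(toy)
print(lengths)  # [1, 3, 1]
```

Calling `share_shorter_than(lengths, 60)` on the real gap list would give the fraction of gaps under an hour, the quantity reported as 94.0% above.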

Figure 1

The distribution of the missing data of ≥2 MeV electron fluxes from the GOES-10 satellite.

Figure 2a displays the ≥2 MeV electron fluxes from the GOES-10 satellite between 1999 and 2009, which is the main operation period of the GOES-10 satellite. The red dashed line indicates that ≥2 MeV electron fluxes are equal to 1157 cm−2 · s−1 · sr−1, and the corresponding daily fluence is 10^8 cm−2 · sr−1 · day−1, which is the threshold value of a relativistic electron enhancement event. Figures 2b–2d show the longitude of the GOES-10 satellite, the number of days with relativistic electron enhancement events, the number of relativistic electron enhancement events, and the percentage of missing data for each year.
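The correspondence between the flux level of 1157 cm−2 · s−1 · sr−1 and the daily fluence threshold of 10^8 cm−2 · sr−1 · day−1 follows from dividing the fluence by the number of seconds in a day. A minimal check, with a hypothetical helper name:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 s


def daily_fluence_to_mean_flux(fluence_per_day):
    """Constant flux (cm^-2 s^-1 sr^-1) that accumulates to the given
    daily fluence (cm^-2 sr^-1 day^-1)."""
    return fluence_per_day / SECONDS_PER_DAY


threshold_flux = daily_fluence_to_mean_flux(1e8)
print(round(threshold_flux, 1))  # 1157.4
```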

Figure 2

(a) The ≥2 MeV electron fluxes from the GOES-10 satellite, (b) the longitudes of GOES-10 satellite, (c) the number of the relativistic electron enhancement events (red line), and the number of days with relativistic electron enhancement events (black line), and (d) the percentage of missing data of the GOES-10 satellite from 1999 to 2010.

It is obvious that the longitudes of GOES-10 remained stable from 1999 to 2005 as shown in Figure 2b, and the quality of the ≥2 MeV electron fluxes from the GOES-10 satellite between 2001 and 2005 is significantly better than that of other years as shown in Figure 2d. To correctly train the machine learning model, the training set should consist of different kinds of space weather phenomena. There are a lot of relativistic electron enhancement events between 2003 and 2005. In addition, the ≥2 MeV electron fluxes were relatively low and with relatively few relativistic electron enhancement events in 2002 as shown in Figures 2a and 2c. Finally, we chose the data from 2002 to 2005 as the data set. We use the data from 2002 to 2003 as the training set, the data in 2004 as the validation set, and the data in 2005 as the test set.

2.2 Long short-term memory (LSTM)

The LSTM network is a Recurrent Neural Network (RNN) based architecture that can preserve more input information and significantly reduces the vanishing gradient problem of conventional neural networks (Hochreiter & Schmidhuber, 1997; Graves & Schmidhuber, 2005; Graves, 2012). There are a series of “gates” in the LSTM network, including the Input Gate, Output Gate, and Forget Gate, which can manage to keep, forget, or ignore historical information based on a probabilistic model. The LSTM network is suited for propagating information through long sequences due to its unique structure. Therefore, it is widely used in natural language processing (NLP) and time series predictions (Gers et al., 2000; Kai et al., 2013; Huang et al., 2015; Cai & Liu, 2016; Greff et al., 2016).

In this study, we develop LSTM models for filling the data gaps of ≥2 MeV electron fluxes from the GOES-10 satellite. The loss function is mean-square error (MSE), the optimizer is AdamOptimizer (Kingma & Ba, 2014), the batch size of the training set is 64, and the learning rate is 1 × 10^−4. Each sliding window is used as an input sequence to predict the next time step’s value based on the preceding time steps. Subsequently, the loss is computed by comparing these predicted values against the actual ones, and the model’s weights are updated via backpropagation. This iterative training continues until every time window in the training set has been processed. To prevent overfitting, we do not fix the number of epochs; instead, training stops when the model performance has not improved for 10 consecutive epochs.
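The stopping rule (halt once performance has not improved for 10 consecutive epochs) can be sketched independently of any deep learning framework. In the sketch below, `run_epoch` is a hypothetical callable that performs one training pass and returns the monitored loss; which data set the loss is monitored on is not stated in the text, and `max_epochs` is a safety cap added here as an assumption.

```python
def train_with_early_stopping(run_epoch, patience=10, max_epochs=10_000):
    """Call run_epoch() until the returned loss has failed to improve
    for `patience` consecutive epochs. Returns (epochs_run, best_loss)."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        loss = run_epoch()
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epoch, best_loss


# Toy loss curve: improves for 3 epochs, then plateaus.
losses = iter([5.0, 4.0, 3.0] + [3.0] * 20)
print(train_with_early_stopping(lambda: next(losses)))  # (13, 3.0)
```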

The LSTM models target different prediction time scales, containing 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions. The 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions contain the next 12, 36, 72, 144, and 288 data points with 5-minute resolution.

2.3 Transformer

The transformer network is one of the sequence modeling architectures. It has undergone significant progress in recent years, displaying unmatched performance in a variety of applications, including NLP, speech recognition, and computer vision (Vaswani et al., 2017). The transformer network can digest vast sequences of data due to its multi-head self-attention mechanism, which excels at finding semantic correlations between items in a lengthy sequence. Consequently, it is capable of learning time series data with complicated dynamics that pose a challenge for sequence models (Vaswani et al., 2017; Devlin et al., 2018; Dong et al., 2018; Liu et al., 2021; Zeng et al., 2022).

The transformer network was introduced in 2017 by a Google Brain team and is becoming increasingly popular for NLP tasks such as machine translation, as well as for time series prediction. The transformer network requires a large amount of training data, and using more data usually leads to better results. Considering its versatility and wide range of applications, we also develop models using the transformer network for filling the data gaps of ≥2 MeV electron fluxes from the GOES-10 satellite.

We build a four-layer transformer network model that contains two linear layers, a transformer layer, and an output layer. The loss function is MSE, the optimizer is AdamOptimizer (Kingma & Ba, 2014), and the learning rate is 1 × 10^−4. The transformer models also target different prediction time scales, which are the same as those in Section 2.2.

2.4 Model evaluation

The model performance is evaluated by the PE and the root mean square error (RMSE). They are defined as

$$\mathrm{PE} = 1 - \frac{\sum_{i=1}^{n} (m_i - p_i)^2}{\sum_{i=1}^{n} (m_i - \bar{m})^2} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (m_i - p_i)^2} \tag{2}$$

where n is the total number of samples, m̄ is the mean value of all observation samples, and mi and pi are the ith observation and prediction, respectively. The better the model, the lower the RMSE and the higher the PE values. In this study, the mi is the log10 (≥2 MeV electron fluxes) from observations, and the pi is the log10 (≥2 MeV electron fluxes) predicted by models.

In addition, the LC coefficient and bias are also used to assess model performances. The bias represents the divergence between the observations and predictions. We calculate the differences between mi and pi, add up all the differences, and divide the total value by the total number of predictions to get the bias.
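The four evaluation indices can be sketched as follows, taking `obs` and `pred` as equal-length lists of log10 fluxes. PE and RMSE use their standard definitions, the bias is the mean of (observation − prediction) as described above, and the LC coefficient is assumed here to be the Pearson correlation coefficient; the function names are hypothetical.

```python
import math


def prediction_efficiency(obs, pred):
    """PE = 1 - sum((m - p)^2) / sum((m - mean(m))^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((m - p) ** 2 for m, p in zip(obs, pred))
    sst = sum((m - mean_obs) ** 2 for m in obs)
    return 1.0 - sse / sst


def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(obs, pred)) / len(obs))


def bias(obs, pred):
    """Mean of (observation - prediction)."""
    return sum(m - p for m, p in zip(obs, pred)) / len(obs)


def linear_correlation(obs, pred):
    """Pearson correlation coefficient (assumed meaning of 'LC')."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((m - mo) * (p - mp) for m, p in zip(obs, pred))
    var_o = sum((m - mo) ** 2 for m in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


obs, pred = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
print(prediction_efficiency(obs, pred))  # -0.5
print(rmse(obs, pred), bias(obs, pred))  # 1.0 -1.0
```

The toy case shows why PE and LC are reported together: a prediction offset by a constant keeps a perfect linear correlation but yields a negative PE, since PE penalizes the offset.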

3 The models for predicting ≥2 MeV electron fluxes at GEO orbit

The models for predicting ≥2 MeV electron fluxes by the LSTM or transformer network with different prediction time scales are developed by using the ≥2 MeV electron fluxes between 2002 and 2004 from the GOES-10 satellite as inputs and tested in the year 2005. The prediction time scales are 1-hour (12 data points), 3-hour (36 data points), 6-hour (72 data points), 12-hour (144 data points), and 1-day (288 data points), respectively. The offset time is always 4 days as selected in Section 3.1. The data from 2002 to 2003 are the training set, the data in 2004 are the validation set, and the data in 2005 are the test set.

We only use log10 (≥2 MeV electron fluxes) or the combinations of it with other external parameters as inputs to develop models for different prediction time scales. There are 17 other external parameters, namely Bx, By, Bz, Bt, Vsw, N, Pd, E, T, beta, AE, SYM-H, kp, Dst, R0, Lm, and MLT, as listed in Section 2.1. The best combinations of input parameters for modeling are determined by the model’s PE values.

In Section 3.1, we use log10 (≥2 MeV electron fluxes) only or the combination of log10 (≥2 MeV electron fluxes) and other single external parameters as the inputs. In other sections, we use the combinations of log10 (≥2 MeV electron fluxes) with other external parameters as inputs, and the number of input parameters for models is controlled within five.

3.1 The selection of the best offset time

The length of the time series of model inputs is the offset time. For example, if the offset time is 4 days, the model will use the consecutive data of the last 4 days as inputs. In this study, the models are aimed at filling the 1-hour (12 data points), 3-hour (36 data points), 6-hour (72 data points), 12-hour (144 data points), and 1-day (288 data points) intervals.
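How the offset time and the prediction time scale translate into input and target lengths can be sketched as below. At 5-minute resolution there are 12 points per hour, so a 4-day offset gives 1152 input points and a 1-hour prediction gives 12 target points. The function name and the `stride` parameter are hypothetical; the text does not specify how windows are stepped through the data.

```python
POINTS_PER_HOUR = 12  # 5-minute resolution


def make_windows(series, offset_days, horizon_hours, stride=1):
    """Split a series into (input, target) pairs: `offset_days` of history
    followed by `horizon_hours` of values to predict."""
    n_in = offset_days * 24 * POINTS_PER_HOUR
    n_out = horizon_hours * POINTS_PER_HOUR
    samples = []
    for start in range(0, len(series) - n_in - n_out + 1, stride):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples


series = list(range(1152 + 12 + 2))  # toy series, just long enough for 3 windows
windows = make_windows(series, offset_days=4, horizon_hours=1)
print(len(windows), len(windows[0][0]), len(windows[0][1]))  # 3 1152 12
```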

The most suitable offset time is determined by the PE values. We use the prediction of the log10 (≥2 MeV electron fluxes) for the next day (the 1-day prediction with 288 data points) as an example. The log10 (≥2 MeV electron fluxes) is abbreviated as Flux. Three training processes are carried out for each input combination, and the average PE values of the three runs are used.

Figure 3 shows the PE values of the LSTM model (Fig. 3a) and transformer model (Fig. 3b) for the 1-day prediction with different input parameters and different offset times. The offset time ranges from 1 to 9 days and the input parameters are listed on the left of both panels. The colors in the panels represent PE values.

Figure 3

The PE values (color-coded) of the LSTM models (a) and the transformer models (b) for the 1-day prediction with different offset times and different input parameters.

The colors in Figure 3 show that, when only one external parameter is used as input, the most important external factors for the LSTM models are Vsw, N, and kp, while more external factors have an impact on the transformer models, such as Bz, Vsw, N, Pd, AE, SYM-H, kp, Dst, R0, Lm and MLT. In general, the best offset time of the models with solar wind parameters as inputs is shorter than that of the models with geomagnetic indices. The PE values of most of the models with various input parameters reach their peaks when the offset time is between 3 and 5 days, regardless of whether they are transformer or LSTM models. In addition, we tried some combinations of Flux with two or three external parameters as inputs, and the peak of the PE values also fell between 3 and 5 days. Moreover, the PE values of the models are not significantly different for offset times between 3 and 5 days. Therefore, the offset time for the predictions in the rest of this study is set to 4 days.

3.2 The LSTM models for predicting ≥2 MeV electron fluxes with different prediction time scales

We develop models for different prediction time scales, including the 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions. The PE values of the LSTM models only using log10 (≥2 MeV electron fluxes) as input for the 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions are 0.913, 0.801, 0.619, 0.421, and 0.349, respectively, and those of the Persistence models are 0.913, 0.798, 0.572, 0.207, and 0.201, respectively. The LSTM models only using log10 (≥2 MeV electron fluxes) as input perform as well as or better than the Persistence models at every prediction time scale.
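The text does not spell out how the Persistence baseline is built for multi-point predictions. Two common readings are sketched below as assumptions, not as the authors' definition: holding the last observed value across the whole prediction window, or repeating the most recent window of the same length (for the 1-day case, the previous day's 288 values). Both function names are hypothetical.

```python
def persistence_last_value(history, n_out):
    """Assumed variant 1: hold the last observed value for n_out steps."""
    return [history[-1]] * n_out


def persistence_previous_window(history, n_out):
    """Assumed variant 2: repeat the most recent n_out observed values."""
    return list(history[-n_out:])


history = [2.0, 2.5, 3.0, 3.5]
print(persistence_last_value(history, 3))       # [3.5, 3.5, 3.5]
print(persistence_previous_window(history, 3))  # [2.5, 3.0, 3.5]
```

Either baseline uses no external parameters, which is why it serves as a reference for the Flux-only LSTM models.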

Some combinations as inputs can improve model performances by comparison with the models only using log10 (≥2 MeV electron fluxes) as input. For the 1-hour and 3-hour predictions, Bt, Vsw, N, and Dst are the most important external parameters, because they have the highest frequency of occurrence among the input combinations ranked in the top 100 of PE values. For the 6-hour and 12-hour predictions, Bt, Vsw, N, Pd, AE, kp, Dst, Lm, and MLT can help improve PE values. The addition of Vsw, N, and kp improves the performances of the models for the 1-day prediction when only using log10 (≥2 MeV electron fluxes) as input. Note that Bt, Vsw, N, kp, and Dst have the greatest influence on the ≥2 MeV electron fluxes at the GEO orbit. These parameters are also often used by previous researchers.

The best input combinations for the 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions are (Flux, Dst), (Flux, Bt, Vsw), (Flux, N, Dst, Lm), (Flux, Vsw, N, Lm), and (Flux, Vsw, N), with PE values of 0.919, 0.811, 0.773, 0.554, and 0.490, respectively. Figure 4 shows the comparisons of ≥2 MeV electron fluxes between the observations from the GOES-10 satellite and the predictions of the LSTM models with the best combinations for different prediction time scales. The black dots in Figures 4a–4e represent the ≥2 MeV electron fluxes from the GOES-10 satellite in 2005. The red dots in Figures 4a–4e are the predictions of the LSTM models with (Flux, Dst), (Flux, Bt, Vsw), (Flux, N, Dst, Lm), (Flux, Vsw, N, Lm), and (Flux, Vsw, N) as inputs for 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions from top to bottom, respectively. The blue dashed lines in each panel indicate that ≥2 MeV electron fluxes are equal to 1157 cm−2 · s−1 · sr−1. The data in Figures 4a–4e are plotted in the flux-flux coordinates in Figures 4f–4j with black dots on their respective right sides to show the linear relationship of observations and model results. The blue lines, y = x, indicate the situation when the observations are completely consistent with the predictions.

Figure 4

The comparisons of ≥2 MeV electron fluxes between the observations from GOES-10 satellite (black dots) and the predictions of the LSTM models (red dots) (a) with (Flux, Dst) as inputs for 1-hour prediction, (b) with (Flux, Bt, Vsw) as inputs for 3-hour prediction, (c) with (Flux, N, Dst, Lm) as inputs for 6-hour prediction, (d) with (Flux, Vsw, N, Lm) as inputs for 12-hour prediction, (e) with (Flux, Vsw, N) as inputs for 1-day prediction, respectively.

As shown in Figure 4, the LC values of ≥2 MeV electron fluxes between the observations from the GOES-10 satellite and the predictions of the LSTM models with the best combinations as inputs are 0.953, 0.902, 0.879, 0.750, and 0.702 for different prediction time scales, respectively. The LSTM models perform worse as the prediction time scales increase. Due to their propensity to provide average values to assure better overall performance, the LSTM models’ capability to portray in detail the peaks and valleys decreases as prediction time scales increase.

3.3 The transformer models for predicting ≥2 MeV electron fluxes with different prediction time scales

Due to the versatility of the transformer network, it has been widely used in machine learning. In order to compare the performance of the transformer network with the LSTM network on processing time series, we also develop models to predict the ≥2 MeV electron fluxes by using the transformer method. The prediction time scales are the same as those in Section 3.2.

The PE values of the transformer models only using log10 (≥2 MeV electron fluxes) as input for the 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions are 0.931, 0.855, 0.791, 0.677, and 0.554, respectively, which are all higher than those of the LSTM models. In addition, the transformer models can better capture the effect of external parameters on ≥2 MeV electron fluxes. Bt, Vsw, N, Pd, AE, SYM-H, kp, Dst, Lm, and MLT have significant impacts on the PE values of the transformer models with different prediction time scales. These external parameters are all common parameters in previous prediction models.

The best combinations for 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions are (Flux, MLT), (Flux, Bt, AE, SYM-H), (Flux, N), (Flux, N, Dst, Lm), and (Flux, Pd, AE) with PE values of 0.940, 0.886, 0.828, 0.747, and 0.660, respectively. The transformer models perform better than the LSTM models at the same prediction time scales.

The comparisons of ≥2 MeV electron fluxes between the observations from the GOES-10 satellite and the predictions of the transformer models with the best combinations for different prediction time scales are shown in Figure 5. The format is the same as in Figure 4.

Figure 5

The comparisons of ≥2 MeV electron fluxes between the observations from GOES-10 satellite (black dots) and the predictions of the transformer models (red dots) (a) with (Flux, MLT) as inputs for 1-hour prediction, (b) with (Flux, Bt, AE, SYM-H) as inputs for 3-hour prediction, (c) with (Flux, N) as inputs for 6-hour prediction, (d) with (Flux, N, Dst, Lm) as inputs for 12-hour prediction, (e) with (Flux, Pd, AE) as inputs for 1-day prediction, respectively.

It is shown that the transformer models always perform better than the LSTM models in terms of the LC values and biases between observations from the GOES-10 satellite and forecast results, as well as the precision of peak and valley predictions. Therefore, the transformer models are used in the following study.

The PE values inevitably decrease as prediction time scales increase, since prediction over longer periods has always been a challenge. Additionally, the transformer models perform rather poorly in the predictions of low fluxes, especially when the ≥2 MeV electron fluxes are less than 1 cm−2 · s−1 · sr−1. The lower limit of the GOES satellites’ detectors prevents them from picking up ≥2 MeV electron fluxes below 0.133 cm−2 · s−1 · sr−1, and fluxes below this limit are consistently recorded as 0.133 cm−2 · s−1 · sr−1, which affects the forecast accuracy. Meanwhile, the transformer models require a large amount of data for training; however, samples with fluxes under 1 cm−2 · s−1 · sr−1 account for only 2.65% of the data set. These factors work together to cause poor performance at low fluxes. All data are used for the computations of PE values in this section, and PE values of the ≥2 MeV electron fluxes higher than 0.133 cm−2 · s−1 · sr−1 will be discussed in Section 4.

3.4 The comparisons with different models

Furthermore, we compare our models with several other models. We calculate the ≥2 MeV electron hourly fluences based on the sum of the 12 values of our transformer model for the 1-hour prediction and compute ≥2 MeV electron daily fluences based on the sum of the 288 values of our transformer model for the 1-day prediction. The PE and RMSE values for the ≥2 MeV electron hourly fluences of our transformer model for the 1-hour prediction, the Persistence model, the LSTM model by Wei et al. (2018), the MLP (multilayer perceptron) model by Son et al. (2022), and the MLP model by Shin et al. (2016) are listed in Table 1. The prediction models above all can provide the prediction of ≥2 MeV electron hourly fluence in the next hour. The PE and RMSE values for ≥2 MeV electron daily fluences of our transformer model, the Persistence model, the Geomagnetic pulsation model by He et al. (2013), the EOF model by Li et al. (2017), and the EMD (empirical mode decomposition) model by Qian et al. (2020) are listed in Table 2.
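The step from predicted 5-minute fluxes to hourly or daily fluences can be sketched as follows. The models output log10 fluxes, so each value is converted back to a flux before summing. Multiplying each 5-minute flux by 300 s to obtain fluence units is an assumption made here for unit consistency (the text only states that the values are summed), and the function name is hypothetical.

```python
import math

STEP_SECONDS = 300  # one 5-minute sample


def fluence_from_log_flux(log10_fluxes, step_seconds=STEP_SECONDS):
    """Integrate 5-minute fluxes (given as log10 values, cm^-2 s^-1 sr^-1)
    into a fluence (cm^-2 sr^-1) over the covered interval."""
    return sum(10 ** v * step_seconds for v in log10_fluxes)


# A constant flux at the alert level integrates to about 1e8 over 288 points.
level = math.log10(1e8 / 86400)
daily = fluence_from_log_flux([level] * 288)    # 1-day prediction: 288 values
hourly = fluence_from_log_flux([level] * 12)    # 1-hour prediction: 12 values
print(f"{daily:.3e}")  # 1.000e+08
```

Under this assumption the result is consistent with the threshold used earlier: a constant flux of about 1157 cm−2 · s−1 · sr−1 over 288 samples gives a daily fluence of 10^8 cm−2 · sr−1 · day−1.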

Table 1

The comparisons of prediction efficiencies of ≥2 MeV electron hourly fluences of different models.

Table 2

The comparisons of prediction efficiencies of ≥2 MeV electron daily fluences of different models.

There are currently few prediction models for ≥2 MeV electron hourly fluences, and most of them provide hourly fluences for the next 24 h. Only a few models report PE values for the next 1 h, and the test data for most models are not from 2005. Because Sun et al. (2023) noted that PE values show a solar cycle dependence, we cannot directly compare our model with other models. However, Figure 4 of Sun et al. (2023) shows that the PE values of most models in 2005 at 135°W are lower than those in 2008, which suggests that our model performs better than the LSTM model by Wei et al. (2018). Wei et al. (2018) also pointed out that the PE value of their LSTM model is improved significantly compared to some earlier models. Moreover, most models show relatively high PE values for the next 1 h, and the PE value of our model is higher than that of the Persistence model. Because our model provides 12 data points for the next 1 h at a time, instead of a single value of the ≥2 MeV electron hourly fluence as in previous models, it has an advantage in terms of time resolution.

The PE values for predicting ≥2 MeV electron daily fluences on the next day depend on the solar cycle and differ from year to year. We therefore choose the models whose testing years include 2005, as shown in Table 2. For the comparison of ≥2 MeV electron daily fluences, the PE value of our transformer model in 2005 is higher than those of the previous models. Moreover, our model can provide 288 data points for the next day at a time, which describes the distribution of ≥2 MeV electron fluxes in more detail. As shown in Figures 5e and 5j, our transformer models perform relatively well during the high-flux periods. Because the ≥2 MeV electron daily fluences here are calculated from the 1-day prediction model, the PE values will increase with shorter prediction time scales. The PE value of ≥2 MeV electron daily fluences can reach 0.975 based on the 1-hour prediction model.

Our transformer model performs better than previous prediction models in predicting both ≥2 MeV electron daily fluences and ≥2 MeV electron hourly fluences. In the future, the described methodology can be applied to train new models to predict ≥2 MeV electron fluxes at different prediction time scales or to fill in data gaps.

4 Discussion of the model performance

In this section, we discuss the distributions of the differences between predictions and observations, and the overall performance of the models.

Figure 6a shows the PE values of the transformer models for each month in 2005. The transformer models with different prediction time scales show the same trend in monthly PE values through 2005, and the amplitude of the variations becomes larger as the prediction time scale increases. The numbers of relativistic electron enhancement events in each month are displayed in Figure 6b. Months with a high number of relativistic electron enhancement events usually have PE values lower than the average monthly PE value.

Figure 6

(a) The PE values of the transformer models with different prediction time scales in each month in 2005, (b) the numbers of the relativistic electron enhancement events, (c) the PE values of the transformer model for the 1-hour prediction in different flux ranges, and (d) the data number in different flux ranges.

The PE values of the transformer model for the 1-hour prediction and the data number in different flux ranges are shown in Figures 6c and 6d, respectively. The relationships between PE values from other models with different prediction scales and ≥2 MeV electron fluxes are similar to those from the 1-hour prediction model. It can be seen in Figure 6c that PE values are lower at low fluxes, especially when ≥2 MeV electron fluxes are below 10 cm−2 · s−1 · sr−1. It can be concluded that models do not perform well during periods of low fluxes or relativistic electron enhancement events.
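A breakdown of PE by flux range can be sketched as follows. This is only an illustration under stated assumptions: PE is taken to be 1 minus the ratio of the summed squared errors to the summed squared deviations of the observations from their mean, evaluated on log10 fluxes, and the flux ranges are taken to be decades of the observed flux. The binning actually used for Figure 6c may differ, and all names and sample values are hypothetical.

```python
import math


def prediction_efficiency(obs, pred):
    """PE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pe_by_flux_decade(log_obs, log_pred):
    """Bin samples by the decade of the observed flux; return {decade: (count, PE or None)}."""
    bins = {}
    for o, p in zip(log_obs, log_pred):
        bins.setdefault(math.floor(o), []).append((o, p))
    result = {}
    for decade in sorted(bins):
        obs = [o for o, _ in bins[decade]]
        pred = [p for _, p in bins[decade]]
        # PE is undefined when all observations in a bin are identical.
        pe = prediction_efficiency(obs, pred) if len(set(obs)) > 1 else None
        result[decade] = (len(obs), pe)
    return result


# Hypothetical log10 fluxes: two samples below 10 and two between 100 and 1000.
log_obs = [0.2, 0.8, 2.1, 2.9]
log_pred = [0.5, 0.5, 2.1, 2.9]
per_decade = pe_by_flux_decade(log_obs, log_pred)
```

In this toy case the low-flux bin is predicted no better than its own mean (PE near 0), while the high-flux bin is reproduced exactly (PE of 1). Reporting the sample count next to each PE value, as in Figure 6d, helps to judge how representative each bin is.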

4.1 Discussion of the model performances during the relativistic electron enhancement events

In this section, we discuss the model performances during relativistic electron enhancement events. The relativistic electron enhancement event during 8–16 February 2005, which lasted for 9 days in total, is taken as an example.

Figures 7a–7f show the observations from the GOES-10 satellite (black dots) and the predictions of the 1-hour transformer models (red dots) with different external parameters as inputs from 6 February to 17 February 2005. The combinations as inputs are Flux, (Flux, Bt), (Flux, N), (Flux, Pd), (Flux, Dst), and (Flux, MLT) from top to bottom, respectively. The external parameters used in modeling are shown in Figures 7g and 7h.

Figure 7

(a)–(f) The comparisons of the prediction results of the 1-hour transformer models with different inputs (red dots) and observations from the GOES-10 satellite (black dots) during the relativistic electron enhancement event, and (g)–(h) Bt, N, Pd, and Dst between 6 and 17 February 2005.

Compared to the model using only log10 (≥2 MeV electron fluxes) as input, the addition of appropriate external parameters helps to improve the prediction efficiencies during the relativistic electron enhancement event. The PE values of the models with Flux, (Flux, Bt), (Flux, N), (Flux, Pd), (Flux, Dst), (Flux, AE), (Flux, R0), (Flux, Lm), and (Flux, MLT) as inputs are 0.950, 0.965, 0.966, 0.965, 0.965, 0.961, 0.961, 0.961, and 0.967, respectively. These external parameters reflect changes in the solar wind and the geomagnetic field and should be used as key indicators of relativistic electron enhancement events, especially Bt, N, Pd, and Dst.

We calculate the ≥2 MeV electron daily fluences based on our transformer model for the 1-day prediction with the best combination as input. There were 35 relativistic electron enhancement events in 2005. The PE values of the first day, the second day, and the last day of the relativistic electron enhancement events are −0.946, 0.754, and 0.576, respectively. The negative PE value for the first days shows that the onset of a relativistic electron enhancement event is the most difficult part to predict.
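The selection of the first, second, and last days of each event can be sketched as follows. This is a minimal illustration, assuming that an event is a run of consecutive days whose ≥2 MeV electron daily fluence exceeds the alert threshold of 10^8 cm−2 · sr−1 · day−1 given in the Introduction. How single-day events are counted in this study is not stated, so the handling below is an assumption, and the function names and sample values are hypothetical.

```python
ALERT_THRESHOLD = 1e8  # cm^-2 sr^-1 day^-1, alert level quoted in the Introduction


def find_events(daily_fluences, threshold=ALERT_THRESHOLD):
    """Return (start, end) index pairs for runs of consecutive days above the threshold."""
    events = []
    start = None
    for i, fluence in enumerate(daily_fluences):
        if fluence > threshold and start is None:
            start = i                       # an event begins
        elif fluence <= threshold and start is not None:
            events.append((start, i - 1))   # the event ended on the previous day
            start = None
    if start is not None:                   # an event is still running at the end of the series
        events.append((start, len(daily_fluences) - 1))
    return events


def event_day_indices(events):
    """Indices of the first, second, and last day of each event.

    Assumption: a single-day event has no second day, and its first and last day coincide.
    """
    first = [s for s, _ in events]
    second = [s + 1 for s, e in events if e - s >= 1]
    last = [e for _, e in events]
    return first, second, last


# Hypothetical daily fluences containing one 3-day event and one 1-day event.
fluences = [5e7, 2e8, 3e8, 4e8, 9e7, 1.5e8, 2e7]
events = find_events(fluences)
first, second, last = event_day_indices(events)
```

The PE for each group would then be evaluated over the observed and predicted daily fluences at the selected indices.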

4.2 Discussion of the model performances during low-flux periods

In this section, we discuss the model performance during low-flux periods. There is a protracted low-flux period between 22 October and 27 October 2005, and we used this period as an example to illustrate the improvements of the model.

Adding suitable external parameters at the prediction moment is an effective way of improving the prediction efficiencies during low-flux periods, as shown in Figure A.1 in Supplementary material. Using Pd, R0, AE, SYM-H, Dst, Lm, and MLT as inputs helps to provide more accurate predictions during low-flux periods than the model using only log10 (≥2 MeV electron fluxes) as input, and a comparison of Figure A.1 with Figure A.2 in Supplementary material shows that adding the parameters at the prediction moment results in a higher PE value. The models in Figure A.1 (Supplementary material) did not use the parameters at the prediction moment when modeling, while the models in Figure A.2 (Supplementary material) did.

The low ≥2 MeV electron fluxes are due to the movement of the magnetopause. R0, the magnetopause subsolar distance, is therefore useful for low-flux predictions. The solar wind dynamic pressure (Pd) is one of the main factors causing the compression of the magnetopause. The compression of the magnetopause can cause obvious geomagnetic field disturbances, which are reflected by AE, SYM-H, and Dst. The parameters related to the magnetopause and the geomagnetic field can thus improve the models’ performances during low-flux periods.
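The dependence of R0 on Pd can be illustrated with a short sketch. It evaluates the widely used empirical subsolar standoff distance of Shue et al. (1998), R0 = {10.22 + 1.29 tanh[0.184 (Bz + 8.14)]} Pd^(−1/6.6), with Bz in nT, Pd in nPa, and R0 in Earth radii. Whether this particular magnetopause model is the one used to derive R0 in this study is an assumption made here for illustration only, and the function name and sample inputs are hypothetical.

```python
import math


def shue1998_r0(bz_nt, pd_npa):
    """Subsolar magnetopause distance (Earth radii) from the Shue et al. (1998) empirical form.

    bz_nt: IMF Bz in nT; pd_npa: solar wind dynamic pressure in nPa (must be positive).
    Assumption: this model is used only as an example of how R0 depends on Pd and Bz.
    """
    return (10.22 + 1.29 * math.tanh(0.184 * (bz_nt + 8.14))) * pd_npa ** (-1.0 / 6.6)


# Hypothetical inputs: quiet conditions versus a strong pressure enhancement.
r0_quiet = shue1998_r0(bz_nt=0.0, pd_npa=2.0)
r0_compressed = shue1998_r0(bz_nt=-10.0, pd_npa=20.0)

# GEO orbit lies at about 6.6 Earth radii, so a standoff distance near or below
# this value means the magnetopause has moved close to or inside GEO.
crosses_geo = r0_compressed < 6.6
```

For the quiet-time inputs, the standoff distance stays several Earth radii outside GEO, whereas the compressed case brings it close to GEO, which is consistent with the loss of ≥2 MeV electrons described above.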

The ≥2 MeV electron fluxes are not physically accurate when they are equal to 0.133 cm−2 · s−1 · sr−1. When these data are removed, the PE values rise. After excluding the ≥2 MeV electron fluxes equal to 0.133 cm−2 · s−1 · sr−1, the PE values for the 5-minute (discussed later), 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions increase to 0.987, 0.958, 0.900, 0.838, 0.759, and 0.667 from 0.974, 0.940, 0.886, 0.828, 0.747, and 0.660, respectively.
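The effect of excluding these values can be reproduced with a short sketch. It is an illustration only: it assumes that PE is evaluated on log10 fluxes using the usual definition (1 minus the ratio of the summed squared errors to the summed squared deviations of the observations from their mean), and that samples are excluded according to the observed flux. The function names, tolerance, and sample values are hypothetical.

```python
import math

FLOOR_FLUX = 0.133  # cm^-2 s^-1 sr^-1, the value flagged in the text as not physically accurate


def prediction_efficiency(obs, pred):
    """PE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def log10_pairs_without_floor(obs_flux, pred_flux, floor=FLOOR_FLUX, tol=1e-9):
    """Drop samples whose observed flux equals the floor value, then return log10 pairs."""
    kept = [(o, p) for o, p in zip(obs_flux, pred_flux) if abs(o - floor) > tol]
    return [math.log10(o) for o, _ in kept], [math.log10(p) for _, p in kept]


# Hypothetical fluxes: the model cannot reproduce the two floor values.
obs_flux = [0.133, 0.133, 10.0, 100.0, 1000.0]
pred_flux = [1.0, 2.0, 10.0, 100.0, 1000.0]

pe_all = prediction_efficiency(
    [math.log10(v) for v in obs_flux], [math.log10(v) for v in pred_flux]
)
log_obs, log_pred = log10_pairs_without_floor(obs_flux, pred_flux)
pe_clean = prediction_efficiency(log_obs, log_pred)
```

In this toy example the PE rises once the floor values are removed, in the same direction as the change reported above.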

4.3 Discussion of the model performances for the 5-minute prediction

In addition to the 1-hour, 3-hour, 6-hour, 12-hour, and 1-day prediction models, we also developed 5-minute prediction models using the LSTM and transformer networks, each of which gives only one forecast value for the next five minutes.

We first select the best offset time for the 5-minute prediction. The performances of the Persistence model and the linear model are adequate for the 5-minute predictions, with PE values of 0.966 and 0.968, respectively. The Persistence model uses the value at the current moment as the prediction at the following time step, and the linear model uses the straight line defined by the previous two data points to extrapolate the next data point. Intervals of 10 minutes and 5 minutes are tested as the offset time to evaluate the model performance, and the model with 10 minutes as the offset time performs better than the model with 5 minutes as the offset time. The offset time for the 5-minute predictions in this study is therefore 10 minutes.
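The two baseline models can be sketched as follows. This is a minimal illustration of our reading of the description above: the Persistence model repeats the latest value, and the linear model extends the straight line through the two most recent values by one time step, that is, x(t+1) = 2x(t) − x(t−1). The function names and the sample series are hypothetical.

```python
def persistence_forecast(series):
    """Predict each next value as the current value.

    Returns predictions aligned with the targets series[1:].
    """
    return list(series[:-1])


def linear_forecast(series):
    """Extrapolate the line through the two latest points: x[t+1] = 2*x[t] - x[t-1].

    Returns predictions aligned with the targets series[2:].
    """
    return [2 * series[i] - series[i - 1] for i in range(1, len(series) - 1)]


# Hypothetical log10 flux series at a 5-minute cadence.
series = [1.0, 2.0, 3.0, 5.0]
persistence_pred = persistence_forecast(series)  # compared with [2.0, 3.0, 5.0]
linear_pred = linear_forecast(series)            # compared with [3.0, 5.0]
```

Both baselines need no training, which makes them convenient reference points for the PE values of the LSTM and transformer models at this short time scale.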

For predicting ≥2 MeV electron fluxes in the next five minutes in 2005, the PE and RMSE values of the LSTM model using only log10 (≥2 MeV electron fluxes) as input are 0.968 and 0.2053, respectively. We tested the addition of Vsw, kp, Lm, or MLT and found that those parameters improve model performance. The model with (Flux, Lm) as inputs performs best, with PE and RMSE values of 0.970 and 0.2001, respectively. The model with (Flux, MLT) ranks second, with PE and RMSE values of 0.969 and 0.2025, respectively. Lm and MLT, which are associated with the geomagnetic field structure and are not symmetrical about local time, are more significant for the 5-minute prediction model than solar wind parameters and geomagnetic indices. If only one external parameter is added to the inputs of the 5-minute prediction model based on the LSTM method, Lm or MLT is recommended.

For the 5-minute prediction of ≥2 MeV electron fluxes in 2005, the PE and RMSE values of the transformer model using only log10 (≥2 MeV electron fluxes) as input are 0.972 and 0.1912, respectively. Model performances are improved when external parameters are added. The model with (Flux, Lm) as inputs performs best, with PE and RMSE values of 0.974 and 0.1863, respectively, and the ranking is the same as for the LSTM model. This demonstrates again that Lm and MLT have more influence on the 5-minute prediction of ≥2 MeV electron fluxes than solar wind parameters and geomagnetic indices.

The comparisons of ≥2 MeV electron fluxes between the observations from the GOES-10 satellite and the 5-minute predictions of the LSTM and transformer models with different combinations as inputs are shown in Figure A.2 in Supplementary material. For the 5-minute prediction, the predictions of the LSTM models are in good agreement with the observations from the GOES-10 satellite.

For the 5-minute predictions, the LSTM and the transformer models with (Flux, Lm) as inputs both performed well, with PE values of 0.970 and 0.974 in 2005, respectively. The distributions of ≥2 MeV electron fluxes are directly affected by changes in the configuration of the geomagnetic field, which also affect the shape of the outer radiation belt. Lm and MLT, which are associated with the geomagnetic field structure, are therefore more important for the 5-minute prediction than solar wind parameters and geomagnetic indices.

5 Conclusions

The variations in ≥2 MeV electron flux at GEO orbit are the result of the combined contribution of space environment parameters. In this study, we applied machine learning to predict ≥2 MeV electron fluxes at GEO orbit because machine learning can effectively deal with massive data samples and solve nonlinear problems. We developed models to predict ≥2 MeV electron fluxes with various prediction time scales using the transformer and LSTM networks. The main conclusions are as follows:

Based on the performances of models with different combinations as inputs, the best offset time is determined as four days for different prediction time scales (1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions). When only using one external parameter as input, the most important external factors for the LSTM models are found to be Vsw, N, and kp, while the addition of more external factors also improves transformer model predictions, such as Bz, Vsw, N, Pd, AE, SYM-H, kp, Dst, R0, Lm, and MLT.

For the 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions, the transformer models performed better than the LSTM models with different prediction time scales. Meanwhile, the predictions of the transformer models also showed more detailed temporal variations in fluxes than the LSTM models.

The best combinations of the LSTM models for 1-hour, 3-hour, 6-hour, 12-hour, and 1-day predictions are found to be (Flux, Dst), (Flux, Bt, Vsw), (Flux, N, Dst, Lm), (Flux, Vsw, N, Lm), and (Flux, Vsw, N) with PE values of 0.919, 0.811, 0.773, 0.554, and 0.490, respectively, and those of the transformer models are (Flux, MLT), (Flux, Bt, AE, SYM-H), (Flux, N), (Flux, N, Dst, Lm), and (Flux, Pd, AE) with PE values of 0.940, 0.886, 0.828, 0.747, and 0.660, respectively.

For the comparison of ≥2 MeV electron daily fluences, the PE value of our transformer model in 2005 is higher than those of other previous models. Moreover, our model can provide 288 data points for the next day at a time, which will describe the distribution of ≥2 MeV electron fluxes in more detail. In addition, our transformer model performs relatively well during the high-flux periods. The PE value of ≥2 MeV electron daily fluences can reach 0.965 based on the 1-hour prediction model.

We discussed the transformer model performances during the relativistic electron enhancement events and low-flux periods. The model performances during relativistic electron enhancement events can be improved by adding appropriate external parameters, such as Bt, N, Pd, R0, AE, SYM-H, Lm, and MLT. For the low-flux periods, adding one or more of the parameters Pd, R0, AE, SYM-H, Dst, Lm, or MLT to the inputs improves the accuracy of the predictions.

Based on the evaluation results of our models above, it can be concluded that our models are suitable for filling the data gaps of ≥2 MeV electron fluxes.

Acknowledgments

This work was supported by grants from Project U2106201 of the National Natural Science Foundation of China (NSFC). The data used throughout this study are courtesy of NOAA/SWPC science teams. Thanks to the NOAA National Centers for Environmental Information (NCEI) for providing processed GOES series satellite data and OMNI for providing the solar wind parameters and geomagnetic disturbance indices. The authors also thank the IRBEM (international radiation environment modeling software library) for providing the internal geomagnetic field model and the external geomagnetic field model. Thanks to the China Scholarship Council (CSC) for providing the corresponding author the opportunity in GFZ section 2.7. We would like to thank Melanie Burns Allison for the paper polishing. Thanks to all the people in GFZ section 2.7 for the useful discussion, especially Stefano Bianco and Maximilian Pfitzer. The editor thanks Spiridon Kasapis, Richard Boynton and an anonymous reviewer for their assistance in evaluating this paper.

Supplementary material

Supporting Information for “A Modeling Study of ≥2 MeV Electron Fluxes at Different Prediction Time Scales Based on Both Networks”

References

Cite this article as: Sun X, Wang D, Drozdov A, Lin R, Smirnov A, et al. 2024. A modeling study of ≥2 MeV electron fluxes in GEO at different prediction time scales based on LSTM and transformer networks. J. Space Weather Space Clim. 14, 25. https://doi.org/10.1051/swsc/2024021.

All Figures

Figure 1

The distribution of the missing data of ≥2 MeV electron fluxes from the GOES-10 satellite.

Figure 2

(a) The ≥2 MeV electron fluxes from the GOES-10 satellite, (b) the longitudes of GOES-10 satellite, (c) the number of the relativistic electron enhancement events (red line), and the number of days with relativistic electron enhancement events (black line), and (d) the percentage of missing data of the GOES-10 satellite from 1999 to 2010.

Figure 3

The PE values (color-coded) of the LSTM models (a) and the transformer models (b) for the 1-day prediction with different offset times and different input parameters.

Figure 4

The comparisons of ≥2 MeV electron fluxes between the observations from GOES-10 satellite (black dots) and the predictions of the LSTM models (red dots) (a) with (Flux, Dst) as inputs for 1-hour prediction, (b) with (Flux, Bt, Vsw) as inputs for 3-hour prediction, (c) with (Flux, N, Dst, Lm) as inputs for 6-hour prediction, (d) with (Flux, Vsw, N, Lm) as inputs for 12-hour prediction, (e) with (Flux, Vsw, N) as inputs for 1-day prediction, respectively.

Figure 5

The comparisons of ≥2 MeV electron fluxes between the observations from GOES-10 satellite (black dots) and the predictions of the transformer models (red dots) (a) with (Flux, MLT) as inputs for 1-hour prediction, (b) with (Flux, Bt, AE, SYM-H) as inputs for 3-hour prediction, (c) with (Flux, N) as inputs for 6-hour prediction, (d) with (Flux, N, Dst, Lm) as inputs for 12-hour prediction, (e) with (Flux, Pd, AE) as inputs for 1-day prediction, respectively.
