J. Space Weather Space Clim., Volume 13, 2023, Article Number 26, 21 pages
DOI: https://doi.org/10.1051/swsc/2023026
Published online: 20 October 2023
Erratum: https://doi.org/10.1051/swsc/2023031

© M. Nedal et al., Published by EDP Sciences 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

Solar Energetic Protons (SEP) are high-energy particles believed to originate from particle acceleration in the solar corona during coronal mass ejections (CMEs) and solar flares (Aschwanden, 2002; Klein & Dalla, 2017; Lin, 2005, 2011; Kahler et al., 2017). They are typically characterized by their high energy levels – with some particles reaching the relativistic GeV/nucleon range – and by their ability to penetrate spacecraft shielding, causing radiation damage (Reames, 2013; Desai & Giacalone, 2016). The fluence and energy spectrum of SEP are influenced by several factors, including the strength of the solar flare or CME that produced them and the conditions of the interplanetary environment (Kahler et al., 1984, 1987; Debrunner et al., 1988; Miteva et al., 2013; Trottet et al., 2015; Dierckxsens et al., 2015; Le & Zhang, 2017; Gopalswamy et al., 2017). SEP exhibit a strong association with the solar cycle, with the frequency and flux of SEP events peaking during the maximum phase of the solar cycle (Reames, 2013). This is thought to be due to the increased activity of the Sun during this phase, which leads to more frequent and powerful flares and CMEs. Previous studies have shown a relationship between the occurrence frequency of SEP and the sunspot number (SN; Nymmik, 2007; Richardson et al., 2016). However, the exact relationship between the solar cycle and SEP is complex and not fully understood. Hence, more work is needed to better understand this connection, as previous studies have reported intense SEP events during relatively weak solar activity (Cohen & Mewaldt, 2018; Ramstad et al., 2018).

SEP have been a subject of interest and research in heliophysics for decades. It is hypothesized that shock waves generated in the corona can lead to an early acceleration of particles. However, SEP have sufficient energy to propagate on their own along the interplanetary magnetic field (IMF), and therefore the expanding CME is not necessary for their transport (Reames, 2000; Kóta et al., 2005; Kozarev et al., 2019, 2022). While this theory has gained acceptance, there is an ongoing debate among scientists over the specific mechanisms and conditions responsible for SEP production and acceleration. The creation, acceleration, and transport mechanisms of SEP are complex and involve a combination of magnetic reconnection, shock acceleration, and wave-particle interactions (Li et al., 2003, 2012; Ng et al., 2012). The specific mechanisms responsible for SEP production and acceleration can vary depending on the type and strength of the solar event that triggered them. Further research is imperative to better understand the processes involved in the production and transport of SEP in the heliosphere. This will facilitate the development of more precise models that help minimize the impact of SEP on astronauts and space-based assets.

Several models are available, or under development, for forecasting SEP; they use diverse approaches and serve different objectives. These models comprise computationally complex physics-based models, quick and simple empirical models, Machine Learning (ML)-based models, and hybrid models that combine different approaches, and they produce different types of outputs, including deterministic, probabilistic, categorical, and binary. Deterministic models always generate the same output without any randomness or stochastic components, such as predicting the SEP flux at a specific moment or the arrival time of SEP. Probabilistic models, on the other hand, provide a probability value that reflects the likelihood of a SEP event occurring. However, replicating SEP fluxes at a specific time is still a significant challenge for current models.

An excellent review of SEP models and predictive efforts was recently published by Whitman et al. (2022), which summarizes the majority of the existing models. For instance, Papaioannou et al. (2022) introduced the Probabilistic Solar Particle Event Forecasting (PROSPER) model, which is incorporated into the Advanced Solar Particle Event Casting System (ASPECS)1. The PROSPER model utilizes a Bayesian approach and a data-driven methodology to probabilistically predict SEP events for three integral energy channels: >10, >30, and >100 MeV. The model’s validation results indicate that the solar flare and CME modules have hit rates of 90% and 100%, respectively, while the combined flare and CME module has a hit rate of 100%. Bruno & Richardson (2021) developed an empirical model to predict the peak intensity and spectra of SEP at 1 AU between 10 and 130 MeV, using data from multiple spacecraft. The model was tested on 20 SEP events and shows good agreement with observed values; the spatial distribution of SEP intensities was reconstructed successfully, and a correlation was found between SEP intensities and CME speed. Hu et al. (2017) extended the Particle Acceleration and Transport in the Heliosphere (PATH) model to study particle acceleration and transport at CME-driven shocks. They showed that the model can be used to obtain simultaneous calculations of SEP characteristics such as time-intensity profiles, instantaneous particle spectra, and particle pitch angle distributions at multiple heliospheric locations. Overall, the results closely resemble those observed in situ near Earth, and the model also yields results at other locations of interest, such as Mars, making it of particular interest to Mars missions. SPREAdFAST (Kozarev et al., 2017, 2022) is a physics-based, data-driven framework that utilizes EUV observations and models to simulate SEP fluxes at 1 AU and to estimate energetic particle acceleration and transport to various locations in the inner heliosphere. It generates time-dependent histograms and movies and distributes them through an online catalog. The accuracy and efficiency of the model were encouraging, but the highest energy fluxes showed disagreement with in situ observations by the SOHO/ERNE instrument. Nonetheless, the framework has great potential for space weather science and forecasting.

Aminalragia-Giamini et al. (2021) used neural networks to provide probabilities for the occurrence of SEP based on soft X-ray data from 1988 to 2013. They obtained >85% for correct SEP occurrence predictions and >92% for correct no-SEP predictions. Lavasa et al. (2021) described a consistent approach to making a binary prediction of SEP events using ML and conventional statistical techniques. The study evaluated various ML models and concluded that random forests could be the best approach for an optimal sample comprising both flares and CMEs. The most important features for identifying SEP were found to be the CME speed, width, and flare soft X-ray fluence. Kasapis et al. (2022) employed ML techniques to anticipate the occurrence of a SEP event in an active region that generates flares. They utilized the Space-Weather MDI Active Region Patches (SMARP) dataset, which comprises observations of solar magnetograms between June 1996 and August 2010. The SMARP dataset had a success rate of 72% in accurately predicting whether an active region that produces a flare would result in a SEP event. Moreover, it provided a competitive lead time of 55.3 min in forecasting SEP events.

Engell et al. (2017) introduced the Space Radiation Intelligence System (SPRINTS), a technology that uses pre- and post-event data to forecast solar-driven events such as SEP. It integrates automatic detections and ML to produce forecasts. Results show that SPRINTS can predict SEP with a 56% probability of detection and a 34% false alarm rate. In addition, the HESPERIA REleASE tools provide real-time predictions of the proton flux at the first Lagrange point (L1), using near-relativistic electrons as a warning of the later arrival of protons, and have been put into operation (Malandraki & Crosby, 2018). Historical data analysis indicates high prediction accuracy, with a low false alarm rate of approximately 30% and a high probability of detection of 63% (Malandraki & Crosby, 2018).

Forecasting SEP is a critical task that serves operational needs and provides insight into the broader field of space weather science and heliophysics. As emphasized in previous works, a high-precision forecasting model is urgently required to predict SEP flux within a period of time, given the risks associated with these events. This highlights the critical requirement for a dependable forecasting system that can mitigate the risks associated with SEP.

Scientists have been using physics-based and empirical models for decades to forecast SEP. However, these models have certain limitations. Physics-based models require accurate input data and sound underlying physical assumptions; in addition, the complexity of the physics involved and incorrect parameters may introduce uncertainties that can lead to inaccurate predictions. Empirical models, on the other hand, rely on historical data to make predictions. While they can be accurate at times, they may be unable to account for changes in the physical conditions related to the acceleration and propagation of SEP, which can influence prediction accuracy. ML models provide a different approach to SEP forecasting. These models can analyze vast amounts of data and learn patterns and connections that may not be obvious to experts. Additionally, ML models can adapt to changes in the underlying physical conditions, resulting in more accurate predictions as more data are collected; they also provide relatively rapid forecasts, which allows for incorporation into a real-time forecasting workflow.

In the upcoming sections, we will explore the limitations in accuracy that arise from dealing with an imbalanced dataset and low-resolution data. Specifically, the presence of intrinsic outliers in the time series data pertaining to SEP flux poses a significant challenge in modeling. These outliers correspond to occurrences of SEP events and, consequently, have an impact on the accuracy of predictions. Notably, they often lead to an underestimation of the SEP fluxes, primarily due to the predominance of relatively low values throughout the majority of the time interval.

In this paper, we present advanced deep-learning models to forecast the daily integral flux of SEP over a 3-day forecasting window using bidirectional long short-term memory (BiLSTM) neural networks, for three energy channels (>10, >30, and >60 MeV). Our models can forecast the time-dependent development of SEP events in different energy domains, which can be used to model space radiation profiles using frameworks such as BRYNTRN (Wilson et al., 1988) and GEANT4 (Truscott et al., 2000). We describe the data selection and pre-processing in Section 2. We present an overview of the analysis methods and the models implemented in Section 3. Then we show the forecasting results in Section 4. Finally, the summary and implications are presented in Section 5.

2 Data preparation

In this section, we describe the physical quantities, the types of inputs and their sources, as well as the outputs we are forecasting. Some of the technical terms used in this study are explained further in Appendices A–E.

In order to capture the variability of solar activity, which modulates the SEP flux, we selected input physical quantities that describe both the interplanetary medium and solar activity. These input features can be categorized into two groups: remote signatures and in-situ measurements.

The remote signatures consist of the F10.7 index, as well as the long-wavelength (XL) and short-wavelength (XS) X-ray fluxes. The F10.7 index represents the flux of solar radio emission at a wavelength of 10.7 cm, measured in solar flux units (sfu). To obtain the X-ray fluxes, we utilized 1- and 5-minute averaged data from the Geostationary Operational Environmental Satellite (GOES) database2, specifically at long wavelengths (1–8 Å) and short wavelengths (0.5–4.0 Å).

The in-situ measurements encompass the near-Earth solar wind magnetic field and plasma parameters. These include the solar wind speed (in km s−1), the average interplanetary magnetic field (IMF) strength (in nT), and the integral SEP fluxes at three energy channels, >10, >30, and >60 MeV, which correspond to the GOES channels (in cm−2 s−1 sr−1). These SEP fluxes were obtained from multiple spacecraft stationed at L1 throughout the study period. In particular, the IMF and plasma data in the OMNI database are obtained from the IMP, Wind, and ACE missions, while the energetic particle fluxes are obtained from the IMP and GOES spacecraft3.

To ensure a comprehensive dataset, we acquired hourly-averaged data covering a timeframe from December 1976 to July 2019, which spans the past four solar cycles. These data were sourced from the Space Physics Data Facility (SPDF) OMNIWeb database4, hosted by the Goddard Space Flight Center. This database provides a wealth of information, including integral proton fluxes, as well as an extensive range of solar wind plasma and magnetic field parameters. Lastly, the daily data on sunspot numbers were obtained from the Sunspot Index and Long-term Solar Observations (SILSO) archive5, maintained by the World Data Center.

Figure 1 shows the time series data of all features. The top three panels show the logarithms of the SEP integral flux at the three energy channels (log_PF10, log_PF30, and log_PF60), followed by the sunspot number, the F10.7 index (F10_idx), the logarithms of the X-ray fluxes (log_Xs and log_Xl), the solar wind speed (Vsw), and the average magnitude of the IMF (avg_IMF). Throughout this paper, we adopt the convention that “log” refers to the common logarithm with base 10. The gray shades mark the timespans of the solar cycles. The blue, orange, and gold colors refer to the training, validation, and test sets, respectively. The data split method is explained shortly.

Figure 1. Data splitting for all input features, showing the training, validation, and testing sets. Daily data from 1976-12-25 00:00 to 2019-07-30 00:00. The gray shading labels the solar cycles from SC21 to SC24.

Since the input SEP data have been compiled from various spacecraft, they may contain artifacts even after processing. In particular, there are occasional jumps in the background level. There are also several day-long gaps in the OMNI solar wind parameters from the early 1980s to the mid-1990s, when only IMP 8 data were available and this spacecraft spent part of each orbit in the magnetosphere. We are reasonably confident that these issues do not significantly influence the overall analysis.

In deep learning applications, the dataset is split into three sets: the training set, the validation set, and the test set. The training set is usually the largest chunk of data and is used to fit the model. The validation set is a smaller chunk of data used to fine-tune the model and evaluate its accuracy to ensure it is unbiased. The test set is the out-of-sample data used exclusively to assess the final model’s performance on unseen data (Ripley, 1996).

After inspecting the correlation between the solar wind indices and the SEP integral fluxes in the OMNIWeb database, we chose the features most strongly correlated with the SEP flux. The correlations were computed between the SEP fluxes and the individual parameters. Hence, we took as input parameters to our model only the time series of the logarithms of the proton integral flux at three energy channels (>10, >30, and >60 MeV), the logarithms of the X-ray fluxes, the F10.7 index, the sunspot number, the solar wind speed, and the average strength of the IMF. The log of the SEP flux was used throughout the whole study. The correlation matrices for the training, validation, and test sets are shown in Figure 2. The X-ray and proton fluxes were converted to logarithmic form because it was more convenient than the original form of the data, since the time series were mostly quiet and had numerous sharp spikes corresponding to solar events. Based on previous experience with neural networks (NNs) (Nedal et al., 2019), we found that training a separate model for each target (output) feature can lead to better results. This is because a dedicated model for each output feature can more easily learn the interrelationships between input features and make more accurate predictions. Therefore, in our current study, we trained three separate models, each targeting the logarithm of the proton integral flux at a specific energy channel.
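To make the feature-screening step concrete, the short pandas sketch below ranks candidate inputs by their Pearson correlation with one target channel. The DataFrame here is a random placeholder; in practice its columns would hold the OMNIWeb/GOES series described above, with the column names used in this paper.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the hourly OMNIWeb/GOES dataset; the column names
# follow the paper, but the values are random placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 5)),
                  columns=["log_PF10", "F10_idx", "log_Xl", "Vsw", "avg_IMF"])

corr = df.corr(method="pearson")           # full correlation matrix (cf. Fig. 2)
ranking = corr["log_PF10"].drop("log_PF10").abs().sort_values(ascending=False)
print(ranking)                             # features ranked by |R| with the target
```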

Figure 2. Correlation matrices show the correlation between the features in the training, validation, and test sets.

In order to ensure consistency across all features, the time series of all physical quantities were matched to the same time range. Subsequently, the dataset was resampled to daily averages, reducing the dataset size by a factor of 24. This reduction enabled faster training and quicker results. There were missing values in the original dataset for Bavg (~10.7%), Vsw (~10.5%), the F10.7 index (~0.08%), the short-band X-ray flux (~8%), the long-band X-ray flux (~9.8%), and the proton fluxes (~4.3%). The data gaps were linearly interpolated.
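A minimal pandas sketch of this resampling and gap-filling step is given below; the synthetic hourly series and the injected gap only stand in for the real OMNIWeb/GOES data.

```python
import numpy as np
import pandas as pd

# Build a dummy hourly dataset over the study period (columns are illustrative).
rng = pd.date_range("1976-12-25", "2019-07-30", freq="H")
hourly = pd.DataFrame(
    {"avg_IMF": np.random.lognormal(1.8, 0.4, len(rng)),
     "Vsw": np.random.normal(450, 80, len(rng))},
    index=rng,
)
hourly.iloc[100:180] = np.nan                 # emulate a multi-hour data gap

daily = hourly.resample("1D").mean()          # hourly -> daily averages (factor of ~24)
daily = daily.interpolate(method="linear")    # linearly fill the remaining gaps
print(daily.isna().sum())
```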

In time series forecasting, it is common practice to take a continuous set of data points from the main dataset as the validation set and another smaller chunk as the test set, as done for instance in Pala & Atici (2019), Benson et al. (2020), Zhang et al. (2022), and Zhu et al. (2022). In our experiments, we got decent results when we applied the same data split method, but the results were somewhat biased toward the end of solar cycle 24 and the testing set was biased towards a quiet period. We therefore adopted a 9-2-1 strategy, that is, from each year 9 months are added to the training set, 2 months to the validation set, and 1 month to the test set. This is applied over the ~43 years of data (Fig. 1), which yields 74.29% of the data for the training set, 16.2% for the validation set, and 9.51% for the testing set. By doing so, we eliminated the need for cross-validation and hence made the training more efficient. It is worth mentioning that the time series data must not be shuffled, as that would break the temporal and logical order of the measurements, which must be maintained.
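The following sketch illustrates the 9-2-1 split on a dummy daily series. The paper does not state which months of each year go to which set, so the month assignment below (January–September, October–November, December) is only an assumption used to show the idea; the resulting fractions are close to, but not exactly, the 74.29/16.2/9.51% quoted above.

```python
import numpy as np
import pandas as pd

def split_9_2_1(df):
    """9-2-1 split sketch: within each calendar year, 9 months go to training,
    2 months to validation, and 1 month to testing. No shuffling is applied,
    so the temporal order inside each set is preserved."""
    month = df.index.month
    train = df[month <= 9]                     # assumed: January-September
    val = df[(month == 10) | (month == 11)]    # assumed: October-November
    test = df[month == 12]                     # assumed: December
    return train, val, test

# Example with a dummy daily series spanning the study period
idx = pd.date_range("1976-12-25", "2019-07-30", freq="D")
data = pd.DataFrame({"log_PF10": np.random.randn(len(idx))}, index=idx)
train, val, test = split_9_2_1(data)
print(len(train) / len(data), len(val) / len(data), len(test) / len(data))
```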

3 Method

In this section, we introduce the data analysis methods used in this work. We start by explaining the model selection phase, followed by a discussion of the BiLSTM neural network architecture. The technical terminology is described in Appendices A–E.

3.1 The BiLSTM model

Recurrent neural networks (RNNs) that process input sequences both forward and backward are known as BiLSTM neural networks (Schuster & Paliwal, 1997). Regular RNNs (Hochreiter & Schmidhuber, 1997; Kolen & Kremer, 2001) depend on the prior hidden state and the current input to determine the output at a given time. The output of a BiLSTM network, on the other hand, depends on the input at a given moment as well as on the previous and future hidden states. As a result, the network is able to make predictions using context from both the past and the future, which improves accuracy. Each BiLSTM layer consists of two LSTM layers: a forward layer that processes the input sequence from past to future and a backward layer that processes the input sequence from future to past, as illustrated in Figure 3, to capture information from both past and future contexts. The output from each layer is concatenated and fed to the next layer, which can be another BiLSTM layer or a fully connected layer for the final prediction.

Figure 3. Architecture of a single BiLSTM layer. The blue circles at the bottom, labeled (x0, x1, x2, …, xi), are the input data values at multiple time steps. The purple circles, labeled (y0, y1, y2, …, yi), are the output data values at multiple time steps. The dark green and light green boxes are the activation units of the forward layer and the backward layer, respectively. The orange and yellow circles are the hidden states of the forward layer and the backward layer, respectively. Together, the forward and backward layers compose a single hidden BiLSTM layer. The figure is adapted from Olah (2015).

BiLSTM networks are more advantageous than traditional LSTM networks in a variety of aspects (Graves & Schmidhuber, 2005; Ihianle et al., 2020; Alharbi & Csala, 2021). First, as we demonstrate in this study, they are excellent for tasks like time series forecasting, as well as speech recognition and language translation (Wöllmer et al., 2013; Graves & Jaitly, 2014; Sundermeyer et al., 2014; Huang et al., 2018; Nammous et al., 2022) because they can capture long-term dependencies in the input sequence in both forward and backward directions. Second, unlike feed-forward networks, BiLSTM networks do not demand fixed-length input sequences, thus being able to handle variable-length sequences better. Furthermore, by taking into account both past and future contexts, BiLSTM networks can handle noisy data. However, BiLSTM networks are computationally more expensive than regular LSTM networks due to the need for processing the input sequence in both directions. They also have a higher number of parameters and require more training data to achieve good performance.

The final dataset has 7 features, including the target feature, from December 25th 1976 to July 30th 2019, with a total of 15,558 samples (number of days). The training set has 11,558 samples, the validation set has 2520 samples, and the test set has 1480 samples.

An input horizon of 270 steps (30 days × 9 months) was used, with a batch size of 30, i.e., the number of samples processed before one update to the model’s weights (Appendix A). The model consists of 4 BiLSTM layers with 64 neurons each and an output dense layer with 3 neurons, representing the output forecasting horizon. The total number of trainable parameters is 333,699. The number of training epochs was set to 50 because, in our experiments, the model stopped improving noticeably after about 50 epochs; there was thus no need to spend time and computational resources training the model for longer.
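A Keras sketch of this architecture is given below. Layer options beyond those stated above (e.g., default activations) are assumptions, but the stated configuration of four BiLSTM layers with 64 units each and a 3-neuron dense output reproduces the quoted total of 333,699 trainable parameters for 7 input features.

```python
import tensorflow as tf

N_STEPS_IN, N_FEATURES, N_STEPS_OUT = 270, 7, 3   # input horizon, features, 3-day output

# Sketch of the architecture described above, not the authors' exact code.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(N_STEPS_IN, N_FEATURES)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),   # last layer returns a vector
    tf.keras.layers.Dense(N_STEPS_OUT),                        # 1-, 2-, 3-day ahead values
])
model.summary()   # 333,699 trainable parameters for this configuration
```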

The ModelCheckpoint callback function was used to save the model version with the minimum validation loss. The EarlyStopping callback function was used to halt the run when overfitting was detected, with a patience parameter of 7. The ReduceLROnPlateau callback function was used to reduce the learning rate when the validation loss stops improving, with a patience parameter of 5, a reduction factor of 0.1, and a minimum learning rate of 1e−6.
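A sketch of these three callbacks in Keras is given below; the checkpoint file name and the commented-out fit call are illustrative and not taken from the paper.

```python
import tensorflow as tf

# The three callbacks described above (values follow the text of Section 3.1).
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        "best_bilstm.h5", monitor="val_loss", save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=7),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.1, patience=5, min_lr=1e-6),
]

# Assumed usage with the windowed arrays described in Section 3.2:
# history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
#                     epochs=50, batch_size=30, callbacks=callbacks)
```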

3.2 Model selection

To determine the most suitable model for our objective and provide justifiable reasons, we conducted the following analysis. First, we examined the naive (persistence) model, which is very simplistic and assumes that the time series values remain constant in the future; in other words, it assumes that the future value will be the same as the most recent historical value. This was the baseline. Next, we examined the moving-average model, which calculates future values as the average of the historical data within a specific time window. This gives a slightly lower error.
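Both baselines can be written in a few lines of NumPy, as sketched below; the 27-day moving-average window is an illustrative choice, not necessarily the setting used in the paper.

```python
import numpy as np

def persistence_forecast(series, horizon=3):
    """Naive baseline: repeat the most recent value over the forecast horizon."""
    return np.repeat(series[-1], horizon)

def moving_average_forecast(series, window=27, horizon=3):
    """Moving-average baseline: forecast the mean of the last `window` values."""
    return np.repeat(series[-window:].mean(), horizon)

history = np.random.rand(365)          # dummy daily log-flux history
print(persistence_forecast(history))
print(moving_average_forecast(history))
```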

After that, we moved to the ML-based models. For all the ML models, we chose the Adaptive Moment Estimation (Adam) optimizer (Kingma & Ba, 2015) as the optimization algorithm due to its minimal memory requirements and high computational efficiency; it is well suited for applications involving a large number of parameters or large datasets. As a rule of thumb, we set the optimizer’s learning rate to 0.001, as is usually recommended (Zhang et al., 2022).

In order to prepare the data in a readable format for the ML models, we created a windowed dataset with an input horizon of 365 steps, representing 1 year of data, and an output horizon of 3 steps, representing the forecast window of 3 days. We call this windowing method the Multi-Input Multiple-Output (MIMO) strategy, in which the entire output sequence is predicted in one shot. The MIMO strategy adopts the sliding window method mentioned in Benson et al. (2020), in which each sequence is shifted by one step with respect to the previous sequence until the end of the available data is reached (Fig. 4). This approach minimized the imbalance between active days, with high SEP fluxes, and quiet days.
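A NumPy sketch of this MIMO windowing is shown below, assuming the target is stored in the last column of the feature array; the dummy array only stands in for the daily dataset.

```python
import numpy as np

def make_windows(data, n_in=365, n_out=3):
    """MIMO sliding-window sketch: each sample pairs `n_in` past daily steps of
    all features with the next `n_out` daily values of the target, assumed here
    to be the last column. The window shifts by one step at a time."""
    X, y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        X.append(data[start:start + n_in, :])                   # input horizon
        y.append(data[start + n_in:start + n_in + n_out, -1])   # 3-step output horizon
    return np.array(X), np.array(y)

dummy = np.random.rand(1000, 7)        # 1000 days x 7 features
X, y = make_windows(dummy)
print(X.shape, y.shape)                # (633, 365, 7) (633, 3)
```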

Figure 4. Illustration of the sliding window technique for a sample of 10 timesteps, where each number denotes a distinct time step. In this example, the input horizon (blue) is 4 timesteps long and the output horizon is 3 timesteps long. The input window slides one time step at a time across the entire data sequence to generate four distinct input and forecast horizon pairs. The purple, orange, and green colors of the output horizon represent 1-day, 2-day, and 3-day ahead forecasting, respectively. The timesteps of the 1-day ahead forecasts across the data sequences are then concatenated into a single time series called the 1-day ahead prediction; the same applies to the 2-day and 3-day ahead forecasts.

After experimenting with different loss functions and evaluating their performance on our dataset, we chose the Huber function (C1) as the loss function, and the Mean Absolute Error (MAE) was used as the metric to monitor the model performance. We used the Huber function because it is robust and combines the advantages of both the Mean Squared Error (MSE) and MAE loss functions: it is less sensitive to outliers than the MSE, while still being differentiable and providing gradients, unlike the MAE. Since our data are noisy and contain outliers that may negatively impact the model’s performance, the Huber loss function is a good choice.
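The snippet below gives a quick numerical illustration of this trade-off using the Keras Huber loss with δ = 0.1 (the value adopted in Appendix C): for small residuals it behaves like the MSE, while an outlier-sized residual is penalized only linearly; the toy values are not taken from the dataset.

```python
import tensorflow as tf

# Small residuals vs. one outlier-sized residual (toy values).
y_true = tf.constant([[1.0, 1.0, 1.0]])
y_small = tf.constant([[1.05, 0.95, 1.02]])
y_large = tf.constant([[3.0, 1.0, 1.0]])

huber = tf.keras.losses.Huber(delta=0.1)          # delta follows Appendix C
mse = tf.keras.losses.MeanSquaredError()

print(float(huber(y_true, y_small)), float(mse(y_true, y_small)))  # comparable penalties
print(float(huber(y_true, y_large)), float(mse(y_true, y_large)))  # Huber grows only linearly
```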

We examined various neural network models to determine the optimal architecture for our task. Initially, we started with a simple linear model comprising a single layer with a single neuron. However, this model did not yield satisfactory results. We then explored a dense neural network model consisting of 2 hidden layers, each with 32 neurons and a ReLU activation function. Next, we experimented with a simple RNN model with the same number of hidden layers and neurons. To find the optimal learning rate, we utilized the LearningRateScheduler callback function and discovered that a rate of 1.58e−4 under the basic settings minimized the loss. We proceeded to examine stateful versions of RNN, LSTM, and BiLSTM models with three hidden layers, each with 32 neurons and a learning rate of 1.58e−4. In addition, we explored a hybrid model that consisted of a 1-dimensional convolutional layer with 32 filters, a kernel size of 5, and a ReLU activation function, combined with a two-hidden-layer LSTM network with 32 neurons each and a learning rate of 1.58e−4. We experimented with Dropout layers but did not observe any significant improvement in the results. Finally, we evaluated a BiLSTM model with five hidden layers, 64 neurons each, and a learning rate of 0.001. Based on the evaluation of all the models on both the validation and test sets (Fig. 5 and Table D1 in Appendix D), we selected the BiLSTM model for further refinement. More details on the final model architecture and hyperparameters are given in Appendix D. Figure 5 presents a comparative analysis of the Huber loss on the validation and testing sets across the ten aforementioned models. We used several evaluation measures to assess our models, since each metric provides valuable insights into the accuracy and performance of the forecasts (Appendix C), helping to identify areas for improvement and adjust the forecasting models accordingly.

Figure 5. Benchmarking of the 10 models, showing the Huber loss for the validation and test sets.

4 Results and discussion

The benchmarking in Figure 5 showed that, in general, the ML-based methods did not differ much from one another. On the other hand, the persistence and moving-average models resulted in the highest errors compared with the ML-based models, and their results were fairly close to each other. The BiLSTM model performed best on both the validation and test sets compared with the other models.

We developed and trained three BiLSTM models to forecast the integral flux of SEP, one model per energy channel. After the training was completed, we evaluated the performance of the models from the loss curves (Fig. 6), using the Huber loss (left panel) and the MAE metric (middle panel). During the training, the learning rate was reduced multiple times via the LearningRateScheduler callback function (right panel). The left panel quantifies the discrepancy between the model’s predictions and the true values over time: it shows how the Huber loss changes during the training iterations (epochs) for the training and validation sets for the three energy channels, with one color per channel. The middle panel shows how the model’s MAE metric changes with training epochs. It is used to evaluate the performance of the trained model by measuring the average absolute difference between the model’s predictions and the true values, providing a single numerical value that indicates the model’s error at a given epoch. The right panel shows how the learning rate of the model’s optimizer changes with epochs via the LearningRateScheduler callback function, which adjusts the learning rate based on a predefined schedule to improve training efficiency and convergence. The learning rate refers to the rate at which the model’s parameters are updated during the training process. We noticed that at the epochs where the learning rate changed, there were bumps in the loss curves across all the energy channels, which is expected. This highlights the boundaries within which the learning rate yields better performance.

Figure 6. Left panel: the Huber loss vs. the number of training epochs for the BiLSTM models, for the training and validation sets and the 3 energy channels. Middle panel: the mean absolute error (MAE), the model’s metric, vs. the number of training epochs. Right panel: the learning rate of the Adam optimizer as a function of the number of epochs.

From experimentation, we found that the batch size and the optimizer learning rate are the most important hyperparameters that have a strong influence on the overall model’s performance (Greff et al., 2016). In addition, adding dropout layers as well as varying the number of hidden layers and hidden neurons resulted in only marginal improvements to the final model performance, while substantially increasing training time and requiring greater computational resources.

The term batch size refers to the number of data sequences processed in one iteration during the training of an ML model (Goodfellow et al., 2016). Initially, a batch size of 64 was selected; however, we observed that the model produced better results when a batch size of 30 was used instead. This could be related to the Carrington rotation, which lasts ~27 days; there were ~570 Carrington rotations between December 25th 1976 and July 30th 2019. Therefore, updating the model’s weights after every Carrington rotation could be a reasonable choice for improving its performance. Figure 7 shows how well the model predictions (y-axis) compare with the observations of the validation set (x-axis). The blue, orange, and gold colors refer to 1-day, 2-day, and 3-day ahead predictions, respectively. The top panel is for the >10 MeV channel, the middle panel for the >30 MeV channel, and the bottom panel for the >60 MeV channel. The left column is for the entire validation set, while the right column is for observation points with flux ≥10 proton flux units (pfu), the threshold value of proton flux measured by the National Oceanic and Atmospheric Administration (NOAA) GOES spacecraft to indicate the severity of space weather events caused by SEP.

Figure 7. Correlation between the model predictions and observations for 1-day, 2-day, and 3-day ahead for >10 MeV (top panel), >30 MeV (middle panel), and >60 MeV (bottom panel). The panels in the left column represent all the points of the validation set and those in the right column represent all the observation points with daily mean flux ≥10 pfu.

We found that, overall, the models performed very well. The R correlation was >0.9 for all points of the validation set across the forecasting windows for the three energy channels, and >0.7 for the observation points with flux ≥10 pfu. However, the correlation between the modeled data and the observations declined as the forecast horizon increased, as anticipated.

To confirm the validity of the models, we performed the same correlation analysis between the modeled data and the observations of the out-of-sample test set (Fig. 8), which was not given to the model. Again, we found a high correlation across the forecasting windows for the three energy channels. The points were more dispersed between 1 and 1.5 on the x-axis, which is reflected in a slightly lower correlation. This might be a limitation of the current version of the models within that range of SEP fluxes, since they underestimated the flux values in that range across all energy channels, possibly due to the relatively smaller number of training samples with fluxes above 10 pfu compared with the majority of the data.

Figure 8. Same as Figure 7 but for the test set.

In order to examine the temporal variation of the correlation between the modeled data and the observations, we applied a rolling window of 90 steps (3 months × 30 days/month = 1 season) that shows the seasonal variation of the correlation, as shown in Figure 9. Here, we show only the 1-day ahead predictions for the test set, for the three energy channels. We observe drops in the correlation factor synchronized with the transitions between solar cycles (e.g., between ~1995 and 2000, covering the declining phase of solar cycle 22 and the rising phase of solar cycle 23). This could be related to the fact that the low SEP fluxes during quiet times are more random and thus more difficult to forecast (Feynman et al., 1990; Gabriel et al., 1990; Rodriguez et al., 2010; Xapsos et al., 2012).
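The rolling correlation of Figure 9 can be reproduced with pandas as sketched below; the two series here are random placeholders for the observed and predicted 1-day-ahead log fluxes of the test set.

```python
import numpy as np
import pandas as pd

# Dummy stand-ins for the observed and 1-day-ahead predicted daily log fluxes.
idx = pd.date_range("2000-01-01", periods=1480, freq="D")
obs = pd.Series(np.random.rand(1480), index=idx)
pred = obs + 0.1 * np.random.randn(1480)

rolling_r = obs.rolling(window=90).corr(pred)   # 90-step (seasonal) rolling correlation
print(rolling_r.dropna().mean())
```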

Figure 9. Comparison between the model outputs and the observations of the test set for the 3 energy channels, together with the rolling-window correlation for the 1-day ahead predictions.

During periods of low solar activity, the forecasting of low SEP fluxes becomes more challenging due to their increased randomness. This difficulty arises from the reduced occurrence of the conventional SEP drivers, such as solar flares and CMEs. Studies have suggested that the most significant solar eruptions tend to happen shortly before or after the solar cycle reaches its maximum (Švestka, 1995). Additionally, sporadic increases in solar activity have been observed (Kane, 2011), which might contribute to the diminished correlations observed in our research. There is clearly some factor influencing the correlation during certain periods with no or only weak SEP events; however, it is not obvious whether the cause is a physical phenomenon or, for instance, some artifact of the data. Understanding the interplay between these factors and their influence on SEP fluxes during periods of reduced solar activity remains a critical area of research, and further investigation is needed to determine what reduces the correlations.

Overall, the modeled data correlated best with the observations at the >60 MeV channel, followed by the >10 MeV channel and then the >30 MeV channel. This could be related to the relatively larger drops in correlation at the >30 MeV channel. The decline in correlation at the >30 MeV channel is consistent with the findings of Le & Zhang (2017). A summary of the performance results of the models for both the validation set and the test set is presented in Table 1.

Table 1. Summary of the performance results of the models for the validation and test sets.

From the visual inspection of the test set examples (Figs. 10–12), we found that the predicted onset, peak, and end times of SEP events were highly correlated with those of the observations, which implies that the model captured the temporal variations as well as the trends in the SEP flux.

Figure 10. The model’s forecasts for the out-of-sample testing set for the >10 MeV channel at forecast horizons of 1, 2, and 3 days ahead, using samples of data from December of the selected years indicated at the top left of each plot.

Figure 11. The model’s forecasts for the out-of-sample testing set for the >30 MeV channel at forecast horizons of 1, 2, and 3 days ahead, using samples of data from December of the selected years indicated at the top left of each plot.

Figure 12. The model’s forecasts for the out-of-sample testing set for the >60 MeV channel at forecast horizons of 1, 2, and 3 days ahead, using samples of data from December of the selected years indicated at the top left of each plot.

To gain further insight into the model’s performance, we conducted an assessment of various skill scores, including the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts, as well as skill score ratios such as the Probability of Detection (POD), Probability of False Detection (POFD), False Alarm Rate (FAR), Critical Success Index (CSI), True Skill Statistic (TSS), and Heidke Skill Score (HSS). Detailed descriptions of these skill scores can be found in Appendix E. To extract individual SEP events from the test dataset, we implemented a threshold-based clustering algorithm that uses the NOAA/SWPC warning threshold value of 10 pfu for the E ≥ 10 MeV channel. Upon analysis, we identified the number of detected SEP events for each output forecasting window and calculated the skill scores (Table 2). In the true test set, we identified 12 SEP events.
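A simplified sketch of this event extraction and of the skill-score ratios of Appendix E is given below. The exact clustering rules used in the paper are not reproduced; grouping consecutive days at or above 10 pfu is an assumption, and the confusion-matrix counts in the example are purely illustrative.

```python
import numpy as np

def extract_events(flux, threshold=10.0):
    """Group consecutive days with daily mean flux >= threshold (10 pfu for the
    >=10 MeV channel) into (start, end) index pairs. A simplified stand-in for
    the threshold-based clustering described in the text."""
    above = flux >= threshold
    events, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(flux) - 1))
    return events

def skill_scores(tp, tn, fp, fn):
    """Skill-score ratios defined in Appendix E (FAR taken as FP / (TP + FP))."""
    pod = tp / (tp + fn)
    pofd = fp / (fp + tn)
    far = fp / (tp + fp)
    csi = tp / (tp + fp + fn)
    tss = pod - pofd
    hss = 2 * (tp * tn - fp * fn) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return {"POD": pod, "POFD": pofd, "FAR": far, "CSI": csi, "TSS": tss, "HSS": hss}

flux = np.array([1, 2, 15, 40, 12, 3, 1, 20, 5])
print(extract_events(flux))           # [(2, 4), (7, 7)]
print(skill_scores(10, 1400, 4, 2))   # illustrative counts, not the paper's results
```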

Table 2. Confusion matrix for the ≥10 MeV energy channel predictions in the test set.

The evaluation of the model revealed notable trends as the length of the output forecasting window increased. The POD and CSI exhibited a declining pattern, indicating a reduced ability of the model to accurately detect and capture positive events (SEP occurrences) as the forecasting horizon extended further into the future. This suggests that the model’s performance in identifying and capturing true positive instances diminishes with longer forecasting windows. Moreover, the POFD demonstrated an increasing trend, indicating an elevated rate of false positive predictions as the forecasting horizon lengthened. The model’s propensity to generate false alarms rose with the lengthening forecasting window, leading to incorrect identification of non-events as positive events. Consequently, the TSS and HSS exhibited decreasing values, signifying a deterioration in the model’s overall skill in accurately capturing and distinguishing between positive and negative instances. Overall, our skill scores are comparable with those reported by previous studies (Table 3). Although the UMASEP model does better than ours (i.e., has a higher POD), our FAR is much lower, thus, making fewer false alarms than the UMASEP model.

Table 3. Comparison of the skill scores with those of previous models. Dashed entries indicate that the data are unavailable (see Whitman et al., 2022, for more details).

5 Conclusions

Forecasting the SEP flux is a crucial task in heliophysics since it affects satellite operations, astronaut safety, and ground-based communication systems. It is a challenging task due to its non-linear, non-stationary, and complex nature. Machine learning techniques, particularly neural networks, have shown promising results in predicting SEP flux. In this study, we developed and trained BiLSTM neural network models to predict the daily-averaged integral flux of SEP at 1-day, 2-day, and 3-day ahead, for the energy channels >10 MeV, >30 MeV, and >60 MeV. We used a combination of solar and interplanetary magnetic field indices from the OMNIWeb database for the past 4 solar cycles as input to the model. We compared the models with baseline models and evaluated them using the Huber loss and the error metrics in Appendix C.

The data windowing method we used, based on the MIMO strategy, eliminates the need to feed the forecast output back into the model as input, which allows forecasting relatively far into the future while maintaining decent results (e.g., an MSE ranging between 0.007 and 0.015 for 1-day forecasting in the test set, compared to an MSE of 0.236 for a persistence model; see Table 1). The results show that the model can make reasonably accurate predictions given the difficulty and complexity of the problem. The MSE ranged between 0.009 and 0.055 for the validation set and between 0.007 and 0.05 for the test set. The correlations between the observations and predictions were >0.9 for the validation and test sets (Figs. 7 and 8). Nevertheless, the mean temporal correlation was ~0.8 for the test set (Fig. 9). Although our models performed well, we observed a relatively large discrepancy between the predictions and the observations in the >30 MeV energy band.

The findings of this study underscore the challenges encountered by the forecasting model in accurately predicting SEP data over longer time periods. As the length of the output forecasting window increased, the model’s ability to detect true positives and its overall skill in differentiating positive and negative instances diminished. Additionally, the model displayed an elevated rate of false negative predictions, indicating an increased tendency to generate misses as the forecasting horizon extended. These results highlight the importance of carefully considering the appropriate forecasting window length for SEP data to ensure the model’s optimal performance. Our skill scores generally align with those from previous works (Table 3). There are variations in the metrics’ values across different studies, highlighting the complexities and nuances associated with each study. Nevertheless, it is important to acknowledge that the statistical significance of the results in this study is limited due to data averaging. Future studies should consider incorporating hourly data, as this is likely to result in a greater number of identified events. The model can provide short-term predictions, which can be used to anticipate the behavior of the near-Earth space environment. These predictions have important implications for space weather forecasting, which is essential for protecting satellites, spacecraft, and astronauts from the adverse effects of solar storms.

Multiple techniques exist for identifying the optimal combination of hidden layers and neurons for a given task such as empirical methods, parametric methods, and the grid search cross-validation method, which we will explore in future work. The observed reduction in correlation necessitates further investigation to determine its origin, whether stemming from tangible causal factors or potential aberrations within the model or data. We plan to expand upon this work by performing short-term forecasting using hourly-averaged data. This extension will involve integrating additional relevant features such as the location and area of active regions and coronal holes on the Sun.

BiLSTM networks are particularly useful for tasks involving sequential data such as time series forecasting. Given their capacity to handle input sequences in both directions in time and capture long-term dependencies, they are valuable in a broad range of applications. Nonetheless, one should carefully consider their data requirements and computational complexity before adopting them. Our results emphasize that the use of deep learning models in forecasting tasks in heliophysics is promising and encouraging, as pointed out by Zhang et al. (2022).

This work is a stepping stone towards real-time forecasting of SEP flux based on publicly available datasets. As an extension, we are currently working on developing a set of models that deliver near-real-time predictions of SEP fluxes at multiple energy bands and multiple forecasting windows, with hourly-averaged data resolution, a more sophisticated model architecture, and additional features that describe the state of solar activity more comprehensively. We plan to extend the analysis to include more recent data from solar cycle 25 in order to improve the accuracy of the models. In conclusion, our study highlights the potential of using BiLSTM neural networks for forecasting SEP integral fluxes. Our models also provide a promising approach for predicting the near-Earth space environment, which is crucial for space weather forecasting and for ensuring the safety of our space assets. Our findings contribute to the growing body of literature on the applications of deep learning techniques in heliophysics and space weather forecasting.

Appendix A

Terminologies

In this section, we introduce the main concepts related to machine learning which are presented in this paper.

  • Cross-validation: A technique used to evaluate the performance of a machine learning model by dividing the data into subsets and assessing the model on different combinations of these subsets.

  • Input horizon: The number of previous time steps considered as input to a model for time series forecasting. It represents the length of the historical sequence used for predictions.

  • Batch size: The number of samples processed together in a single iteration of the training algorithm. It affects training speed and memory requirements.

  • Updating the model’s weights: The process of adjusting the parameters of a neural network based on training data to minimize the difference between predicted and true outputs. The model’s weights represent the parameters that are learned during the training process.

  • Loss: A function that quantifies the difference between predicted and actual outputs. It guides the optimization process during training.

  • Minimum validation loss: The lowest value achieved by the loss function on a validation dataset during training. It indicates the most accurate predictions on unseen data.

  • Overfitting: When a model performs well on training data but fails to generalize to unseen data due to memorizing training examples instead of learning underlying patterns.

  • Learning rate: A hyperparameter that determines the step size at each iteration of the optimization algorithm during training. It affects learning speed and convergence. A high learning rate can cause the training process to converge quickly, but it may also result in overshooting the optimal solution or getting stuck in a suboptimal solution. On the other hand, a very low learning rate can make the training process slow and may struggle to find the optimal solution.

  • Reducing the learning rate when the validation loss stops improving: This concept involves adjusting the learning rate dynamically during the training process. When the validation loss reaches a plateau or stops improving, it indicates a suboptimal point. By reducing the learning rate, the model can take smaller steps in weight space, potentially finding a better solution. This technique, known as learning rate scheduling or learning rate decay, is commonly used to fine-tune the model’s performance.

  • Patience: A parameter used in training to determine the number of epochs to wait for an improvement in validation loss before stopping the training process.

  • Patience parameter of 7: In the context of early stopping, training will be stopped if the validation loss does not improve for 7 consecutive epochs.

  • Adam optimizer: A popular optimization algorithm in deep learning that combines Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSprop) to achieve efficient optimization.

  • Optimal architecture: The best configuration of a neural network, including the number of layers, neurons, and other choices, for optimal performance on a specific task.

  • Hyperparameters: Parameters set before training a model that control the learning algorithm’s behavior, such as learning rate, batch size, and activation functions.

  • Layer: A building block of a neural network that performs specific operations on input data. Includes input, hidden, output, fully connected, convolutional, recurrent, activation, and dropout layers. Here is a description for each layer:

    • Input layer: The first layer of a neural network that receives raw input data. It passes the input to subsequent layers for further processing. The number of nodes in the input layer is determined by the dimensionality of the input data.

    • Hidden layers: Intermediate layers between the input and output layers. They perform computations on the input data and capture higher-level representations or abstractions. Hidden layers are not directly exposed to the input or output.

    • Output layer: The final layer of a neural network that produces model predictions or outputs based on computations from preceding layers. The number of neurons in the output layer depends on the problem being solved, such as regression or classification.

    • Fully connected layer (dense layer): Each neuron in this layer is connected to every neuron in the previous layer. It allows information flow between all neurons, enabling complex relationships to be learned.

    • Convolutional layer: Commonly used in Convolutional Neural Networks (CNNs) for analyzing grid-like data, such as images. It applies convolution operations using filters or kernels to learn spatial patterns or features.

    • Recurrent layer: Used in Recurrent Neural Networks (RNNs) to process sequential data. These layers have feedback connections that allow information to be passed from one step to the next, capturing temporal dependencies and maintaining memory of past inputs.

    • Activation layer: Applies a non-linear function to the output of a layer, introducing non-linearity into the neural network. Activation functions like Sigmoid, Hyperbolic Tangent (tanh), or Rectified Linear Unit (ReLU) determine neuron outputs based on weighted inputs.

    • Dropout layer: A regularization technique commonly used in deep learning models. It randomly sets a fraction of outputs from the previous layer to zero during training, preventing overfitting and improving generalization.

Layers play a crucial role in the information processing and learning capabilities of neural networks. The arrangement and combination of different layers determine the network’s architecture and ultimately its ability to solve specific tasks.

  • Stateful: A property of Recurrent Neural Networks (RNNs) where the hidden state is preserved between consecutive inputs, allowing the network to have memory.

  • Neuron: A computational unit in a neural network that receives input, applies weights, and passes the result through an activation function to produce an output.

  • Hidden neuron: A neuron in a hidden layer of a neural network that performs intermediate computations.

  • Callback function: A function used during model training to perform specific actions at certain points or conditions, such as saving the best model, adjusting learning rates, or early stopping.

  • LearningRateScheduler callback function: A function used in training to dynamically adjust the learning rate at specific points based on a predefined schedule or function. It improves training efficiency and convergence by allowing the model to make finer adjustments as it approaches the optimal solution.

Appendix B

Mathematical Representation of the LSTM NN Model

The computations inside one LSTM cell can be described by the following formulas (Ihianle et al., 2020):

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{B1}$$

$$\tilde{C}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{B2}$$

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{B3}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{B4}$$

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{B5}$$

$$h_t = o_t \odot \tanh\left(C_t\right) \tag{B6}$$

where xt is the input data at time t. The input gate it determines which values from the candidate cell state should be added to the cell state; it takes into account the current input xt and the previous output ht−1, and is passed through a sigmoid activation function. C̃t represents the candidate values that are added to the cell state at time t. The forget gate activation vector ft at time step t determines how much of the previous cell state should be retained. The cell state Ct at time t is updated based on the forget gate, the input gate, and the candidate values. The output gate ot at time t determines how much of the cell state should be output. The output vector ht at time t is calculated based on the cell state and the output gate values, and ht−1 is the output vector at the previous time step t − 1. Wf, Wi, Wc, Wo are the weight matrices for the input vector xt, and Uf, Ui, Uc, Uo are the weight matrices for the previous output vector ht−1. bf, bi, bc, bo are the bias vectors. The symbol ⊙ denotes pointwise multiplication. The sigmoid function σ is used as the activation function for the gate vectors, and the hyperbolic tangent function tanh is used for the candidate values and the output vector.
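For completeness, the NumPy sketch below implements one LSTM cell update following equations (B1)–(B6); the weight dictionaries and dimensions are arbitrary placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell update implementing (B1)-(B6). W, U, b are dictionaries
    keyed by gate name ('i', 'f', 'c', 'o')."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # (B1) input gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])    # (B2) candidate values
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # (B3) forget gate
    c_t = f_t * c_prev + i_t * c_hat                            # (B4) new cell state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # (B5) output gate
    h_t = o_t * np.tanh(c_t)                                    # (B6) output vector
    return h_t, c_t

# Tiny example with random weights (3 input features, 4 hidden units)
rng = np.random.default_rng(7)
n_in, n_h = 3, 4
W = {k: rng.normal(size=(n_h, n_in)) for k in "ifco"}
U = {k: rng.normal(size=(n_h, n_h)) for k in "ifco"}
b = {k: np.zeros(n_h) for k in "ifco"}
h, c = lstm_cell_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, U, b)
print(h.shape, c.shape)
```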

Appendix C

Evaluation metrics

To evaluate the model performance, we used the following equations:

$$\mathrm{Huber}(y, \hat{y}) = \begin{cases} \frac{1}{2}\left(y - \hat{y}\right)^2, & \left|y - \hat{y}\right| \le \delta \\ \delta\left(\left|y - \hat{y}\right| - \frac{1}{2}\delta\right), & \text{otherwise} \end{cases} \tag{C1}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{C2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{C3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{C4}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{C5}$$

$$R = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}} \tag{C6}$$

where y is the true value, ŷ is the predicted value, and δ is a threshold in the Huber loss function that controls the trade-off between the mean squared error (MSE) and the mean absolute error (MAE). In this paper, it was set to 0.1, a value selected based on several experiments.

MSE is the mean squared error, which measures the difference between predicted and actual values by calculating the average of the squared differences. It provides a measure of the average squared magnitude of the errors in the forecasts, which is useful for penalizing larger errors more heavily than smaller ones.

MAE is the mean absolute error, which measures the difference between predicted and actual values by calculating the average of the absolute differences. It provides a measure of the average magnitude of the errors, allowing the overall accuracy of the forecasts to be evaluated.

RMSE is the root mean squared error, which measures the difference between predicted and actual values by taking the square root of the average of the squared differences. It provides a measure of the accuracy of the forecasts in the same units as the original data, allowing the magnitude of the errors to be evaluated on the same scale as the data.

MAPE is the mean absolute percentage error, which measures the accuracy of a forecast by calculating the average of the absolute percentage errors. It provides a measure of the accuracy of the forecasts in percentage terms, allowing the magnitude of the errors to be evaluated relative to the actual values. MSE, MAE, RMSE, and MAPE are often used in regression analysis to assess the accuracy of a model’s predictions.

Finally, R is the Pearson correlation coefficient, which measures the strength and direction of the relationship between two continuous variables, and can provide an indication of the extent to which changes in one variable may be related to changes in the other.
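The sketch below computes the Appendix C error measures with NumPy for a pair of short dummy series, using δ = 0.1 in the Huber term; it is only an illustration of the definitions above.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred, delta=0.1):
    """Error measures of Appendix C computed with NumPy."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    huber = np.where(np.abs(err) <= delta,
                     0.5 * err**2,
                     delta * (np.abs(err) - 0.5 * delta)).mean()
    mse = np.mean(err**2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err / y_true))
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return {"Huber": huber, "MSE": mse, "MAE": mae,
            "RMSE": rmse, "MAPE": mape, "R": r}

print(evaluation_metrics([1.2, 1.5, 2.0, 3.1], [1.1, 1.6, 2.3, 2.9]))
```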

Appendix D

Model Configuration

The configurations of the ML models shown in Figure 5 and their performance on the validation set and the test set for the SEP integral flux >10 MeV are presented in Table D1. The batch size was set to 64 and the number of training epochs to 100. The EarlyStopping callback function, with a patience of 10, was used to help prevent overfitting during the training process by stopping training when the monitored metric had stopped improving for a certain number of epochs. The patience parameter controls how many epochs the training continues without improvement before it is stopped. This is useful because if the validation loss stops improving, the model has probably overfitted the training data and is not generalizing effectively to new data. By stopping the training early, we avoid wasting time and resources on further training that is unlikely to improve the model’s performance.

Table D1

Configuration of the ML model. (1) refers to the error value for 1-day forecasting, (2) to 2-day forecasting, and (3) to 3-day forecasting. *In the 1D-CNN layer, 32 filters, a kernel size of 5, and a stride of 1 were used.

We used the ModelCheckpoint callback function to save the best weights of the model during training so that they can be reused later. The LearningRateScheduler callback function dynamically adjusts the learning rate during training: a function passed to it is called at the beginning of each epoch and returns the desired learning rate for that epoch. This can be useful when training deep neural networks, as it allows a higher learning rate in the early stages of training, when the model is still far from convergence, and a lower learning rate as the model approaches convergence, which can help it converge more accurately. The downside might be a longer training time.
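A minimal sketch of such a callback setup in Keras is given below. Apart from the patience of 10, the batch size of 64, and the 100 epochs quoted above, the monitored quantity, the checkpoint file name, and the exponential-decay schedule are illustrative assumptions rather than the exact settings used in this work.

```python
import tensorflow as tf

def schedule(epoch, lr):
    """Illustrative schedule: keep the initial rate for 10 epochs, then decay."""
    return lr if epoch < 10 else lr * tf.math.exp(-0.1)

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True, save_weights_only=True),
    tf.keras.callbacks.LearningRateScheduler(schedule),
]

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=64, epochs=100, callbacks=callbacks)
```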

All the calculations and model runs were implemented under the framework of TensorFlow 2.3.0 (Singh & Manure, 2020) in Python 3.6.13. The models were executed on Ubuntu 20.04.1 LTS with 4 × GPUs (NVIDIA GeForce RTX 2080 Ti, 11019 MiB, 300 MHz). According to the Keras API guide (Ketkar, 2017), the requirements for using the cuDNN implementation are that the activation function must be set to tanh and the recurrent activation must be set to sigmoid. We also set the seed number to 7 across all the model runs to maintain reproducibility.
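Such seeding can be done, for example, as in the short snippet below; this is a generic sketch of how the Python, NumPy, and TensorFlow random generators may be seeded, not necessarily the exact calls used here.

```python
import os, random
import numpy as np
import tensorflow as tf

SEED = 7  # the seed value quoted above

os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)          # Python's built-in random generator
np.random.seed(SEED)       # NumPy random generator
tf.random.set_seed(SEED)   # TensorFlow global random seed
```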

Stateful RNNs can be difficult to work with when using callbacks in Keras because their hidden state must be manually managed across mini-batch updates. When training a stateful RNN in Keras, the hidden state is carried over across batches and epochs unless it is reset manually, which can cause problems with certain callbacks, such as EarlyStopping or ModelCheckpoint. To work around this issue, one can use stateless RNNs or manually reset the hidden state at the end of each epoch, but the latter can be complex and prone to errors (a minimal example of such a manual reset is sketched below).
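One way to perform such a manual reset is with a small custom Keras callback, as in the following generic illustration; it is not part of the published model.

```python
import tensorflow as tf

class ResetStatesCallback(tf.keras.callbacks.Callback):
    """Reset the hidden states of a stateful RNN at the end of every epoch."""

    def on_epoch_end(self, epoch, logs=None):
        # Clear the carried-over hidden/cell states so that the next epoch
        # starts from a clean state.
        self.model.reset_states()
```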

Appendix E

Skill Scores

Skill scores and ratios are commonly used in evaluating the performance of classification models, particularly in binary classification tasks. They provide insights into the model’s ability to correctly predict positive and negative instances. Here is a brief description of each skill score and ratio, along with their formulas:

  • True positive (TP): The number of data points or intervals correctly identified as positive by the model. It represents instances where both the model and the ground truth indicate the presence of an event.

  • True negative (TN): The number of intervals correctly identified as negative by the model. It represents instances where both the model and the ground truth indicate the absence of an event.

  • False positive (FP): The number of intervals incorrectly identified as positive by the model. It occurs when the model predicts an event, but the ground truth indicates its absence.

  • False negative (FN): The number of intervals incorrectly identified as negative by the model. It occurs when the model fails to detect an event whose presence the ground truth indicates.

  • Accuracy: Represents the proportion of correct predictions out of total predictions.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{E1}$$

  • Precision: Represents the proportion of positive predictions that are actually positive.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{E2}$$

  • Probability of detection (POD) or recall: Represents the model’s ability to correctly identify positive instances.

$$\mathrm{POD} = \frac{TP}{TP + FN} \tag{E3}$$

  • Probability of false detection (POFD): Measures the model’s tendency to falsely predict positive instances when the ground truth indicates their absence.

$$\mathrm{POFD} = \frac{FP}{FP + TN} \tag{E4}$$

  • False alarm rate (FAR): Indicates the ratio of false positive predictions to the total number of positive predictions.

$$\mathrm{FAR} = \frac{FP}{TP + FP} \tag{E5}$$

  • Critical success index (CSI): Measures the model’s ability to correctly predict positive instances, taking into account both missed events and false alarms (true negatives are not considered).

$$\mathrm{CSI} = \frac{TP}{TP + FP + FN} \tag{E6}$$

  • True skill statistic (TSS): Takes into account both the model’s ability to detect positive instances and its ability to avoid false alarms.

$$\mathrm{TSS} = \frac{TP}{TP + FN} - \frac{FP}{FP + TN} \tag{E7}$$

  • Heidke skill score (HSS): Evaluates the model’s performance by comparing it with random chance. It takes into account the agreement between the model’s predictions and the observed data, considering both true positive and true negative predictions.

$$\mathrm{HSS} = \frac{2\,(TP \cdot TN - FP \cdot FN)}{(TP + FN)(FN + TN) + (TP + FP)(FP + TN)} \tag{E8}$$
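To make these definitions concrete, the small Python helper below computes the scores from the four confusion-matrix counts. It follows equations (E1)–(E8) as reconstructed above and is an illustrative sketch rather than the code used to produce Table 3.

```python
def skill_scores(tp, tn, fp, fn):
    """Binary-classification skill scores from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)                     # (E1)
    precision = tp / (tp + fp)                                      # (E2)
    pod       = tp / (tp + fn)                                      # (E3), recall
    pofd      = fp / (fp + tn)                                      # (E4)
    far       = fp / (tp + fp)                                      # (E5)
    csi       = tp / (tp + fp + fn)                                 # (E6)
    tss       = pod - pofd                                          # (E7)
    hss       = (2 * (tp * tn - fp * fn) /
                 ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)))   # (E8)
    return {"Accuracy": accuracy, "Precision": precision, "POD": pod,
            "POFD": pofd, "FAR": far, "CSI": csi, "TSS": tss, "HSS": hss}
```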

Acknowledgments

We thank the referees for their feedback. We acknowledge using data from the GSFC/SPDF OMNIWeb interface. This work is supported by the Modeling and ObServational Integrated Investigations of Coronal Solar Eruptions (MOSAIICS) project, funded under contract KP-06-DV-8/18.12.2019 to the Institute of Astronomy and National Astronomical Observatory (NAO), the Bulgarian Academy of Sciences (BAS), under the National Scientific Program “VIHREN” in Bulgaria. The editor thanks Ian Richardson and two other anonymous reviewers for their assistance in evaluating this paper.


Cite this article as: Nedal M, Kozarev K, Arsenov N & Zhang P 2023. Forecasting solar energetic proton integral fluxes with bi-directional long short-term memory neural networks. J. Space Weather Space Clim. 13, 26. https://doi.org/10.1051/swsc/2023026.

All Tables

Table 1

Summary of the performance results of the models for the validation and test sets.

Table 2

Confusion matrix for the energy channel ≥10 MeV predictions in the test set.

Table 3

Comparing the skill scores with previous models. Dashed entries indicate that the data are unavailable (see Whitman et al., 2022, for more details).

Table D1

Configuration of the ML model. (1) refers to the error value for 1-day forecasting, (2) to 2-day forecasting, and (3) to 3-day forecasting. *In the 1D-CNN layer, 32 filters, a kernel size of 5, and a stride of 1 were used.

All Figures

Figure 1

Data splitting for all input features, showing the training, validation, and testing sets. Daily data from 1976-12-25 00:00 to 2019-07-30 00:00. The gray shading labels the solar cycles from SC21 to SC24.

Figure 2

Correlation matrices showing the correlation between the features in the training, validation, and test sets.

Figure 3

Architecture of a single BiLSTM layer. The blue circles at the bottom labeled by (x0, x1, x2, …, xi) are the input data values at multiple time steps. The purple circles, on the other hand, are the output data values at multiple time steps labeled by (y0, y1, y2, …, yi). The dark green and light green boxes are the activation units of the forward layer and the backward layer, respectively. The orange and yellow circles are the hidden states at the forward layer and the backward layer, respectively. Both the forward and backward layers compose a single hidden BiLSTM layer. The figure is adapted from Olah (2015).

Figure 4

Illustration of the sliding window technique for a sample of 10 timesteps, where each number denotes a distinct time step. In this example, the input horizon (blue) is 4 timesteps long and the output horizon is 3 timesteps long. The input window slides one time step at a time across the entire data sequence to generate four distinct input and forecast horizon pairs. The purple, orange, and green colors of the output horizon represent 1-day, 2-day, and 3-day ahead forecasting, respectively. The timesteps of 1-day ahead forecasting across the data sequences are then concatenated into a single timeseries, called the 1-day ahead prediction; the same applies to the 2-day and 3-day ahead predictions.

Figure 5

Benchmarking of 10 models, showing the Huber loss for the validation and test sets.

Figure 6

Left panel – the Huber loss vs. the number of training epochs for the BiLSTM model for the validation and test sets, for the 3 energy channels. Middle panel – the mean absolute error (MAE), the model’s metric, vs. the number of training epochs. Right panel – how the learning rate of the Adam optimizer changes over the number of epochs.

Figure 7

Correlation between the model predictions and observations for 1-day, 2-day, and 3-day ahead for >10 MeV (top panel), >30 MeV (middle panel), and >60 MeV (bottom panel). The panels in the left column represent all the points of the validation set and those in the right column represent all the observation points with daily mean flux ≥10 pfu.

Figure 8

Same as Figure 7 but for the test set.

Figure 9

Comparison between the model outputs and observations of the test set for the 3 energy channels, together with the rolling-mean window correlation for 1-day ahead predictions.

Figure 10

The model’s forecasts for the out-of-sample testing set for the >10 MeV channel are shown at forecast horizons of 1 day, 2 days, and 3 days ahead, using samples of data from December in the selected years indicated at the top left of the plots.

Figure 11

The model’s forecasts for the out-of-sample testing set for the >30 MeV channel are shown at forecast horizons of 1 day, 2 days, and 3 days ahead, using samples of data from December in the selected years indicated at the top left of the plots.

Figure 12

The model’s forecasts for the out-of-sample testing set for the >60 MeV channel are shown at forecast horizons of 1 day, 2 days, and 3 days ahead, using samples of data from December in the selected years indicated at the top left of the plots.
