Open Access
Research Article
Issue
J. Space Weather Space Clim.
Volume 12, 2022
Article Number 24
Number of page(s) 24
DOI https://doi.org/10.1051/swsc/2022019
Published online 28 June 2022

© A. Papaioannou et al., Published by EDP Sciences 2022

Licence Creative CommonsThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

Solar energetic particle (SEP) events constitute a significant component of the near Earth radiation environment and consist of protons, electrons, and heavier ions (Vainio et al., 2009). Such events originate from particle acceleration in solar flares (SFs) and/or shocks associated with coronal mass ejections (CMEs) (e.g., Reames, 2015; Vlahos et al., 2019). Once energetic particles are accelerated and injected into open magnetic field lines, they are consequently routed through the interplanetary magnetic field (IMF) (Belov et al., 2005; Cane & Lario, 2006). Subsequently, when an observer (i.e., spacecraft) is magnetically connected to the source of the particles, enhancements by several orders of magnitude above the pre-event background are observed in-situ, with SEP events often observed in a broad range of solar longitudes (Rouillard et al., 2012; Lario et al., 2016). SEPs can last from a few hours to several days, and their relative composition varies by many orders of magnitude from event to event (Reames, 2013; Desai & Giacalone, 2016). The classical paradigm divides SEP events into two categories, see, e.g., Reames (1999): those particles that are accelerated at SFs (Aschwanden, 2002) are known as impulsive events, and other particle populations that are accelerated by near-Sun CME-driven shocks are termed as gradual (Reames, 1999, 2002; Kahler, 2001; Cane & Lario, 2006). However, this “two-class” picture does not match the diversity and wealth of the observed SEP event properties, which indicates a more complex nature (Cane et al., 2010; Papaioannou et al., 2016; Vlahos et al., 2019).

SEP events have a direct space weather impact on electronics and humans (Baker, 2004). For example, the survivability of a spacecraft is directly affected by the total energy deposited by the passage of energetic particles. This may result in the degradation and ultimate failure of its electronic components due to ionization or displacement damage mechanisms (Daly et al., 1996; Feynman & Gabriel, 2000). Effects on electronics also include single event effects (SEEs) (Pellish et al., 2010); these appear when particles deposit sufficient energy or charge in a sensitive region of a component, with non-permanent (soft) errors such as bit flips, as well as, permanent (hard) errors such as latchups (i.e., SELs) and burnouts (SEBs) taking place (Sexton, 2003; Harboe-Sørensen, 2013). Furthermore, SEP events are a major threat to human spaceflight outside the protective shield of the Earth’s magnetosphere (Bizzarri et al., 2017; Townsend, 2021) and pose a severe danger for aircrews and passengers on polar flights (Mishev et al., 2015; Tobiska et al., 2015; Miroshnichenko, 2018). The ionizing particle radiation can lead to damage to human cells and DNA alternations (Azzam et al., 2012). There are two basic parameters that have a decisive role in the radiation effects in humans: the strength of the exposure and the specific organ/tissue that was encountered. For example, short-term exposure to high doses of radiation depositing energy in the eye can lead to acute radiation effects such as cataracts. Long-term exposure to low doses of radiation deposing energy in bone marrow may progressively lead to leukemia or other types of cancer. The former effects are categorized as deterministic while the latter as stochastic (Cucinotta et al., 2003). The high-energy tail of the SEP spectrum, in which ions are accelerated to relativistic energies, is dominant in spacecraft orbits well within the magnetosphere. In addition, high-energy particles, upon interaction with matter, e.g., the shielding material of a spacecraft, produce secondaries that can enhance the radiation effect of SEP events and are of major concern in heavily shielded environments such as human spaceflight (Raukunen et al., 2018). Nonetheless, these high-energy particles can also reach the Earth’s atmosphere and generate secondaries through nuclear reactions. Consequently, these secondaries are recorded as a significant sudden increase at ground level, as detected by e.g., neutron monitors (NMs). Such events are termed ground-level enhancements (GLEs) (Butikofer et al., 2009; Asvestari et al., 2017; Mishev et al., 2018). This results in enhanced ionization together with modifications of the local chemistry of the high polar atmosphere of the Earth (Usoskin et al., 2011; Mironova & Usoskin, 2014).

SEPs, along with trapped radiation in planetary environments, are responsible for cumulative (dose) effects on spacecraft electronics and materials under low and moderate amounts of shielding. These are modeled statistically to produce specification (climatological) models to predict degradation over the duration of a mission. Such models make use of fits to SEP flux data in the form of a lognormal, truncated power law, and exponential cut-off power-law distributions to derive fluxes as a function of mission duration and confidence (Jiggens et al., 2018; Raukunen et al., 2018 and references therein). However, these models cannot make short-term forecasts of the SEP environment to provide warnings for operational missions, human spaceflight, launch operators, and aircraft operators.

A small fraction of all solar flares and CMEs lead to SEP events, and as a result, the prognosis of SEPs is not a trivial task since this is a highly imbalanced problem (see the relevant discussion in Lavasa et al., 2021). The scientific questions that one needs to address in such studies (see e.g., Anastasiadis et al., 2019) are summarized as follows: If we know the characteristics of the parent solar events, could the probability of SEP occurrence be reliably inferred, and how do the characteristics of SEP events (e.g., peak flux) map to the characteristics of their parent solar events? Both questions are being largely addressed through the implementation of databases and the establishment of empirical and/or semi-empirical statistical relations (see e.g., Gopalswamy et al., 2003, 2004; Garcia, 2004a, 2004b; Belov et al., 2005; Laurenza et al., 2009, 2018; Núñez, 2011; Trottet et al., 2015; Dierckxsens et al., 2015; Anastasiadis et al., 2017; Kahler & Ling, 2018; Papaioannou et al., 2018b; Richardson et al., 2018). The underlying idea is to identify a proper proxy (or combinations of proxies) that can be used for the unfolding of patterns and relationships among the parameters of SEP events and their parent solar events, using observational evidence at hand.In turn, such empirical relations point to the underlying physical processes of the SEP generation (Balch, 2008). For example, Laurenza et al. (2009) provide a prognosis of the SEP occurrence based on solar flare location, size (i.e., soft X-ray (SXR) fluence), and evidence of particle escape (i.e., radio fluence at ~ 1 MHz); Papaioannou et al. (2018b) provides short-term forecasts of the SEP occurrence and the corresponding peak flux utilizing CME characteristics (e.g., width and speed); Kahler & Ling (2018) has used the SXR peak flux ratio, following Garcia (2004a), to establish the probability of SEP occurrence, and Richardson et al. (2018) used the CME speed and the direction relative to the observer’s site (i.e., magnetic connection) to predict the peak intensity of protons. In addition, the work from Posner (2007) has proven the concept of short-term forecasting of the appearance and intensity of solar ion events by means of relativistic electrons, making use of the higher speed of these electrons propagating from the Sun to 1 AU. Hence, it appears that such empirical or semi-empirical relations can be used for forecasting solar radiation storms. Given the complexity and the incomplete knowledge of the underlying physical mechanisms at work, recent studies attempt to make use of higher-order statistical relations (see e.g., Papaioannou et al., 2018a) and machine learning approaches (see Lavasa et al., 2021) in order to infer the probability of SEP occurrence. Such studies make use of the complete parameter space at hand, utilizing both solar flare and CME characteristics, and have shown promising results. In this work, the Probabilistic Solar Particle Event Forecasting (PROSPER) Model is presented. This model is integrated into the advanced solar particle event casting system (ASPECS) operational tool that provides predictions of the probability of occurrence of SEP events, the expected peak flux, and the resulting SEP time profile (http://phobos-srv.space.noa.gr/). PROSPER applies a novel, data-driven methodology to predict SEP events probabilistically for a set of integral energies, namely E > 10, > 30, and > 100 MeV.1 As it is going to be presented in detail, PROSPER takes advantage of the Bayes theorem (Bayes & Price, 1763), taking into account all observational evidence at hand without any bias. Bayesian approaches have been increasingly applied to the field of solar physics (Arregui, 2022) and solar flare forecasting, in particular (see e.g., Wheatland, 2004, 2005, and reference therein). The model comes with three modes of operation depending on the available inputs : (a) CME characteristics (width, speed); (b) SF characteristics (longitude, magnitude), and (c) combinations of both CME and SF characteristics. The methodology is outlined and discussed in Section 2. The application of the methodology to all modes of operation is detailed in Section 3. Validation results for a set of case studies are presented in Section 4. Finally, the results are summarised and discussed in Section 5.

2 Analysis

2.1 Data

For the development of the PROSPER model, a catalog of SEP events that includes 314 SEP events from 1984 to 2013 was utilized (e.g., Papaioannou et al., 2016). This SEP event catalog is based on geostationary operational environmental satellite (GOES)/energetic particle sensor (EPS) data (for details on the data set, see Sandberg et al., 2014) and it includes key information on the proton peak flux and the total fluence of the identified SEP events in the three integral energy channels (namely, E > 10; >30; >100 MeV).2 It further includes the associated solar sources of the SEP events in terms of solar flares and CME characteristics. In particular, it includes solar SXR flux measurements provided by GOES, recorded in the same period, including 35,306 C, M and X class flare events (ftp://ftp.ngdc.noaa.gov/STP/space-weather/solar-data/solar-features/solar-flares/x-rays/goes/). From the initial sample of SXR flares, 14,818 events for which the location was not available were excluded, leading to a sample of 20,429 C, M, and X class flares. At the same time, the CME identifications (i.e., plane-of-sky speeds and angular widths – AW) are made by the large angle and spectroscopic coronagraph (LASCO) (Brueckner et al., 1995) onboard the solar and heliospheric observatory (SoHO) in the period from 1997 to 2013. The initial CME sample consists of 22,143 events, which was reduced to 3693 events when association criteria between CMEs and solar flares were applied (see details in Papaioannou et al., 2016).3 The CME identifications utilized in this study are included in the coordinated data analysis web (CDAW) online CME catalog (https://cdaw.gsfc.nasa.gov/CMElist/; Gopalswamy et al., 2009).

2.2 Mathematical formulation of PROSPER

In this section, the general methodological approach for the implementation of PROSPER is presented. PROSPER provides both the probability of SEP occurrence P(SEP) (Sect. 2.2.1) and the expected peak proton flux (Sect. 2.2.2). The first part starts with the construction of cumulative distribution functions (CDFs) by the data, continues with the implementation of probability density functions (PDFs), and concludes with the application of the Bayes formula, from which the P(SEP) is directly provided. The second part presents the establishment of the peak proton flux with the construction of the relevant CDFs.

2.2.1 Probability of SEP occurrence

This section presents the procedure for the establishment of P(SEP) within the PROSPER mode of operation (a), i.e., utilizing CME characteristics alone. PROSPER is applied to E > 10, >30, (>60), and >100 MeV and for three modes of operation i.e., (a), (b), and (c), in Table 1, which provides an overview of the modes of operation for PROSPER, the inputs used, as well as, the bins applied based on the characteristics of their parent solar events. In particular, CMEs are grouped with respect to their angular width (see details in Papaioannou et al., 2018b), while solar flares by their longitude. Solar events originating in the west of the Sun, as observed by an observer on Earth, are more likely to produce large SEPs detectable on Earth (see e.g., Swalwell et al., 2017), especially those that lie within W20°–W80° (see Cliver et al., 2012, and references therein). From our sample, roughly half of the SEP events (~46%) are associated with a solar flare originating west of W20°, hence our choice of positional requirement in the algorithms. Such events are termed as: “well connected” whereas those east of W20° as “poorly connected”.

Table 1

Details on PROSPER’s modes of operation, inputs and initial binning.

Column 1 denotes the mode of operation, column 2 declares the inputs used per mode, column 3 details the initial bins applied to the data based on the inputs, and column 4 presents the resulting number of functions per bin. The numbers in the brackets of column 4 denote the number of resulting functions per mode and integral energy. These functions are the P(SEP) ones, which are going to be detailed in the following steps for each of the selections. Hence for each integral energy (e.g., E > 10 MeV) there are 3 f(V CME), 2 f(F SXR), and 6 f(V CME, F SXR) functions in accordance with column 3. PROSPER is applied independently to each of the integral energies considered.

Implementation of cumulative distribution functions

First, we bin the data in accordance with Table 1. The empirical cumulative distribution function (CDF) for each sub-sample was constructed by sorting the events in the ascending order of the continuous parameter (CME speed and/or solar flare magnitude; see also Table 1) and using the standard fractional ordinates as probabilities. As parent solar events (i.e., parameterized by flares and/or CMEs) can be either associated with SEP events or not, it is possible to construct two CDFs: one for each sample, i.e., one for all parent solar events (i.e., parameter, Λ) of the specific binning: P(Λ < λ)4 and one for all parent solar events that were associated with a SEP event in that bin, P(Λ < λ | SEP).

Next, the resulting (filtered) data were fit with a log-normal distribution (see e.g., Raukunen et al., 2018), which was found to be the most robust of the flux distributions applied to SEP fluxes for specification models in this work. This is given by the following equation:

(1)

where F(λ) is the probability of Λ being lower than λ, erf(x) is the error function, and μ and σ are the mean and standard deviation of log10(λ), respectively.

Implementation of the probability density functions

These CDFs were consequently used in order to construct probabilities for which the CME speed/flare magnitude, λ lies within a certain range defined by an upper and a lower limit, i.e., ∈[λ 1, λ 2]. The resulting independent probability of observing α, i.e., the probability of Λ to fall within a certain range of values is given by the subtraction of two CDFs constructed by all data points in a sample one for the lower limit, i.e., Λ < λ 1 and one for the upper one, i.e., Λ < λ 2 leading to P(Λ < λ 2) and P(Λ < λ 1). The subtraction of which results in:

(2)

The independent probability of observing β, i.e., the probability of SEP events in a sample, is based on the number of SEP associated events (N SEP) and the total number of events (N total) in the sample and is defined as:

(3)

The conditional probability of observing α under the condition of observing β, P(α | β), i.e., the probability of the Λ to fall within a certain range of λ, under the condition that these values refer to SEP events. This is also given as the subtraction of two CDFs. In particular, the first CDF refers to the lower limit, i.e., Λ < λ 1 | SEP, and the second to the upper one, i.e., Λ < λ 2 | SEP leading to P(Λ < λ 1 | SEP) and P(Λ < λ 2 | SEP). As a result the conditional probability is defined as:

(4)

Using the Bayes theorem (Bayes & Price, 1763; Joyce, 1999) one may define the inverse conditional probability:

(5)

which is a PDF and provides the probability of SEP occurrence when Λ ∈ [λ 1, λ 2]. The Bayes formula dictates that:

(6)

Merging equations (2)(6), the following formula is obtained:

(7)

Hence, equation (7) shows the probability to have an SEP event under the condition that Λ ∈ [λ 1, λ 2]. For example, in the case of CMEs, substituting Λ with V CME, λ 1 with V 1 and λ 2 with V 2, equation (7) becomes:

(8)

Moreover, setting V 1 = V and V 2 = V + dV and dividing both nominator and denominator by dV while letting dV → 0 results to:

(9)

Both equations (8) and (9) provide the probability of occurrence of an SEP event given a CME of speed, V.

In equation (9), f(V) = P′(V CME < V) is the probability density function (PDF) of the CME speeds for the binned sub-sample of choice (see Table 1) and f SEP(V) is the PDF for those CMEs of the bin that are associated with SEP events. In order to construct the PDFs (i.e., f(V) & f SEP(V)) from the relevant CDFs, there are two ways: (a) apply numerical differentiation (labeled as Method 1) and (b) fit the data points with an empirical CDF and directly take the derivatives of the relevant distributions, as a function of the CME speed (V CME, [km/s]) (labeled as Method 2). As it can be seen in Figure 1, for the case of Halo CMEs and E > 10 MeV SEP events, the output is equivalent. Therefore, from the empirical CDFs, which are based on data, and the applied log-normal fits, one can directly obtain the relevant PDFs, and from the application of the Bayes formula (Eq. (9)) obtain an analytical expression that answers to the question: What is the probability that an SEP event [P(SEP)] will occur if one knows the CME speed (V CME ) (and width) of the driving CME?

thumbnail Fig. 1

An illustration of the two methods (Method 1; (blue color) and Method 2; (red color)) as a superposition investigated in this work. Both fits in the plots correspond to the Probability Distribution Functions (PDFs) for the case of Halo CMEs, with respect to the CME speed V CME. See text for details

Equation (9) also stands for the case of solar flares. In this case, the application of the Bayes formula based on solar flare longitude and flux (F) results in:

(10)

At this point, it is noted that equation (6) acquires very large values and tends to infinity as P(α) → 0. For this reason, using some elementary mathematical properties of sets and probabilities, equation (6) can be written in a form that is more stable and appropriate for numerical calculations, as:

(11)

with β c being the complementary event of β (Maritz & Lwin, 2018). Although the denominators of equations (6) and (11) are equivalent, the former →0 faster than the latter one. As a result, equation (11) has an optimal numerical behavior and allows one to obtain outputs at a wider range compared to equation (6) before the denominator of equation (6) →0. Therefore, the formulas used in PROSPER calculations are the more complex expressions of equations (7)(10) based on the substitution of equations (2)(5) in equation (11).

Additionally, in the case of a solar flare and an associated CME, the Bayes formula used in PROSPER gets the general form:

(12)

where i refers to CME AW, i.e., Halo, Partial Halo, Non-Halo and j to the solar flare longitude (“well” and/or “poorly” connected); see details per selection in Table 1). P(β i,j ) and are calculated directly by the measurements.

2.2.2 Peak flux estimation

PROSPER provides a peak flux estimation for a given (user-defined) confidence based on the P(SEP) described here above. First, the data are filtered based on the P(SEP) that was derived in Section 2.2.1. In particular, once P(SEP) is obtained, peak flux data are filtered based on the integral energy that the SEP event is expected to reach. That said, if i.e., a SEP event is expected to reach E > 100 MeV, only the events from the list with significant peak fluxes reaching E > 100 MeV are employed across all integral energies. The same stands for filtering the data samples based on other integral energies based on the obtained P(SEP) (i.e., if an event is expected to reach E > 30 MeV). In any case, the presence of a high energy population is an important filter for lower energy fluxes since it ensures spectral coherence. Next, additional binning is applied using the available characteristics of the parent solar events (i.e., SF magnitude & location, CME speed & width). Once the data samples are retrieved, we utilize the tabulated peak proton flux (PPF) for the SEP events in the list employed in Papaioannou et al., (2016) in order to derive the relevant distributions.

The filtered peak fluxes were fit with the exponential cut-off power law that was found to be the most robust of the flux distributions applied for specification models (consistent with findings from both Jiggens et al., 2018; Raukunen et al., 2018). This is given by the following equation:

(13)

with γ being the power-law exponent, x low a parameter related to the lower limit of the F P distribution, and x lim a parameter related to the upper limit of the F P distribution.

Equation (13) answers the question: What is the expected probability that the peak flux will exceed a certain threshold (PPF 0 ) for a given integral energy, provided that it is certain an SEP event will occur? From equation (13), one can find the expected peak flux that corresponds to a specific probability threshold, solving equation (13) for PPF0 for a given value of P(F P  ≥ PPF0 | SEP) = P thres. Following this, the P(SEP | Λ = λ 0) is used to modulate the prediction of the expected peak flux. In essence, a weighted average is employed, driven by the obtained P(SEP | Λ = λ 0). There are two components in this averaging: the first one is the PPF0 for a specific user-defined threshold multiplied by P(SEP | Λ = λ 0), and the second one is the background value of the specific integral energy multiplied by the 1 − P(SEP | Λ = λ 0). The averaging leads to PPFthres, and the obtained P(SEP | Λ = λ 0) is used as a weight. To this end PROSPER provides F Pthres obtained from equation (14):

(14)

In the limiting cases of the binary extremes of P(SEP | Λ = λ 0), the following expression is obtained:

(15)

The background flux was calculated on the basis of the recorded intensity of each integral energy of interest (i.e., E > 10−; >30−; and >100 MeV)5 and corresponds to the mean value obtained in September 2009 (e.g., during the solar minimum), which is: 0.23, 0.122, 0.089 and 0.050 pfu, respectively (see Fig. 2). Evidently, depending on the CL, thres of choice (by default in the ASPECS tool, PROSPER provides thres = 50% and 90%, which were defined by the user community), and thus the model provides PPF50 and PPF90.

thumbnail Fig. 2

The distributions of the GOES/EPS data per integral energy of interest for September 2009. The relevant percentiles of 68–95% & 99.7% of the distributions are depicted as vertical dashed lines. The mean values of these distributions (presented as solid vertical lines) are used as the background level per channel in equation (14).

3 Results

3.1 Coronal mass ejections

In the case of CMEs, PROSPER’s concept was applied across three CME AW bins (see Table 1), for all integral energies of our database (E > 10; >30; (>60) and >100 MeV) and for both the distribution of all CMEs speeds (V CME) in each sub-sample, as well as, the distribution of the speeds of CMEs related to SEPs. Figure 3 depicts the obtained empirical CDFs from the data (i.e., P(V CME < V)). From top to bottom, on the left-hand side, the column refers to all Halo, all Partial Halo, and all Non-Halo CMEs of our sample. Additionally, similar plots for P(V CME < V | SEP) for integral energy of E > 10 MeV are provided on the right-hand side of the plot, for each case, following the same labeling. Each sub-plot of Figure 3 presents the data points (in black color) for each distribution. Additionally, the log-normal fit (from Eq. (1)) to these points is presented as a red-line, with the fraction of the data (in %) represented by the fit. Finally, the mean absolute error (MAE)6 was calculated in each case and is imprinted on every plot. Although the related results are not shown in Figure 3, it is noteworthy that when shifting to higher energies, i.e., from E > 10 MeV to E > 100 MeV, a reduction in the number of SEP events that reach higher energies is apparent. In turn, fits for these energies rely on fewer points.

thumbnail Fig. 3

The observational Cumulative Distribution Functions (CDFs) constructed by the database (i.e., all black points correspond to actual data). The red line depicts the log-normal fit to the data per case. Each plot further includes the fraction of the data explained by the fit (in [%]) (i.e., a 100% fraction means that all data points have been used in the corresponding fit), the mean absolute error (MAE), and the fit that was used (i.e., log-normal). From top to bottom, the column on the left-hand side corresponds to the CDFs per CME width (Halo, Partial Halo, Non-Halo) for all CMEs in each sample. The column on the right-hand side corresponds to the CDFs in case the CME was associated with a SEP at an energy of E > 10 MeV, at each width bin. Similar fits (and plots) have been constructed for all other SEP integral energies of interest but are not shown.

A combined representation of these CDFs, but for all integral energies of interest (e.g., E > 10; >30; (>60) and >100 MeV) is given in Figure 4.7 Each panel provides the fitted CDFs for the P(V CME < V) (black line) and P(V CME < V | SEP) distributions – the latter for each integral energy, color-coded as: E > 10 MeV – red line; E > 30 MeV – blue line; E > 60 MeV – green line and E > 100 MeV – orange line. From top to bottom, panels refer to Halo, Partial Halo, and Non-Halo CMEs, respectively. The PDFs from the derivatives of the distributions f(V) and f SEP(V) are consequently presented in Figure 5, following the same reasoning (and labeling) as in Figure 4. It should be noted that the X-axis (i.e., V CME) in all panels extend to a simulated value of 10,000 km/s. However, the actual data point to an upper limit V CME of ~3000 km/s (see Fig. 3). That said, the derived Bayes P(SEP) is obtained up to that V CME. However, we present the whole evolution of the fit, and we overlay a gray border hatched rectangle area after that limit.

thumbnail Fig. 4

The Cumulative Distribution Functions (CDFs) for the case of Halo CMEs (top panel); Partial Halo CMEs (middle panel), and Non-Halo CMEs (bottom panel). The black color represents all CMEs in the respective sample, whereas CMEs associated with SEP events for different integral energies are color-coded. The gray border hatched rectangle area provides the limit for the V CME.

thumbnail Fig. 5

The Probability Distribution Functions (PDFs) for the case of Halo (top panel), Partial Halo (middle panel), and Non-Halo (bottom panel) CMEs, as derived by the CDFs, applying Method 2. The black color represents all CMEs in the sample, whereas CMEs associated with SEP events for different integral energies are color-coded. The gray border hatched rectangle area provides the limit for the V CME.

The final step in this formulation is to employ the Bayes formula (Eq. (9)) and retrieve the fits that reply to the question: what is the probability of SEP occurrence from a CME with known width (AW) and speed (V CME )? These fits are presented in Figure 6. Each panel corresponds to one integral energy of interest, and each line within every panel refers to Halo (blue line); Partial Halo (red line), and Non-Halo (green line) CMEs. Hence, for each integral energy, a total of 3 f(V CME) functions are obtained (see Table 1). The abscissa of Figure 6 gives the V CME, and the ordinate provides the probability of SEP occurrence. Hence, for a CME with known AW, one may select the proper fit and from the known V CME directly obtain the expected probability, P(SEP). It can be noted that up to the limit of V CME ~ 3000 km/s for any given CME speed, the largest P(SEP) is obtained for Halo (blue line), followed by Partial Halo (red line) and Non-Halo (green line). Moreover, both the red and the green curves would have similar behavior as the blue curve if the X-axis had been expanded to >10,000 km/s. In addition, when crossing the V CME ~ 3000 km/s limit, it can be seen that Partial Halo (red line) and Non-Halo (green line) CMEs lead to a higher P(SEP) compared to Halo-CMEs (blue line). This is the output of the application of equation (9), and it points to the possibility of getting a higher P(SEP) for Partial and Non-Halo CMEs provided that these CMEs are faster than any CME ever observed. Nonetheless, applying an upper limit for V CME inferred by the actual sample results in higher probabilities for Halo CMEs up to that limit. That said, all observed CMEs with V CME > 3000 km/s are treated as having a V CME = 3000 km/s.

thumbnail Fig. 6

The curves for the estimation of the probability of SEP occurrence when both the speed (V CME, X-axis) and the width (Halo, Partial Halo, and Non-Halo) of a CME are known. Each panel corresponds to an integral energy (i.e., E > 10−; >30−; >60− and >100 MeV). Within each panel, three fits are presented, one per CME width – color-coded as: Halo (blue), Partial Halo (red), and Non-Halo (green) CMEs. The gray border hatched rectangle area provides the limit for the V CME. See text for details.

As noted in Section 2.2.2, based on user consultations, a focused probability range from 50–90% has been selected while taking into account, as a filter, the probability of deriving a SEP event at respective energy. If a SEP event is expected to reach e.g., E > 100 MeV, the data for all energies are binned based on this energy filtering, and consequently, the probabilities are derived based on CME speed and width. The obtained results for the case of CME inputs are depicted in Figure 7. There are three different filters based on the predicted integral energy that the particles will reach and different bins based on the CME characteristics. In particular, panels (a) and (b) of Figure 7 refer to SEP events that are expected to reach an integral energy of E > 10 MeV. The red (blue) lines depict the fast (slow) bins, and each of the panels refers to Halo (panel (a)) and Not Halo (panel (b)) CMEs. In addition, panels (c–f) of Figure 7 refer to SEP events that are expected to reach integral energy of E > 30 MeV. The blue (orange) lines in each of the panels present the E > 10 (E > 30) MeV fits, whereas each of the panels stands for a different selection of AW and CME speed – panel (c): Halo and slow; panel (d) Halo and fast; panel (e) not Halo and slow and panel (f) not Halo and fast CMEs. Finally, panel (g) of Figure 7 represents the fits for filtering at E > 100 MeV and each line corresponds to integral energy of interest color-coded as: red (E > 10 MeV), green (E > 30 MeV), and blue (E > 100 MeV). All bins and selections are detailed in Table 2.

thumbnail Fig. 7

The CDFs constructed by the database (i.e., all points correspond to actual data) for each integral energy of choice for the estimation of the peak flux (Peak flux [pfu], X-axis), when both the speed and the width of a CME are known. Lines depict the exponential cut-off fits to the data points per case (from Eq. (13)). Panels: (a) and (b) correspond to a filtering integral energy at E > 10 MeV; (c–f) to E > 30 MeV and (g) to E > 100 MeV). See text and Table 3 for details on the bins and the obtained parameters of the fit per case.

Table 2

Details on PROSPER’s peak flux fits for the CME bins, derived by equation (13). The filtering on the integral energy is first applied, and then there are two bins. One for the width of the CME (i.e., AW = 360°Halo or AW < 360° Not Halo) and one for the CME speed (i.e., V CME being < or ≥ 1250 km/s). These fits are presented in Figure 7.

3.2 Solar flares

A similar probabilistic approach was further applied for solar flare inputs replacing V CME with SXR peak flux and imposing two longitudinal source location ranges: well connected (longitude ≥ 20°), poorly connected (longitude < 20°) in place of CME angular width. Applying the same methodology, we obtained the following fits presented in Figure 8. These fits are similar to Figure 6 but for the SF case. Each panel corresponds to one integral energy of interest, and each line within every panel refers to well-connected (red line) and poorly connected (blue line) SFs. As a result, for each integral energy, a total of 2 f(F SXR) functions are obtained (see Table 1). The X-axis of sub-panels in Figure 8 gives the F SXR, and the Y-axis provides the probability of SEP occurrence. Hence, for a solar flare with known longitude, one may select the proper fit and, from the magnitude of the solar flare, directly obtain the expected probability, P(SEP). In this case, data point to an upper limit F SXR of X28, and thus the derived Bayes P(SEP) is obtained up to that limit. Again, the total evolution of the simulated fit is presented and we overlay a gray hatched area after that limit.

thumbnail Fig. 8

The curves for the estimation of the probability of SEP occurrence when both the SXR magnitude and the longitude of a solar flare are known. Each panel corresponds to integral energy (i.e., E > 10−; >30−; >60− and >100 MeV). Within each panel, two fits are presented, one per longitudinal bin – color-coded as: well connected (red) and poorly connected (blue) solar flares. See text for details. The gray border hatched rectangle area provides the limit for the F SXR. This is similar to Figure 6.

In the case of the peak flux, the focused range of 50 and 90% is maintained across all modes of operation. Figure 9 provides the outputs per filtered integral energy and bins based on solar flare characteristics. Specifically, Figure 9a refers to filtering of E > 10 MeV with each of the presented fits color-coded as: red (F SXR < M3.0), green (M3.0 ≤ F SXR < X1.0), and blue (F SXR ≥ X1.0). Moreover, in Figure 9, panels (b), (d), and (f) refers to filtering at E > 30 MeV. Red (green) fits present the integral energy of E > 10 (E > 30) MeV, while each of the three panels points to individual solar flare bins. Finally, panels (c), (e), and (g) of Figure 9 are obtained with filtering at E > 100 MeV. In this case, each of the lines represents integral energy: red (E > 10 MeV), green (E > 30 MeV), and blue (E > 100 MeV). Again, each of these three panels has to do with a selection based on solar flare magnitude. All of the fits depicted in Figure 9 are detailed in Table 3.

thumbnail Fig. 9

The CDFs constructed by the database for each integral energy of choice for the estimation of the peak flux (Peak flux [pfu], X-axis), when the SXR solar flare magnitude is known. Lines depict the exponential cut-off fits to the data points per case. Panel (a) corresponds to a filtering of the integral energy at E > 10 MeV. The left-hand column (panels (b), (d), (f)) corresponds to a filtering integral energy at E > 30 MeV, and the right-hand column (panels (c), (d), (g)) to a filtering integral energy at E > 100 MeV. Black color denotes the SXR flare flux bin in each panel. See text and Table 3 for details on the bins and the obtained parameters of the fit per case.

Table 3

Details on PROSPER’s peak flux fits for the SF bins, derived by equation (13). The filtering of the integral energy is first applied, and then there are three bins based on the magnitude of the solar flare. These fits are presented in Figure 9.

3.3 Solar flares & CMEs

For the case of a combination of a solar flare (with known magnitude and longitude) and a CME (with known AW and speed) being identified, equation (12) was employed but enforcing more conditions, depending on the obtained characteristics (e.g., flare longitudinal bin and magnitude, CME width and velocity). In this case, all combinations were considered (e.g., Halo, Partial Halo, and Non-Halo, together with well and poorly connected) for all respective energies. Figure 10 depicts the derived probability of SEP occurrence using both flare and CME inputs for the case of Halo and well-connected events for integral energy of E > 10 MeV. Each line corresponds to a specific solar flare class. In particular, C1.0 (black), M1.0 (blue), X1.0 (red) and X10.0 (magenta) lines are shown. Based in the above, the X-axis of Figure 10 is extended up to V CME = 3000 km/s. From this plot, one can derive the probability of getting a SEP event, in case the longitude of the parent solar flare falls within the bin (lon ≥ 20° – well connected) and the corresponding CME is a Halo one (AW = 360°). Then from the specific magnitude of the flare, the proper curve (fit) is selected, and the speed of the CME (X-axis on the plot) leads to the probability of SEP occurrence (Y-axis on the plot). Similar fits have been constructed for all combinations of: (lon. × AW × integral proton energy). As a result, equation (12) was applied in total in 18 cases (not shown).

thumbnail Fig. 10

The curves for the estimation of the probability of SEP occurrence when both a CME and a solar flare have taken place, and thus the SXR magnitude (X-axis) and the longitude of a solar flare, as well as, the CME speed and AW are known. The figure presents the output of PROSPER for the case of E > 10 MeV with Halo CMEs (AW = 360°) and well-connected flares (lon ≥ 20°). Each fit corresponds to a solar flare class, color-coded as: C1.0 (black line), M1.0 (blue line), X1.0 (red line) and X10 (magenta line). The X-axis shows the speed of the CME (in km s−1).

Figure 11 shows the obtained fits per filtered energy and bins based on all obtained inputs. Similar to the previous cases, the fits are presented in detail in Table 4. As a rule, the dashed lines in the case of E > 10 MeV (Fig. 11a) and E > 100 MeV (Figs. 11b, 11d, 11f) present the bins with the largest CME speed per case for a given solar flare flux bin, while the continuous lines depict the cases with the slower CMEs for the same solar flare bins (see details of the bins in each panel of Fig. 11 and Table 4). Panel (a) of Figure 11 shows E > 10 MeV fits for six different selections of solar flares & CMEs (see detail in Table 4). Each pair of lines (i.e., solid and dashed) colored with the same color refer to a similar solar flare bin. In particular, these are: F SXR < M3.0 (red); M3.0 ≤ F SXR < X1.0 (blue) and F SXR ≥ X1.0 (magenta) (see panel (a)). Moreover, panels (b), (d), and (f) of Figure 11 represents the case of applying filtering at E > 100 MeV. In all these panels, each line refers to integral energy color-coded as: red (E > 10 MeV), blue (E > 30 MeV), and magenta (E > 100 MeV). However, there was no event associated with a solar flare of magnitude <X3.0 for any CME speed, reaching integral energy of E > 100 MeV. Therefore, the relevant panel (f) in Figure 11 shows only the fits for events associated with ≥X3.0 for each of the three CME bins (i.e., only dashed lines) (see Table 4). In addition, no reliable fit for SEP events associated with solar flares of magnitude <M6.5 and CME speed V CME < 1350 km/s, at E > 10 MeV, when a filter on reaching an integral energy of E > 100 MeV is applied, was found (see Fig. 11b and Table 4). Hence, there is no solid red line displayed in this panel. Finally, panels (c)–(e), (g)–(i) and (h)–(j) present the cases with filtering at E > 30 MeV. Each panel provides two fits color-coded per integral energy as: E > 10 MeV (red) and E > 30 MeV (blue). Each of the mentioned pairs of panels refers to the same solar flare bin (i.e., F SXR < M3.0; M3.0 ≤ F SXR < X1.0 and F SXR ≥ X1.0) while differing in the CME speed as detailed in Table 4. As it can be seen,in two cases where an event reaches E > 30 MeV there seems to be a cross-over between the obtained fit at E > 10 and E > 30 MeV (i.e., panels (c) and (g)). However, these crossings are below a threshold of 40%. Evidently, if a CL < 50% is chosen, then in this particular case (i.e., E > 30 MeV) a spectral incoherence arises. This means that in future work, an update of the selection in PROSPER for the solar flare & CME case that will lead to spectral coherence in all cases for lower CL should be pursued.

thumbnail Fig. 11

The CDFs constructed by the database (i.e., all points correspond to actual data) for each integral energy of choice. See text and Table 4 for details on the bins and the obtained parameters of the fit per case. Each panel further displays the selected bins and the displayed energies if more than one is included in a panel.

Table 4

Details on PROSPER’s peak flux fits for the combined solar flares & CME bins, derived by equation (13). The filtering on the integral energy is first applied and then bins on the solar flare magnitude and CME speed are applied. These fits are presented in Figure 11.

4 Validation

Table 5 provides all inputs of the NASA CCMC SEP scoreboard campaign events8 that have been investigated in this part of the validation of PROSPER presented in this work. The purpose of this challenge is to facilitate the collaboration of as many SEP modelers as possible and to provide a standardized validation procedure, with specific inputs being used by all parties. Such a challenge is based on curated events and greatly facilitates the comparison of outputs from different prediction methods. Nonetheless, larger samples of events are needed for a complete evaluation of these methods, including PROSPER. The inputs (i.e., solar flare and/or CME characteristics) have been utilized in order to achieve the PROSPER’s P(SEP) per the mode of operation. PROSPER peak flux outputs are compared to the derived peak fluxes by the SEP challenge at two integral energies, i.e., E > 10 and E > 100 MeV. These values are presented in the last two columns of Table 5. The SEP event on 06 January 2014 was not associated with a solar flare (see Table 5), although a back-sided origin of the driving CME and a (probably occulted) C2.1 solar flare9 have been reported (Thakur et al., 2014). Moreover, the sequence of SEP events at 04 & 06 September 2017 is considered as one event leading to a single peak flux for the E > 10 MeV. A SEP event started on 04 September 2017, with a significant enhancement recorded at E > 10 MeV. However, on 06 September 2017 the strongest solar flare of solar cycle 24 was marked (i.e., X9.3) (Jiggens et al., 2019), and a SEP event was clearly visible in higher energies (up to E > 100 MeV), while also characterized as a sub-GLE (Mishev et al., 2017). Nonetheless, the E > 10 MeV flux continued to be sustained after its initiation on 04 September 2017 and throughout both SEP events lasting several days. Finally, four SEP events did not exceed a threshold of 1 pfu at E > 100 MeV and thus are presented in Table 5 as “N/A” entries. It is important to note that in this part of our analysis, the parameters of Table 5 are taken “as is” from the campaign event website and all comparisons and results are driven by the parent solar event parameters and the observed peak proton fluxes at E > 10 & E > 100 MeV.

Table 5

Details of the SEP events’ solar event parameters (i.e., solar flare and CME characteristics) and observed peak proton flux at E > 10−; & E > 100 MeV, used in the validation, provided under the SHINE/ISWAT/ESWW SEP Model Validation Challenge.

4.1 Probability of detection

Using the associated parent solar events as input parameters (see Table 5), derived the probability of SEP detection per event and for all three PROSPER’s modes of operation (see Table 1). Figure 12 presents the obtained outputs. Results are presented in red (flare mode of operation), blue (CME mode of operation), and green (solar flare & CME mode of operation). Each panel corresponds to each integral energy (i.e., E > 10−; >30−; and >100 MeV). Within the panel of E > 10 MeV (upper panel), the threshold (pt) above which the probabilistic forecast is transformed into a categorical one is over-plotted with a dotted horizontal black line. This pt was directly obtained by Table 1 of Anastasiadis et al. (2017) and corresponded to the threshold above which a SEP event would have been identified. Such a pt is obtained at the perfect skill of a forecasting algorithm and is usually applied for direct discrete analysis of SEP yes/no. However, it must be noted that PROSPER gives a probabilistic forecast. From the inspection of the E > 10 MeV panel in Figure 12, it seems that the PROSPER’s mode of operation based on solar flare input (red bars) would spot almost all of the events: ~90% (i.e., 8/9 events for which flare input was available). The SEP event that would not have been spotted by the PROPSER’s flare mode of operation is the one that occurred on 14 July 2017. This SEP event was associated with a “well-connected” (W33°) solar flare but with relatively modest magnitude (M2.4) that, although led to a probability of SEP occurrence of ~20%, was lower than the pt and thus would not have been spotted by PROSPER. Moreover, the P(SEP) seems to be quite successful (100%) for the CME module (i.e., 10/10 events for which CME input was available were identified as SEP events (i.e., crossing the pt). Adding to this, the 14 July 2017 event that was “missed” by the flare mode of operation was associated with a halo CME with a speed of 1200 km/s – which enhanced the probability of SEP occurrence to ~ 58% in the CME mode and thus pushed it above the pt threshold, resulting into an identification of the event by PROSPER. Finally, the green bars represent PROSPER’s outputs for the solar flare & CME mode of operation. As it can be seen, the hit rate is 100% (i.e., 9/9 events for which both flare and CME inputs were available).

thumbnail Fig. 12

Comparison of the predicted probabilities of SEP detection per PROSPER’s mode of operation for all cases included in Table 5, for each integral energy of interest (i.e., E > 10−; E > 30−; and E > 100 MeV). In each panel, the red histogram represents the outputs of PROSPER based on SF data, the blue histogram represents the outputs of the model based on CME data, and the green histogram represents the outputs of the model based on both SF & CME inputs. In the upper panel, the horizontal line represents the threshold above which most SEP prediction modules would issue a notification for a forthcoming SEP event – see details in Table 1 of Anastasiadis et al. (2017).

These outputs (i.e., P(SEP)) stem from the fits presented in Section 3 here above. In particular, 5/9 (~66%) of the parent solar flares included in Table 5 are “poorly connected” (i.e., longitude < 20°), and 4/9 (~44%) are “well connected” (i.e., longitude ≥ 20°). Hence, the fits used in these cases are presented in Figure 8. Figure 12 shows that at an integral energy E > 10 MeV the largest differences in the achieved P(SEP) are due to the magnitude of the associated solar flare. For example, the first two events on the list are the 07 March 2012 (X5.4/E15 – “poorly connected”) and 17 May 2012 (M5.1/W89 – “well connected”), with the obtained probability being higher for the former of the two (see Fig. 12). When shifting to higher energies (E > 100 MeV), the differences in the obtained P(SEP) values (red bars) become larger because of the additional separation of the fits due to the longitudinal difference. Moreover, the two events on 06 & 10 September 2017 resulted in very high probabilities of SEP occurrence, which seem to be preserved in each integral energy of interest. The specific SEP events are associated with very strong (>X8.0) and “well connected” solar flares, which seem to be dominant. All driving CMEs are Halo and fast thereby all of the P(SEP) values (blue bars) are obtained from the blue lines (fits) of Figure 6, and the differences are due to the V CME inputs, with larger energies requiring a higher value of V CME in order to lead to a larger P(SEP). That said, for the same V CME, the obtained P(SEP) is lower when shifting to higher energies in all tested cases. Finally, for the flare & CME mode of operation (green bars) one can notice that the cases of 12 July 2012 and 11 April 2013 are both “poorly connected” ones, associated with comparable CMEs (see Table 5). The difference between the achieved P(SEP) is driven by the magnitude of the associated solar flare and the former case leads to higher P(SEP) at all integral energies. For the SEP event on 04 September 2017, which is also a “poorly connected” one, the V CME is almost a factor of ~1.7 larger than the previous two cases, although the magnitude of the associated solar flare is M5.5 (i.e., less strong than the previous two cases). However, the obtained P(SEP) is higher than both cases at all integral energies, pointing to the role of the V CME in the establishment of that probability. Finally, the SEP event on 14 July 2017, which is a “well connected” one and associated with a CME of V CME = 1200 km/s leads to a higher P(SEP) compared to the case of the “poorly connected” SEP event at 11 April 2013 at all integral energies. In the latter case, the associated flare is stronger (M6.5 vs. M2.4), but the CME speed is slower (861 vs. 1200 km/s). Thus the better magnetic connection and the higher V CME seem to lead to a higher P(SEP).

4.2 Peak proton flux

Figure 13 provides scatter plots of the predicted peak flux versus the observed peak flux at E > 10 MeV (top row) and E > 100 MeV (bottom row) for each of the three PROSPER’s modes of operation. In particular, the first column corresponds to the PROSPER’s outputs based on SF input; the middle column refers to the outputs of PROSPER utilizing CME input alone, and the third column represents outputs from PROSPER’s third mode of operation that makes use of both SF and CME inputs. Blue circles denote the obtained outputs at a 50% CL (lower limit), and red circles present the same outputs for a 90% CL (upper limit).

thumbnail Fig. 13

Comparison of predicted with observed peak flux for the SEP events of Table 5, based on solar flare (column at the left-hand side), CME (column in the middle), and solar flare & CME (column at the right-hand side) input data. The top row refers to integral energy of E > 10 MeV and the bottom to E > 100 MeV.

The predicted peak flux at E > 10 MeV as a function of the observed peak flux at the same energy, utilizing only SF information (upper panel on the left-hand side of Fig. 13), shows a strong correlation, especially in the lower limit (i.e., 50% CL) with the correlation coefficient (cc) being 0.76 (lower limit) and 0.72 (upper limit). Moreover, more than half of the predicted values seem close to the dichotomous line, indicating a perfect prediction. However, the upper limit predictions (90% CL) show an over-forecasting being above the perfect dichotomous prediction line, with these predictions being within one order of magnitude from the observed peak fluxes, but not always. Nonetheless, the lower and the upper limits seem to capture the actual peak flux quite well at E > 10 MeV. When moving to the CME mode of operation (column in the middle) for E > 10 MeV, a weaker correlation is observed, with the cc being 0.33 (lower limit) and 0.32 (upper limit), respectively. The tendency for over predicting is also visible in this case and holds for the majority of the events, both at the lower and the upper limit. At the same time, the CME mode of operation seems not to provide granularity since all of the points are predicted using an input halo and fast CME and with the expectation of a SEP event reaching E > 100 MeV. Hence the differences are small between events since all outputs are derived by the fits of the bottom panel of Figure 10. As a result, those small differences are corroborated with the small differences between the established P(SEP) (see Eq. (14)). Moving to the results of the combined SF & CME mode of operation (column on the right-hand side), a stronger correlation is achieved (cc is 0.77 – lower limit & 0.71 – upper limit). Furthermore, the predictions at both the lower and the upper limit seem to be very close to the dichotomous line, with a small tendency for over-forecasting in the upper limit predictions. Nonetheless, the predictions at the lower limit for the strongest SEP events (i.e., those achieving the largest peak flux) at E > 10 MeV seem to fall at the perfect dichotomous line.

In the case of E > 100 MeV, the peak flux seems to be captured well by all three modes of operation, with the obtained cc being on average ≥ 0.90. Especially with the upper limit in the case of the solar flare input (panel at the left-hand side at the bottom row) and the flare & CME (panel on the right-hand side at the bottom row) modules, the agreement is quite reasonable with most of the events predicted at the upper limit (90%) being either on the dichotomous line or very close to it (i.e., within less than 1 order of magnitude).

The results presented in Figure 13 underline the inherent difficulty of the SEP characteristics prognosis. However, the usage of the upper (90%) and lower (50%) limit in PROSPER shows promise and seems to capture the expected peak flux of the SEP events reasonably well.

5 Discussion and conclusions

A new probabilistic model, PROSPER, that provides short-term forecasting (nowcasting) of SEP events (i.e., probability of occurrence and corresponding peak flux), has been presented in this work. In particular, PROSPER was implemented based on a straightforward application of the Bayes theorem using probability distribution functions constructed from a large database of solar flares, CMEs, and SEPs. The output of the prediction is the probability of a solar event leading to a SEP event, i.e., P(SEP). As a second step, the estimated peak proton flux of the SEP event is evaluated. PROSPER has three modes of operation based on the inputs received. There is one mode that utilizes solar flare data (magnitude and longitude), one that makes use of CME identifications (AW and speed), and a third one that makes use of the combined input of solar flares and CMEs (see Table 1). The basic implementations of PROSPER presented in this study are summarized as follows:

  • Probability of SEP occurrence. For the identification of P(SEP) in each case, log-normal fits (Eq. (1)) to the data were utilized (see e.g., Figs. 3 and 4), and the resulting distributions were combined under the Bayes formula (Eq. (11)). These Bayes fits were obtained for different integral energies spanning from E > 10−; E > 30−, and E > 100 MeV (see Figs. 6, 8 and 10).

  • Peak fluxes. The modeling of the peak fluxes was based on exponential cut-off fits (Eq. (14)) to cumulative distribution functions (CDFs) constructed by the recorded peak fluxes of the SEP events in our sample. In this case, first, the data are filtered based on the highest expected integral energy that a SEP event will occur (e.g., applying thresholding on the P(SEP) obtained in the previous step and thus utilizing the prediction of an event being expected to reach E > 100 MeV, E > 30 MeV or E > 10 MeV). Detailed fits are presented in Figures 7, 9, and 11, as well as in Tables 2, 3, and 4, respectively.

The ramifications of Bayes’ rule are countless and have gained wide recognition in data driven Space Weather-related topics. Nonetheless, to our knowledge, PROSPER is the first SEP prediction model that allows the estimation of P(SEP) utilizing Bayes’ theorem (see Camporeale, 2019 and references therein). PROSPER allows a direct estimation of P(SEP) based on a previous outcome having occurred in similar circumstances, and in doing so, it takes into account all available data without any a priori bias. As a result, the benefit of such an approach is the natural and principled way of combining prior information with data within a solid decision theoretical framework, providing exact inferences that are conditional on data (Hartigan, 2012).

Moreover, all PROSPER’s modes of operation were validated based on detailed case studies of SEP events selected as part of the NASA CCMC SEP scoreboard challenge (see Table 5). Blind tests with archived parent solar data of these events were applied to PROSPER’s modes in order to derive the probabilities of SEP detection per the mode of operation and per SEP event. The obtained results were discussed in a comparative manner, showing that the magnetic connectivity (i.e., “well connected” events), strong solar flares, and fast CMEs lead to higher achieved P(SEP) values. The validation results of the PROSPER models are based on a sample of 10 SEP events and are summarized as follows:

  • PROSPER’s module that is based on solar flare input would spot ~90% (i.e., 8/9) of the events for which flare input was available.

  • For the CME module, the hit rate is 100% (i.e., 10/10 events for which CME input was available were identified as SEP events).

  • The solar flare & CME mode of operation also has a hit rate of 100% (i.e., 9/9 events for which both flare and CME inputs were available).

Additionally, the largest differences in the achieved P(SEP) in lower energies (i.e., at an integral energy E > 10 MeV) are due to the magnitude of the associated solar flare, F SXR. For the higher energies (i.e., E > 100 MeV), the differences in the obtained P(SEP) values become larger due to the additional separation of the fits, which is driven by their longitudinal difference. Finally, the solar flare & CME module revealed that the better magnetic connection and the higher V CME seem to lead to a higher P(SEP).

Furthermore, we included scattering plots of observed versus predicted SEP peak fluxes to quantify the reliability of PROSPER’s predictions. Although the SEP peak fluxes are difficult to infer based on the characteristics of their parent solar events, the usage of an upper (90%) and lower (50%) limit in PROSPER seemed to capture the expected peak flux of the SEP events, reasonably well. In particular, at E > 10 MeV, for the solar flare module, the predicted peak flux, as a function of the observed peak flux, shows a strong correlation, especially in the lower limit (i.e., 50% CL) with more than half of the predicted values being close to the dichotomous line – indicating a perfect prediction. Nonetheless, the CME module at the same energy seems not to capture the expected peak flux with granularity. However, the combined SF & CME mode of operation shows that the predictions at both the lower and the upper limit seem to be very close to the dichotomous line, with a small tendency for over-forecasting in the 90% limit predictions. Nonetheless, the predictions at 50% for the SEP events achieving the largest peak flux at E > 10 MeV seem to fall at the perfect dichotomous line. Additionally, in the case of E > 100 MeV, the peak flux seems to be captured well by all three modes of operation, especially with the 90% limit in the case of the SF and the flare & CME modules the agreement is quite reasonable with most of the events predicted at 90% being either on the dichotomous line or very close to it (i.e., within less than 1 order of magnitude).

The PROSPER model has been incorporated into the new operational Advanced Solar Particle Event Casting System (ASPECS) tool providing outputs in real-time through the web portal http://phobos-srv.space.noa.gr/. In addition, ASPECS offers the capability to interrogate PROSPER for historical cases via a run-on-demand functionality. ASPECS is the first realization of ESA’s SEP Advanced Warning System (SAWS) – a modular framework for forecasting solar energetic particle (SEP) events, their characteristics, and profiles.

A point that still needs to be addressed in the validation is how the model reacts when solar events not associated with SEPs are introduced as inputs. This is a part of an ongoing follow-up study focusing on the performance of the ASPECS tool. On a preliminary basis, it can be commented that PROSPER achieves categorical scores in line with the expected range for data-driven models. The range of scores are taken from Table 1 of Anastasiadis et al. (2017) (i.e., Probability of Detection – POD ~50–70%, False Alarm Rate – FAR ~30–50%). Such a range is realistic and inherently imposed due to the imbalanced dataset that is being used for SEP prediction efforts (see the relevant detail discussion in Lavasa et al., 2021). In fact, recently, Stumpo et al. (2021) have shown that the major drawback in predicting the occurrence of SEPs, i.e., P(SEP), in the framework of statistical forecasting concepts, is the optimization of FAR which directly depends on the imbalance of the dataset used. That means that the greater the imbalance, the greater the FAR is affected by the presence of false positives. Taking all of these into account, in our next study such a thorough validation step will be implemented.

It should be mentioned that PROSPER’s CME and flare & CME modes of operation are subject to the CME input data used. In particular, while developing PROSPER, CME data from the CDAW CME catalog have been used. It was pointed out by Richardson et al. (2015) that different CME catalogs provide different estimations of the same CME event, hence given that PROSPER is a data-driven model, it is inherently affected by such differences. On top of the CDAW CME catalog, other options include: (a) the Computer-Aided CME Tracking (CACTUS) catalog (https://wwwbis.sidc.be/cactus/), which is compiled using a specialized software package (Robbrecht & Berghmans, 2004); (b) the SEEDS (Solar Eruptive Event Detection System) catalog (http://spaceweather.gmu.edu/seeds/) (Olmedo et al., 2008); (c) the CORIMP (coronal image processing) method catalog (http://alshamess.ifa.hawaii.edu/CORIMP/) (Byrne et al., 2012) and (d) the Space Weather Database Of Notification, Knowledge, Information (DONKI) developed by CCMC (https://ccmc.gsfc.nasa.gov/donki/). Therefore, a natural next step of this study is to adapt PROSPER to each of these catalogs and quantify differences/changes following the study by Richardson et al. (2015).

The usage of CME identifications (e.g., angular width and speed) for the derivation of the SEP occurrence probabilities (P(SEP)) and expected peak proton flux has been explored, already, in a few operational efforts (e.g., Dierckxsens et al., 2015; Papaioannou et al., 2018b; Richardson et al., 2018). On top of that, statistical studies have pointed out that P(SEP) increases as a function of solar flare magnitude (F SXR), longitude, and CME speed (V CME) – especially when considering solar flare-CME couples situated on the west part of the visible solar disk (i.e., Dierckxsens et al., 2015). In addition, it was recently shown that F SXR, V CME, and SXR fluence have the largest potential for discriminating solar events that are associated with SEP events (Lavasa et al., 2021), while as many as targeted parameters need to be taken into account, in order to build a reliable and more accurate SEP predicting system (see e.g., Papaioannou et al., 2018b). PROSPER takes advantage of such findings and comes with three modes of operation. Nevertheless, its performance certainly benefits from the provision of reliable CME estimates or a proper proxy for the CME speed and width, which would be very much desirable to obtain in near-real-time.

Acknowledgments

This work was supported through the ESA Contract No. 4000120480/NL/LF/hh “Solar Energetic Particle (SEP) Advanced Warning System (SAWS)”. Athanasios Papaioannou and Angels Aran acknowledges the support from the project MDM-2014-0369 of ICCUB (Unidad de Excelencia “María de Maeztu”). The CME Catalog used in this work is generated and maintained at the CDAW Data Center by NASA and The Catholic University of America in cooperation with the Naval Research Laboratory. SOHO is a project of international cooperation between ESA and NASA. Athanasios Papaioannou, Rami Vainio & Anastasios Anastasiadis acknowledge the International Space Science Institute and the supported International Team 441: High EneRgy sOlar partICle Events Analysis (HEROIC, http://www.issibern.ch/teams/heroic/). Finally, Athanasios Papaioannou acknowledges support from NASA/LWS project NNH19ZDA001N-LWS. The authors would like to thank Dr Manolis Georgoulis for stimulating discussions, Dr Katie Whitman for fruitful collaboration and exchange of ideas as concerns the validation of PROSPER, and Mr George Vasalos for valuable technical assistance. Furthermore, the authors would like to thank the anonymous referees for critical and constructive reading of the manuscript and for valuable comments that improved the contents of the paper. The editor thanks Ian Richardson and an anonymous reviewer for their assistance in evaluating this paper.


1

PROSPER was further extended to E > 60 MeV but the results are not shown through the ASPECS operational system.

2

PROSPER will further be extended to E > 300 MeV as part of the ASPECS tool.

3

By construction, CMEs associated to far side solar flares are not included in our sample.

4

λ refers to the continuous parameter of Table 1 per case.

6

MAE = , with n being the number of pairs, y and x the fitted and the actual value, respectively.

7

Although E > 60 MeV is not implemented in the ASPECS tool, we present the obtained fits for P(SEP) in this and consequent figures for consistency.

References

Cite this article as: Papaioannou A, Vainio R, Raukunen O, Jiggens P, Aran A, et al. 2022. The probabilistic solar particle event forecasting (PROSPER) model. J. Space Weather Space Clim. 12, 24. https://doi.org/10.1051/swsc/2022019.

All Tables

Table 1

Details on PROSPER’s modes of operation, inputs and initial binning.

Table 2

Details on PROSPER’s peak flux fits for the CME bins, derived by equation (13). The filtering on the integral energy is first applied, and then there are two bins. One for the width of the CME (i.e., AW = 360°Halo or AW < 360° Not Halo) and one for the CME speed (i.e., V CME being < or ≥ 1250 km/s). These fits are presented in Figure 7.

Table 3

Details on PROSPER’s peak flux fits for the SF bins, derived by equation (13). The filtering of the integral energy is first applied, and then there are three bins based on the magnitude of the solar flare. These fits are presented in Figure 9.

Table 4

Details on PROSPER’s peak flux fits for the combined solar flares & CME bins, derived by equation (13). The filtering on the integral energy is first applied and then bins on the solar flare magnitude and CME speed are applied. These fits are presented in Figure 11.

Table 5

Details of the SEP events’ solar event parameters (i.e., solar flare and CME characteristics) and observed peak proton flux at E > 10−; & E > 100 MeV, used in the validation, provided under the SHINE/ISWAT/ESWW SEP Model Validation Challenge.

All Figures

thumbnail Fig. 1

An illustration of the two methods (Method 1; (blue color) and Method 2; (red color)) as a superposition investigated in this work. Both fits in the plots correspond to the Probability Distribution Functions (PDFs) for the case of Halo CMEs, with respect to the CME speed V CME. See text for details

In the text
thumbnail Fig. 2

The distributions of the GOES/EPS data per integral energy of interest for September 2009. The relevant percentiles of 68–95% & 99.7% of the distributions are depicted as vertical dashed lines. The mean values of these distributions (presented as solid vertical lines) are used as the background level per channel in equation (14).

In the text
thumbnail Fig. 3

The observational Cumulative Distribution Functions (CDFs) constructed by the database (i.e., all black points correspond to actual data). The red line depicts the log-normal fit to the data per case. Each plot further includes the fraction of the data explained by the fit (in [%]) (i.e., a 100% fraction means that all data points have been used in the corresponding fit), the mean absolute error (MAE), and the fit that was used (i.e., log-normal). From top to bottom, the column on the left-hand side corresponds to the CDFs per CME width (Halo, Partial Halo, Non-Halo) for all CMEs in each sample. The column on the right-hand side corresponds to the CDFs in case the CME was associated with a SEP at an energy of E > 10 MeV, at each width bin. Similar fits (and plots) have been constructed for all other SEP integral energies of interest but are not shown.

In the text
thumbnail Fig. 4

The Cumulative Distribution Functions (CDFs) for the case of Halo CMEs (top panel); Partial Halo CMEs (middle panel), and Non-Halo CMEs (bottom panel). The black color represents all CMEs in the respective sample, whereas CMEs associated with SEP events for different integral energies are color-coded. The gray border hatched rectangle area provides the limit for the V CME.

In the text
thumbnail Fig. 5

The Probability Distribution Functions (PDFs) for the case of Halo (top panel), Partial Halo (middle panel), and Non-Halo (bottom panel) CMEs, as derived by the CDFs, applying Method 2. The black color represents all CMEs in the sample, whereas CMEs associated with SEP events for different integral energies are color-coded. The gray border hatched rectangle area provides the limit for the V CME.

In the text
thumbnail Fig. 6

The curves for the estimation of the probability of SEP occurrence when both the speed (V CME, X-axis) and the width (Halo, Partial Halo, and Non-Halo) of a CME are known. Each panel corresponds to an integral energy (i.e., E > 10−; >30−; >60− and >100 MeV). Within each panel, three fits are presented, one per CME width – color-coded as: Halo (blue), Partial Halo (red), and Non-Halo (green) CMEs. The gray border hatched rectangle area provides the limit for the V CME. See text for details.

In the text
thumbnail Fig. 7

The CDFs constructed by the database (i.e., all points correspond to actual data) for each integral energy of choice for the estimation of the peak flux (Peak flux [pfu], X-axis), when both the speed and the width of a CME are known. Lines depict the exponential cut-off fits to the data points per case (from Eq. (13)). Panels: (a) and (b) correspond to a filtering integral energy at E > 10 MeV; (c–f) to E > 30 MeV and (g) to E > 100 MeV). See text and Table 3 for details on the bins and the obtained parameters of the fit per case.

In the text
thumbnail Fig. 8

The curves for the estimation of the probability of SEP occurrence when both the SXR magnitude and the longitude of a solar flare are known. Each panel corresponds to integral energy (i.e., E > 10−; >30−; >60− and >100 MeV). Within each panel, two fits are presented, one per longitudinal bin – color-coded as: well connected (red) and poorly connected (blue) solar flares. See text for details. The gray border hatched rectangle area provides the limit for the F SXR. This is similar to Figure 6.

In the text
thumbnail Fig. 9

The CDFs constructed by the database for each integral energy of choice for the estimation of the peak flux (Peak flux [pfu], X-axis), when the SXR solar flare magnitude is known. Lines depict the exponential cut-off fits to the data points per case. Panel (a) corresponds to a filtering of the integral energy at E > 10 MeV. The left-hand column (panels (b), (d), (f)) corresponds to a filtering integral energy at E > 30 MeV, and the right-hand column (panels (c), (d), (g)) to a filtering integral energy at E > 100 MeV. Black color denotes the SXR flare flux bin in each panel. See text and Table 3 for details on the bins and the obtained parameters of the fit per case.

In the text
thumbnail Fig. 10

The curves for the estimation of the probability of SEP occurrence when both a CME and a solar flare have taken place, and thus the SXR magnitude (X-axis) and the longitude of a solar flare, as well as, the CME speed and AW are known. The figure presents the output of PROSPER for the case of E > 10 MeV with Halo CMEs (AW = 360°) and well-connected flares (lon ≥ 20°). Each fit corresponds to a solar flare class, color-coded as: C1.0 (black line), M1.0 (blue line), X1.0 (red line) and X10 (magenta line). The X-axis shows the speed of the CME (in km s−1).

In the text
thumbnail Fig. 11

The CDFs constructed by the database (i.e., all points correspond to actual data) for each integral energy of choice. See text and Table 4 for details on the bins and the obtained parameters of the fit per case. Each panel further displays the selected bins and the displayed energies if more than one is included in a panel.

In the text
thumbnail Fig. 12

Comparison of the predicted probabilities of SEP detection per PROSPER’s mode of operation for all cases included in Table 5, for each integral energy of interest (i.e., E > 10−; E > 30−; and E > 100 MeV). In each panel, the red histogram represents the outputs of PROSPER based on SF data, the blue histogram represents the outputs of the model based on CME data, and the green histogram represents the outputs of the model based on both SF & CME inputs. In the upper panel, the horizontal line represents the threshold above which most SEP prediction modules would issue a notification for a forthcoming SEP event – see details in Table 1 of Anastasiadis et al. (2017).

In the text
thumbnail Fig. 13

Comparison of predicted with observed peak flux for the SEP events of Table 5, based on solar flare (column at the left-hand side), CME (column in the middle), and solar flare & CME (column at the right-hand side) input data. The top row refers to integral energy of E > 10 MeV and the bottom to E > 100 MeV.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.