Two solar proton fluence models based on ground level enhancement observations

Osku Raukunen, Rami Vainio, Allan J. Tylka, William F. Dietrich, Piers Jiggens, Daniel Heynderickx, Mark Dierckxsens, Norma Crosby, Urs Ganse and Robert Siipola 1 Department of Physics and Astronomy, University of Turku, 20014 Turku, Finland 2 Emeritus, NASA Goddard Spaceflight Center, Greenbelt, 20771 MD, USA 3 Consultant, Prospect Heights, 60070 IL, USA 4 ESA Space Environments & Effects section (TEC-EES), ESA-ESTEC, Keplerlaan 1, 2201 AZ Noordwijk, Netherlands 5 DH Consultancy, Leuven, Belgium 6 Royal Belgian Institute for Space Aeronomy, Avenue Circulaire 3, 1180 Uccle, Belgium 7 Department of Physics, University of Helsinki, Helsinki, Finland


Introduction
In some of the most extreme solar energetic particle (SEP) events, particles can be accelerated up to GeV energies. These energies are sufficient for particles to reach the atmosphere of the Earth, where their interactions produce showers of secondary particles all the way to the ground level; hence, these events are known as ground level enhancements (GLEs). These secondary particles can be detected with neutron monitors (NMs; Simpson et al., 1953;Simpson, 1958) on ground as increases in intensity above the background produced by the galactic cosmic rays (GCRs). Since 1942, 72 GLEs have been observed (Cliver et al., 1982;Cliver, 2006;  activity and from cycle to cycle; the most dramatic change in GLE activity has been the decrease from 16 GLEs during solar cycle 23 to only one during the ongoing cycle 24 (between December 2008 andMarch 2017). A detailed list of GLEs 1-71 and their solar event associations is presented in Table 1. We note that since GLE 72 occurred after the release of our model 1 (see Sect. 5), it is not included in the analysis presented in this paper. According to the two-class paradigm (e.g., Reames, 1999;Reames, 2013, and references therein), SEP events are categorized as impulsive events relating to particle acceleration in solar flares, or gradual events, associated with shocks driven by coronal mass ejections (CMEs). Impulsive events generally have shorter durations, smaller particle fluences and more compact spatial scales when compared to gradual events. Another distinguishing factor is the particle composition: impulsive events are characterized as electron-and 3 He-rich and with enhanced abundances of heavy ions (e.g., Reames et al., 1985;Reames, 1988), whereas the particle composition of gradual events more or less resembles that of the solar corona (e. g., Meyer, 1985). Although GLEs are usually considered as extreme examples of gradual events, some studies suggest that GLEs have a direct flare component, based on their temporal behaviour (Grechnev et al., 2008;McCracken et al., 2008), or particle composition and charge states (Cohen et al., 1999;Mason et al., 1999;Möbius et al., 1999). It has also been suggested that the energy-dependent composition and charge states in these events can be understood in terms of evolving shock-normal angle in a shock acting on remnant flare suprathermals from earlier flare activity (Tylka et al., 2005;Tylka and Lee, 2006;Sandroos and Vainio, 2007;Sandroos and Vainio, 2009). SEP events can be considered as one of the most important features of space weather (e.g., Vainio et al., 2009). As the most energetic class of SEP events, GLEs are important drivers of space weather phenomena. For spacecraft outside the Earth's trapped radiation belts and the shelter provided by the Earth's magnetic field, SEPs constitute the most severe radiation hazard. Radiation effects caused by SEPs include degradation of electronic components and solar cells and single event effects (SEE) including latchups and upsets (e.g., Feynman and Gabriel, 1996). In addition to being harmful to electronics, ionizing particle radiation is dangerous to biological organisms. Astronauts on space missions and even aircraft crews and passengers on high altitude polar flights are susceptible to radiation hazards. Effects of radiation on humans are divided into two categories: early (deterministic) effects, caused by sudden exposure to a large dose of radiation, which include nausea or even death, and late (stochastic) effects, usually manifesting as different forms of cancer. Effects of space weather on both electronic and biological systems have been reviewed in e.g., Facius and Reitz (2007); Lanzerotti (2007); Vainio et al. (2009);Singh et al. (2010), and references therein.
proton models based on long-term observations of high energy SEP events. We start by giving a short overview of advances in the modeling of solar energetic particles (Sect. 2), then describe the energy spectra of GLEs (Sect. 3) and analyze their timing (Sect. 4). In Section 5 and Section 6 we present details of our two models, compare the models in Section 7 and present a summary and discussion in Section 8.

Previous models of SEPs
The first widely accepted probabilistic solar proton fluence model was presented by King, (1974). His model was based on observations of >10 to >100 MeV protons during the solar cycle 20. The fluence for that cycle was dominated by the GLEs of August 1972, which King, (1974 considered "anomalously large", in contrast to "ordinary" events. The model assumed that the spectrum of all predicted anomalously large events would resemble that of August 1972, and that the fluences of the ordinary events follow log-normal distribution.
An update for the King model, known as the Jet Propulsion Laboratory (JPL) model, was developed by Feynman et al., (1990Feynman et al., ( , 1993. Using data from solar cycles 19-21, they showed that the distribution of event sizes during the active part of the solar cycle can be modeled with a continuous log-normal distribution, thus removing the need to divide events into ordinary and anomalously large. The JPL model uses Monte Carlo simulations to calculate the total fluence accumulated during a mission at a certain confidence level. The JPL model was in turn improved by Rosenqvist et al., (2005), who studied the effect of the fitting procedure, fluence thresholds and the inclusion of different data sources on the model. They used >10 MeV proton data from January 1974 to May 2002 and provided to the public all the data and tools needed to update the model. Their work was further developed by Glover et al., (2008), who extended the study to >30 MeV protons.
Another solar proton fluence model is the Emission of Solar Protons (ESP) model (Xapsos et al., 2000), which predicts integral fluences at energies from >1 MeV to >300 MeV during the active part of the solar cycle. It applies a lognormal fit to the observed yearly fluences which is then extrapolated to longer mission durations. Xapsos et al. (2004) developed the model further into the Prediction of Solar particle Yields for Characterization of Integrated Circuits (PSYCHIC) model, which predicts proton spectra up to >327 MeV and considers solar minimum and maximum periods. The PSYCHIC model was extended to also predict cumulative solar heavy ion fluences (Xapsos et al., 2007). In addition, a model for solar electrons has been developed based on the approach of the ESP model (Taylor et al., 2011).
Another approach to solar proton modeling is the Solar Energetic Particle Environment Modelling (SEPEM) (Jiggens et al., 2012), which is based on "virtual timelines" rather than traditional Monte Carlo approaches which base the number of SEP events to be sampled from an event frequency distribution. Finally, in the Moscow State University (MSU) model by Nymmik (1998);Nymmik (1999), the occurrence rate of the particle events follows the solar activity level, parameterised by the Wolf sunspot number. Another property of the MSU model that differentiates it from other models is that the shape of the fluence spectra of generated events is modeled instead of the fluence distribution being derived in each individual energy channel. More recently Atwell et al. (2016) have provided a probabilistic model of SEP fluences and doses during periods of low solar activity, with monthly smoothed sunspot number less than 50. More extensive space particle radiation model reviews and comparisons can be found for example in Vainio et al. (2009);Jiggens et al. (2012); Xapsos et al. (2013).
3 Integral fluence spectra of GLEs Tylka and Dietrich (2009) developed a method for analysing data from the worldwide neutron monitor network. They derived normalized, event-integrated proton spectra for 53 of the 66 GLE events occurring between 1956-2009. Their results agreed well with the fluences measured by IMP-8 (McGuire et al., 1986), SAMPEX (Baker et al., 1993) and GOES (Onsager et al., 1996) spacecraft at energies of ∼300-700 MeV, corresponding to rigidities of ∼0.81-1.43 GV. Furthermore, they showed that the integral fluence spectra in rigidity from combined spacecraft and neutron monitor measurements can be represented with a double power law function, also known as the Band function (Band et al., 1993): Here J(> R) is the omnidirectional event-integrated integral fluence in units of cm -2 , J 0 is an overall fluence normalization coefficient, g 1 is the low rigidity power law index, g 2 the high rigidity power law index and (g 2 À g 1 )R 0 ≡ R 1 is the breakpoint rigidity. The Band function is constructed in such a way that both the function and its first derivative are continuous. Recently, the Band function has been shown to approximate well the energetic particle spectra in the GLEs of solar cycle 23 measured by instruments on the ACE, GOES, SAMPEX and STEREO spacecraft (Mewaldt et al., 2012) and GLE71 measured by the PAMELA experiment (Asvestari et al., 2017), as well as proton and heavy ion spectra of large SEP events measured by instruments on the ACE, SAMPEX, SOHO and GOES spacecraft (Desai et al., 2016a, b). The original Tylka and Dietrich (2009) GLE analysis was extended through 2012 using data from GOES spacecraft 2 and the worldwide NM network. The analysis uses timelines of pressure-corrected neutron monitor counts obtained primarily from online public archives at the US National Geophysical Data Center (NGDC) 3 and the Bartol Research Institute 4 , as well as a few other public station-websites and online archives, some of which are no longer available. The analysis starts with defining the uncorrected integral solar proton fluence as the product of the integral GCR fluence (Nymmik et al., 1992;Smart et al., 2006), which depends on the cutoff rigidity of the neutron monitor, R i , and time, T, and the fractional increase in the NM counts, i.e., The uncorrected fluence is then corrected with a factor C(R i , T, g), which also depends on the cutoff rigidity and time, as well as the spectral shape of the solar protons, which is assumed to be a power law in rigidity. The correction factor can be evaluated numerically as where s(r) is the differential solar proton spectrum, g(r) is the differential GCR spectrum, and y(r) is the NM yield function . The correction factor and the corrected fluence for each NM i is then calculated for a range of power law indices g. For each GLE, a least-squares fit to a power law in rigidity is performed on the corrected fluences, and the value of g that gives the best fit is selected. As an example of the corrected NM fluences, the integral proton spectrum of GLE 71 is shown in Figure 1. The NM observations are shown in orange, GOES observations in green (MEPAD) and red (HEPAD), and the Band-fit spectrum in blue. For the analysis of the fluence spectra of GLEs we utilize a dataset with spectral parameters for 59 GLEs occurring in 1956-2012. During the years, 67 GLEs occurred in total, but eight of them had too small fluences for the Band-fits to be reliable. The smaller events also need to be taken into account when modeling the proton fluences; this will be discussed in more detail in both of our model descriptions. The spectral parameters and their uncertainties are given in Table 2. Figure 2 shows the event-integrated fluences above 1 GV for all 59 GLEs. The fluences and their uncertainties were calculated as the geometric means and geometric standard deviations of large samples (N = 10 5 ) of fluence values generated for each event by sampling each Band-fit parameter from normal distributions centered on their best-fit values with standard deviations given by their error estimates. Note that here and later in the paper we plot fluences in units of cm À2 sr À1 , i.e., the omnidirectional fluences divided by 4p.
Five events (GLE43, GLE58, GLE59, GLE62 and GLE65) were analysed separately for their GLE and energetic storm particle (ESP 5 ) components. In addition, GLE42 was analysed separately for the first 75 minutes and the following 61 hours. The GLE components of these events are marked with additional red squares and the ESP components are shown in red in Figure 2 (which also shows the monthly sunspot number 6 as an overall measure of solar activity). Three of the GLE components (GLEs 58, 59 and 65) have higher fluences at R>1 GV than their separately analysed counterparts, and the reverse is true for the other three. Neither of the separately analysed components seem to have noticeably different fluence distributions compared to the other datapoints. In both of our model descriptions, we discuss how the components are taken into account.

Analysis of GLE timing 4.1 Waiting time distribution
Between 1942 and September 2017, a total of 72 GLEs have been observed, resulting in an average rate of approximately 0.95 GLEs per year. As the observations during the first years were unreliable considering smaller GLEs, omitting the observations made during solar cycles 17 and 18 would result in a better estimate for the event rate: 67 GLEs during the approximately 62.5 years, i.e., 1.07 GLEs per year. However, as can be seen from Figure 2, the occurrence of GLEs is not a homogenous process throughout the solar cycle. Therefore, a more detailed analysis is needed in order to accurately model the GLE occurrence.
The red squares in Figure 3 show the waiting time distribution of the GLEs occurring in 1956-2012. The waiting times were calculated using the GLE onset times given in Table 1. The distribution seems to be exponential except for the shortest (less than 30 days) and longest (more than 700 days) waiting times. Many of the GLEs that occur in quick succession are produced in the same active region; an example of this are GLEs 65-67, occurring between October 28 and November 2, 2003, in the NOAA active region 10486. As the events from the same active region are not statistically  independent of each other, the occurrence of active regions capable of producing one or more GLEs is a better subject for analysis than the occurrence of individual GLEs. Thus, we grouped together all GLEs occurring in the same active region, resulting in 49 GLE episodes with one or more GLEs. The episode waiting times, calculated as the difference of the onset times of the first GLEs of both episodes, are shown as the blue squares in Figure 3. The exponential fit to waiting times less than 700 days, shown as the blue line, describes the data very well. Five out of the six waiting times greater than 700 days occur between the last event of the previous cycle and the first event of the following cycle (marked with extra squares in Fig. 3). The only exception is episode 3, on July 16, 1959, occurring 1049 days after the previous episode on August 31,1956. The IGY-type NMs used during this time had lower statistics and time resolution compared to the NM64 introduced in the early 1960s (Stoker et al., 2000, and references therein). In addition, the number of NMs increased from only twelve during January 1957 to 51 by the end of the year (Shea and Smart, 2000). Therefore, one or more small GLEs may have occurred unobserved, resulting in an erroneous waiting time. Nonetheless, Figure 3 strongly suggests that considering the GLE episodes, each cycle consists of an active period with GLE episodes occurring as a Poisson process with a mean rate, l of 0.00274 per day (1.002 per year) 7 , and a quiet period with too few episodes to reasonably estimate a mean rate.

Normalized occurrence time
To study the occurrence of GLE episodes within a cycle in a better way, we calculated the normalized occurrence times of GLE episodes after the start of the corresponding solar cycle, which we take to be middle of the month with the lowest sunspot number, according to the following equation:  where m d = 10.93 a is the average duration of the solar cycles 19-23, t min(i) the date of the cycle minimum before the episode, t min(iþ1) the date of the cycle minimum after the episode and t i the date of the episode. Figure 4 shows the cumulative distribution of the normalized occurrence times. From this figure it becomes clear that there are indeed two different levels of GLE activity within a cycle: most of the GLE episodes occur during and between the second and seventh years of the standard cycle, leaving very little activity to the first and the last three years of the cycle. More precisely, we assume that only the first and last points of the distribution constitute the quiet part of the cycle. The rest of the points, 46 out of 48 can be well fitted with a line (red line in Fig. 4). Normalized occurrence time for the most recent GLE, on May 17, 2012, cannot be calculated because, as of this writing, cycle 24 is still underway. With the further assumption that the rate of occurrence during the quiet part in the beginning of the cycle is equal to the rate of occurrence during the quiet part in the end of the cycle (shown as blue lines in Fig. 4), we can calculate the starting and ending times of the active part, as well as the rate of occurrence of GLE episodes during the quiet part. The resulting durations for the quiet and active parts of the standard cycle are 3.923 a and 7.011 a, respectively. The rates of occurrence of GLE episodes are the slopes of the linear fits given in the figure multiplied by the total number of episodes (48) and divided by the number of cycles (5), resulting in 0.102 a À1 and 1.312 a À1 for the quiet and active parts, respectively. Note that the resulting value for the active part is higher than the value given in Section 4.1, which is also effectively calculated for the active part of the cycle.

Model 1
Our first model is a JPL-type proton fluence model which was originally developed as a part of the Interplanetary and Planetary Radiation Analysis Model for Human Spaceflight (IPRAM) project for the European Space Agency (ESA). After some small improvements, it was released as a part of the Space Radiation Expert Service Centre (R-ESC) on ESA's Space Situational Awareness (SSA) Space Weather (SWE) Service Network 8 . We utilized the set of Band-fit parameters and calculated differential fluence spectra for each individual GLE and ESP component using the equation   8 The goal of the SWE network is to support users in mitigating the space weather effects while reducing costs and improving reliability of their systems. In addition to the R-ESC, the SWE network consists of four ESCs: Solar Weather, Heliospheric Weather, Ionospheric Weather and Geomagnetic Conditions. The R-ESC is available at http://swe.ssa.esa.int/space-radiation where J (> R i ) is the integral fluence calculated with the Band function and R i is the rigidity corresponding to the kinetic energy E i . We used the reference energies of channels 3-10 of the SEPEM model (Jiggens et al., 2012), plus additional four logarithmically spaced channels to obtain an energy range from about 10 MeV to 1 GeV. The eight small GLEs, for which the Band fits were not performed, were assumed to have a fluence equal to the smallest fluence in each energy channel. Fluence spectra for the multi-GLE episodes were calculated by summing over the individual GLE spectra, as was also done in the case of separately analyzed GLE and ESP components. The cumulative probability distribution of episode integrated fluences for each individual channel was analyzed by fitting three functions: log-normal distribution, truncated power law, and exponentially cut-off power law, given by the following equations, respectively: FðfÞ FðfÞ where F (f) is the probability of an episode fluence being lower than the fluence f, m and s are the mean and standard deviation of log 10 (f), b and g are power law exponents, f min and f low are parameters related to the lower limit of the distribution, and f max and f lim parameters related to the upper limit of the distribution. All parameters in equations (6a)-(6c) are fitted to the binned distribution, not calculated directly from the sample. Figure 5 shows the fluence distribution and the three fit functions for the energy channel of 200-300 MeVas an example. The inset shows the top 10% probability in greater detail. As can be seen from the figure, the log-normal fit (shown in green), while describing the data well for moderate fluences, has a long tail in the very high fluences. This can result in over-estimation of mission fluences at high confidence levels in our model. On the other hand, the truncated power law distribution (shown in blue) has a maximum value close to the highest observed fluence, which implies that the highest possible fluence has been already measured, probably resulting in under-estimation of fluences. Therefore, we select the cut-off power law (shown in red) as a compromise between the two cases.

Distribution of parameters
The principle of our second proton fluence model is to model the actual spectral shape of GLEs instead of the fluences in separate energy channels. To achieve this we first analyse the relationships between the spectral parameters and find out how they are distributed. Figure 6(a) presents the relationship between the parameters J 0 and R 0 . Blue squares mark the GLEs and GLE components of those events with separately analysed GLE and ESP components, while the ESP components are shown in red. As the ESP components may result from a different physical process than their GLE counterparts, we will treat them separately in Section 6.4. GLE 42 (29 September 1989), which is marked in the plot, has the largest value for R 0 (and also the smallest one for g 2 ) of the whole set of parameters, and is clearly an outlier. A linear fit was performed in log-log-scale (implying a power law dependence), omitting the ESP components and GLE 42. The results of the fit are shown in black. With the correlation coefficient r = À 0.81 and sample size n = 58, the probability that the correlation occurs by chance is very small (p ≪ 0.001), and thus we can say that the parameter R 0 is statistically significantly correlated with J 0 . Similarly, Figure 6(b) presents the relationship between the parameters g 1 and J 0 . In this figure the linear fit is performed in log-lin-scale, implying a logarithmic dependence. The correlation coefficient is now r = À 0.71, which again gives a very small probability (p ≪ 0.001) that the correlation occurs by chance. This means that there is also statistically significant correlation between the parameters g 1 and J 0 .
Figures 7(a)-7(c) show the parameter g 2 as a function of J 0 , R 0 and g 1 , respectively. The formatting in each figure is similar to the previous figure. In Figures 7(a) and (b) the linear fits were performed in log-lin-scale. In none of the three cases is the correlation significant at 95% level. Although noncorrelation does not necessarily imply statistical independence, we make that assumption for simplicity, i.e., we assume that the parameter g 2 is independent from the other parameters. Thus, we have two independent parameters, J 0 and g 2 , and two parameters, R 0 and g 1 , that depend upon the value of J 0 .
In the following we show that the parameters J 0 and R 0 are distributed log-normally and the parameters g 1 and g 2 normally. Figure 8(a) presents the cumulative distribution function (CDF) of the logarithm of the independent parameter J 0 shown in red. The Gaussian CDF, shown in blue, is calculated with the sample mean and variance of log 10 (J 0 ). We utilize the Lilliefors test (Lilliefors, 1967;Dallal and Wilkinson, 1986) to see whether or not log 10 (J 0 ) is indeed distributed normally. The test statistic, D log 10 ðJ 0 Þ ¼ 0:071, equals the largest discrepancy between the empirical CDF and the Gaussian CDF. For 5% confidence level and a sample size of n = 58, the critical value for the test statistic would be D crit = 0.116. As the resulting test statistic is smaller than the critical value, we can safely assume log 10 (J 0 ) to be normally distributed. The CDF of the other independent parameter, g 2 , is shown in Figure 8(b), with similar formatting as in Figure 8(a). Again, we utilize the Lilliefors test, and with the test statistic, D g 2 = 0.081 < D crit , we can assume that the parameter g 2 is normally distributed.
To find information about the distribution of the dependent parameters, we first have to remove the J 0 -dependency. This is done via the following equations: log 10 ðR 0 Þ 0 ¼ log 10 ðR 0 Þ À 0:660 þ 0:175⋅log 10 ðJ 0 Þ ð7aÞ where the numerical factors are taken from the linear fits shown in Figures 6(a) and 6(b). Figures 9(a) and (b) show the CDFs of the "residual" parameters log 10 ðR 0 Þ 0 and g 0 1 , respectively, with similar formatting as in Figure 8. For the test statistics we get D log 10 ðR 0 Þ 0 ¼ 0:110 and D g 0 1 ¼ 0:059. Both of these values are again smaller than the critical value, and therefore we can assume that the parameters log 10 (R 0 ) 0 and g 0 1 are also distributed normally.

Small GLEs
In the first model, we considered the small GLEs as having fluences equal to the smallest fluence of the GLEs with Band-fits in each energy channel, but here we adopt a more detailed approach. In addition to the dataset of GLE spectral parameters we have been using, we can utilize a similar dataset but for so-called sub-GLEs, i.e., large SEP events with increases of protons above 300 MeV, but not with sufficient intensities to be detected with ground level neutron monitors. The spectral parameters (and their uncertainties) for sub-GLEs occurring in solar cycles 20-22 are given in Table 3. The parameters for the sub-GLEs occurring in cycles 23 and 24 are listed in Vainio et al. (2017). Figure 10 shows the response (number of counts measured) of a high-latitude NM, DN, (in arbitrary units) for both GLEs and sub-GLEs, calculated with the equation where y (R) is the neutron monitor yield function of Clem and Dorman (2000). GLEs are shown in blue and sub-GLEs in red. The observations of sub-GLEs were performed with IMP-8/ GME from October 1973 to 1986 and with GOES/HEPAD after 1986. IMP-8/GME had a lower background rate than GOES/HEPAD, and detected a few smaller events which fall below the lower limit of the figure, but as they are outside the range of interest, we decided to omit those and show only the events with DN > 10, which seems to be approximately the low   Figure 8, but for the "residual" parameters log 10 ðR 0 Þ 0 (a) and g 0 1 (b).
limit for GOES/HEPAD. There is a large overlap in the DN values of GLEs and sub-GLEs, although in reality the sub-GLEs were not observed by NMs with the geomagnetic cutoff rigidity above 1 GV. One reason for the overlap could be that the sub-GLEs were fitted with only satellite data, not taking into account that the neutron monitors did not detect them. The two dashed lines in Figure 10 show the highest DN from sub-GLEs and the lowest DN from GLEs, and the events with fluences between these values are marked with square symbols. We select these events to represent the eight GLEs for which the Band-fitting was not performed. This selection is a good balance between assuming that all the small GLEs have fluences equal to the smallest fitted GLE in each channel, and assuming that the small GLEs are similar to the other GLEs.
For simplicity and brevity, we assume that the parameters of the selected GLEs and sub-GLEs behave similarly as the parameters of the GLEs, i.e., log 10 (J 0 ) and g 2 are normally distributed independent variables, and log 10 (J 0 ) and g 1 are normally distributed variables which depend linearly upon the value of log 10 (J 0 ). We find that the parameters of the small GLEs are distributed according to the following equations: Dðlog 10 ðR 0 ÞÞ ¼ Nðm ¼ 0; s 2 ¼ 0:022Þ þ0:714 À 0:195⋅log 10 ðJ 0 Þ; Dðg 2 Þ ¼ Nðm ¼ 6:374; s 2 ¼ 3:660Þ; ð10cÞ Table 3. Spectral parameters of sub-GLEs and an ESP counterpart. The uncertainties are estimated by varying the parameter of interest while holding the other parameters at their best-fit values.

Unphysical parameters
As can easily be seen, the spectral parameter distributions presented in equations (8) and (10) also allow for unphysical combinations of parameters. Therefore care has to be taken to discard unphysical values when simulating the spectra. Firstly, if g 1 > g 2 , R 1 in equation (1) has a negative value, which obviously is unphysical. Secondly, g 1 and g 2 are drawn from normal distributions, and may therefore have any real values. If g 1 or g 2 has a negative value, the corresponding power law index in equation (1) will be positive. As the integral spectrum must be strictly decreasing, this may result in an unphysical spectrum. If we look at the derivative of the Band function, we see that the derivative has a zero at R z = À g 1 R 0 . This means that in the cases where g 1 < 0 but g 2 > 0, the Band function is strictly decreasing when R > R z . As we only consider energies above 10 MeV (rigidities above 0.137 GV) in our model, we decide to allow negative values for g 1 if R z < 0.137GV.

ESP components
Since the launch of IMP-4 in May 1967, 54 GLEs have been observed. The ESP components were found to exist for six out of the 54 events, i.e., for 11.1% of the events. Therefore, in our model, we make the simple assumption that each simulated event has a probability of 11.1% to have an additional ESP component. Going back to Figures 6 and 7, we see that five of the six ESP components have values of log 10 (J 0 ) above the average value of m log 10 ðJ 0 Þ ¼ 7:589 of GLEs. In addition, all of the g 1 -values for the ESP components are well above the g 1 vs. J 0 line fitted for the GLEs. These properties imply high fluences for the lowest rigidities, which rapidly decrease with increasing rigidity, implying a soft spectrum. This is expected, as the ESP events are caused by particle acceleration or trapping in an interplanetary shock passing the observer. Still, the sample of six ESP events does not provide enough statistical significance for a proper analysis, and therefore we will simply model them similarly as the GLE events.

GLE episodes
In addition to the occurrence rate of the GLE episodes, we need to model the number of individual GLEs in each episode. The blue bars in Figure 11 show the proportion of episodes with the corresponding number of GLEs. Statistical errors are shown with the orange error bars. We compare this distribution with three common discrete probability distributions: (shifted) geometric, logarithmic and zero-truncated Poisson, shown in black, red and green, respectively. The following equations relate the expected number of events per episode, E[X] = 1.367, to the parameter, p i , of the distributions: E½X ¼ p p e p p e p p À 1 ; where p g , p l and p p are the parameters for geometric, logarithmic and zero-truncated Poisson distributions, respectively. All of these distributions describe the data at least reasonably, but we select the logarithmic distribution with p = 0.449 for our model since it resembles the observed numbers most closely.

Simulation procedure
The procedure for simulation of mission-integrated fluences is presented as follows: -Draw the number of GLE episodes, N ep , occurring during the mission from either Pðl  7 Comparison of models Figure 12(a) shows mission integrated differential fluence spectra for 50% and 95% confidence levels, calculated with one-component versions of both models, i.e., using a single Poisson parameter for GLE episode occurrence as presented in Section 4.1. Mission duration was one year. Model 1 is shown in blue and model 2 in red, and the confidence levels are shown with plus signs (50%) and crosses (95%). Here, and in the following comparisons, we repeat the simulations for N = 1 ⋅ 10 5 times and sort the simulated fluences by ascending fluence values in each energy channel. The [(CL/100) ⋅ N]-th value for each energy is presented as the spectrum at confidence level CL%. The 50% confidence spectra in Figure 12(a) are of similar shape, but show a difference of factor of about 2-3. The 95% confidence spectra have slightly different shapes with model 1 showing a slight "bump" at ∼50 MeVand decreasing more quickly at higher energies. Also of note is the fact that for 50% confidence, model 2 has higher fluences over the whole energy range, whereas for 95% confidence model 1 has higher fluences except for the energies above ∼300 MeV.
With similar formatting, Figure 12(b) shows mission integrated differential spectra, calculated with the twocomponent versions of the models, i.e., using different Poisson parameters for active solar period and quiet solar period. The simulated mission had a duration of six years during solar active time and three years during solar quiet time. Here, model 1 has higher fluences for energies below about 140 MeV for both confidence levels. For the highest energies, model 2 is slightly higher for 50% confidence and model 1 for 95% confidence. The bump of model 1 at 95% confidence at about 50 MeV is more pronounced than in Figure 12(a), making the shape of the spectrum a bit unusual. Figure 13(a) shows the probability of exceeding a mission integrated fluence for 2-year and 6-year missions at 200-300 MeV, calculated with the one-component versions of both models. The probability curves do not start at P = 1.0 because there is a finite probability for a mission to have no GLEs occurring during its duration, thus also having zero fluence. For a 2-year mission this chance is 0.135 and for a 6-year mission 0.002. The probability curves for the two models clearly have different shapes; for both the lowest and the highest fluences, model 1 gives a lower probability of exceeding. Only in the middle part, from about 3 ⋅ 10 4 cm À2 sr À1 MeV À1 to 1 ⋅ 10 5 cm À2 sr À1 MeV À1 for a 2-year mission and from about 6 ⋅ 10 4 cm À2 sr À1 MeV À1 to 1.5 ⋅ 10 5 cm À2 sr À1 MeV À1 for a 6-year mission, model 1 gives a higher probability. For the very highest fluences, the difference is quite large: the probability of exceeding a fluence of 3 ⋅ 10 5 cm À2 sr À1 MeV À1 is larger for a 2-year mission simulated with model 2 than for a 6-year mission simulated with model 1.
The probabilities of exceeding worst case fluences presented in Figure 13(b) show very similar behaviour as the cumulative fluences in Figure 13(a). Here, model 1 gives higher probability of exceeding for fluences between about 10 3 cm À2 sr À1 MeV À1 and 10 4 cm À2 sr À1 MeV À1 for both 2 and 6 year missions. Again, for the very high fluences, the probability of exceeding is higher for a 2-year mission simulated with model 2 than for a 6-year mission simulated with model 1. Figures 14(a) and (b) present comparisons of our two models with the SEPEM model and the ESA's Space Environment Information System (SPENVIS) 9 implementation of ESP and JPL models. The figures show mission integrated fluence spectra at 95% confidence for a 2 year mission (a) and a 7 year mission, (b). The SEPEM model agrees best with our models, the spectrum lying between our models for energies below 70 MeV, and falling below both of our models above 70 MeV, for both mission durations. The JPL model shows a slightly harder spectrum, which agrees with our models for low energies but is higher by a factor of about 3-5 for energies above 100 MeV. The ESP model shows the worst agreement, being clearly above both of our models for all energies.
Finally, in Figure 15 we compare the total fluence averaged over the years 1956-2012 to a "mission" simulated with our second model. The black curve, which shows the total fluence from GLEs, calculated with the Band fits and averaged over the time interval, is similar to the blue curve in Figure 2 of Kovaltsov et al. (2014), except for a change of units to directional fluences. The orange curve shows the exponent-over-rigidity solar proton fluence spectrum averaged over 1954(Reedy 2012. Using the normalized solar cycle shown in Figure 4, we modeled the 1956-2012 time interval as 37 years of solar active time and 20 years of solar quiet time. The blue and red curves show the results, simulated with the second model and averaged over time, for 30% and 90% confidence levels, respectively. The 90% confidence spectrum is very close to the total GLE fluence at low energies, overestimating the fluence by an increasing amount above 100 MeV. The 30% confidence spectrum underestimates the GLE fluence at low energies, but gives a better estimation at energies above 100 MeV.

Summary and discussion
We have presented two new statistical models of high energy solar proton fluences. Both models are based on GLE fluence spectra observed with both satellites and groundbased neutron monitors during the solar cycles 19-24, but utilize different modeling methodologies. In the first methodology, we calculate mission-integrated integral fluences for certain energies from Band-function fits and model their distributions with exponentially cut-off power law functions. In the second methodology, we model the spectral parameters as two independent and two dependent, normally distributed variables, thus modeling the spectral shape itself. The results from the two models agree well except for short missions at low confidence levels. There is also reasonable agreement between our models and three commonly used  solar proton models (JPL, ESP and SEPEM), despite the large differences in both the data and the methodologies used in the models. It is interesting to note the agreement at the lowest energies (below few tens of MeV), since the data used to construct the other models also includes non-GLE events, which may contribute with very large fluences at low energies. The agreement suggests that during the solar cycles 19-24, on average, GLEs have been responsible for a major portion of fluence even at low energies. On shorter timescales this is not necessarily the case; for example, during the first eight years of cycle 24 the total proton fluence at >10 MeV from sub-GLEs, calculated with the Band-fits (parameters given in Vainio et al., 2017), was about 7.9 ⋅ 10 8 cm À2 sr À1 , whereas the only GLE of the period produced a fluence of about 6.6 ⋅ 10 6 cm À2 sr À1 , i.e., two orders of magnitude smaller.
Our models benefit from the very long data aqcuisition period (over 5 complete solar cycles) as well as the extremely wide energy range (from 10 MeV to ∼10 GeV). One limiting factor in the accuracy of our results is the small number of GLE episodes, n GLE = 49, considered in our data. This number means that every result for probabilities smaller than 1/49 ≈ 2.0% comes purely from extrapolation. The statistics for high-fluence events could potentially be increased by studying historical cosmogenic isotope data (e.g., Usoskin et al., 2006;Miyake et al., 2012;Usoskin and Kovaltsov, 2012), but as was pointed out by Kovaltsov et al. (2014), this method depends strongly on the assumed spectral shape of events for integral fluences at energies lower or higher than >200 MeV. The occurrence rate of events used in our data is also very low (close to one event per year), and therefore modeling of missions with durations shorter than one year does not produce meaningful results.
Another factor that should be noted is the quality of satellite data used in the Band fitting. Recently, Sandberg et al. (2014) developed a method for cross-calibrating the medium energy GOES/EPS channels with corresponding channels of the science-level IMP-8/GME instrument. The effect of this cross-correlation was validated by Rodriguez et al. (2017) using STEREO observations (Mewaldt et al., 2008;von Rosenvinge et al., 2008). Sandberg et al. (2014) also noted some problems with the low energy IMP-8 data, attributed to the gradual failing of the LED instrument between 1984 and 1990. In addition, Smart and Shea (1999) described an intercalibration method for correcting side-penetration related discrepancies in the high energy spectrum measured by the GOES/HEPAD instrument; more recent work on GOES/ HEPAD channels is being performed by Rodriguez (personal communication, 2016). These corrections for instrumental effects have not been performed on the data products which we used for the Band fitting.
Acknowledgements. The research described in this paper was partly supported by ESA Contracts 4000106133/12/NL/AF and 4000113187/15/D/MRP. We gratefully acknowledge the use of Ha and X-ray data made available at the NOAA STP  , exponent-over-rigidity solar proton fluence spectrum averaged over 1954(Reedy, 2012, compared with spectra at 30% confidence (in blue) and 90% confidence (in red) simulated with model 2 for a mission of 37 years during solar active time and 20 years during solar quiet time. online service. We also acknowledge the important work of the PIs of the Neutron Monitors contributing their data to the world data centres. O.R. wishes to thank the Vilho, Yrjö and Kalle Väisälä foundation for financial support. We would also like to thank the two anonymous referees, whose comments helped to greatly improve the manuscript.
The editor thanks Ilya Usoskin and an anonymous referee for their assistance in evaluating this paper.