The virtual enhancements solar proton event radiation ( VESPER ) model

A new probabilistic model introducing a novel paradigm for the modelling of the solar proton environment at 1 AU is presented. The virtual enhancements solar proton event radiation model (VESPER) uses the European space agency's solar energetic particle environment modelling (SEPEM) Reference Dataset and produces virtual time-series of proton differential fluxes. In this regard it fundamentally diverges from the approach of existing SPE models that are based on probabilistic descriptions of SPE macroscopic characteristics such as peak flux and cumulative fluence. It is shown that VESPER reproduces well the dataset characteristics it uses, and further comparisons with existing models are made with respect to their results. The production of time-series as the main output of the model opens a straightforward way for the calculation of solar proton radiation effects in terms of time-series and the pairing with effects caused by trapped radiation and galactic cosmic rays.


Introduction
Solar particle events (SPE) are of particular interest for spacecraft design and radiation estimations.Protons with energies higher than ∼1 MeV have the ability to cause typical semiconductor (Si, GaAs, etc.) devices to malfunction through the effects of total ionizing dose (TID) such as errors caused by Single Event Upsets (SEUs) (Dodd & Massengill, 2003) which cause the excitation of electrons triggering false signals.Furthermore lasting damage can be caused by protons, and heavier nuclei, which result in atomic displacements in the crystalline lattices.Such defects can seriously hamper device function or even cause failure.Underestimation of such risks can degrade the performance of systems aboard spacecrafts and cause important loss of mission lifetime.Such considerations become even more important for manned missions, especially outside the protective shielding of Earth's magnetic field especially with regard to manned missions to the Moon or Mars (Cucinotta et al., 2010).On the other hand, overprotection against these risks in the form of excessive and unneeded shielding, results in increased cost, design complexity and severely reduced payloads.
In order to have reliable and cost-effective spacecraft design and implement new space technologies, accurate and reliable models for estimating the various radiation risks (Crosby et al., 2008) have been developed over the last years.Specifically for SPEs, evidence has been shown by Xapsos et al. (2006) that they are a Self-Organised Critical (SOC) phenomenon.This suggests that it may not be possible to predict the size/intensity or the occurrence of SPEs over long periods.It underscores that along with important work in the theoretical understanding (Vainio et al., 2009), SPE modelling efforts will continue to use techniques from the field of probability theory.Statistical, probabilistic models provide predictions of the cumulative and maximum fluence or peak flux values to take place over the course of a mission as a function of confidence level over a range of energies seen at the Earth.The selection of the physical variable of the SPE to be modelled (fluence/peak flux), the data that are being used, and the selection of suitable mathematical functions that provide satisfactory matching with the measured proton flux/fluence distributions have resulted in the development of various probabilistic solar proton models, King (Stassinopoulos & King, 1974), JPL (Feynman et al., 1993), ESP (Xapsos et al., 1998(Xapsos et al., , 1999)), Nymmik (Nymmik, 1999).These widely used models are usually based on a Poisson description regarding the event frequency distribution and they usually assume lognormal distributions and/or power function distributions for the modelled variable(s).The lognormal distribution describes the large events well but underestimates the probability of smaller events.On the other hand, the power-law functions describe the smaller events well but overestimate the probability of larger events.Jiggens & Gabriel (2009) have questioned that the occurrence of SPEs is really random.The authors chose alternative descriptions (based on Lèvy or on time-dependent Poisson distributions) and showed that the resulting probability models do provide better fits for the available event lists.The success on using non-Poisson descriptions suggests that there is an inherent memory in the system and the use of Poissonian description in SPE models may not be the optimum selection.Later on, Jiggens et al. (2012) developed a new modelling methodology À the "Virtual Timelines" method (VTM).The novel idea behind this method was to allow for the incorporation of the duration of SPEs which are non-point-like in nature but this cannot be explicitly considered when sampling the frequency distribution.Using VTM, it was shown that both waiting times and event duration distributions are fitted better when using non-Poisson distributions.In this work we present a probabilistic approach which is based on the modelling of solar proton flux time-series.The virtual enhancements À solar proton event radiation (VESPER) model utilizes the European space agency's solar energetic particle environment modelling (SEPEM) (Crosby et al., 2015) Reference Dataset v2.0 (RDSv2) (Heynderickx et al., 2017) and employs a datadriven approach to produce Virtual Time-Series (VTS) of differential proton fluxes.In turn, the statistical analysis of the VESPER outputs, the time-series, provides distribution functions used in the probabilistic estimation of quantities such as the Cumulative Fluence and Peak Flux over the duration of a space mission.The fact that the model produces full timeseries to describe solar proton flux enhancements presents unique advantages, since such an output can be used to produce radiation effects time-series (e.g.ionizing or nonionizing dose) as well as to be combined with similar outputs of radiation belt models, such as the AE9/AP9 (Ginet et al., 2013) models, that also produce time-series outputs of either fluxes/fluences or radiation effects.Furthermore, by applying a Magnetospheric Shielding model one can derive proton cut-off rigidity values for a user-defined orbit that crosses earth's magnetosphere, and map the output VTS along this orbit.This can provide a full description of the SPE environment in terms of proton flux time-series.Thus, we are confident that the VESPER model will prove valuable to the scientists and engineers that design future missions.In what follows, we present the basic premises of the VESPER model, the way it is constructed and we evaluate its output and compare it with those of the JPL, ESP and VTM models.It is shown that characteristic statistical properties of VESPER outputs (time-series) are consistent with those of the RDSv2 database it uses and that there is good agreement with the aforementioned models, especially with the VTM, despite the fundamental differences in their approach.

SEPEM reference dataset
We have used the SEPEM RDS v2.0 dataset (RDSv2) comprised of proton differential flux time-series.The dataset spans more than 40 years, 1974-07-01-2015-12-31, and contains processed data from the NOAA energetic particles sensor (EPS), part of the space environment monitor (SEM) package on-board GOES and earlier SMS satellites.These data have been cross-calibrated to find the effective energy using data from the goddard medium energy (GME) instrument onboard the IMP-8 spacecraft.A description of the crosscalibration is available in Sandberg et al. (2014).Finally, the spectra for each time stamp were re-binned into a set of 11 logarithmically spaced energy bins.In this work we have used the first 10 energy bins shown in Table 1.The re-binned data are continuous with 5 min time intervals without time gaps or missing values with ∼4 Â 10 5 measurements in total.It is worth mentioning that the RDSv2 integral fluxes have been recently validated using STEREO-A/HET and STEREO-B/ HET as well as IMP-8/GME integral proton flux data by Rodriguez et al. (2017).
For the purposes of the model construction, we have introduced the new variable LF 2 which can be defined as the 2nd moment of the flux spectrum along the energy axis in logarithmic space.
Here, f(E) [cm À2 MeV À1 s À1 sr À1 ] denotes the differential proton flux intensity and E [MeV] the proton energy.The minimum and maximum energies used here are those of the first and tenth channel of RDSv2 (see Tab. 1).The calculation of the LF 2 series for the entirety of the dataset results in the creation of a single time-series that characterizes RDSv2.The LF 2 variable is defined to serve the goals of the model while still being firmly based solely on the flux time-series.As it will be shown, it plays a key role in the construction of VESPER, and it is also used for the event definition process. 5. 00-7.23 7.23-10.46 10.46-15.12 15.12-21.87 21.87-31.62 31.62-45.73 45.73-66.13 66.13-95.64 95.64-138.3

SPE definition
In this study, we define as Solar Particle Event the enhancements of the LF 2 values above their time-varying background level that last for at least 12 h and peaked at a value at least two times larger than the temporary background level.In order to apply this SPE definition, we have employed a novel scheme to determine the time varying background level which is governed by the galactic cosmic ray (GCR) modulation.A "clump" is defined as the group of at least 3 consecutive measurements above a test threshold.For a given time-window, the background is defined as the threshold level for which the number of "clumps" is maximized.This scheme produces a time varying threshold which follows the GCR modulation and allows an accurate identification of the flux enhancements.Since this extraction of the SPEs is applied on the LF 2 time-series, the proton flux enhancements across all energies are taken into account.
In Figure 1, we present the LF 2 time-series, the calculated background level and the extracted SPE flux enhancements.Using this event definition, we define 241 SPEs and we calculate their respective: duration times; -SLF 2 ¼∫ t end t start LF 2 ðtÞdt (sum of the LF 2 values over the event duration); wait-times.
We note that we use the convention used by SEPEM where the duration of an event is the same for all energy channels.This way all flux time-series have common start and end time points.The Wait-Times are defined as the time intervals between the end of an Event and the start of the next one.
Figure 2 shows the differential proton flux series at the ten different energies and the calculated LF 2 for two SPEs.Here, the way the LF 2 variable takes into account the flux enhancements across all energy ranges is clear.For Event B, the flux of the 1st energy channel at the beginning of day 2 (Fig. 2b À red arrow) is two orders of magnitude higher than the flux of Event A at its onset (Fig. 2a À red arrow).However, whereas at the onset of Event A contributions from all energy channels are strong, for Event B the higher energy channels barely rise above background level.This results in the calculated LF 2 values at these time points being comparable precisely due to the spectral profiles of the flux time-series.

Structure of the VESPER model
The VESPER model implements a probabilistic approach to model the occurrence and characteristics of SPEs with the goal of creating virtual time-series of differential proton fluxes.This is achieved by creating Virtual Events (VEs) from measured Events extracted from the database, which we dub "seeds".The profiles/time-series of these measured Events are rescaled in time and flux which essentially amounts to "stretching" or "compressing" time-series in their two dimensions, time and intensity, with the goal of creating VEs with new profiles which are however based on existing ones.The rescaling of the SPE flux series is performed in the time and intensity scales in such a manner that it preserves the statistical properties of the SPEs (macroscopic) variables (such as the peak flux and fluence distributions) as well as the inherent spectral properties of the flux series during SPEs.The timeline of the SPE VTS depends on the fitting of the Duration and the Wait-Time cumulative distribution functions (CDFs) while the creation of the virtual SPEs is based on the observed dependence between the SLF 2 and the Duration of the extracted events (see Fig. 4a).The process by which SPE VTS are constructed consists of three main steps: random sampling of the analytical fits on the CDFs to produce Virtual Wait-Times and Duration values; calculation of the Virtual SLF 2 using the SLF 2 -Duration linear dependence in the log-log space and "seed" event selection; scaling of the seed Event to the selected Duration and SLF 2 values to create a VE.
Each step is explained and described in detail in the following paragraphs.

Sampling of duration and wait-time cumulative distribution functions
The fitting and the random sampling of the Wait-Times and Duration CDFs of the extracted SPEs determines the occurrence of the VEs within the examined timespan of a mission duration, e.g. 10 years.The occurrence frequency of SPEs is strongly dependent with solar cycle phases, i.e. active or quiet, thus for the modelling of Wait-Times between SPEs we have employed separate descriptions for each solar phase and derived two distinct distribution functions.However, since no correlation has been demonstrated between Event duration or intensity and solar phases the SLF 2 and Duration values are taken from the list of defined SPEs from all years of the RDSv2 regardless of whether the period was a solar active or quiet one.The solar cycles were split into 7 active/maximum and the remaining (∼4) quiet/ minimum years, in-line with the work of Feynman et al. (1990).For the solar active phases, there were 2.5 years before the peak times and 4.5 years after.The peak times for cycles 20-22 are taken from Xapsos et al. (1998Xapsos et al. ( ) as 1968Xapsos et al. ( .9, 1979Xapsos et al. ( .9 and 1989.9.9, for cycle 23 it was taken as 2000.6 in order to ensure the inclusion of January 2005 in the solar maximum period.For the last cycle the situation cannot be absolutely clear until the active period is complete, however 2009 should still be solar minimum and it's likely that 2011-2017 will be solar maximum.In Figure 3, the experimental and the fitted CDFs are presented.For all variables, we have employed exponential cut-off power-law functions which provide a good fit/approximation of the measured data.At 95% confidence bounds, for the Durations fit the RMSE value is equal to 0.081, and for the Wait-Times in solar active and solar quiet phases equal to 0.128 and 0.142, respectively.It is evident, that for a given mission length, e.g. 7 years, the sum of the sampled Durations of the Virtual Events and their respective Wait-Times must be equal to that mission length.As a consequence, the sampled Wait-Times and Durations determine the total number of VEs that will occur within a given mission duration.Wait-Times have quite higher values than Durations and therefore play a dominant role since their underestimation or overestimation can heavily affect the output of the model, especially the resulting Fluence, for a given mission duration.

SLF 2 calculation, Duration adjustment and seed Event selection
After the random selection of the Virtual Duration value from the fitted CDF, a Virtual SLF 2 value is calculated and a seed event is selected.This triplet (Duration, SLF 2 and seed event) is later on used in the construction of the VE.The SLF 2 variable has a strong correlation with the SPE Durations which is much higher than those between Durations and SPE Fluences of any energy bin.In this step we employ the singular free parameter in our model, dubbed tolerance, for which we have used the value 0.1.Tolerance (t) is used to calculate two factors (1 þ t, 1Àt), essentially percentages, applied to the selected log(Duration) to calculate an upper and a lower bound for the subsequent calculation of SLF 2 and the selection of a seed event.These bounds in turn define the vicinity of a virtual event and dictate how wide or narrow it will be as seen in Figure 4a.For a randomly sampled Virtual Duration (VD) and tolerance value (t) we determine three points, defined by (cf.Fig. 4a): -The lower bound pair: LB x = (1 À t) * ln(VD), LB y .
-The upper bound pair: The lines perpendicular to the linear fit on a log-log scale that cross these three points create the bounds that define the upper and lower parts of the vicinity of the VE as seen in Figure 4a.The distances of all the Events in the vicinity from the fitting line are initially calculated, and their standard  deviation is used as the s (sigma) parameter in a Gaussian distribution sampling.The Gaussian curve is centred at the [ln (VD), MV y ] point and is also perpendicular to the log-linear fit.A value is randomly sampled and used as the distance of the VE from the fit.This way the SLF 2 and the adjusted Duration values for the VE are calculated allowing for the scatter seen in the linear regression on a log-log scale.The resulting pairs for the VEs and their log-log linear fit adequately reproduce the scattering around the log-log linear fit of the actual Events (cf.Fig. 4b).Lastly, a seed Event is randomly chosen from a pair of Events which have been randomly selected one from the upper and one from the lower vicinity of the VE.

Flux scaling and construction of Virtual Event time-series
For each VE to be created the flux time-series of its seed Event are initially interpolated to the selected Virtual Duration creating an "intermediate" time-scaled event.We underline here that since the seed events are chosen from the vicinity of the virtual Duration no extreme under-or over-sampling occurs that could heavily alter the profiles of the time series.Next, the (interpolated) flux time-series are re-scaled to match the virtual SLF 2 value.The re-scaling scheme essentially alters the intensity of each proton spectrum within the event while maintaining its spectral profile.This way the resulting VEs consist of spectra with energy slopes very close to measured ones avoiding the creation of unnaturally abrupt or hard/soft spectra.This process consists of the steps described below.
-The SLF 2 value of the (intermediate) time-scaled event is calculated.The division with the Virtual SLF 2 value selected from the Gaussian sampling, produces a scaling factor, sf, defined by: -The proton flux spectra are assumed to be of the general form by f (E) = aG (E) where a is the intensity of the spectrum and G(E) the intrinsic energy distribution described by some function, e.g. for power-law G (E) = E b .Therefore in the logarithmic space this is expressed as: Proton spectra can be adequately fitted with exponential or power-law functions, however such specific assumptions are not made here, only the very general form is assumed.
-Using the general assumption for the spectrum and equation (1), the LF 2 integral can be analytically expressed as: where C is a constant.-The re-scaled LF 0 2 ðtÞtime-series is produced by multiplying the time-scaled event's LF 2 (t) values with the scaling factor sf, which can be also expressed as -Using equations ( 4) and ( 5) new intensity ln(a 0 ) values for all spectra are directly derived from the LF 0 2 values.This produces the new intensities of the spectra.
-Equation ( 6) provides the factor to directly alter the intensity of the spectrum.Using equation ( 4) the rescaled fluxes are calculated by adding the factor in log-space to the existing spectrum.
As can be seen in equation ( 7), only the scaling factor and the LF 2 value of the spectrum are actually required for the scaling while the use of a specific analytic function that fits the spectrum is not.This property stems from the fact that in the rescaling scheme essentially only the intensity is altered and neither a nor G(E) need be determined explicitly by fitting the spectrum.Using the process described VEs are constructed having been scaled both in time and flux in a consistent way.An example of such a rescaled event is shown in Figure 5.
The output of the process is a set of Virtual Events described by their flux time-series and their intermediate Wait-Times.For a given mission duration a large number of iterations or "scenarios" are run and from each iteration important quantities of the encountered SEP environment À such as the Cumulative Mission Fluence, the Worst-Case SPE Fluence and Peak Flux À are calculated.For a sufficient number of iterations, the values of these quantities are used to construct the resulting probability and cumulative density functions and derive further probabilistic outputs.

Model outputs and dataset reproducibility
In what follows, we present outputs of the VESPER model for mission durations of 1 and 7 years during solar active period.From every iteration the sum of all virtual fluxes results in the mission Cumulative Fluence and the maximum value is the Highest Peak Flux.We run 1 50 000 iterations for 1 year mission and 50 000 for 7 year mission to produce the cumulative distributions shown below in Figure 6.
An interesting feature appears in the way particular statistical quantities scale from 1-to 7 year mission.In the Cumulative Fluence CDFs for proton energies of 6 MeV, significant differences À of the order of 20 À appear at the high probability of exceeding = 0.9, while the differences become much smaller À of the order of 3 À at the low probability of exceeding = 0.01.This is reasonable since at low probabilities one would expect few very energetic events to largely determine the cumulative fluence for a mission and this should not differ dramatically whether it is a 1 year or a 7 year period.Conversely, at low probabilities the cumulative fluence is determined largely by many more frequent and less energetic events which will result in very different total values between the two cases of mission duration.This is also reflected in the Peak Flux CDFs, however the differences are less dramatic for high probabilities as only single flux measurements are examined and the effect of the summation of multiple events does not come into play.Also, an interesting feature is that the VESPER outputs result in curves that are not always smooth but show varying curvatures and "knees" as in the case of Figures 6a,  6c.This is due to the inherent statistic behavior of the RDSv2.It is therefore a feature of the data that is fed into the model and we consider it a strength of the model since it seems to reproduce the statistical behavior of the measurements in the Event list.
We also perform a comparative statistical analysis on the RDSv2 database and the output of the VESPER model.This analysis serves to verify the self-consistency of the model in the sense that it reproduces basic statistical characteristics of the database it is based upon.Flux intensity and total cumulative fluence over a mission duration are two major considerations for any probabilistic model and thus it is important that the VESPER model reproduces characteristics of its progenitor database.We use a moving threshold approach to evaluate the output in terms of total fluence and median flux.
All measurements of the flux time-series whose LF 2 values lie above a threshold are identified and summed to calculate the total fluence and their median flux.The process is repeated for multiple threshold values resulting in the graphs shown in Figure 7.The total fluence is normalized per year to compare 42 years of the RDSv2 to 10 000 years produced by the VESPER.As can be seen in Figure 7, the curves defined by the moving threshold method applied to the virtual time-series reasonably reproduce those derived by using RDSv2.

Comparisons with existing models
We compare the output of our model to three well tested probabilistic models that are widely used for the estimation of probability functions of quantities such as the Cumulative Fluence and the Peak Flux over a mission duration.We have used here the ESP, JPL and VTM models which are available in the on-line ESA SEPEM application server (http://dev.sepem.oma.be/).Figure 8 shows a comparison of the output of the three models with the resulting Cumulative Fluence and Peak Flux distributions from VESPER using 50 000 scenarios for a mission duration of 7 year in solar active period.
The comparison of the models' outputs demonstrates that VESPER's virtual time-series result in Fluence and Peak values whose cumulative distributions are of the same broader magnitude with those produced by existing models.VESPER is within an order of magnitude with the ESP model and shows good agreement with the VTM model while JPL results are typically higher as one would expect from this model.Especially for the VTM model we note that for lowprobabilities/high values the two models show a remarkable degree of agreement.For high probabilities VESPER produces typically somewhat lower values, however the differences are not dramatic remaining within a factor of 2.
In the VESPER model we have employed the "virtual timeline" approach introduced by the VTM model, where a duration and a wait-time distribution are sampled to determine the occurrence of Virtual Events.However, despite this common element, the two models differ fundamentally in their respective approaches and implementations.The most indicative difference is, of course, that VESPER models flux time-series of SPEs, while VTM models macroscopic variables such as Fluence and Peak Flux by fitting their distribution functions.Therefore, we consider the good agreement with VTM an important indication of quality of the VESPER model, especially since it has been designed to be solely data-driven and not reproduce the results of any other model.

Conclusions
In this work, a new paradigm for the modelling of the solar proton radiation environment is presented.In contrast to the existing SPE models, that are based on probabilistic descriptions of SPE characteristics (peak flux, cumulative fluence, etc.), the VESPER model produces as fundamental output virtual time-series of proton differential fluxes that can be used afterwards for the derivation of (secondary) products depending on user defined selected energies and flux intensity thresholds.
It is shown that LF 2 demonstrates a high correlation with Event Duration, in its summed SLF 2 form, while it also offers a way of scaling individual proton spectra in a consistent way while maintaining their profiles.We show that the model outputs demonstrate a good degree of agreement with the Event list produced using the SEPEM Reference Dataset v2.0.Furthermore the model demonstrates a quite good agreement with the VTM model regarding the cumulative probability functions produced for Cumulative Fluence and Peak Flux.
The production of virtual flux time-series environment offers new possibilities in SPE modelling and opens new avenues for the creation of a radiation effect system resulting from the combination of different radiation effect times-series.VES-PER's outputs can be readily combined with tools that calculate radiation effects (Seltzer, 1980;Lei et al., 2002) to produce time-series of the ionizing or non-ionizing dose, internal charging and other such effects.Finally, combining these results with the corresponding outputs of radiation belt models, such as AE9 (Ginet et al., 2013), and/or a GCR model (ISO-15390, 2004;O'Neill, 2006;Matthiä et al., 2013), we will be able to provide a complete estimate of the radiation effects on a specified orbit, covering all possible sources of radiation and thus vitally contributing to the planning of future missions.Additionally, we plan to integrate magnetospheric shielding effects in VESPER in a time-series manner.This will allow further in-detail modelling of SPE fluxes in magnetospheric regions where magnetospheric shielding effects should not be ignored.This can be used for modelling SPE fluxes and their effects in orbits were these considerations are vital such as low-earth orbits (LEO) and highly elliptical orbits (HEO).Finally, in the future, we will further our studies and possibly update the VESPER model.Among other things we plan to run the model using the RDS database with the background subtracted for each channel and compare its outputs with the current version.In addition, we will investigate in further detail the impact of the tolerance factor and the establishing of the upper and lower limits on the regression.

Fig. 1 .
Fig. 1.The LF 2 time-series calculated for the entire SEPEM reference dataset v2.0.The background is denoted by the green line and the flux enhancements denoted by red.

Fig. 3 .
Fig. 3. CDFs for (a) the Duration and (b) the Wait-Time variables and their fits with exponential cut-off power-laws.

Fig. 4 .
Fig. 4. Cross plots of Duration versus SLF 2 .The red points correspond to the calculated values based on the analysis of RDSv2.(a) The dashed lines denote the vicinity of the VE which is used to calculate an SLF 2 value, adjust the Duration value and select a seed for the Virtual Event.(b) Black dots correspond to calculated values for the VEs.

Fig. 6 .
Fig. 6.(a) and (b) CDFs of cumulative differential fluence produced from the VESPER model for 1 and 7 year solar active periods, respectively.Each curve presents one of the channels in the 5-200 MeV proton energy range, (c) and (d) same for peak differential flux.

Fig. 7 .
Fig. 7. Comparison of (a) total fluence and (b) median flux values for RDSv2 and VESPER virtual data derived using a moving threshold on the respective LF 2 time-series.Crosses denote RDSv2 and continuous lines virtual data.

Fig. 8 .
Fig. 8.Comparison of VESPER CDFs with ESP, JPL and VTM distributions for a 7 year mission in a solar active period.(a), (b) and (c) Cumulative fluence at 6 MeV, 38 MeV and 166 MeV, respectively.(d), (e) and (f) same for Peak flux.

Table 1 .
The first 10 energy bins and log-mean energies of the RDSv2 proton flux dataset.