What can the annual 10Be solar activity reconstructions tell us about historic space weather?

– Context : Cosmogenic isotopes provide useful estimates of past solar magnetic activity, constraining past space climate with reasonable uncertainty. Much less is known about past space weather conditions. Recent advances in the analysis of 10 Be by McCracken & Beer (2015, Sol Phys 290: 305 – 3069) (MB15) suggest that annually resolved 10 Be can be signi ﬁ cantly affected by solar energetic particle (SEP) ﬂ uxes. This poses a problem, and presents an opportunity, as the accurate quanti ﬁ cation of past solar magnetic activity requires the SEP effects to be determined and isolated, whilst doing so might provide a valuable record of past SEP ﬂ uxes. Aims : We compare the MB15 reconstruction of the heliospheric magnetic ﬁ eld (HMF), with two independent estimates of the HMF derived from sunspot records and geomagnetic variability. We aim to quantify the differences between the HMF reconstructions, and speculate on the origin of these differences. We test whether the differences between the reconstructions appear to depend on known signi ﬁ cant space weather events. Methods : We analyse the distributions of the differences between the HMF reconstructions. We consider how the differences vary as a function of solar cycle phase, and, using a Kolmogorov-Smirnov test, we compare the distributions under the two conditions of whether or not large space weather events were known to have occurred. Results : We ﬁ nd that the MB15 reconstructions are generally marginally smaller in magnitude than the sunspot and geomagnetic HMF reconstructions. This bias varies as a function of solar cycle phase, and is largest in the declining phase of the solar cycle. We ﬁ nd that MB15's excision of the years with very large ground level enhancement (GLE) improves the agreement of the 10 Be HMF estimate with the sunspot and geomagnetic reconstructions. We ﬁ nd no statistical evidence that GLEs, in general, affect the MB15 reconstruction, but this analysis is limited by having too few samples. We do ﬁ nd evidence that the MB15 reconstructions appear statistically different in years with great geomagnetic storms.


Introduction
Records of solar magnetic activity are critical for further developing our understanding of the Sun's magnetic field, as well as of the resulting space weather and space climate phenomena driven by solar magnetic activity. Past solar magnetic activity can be estimated from a range of different proxies, with varying reliability, temporal resolution, and extent into the past. Three well established proxies are found in sunspot observations, geomagnetic activity records, and the concentrations of cosmogenic radionuclides stored in natural archives (Owens et al., 2016a,b). These estimates of historical solar variability are invaluable for placing the comprehensive space-age solar observations in a proper context (Barnard et al., 2017), and for investigating plausible scenarios of future space climate change (Abreu et al., 2008;Steinhilber et al., 2012;Matthes et al., 2016).
Both the sunspot observations and geomagnetic activity records rely on a network of human observers and so can provide information on the Sun as far back as reliable measurements have been recorded and preserved (Lockwood, 2013;Usoskin, 2013). In practice these sunspot and geomagnetic activity records provide information on solar variability extending back several centuries, albeit with increasing uncertainty into the past, owing to changes in instrumentation, and observing procedures, and problems of intercalibrating the results of different observers.
A different kind of proxy is available in the form of the concentrations of cosmogenic radionuclides stored in natural archives, such as 14 C fixed in tree trunks, and 10 Be stored in polar ice sheets. These records do not rely on the past efforts of an observer network, but instead require present day retrievals of environmental samples which require significant levels of expertise and expense to process and interpret. For example, converting the deposited cosomogenic iosotope concentrations into the cosmogenic isotope production rate throughout Earth's atmosphere requires a detailed understanding of the transport and deposition processes of that cosmogenic isotope through the Earth system (McCracken, 2004;Heikkilä et al., 2007). Additionally, the inversion procedure needed to understand the implications for solar activity requires a model of galactic cosmic ray (GCR) transport through the heliosphere, and estimates of the local interstellar spectrum of GCRs. However, using suitable inversion techniques with cosmogenic radionuclides it is possible to infer solar variability over several millenia. Although, due to the timescales of transport and deposition of cosmogenic radionuclides through the Earth system, this extended view into the past is at lower temporal resolution and higher uncertainty than is achieved with the sunspot and geomagnetic reconstructions (Lockwood, 2013;Usoskin, 2013). Recently annual records of solar activity extending back several centuries have been developed from cosmogenic isotope records, for example, from 10 Be by McCracken & Beer (2015).
Given the complexity and independence of these different proxy records, they each illustrate a remarkably consistent picture of long-term space climate variability over the past few centuries (Owens et al., 2016a,b). However, there is much more uncertainty about the historical space weather environment. Geomagnetic observations provide a good quantitative record of geomagnetic storms back to 1868 (Kilpua et al., 2015;Lefèvre et al., 2016;Vennerstrom et al., 2016) and increasingly sparse and qualitative information on geomagnetic storms back to approximately 1845 (Lockwood, 2013).
Much less is known about the historic occurrence of solar energetic particle (SEP) events and ground level enhancements (GLEs), with the first observation of GLEs in ionisation chambers in 1942 (Forbush, 1946). It was previously thought that enhanced atmospheric nitrate production due to SEP fluxes may leave a measurable increase in polar ice nitrate concentrations, and that consequently impulsive enhancements in polar ice nitrates could serve as a proxy for historic SEP events (Dreschhoff & Zeller, 1990;Shea et al., 1999;McCracken et al., 2001;Kepko et al., 2009). Subsequent investigations questioned the reliability of this potential proxy record (Wolff et al., 2012;Duderstadt et al., 2014), and currently it is widely held that polar ice nitrate concentrations are not a valid proxy for SEP fluxes.
It has also been argued, both through observations and physical modelling, that extreme SEP events may measurably perturb the 14 C and 10 Be cosmogenic isotope records (Usoskin et al., 2006;Miyake et al., 2012Miyake et al., , 2013Usoskin et al., 2013;Herbst et al., 2015;McCracken & Beer, 2015;Dee et al., 2016). Therefore, cosmogenic isotopes may provide the only viable proxy record for historic SEP fluxes. Such a proxy would be of significant value, as the effective mitigation of space weather hazards requires the development of relevant extreme event scenarios, and it may be a poor approximation to extrapolate our experience of current space weather into the future, when solar activity could be significantly different. In addition, the detection, accurate quantification, and removal of SEP-generated cosmogenic isotopes would generate a better record of past variations in GCR fluxes.
We aim to assess if there is evidence for an SEP induced bias in the annual 10 Be reconstruction of McCracken & Beer (2015) (hereafter MB15). MB15 provided two annual reconstructions of the average heliospheric magnetic field (HMF), based on the analysis of 10 Be. The first assumed no 10 Be production from SEP events, while the second suppressed impulsive enhancements in 10 Be thought to be due to GLEs and atmospheric nuclear bomb tests. The motivation for this is two fold; firstly, to develop a better understanding of uncertainties in the MB15 reconstruction and assess how effective the SEP excision from 10 Be record was; secondly, if a bias is present, can it be used to infer anything about historical SEP fluxes, and does it motivate the pursuit of a next generation of reconstructions with improved temporal resolution and/or sensitivity?

Cosmogenic isotope HMF reconstruction
McCracken & Beer (2015) produced annually resolved estimates of the near-Earth HMF magnitude from 1391-1983, derived from the analysis of 10 Be concentrations provided by the Dye-3 ice-core and North Greenland Ice Core project (NGRIP). Furthermore, MB15 presented evidence of impulsive enhancements in 10 Be concentrations following known GLE events (in 1942GLE events (in , 1949GLE events (in , and 1956, as well as after highaltitude nuclear bomb tests (in 1962). By analysing the magnitude of the impulsive 10 Be enhancements associated with the solar events, MB15 conceived an algorithm to identify other years in the 10 Be record which may have been affected by extreme space weather, over the period 1800-1983. We note that MB15 also analysed historic great geomagnetic storm (GGMS) occurrence, which provided additional circumstantial evidence that the years of impulsive 10 Be enhancements were associated with extreme space weather events. Consequently, MB15 considered it prudent to provide two estimates of the HMF; the first, B c1 , was based on the original 10 Be record, without further consideration of the impulsive enhancements assumed to be associated with extreme space weather; the second, B c2 , aimed to suppress the effect of the impulsive 10 Be enhancements by replacing the data in these years with interpolated estimates based on surrounding years. Here we use both the B c1 and B c2 records, over the period 1845-1983.

Geomagnetic HMF reconstruction
Over the last two decades, much progress has been made in estimating the near-Earth solar wind and HMF conditions from geomagnetic variability (Lockwood et al., 1999;Svalgaard & Cliver, 2005Lockwood, 2013;Lockwood et al., 2013a,b, Lockwood et al., 2014aHolappa et al., 2014a,b;Svalgaard, 2014;Mursula et al., 2016). Typically these techniques work by establishing a statistical relationship between the observed geomagnetic variability indices and near-Earth solar wind conditions. These relationships are calibrated over the space age, for which we generally have insitu spacecraft observations of the near-Earth solar wind. Then, to estimate historic solar wind conditions, the statistical relationships are extrapolated, applying them to geomagnetic observations from before when in-situ observations are available. For technical reasons, related mainly to factors such as the short-term variability in the HMF orientation and the variability of the spatial distribution of ionospheric conductivities, it has so far only been possible to estimate annual means of the HMF magnitude from geomagnetic indices (Lockwood, 2013;Lockwood et al., 2016b). Focusing on annual reconstructions of the HMF intensity, Owens et al. (2016a) recently reviewed the progress in this area, and published a consensus estimate of annual mean HMF computed from geomagnetic indices. The consensus was computed from a weighted mean of the estimates provided by Lockwood et al. (2013b) and Svalgaard (2014), and spans the period 1845-2013. We use the Owens et al. (2016a) consensus geomagnetic reconstruction, B g , in our comparison with the B c1 and B c2 records.

Sunspot HMF reconstruction
The annual mean near-Earth HMF magnitude can also be estimated from sunspot observations. One approach to this is similar in nature to the geomagnetic reconstructions, which is to establish a statistical relationship between the observed HMF and sunspot number. For example, it has been shown that an approximately linear relationship exists between the annual mean HMF magnitude and the square root of the annual mean sunspot number (Svalgaard & Cliver, 2005). Another approach is to model the open solar magnetic flux (OSF), the source of HMF, as a continuity equation (Vieira & Solanki, 2010;. For example, in the  model, the source of the OSF is parameterised in terms of the sunspot number, while the loss is parameterised in terms of the heliospheric current sheet tilt. With an estimate of the OSF, an empirical relationship between the OSF and HMF is used to compute the HMF. An additional consideration with these reconstructions is which sunspot record to use? Both the international and group sunspot numbers have been significantly revised recently, and multiple different methodologies and corrections have been proposed, with varying degrees of acceptance Cliver, 2016;Lockwood et al., 2016a). Owens et al. (2016a) also produced a consensus HMF reconstruction from sunspot observations, spanning the period 1750-2013. Owens et al. (2016a) used both the Svalgaard & Cliver (2005) and  methodologies with four records of the sunspot number (2 international and 2 group numbers) to compute the consensus HMF reconstruction, B r , which we use here. The aim of using both reconstruction methods with multiple sunspot records was to help compute the uncertainty in HMF estimate from both the sunspot records and HMF reconstruction methods.

Ground level enhancement catalogue
We use a catalogue of GLEs provided by the National Geophysical Data Center, available at https://goo.gl/ztktlZ. This list is compiled from both the world wide neutron monitor network, whose spatial coverage varies in time, and also a sparser network of neutron monitors and ionisation chambers from before the development of the world wide network. Here we use this record to determine years with and without GLEs, over the period . We note that the sparsity of neutron monitors both before and during the development of the world wide network means that there is a chance, particularly earlier in the catalogue, that GLEs were missed. Note also that there is a >500-fold dynamic range between the smallest (<10%) and largest (5000%) GLE in the catalogue, and we may expect that the contributions of the smaller (<200%) GLE will be proportionally smaller than those considered by MB15.

AA geomagnetic index
We use the aa geomagnetic index to analyse the occurrence of geomagnetic storms. We choose to use the aa-index as it is the longest continuously available index that is also sensitive to geomagnetic storms. The aa-index is a 3 h range index, which, at any given time, is computed from geomagnetic variability observed at two approximately antipodal observatories, one in the UK and one in Australia. However, both the UK and Australian observatories have moved twice over the duration of the aa-index, and so in fact the aa-index is a composite index formed from 6 pairings of cross-calibrated observatories. We obtained the 3-hourly aa-index data for the period 1868present from http://isgi.unistra.fr/indices_aa.php.

Fractional differences
To compare the different HMF reconstructions we compute the fractional differences of the form where B x refers to either the geomagnetic or sunspot reconstructions, B g or B r , and B cn refers to either B c1 or B c2 , the MB15 reconstructions that have not or have been modified to account for impulsive 10 Be enhancements.

Solar cycle phase and polarity cycle phase calculations
We will compare how the differences between the HMF reconstruction methods vary as a function of both the Schwabe solar cycle phase, and also the polarity cycle phase. By polarity cycle, we mean the cycle in the modulus of the dipole component of the solar magnetic moment. The solar cycle phase is computed according to where t is time in years, and t start and t stop are the start and end dates of the Schwabe cycle, taken from https://en.wikipedia.
org/wiki/List_of_solar_cycles. The polarity cycle phase is computed according to where t n, max and t nþ1, max are the times of the sunpot maxima in the Schwabe cycle, for the nth and the nth þ 1 cycles respectively. The À1 in the numerator accounts for the fact that, on average, the solar polar fields reverse approximately 1 year after sunspot maximum . Of course, for the more recent cycles we could use magnetogram data to estimate polarity cycle timings more accurately. But as this is not possible for the entire record, we choose to use this more approximate method, which we can apply to the whole data sequence uniformly.

Geomagnetic storm definition
To define geomagnetic storms in the aa-index, we follow a procedure similar to Kilpua et al. (2015). We define storms in the aa-index as periods where aa exceeds 100 nT. Storm start and end times are set to the times of the nearest 3-hourly value above the 100 nT threshold (approximately the 99th percentile of the full aa distribution). We also compute the peak aaindex value in each storm, and integrated intensity. Secondary storm maxima with no intermediate drop below the storm threshold are not split into separate events, and there is also no minimum time threshold between consecutive storms. This procedure generates 1919 storms over the period . Finally, we are interested particularly in a record of the largest storms, so we select from this list only storms with maximum intensities greater than the 90th percentile of storm maximum intensities. This results in a list of 186 large geomagnetic storms, which we hereafter refer to as GGMS. The 90th percentile threshold is arbitrary, but the results we later discuss based on this list of GGMS are robust to moderate changes to this threshold (between the 85 and 95th percentiles).

Results
Panels A-D of Figure 1 show the time series of the four HMF reconstructions, with B c1 , B c2 , B g , and B r in panels A, B, C and D, respectively. This window is restricted to the period of overlap between the four reconstructions, 1845-1983. The red squares mark the years in B c1 that are affected by impulsive 10 Be enhancements (McCracken & Beer, 2015). Some of these points clearly appear to be outliers, being widely separated from the local trend, more than uncertainty in the values would appear to explain.
Each of the HMF reconstructions shows similar Schwabe cycle variations, although these are better resolved, with smaller uncertainties for B g and B r . Note that this is not a criticism of the B c1 and B c2 series; considering the very different nature of the B c1 and B c2 series, we consider it remarkable that they agree with B g and B r so closely.
Panel E of Figure 1 shows kernel density estimates of the distribution of each of the HMF reconstructions, using the same color scheme as panels A-D. Kernel density estimation provides an empirical estimate of a parameter's probability distribution function by assigning a specified analytical distribution function (e.g. Gaussian, top-hat, cosine etc.) to each observation, and assuming that the normalised sum of these individual distributions provides a fair representation of the parameter's true distribution function (Wilks, 1995).
Overall, the distributions of these four reconstructions appear quite similar, with similar modal values. Both the B c1 and B c2 series appear to be located at slightly smaller HMF magnitudes (≈0.1 nT); the median values of B c1 and B c2 are 6.08 nT and 6.16 nT, while the median values of B r and B g are 6.27 nT and 6.30 nT. The largest differences between B c1 and B c2 and B g and B r are found in the lower tail of the distributions, where there is more probability at lower values in the B c1 and B c2 series. Conversely, the upper tails of the four distributions appear quite similar. The impact of the modification applied to B c2 to account for the 10 Be impulsive enhancements is apparent, with the low tail having less probability at lower HMF values than for B c1 .  Figure 2 presents the time series of the fractional differences between the HMF reconstructions, F g1 , F g2 , F r1 , and F r2 in panels A-D, respectively. Each time series appears to show cyclic variations which correspond approximately with Schwabe solar cycles, which will be analysed in more detail in Section 4.1. Furthermore, we note the close correspondence between F g1 and F r1 series, as well as the F g2 and F r2 series, which arises due to the remarkable agreement between B g and B r . Finally, we note that, visually, there does not appear to be any secular trend in the fractional difference time series.
Kernel density estimates of the distributions of the fractional differences are shown in Figure 3. The distributions for F g1 and F g2 , and F r1 and F r2 are shown in panels A and B, respectively. The correspondingly colored dots mark the values of fractional differences in the years of impulsive 10 Be enhancements, where the vertical coordinates are random numbers, to make clear the spread of the values. For F g2 and F r2 , these points are clustered closer to the zero-difference line than for F g1 and F r1 , showing that in these years B c2 is in improved agreement with B r and B g relative to B c1 . This demonstrates the impact of the McCracken & Beer (2015) procedure to suppress the effects of the impulsive 10 Be enhancements. The four distributions are approximately centered on zero, with similar modal values of 0.01, 0.08, 0.00, and 0.07 for F g1 , F r1 , F g2 and F r2 , respectively. The distributions are moderately asymmetric, with each showing more weight below the modal value than above. It is not clear what causes this asymmetry, but it could possibly result from extra 10 Be production due to SEP fluxes. Were extra 10 Be to be produced by SEPs, that was assumed to be due to GCRs, it would lead to a underestimate of the HMF magnitude in the MB15 reconstructions. Such a negative bias would increase the occurrence of negative fractional differences, and so might explain the asymmetry shown in Figure 3. Be reconstructions with the geomagnetic and sunspot reconstructions appear to show regular variations with the solar cycle. Here we investigate this further by looking at composites of the HMF and fractional differences over the solar cycle. Each year of the reconstructions is assigned a solar cycle phase, as described in Section 3.2. We divide the solar cycle into 10 equally spaced and contiguous phase bins, and for each bin, we compute the mean of each HMF reconstruction, as well as the mean of the fractional differences F g1 , F r1 , F g2 and F r2 . The overlapping extent of the 4 reconstructions spans solar cycles 9-21, although neither the beginning of cycle 9 nor the end of 21 are fully sampled. Because of this, and because of the variable length of the Schwabe cycle, each phase bin is not equally sampled, and minimum number of samples is 11 annual values, the maximum is 16 values, while 6 of the 10 bins have 13 or 14 samples. However, although the phase sampling is not equal, the differences are modest, and each bin has a reasonable number of samples, so we assume that this doesn't influence our subsequent analysis. Figure 4 presents the results of the solar cycle composites, with the HMF, F g1 and F r1 , and F g2 and F r2 , shown in panels A, B, and C, respectively. In each panel, the error bars are 1s errors of the mean computed in each phase bin. This confirms that there is cyclic variability in the differences between the 10 Be reconstruction and the geomagnetic and sunspot reconstructions, which peaks in the declining phase of the solar cycle, around ' s = 4 rads. Furthermore, the close agreement between the independently derived B g and B r series (panel A), suggests to us that these differences arise from the 10 Be reconstruction. Additionally, as they are present for both B c1 and B c2 , with nearly identical magnitudes, we conclude that these differences and are not Time series of the fractional differences between the B c1 and B c2 reconstructions with both B g and B r . Computation of the fractional differences is explained in Section 3.1: F g1 is the fractional difference between B c1 and B g , F g2 between B c2 and B g , F r1 between B c1 and B r , and F r2 between B c2 and B r . Error bars are 1s errors of the mean. caused or resolved by the procedure to suppress the impulsive 10 Be enhancements. This result is robust to changes in the number of solar cycle phase bins, with qualitatively similar results obtained with 8-12 phase bins.

Differences with Schwabe cycle phase and polarity cycle phase
The question then, is which processes could cause the systematic disagreement between the 10 Be reconstructions with the geomagnetic and sunspot reconstructions in the declining phase of the Schwabe cycle? One plausible candidate is the modelling of the GCR heliospheric transport, results from which are employed by McCracken & Beer (2015) to estimate the HMF from the 10 Be observations. GCRs propagate diffusively through the turbulent HMF to near-Earth space, but they also undergo systematic drift patterns due to large scale structure and polarity of the HMF. Full details of the MB15 inversion procedure and GCR transport model are provided in McCracken & Beer (2015). However, to summarise, in their description of the 10 Be inversion procedure, McCracken & Beer (2015) detail how the systematic effects of the Hale cycle modulations remain in the HMF series, as the inversion procedure necessarily averaged over the positive and negative phases of the Hale cycle. These polarity cycle effects are largest in the declining phase of the Schwabe cycle, as the new polarity cycle commences with the polarity reversal of the solar polar magnetic field.
Compositing over Schwabe cycles aliases the effects of the differing heliospheric GCR modulations with polarity cycle phase (Thomas et al., 2015). This aliasing can be avoided by instead compositing over both positive and negative polarity cycles, which we do here following a similar procedure to Owens et al. (2015). Following the standard nomenclature, we here on refer to the positive and negative polarity cycles as qA > 0 and qA < 0, respectively (where q is the charge polarity of the GCR and A is the solar field polarity with A > 0 and A < 0 describing field that is towards the solar poles in, respectively, the southern and northern hemisphere). As with the composite analysis presented in Figure 4, we split the positive and negative polarity cycles into 10 phase bins, and compute the means of the HMF and the fractional differences in each phase bin. Subdividing the data further into positive and negative polarity cycles means that the sampling in each phase bin is poorer than for the Schwabe cycle analysis, but each phase bin still has between 6 and 9 samples. The results of this compositing are shown in Figure 5, with the composites of HMF, F g1 and F r1 , and F g2 and F r2 , shown in panels A, B, and C, respectively. Here the largest systematic difference between 10 Be reconstructions and the geomagnetic and sunspot reconstructions appears in qA > 0 cycles.
Our interpretation of the systematic difference between the MB15 reconstructions with the geomagnetic and sunspot reconstructions is that this is primarily an artefact of the GCR heliospheric transport modelling used in the MB15 10 Be inversion. Furthermore, this analysis shows that the systematic difference doesn't persist over the whole solar cycle, but is apparent in the declining phase of the Schwabe cycle, and can lead to differences of approximately 10-20% relative to the geomagnetic and sunspot reconstructions. Recent observations of the Hale cycle modulation of the GCR intensity should make it possible to use a GCR heliospheric transport model that includes particle drift effects (e.g. Strauss & Effenberger 2017) to examine long term trends, possibly reducing the differences seen in Figures 4 and 5.

HMF reconstructions differences conditional on large space weather event occurrence
In this section we will analyse the distributions of the reconstruction differences F g1 and F r1 , conditional on whether or not large space weather events were know to occur. As we wish to assess the possible sensitivity of the 10 Be record to significant space weather events, we exclude F g2 and F r2 from further analysis, as the B c2 series was processed to minimise the effects of known GLE events.

Historical space weather
As we look further into the past, particularly prior to the space age, reliable records of space weather events become harder to obtain (Barnard et al., 2017). Furthermore, the past development of observatories and instrumentation often means that it is not possible to obtain homogeneous quantitative records of space weather occurrence over long time periods. Here we will assess the historic occurrence of large space weather events in terms of the occurrence of GLEs (Sect. 2.4) and GGMS defined in the aa geomagnetic index (Sects. 2.5 and 3.3). Given the hypothesis that strong SEP events and GLEs may bias the cosmogenic isotope reconstructions, we would, ideally, only consider records of SEPs and GLEs. However, the GLE record only extends back to 1942, having a short 41 year overlap with the MB15 reconstructions, which end in 1983. Therefore we will also consider GGMS, the record of which extends back to 1868, having 115 years of overlap. In doing so we make an assumption that in years where GGMS occur, it is more likely that strong SEP events and GLEs will also occur.
MB15 provided some circumstantial evidence that supports this assumption, showing that the majority of the 3s increases in 10 Be before 1942 were associated with GGMS. Figure 6 summarises these two metrics of historic space weather. Panel A shows a time series of the 3-hourly aa index (black), with the maxima of GGMS marked by red circles. Furthermore, the timings of known GLEs are marked by the vertical cyan lines. Panels B and C shows these same data, but as a function of Schwabe cycle phase and polarity cycle phase. Additionally, Figure 7 presents an alternative view of these data. Panel A shows the annual count of GGMS and GLEs, while panels B and C show the count of GGMS and GLES with both Schwabe cycle phase and polarity cycle phase. For the later two, the Schwabe and polarity cycles were split in to 10 equal width phase bins, and the number of events in each bin was counted. This figure is intended to show the general trend of the occurrence of GGMS and GLE in time and with solar cycle phase. Note that we do not take account of the slightly uneven sampling of ' s and ' p , nor do we scale the counts to account for the 115 year duration of the GGMS record relative to the 41 year duration of the GLE record; the GGMS and GLE counts should therefore be compared in waveform, but not in amplitude. Taken together, Figures 6 and 7 reveal that there is cycle-to-cycle variability in the occurrence of GGMS and GLE, but that typically these events are more likely to occur near the middle of the Schwabe cycle. A similar result was found for the occurrence of gradual SEP events .

HMF reconstruction differences with GLEs
To discern whether GLEs, in general, measurably affect the MB15 reconstruction we will use a statistical approach that compares the distributions of F g1 and F r1 in years with and without GLE events. We are restricted to the 41 year overlap period between the MB15 reconstructions and the GLE record, for which 19 years contained a GLE and 22 years did not contain a GLE. For both the GLE and no-GLE conditions, we compute the empirical cumulative distribution function (ECDF) of F g1 and F r1 . Computation of ECDFs is discussed in Appendix A.1. Furthermore, we compute 100 bootstrap estimates of the ECDFs by drawing random samples from the F g1 and F r1 series, with same number of years as the GLE and no-GLE conditions, but irrespective of whether or not GLEs occurred (more information on the bootstrap procedure, and an example, are provided in Appendix A.2). If the ECDFs of Fig. 5. Panels A-F show composites of the HMF reconstructions and the fractional differences F g1 , F r1 , F g2 , F r2 , with polarity cycle phase. The composites are split according to solar magnetic polarity, with qA > 0 cycles in panels A-C, and qA < 0 cycles in panels D-F. For each panel, the error bars are 1s errors of the mean. F g1 and F r1 for the GLE and no-GLE conditions are markedly different from the ECDFs of the 100 bootstrap samples, then this is some evidence that years with and without GLEs affect the MB15 reconstruction differently. Taken in conjunction with a viable mechanism for GLEs affecting 10 Be, such a difference might suggest that GLEs do, in general, affect the MB15 reconstruction. These ECDF data are presented in Figure 8, with the F g1 and F r1 data analysed in the left and right columns, and the no-GLE and GLE conditions given in the top and bottom rows. The observed ECDFs for the two conditions are drawn in red and blue for F g1 and F r1 , while the bootstrap ECDFs are drawn in grey. In each case, the ECDFs of the GLE and no-GLE conditions are located within the spread of the bootstrap ECDFs, and this test provides no evidence that GLEs, in general, affect the MB15 reconstruction. This may be due to the limited sample of years. Additionally it may be that only the largest GLE can significantly perturb the 10 Be record. For example, MB15 estimate that an individual GLE needs exhibit a >100% increase in Neutron Monitor count rate(s) before it is likely to generate a quantity of 10 Be large enough to stand out from other sources of variability at annual resolution. Only approximately 15% of recorded GLE have been estimated to meet this criteria. However, we also note that at annual resolution it is possible that an accumulation of smaller GLE, and SEP events that do not register as GLE, could plausibly deposit the same net fluence of 10 Be producing energetic charged particles. Ideally we would further subdivide the data to assess the impact of GLE magnitude, but unfortunately there are too few samples for this to be reasonable.

HMF reconstruction differences with GGMS
We will now assess whether the distribution of F g1 and F r1 values depends on the occurrence of GGMS. We do not hypothesise that GGMS are directly affecting the production of cosmogenic isotopes, but do assume that the GGMS record serves as a proxy for the occurrence of large space weather The number of geomagnetic storms (red) and GLE events (cyan) as a function of polarity cycle phase. In panels (B) and (C), phase bins 2p/10 wide were used. No account was taken of the uneven sampling of solar cycle phase, or the fact the GLE record only spans 1937-present; these plots only show coarse trends with solar/polarity cycle phase, and the absolute number of geomagnetic storms and GLE events should not be directly compared. We note that, due to the availability of 10 Be data, we only use the geomagnetic and GLE observations up until 1983. events which may in turn affect the 10 Be record. There is an overlap of 115 years, from 1868-1983, between the GGMS record and the HMF reconstructions, of which 67 contained a GGMS and 48 did not contain a GGMS. Figure 9 presents the ECDF data with the F g1 and F r1 data analysed in the left and right columns, and the no-GGMS and GGMS conditions given in the top and bottom rows. In this instance, for both F g1 and F r1 , there is a clear shift in the location of the ECDFs for the GGMS and no-GGMS conditions relative to the bootstrap ECDFs. The ECDFS of F g1 and F r1 are shifted towards more negative values in years with GGMS, as expected by the increased 10 Be production idea.
A two-sample two-tailed Kolmogorov-Smirnov (KS) test was used to assess the hypothesis that the distributions of F g1 under the GGMS and no-GGMS conditions were drawn from the same underlying distribution (Wilks, 1995) (see Appendix A.3). Essentially, this test computes the probability (p-value) that the differences between the observed distributions would be this large due to drawing two random samples from the same underlying distribution. The test cannot confirm that the distributions are different, but does quantify how unusual it would be to obtain these samples if they came from the same underlying distribution.
In this instance, the KS test returns a p-value of 0.004, meaning a 0.4% chance that differences this large would be observed under the no-GGMS and GGMS samples if they were in fact drawn from the same distribution. Therefore, this test provides evidence that there are significant differences between the distribution of F g1 under the no-GGMS and GGMS conditions. This analysis was repeated for F r1 , which returned a p-value of 0.057, again showing that the differences between the F r1 distribution for the no-GGMS and GGMS conditions are larger than we would expect due to random sampling. These and further statistics are summarised in Fig. 8. (A) The ECDF of F g1 is given in red, computed for only the 22 years without GLE events. The grey lines show 100 bootstrap estimates of the F g1 ECDF, computed by randomly sampling 22 years from the F g1 series. Panel B has the same structure as panel A, but instead shows the ECDF of F r1 in blue. Panels (C) and (D) have the same structure as (A) and (B), but instead show the ECDFs of F g1 and F r1 from only those 19 years with GLE events. Table 1. We cautiously interpret these results as evidence that space weather events do bias the MB15 reconstruction towards lower values of the HMF.
However, this analysis could be influenced by aliasing between the Schwabe/polarity cycle variations in the fractional differences F g1 and F r1 (Figs. 4 and 5) and the occurrence of GGMS (Fig. 7); GGMS are more likely to occur in years where the polarity phase composites show F g1 and F r1 are, on average, more negative. Therefore, we investigate this further by splitting the polarity cycle into active and quiet periods, where the occurrence frequency of GGMS varies much less within these periods. We define the active phases as ' p p/2 and ' p ≥ 3p/2 and quiet phases as p/2 < ' p < 3p/2. There are 59 active years (46 with GGMS and 13 with no-GGMS), and 56 quiet years (21 with GGMS and 35 with no-GGMS). Figure 10 repeats the ECDF analysis for active years, and Figure 11 shows the results for quiet years. Sub-sampling by polarity cycle phase means that Fig. 9. (A) The ECDF of F g1 is given in red, computed for only the 48 years without GGMS events. The grey lines show 100 bootstrap estimates of the F g1 ECDF, computed by randomly sampling 48 years from the F g1 series. Panel B has the same structure as panel A, but instead shows the ECDF of F r1 in blue. Panels (C) and (D) have the same structure as (A) and (B), but instead show the ECDFs of F g1 and F r1 from only those 67 years with GGMS events. there are fewer samples in each category, and the results are less clear, particularly for the no-GGMS condition in active years, which has only 13 samples. However, for both the active years and the quiet years the same trend is apparent; the GGMS distribution tends to be located at more negative values, to the left hand side of the bootstraps elements, than the no-GGMS condition, which tends to be located to the right hand side of the bootstrap elements. We also repeated the KS test procedure on these active and quiet phase subsets, with the statistics included in Table 1. This reveals that for the active and quiet samples there is a much higher probability that differences between the GGMS and no-GGMS conditions could be due to random sampling, and so we have low confidence that these distributions are actually different.

Discussion and conclusions
We have compared the McCracken & Beer (2015) (MB15) annual 10 Be HMF reconstructions with two independent HMF reconstructions derived from sunspot records and geomagnetic activity. The four series span different periods of time, and the comparison is made only over the period of overlap, spanning 1845-1983. More restricted windows are also considered, in our analysis of GGMS (1868GGMS ( -1983, and also GLEs (1940GLEs ( -1983. Comparing the distributions of yearly values of each HMF reconstruction revealed that the MB15 reconstructions display a modest negative bias relative to both the sunspot and geomagnetic reconstructions (Fig. 1). However, this bias is smaller than other sources of variability between the reconstructions, as demonstrated by the distributions of the fractional differences between the sunspot and geomagnetic reconstructions with the MB15 series (Fig. 3); these distributions were approximately centered on zero, and displayed a small asymmetry, such that there was more probability below the modal values than above. This comparison also demonstrated that the MB15 procedure to The ECDF of F g1 is given in red, computed for only the 13 years without GGMS events, from polarity phases corresponding to high solar activity (' p p/2 and ' p ≥ 3p/2). The grey lines show 100 bootstrap estimates of the F g1 ECDF, computed by randomly sampling 13 years from the F g1 series. Panel B has the same structure as panel A, but instead shows the ECDF of F r1 in blue. Panels (C) and (D) have the same structure as (A) and (B), but instead show the ECDFs of F g1 and F r1 from only those 46 years with GGMS events. mitigate the impact of impulsive 10 Be enhancements on the HMF reconstruction does improve the agreement with the sunspot and geomagnetic HMF reconstructions; the fractional differences between B c2 with B g and B r are smaller than the differences computed with B c1 .
The time series of the fractional differences (Fig. 2) are suggestive of a solar cycle trend. Compositing the fractional differences as a function of solar cycle phase revealed a systematic variation in the mean fractional differences over the solar cycle. These variations are similar for each of the F g1 , F g2 , F r1 , and F r2 series. Given the independence of the sunspot and geomagnetic reconstructions, this implies that the solar cycle variation arises primarily from aspects of the MB15 reconstructions. We note that as the variations are similar for both B c1 and B c2 , they appear to be independent of the MB15 procedure to suppress impulsive 10 Be enhancements. The most likely explanation is that the solar cycle variation in the reconstruction differences is due to limitations in the 10 Be inversion procedure, specifically with the GCR transport model. MB15 describe how it was necessary to employ GCR transport coefficients which were an average over both solar polarity cycle, while it is known that GCR transport to near-Earth space varies significantly and systematically with polarity cycle phase.
We considered whether there was statistical evidence that GLEs and GGMS do, in general, affect the 10 Be record, and subsequent estimates of the HMF. The hypothesis for GLEs affecting the 10 Be record is simple, with the GLE acting as new temporary source of energetic charged particles, increasing the total population of particles capable of producing 10 Be in the upper atmosphere. MB15 presented evidence of an association between some impulsive 10 Be enhancements and known GLE events, as well as upper atmospheric nuclear bomb tests. We considered a larger set of GLE events, and investigated if the differences between the reconstructions were significantly different in years with and without GLE events. Using a two-sample K-S test, we could not resolve statistically significant differences between the distributions of Fig. 11. (A) The ECDF of F g1 is given in red, computed for only the 35 years without GGMS events, from polarity phases corresponding to low solar activity (p/2 ' p 3p/2). The grey lines show 100 bootstrap estimates of the F g1 ECDF, computed by randomly sampling 35 years from the F g1 series. Panel B has the same structure as panel A, but instead shows the ECDF of F r1 in blue. Panels (C) and (D) have the same structure as (A) and (B), but instead show the ECDFs of F g1 and F r1 from only those 21 years with GGMS events. reconstruction differences conditional on whether or not GLE occurred. This null result could arise from the limited sample size available to us, with only 41 years of overlap between the HMF record and GLE catalogue. Alternatively it could also indicate that GLEs do not, in general, significantly perturb the 10 Be record, which is consistent with modelling work performed by (Herbst et al., 2015). For example, as previously discussed, MB15 estimate that an individual GLE must exhibit a >100% increase in neutron monitor count rate before it can generate enough 10 Be to stand out above background variability and statistical uncertainty. Unfortunately we were unable to further subdivide the GLE data to test for the effect of GLE magnitude as there are too few events to reasonably do so. Given the limited extent of overlap between these records, and the generally low occurrence of GLE events, it is likely that that the question of if, and by how much, GLE affect the 10 Be record is best answered by physical modelling of 10 Be production and transport for different GLE scenarios. Such modelling work has begun, for example Usoskin & Kovaltsov (2012); Herbst et al. (2015) but will hopefully be extended to include more detailed GCR transport modelling, for more GLE scenarios and a wider range of solar activity states.
We also tested whether the differences between the reconstructions were significantly different in years with and without GGMS. In this instance, it was found that there was a statistically significant difference between the distributions of fractional differences in years with and without GGMS; the fractional residuals tended to be more positive in years with GGMS (consistent with an underestimate of the MB15 HMF by additional production of 10 Be by SEPs/GLEs). However, the correlation between the occurrence frequency of GGMS and polarity cycle phase means that this result may arise from aliasing between the observed solar cycle trend in the reconstruction differences and any effects due to space weather events. To try and isolate this possibility, we further divided the data into "active" and "quiet" phases, and repeated the procedure to assess if the reconstruction differences were significantly different in years with and without GGMS. The same trend persists, with the distribution of reconstruction differences typically being more positive in years with GGMS, but this is less clear, being heavily impacted by the sample size reduction from sub-sampling. Applying the same K-S testing procedure, we could not resolve statistically significant differences between the reconstruction differences in these categories. Therefore we conclude there is some evidence that the MB15 reconstructions do vary in response to GGMS, but that this variation is small compared to other sources of variability. There are several plausible hypothesis about why we may expect the MB15 HMF reconstruction to depend on the occurrence of GGMS. Most simply, it is reasonable to assume that the occurrence of GGMS is a weak proxy for the occurrence of SEP and GLE events, as the energetic CMEs that drive GGMS may also possibly generate significant SEP fluxes and GLEs. Correspondingly, Figure 7 shows a weak correlation between the annual count of GLE and GGMS. However, there are also other possible mechanisms, which may have a non-negligible impact. For example, magnetospheric dynamics during GGMS modifies the access of energetic charged particles into the magnetosphere, and consequently the upper atmosphere. Finally, we do not rule out that changes in the global structure of the heliosphere over the solar cycle may have some role in explaining these differences.
Recent research has shown that analysis of cosmogenic isotopes may be the only available proxy for historic extreme SEP events (Usoskin et al., 2006;Miyake et al., 2012Miyake et al., , 2013Usoskin et al., 2013;Herbst et al., 2015;McCracken & Beer, 2015;Dee et al., 2016). This would be a hugely valuable data source in constraining the limits of extreme SEP events, which have quite poor statistics over the space age , and are critical for assessing the hazards and risks to our continuing exploration and development of the solar system (Cannon et al., 2013), especially for manned missions. Furthermore, the accuracy of reconstructions of the historic GCR environment and large scale solar magnetic activity will improve if the impacts of SEP fluxes on these records are resolved and removed.
As future technological advances facilitate cosmogenic isotope records with higher temporal resolution and higher precision it should be expected that limitations in the cosmogenic isotope inversion procedures become increasingly apparent. We consider the results of this study strong motivation for investigating the role of more advanced GCR transport models in the inversion of high resolution cosmogenic isotope records, as well as developing robust procedures to identify the possible impacts of space weather events on cosmogenic isotope records.

Appendix A: Statistical methods
Below we provide further details regarding some statistical methods employed throughout the analysis in this article. Further details on all of these techniques are described in detail in Wilks (1995).

A.1 Empirical cumulative distribution functions
The ECDF provides a non-parametric estimate of a variables cumulative distribution. For a variable, x, the ECDF, F(x), is calculated as where i is the order statistic of x, and n is the total number of samples.

A.2 Bootstrap re-sampling
In Section 4.2 we use bootstrap resampling to help assess whether the ECDFs of the reconstruction differences show a dependence on whether or not GLEs of GGMs occurred. Here we illustrate this process by way of an artificial example. The left panel of Figure 12 shows a scatter plot of the daily mean aa index against the daily sunspot number, R, from 1868 to 2015. All these data are shown with black dots, except those values with R > 400. In this example we are interested in whether or not the distribution of aa is different for R > 400, than for other times. The right panel of Figure 12 shows the ECDF for all daily mean aa values in black, and the ECDF for aa values corresponding to R > 400 in red, for which there are 66 values. Comparing only the black and red lines, it is difficult to assess whether there is evidence that the R > 400 condition significantly affects the distribution of aa. However, the grey lines are 100 bootstrap estimates of the aa ECDF, calculated by randomly sampling 66 values without replacement from the full aa distribution. Unsurprisingly these grey lines of the bootstrap estimates all cluster around the full aa distribution. However, the red line lies predominantly outside of the cluster of bootstrap lines, suggesting that the R > 400 condition does yield a sample which is unusual compared to a random sample of the same size from the full distribution.

A.3 Two-sample Kolmogorov-Smirnov tests
The two-sample KS test is a statistical hypothesis test which can be used to compare two observed distributions, and assess the plausibility of the null-hypothesis that both observed distributions are in fact random samples from the same underlying distribution. As previously mentioned, the KS test cannot confirm that the distributions are different, but does quantify how unusual it would be to obtain two sets of samples if they came from the same underlying distribution. This is a non-parametric test, with the test statistic D being given by where F 1 (x 1 ) and F 2 (x 2 ) are the ECDF's of the two observed distributions, such that D is the maximum difference between them. From the D statistic, a p-value can be calculated which gives the probability of obtaining this maximum difference between the two distributions, had they been randomly sampled from the same underlying distribution. Consequently, small pvalues imply that the null hypothesis is not very plausible, and the differences between F 1 (x 1 ) and F 2 (x 2 ) can probably not be explained by random sampling. Of course, this does not mean any alternative hypothesis are necessarily true, just that there is a difference that likely merits further investigation.