Time-of-day/time-of-year response functions of planetary geomagnetic indices

,


Introduction
Geomagnetic indices are widely used to quantify the level of activity in the terrestrial magnetosphere-ionosphere-thermosphere system, and have sometimes been used without consideration being given to their response characteristics and whether or not they are appropriate to the application in question. In this paper, we study the response characteristics of three widelyused 3-hourly global geomagnetic indices compiled from observations by mid-latitude observatories; indices which primarily respond to the substorm current wedge (see review by Lockwood, 2013). These are the "mondial" (meaning "global") am index (and its components an and as in the northern and southern hemispheres, respectively), the "planetary" ap index (equivalent to Kp) and the aa index (and its hemispheric components, aa N and aa S ). For each of these indices, we study the response as a function of the time-of-day (i.e., Universal Time, UT), the time of year (quantified by the fraction of the calendar year, F) and the level of geomagnetic activity.
Maps showing the stations currently contributing to these indices are presented in Figure 1. Lists of stations, their locations and the intervals over which each was used, are given for each index in Appendix A. The am index is designed to give good and even longitudinal coverage in both hemispheres, currently employing 14 stations in the northern hemisphere and 10 in the southern. There have been relatively minor changes to the network of stations deployed and the use of area-weighted averaging over longitude sectors has minimised the effect of these changes. It is available continuously from 1 January 1959 to the present day. The ap (and Kp) index is currently made from data from 11 stations in the northern hemisphere and 2 in the southern. It is available continuously since 1 January 1932, although the number and distribution of stations has varied considerably: initially there were 10 stations all in the northern hemisphere, the first southern hemisphere station being added in 1958. The ap and Kp indices have always been dominated by data from European stations. The aa index uses just two stations, one in each hemisphere, and although suppressing the resulting spurious annual variation in the response rather well, aa shows a large spurious diurnal variation (e.g., Lockwood et al., 2018b). The aa index is continuously available for the longest interval (1 January 1868 to the present day) and has been relatively homogeneous in its construction. It is this longevity that gives aa its importance. Recent work has shown that it is greatly improved by corrections to allow for the effects of secular change in the intrinsic geomagnetic field and the locationdependent sensitivity of the stations deployed, yielding the "homogeneous aa index" aa H (Lockwood et al., 2018a, b). Note that above we discuss the 3-hourly indices am, an, as, ap, aa, aa N , aa S and aa H : the same considerations apply to their respective daily means, Am, An, As, Ap, Aa, Aa N , Aa S and Aa H , and their 8-point (24-hour) running (boxcar) means Am*, An*, As*, Ap*, Aa*, Aa N *, Aa S * and Aa H * (Allen, 1982). Figure 2a shows an example of a 7-day interval of 3-hourly geomagnetic index data and Figure 2b shows the corresponding 24-hour (8-point) running means for the same interval. The interval is 27 October 2003 to 2 November and so includes the "Halloween storms" that generated considerable GIC events (Kappenman, 2005). This example is chosen here because it gave the highest daily means in the interval 1995-2017. The values given for the am, ap, aa and aa H indices in Figure 2a (and their corresponding 8-point means, Am*, Ap*, Aa*, and Aa H * in Fig. 2b) are the published values but the ap and Ap* values have been multiplied a factor f = hami all /hapi all , the ratio of overall means of am and ap, which allows for the difference in the scaling of ap and the other indices. Figure 2b shows the most clear-cut difference between the indices is that the scaled ap (and hence Ap*) values during the storm are consistently larger than the corresponding values for am, aa, and aa H whereas before and after they are proportionally much more similar. This demonstrates that differences between the responses of the indices can depend on the level of geomagnetic activity. Given the normalisation using f, this means there must be other times when f Â ap is lower than am, aa and aa H . This paper investigates how systematic this difference is with time-of year. In comparison, the Aa*, Aa H * and Am* values are much more similar. The 3-hourly values shown in Figure 2a show considerable point-to-point variation in each index and point-to-point variation in the relationships between the various indices. There is no evidence for a systematic difference between the indices with UT (by which points in Fig. 2a are colour-coded to aid comparisons). Such differences would be convolved with random effects, such as the timing of perturbations within the 3-hour intervals over which the range indices are measured, and so are likely to emerge only in statistical surveys that also allow for the effects of time-of year and activity level.

K indices
K values, on which all these indices are based, were introduced by Bartels et al. (1939). They are derived for each magnetometer station from the range of variation observed in each of eight 3-hour intervals (0-3 UT, 3-6 UT, up to 21-24 UT). Originally scaled manually, K-value derivation was increasingly automated as magnetometer data recording moved from analogue to digital (Riddick & Stuart, 1984). The range of the irregular variations (i.e., after subtraction of the regular daily variation) in either of the horizontal components (X northward or Y eastward, whichever gives the larger value), DH X or DH Y , is ranked into one of 10 classes using quasi-logarithmic band limits that are specific to the observatory and to which a K value of 0-9 is assigned. The original idea of this procedure was that the scale of threshold values used to convert the continuous DH X or DH Y range values into the quantized K values is adjusted for each station to allow for its location and characteristics such that the K value is a standardized measure of the geomagnetic activity level, irrespective of where it is measured. In practice, the range limits for all K bands for a given station are all set by just one number, L, the lower limit of the K = 9 band, the lower limit for the K = 0 band being always set to zero (Menvielle & Berthelier, 1991). This is because the same relative scale is used at all stations with the thresholds for the K bands given in Table 1.
The very high correlations between the range indices and auroral electrojet indices such as AE and AL (e.g., Adebesin, 2016;Lockwood et al., 2019a) indicate that geomagnetic activity at mid-latitude observatories is dominated by the ionospheric segment of the substorm current wedge, i.e., the main westward auroral electrojet (e.g., Saba et al., 1997;Finch et al., 2008;Lockwood, 2013). (See further discussion in Sect. 1.5). As a result, the value of L used for a station is set by its closest proximity to the midnight Magnetic Local Time (MLT) sector of a nominal auroral oval which is where the range response of a station is greatest (Clauer & McPherron, 1974;Finch et al. 2008;Chambodut et al., 2013). The L value used is decreased with decreasing magnetic latitude because the range observed decreases with increasing distance from the auroral oval (e.g., Rostoker, 1972). In practice, this is quantified by the geocentric angle, d, between the station and the point of closest approach of the nominal auroral oval (which occurs near midnight MLT) which is taken to be along a corrected geomagnetic latitude K CG of 69°. Station K-indices are converted to a K values (in nT) using a standard scale called "mid-class amplitudes", for which the range threshold for the K = 9 band is L = 500 nT: the conversion table for implementing this scale is referred to as K2aK and is given in the third column of Table 1. All the planetary and hemispheric indices discussed in this paper are based on these observatory K-values.

The aa index
The simplest of the indices that we study on this paper is the aa index, which was devised by Mayaud (1971Mayaud ( , 1972Mayaud ( , 1980 to give a continuous, well-calibrated and homogeneous record of geomagnetic activity that extends back to 1868. This index uses just two stations at similar geomagnetic latitudes, one in each hemisphere. The northern and southern hemisphere aa indices, aa N and aa S , are the a K values from the station in that a) b) c) Fig. 1. Maps of networks of stations currently contributing to (a) the Kp (and hence ap) index, (b) the am index and (c) the aa index. In each map, the light grey bands are typical locations of the auroral oval and dark grey bands are ideal middle geomagnetic latitudes for stations to give a K-index value, being close enough to give a large signal, but far enough away that the response is monotonic because, for all but the very largest disturbances, the auroral oval approaches the station as the activity level increases. Details of these stations, and others used in the past, are given in Appendix A. Images courtesy of the International Service of Geomagnetic Indices (ISGI).  Lockwood et al. (2018a) have shown that anomalies in the secular variations of aa N and aa S are removed if time-dependent factors, calculated from models of the global intrinsic ("main") field, are used. The hemispheric aa indices are then averaged together to give aa = (aa N + aa S )/2. This averaging, to a large extent, gives cancellation of the seasonal variation in the geomagnetic response to solar forcing that is found in aa N and aa S individually. Lockwood et al. (2018b) have demonstrated how good cancellation is effectively achieved for the annual variation but that, because the best available stations are roughly 10 h apart in local time (instead of the ideal 12 h), the diurnal variations at the two stations do not cancel as completely as the annual variation. Recently, Lockwood et al. (2018b) have used a model of the stations' response (and made the correction to allow for the effects of the secular change in the geomagnetic field) to generate the "homogeneous" version of the aa index, aa H , so named because it largely eliminates well-known hemispheric asymmetries between the mean values and distributions of aa N and aa S .

The am index
The stations used to compile the am index (Mayaud, 1980) are situated at sub-auroral latitudes close to corrected geomagnetic latitude K CG = 50°. They are grouped into longitude sectors, with five such groups in the Northern hemisphere, and four in the Southern. The K indices for stations in a longitude sector are averaged together and the result is converted into a sector a K value using the standard K2aK scale. Weighted averages of these sector a K values are then generated in each hemisphere giving an and as, the weighting factors accounting for the differences in the longitude extents of the sectors. The index am is equal to (an + as)/2. Note that, like aa, am is compiled using only mathematical operations. We here employ all available am data up to the end of 2017 and that after the end of 2014 these data are classed as "provisional" which means they have passed initial quality checks and can be used, but not yet been through the final review that defines them as "definitive". We here apply additional checks to the data for 2015-2017 by testing for, and removing, any outliers in the scatter plots (more than 3r from the mean) with the SuperMAG SME index or the Auroral Electrojet AE index (see Sect. 1.5).

The ap (Kp) index
The ap index is currently made using K index data from 11 northern and two southern hemisphere stations between corrected geomagnetic latitudes |K CG | between 44°and 60°. The K indices are first converted into standardised K S values to account for the time-of-year and UT response characteristics of the observing site and to, as far as possible, normalise them to the values seen simultaneously by Niemegk, which was chosen as the reference station. The Kp index is the arithmetic mean of the 3-hour standardized K S values for the observatories employed. The 3-hourly Kp values are converted into ap values using a standard table that is constructed such that ap may be regarded as the range of the most-disturbed of the two horizontal field components, expressed in units of 2 nT, at a station at dipole latitude of 50°.
The standardization from K to K S is achieved using conversion tables for each observatory that were defined for the original stations by Bartels (1949Bartels ( , 1957. These give a multiplication factor K S /K that depends on the station location, UT, timeof-year, F, and the K value and hence application of these factors is a non-linear operation. The present conversion tables used are for three seasons and many were generated using selected data from 1943 to 1948 only. The three seasons are: (1) the months around winter solstice (January, February, November and December); (2) the months around the equinoxes (March, April, September and October); and (3) the months around summer solstice (May, June, July and August). The network of stations used to compile the ap and Kp indices has varied, and the intervals over which each station was used are given in Appendix A.
In theory, if the K to K S conversions were always ideal, the distribution of ap stations would be of no consequence as the various K S values would be all be different measurements of the same thing. However, to some extent the tables will fall short of the ideal so the index response pattern will also depend, to some extent, on the distribution of stations. Even if the K to K S conversions were ideal, converting the data to what Niemegk would have seen causes the index to take on the UT-F response characteristics of the Niemegk site. Note that we here quote ap (and hence ap C , Ap C * and [Ap C *] max ) in the most widely used formnamely as an index without units: the standard ap values are an index in units of 2 nT and hence the values in units of nT would be double those given in this paper (Menvielle & Berthelier, 1991).

Relationship of mid-latitude K-based indices to auroral electrojet indices
In this section, we compare the mid-latitude range indices (am, ap and aa) with the SME and SML indices from the northern hemisphere SuperMAG magnetometers (Newell & Gjerloev, 2011) and the standard AE and AL auroral electrojet indices from the ring of 12 northern hemisphere stations at Table 1. Bands of range values used to generate quantized K-indices for a station with a lower limit of the K = 9 band of L. DH X or Y is the range between extreme values in the 3-hour intervals of the northward or westward horizontal component, whichever is the larger. The right hand column gives the quantized a K values ascribed to the K-levels using the "K2aK" or "mid-class amplitudes" scale.
auroral latitudes (Davis & Sugiura, 1966). The AE indices are based on 1-minute data on the quiet-day-subtracted horizontal magnetic component from the 12 stations. The upper and lower envelopes of the distribution of values at any one time define AU and AL, respectively, and AE is the width of the envelope between these maximum and minimum values, AE = AU À AL. The AL values are negative and driven by the westward auroral electrojet that is the ionospheric segment of the substorm current wedge and so are usually recorded by a station in the midnight MLT sector and AL responds mainly to the enhancement of the electrojet during substorm expansion phases (see review by Lockwood, 2013). On the other hand, AU responds primarily to the eastward electrojet associated with westward convective flow in the afternoon MLT sector and is enhanced during both the growth and expansion phases of substorms. The SuperMAG SME and SML indices are computed in exactly the same way as AE and AL but using 1-minute data from a much greater number of stations, which varies between 93 and 118 over the interval studied in this section . These indices capture better the extreme deflections which define them. The SuperMAG network is global but has an excess of northern hemisphere stations that are particularly clustered in the European and American longitude sectors and SME and SML are generated from northern hemisphere stations only. However, because SME and SML, like AE and AL, use extreme rather than average values, the clustering is not an issue and the network is extensive enough for these indices to be considered almost free of spurious diurnal variations. However, the use of only northern hemisphere stations will mean these indices, like AE and AL, will have an annual variation due to seasonal effects. The am, ap and aa indices are all based on the range of variation in 3-hour intervals, so it makes sense to compare them with the largest SME value (SME max ) and the minimum SML value (SML min ) in the same 3-hour intervals. However, as shown by Table 2, strong correlations are also found with the average SME and SML for the coincident 3-hour intervals (hSMEi s=3h and hSMLi s=3h ), although not quite as strong as for SME max and SML min . Correlation coefficients are typically between 0.8 and 0.9 and always stronger for the 24-hour smoothed values than the 3-hourly values and for the Super-MAG indices than for the traditional AE indices. In most cases, the am index yields the best correlations, but those for aa are very similar: correlations for ap are invariably lower. Note that the correlations for AL are slightly weaker than those for AE, as are those for SML compared to SME. This indicates that the mid-latitude range indices do respond to the directly driven system, as detected by AU and SMU, as well as the strong influence of the substorm unloading system, as detected by AL and SML. Figure 3 illustrates examples of the relationships of (a) am, (b) ap, and (c) aa to SME max for the years 1996-2017 (inclusive). The general features are the same in corresponding plots for hSMEi s=3h , ÀSML min , ÀhSMLi s=3h , AE max , hAEi s=3h , ÀAL min , or ÀhALi s=3h . In each panel a scatter plot of the three-hourly values is given by grey dots and of their 8-point running means (24-hour averages) by orange dots. Note the grey dots show the quantization of both aa and ap levels, whereas am (like Am*, Ap* and Aa*) is essentially continuous. Black dots are means in 1-percentile bins of the mid-latitude range index along the x-axis (giving 614 samples in each bin) and the horizontal and vertical error bars are plus and minus one standard deviation for that bin. The mauve lines are fourth-order polynomial fits to the 3-hourly values. The RMS deviation of the observed 3-hourly SME max values from the fitted polynomial values, D RMS are given in each panel, as are the linear correlation coefficients r and r* for the 3-hourly values and the 8-point running means, respectively. Note that the polynomial fit order m = 4 that was used was the largest that could be employed (desirable as it preserves the non-linearity of the variation) that gave a good fit to the tail of the index distribution; the latter criterion being tested by evaluating the D RMS 2 / (nm À 1) value for the n = 614 samples in the top percentile of the index dataset and checking it was not significantly larger (at the 1 À r level) than the minimum value for any m. The cyan lines are linear fits to the 3-hourly values for quiet-times (SME max < 750 nT), and are plotted to gauge the deviations from linearity of the data for larger SME max . Figure 3 shows that all three mid-latitude range indices have a non-linear response, with values being overestimates, relative to SME max values, at the highest activity levels. However, the tendency is much stronger for ap than for am and aa: Figure 3a and c shows that the deviation from linearity is only present in the top 1% of samples for am and aa; however, for ap the deviation is persistent for the top 35% and is significant (greater than 1 standard deviation) for the top 17% of ap samples. The scatter (quantified by the RMS deviation of samples from the fourth-order polynomial fit, D RMS ) is smallest for am but largest for aa, but the linear correlations r and r* for aa are higher than for ap because the polynomial fit deviates from linearity much less for aa than for ap. Indeed, for 24-hour running means, the correlation for Aa* is even very slightly higher than that for Am*. Table 2. Linear correlation coefficients between mid-latitude range indices and the SuperMAG and auroral electrojet indices: r and r* are for 3-hourly values and the 8-point running means, respectively. The data used are for 1996-2017 (inclusive). This yields 61368 3-hourly samples (am, ap and aa) and 6136124 running-mean samples (Am*, Ap* and Aa*). For all correlations, the large number of samples ensures that the correlation significance level, derived by comparison with the AR1 red noise model, is 100% to within at least three decimal places for all cases. The maximum SME and minimum SML in each 3-hour intervals is SME max and SML min respectively.

Relationship of mid-latitude K-based indices to interplanetary conditions
In Section 2 we employ the concept of the sensitivity of station to "solar forcing". This sensitivity depends only on the station's location, all other site factors (such as instrumentation sensitivity and ground conductivity) being accounted for by the instrument calibration. This sensitivity gives the station's response to all solar forcings, including photon and particle ionization and heating effects (Aksnes et al., 2002;Ieda et al., 2014) as well as the enhanced electric fields and the associated expansion/contraction of the auroral oval due to solar windmagnetosphere coupling (Lockwood et al., 1990;Cowley & Lockwood, 1992;Milan et al., 2012). With this definition, not all of this solar forcing comes from the solar wind, because of the effects of solar EUV and X-ray ionizing and heating radiations on ionospheric conductivities. Hence "solar wind forcing" is a part of, but not all of, "solar forcing".
To quantify solar wind forcing, a number of "coupling functions" have been proposed as predictors and explainers of geomagnetic disturbance (see review in Lockwood et al., 2019a). These are combinations of parameters characterizing the near-Earth planetary environment, combined with various coefficients and exponents that are free fit parameters that are derived empirically to get the best fit to the observed geomagnetic activity response. The recent work by Lockwood et al. (2019a) highlights a serious problem with most previous coupling function studies that have generally neglected the effect of gaps in the interplanetary data series on the grounds that they occur at random and so their effects will average out. Lockwood et al. use Monte-Carlo analysis by inserting synthetic data gaps at random into near continuous data to show that this is far from being a valid assumption and that the effect of data gaps is to add considerable noise into solar windgeomagnetic activity correlation studies. This is true irrespective of the method used to deal with the data gaps (for example interpolation, piecewise removal of geomagnetic data, or simply ignoring their effects). This raises the potential for "overfitting" which is a serious problem in multiple regression analysis of geophysical time series that have internal noise: it is a recognized pitfall in areas where quasi-chaotic behaviors give large internal noise such as climate science (e.g., Knutti et al., 2006) and population growth (e.g., Knape & de Valpine, 2011) but had not been considered in solar-wind/magnetosphere coupling studies. Overfitting occurs if a fit has too many degrees of freedom which allows it to fit to the noise in the training subset, and hence is not robust in general. Including all of the factors with their own weighting factors and/or exponents can result in extremely good fits that can, nevertheless, give details that are statistically meaningless as each additional fit parameter reduces the statistical significance of the correlation. Such fits can have limited, and in extreme cases, zero predictive capability because they have fitted noise rather than the signal. The addition of noise by neglecting the effect of gaps in interplanetary data (which before 1995 were common and often of long duration) means that overfitting is a serious problem. Another complication is that the relative performance of different coupling functions depends strongly on the data averaging timescale and the averaging timescale used to generate the best fit exponents (Finch & Lockwood, 2007).
As a result of these considerations, we here use just one coupling function, P a , that has just a single free fit parameter, the coupling exponent, a, which estimates the power input into the magnetosphere, using the theory of Vasyliunas et al. (1982). This theory is based on the fact that the dominant energy flux in the solar wind is the bulk flow kinetic energy flux of the Fig. 3. Scatter plots for 1996-2017 (inclusive) of the mid-latitude range indices with the maximum of the SME index, SME max , seen in the same 3-hour intervals by the SuperMAG global magnetometer network. (a) (grey) 3-hourly SME max values as a function of 3-hourly am and (orange) 24-hour running means of SME max as a function of corresponding running means of am, Am*. Black dots are means in 1-percentile ranges of Am* (giving 614 samples in each bin) and the horizontal and vertical error bars are ±1 standard deviation. The mauve line is a fourthorder polynomial fit to the 3-hourly values. The RMS deviation of the observed 3-hourly SME max values from the fitted polynomial value for the corresponding am, D RMS is given, as is the correlation coefficient r between 3-hourly SME max and am values. (b) The same as (a) for 3-hourly ap and its 24-hour running mean, Ap*. (c) The same as (a) for 3-hourly aa and its 24-hour running mean, Aa*. In each panel the cyan line is a linear fit to the 3-hourly values for SME max < 750 nT, and is plotted to gauge the deviation from linearity of the data for larger SME max .
M. Lockwood et al.: J. Space Weather Space Clim. 2019, 9, A20 Page 6 of 27 particles (and not the Poynting flux assumed by the much-used but flawed epsilon parameter). The fraction of this energy flux that is converted into Poynting flux by currents flowing in the bow shock, magnetosheath and tail magnetopause is taken to be the (necessarily dimensionless) factor M A

À2a
, where M A is the Alfvén Mach number of the solar wind and a is an unknown coupling exponent. The value of a could vary from zero (which would mean all incident power could enter the magnetosphere if the orientation of the interplanetary magnetic field, IMF, were favorable) and a large positive value (which would mean a vanishingly small fraction could enter the magnetosphere).
Typically, values of a between 0.35 and 0.5 have been derived from the fits to data which, for a typical M A of 10, means that 5-10% of solar wind power is available to enter the magnetosphere. The coupling function P a contains terms in solar wind mean ion mass, number density and velocity and the IMF orientation, but the exponents for each are all self-consistent, all being set by the single fit parameter a and the theory. (Note that, in some studies the exponent of the IMF orientation factor has been treated as a separate fit variable, but we employ the procedure recommended by Vasyliunas et al. that allow it to be computed self-consistently from the data once the optimum a has been determined). Finch & Lockwood (2007) found that the optimum coupling exponent a used in generating P a depended on averaging timescale, a result that implied that there was at least one other mechanism at work and that P a was failing to capture all of the relevant physics. However, Lockwood et al. (2019a) have shown that this variation in a was an artefact caused by data gaps and that when steps are taken to minimize the effect of such data gaps, a is effectively constant on all timescales from 1 min up to 1 year. Figure 4 shows scatter plots of ap (top), the new homogeneous aa index, aa H (Lockwood et al., 2018a, b) (middle), and am (bottom) as a function of the power input into the magnetosphere, P a (normalized by dividing by its average for the whole interval, hP a i all , in order to cancel some constants). The left hand panels are for daily means and the right hand panels for annual means. We use daily means because at shorter timescales the lag introduced by substorm growth phases becomes a significant factor (Lockwood et al., 2019a). For aa, aa H and am the best fit coupling exponent (at all timescales) is a = 0.44, whereas for ap it is a = 0.48 (Lockwood et al., 2019a). The correlation coefficients for the daily means Ap, Aa H and Am with daily means of P a /hP a i all are very high, being 0.866, 0.893 and 0.923, respectively. The root-mean-square (RMS) fit residual for a linear regression of all the data, e, was also computed, giving e/hApi all = 0.570 for Ap, e/hAai all = 0.401 for Aa and e/hAmi all = 0.342 for Am. Hence both these metrics give best agreement for am, and worst agreement for ap. We tested the significance of the difference between the correlations using the Meng-Z test (which allows for intercorrelations between the datasets (Meng et al., 1992)) and found that the p-value for the null hypothesis that they are actually the same was undetectably small. Hence the agreement with the coupling function is significantly better for am than for aa and that for aa is significantly better than that for ap on this 1-day averaging timescale. For the traditional aa index, the correlation was 0.883, which is slightly lower than for aa H but the Meng-Z test gives that the p-value for the null hypothesis that these two correlations are the same is only 5 Â 10 À5 . Thus the improvement of aa H over aa is small but statistically highly significant.
For the annual means shown in the right hand panels, the ranking order of the correlations is reversed. In this case, the correlations for ap, aa H (and aa) and am are 0.992, 0.988 and 0.987. These correlations are exceptionally high and differences are small. The Meng-Z test gives p-values against the null hypothesis that they are the same of 0.314 for ap and aa H and of 0.371 for ap and am. Hence the difference between the correlations for ap and aa H is just significant at the 1 À r level but it is not quite significant at the 1 À r level for ap and am. The RMS residuals (as a ratio of the overall mean value) are lowest for aa, but higher for am than for ap. In conclusion, both metrics show that am outperforms ap in daily averages but is out-performed by ap in annual averages.
The linear regression lines shown indicate why this is the case. For annual means (right-hand panels) the cyan lines are linear regression fits to all data but for the daily-averaged data (left-hand panels) the data have been linearly regressed in three subsets: (1) 91 days around the June solstice (giving the red line); (2) 91 days around the December solstice (blue line); and (3) 91 days around either equinox (orange line). In Figure 4a the regression lines for the three seasons are different, with the Ap values for equinox at a given P a /hP a i all being larger, whereas for northern hemisphere winter they are smaller. For Aa and Am these differences are much smaller. The annual and semianual variations in the response of aa and am are real and in a later publication we will show that they are highly significant features of magnetospheric behaviour, however they are both exaggerated in ap. On the other hand, when we average out these 6 and 12 month periodicities by taking annual means, ap performs better than the other two indices. Figure 5 explains the seasonal variations of the correlations shown in the left hand panels of Figure 4 by showing the semiannual variations in the mean values of the geomagnetic indices and their best fit of the corresponding variation in P a . The power input P a shows a clear semi-annual variation with peaks at the equinoxes, the September peak being somewhat more pronounced than the March one. There is a slight difference between the solstices with the June minimum being slightly deeper than the December one. This variation results almost completely from the sin 4 (h/2) term in P a (where h is the IMF clock angle in the Geocentric Solar Magnetospheric, GSM, frame of reference) and is caused by the Russell-McPherron effect of the Earth's dipole tilt on IMF orientation in the GSM frame (Russell & McPherron, 1973). Figures 4 and 5 use daily means to avoid complications associated with the lag of the geomagnetic response due to the variable length of substorm growth phases. Lockwood et al. (2016) show that for averaging timescales greater than 12 h the F-UT response pattern due to the Rusell-McPherron effect becomes "axial" in form, i.e., showing the equinoctial peaks but no UT variation. Note that on shorter timescales the F-UT pattern of geomagnetic response becomes "equinoctial" in form (best demonstrated by the am index, but can also be detected in aa H , Lockwood et al., 2018b) whereas P a shows the Russell-McPherron form (Lockwood et al., 2016) which confirms that the geomagnetic indices are not responding to the "directly-driven" system. This difference in the F-UT patterns is therefore associated with the "storagerelease" system to which mid-latitude range indices primarily respond Lockwood, 2013). Hence the origins of the equinoctial pattern must be associated with the variable lags cause by the durations of substorm growth phases. Figure 5 shows that the index that most closely reflects the semi-annual variation in P a is Aa H , for which the fits are, surprisingly, slightly better than for Am. The fits are not as good for Ap, which exaggerates the September equinox peak and has a deep minimum in December. Hence Figure 5 explains the seasonal differences in the regression fits in Figure 4 for Ap, and why they are smaller for Aa H nor Am.

Analysis
Finch (2008)  for any given type of single-station geomagnetic activity measure g by where I S is a measure of the input solar forcing (which includes the effects of both induced currents in near-Earth space driven by solar wind-magnetosphere coupling and of conductivity due to ionizing EUV and X-ray radiations from the Sun or particle precipitations). Finch defined s to be a function of only the instrument co-ordinates because instrument and local site characteristics are accounted for by other inter-calibration procedures. By taking ratios of g seen simultaneously at many pairs of different stations, the I S factor is cancelled and the ratios of the station sensitivities is known. Note that this concept is the same as that introduced by Bartels (1949) and that is still used today in the compilation of the ap (and kp) index: this is because Bartels' look-up tables used to convert K data from a given station (XXX) into what Niemegk (NGK) would have seen (the K S value) are tables of average empirical values of s NGK /s XXX (as a function of UT, F and activity level). If the data from different stations are combined into a geomagnetic index using linear mathematics, then the sensitivities are similarly combined. For example, if the g data from N stations are averaged together with weighting functions x to give a planetary index G, where S is the sensitivity of the index which is the weighted mean of the station sensitivities, s i . From comparisons of the ratios for many pairs of stations, Finch (2008) derived a functional form for computing the sensitivity of a station as a function of its geographic coordinates, date, time-of-year and time-of-day: where A and B are constants, v is the solar zenith angle (a function of location, UT and F), T is the MLT of the station (in hoursalso a function of location, UT and F and, on long timescales, of the intrinsic geomagnetic field), F is the fraction of the year and F = F 1 at the spring equinox (taken to be 100/365.25 for the northern hemisphere and 283/365.25 for the southern hemisphere). Lastly, m is a normalising factor that ensures that the average value of s, over all UT and all times-of-year (F) and activity levels, is unity for a given station and year: it is used to retain calibrations that allow for instrument characteristics and local site effects.
The Biot-Savart law states that the field disturbance at an observatory O, DB o , is proportional to the integral over space of J np /r 2 for all points P, where J np is the current density at P normal to the line OP and r is the distance OP. The inversesquare dependence on r means that there can be a range of contributions to the observed signal at a mid-latitude station from large variations in J np in the auroral oval (i.e., at larger r) to smaller fluctuations in J np more local to the observatory (at smaller r). It is well known that substorm electric fields can penetrate the shielding provided by the region-2 fieldaligned currents (that connect to the ring current) and so give "bays" in mid-latitude magnetometer records (e.g., Caan et al., 1978;Kikuchi et al., 2000). Hence the conductivities of the ionosphere over the observatory can influence the local J np and hence DB o . On the other hand, the large and variable currents flowing along the midnight-sector auroral oval during substorms will also have an effect that will depend strongly on how close this sector of the auroral oval is to the observatory. The first term on the right of equation (3) allows for the effect of solar zenith angle v on the ionospheric conductivity over the station due to solar EUV and X-ray radiation and thus depends on the station's geographic coordinates, the UT and the timeof-year, F. If the Sun is below the horizon, v is set to (p/2): hence the coefficient A controls the extent to which the effect of dayside conductivity at a given v is enhanced over residual nightside values. Note that there are small changes to the precise formulation of Finch (2008), who used a cos 0.5 (v) dependence, as predicted by Chapman production-layer theory and as also used in a great many prior applications. However, Ieda et al. (2014) have shown that a conductivity dependence on cos 0.7 (v) fits better with observations and is also predicted by theory when the upward gradient of the neutral atmospheric scale height is accounted for. Using the conductivity over the observatory is an approximate parameterisation as there will be contributions to the total DB o that arise from currents that are between the observatory and the auroral oval.
The second term on the right-hand-side of equation (3) is the station's sensitivity due to its distance from the location of peak response, which is at an MLT of T* in the midnight sector. The sine term in equation (4) is used to model the known earlier onset of enhanced substorm activity in summer. Equation (4) yields T* of 1 h MLT and 22 h MLT for the winter and summer solstices, respectively. This is based on the survey of midlatitude station responses to substorm expansion phases by Finch (2008) and agrees well with the results of Liou et al. (2001), who found substorm onset was typically at T = 22 h in summer but 23.5 h in winter. Similar behaviour was deduced by Wang et al. (2007). We note that we are most interested in the MLT where auroral electrojet currents have peak effect on mid-latitude K indices: this is close to, but not the same as, the MLT of substorm onset (Clauer & McPherron, 1974;Chu et al., 2014).
In this paper, the solar zenith angle at a given station is computed as a function of time (year, fraction of year, F, and UT) using an ephemeris that gives the solar declination at that time. The MLT for a given UT is computed for that date using the IGRF-15 model of the geomagnetic field (Thébault et al., 2015). Finch (2008) assumed that the factors A and B were constants and had considerable success in modelling the average response of different stations and indices. However, there are reasons to also think that the relative importance of the two terms in equation (3) might change systematically with the level of geomagnetic activity. Firstly, particle precipitation fluxes are higher during enhanced activity over a wide range of locations (including mid-latitudes (e.g., Shiokawa et al., 2005)), which could mean that photon-induced conductivity is less important, and hence the dependence on cos 0.7 (v) is weaker: as a result, the factor A would be reduced at higher activity levels. Secondly, the auroral oval expands equatorward when activity is enhanced, making the second factor in equation (3) (associated with the spatial proximity of the auroral electrojet) more important. The factor B sets the amplitude of the diurnal variation seen by the station because of the variation in its proximity to the peak of the substorm current wedge. For these reasons the factors A and B are here both treated as functions of geomagnetic activity level. Lockwood et al. (2018b) quantified the factors A and B by assuming that the large number of stations used to derive the am index, and their even longitudinal spacing, results in the sensitivity of am index, S am , being always unity, independent of both UT and time-of-year, F. We here use a more exact and iterative procedure, but get results which are very similar to those found by Lockwood et al. (2018b). In general where the station of sensitivity s K gives a K-index value that transforms to a K using the standard K2aK scale. We here quantify the geomagnetic activity level using the am index and divide it into eight (generally overlapping) activity ranges: 0 am < 10 nT, 10 am < 20 nT, 20 am < 40 nT, 30 am < 50 nT, 40 am < 60 nT, 50 am < 90 nT, 60 am < 110 nT, and am ! 70 nT for which the years 1959-2017 give N b = 58183, 51083, 40894, 22691, 13157, 8302, 10869, and 6060 samples, respectively, and the mean am values are 5. 32, 13.96, 27.50, 37.71, 47.76, 63.56, 73.90, and 109.14 nT. The distribution of am values and these band limits are shown in Figure 6.
In the iterative procedure we adopt, we make the initial assumption of uniform S am (S am1 (UT, F) = 1). This gives initial estimates of s K (UT, F) for each of the eight am ranges for the Canberra and Hartland aa stations studied (we here denote these initial values as s K1 ). We then obtain initial A and B estimates (A 1 and B 1 ) for each of the am bins, using equations (2)-(4) by fitting modelled sensitivity ratios s K /S am to the a K /am ratio values in the relevant am subset, using the fitting procedure in the UT-F parameter space described by Lockwood et al. (2018b). The modelled sensitivities were computed for 24 UT values (1 h apart) and 365 F values (daily) and their ratios then averaged into the same bins as for the observational data (namely, 3 h width in UT and 1/20 width in F). From these initial A 1 and B 1 values we can use equations (3) and (4) to compute the corresponding sensitivity value for each of the am stations, and then use equation (2) to re-compute the sensitivity of the network of stations used in compiling the index, S am , giving a new estimate S am2 (UT, F). Using these values in equation (5) gives revised (first iteration) values of s K (UT, F), values (s K2 ). This loop was repeated until the RMS deviation of the modelled s K /S am values for CNB and HAD from the observed a K /am ratios converged on a constant, minimum value (to within an adopted uncertainty level of 0.001%). This iterative procedure yielded The right-hand columns of Figures 7 and 8 show the UT-F plots of the observed ratios ha K i/hami for the two current aa stations, respectively Hartland and Canberra. The rows are for the eight am activity level bins shown in Figure 6, with the highest activity at the top and the lowest at the bottom. The left-hand column of both figures gives the UT-F plots of the modelled sensitivity of the am index, S am , and the middle panels in both figures give the corresponding plots of the best-fit modelled sensitivity, s K /S am .
Figures 7 and 8 both show that the local ionospheric conductivity term is more important at low geomagnetic activity with a strong peak around the minimum of the solar zenith angle v (at F = 0.5 and 14 UT for Hartland and F = 0 and F = 1 and 4 UT for Canberra). This is reflected in the large values A for the lowest activity levels. This peak becomes increasingly less pronounced with increasing am activity level (i.e., A decreases) and at the largest am the pattern is determined mainly by the distance of the station from the midnight auroral oval.

Response functions of geomagnetic indices
From the derived values of A and B for each am activity level bin, we can use equations (2)-(4) to compute the (F-UT) response pattern of any mid-latitude range indices that is compiled using a mathematical algorithm. Rather than show results for all eight am activity bins in every case, we here show just two illustrative ones, representative of low and moderately high activity levels. We choose 10 am < 20 nT for low activity and 60 am < 110 nT for high activity. We avoid the lowest am bin because the lowest row of Figure 8 shows that the fit is not always as good as it is for other panels: this is a sensitivity effect associated with the lowest activity level that can be detected by a single station, compared to that for the am or ap index; that limit being lower for the indices, by virtue of the averaging of data from a number of stations. This means that there are times (as in the case of the bottom row of Fig. 8) when the station is not detecting any activity yet the am index is: in Figure 8 (for Canberra in the southern hemisphere) this is particularly true in mid-winter (F = 0.5). We avoid the largest activity bin (am ! 70 nT) because it is based on the smallest number of samples. Instead we use the second largest and the second smallest activity bins as examples. In all cases, the response functions are computed from the fitting procedure described in Section 2: the modelled sensitivities were computed for 24 different UTs (1 h apart) and 365 F values (daily) and for plotting of the UT-F patterns these were then averaged into the same bins as for the observational data (namely, eight 3-hour bins in UT and 20 18.25-day bins of F). We also made 3-hourly means of the sensitivity (for the same UT bins as the observations, i.e., 0-3 UT, 3-6 UT, etc.) and identified the maximum and minimum 3-hourly value of each UT-F pattern (S max and S min , respectively) as well as the mean (hSi and standard deviation (r S ) of the 160 values (eight UT values and 20 F values) in each pattern. As well as taking the maximum percentage deviations of from the mean (S max and S min ), we quantify the percentage standard deviation V = (100r S )⁄(hSi) for each am activity band. To evaluate the average behaviour, patterns were made for each of the eight am bands defined in Figure 6 and a weighted mean of the V values taken: where N b is the numbers of am observations in the band (given earlier).
We also computed the average response pattern for the index, S av (UT, F) as the similarly weighted means of the eight patterns for the different activity levels Table 3 summarises the results by comparing the largest positive and negative deviations of S av (UT, F) for the tested indices as percentages of the mean, along with the metric V av for various geomagnetic indices: results are given for both the modelling described above and from an empirical comparison with the am index.

Modelled response functions of the an,
as and am geomagnetic indices Figure 9 compares the modelled response patterns (the UT-F plots of index sensitivity) of the hemispheric an and as indices to that for am = (an + as)/2. These are all evaluated for the spatial distribution and weighting functions of stations for a selected example year which is 2014. At this time the IAGA codes of the stations in use are MGD, MMB, PET, POD, ARS, IRT, NVS, CLF, HAD, NGK, FRD, OTT, NEW, TUC, and VIC in the northern hemisphere and CNB, EYR, AMS, GNG, CZT, HER, PAF, AIA, PST, and TRW in the southern hemisphere (see Appendix A for the corresponding observatory names, locations and the intervals over which they were used to construct a given index). The computation procedure for deriving am and the longitude-sector weighting functions is described at http://isgi.unistra.fr/Documents/am_LWFs_ example.pdf.
At both the moderately high and the low activity level examples shown (in the left and right and columns of Fig. 9, respectively) and in both hemispheres, there is a clear seasonal variation with enhanced index sensitivity in summer, S an being largest around F = 0.5 and S as being largest around F = 0 (which is the same as F = 1). If the longitudinal distributions of stations were ideal, the contours would all be vertical in these plots as there would be no UT variation. This is not quite the case, as S an is slightly but consistently larger around 18 UT, and slightly lower around 04 UT at all times of year and all activity levels. S as shows the converse behaviour, being similarly larger around 04UT and lower around 18 UT. Because many features in the S an pattern are the converse of those in the S as pattern, they are averaged out in am and the S am patterns shown in the bottom panels of Figure 9 are much more uniform, especially for high geomagnetic activity. For 3-hourly values, the largest value is S max = 1.0284 for the lower activity range with a minimum of S min = 0.9741 and hence the largest percentage deviations from hS am i = 1, are 100 (S max À hS am i)/hS am i = 2.8% and 100 (S min À hS am i)/hS am i = À2.6%, respectively. Hence, in this case (for which hami = 14 nT) the time-of-year/time-of-day response pattern for am is uniform to within maximum deviations of ±2.8%. For the higher activity range (hami = 74 nT) the corresponding extrema are much smaller, being +0.21% and À0.25%. The patterns are complex and general features very weak but include very slightly stronger annual variation at 10-15 UT and a band of very slightly higher S am around 5 UT. For weighted means of the average UT-F response pattern over all activity levels, S av (UT, F) as defined by equation (7), the extrema are +1.4 and À1.2%, as given by the top row in Table 3. The other metric that we use to quantify the flatness of the average UT-F response pattern is V av defined by equation (6). Table 3 shows that the modelled V av value for the am index is 0.65%.

Modelled and observed response functions
of the aa geomagnetic index Figure 10 shows the F-UT response patterns for aa, S aa (UT, F) for the same high-and low-activity ranges of am used in Figure 9. Note that the colour scale range is considerably expanded in Figure 10 compared to Figure 9 because S aa shows much greater deviations from unity than S am , as is to be expected because aa is derived from data from just two stations. The response pattern will depend on which pair of stations is employed to generate aa. In Figure 10 Table 4 shows that the minimum of the 3-hourly aa sensitivity, S aa , is reasonably constant for these years being À18.5% in 1890 and À16.7% in 2010 for low activity and À18.9% in 1890 and 16.1% in 2010 for high activity. The corresponding maximum values are almost constant at 12.8% for low activity and 12.0% for high activity. As shown in the next section, these extreme deviations from unity are actually smaller than those for Kp (ap) because, although aa is compiled using just two stations, those stations have been chosen to give as much cancellation of the stations' diurnal and annual response sensitivity variations as possible (being in opposite hemispheres and about 10 h apart in local time). The "checkerboard" response pattern for aa seen in Figure 10 is present for both low and high activity and is found in the aa data, as demonstrated by Figure 11. This plot shows the sensitivity of the aa index S aa = (haai/hami)S am , where the ratio haai/hami is taken from data for all available years . Note that the modelled sensitivity of the am index, S am , is very close to unity at all F and UT and so the patterns for haai/hami are almost identical to those shown. The left-hand and right-hand panels are for high and low geomagnetic activity respectively. To get enough samples for this plot, high activity is defined in Figure 11 as am ! 40 nT both when averaging the observed haai and hami values and when calculating the model sensitivity of am, S am . Table 4 shows that the peak deviations from unity are actually very similar in the two activity-level cases. Note also that the response patterns for the hemispheric aa indices aa N and aa S are given, for the current pairing of aa stations, by Figures 7 and 8. Figure 10 shows that there are no strong differences in the modelled aa sensitivity patterns for the years studied. This means that the effects of station changes and of secular drift in the magnetic field on the response pattern for aa have been small. There is an important point to clarify here about the effect of secular change in the intrinsic geomagnetic field. All the UT-F response patterns in this paper have been normalised to unity. This means that any changes because of the drift in the average geographic latitude of the auroral oval will not be included. These effects have been studied by Lockwood et al. (2018a) and a new "homogeneous" aa index, aa H , presented that makes allowance for this drift. The patterns presented in Figure 10 are the time-of-year/time-of-day variations around the annual mean, and although the annual mean estimates will have varied because of secular drift, the patterns will not have changed much at all. However, there is a small secondary effect of the secular change in the field that is included in the patterns  presented and this is in the UT at which midnight MLT occurs, which has an effect via the term T in equation (3). The fact that there is no detectable effect in Figure 10 shows that this effect on the response pattern is very small. The fourth row of Table 3 gives the the largest percentage deviations of the average pattern for the aa index and the V av metric, both modelled and from data (by comparison with simultaneous am data).

Response function of the ap (kp) geomagnetic index
Our model cannot be employed to analyse the response pattern of the ap and Kp indices. This is because equation (2) requires the index to be compiled by linear mathematics so that the solar forcing term can be isolated and cancelled. In the compilation of Kp, the K S /K factor used is a complex function of K and hence the process is achieved by a look-up table, rather than an analytic function and is also non-linear as the look-up table used depends on the activity level. Hence we cannot use the same model analysis as applied above to am and aa. However, we can apply the data-based approach (comparison with the am index) that was used in the previous section for aa. As for aa, we can estimate the response function for ap using the equation Figure 12 compares the am and ap index response patterns over the full interval over which we have both, namely 1959-2017 (inclusive). The middle column of Figure 12 presents the UT-F patterns of the ratio hapi/hami (again for eight 3-hourly UT values, and twenty bins in F that are DF = 0.05 wide, and the eight am activity level bins shown in Fig. 6 and as employed in Figs. 7 and 8). The left hand column gives the modelled am sensitivity patterns, S am (UT, F). Using these patterns, we can estimate the ap sensitivity S ap using equation (8). The results are shown in the right-hand column of Figure 12. Note that the patterns for S ap are very similar indeed to those for (hapi/hami) in the middle column because S am is so close to unity at all UT and F. Figure 12 shows a consistent pattern in S ap (UT, F) with response at 0-8 UT being greater in northern hemisphere winter but that at all other UT being greater in northern hemisphere summer. At low activity levels, the 0-8 UT variation tends to dominate but the 8-24 UT variation increasingly dominates with increasing activity level.
The values of S am are very uniform and close to unity (note that the S am scale is the lower one in Figure 12 and covers a smaller dynamic range than for the middle and right columns by a factor of six). This is particularly true for moderate and high activity. However, even at low activity, the differences between the pattern of hapi/hami and the corresponding pattern of (hapi/hami)S am cannot be detected. The ap index is often treated as homogeneous but, in reality, the network of stations used has changed considerably even over the interval 1959-2017. Given the variability that this introduces, the use of the iteratively-derived factor S am is probably not justified and in the remainder of this section we assume S am is unity and just look at the ratio (hapi/hami). This has the advantage of making the analysis purely empirical (instead of the empirical-model mixture involved in S ap estimates). Figure 13 studies the relationship between the ap and am indices. The grey dots in Figure 13a are 3-hourly values (ap and am) and the cyan points are 8-point (1 day) running means (Ap* and Am*). The black dots are means (with error bars that are ±1r) taken in bins of am that are Dam = 10 nT wide (only bins with six or more samples are shown). The mauve line shows am = ap and the plot shows that ap is consistently smaller than am. In addition, the difference grows with activity levels so that Ap* values are increasingly smaller than Am* values for larger activity. Lockwood et al. (2019b) point out that this is a significant factor when geomagnetic storms during the recent grand solar maximum, as defined using the Ap* data, are compared to those seen during 1868-1932 in the Aa* data or to the estimates for during the Carrington storm (that are often quantified by a proxy equivalent for Aa*). The overall mean of am is larger than that for ap by a factor g = 1.5339.
Note that to a large extent taking 24-hour running means reduces the effect of the UT variation in the sensitivity of any one 3-hourly index; however, it does not remove it completely. This is because the 3-hourly index value will generally have some variation within each 24-hour interval and the phasing of this variation, relative to the UT variations in the index sensitivities, will influence the Ap* and Am* values and influence them differently. Figure 13b and c compares the annual variations, by taking mean values in 20 bins of the fraction of the year, F, that are DF = 0.05 wide (18.25 days). The black dots and line show means of am, the mauve dots and line means of g Â ap. It can be seen that the annual variations are very similar, but that the well-known semiannual variation (Cortie, 1912, Chapman & Bartels, 1940Russell & McPherron, 1973;Cliver et al., 2002;Le Mouël et al., 2004) is proportionally slightly larger, on average, in ap than in am. The average variation with F of the ratio of hap/ami is shown in Figure 13c. Figure 13b and c show that ap also has an annual variation as it underestimates geomagnetic activity during northern hemisphere winter, which is perhaps not surprising given the dominance of northern hemisphere stations and the lower ionospheric conductivities in winter. Figure 13d and e presents the same analysis for diurnal variations. Figure 13d shows that there is a slight diurnal variation in am and a larger one in ap. The diurnal variation in the ratio of the two indices is shown in Figure 13e.
The average S ap (UT, F) pattern gives percentage deviations of 11.2%, 25.4% and À21.6% for r S , S max and S min , respectively. Table 3 shows this is the least uniform of the indices tested in this paper. In Section 4 we develop and test an empirical correction for this.

Response function of the homogeneous aa index, aa H
For comparison, it is useful to apply to the new homogenized aa index, aa H , the same tests as used in the last section to study the constancy of the ap index. It is not instructive to test this index against the sensitivity model because the model is used to correct aa and give aa H . Figure 14 compares with the am index and so corresponds to Figure 12. The left-hand column again gives the S am (UT, F) patterns, the middle column (haa H i/hami) (UT,F) and the right hand column S aaH = (haa H i/hami) (UT,F) Â S am (UT, F). The fluctuations around unity in the middle and right column are of smaller amplitude than for ap and in character are more of a regular UT variation (at all F), which means that averaging over a calendar day or making 8-point running means (to give Aa H and Aa H *, respectively) will further reduce the effect of the non-uniformity of S aaH (UT, F). Figure 15 corresponds to Figure 13 for the aa H index. The agreement of aa H and am (grey dots) and of Aa H * and Am* (cyan dots) in part (a) are both good and linear. In parts (b) and (c), the annual variations of haa H i and hami with F are very similar (with aa H being just very slightly lower values round F = 0.5 and very slightly higher around F = 0 and F = 1). Parts (d) and (e), show there is a UT variation in average aa H that is somewhat greater than that in am. The mean S aaH (UT, F) pattern gives percentage deviations of 11.20%, 25.4% and À21.6% for r S , S max and S min , respectively. Table 3 shows that the average pattern gives percent deviations of 5.54%, +12.6% and À12.3% for r S , S max and S min , respectively for aa H which is considerably better than for the uncorrected aa data and therefore also better than for ap.

A corrected ap index, ap C
In this section, we present a method for empirically correcting the ap index to allow for its non-uniform response function. We do this using the same basic principle as introduced by Bartels and still used today to compile ap, namely activity-dependent look-up tables to correct the time-of-day/time-of-year response. A difference is that we are applying it to an index and not the data from a single station. The tables are provided in the form of arrays of values that can be interpolated to give the multiplicative correction factor required for a given ap, UT and F. We recommend use of Piecewise Cubic Hermite Interpolating Polynomial, (PCHIP) interpolation because it maintains smooth changes in gradient through the data points. Tests show that, although in general that it can sometimes give excessive oscillation between points, in this case it gives more accurate value than a linear interpolation).
Activity level in previous sections was quantified using am index bands, but that cannot be applied here; instead, ap must be used as the aim is to correct ap even at times when am was not measured. Figure 16 studies the distribution of ap values. The blue line gives the c.d.f of the (quantised) 3-hourly ap values and the mauve line the (almost continuous) Ap* values. The vertical grey and white bands divide the Ap* distribution into 20 percentiles, each containing N 20 = 8612 samples. The separation of these percentiles is smaller than the separation of the ap levels below 22 and so all these levels contain considerably more than N 20 samples. The ap = 22 level containts very close to N 20 samples level and the ap = 27 and ap = 32 levels combined give slightly more than N 20 samples, but we have to combine all ap levels greater than 39 to get more than N 20 samples in the tail of the distribution. Hence we derive corrections for ap levels of 2, 3,4,5,6,7,9,12,15,18,22,27 combined with 32, and 39 and greater. We also deal with each of the eight UTs of the ap index separately. For each of these UT-ap combinations we then compute the empirical correction factor f c (UT, F, ap) = hapi/hami in 20 equal-width bins in F. These can be used to convert an ap value for a general F and UT using PCHIP interpolation between the relevant 20 f c (F) values. Figure 17 presents contour plots of daily f c (ap, F) values for the eight UT ranges.
This allows us to turn every ap value into a corrected value ap C = ap Â f c (UT, F, ap). Figure 18 compares the ap C and am values, in the same format as Figures 13 and 15. Part (a) shows that there is very good agreement between ap C and am and between Ap C * and Am*. Parts (b) and (c) shows that the average annual variations of ap C and am are very similar and parts (d) and (e) even their UT variations are well matched. The mean S apC (UT,F) pattern gives percentage deviations of 1.78%, 5.0%, and À4.6% for r S , S max and S min , respectively. Table 3 shows this is a major improvement compared to ap.
The coefficients needed to implement the correction to ap are given in the Supporting Information file attached to this paper. The metadata given in the header to that file also gives some MATLAB program code that converts ap into ap C .

Conclusions
We have presented analysis of the time-of-day (UT)/timeof-year (F) response patterns of various planetary geomagnetic   indices. In general, responses depend on the level of geomagnetic activity because the effects of EUV/X-ray generated solar conductivity local to the observatory dominate the response at low geomagnetic activity, but the effects of the great circle distance between the observatory and the midnight sector auroral oval dominate at high geomagnetic activity.
The diurnal variations in the hemispheric an and as indices are small because of the use of station groupings that are as even in longitudinal separation as possible. On averaging to give the am index, the seasonal variations cancel to a very large extent and so the index sensitivity, S am , is close to unity at all F and UT. This is valid for three-hourly values to within extrema of ±2.9% at low geomagnetic activity, falling to just ±0.5% at high geomagnetic activity. As an overall average, the modelled (UT-F) response pattern of am is constant to within a standard deviation of 0.65% with extreme deviations of +1.4% and À1.2%.
The ap (Kp) index station distribution is not uniform but the use of the K to Ks conversion tables makes allowance for this. These tables were constructed using data from a limited number of years when solar and geomagnetic activity were at moderate levels only. Hence it not that surprising that they may not be ideal over all the 1957-2017 interval tested here. The general similarity of the average annual and diurnal variations of ap and am in Figure 13 indicates that the tables are performing well in that they do not introduce major errors. The right hand column of Figure 11b, however, shows that there are spurious time-of-year variations in the response of ap that depend strongly on UT. For 0 < UT < 10 h there are peaks at almost all activity levels around F = 1 (northern hemisphere winter), whereas at all other UT the peak is around F = 0.5 (northern hemisphere summer). Averaging over all UT gives peaks in the sensitivity at the equinox at that can be seen in Figure 13c, which reveals that that ap exaggerates slightly the semiannual variation in geomagnetic activity. There is also a spurious net annual variation with values lower around F = 1 than F = 0.5. Note that Figure 4a is also consistent with this, showing the greatest average response in ap to solar wind forcing at equinox and a greater response around the June solstice than around the December solstice. Figure 13d shows there is also a persistent net diurnal variation in the ap response with a minimum at 9-12 UT. These average values however hide the fact that the ap response function is a complex and variable function of UT, F and activity level, as shown by Figure 11. We note that these spurious diurnal and annual variations in ap arise because this index employs concentrations of stations in certain regions (particularly Europe). But this, and making the data from all stations mimic the Niemegk reference station by converting to K S, does also have advantages in noise suppression because one is averaging different estimates of the same thing and, when averaged over a whole year of continuous data (for which hS ap i s=1yr = 1), the spurious variations with UT and F are largely averaged outhence the ap index almost certainly provides the most reliable and accurate estimate on annual timescales. This is reflected in the response to solar wind forcing shown in the right hand panels of Figure 4 which reveals that ap performs slightly, but significantly, best on annual averaging timescales. However, we also note that the uneven pattern of S ap will introduce some random sampling noise, depending on the time of year and UT at which the quasi-randomly-occurring largest interplanetary disturbances happen to hit Earth's magnetosphere.
The sensitivity modelling indicates that the aa index response pattern has remained very constant over time and is comparable, and actually slightly better than, that derived empirically for ap. This is because the southern hemisphere data is given equal weight in aa, whereas it has much lower weight in ap.
Our analysis shows that the new "homogeneous" aa index aa H (Lockwood et al., 2018a, b) performs considerably better than aa, having a flatter UT-F response at all activity levels. Naturally, being based on just one station in each hemisphere it cannot match the performance of am, but it has the advantage (for studies of long-term change or requiring large sample numbers) of extending back to 1868 whereas am extends back to only 1959.
We have also presented a method for correcting the ap index for its uneven response pattern. The correction presented here uses all ap and am data since 1959, despite the fact that the network of stations used to generate the am index has undergone (relatively minor) changes in that interval and that used to generate the ap index has undergone considerable changes. This means that, in effect, we are accepting the tabular K-to-K S conversions inherent in ap for the purposes in deriving a correction factor to ap, f c (ap, F, UT), but this is one step removed from accepting them for the purposes of compiling ap itself. Hence this is a first order correction. It would be possible to make a more detailed correction and analyse the response and associated f c (ap, F, UT) factors for each of the various combinations of stations used to construct ap after 1959, as given by Appendix A. However, this would introduce other uncertainties caused by the reduction in the number of samples in each case. In addition, this would still not account for the several changes to the ap network made before 1959. The optimum correction to ap would be to correct the K indices for each station individually using the station sensitivity model and then average them together as this could be done in a consistent way for all the data (since 1932). However, this is a large task and well beyond the scope of this paper. Table 5 quantifies the improvements to the aa and ap indices made by aa H and ap C , respectively, using am as the calibration standard. Note that in the case of aa H , the correction is made through application of the station sensitivity model, whereas for ap C it is purely empirical. The most revealing division of activity levels is to consider the lowest 20% of samples (as determined by am), the highest 20% and the remainder in the inter 20-percentile range. The bottom two rows of Table 5 give the factor improvement brought about in the corrected indices: ideally these values should be infinite as am is used to correct the data and then used to test the corrected index. However, this is not going to be the case for a variety of reasons: in the case of aa H , the main reason is that by averaging many stations am will always suppress geophysical and instrumental noise more effectively than aa which is based on just two stations; in the case of ap C , the main reason is the effect of changes in the ap observation network. It can be seen that improvements, in terms of lowering r S (flattening the response function) are always in excess of three and that for extrema the improvement in S max is always better than that in S min . In both cases the improvement for the bulk of the distribution is always better than for the lower 20% and the upper 20%.
Because taking running means over 24 h averages out the UT dependence, the improvements in Aa H * and in Ap C * are not as great as for the 3-hourly values aa H and ap C. Nevertheless, the ranking order of storm days in these indices can be considerably different from those for Aa* and in Ap*. The occurrence of storm days, as quantified by Aa H * since 1868 will be the subject of another paper. Lockwood et al. (2019b) have studied the behaviour and ranking order of extreme events since 1932, as quantified using the Ap C * index.

Recommendations
The major differences between the indices studied in this paper arise from the geographic distribution of stations used to compile them. The aa index uses just two stations in order to give a long data sequence. The am index has been designed to make the stations' distribution as even as possible and so give the most even time-of day/time-of-year response that is allowed by the availability of land that is suitable for housing a magnetometer. On the other hand, the ap index is, and has always been, dominated by European stations and is constructed in a way that re-calibrates all data to a European station (Niemegk): we have shown that because it is averaging more data that is similar (or is made similar), this gives it better sensitivity and more accurate annual means but an un-even time-of-day/timeof-year response. Lockwood et al. (2018a, b) have presented a model-based means for correcting the aa index, to give the "homogeneous" aa index aa H , that allows for both long-term secular changes and the unevenness of the time-of-day/ time-of-year response pattern caused by the use of just two stations. In this paper, we have used empirical comparisons with the am index to generate a corrected ap index, ap C , that allows for the unevenness of the time-of-day/time-of-year response pattern of ap identified in this paper. Because of the differences between these corrections, their potential applications are different. The corrections to aa are based on physical model and employ parameters that can be projected into and measured in the future (the stations' geographic and geomagnetic coordinates and the solar declination angle): hence we recommend that it is used for both re-evaluation of past studies and for studies that extend into future. On the other hand, the ap corrections (to give ap C ) are empirical and based on comparison with past am index data and so will become increasingly unreliable with time for future data. Hence ap C will be useful in re-evaluation of some past work for which evenness of the time-of-year/time-ofday response to solar wind forcing is important (for example studies involving the ranking order of the largest geomagnetic storms, Lockwood et al., 2019b, c) but it would not make sense to recommend its use for future work with that requirement, because as long as the am index is available it is the better option in this context. That is not to say, that the ap index is not of value in other contexts. For example, in annual means of the data, the time-of-year/time-of-day response is averaged out and so long term trends are well monitored and for such Table 5. Standard deviations and maximum and minimum percentage deviations from unity of observed 3-hourly index sensitivities, S, estimated assuming the am index is ideal for 1959-2017, for low, middle and high geomagnetic activity levels. The bottom two rows give the factor by which the corrected index is improved in terms of the uniformity of its response. applications keeping the same index (and keeping its compilation as homogeneous as possible) is important. We note in that ap extends back 27 years further into the past than am and so for studies where the sub-annual response is important, ap C can be used to extend the am data sequence back a further 27 years to 1932. Looking to any potential development of the networks of stations, it is important that any changes made enhance, rather than undermine, the strengths of the index in question. From the above, we would argue that in the case of aa the strength is longevity. In the case of ap it is part longevity and part sensitivity brought about by averaging many nearby stations and calibrating all station to Niemegk. For longevity, homogeneity of compilation is by far the most important consideration, meaning that both past and future station changes are undesirable. In the case of aa, the station sensitivity model can be used to minimise the effects of station changes, as does the scaling to Niemegk in the case of ap. Hence moving ap stations to make the geographic distribution more even (and make ap more like am) would not be desirable change as it would undermine the advantages that ap has. For am, the choice of stations and the method of their combination have been shown to be good in that it gives a very flat time-of-day/time-of-year response pattern, which is the index's most important strength. This is true at all levels of geomagnetic activity, except the very quietest which suggests a sensitivity issue. In terms of improving the network distribution for am, the major limitation is the availability of accessible land on which a magnetometer can be placed and operated. However, there may be advantages for some applications in increasing the numbers of stations in all the longitude sectors to give increased averaging out of local site and instrumentation effects and so increase sensitivity. However, so as not to disrupt the evenness of the response, one would want to make similar, matching, improvements in all sectors.