Validating the Space Weather Modeling Framework (SWMF) for applications in northern Europe-Ground magnetic perturbation validation

In this study we investigate the performance of the University of Michigan’s Space Weather Modeling Framework (SWMF) in prediction of ground magnetic perturbations (DB) and their rate of change with time (dB/dt), which is directly connected to geomagnetically induced currents (GICs). We use the SWMF set-up where the global magnetosphere provided by the Block Adaptive Tree Solarwind Roe-type Upwind Scheme (BATS-R-US) MHD code, is coupled to the inner magnetosphere and the ionospheric electrodynamics. The validation is done for DB and dB/dt separately. The performance is evaluated via data-model comparison through a metrics-based approach. For DB, the normalized root mean square error (nRMS) and the correlation coefficient are used. For dB/dt, the probability of detection, the probability of false detection, the Heidke skill score, and the frequency bias are used for different dB/dt thresholds. The performance is evaluated for eleven ground magnetometer stations located between 59 and 85 magnetic latitude and spanning about five magnetic local times. Eight geomagnetic storms are studied. Our results show that the SWMF predicts the northward component of the perturbations better at lower latitudes (59 –67 ) than at higher latitudes (>67 ), whereas for the eastward component, the model performs better at high latitudes. Generally, the SWMF performs well in the prediction of dB/dt for a 0.3 nT/s threshold, with a high probability of detection 0.8, low probability of false detection (<0.4), and Heidke skill score above zero. To a large extent the model tends to predict events as often as they are actually occurring in nature (frequency bias 1). With respect to the metrics measures, the dB/dt prediction performance generally decreases as the threshold is raised, except for the probability of false detection, which improves.


Introduction
The awareness of the importance of space weather with respect to safety and life on Earth has increased and gained international interest. Among the space weather effects of concern are ground magnetic perturbations, which arise as a result of currents from the solar wind-magnetosphere-ionosphere coupling/ interaction. The rate of change of the ground magnetic perturbations with time (dB/dt) is directly linked to geomagnetically induced currents (GICs) which have potential impact on highpower voltage transmission systems (e.g., Boteler et al., 1998;Pirjola, 2005). In addition to this, unexpected geomagnetic perturbations (DB) can disrupt navigation activities that are dependent on the ground geomagnetic field direction, such as directional drilling used by the oil and gas industry (e.g., Reay et al., 2005). Therefore, the ability to forecast or predict these effects of the solar wind-magnetosphere-ionosphere coupling in space and on the ground is crucial. A number of challenges and campaigns have been carried out in the past related to evaluating how well the existing models perform in reproducing observations (e.g., Pulkkinen et al., 2011Pulkkinen et al., , 2013Rastätter et al., 2013). The most recent challenge for the ground magnetic perturbations is the community-wide validation of geospace model magnetic perturbations predictions to support model Topical Issue -Geomagnetic Storms and Substorms: a Geomagnetically Induced Current perspective transition to operation in Pulkkinen et al. (2013). The University of Michigan's space weather modelling framework (SWMF) (Tóth et al., 2005(Tóth et al., , 2012 was then selected and transitioned to operation at the NOAA Space Weather Prediction Center (SWPC). In our study, we further assess the performance of the SWMF in predicting regional and localised ground magnetic perturbations in the northern Europe sector. With quantifiable metrics, we aim at further understanding the strengths and weaknesses of SWMF in prediction of ground magnetic perturbations.
The SWMF is a software framework that enables coupling of different space physics systems in a single simulation. In this study, for the global magnetosphere domain the Block-Adaptive Roe-type Upwind Scheme (BATS-R-US) MHD code (Powell et al., 1999) is used and coupled to the Rice Convection Model (RCM) of the inner magnetosphere (IM) (Wolf et al., 1982;Sazykin et al., 2002;Toffoletto et al., 2003), and the Ridley ionosphere model (RIM)  which is an ionospheric electrodynamics (IE) solver. BATS-R-US is an adaptive mesh magnetohydrodynamics (MHD) code that solves the ideal MHD equations throughout the magnetosphere (Powell et al., 1999;De Zeeuw et al., 2000).
Understanding model capabilities to reproduce observed features in the signal of interest is a key element of space weather monitoring and forecasting. One of the key elements investigated in this work is the rate of change of the horizontal component of ground magnetic perturbations with time, dB/dt. The primary argument for studying dB/dt is that the time derivative of ground magnetic field can be used as an indicator for level of geomagnetically induced electric field or geoelectric field, on the surface of the Earth (e.g., Viljanen et al., 2001). Hence regional and local predictions of the ground geomagnetic perturbations are a necessity for modelling and forecasting space weather impacts in areas such as power grids and resource pipelines. The other element studied in this work is the magnetic perturbation on the ground DB which is important in navigation aspects such as directional drilling, where the geomagnetic field is followed for navigation.
The paper is organised as follows. The setup settings of the SWMF simulations, as well as the storm events and ground magnetometer used, are described in Section 2. Section 3 details the metrics used to quantify the model performance. The results are presented in Section 4 followed by a discussion of our results in Section 5 and finally a summary is given in Section 6.

Validation setting 2.1 SWMF configuration
The SWMF is driven by solar wind plasma (velocity and density) and interplanetary magnetic field input data at the upstream L1 location from the advanced composition explorer (ACE) and WIND satellites. The other input is the F10.7 cm flux. Using a tool provided on the Community Coordinated Modeling Center (CCMC) website (https://ccmc.gsfc.nasa.gov/requests/ GetInput/get_ace_K.php), data were propagated from the satellite position to x = 33 R E which is just outside the upstream boundary of the simulation box, with averaged V x velocity. The results presented in this paper are for runs with highest resolution of 0.125 R E close to the inner boundary and lowest resolution of 8 R E in the distant tail. The computational domain extends from 32 R E upstream to À224 R E downstream in the x direction and ±128 in the y and z GSM coordinates. The inner boundary is 2.5 R E from Earth's center. The Sokolov scheme (Sokolov et al., 2002) with Koren's limiter (Koren, 1993) and b = 1.2 is used. The integrated density and pressure are applied as outer boundary conditions for inner magnetosphere model assuming a 80% H + to 20% O + number density ratio.
The global magnetosphere (GM) component of the SWMF transfers energy and mass from the solar wind to the magnetosphere and it is responsible for the convection in the magnetosphere . The GM component provides the IE component with field aligned current density and the IM component with magnetic field structure and plasma density (De Zeeuw et al., 2004). In return, the GM component receives density and pressure information from the IM component and electric field from the IE component. The IM component also receives electric potential from IE component.

Storm events
Eight storm events are examined in this study. Six of these (events 1-5, 7) were used in the GEM Challenges (e.g., Pulkkinen et al., 2010Pulkkinen et al., , 2011Pulkkinen et al., , 2013Rastätter et al., 2011Rastätter et al., , 2013, while event 8 is one of those recommended by Welling et al. (2018) for inclusion in the validation suite and event 6 is a new event included in our validation. Events 3, 7, and 8 are known major storms, the AGU, the Halloween and the St. Patrick's Day storms, respectively. A list of all the 8 storm events as well as their F10.7 flux, maximum AE index, and the minimum SYM-H index is shown in Table 1. Table 2 shows the names and geomagnetic coordinates of the eleven ground magnetometer stations used in this study. It should be noted that not all the stations had data for all of the eight events. The last column in Table 2 shows the events for which we do not have data at the corresponding stations. The geomagnetic coordinates of the stations are entered as inputs of the SWMF and act as locations of virtual magnetometer stations where the model predicts/calculates the ground geomagnetic perturbations. These are the predicted perturbations that are compared to the actual perturbations measured at the ground-based stations. The magnetometers used here are located in the northern Europe region between magnetic latitude 59°and 85°, spanning 5 h of MLT as shown in Figure 1. The data from the stations were transformed from geographic coordinates into geomagnetic dipole coordinates using the Apex python tool which uses the IGRF-12 with coefficients from 1900 to 2020 (Thébault et al., 2015).

Selected metrics
The performance of the model for each event is evaluated via model-data comparison. The first parameter evaluated is the ground magnetic field perturbation DB. Only the horizontal components of the ground geomagnetic field are investigated in the analysis presented in this paper. The DB comparison is done using the normalized root mean square error (nRMS) defined in N.K. Kwagala et al.: J. Space Weather Space Clim. 2020, 10, 33 equation (1) and cross correlation coefficient. This is done independently for the northward (DB n ) and eastward (DB e ) components. The subscripts "p" and "o" in equation (1) refer to prediction and observation, respectively: The average is taken over the entire simulation period. A nRMS error of zero indicates that the model exactly reproduces the observations, while nRMS less than one indicates that the model prediction is in good agreement with the observations. A nRMS error above one indicates that the model prediction significantly diverges from the observations, i.e., may possibly have opposite trends or large offsets. The second parameter evaluated in this work is the dB/dt, similar to what was used in prior work (e.g., Pulkkinen et al., 2013). The magnitude of the horizontal dB/dt is directly related to the geomagnetically induced currents (GICs) since it generates the geoelectric field (e.g., Viljanen et al., 2004;Pulkkinen et al., 2006). Following Pulkkinen et al. (2013), for each event and station dB/dt is defined as where B n and B e are the two horizontal components (northward and eastward) of the magnetic field. The magnetometer stations record data at different intervals of 1 s, 10 s and 20 s. For the validation, we take the data point every minute to match the time interval of data extracted from the model. The central difference is used to calculate dB/dt for both observation and model data.
An event-based analysis is carried out, where an event is defined as the absolute value exceeding an event threshold within a forecast window 0 t t f . For this study we use thresholds 0.3, 0.7, 1.1, and 1.5 nT/s and a forecast window of 20 min. These are the same thresholds that have been used in the previous validation studies (e.g., Pulkkinen et al., 2013). For each storm event a contingency table is obtained containing the number of correctly predicted threshold crossings

Probability of detection (POD)
POD is defined as which is a measure of the fraction of observed threshold crossings that were correctly forecast. POD ranges from 0 to 1 with 1 being the perfect score. POD > 0.5 implies that the model tends to predict more correct crossings than it misses.

Probability of false detection (POFD)
POFD is defined as which measures the fraction of observed no crossing events that are incorrectly forecast as crossings. POFD ranges from 0 to 1 with 0 as the perfect score. POFD < 0.5 implies that the model tends to predict more correct no crossings than false crossings.

Heidke skill score (HSS)
HSS is defined as which measures the fraction of correctly predicted threshold crossings after eliminating those predictions that would be correct purely by random chance. It ranges from negative infinity to one. Negative values indicate that random forecast is better than model prediction while zero is no skill, just as good as random. 1 is the perfect score.

Frequence bias
Frequence bias (FB) is defined as which measures how well the forecasts correspond to the observations. FB < 1 indicates that the model predicts events less frequently than nature, whereas FB > 1 indicates that the model predicts events more frequently than nature. 1 is the perfect score.  The cyan line shows the mean at each magnetometer station while the red line shows the median. A high correlation coefficient indicates that the model predicts similar trends as the observations, whereas nRMS error below 1 indicates that the model predicted magnitudes are comparable to those observed. Generally, our results show that the model tends to predict the northward component perturbations better at lower (sub-auroral) magnetic latitude (59°-68°), with mean nRMS 1 and correlation coefficient above 0.5. On the other hand, for the eastern component, the model tends to perform better at higher magnetic latitudes (>68°). For the polar cap stations, though, the performance is similar for both components.

nRMS error and cross correlation coefficient at different magnetic latitudes
One storm event is selected for further analysis. The selected event is the St. Patrick's Day storm of 2015 (Event 8). The simulation for Event 8 is three days long making it the longest of the eight events simulated in this study. Figures 3 and 4 show the data-model comparison for the northward and eastward components, respectively for Event 8. Generally, the model seems to capture the large scale trend of the ground perturbations, but sometimes tends to exaggerate the magnitude of the northward component especially at the high magnetic latitudes (Fig. 3). This overestimation would contribute to the nRMS error >1 at these latitudes. Another striking feature is the large amplitude perturbation that is short lived at lower latitude stations (TRO, LYC, DOB), and which is completely missed by the model at around 18:00 UT on 17 March.
For the eastward component shown in Figure 4, the model generally reproduces the observed perturbations, particularly the longer period variations, to a good degree at the high latitudes both in magnitude and trend. However, the model performs poorly at the lower latitudes. The short duration large amplitude perturbations observed in the northward component, are also observed in the eastward component and the model completely misses them here as well. The fact that these pulses are missed in both components suggest that they likely arise from highly localised enhancements in currents that are not captured by the model (e.g., Yu & Ridley, 2008). The other feature is during the early hours of 18 March, where the model exaggerates the negative eastward component at the high latitude stations.
We further investigate the eastward and northward component perturbations and the modelled contributions from the different current sources. For this analysis, three of the ground magnetometer stations, THL (85°) at the highest latitude (polar cap) in the study, BJN (71.3°) for auroral latitude, and the lowest latitude (subauroral) DOB (59.1°) for Event 8 are selected. Figure 5 shows the contribution from Hall current (green line) compared with the observed (black line) and modelled (red line) ground perturbations. The figure shows that the Hall currents contribute the most to predicted total perturbation for both northward and eastward components, with the exception of the eastward component at the lowest latitude station DOB.
The total contribution from the field aligned currents (FACs) and Pedersen currents is shown by the cyan line in Figure 6. For vertical magnetic field lines, uniform Hall and Pedersen conductances and a horizontal ionosphere, the ground magnetic perturbations due to FACs and Pedersen currents should exactly cancel (e.g., Schield et al., 1969;Ridley et al., 2004). If they did we would expect the total contribution from FACs and Pedersen currents to be zero at the polar cap station THL where the magnetic field lines are most vertical. These ideal conditions clearly do not apply in reality since Figure 6a shows a non-zero contribution from the FACs and Pedersen currents at THL. The total modeled eastward component at the lowest latitude (DOB) is also dominated by the FACs and Pedersen current contributions. On comparing Figures 5 and 6, the magnitude of the exaggeration during the recovery period clearly arises from exaggeration of the Hall currents contribution. The contribution from the other magnetosphere currents (MHD) is shown by the magenta line in Figure 6 and it is negligible at the magnetic latitudes shown.  Figure 7 presents the four dB/dt metrics POD, POFD, HSS, and FB for all the stations at different magnetic latitudes for all events for a dB/dt threshold of 0.3 nT/s. Generally the model performance varies from event to event and for different latitudes. To capture the general performance, we plot the mean (cyan line) and median (red line) at each station as well. The results show that the model generally performs reasonably well with high mean and median POD score of %0.75 and 0.8, respectively, and low POFD (mostly <0.4) and prediction skill better than a random guess i.e., HSS > 0 although mainly below 0.5. The model also tends to predict events as often as they actually occur in nature at lower latitudes with a mean FB close to one and overestimates the frequency of occurrence at higher latitudes. When the model prediction frequency diverts from nature (FB 6 ¼ 1), it tends to underestimate the frequency of occurrence of events at the lower latitudes. To summarize the model performance at 0.3 nT/s threshold, we plot POD against POFD in Figure 8a and HSS against FB in Figure 8b. The dashed lines in Figure 8a mark probability 0.5. For a perfect model prediction all the points should be in the left upper corner. This is where the majority of the points are, however some of the high POD scores are associated with high POFD, above 0.5 (right upper corner). The vertical dashed line in Figure 8b marks the perfect FB score (FB = 1) and the horizontal line marks the no skill score (HSS = 0). Ideally, we want all the points to be along the vertical line and above the horizontal line and this is where most of the points are observed. Since the above results are based on the 0.3 nT/s threshold, the result is not very meaningful for very intense storm events like Event 7 (Halloween storm 2003). For this reason, the most intense events 3, 7, and 8 are investigated further at all thresholds 0.3, 0.7, 1.1, and 1.5 nT/s. Figure 9 shows the summary plot similar to Figure 8 for the four dB/dt thresholds (0.3, 0.7, 1.1, and 1.5 nT/s) for Events 3, 7, and 8. Generally, the performance with respect to the dB/dt metrics decreases with increasing threshold, except for the POFD which improves with increasing threshold. This implies that any predicted high threshold crossings are most likely to be true events. The Heidke skill score gradually drops at higher thresholds. The frequency bias FB value shows that the model tends to predict at a frequency closer to that of nature i.e., FB = 1 at the lowest threshold shown (0.3 nT/s), however, the frequency of prediction diverges from nature as the threshold is increased.
Similar to the previous analysis of the DB perturbations, we now look at the direct data-model comparison for dB/dt for event 8 at all ground magnetometer stations. The comparison here is done using smoothed data (both model and observations) with a running mean window of 20 min which is the forecast window we use in the dB/dt metrics analysis. Figure 10 shows the comparison. The stations are arranged decreasing in magnetic latitudes from (a) to (j). This shows that for event 8 the model tends to overestimate the amplitudes at higher latitudes and understimate at the lower latitudes.   N.K. Kwagala et al.: J. Space Weather Space Clim. 2020, 10, 33

Summary and discussion
In this work, we have investigated the performance of the SWMF in predicting ground magnetic perturbations in the northern Europe sector. Both the perturbations DB and dB/dt have been investigated. The performance has been evaluated using a metric-based analysis similar to those used in the past validations (e.g., Yu & Ridley, 2008;Pulkkinen et al., 2011Pulkkinen et al., , 2013. We have presented the model performance per station per event. It is clear that the model performance varies at each event, however, we have presented the mean and median as a proxy for the general performance (Figs. 2 and 7).
The first part of the analysis focusses on investigating how well the model predicts the horizontal components of the ground magnetic perturbations DB n and DB e . From the mean and median of the normalized root-mean-square error and the cross correlation coefficients in Figure 2, the model tends to perform better at lower latitudes (close to subauroral latitudes) in prediction of the northward component. On the other hand, the model tends to perform better in the predicting the eastward component at higher latitudes (auroral and polar cap latitudes). A more detailed analysis of event 8 reveals that the model sometimes tends to exaggerate the amplitude of the northward perturbations at auroral latitudes. This enhancement is likely to contribute to the nRMS error above one at these latitudes. The model captures the trend but sometimes exaggerates the magnitude. For example, Figure 5b at BJN suggests that the exaggeration arises from the predicted Hall current contribution. Through the Biot-Savart law, the northward ground magnetic perturbations would be induced by an eastward current in the ionosphere (e.g., Yu et al., 2010). This indicates that the model is predicting a higher (Hall) current density in the eastward N.K. Kwagala et al.: J. Space Weather Space Clim. 2020, 10, 33 direction than what actually exists. A deeper investigation into this is a topic for a follow up study.
There are a number of features which may arise from misplacement of the currents in the SWMF with respect to the magnetometer stations. One of such features is the eastward component of the perturbations that is in the opposite direction to the observed perturbations, for example at the lower latitudes from FHB to DOB (71°-59°Figs. 4g-4j) at 10-18 UT on 17th March. This could explain the poor DB e prediction performance at these latitudes. These are directly connected to the northward current density in the ionosphere. Such oppositely directed eastward component perturbations were also reported by Yu & Ridley (2008).
The short lived large amplitude perturbations observed at lower latitude stations FHB, TRO, LYC, and DOB that are greatly underestimated or completely missed (Figs. 3 and 4) also correspond to a large dB/dt (Fig. 10) lasting just a few minutes. These perturbations could be due to very localised current structure as the signature is only observed from 18.50 to 19.60 MLT between 59°and 66.70°magnetic latitude. We however note that we do not look at all stations in this region to confirm this.
The second part of our analysis focusses on investigating how well the model predicts the rate of change of the ground magnetic perturbations dB/dt, a parameter crucial in the derivation of the geoelectric field. The geoelectric field is a key contributor to the GIC. With respect to all the four metrics POD, POFD, HSS, and FB, the model performs reasonably well. Unlike the DB, there is no clear magnetic latitude dependence seen in the dB/dt metrics analysis mainly because it is the magnitude of horizontal dB/dt that is investigated. Therefore the errors of the DB estimates shown in event 8, like the overestimate in magnitude and oppositely directed perturbations, would have no effect, especially for the lower dB/dt thresholds. However, a direct dB/dt data-model comparison shown in Figure 10 reveals that the model tends to miss large dB/dt at the lower latitudes while at the higher latitudes it tends to overestimate the dB/dt amplitudes. The summary in Figure 8 clearly shows a good performance at the lowest 0.3 nT/s threshold. This performance decreases with increasing threshold except with respect to POFD, which actually improves. This implies that any event prediction made by the model at large thresholds will most-likely be an actual event in reality. The results reported by Pulkkinen et al. (2013), which were obtained by integrating over several stations and all their studied events, showed that both POD and HSS were below 0.5 for threshold 1.5 nT/s. This is in agreement with our results except that our results show POD > 0.5 at this threshold for some stations. The higher POD could also be attributed to the higher grid resolution (highest 1/8 R E ) in the SWMF configuration we use. Increasing the grid resolution has been reported to improve the prediction performance of global MHD models (e.g., Pulkkinen et al., 2010Pulkkinen et al., , 2011. One of the possible sources of the discrepancy between the SWMF predictions and the observations could arise from the uncertainty in the solar wind data driving the model. There is a possibility that the solar wind seen by the satellite at a point in the L1 position is not necessarily what reaches the magnetopause and in addition, the methods used to propagate the L1 measurement to the bow shock can also introduce errors to the solar wind input (e.g., Morley et al., 2018). Morley et al. (2018) showed that the SWMF prediction skill can be improved by taking into account the uncertainty in the solar wind input.
Different configurations of SWMF can also affect the discrepancy between the predictions and observations. Some of the configurations or imperfect specifications that could lead to uncertainty in the model include empirical ionospheric conductance (e.g., Welling et al., 2017) and low grid resolution (e.g., Haiducek et al., 2017). A detailed investigation of the different sources of discrepancy will be the topic of a follow-up study.

Summary
We performed a validation of the SWMF in predicting ground magnetic perturbations for the northern Europe sector.
The key results show that the model: -Performs better at high magnetic latitudes above 67°than at lower latitudes 58°-67°in predicting the eastward component of the magnetic perturbations on the ground and the reverse is true for the northward component. -Captures the general trend but tends to overestimate the magnitude of northward ground magnetic perturbations particularly at the high magnetic latitudes. -Generally performs reasonably well for dB/dt predictions for the 0.3 nT/s threshold. -Predicts the low threshold (0.3 nT/s) crossings at a rate very close to the frequency of occurrence in reality. This changes when the threshold is increased, in which case the model diverges from the real-world rate underestimating at lower latitudes and sometimes overestimating at higher magnetic latitudes. -Mostly underestimates the large amplitude short-lived perturbations likely associated with localised current structures. and the Center for Space Environment Modeling at the University of Michigan. The Spacepy python library (Morley et al., 2011) was used to read and analyse the SWMF output. It is available at https://github.com/spacepy/spacepy. The simulations were performed on resources provided by UNINETT Sigma2the National Infrastructure for High Performance Computing and Data Storage in Norway. This project has been funded by the European Space Agency under the PRODEX Experiment contract 4000124903. The editor thanks two anonymous reviewers for their assistance in evaluating this paper.