Neural network based model for global Total Electron Content forecasting

We introduce a novel empirical model to forecast, 24 h in advance, the Total Electron Content (TEC) at global scale. The technique leverages on the Global Ionospheric Map (GIM), provided by the International GNSS Service (IGS), and applies a nonlinear autoregressive neural network with external input (NARX) to selected GIM grid points for the 24 h single-point TEC forecasting, taking into account the actual and forecasted geomagnetic conditions. To extend the forecasting at a global scale, the technique makes use of the NeQuick2 Model fed by an effective sunspot number R12 (R12eff), estimated by minimizing the root mean square error (RMSE) between NARX output and NeQuick2 applied at the same GIM grid points. The novel approach is able to reproduce the features of the ionosphere especially during disturbed periods. The performance of the forecasting approach is extensively tested under different geospatial conditions, against both TEC maps products by UPC (Universitat Politècnica de Catalunya) and independent TEC data from Jason-3 spacecraft. The testing results are very satisfactory in terms of RMSE, as it has been found to range between 3 and 5 TECu. RMSE depend on the latitude sectors, time of the day, geomagnetic conditions, and provide a statistical estimation of the accuracy of the 24-h forecasting technique even over the oceans. The validation of the forecasting during five geomagnetic storms reveals that the model performance is not deteriorated during disturbed periods. This 24-h empirical approach is currently implemented on the Ionosphere Prediction Service (IPS), a prototype platform to support different classes of GNSS users.


Introduction
The provision of accurate ionospheric forecasts depends on a variety of constraints, including a better understanding of the phenomena ruling the interactions between the Sun and the near-Earth space, the availability of suitable measurements as well as the modelling capability of ionospheric-related quantities. Development of proper ionospheric forecasting solutions further complicates not only because helio-geophysical disturbances alter the "regular" properties and dynamics of the ionospheric plasma, but also because the users demanding for their implementation in real-time to support a variety of needs, e.g. for mitigating the ionospheric effects of space weather on modern technological infrastructures relying on space-based navigation systems (Cesaroni et al., 2015;Park et al., 2016). Accordingly, efforts are continuously devoted to upgrade the so-called "climatological models", able to provide median features of, e.g., the ionospheric electron density profile, to the real-time weather predictions. This is for example the case of the well-known International Reference Ionosphere (IRI; Bilitza, 2001;Bilitza et al., 2017) and NeQuick (Radicella & Leitinger, 2001;Nava et al., 2008; models, built on empirical approaches and multi-instruments data ingestion and assimilation (see, e.g., Nava et al., 2006;Galkin et al., 2012).
Focusing on the Total Electron Content (TEC, i.e. the integral of electron density along the path satellite-receiver at ground), nowadays the availability of dense network of GNSS receivers allows its short-term forecasting with horizon of tens of minutes in advance in real time based on physical and/or empirical approaches. Among the former, a model providing Topical Issue -Space Weather research in the Digital Age and across the full data lifecycle maps of forecasted vertical TEC (vTEC) (Mannucci et al., 1998), on the global scale is the Coupled Thermosphere Ionosphere Plasmasphere Electrodynamics Model (CTIPe), firstly built on the three-dimensional, time-dependent numerical model of the thermosphere introduced by Fuller Rowell & Rees (1981). The model has been upgraded in the last decades by including convection model as well plasmasphere and mid-low latitude ionosphere. Further extended for realtime operation (https://www.swpc.noaa.gov/products/ctipe-totalelectron-content-forecast), CTIPe shows its potential to compete with empirical models for reliable short term forecast of the ionosphere under disturbed geomagnetic conditions (Codrescu et al., 2012 and references therein).
Short-term vTEC forecasting by empirical approaches are adopted and implemented in regional as well as in global services. Among the others, the Space Weather Application Center Ionosphere (SWACI) provides in real time vTEC forecast one hour in advance over Europe as well and at global scale (https://impc.dlr.de/products/total-electron-content/forecast-tec/). The empirical approach is based on the Neustrelitz TEC Model (NTCM) (Jakowski et al., 2011), a polynomial model consisting of linear terms used as background model. The forecasted map results as a sum of the actual vTEC map and a weighted sum of the temporal vTEC gradient of the previous hour and the temporal gradient of the previous day at the same time. The NCTM and other empirical models have been recently tested over Europe revealing the NCTM capability to extend the forecast up to 24 h in advance under quiet geospatial conditions (Badeke et al., 2018).
Still within operational services, the International GNSS Service (IGS) releases actual high valuable vTEC global maps in the form of different products (rapid and final products, https://www.igs.org/products) with variable latency (in near real-time and/or days after) that users can access for their own purposes. The GLOBAL Ionospheric Maps (GIM) are based on the joint efforts for GNSS data processing by the IGS ionospheric working group (Iono-WG) and are available online since 1998 (Hernández-Pajares et al., 2009). In 2017, Hernandez-Pajares and co-authors successfully assessed the accuracy of the GIM-UQRG computed by Universitat Politècnica de Catalunya (UPC) by comparison with external data from difference of Slant TEC, based on independent global positioning system data GPS (dSTEC-GPS) and vTEC altimeter (Hernández-Pajares et al., 2017).
The potential of Global vTEC forecasting up to 24 and 48 h from GIM UPG has been investigated but, as claimed by the authors, the prediction of TEC disturbances induced by geomagnetic storms and other impulsive events was not yet accounted for in the prediction model design (García-Rigo et al., 2011).
Bearing this in mind, the challenge we address in this paper is the development of a global, empirical vTEC forecasting tool with a horizon up to 24 h under quiet as well as disturbed geomagnetic conditions for an operational service. The disturbed geomagnetic conditions are triggered by solar storm events, i.e. transient phenomena originating in the Sun that modify the regular behavior of the ionosphere through the Solar Wind-Magnetosphere-Ionosphere interaction. Among this: geoeffective Coronal Mass Ejections (CMEs) and fast solar wind streams triggered by Coronal Holes (CH).
We take advantage of a Neural Network (NN) approach, introduced since the end of 90's for ionospheric forecasting on different spatial/temporal scales. Such approach has been even considered to predict ionospheric scintillation at high latitude due to the formation of irregularities causing fluctuations of the amplitude and phase of radio waves crossing them (McGranaghan et al., 2018). Early successful attempts of NN application were addressed to single station forecasting of the ionospheric F2 layer related-parameters then, increasing the availability of both ground based GNSS as well satellite in situ data, their use has been extended to TEC as well as to other parameters describing the Sun-Earth interactions (see, e.g., Sai Gowtam & Tulasi Ram, 2017 and reference therein). NN potential was discussed for the short-term forecasting (up to 1 h in advance) of TEC over Europe on November 2013 by Tulunay et al. (2006), highlighting how the RMSE of the prediction increases with the prediction time (from 10 min to 1 h). Habarulema et al. (2011), provided performance of regional NN TEC variability forecasting indicating a decreasing performance of the model accuracy under stormy events. Short term TEC forecasting (30 min in advance) at single station by Back Propagation (BP) NN and by Radial Basis Function (RBF) NN has been tested by Huang & Yuan (2014) revealing acceptable results with a maximum RMSE of 2 TECU during the test cases. NN combined with GIMs has been explored as a strategy to support single frequency user for the ionospheric delay on the global scale (Perez, 2019). IONONet, the resulting global model for TEC prediction (1 or more days in advance) seems promising but currently unable to represent the equatorial anomaly behavior. Recurrent NN combined with TEC global map by CODE (Center for Orbit Determination in Europe, Switzerland) has been introduced for the provision of TEC forecasted maps at a global scale from 2 h up to 48 h in advance (Cherrier et al., 2017). The training data set refers to the period January 2014-May 2016, while the testing was performed from July to December 2016. The performance of the approach is tested over single stations with promising results in terms of RMSE that is found to increase as function of the forecasting horizon. In addition, decrease of the performance has been found during ionospheric disturbed conditions. Table 1, far to be exhaustive, attempts to highlight the NN forecasting approaches described above addressed to ionospheric parameters and their main features.
Inspired by the "machine learning" concept, the TEC forecasting model uses the rapid vTEC GIM maps product to instruct a nonlinear autoregressive network with exogenous (external) inputs (NARX) (Nørgård et al., 2000) that learns the vTEC behavior at selected points of the grid under different helio-geophysical conditions. Moreover, the model makes use of the NeQuick2 (Nava et al., 2008) and of an effective solar activity index R12eff to extend the forecasting at a global scale. The data and method used are described in Section 2. For validation purposes, statistical evaluation of the model performance is reported in Section 3. Such validation is based on the application of the forecasting process along one year from 1 June 2017 to 31 May 2018 and by comparison between the forecasted global vTEC maps and the final GIM UPC product, the more reliable vTEC assessment currently available. Five cases of storm events (from G1 to G4 classes) are also considered in order to assess the forecasting capability during disturbed periods. To further challenge the model, forecasting is validated against independent measurements, i.e. the vTEC from dual frequency altimeter on board of Jason 3 spacecraft. C. Cesaroni et al.: J. Space Weather Space Clim. 2020, 10, 11 The results are finally discussed in Section 4 together with the limitations of the model and concluding remarks on the operational performance of the proposed forecasting approach within the Ionosphere Prediction Service (IPS) prototype (Albanese et al., 2017;Rodriguez et al., 2018;Veettil et al., 2019).

Data and methods
The proposed model is implemented through an algorithm consisting of two separate parts, working in cascade. The first part, termed "Single point TEC forecasting", focuses onto the 24-h ahead prediction of vTEC on selected points of the globe and it is based on the use of NARX fed by Global Ionospheric Map (GIM) provided by the International GNSS Service (IGS) with a spatial-temporal resolution of 2.5°(lat) Â 5°(lon) Â 2 h (Hernández-Pajares et al., 2009). The second part, termed "Global TEC forecasting", is based on the use of the NeQuick2 Model (Radicella & Leitinger, 2001;Nava et al., 2008Nava et al., , 2011 fed by the outputs of the single point vTEC forecasting with the goal to retrieve the so-called "effective R12" (Olwendo & Cesaroni, 2016). The effective R12 is then used to extend the vTEC forecasting 24-h in advance at a global level. Splitting the algorithm into two parts makes the process more flexible giving the possibility to modify only one part of the algorithm. For example, one can increase the number of the selected points for the "single point TEC forecasting" in the first part of the algorithm or use another model (e.g. IRI) to extend the forecasting at global level in the second part. The block diagram reporting the conceptual flow of the algorithm is sketched in Figure 1 while a detailed description is given in the next two subsections.

Single point TEC forecasting
Neural Network have been successfully used for "hours ahead" ionospheric forecasting purposes (see, e.g. Cander et al., 1998;Cander & Zolesi, 2014 and references therein) since several years. A nonlinear autoregressive model with external input, NARX (Xie et al., 2009), is here adopted and tuned for 24 h ahead TEC forecasting on each of the grid points indicated by red dots in Figure 2 and listed in Table 2. The 18 grid points have been chosen in order to cover different latitudinal sectors and different local time in every forecasting step. NARX is a recurrent dynamic network, with feedback connections enclosing several layers of the network. It uses supervised training method consisting in feeding the network with input/output examples to minimize the error function.
By using the NARX approach, the ionospheric parameter y t ð Þ to be forecasted is expressed as an auto regression function of both its previous k values and the corresponding r previous values of an independent external input signal u t ð Þ (Araghinejad, 2013): In this paper, the ionospheric parameter (y) to be forecasted is the vertical TEC at a given grid point and the external input (u) is the geomagnetic index Kp, available several hours in advance (up to 3 days) from NOAA (https://www.swpc.noaa. gov). In principal, the algorithm is able to use other parameters (Dst, AE, AU, AL) as external input but, so far, Kp is the only index available 24 h in advance. The input delay and feedback delay (k and r in Eq. (1), respectively) are both set up at 24 h taking into account that the main variation on TEC is due to  Schaer et al., 1998) format with time resolution of 2 h. Kp time series is obtained by interpolating 3-h resolution data to obtain a value of the Kp every 2 h. The interpolation method used is the nearest neighbor. As the dataset is a time-series and it has been randomly divided to train, test and validate the NARX, some implicit leakage (Schelter et al., 2018) could affect the performance of the "single point TEC forecasting" and has to be considered in discussing the results.
To illustrate how the different geomagnetic conditions are accounted in the considered dataset, left panel of Figure 3 shows the Kp for the period 2005-2015 in which colors indicate the NOAA scale of geomagnetic storms: quiet conditions (G0, green), minor/moderate conditions (G1/G2, yellow) and strong/severe/extreme conditions (G3/G4/G5, red). In the same panel, black dashed curve reports the corresponding value of R12. In the right panel of Figure 3, relative percentage of the Kp conditions according to the color code of the right panel is reported. As expected, the bulk of the considered geomagnetic conditions refers to quiet times (98%), while the storm conditions mostly occur during the descending, ascending and maximum phases of the solar cycle. As a consequence, the training dataset includes mostly data under quiet geomagnetic conditions. This could limit the performance of the model in reproducing storm conditions.
The NARX design used in the TEC forecasting approach is the so-called "closed loop network" or "parallel architecture", where the output is repeated in input at Hidden Layer to minimize the error of the output. In Figure 4, a sketch of the block diagram reporting the use of NARX on TEC time series C. Cesaroni et al.: J. Space Weather Space Clim. 2020, 10, 11 is shown. In the figure, W is the weight of each input parameter (y, u) and b is the bias (Sexton et al., 2019).
All the training is done in open loop (also called seriesparallel architecture), including the validation and testing steps. Typically, the network is created in "open loop", and only when it has been trained (which includes validation and testing steps) it is transformed to closed loop for multistep-ahead prediction. In fact, the "open loop" network is able to provide the forecasting only one time step ahead (2 h) while the "closed loop" configuration allows extending the forecasting up to 12 time step ahead (24 h). This is done by using the output at a given step of the closed loop as input to the following step. On the other hand, the current prediction of y(t + 1) is used as an input to predict y(t + 2) (Sexton et al., 2019). The standard network used for fitting is a two-layer feedforward network, with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer (grey boxes in Fig. 4). To evaluate the performance of the network in order to optimize the choice of weight and biases, we adopted as "loss function" the mean squared error (MSE). This is a very frequent choice among the bilateral loss function (e.g. Mean Absolute Error, Mean Squared Logarithmic Error) usually adopted for regression problems (Nie et al., 2018).
As an example, Figure 5 reports the results of the single point TEC forecasting for different days, different latitudes and local time (LT) spanning 24 h. In particular, panel a and b report the forecasting results for the 30 May 2016 considering grid points at northern high latitude (85°N, 120°E) and southern high latitude (85°S, 120°E) respectively. Panel c and d report the forecasting results for the 21 August 2016 considering grid points at mid (50°N, 120°E) and low latitude (20°N, 120°E) respectively. In all the four panels, top plot represents the actual (orange line) and forecasted (blue line) TEC and the bottom plot represents the absolute (blue lines, in TEC unit) and relative differences (orange line, in percentage) between  forecasted and actual TEC. For each of the four cases considered, RMSE is reported in the top part of the bottom plot. RMSE is defined as: where i represents each time step during the considered 24 h. As expected, the relative error is larger at low latitude than mid and high latitudes. In all examples, the RMSE is below 3.0 TECu. Both 30 May 2016 and 21 August 2016 can be considered quiet/weakly disturbed from a geomagnetic point of view (Kp 4).

Effective R12 and global forecasting
In order to extend the forecasting at a global level, NeQuick2 (Nava et al., 2006;Migoya-Orué et al., 2017) is applied by using a data ingestion process to take into account the actual ionization level of the ionosphere. NeQuick2 model is a climatological model that needs as input the geographic coordinates, month, time of the day and the solar conditions are also used in terms of solar flux (F10.7) or sunspot number (R12). The paper by Olwendo & Cesaroni (2016) describes an ingestion method capable to retrieve the so-called "effective R12" (R12eff) that is the parameter to be used to feed NeQuick2 to model as best as possible the actual TEC. In this way, NeQuick2 model is forced to reproduce the actual ionospheric conditions by considering the effective ionization level represented by R12eff. In this paper, we extended this approach by evaluating R12eff for the epoch to be forecasted to let NeQuick2 model reproducing the global TEC at that epoch. In the specific, R12eff is a global parameter to be retrieved, for each epoch to be forecasted, by minimizing the RMSE over all the 18 grid points in Table 2. In formula: where TEC i forecasted is the forecasted TEC over the ith point coming from the corresponding NARX and TEC i NeQuick ðR12Þ is the TEC over the ith point modelled by NeQuick2 model fed by R12. By adopting this approach, we are assuming that, despite the changing of the global ionization level, the NeQuick2 model is able to reproduce the global distribution of TEC under all geomagnetic condition. The choice of RMSE  as "target function" to be minimized comes from the needs to take care about high values of the difference between modelled and forecasted TEC. In particular, this target function allows to highlight possible spikes due to possible bad performance of the NARXs. Figure 6 reports an example of the RMSE dependence on R12 for the 17 April 2018 at noon. The R12eff corresponds to the minimum of the curve as indicated by the red dot in the figure. It is worth notice that R12eff is not physically meaningful as the actual R12. This is why R12eff can be a negative number. Once the R12eff is retrieved, it is used as input of NeQuick2 to create a global vTEC map with the same spatial and temporal resolution of the IGS product. As an example, Figure 7 shows the forecasted global vTEC maps for 08 September 2017 from 00:00 UT to 22:00 UT. In the figure, the value of R12eff is reported for each map.

Validation
This section details the capability of the model in giving a reliable vTEC forecasting. To validate the performance, the maps of TEC provided as daily IONEX files by UPC as part of the International GNSS Service (IGS) Final Products are used as "ionospheric reference". In the specific, the "UPC final product"  The IGS maps, as the maps by the forecasting model, have an update rate of 2 h and a binning 5°(longitude) Â 2.5°( latitude). For validation purposes, actual Kp has been used to C. Cesaroni et al.: J. Space Weather Space Clim. 2020, 10, 11 feed the NARX in order the remove the Kp forecasting error contribution to the model error budget.
The performance is evaluated for 1 year of accumulated statistics (Sect. 3.1) and during space weather events recently occurred (Sect. 3.2).
The metrics selected for the evaluation of the performance of a prediction-model are not standard. In fact, different metrics should be adopted in order to highlight different strength and weak features of the model in different operational conditions. Moreover, the metrics of a model should reflect the user needs. In Liemohn et al. (2018), the authors suggest different metrics for the fit performance and event detection performance assessment of a prediction model for geomagnetic indices. In the former case, the authors indicate Root Mean Square Error (RMSE) or Mean Absolute Error (MAE) and Mean Error (ME) as the basic metrics to be used. Thus, in order to give more weight to the data-model pairs with larger differences, we selected the RMSE and the ME (hereafter l), giving, in addition, the standard deviation (r) of the differences between the forecasted TEC and the corresponding IGS TEC map. These statistical parameters are evaluated for the whole period under investigation and for all the grid points of the map. Moreover, to give an estimation of the bias of the model with respect to time, we consider also the time profile of the Residual Global Electron Content (RGEC) index along the considered period. RGEC at a given time t* is defined as: where the sum is over all the N grid points of the maps, identified by the (i, j) coordinates. DTEC is the TEC difference between the forecasted TEC and the corresponding IGS TEC map in the grid point (i, j).
To further provide a measure of the forecasting performance, the model is also validated against independent measurements. To the scope, cross-validation with the vTEC measurements obtained from dual frequency altimeter on board of Jason 3 spacecraft is performed for the same aforementioned space weather events (Sect. 3.3).

Results on accumulated statistics
The period used for the validation against accumulated statistics spans from 1 June 2017 to 31 May 2018. As visible from bottom panel of Figure 8, in which the Kp index for the period is reported, geomagnetic conditions were from quiet to severe (Kp max = 8+, corresponding to G4 of the NOAA scale of geomagnetic storms). Top panel of Figure 8 reports DR12, being the difference between R12eff and the actual R12. From the figure, the dependence of DR12 on Kp is evident, highlighting how R12eff is a good driver for the TEC forecasting according to the development of geomagnetic storms. In particular, the peak value of DR12 is found in early September 2017, characterized by several solar flare and CME events (Linty et al., 2018;Qian et al., 2019). We remind the reader that by definition the actual R12 is independent on the geomagnetic activity and constant over each month. Figure 9 shows the results of the forecasting performance by using the metrics previously defined. In the specific, panel a shows the distribution of DTEC, Figure 9b shows the map of the mean values of DTEC, panel c reports the time profile of the RGEC parameter and Figure 9d reports the map of the standard deviation of DTEC. The gaps visible in Figure 9c are due to a lack of data in UPC dataset (from 8 to 13 October 2017, 21 October 2017 and 28 January 2018). It is worth noticing that the distribution (Fig. 9a) is well centered on zero (l = À0.1 TECU), i.e. the forecasting is accurate and that r and RMSE are 3.5 TECU, providing the precision of the forecasting. The time profile of RGEC (Fig. 9c) shows how this quantity is quite constant all over the year, giving a first proof of the ability of the model to keep constant its performance under both disturbed and quiet conditions. As expected (Figs. 9b and 9d), the mean and standard deviation of DTEC maximizes in correspondence with the expected position of the southern and northern crests of the Equatorial Ionospheric Anomaly (EIA), being a region very

Results on case events
To better highlight the behavior of the TEC forecasting under different level of geomagnetic disturbance, five case events of recent geomagnetic storms (intensity from G1 to G4, according to the NOAA scale) plus one day under quiet conditions have been identified. Table 3 summarizes the dates of the selected events (column 1), the maximum Kp (column 2) and the minimum Dst in nT (column 3) reached in the period and a specification of the class of the storm and the solar event generating the disturbance (CH = Coronal Hole; CME = Coronal Mass Ejection; column 4). We consider as "storm event" the days between the Sudden Storm Commencement (SSC) or sudden variation of Dst due to arrival of the disturbance at the magnetopause until the recovery to the pre-disturbance conditions.
As an example of the validation during case events, Figure 10 reports the same quantities of Figure 9 considering the G4 storm occurred between 07 and 11 September 2017 (Linty et al., 2018;Qian et al., 2019). The only difference between the quantities reported in Figures 9 and 10 is that Figure 10c reports the RGEC in red and the Dst in blue. This is to better relate the performance of the model during the different phases of the storm. As in the case of the accumulated statistics validation, the forecasting results to be accurate showing a mean DTEC of 0.4 TECU and a standard deviation and RMSE equal to 3.8 TECU. It is worth noticing that the RGEC time profile, showing the time dependence of the global forecasting performance, does not significantly vary according to the different phases of the storm development. In fact, RGEC can be reasonably assumed as constant all over the storm and no clear indications of changes in the model performance are noticeable. This feature has also been found for all the considered storm cases (not shown). We remind the reader that the September 2017 storm is the worst case considered for the model validation according to the class of the storms (G4 storm). This is the most striking feature of the proposed model as it proves to be resilient in providing reliable forecasting one day in advance under harsh geomagnetic conditions. Table 4 summarizes the performance of the forecasting model for the selected case events. In the specific, the  To assess the actual contribution of the model, the validation has been also performed against the "frozen ionosphere" assumption in the case of the G4 storm occurred in September 2017. Such validation procedure is also called "recurrence test". In this assumption, the forecasted TEC is assumed to be equal to the actual TEC measured one day before at the same hour. In other words, it addresses this question: "what are precision and accuracy of the forecasting if the vTEC tomorrow is predicted to be exactly the same of today?". To give a real contribution, a model, should provide more precise and accurate predictions with respect to the "frozen ionosphere" assumption. Similarly to Figure 10, Figure 11 reports the results of the validation for the period 7-11 September 2017 under "frozen ionosphere" assumption. According to the histograms in Figures 10a and  11a, the "frozen ionosphere" assumption results into both a slightly larger mean (0.4 TECu w.r.t. 0.5 TECu), standard deviation (3.8 TECu w.r.t. 3.9 TECu) and RMSE (3.8 TECu w.r.t. 3.9 TECu) than the corresponding metrics evaluated for the model prediction. According to the maps in Figures 10b and 11b, the forecasting model is found to provide a larger overestimation at the crests of the EIA and a larger underestimation at the Equatorial Ionospheric Trough (EIT) with respect to the "frozen ionosphere" conditions. However, the standard deviation is significantly smaller for the model than for the "frozen ionosphere". Moreover, the RGEC time profile of the latter (Fig. 11c) shows a clear dependence on the Dst index, i.e. the  RGEC increases in both absolute mean value and standard deviation during the storm. This behavior is the direct consequence of the different ionization during positive and negative phases of an ionospheric storm that the frozen ionosphere fails to follow, while the forecasting model is able to predict. To better specify, at the SSC, RGEC behavior is similar for the model and frozen ionosphere. During the main phase, the modelled RGEC model (<2 TECu) is slightly smaller than the frozen ionosphere RGEC frozen (<3 TECu). Model outperform the frozen ionosphere assumption especially during substorm (mid of 08 September) and recovery phase of the storm. In fact, during the recovery phase |RGEC model | < 2 TECu and r (RGEC model ) < 5 TECu, while |RGEC frozen | < 6 TECu and r (RGEC frozen ) < 6 TECu.
To further remark the model performance against frozen ionosphere assumption, Figure 12 shows the correlation between forecasted and actual TEC for the model (green) and for the frozen ionosphere assumption (blue). The Pearson's coefficient R is found to be 0.92 and 0.91, respectively. Such difference reflects the larger spread of the scatter plot for the frozen ionosphere with respect to the model.

Results against JASON-3 measurements
To validate the model against independent measurements, an external source of global vTEC observations has been used. To the scope, vTEC measured by the dual-frequency altimeter instrument on board the JASON-3 spacecraft is considered. JASON-3 is the latest of a series of satellites missions -JASON and JASON-2 its predecessorsdevoted to track the sea-level rise and fall, orbiting at a mean height of about 1330 km. The vTEC measurements are derived by altimetry data over the oceans between latitudes of 66°N and 66°S (this restriction is due to the inclination of the JASON mission orbit), where no permanent GNSS receivers can be placed. Among other sensors, it has a dual transmitter-receiver in C-band (5.5 GHz) and , that provide TEC with accuracies, including systematic biases, of about 2-3 TECu (Ho et al., 1995). In this way, JASON-3 provides independent reference data that Fig. 11. Same as Figure 9 considering the "frozen ionosphere" assumption. Fig. 12. Correlation between forecasted and actual TEC for the model (green) and for the frozen ionosphere assumption (blue). Solid lines indicate the corresponding linear fit, whose R is reported on the top left.
C. Cesaroni et al.: J. Space Weather Space Clim. 2020, 10, 11 can be used to evaluate the performance of GNSS-derived vTEC maps over the oceans except in the high latitude regions.
In this respect, two considerations have to be accounted for. First, JASON vTEC measurements are very accurate but they are affected by a certain offset with respect to the IGS products (Azpilicueta & Brunini, 2009). Second, this is a pessimistic scenario because JASON direct measurements are compared with values at the same location that had to be previously interpolated because of insufficient collection of real data over the oceans (lack of permanent GNSS stations). But in any case, testing against the vTEC records provided by the JASON-3 altimeter instrument should be performed in order to verify that the predicted products are consistent with an external vTEC data source, even in these worst scenarios.
Similarly to Hernández-Pajares et al. (2017), a sliding window has been used to reduce the noise of the altimeter measurements. An example of the final results of such smoothing process is shown in Figure 13 where JASON-3 measurements are smoothed by using a sliding window of 15 s (red). For a direct comparison with JASON-3 measurements, forecasted maps are linearly interpolated to retrieve vTEC values corresponding to the JASON-3 tracks point.

Results on a quiet day: 8 February 2018
This section is intended to provide the baseline for the results to be expected of the inter-comparison of the model against JASON-3 when no particular geomagnetic activity is ongoing. For this purposes, 8 February 2018 is selected as a quiet day (Table 3). In addition, results are also shown for the consolidated UPC GIMs UQRG (Hernández-Pajares et al., 2016) released as rapid UPC GIM with a latency of one day. This particular UPC GIM presents the best performance when compared against co-located altimetry data, with RMSE values of 2 TECu (Hernández-Pajares et al., 2016).
A summary of the performance against JASON-3 under quiet conditions is reported in Table 5. The metrics show a very small bias for the model whenever compared against JASON-3 vTEC. Sigma and RMSE are within very reasonable limits although they almost double those for UQRG GIMs. Let us remember that the UQRG are the best performing GIM according to Hernández-Pajares et al. (2016) and Roma-Dollase et al. (2018).
When considering the results sorted by UT considering time spans of 2 h (Fig. 14), one can notice that the highest sigma and RMSE values for the model are mostly found between [0,4] and [20,24] h. Something similar can also be found for UQRG GIMs but the order of magnitude of such parameters is lower (about 0.5 times lower). In any case, the sigma and RMSE values for the model are quite compatible with those from UQRG for the central hours of the day. Regarding the mean value of the vTEC difference, UQRG presents a consistent negative value with respect JASON-3 whereas the model shows smaller values mostly positive.
When considering the inter-comparisons under a latitudinal distribution (bands of 20 degrees from Equator up and downwards, Fig. 15), it is possible to notice that the performance in terms of sigma and RMSE worsens between [À20, 20] degrees, while compatible results against UQRG GIMs are obtained from [20,60] to [À60, À20]. Regarding the mean, it is bouncing from positive to negative for the model but with values significantly smaller than those for UQRG GIMs, which are negative.
Considering the information displayed in the previous figures, it can be confirmed that the UQRG product is a very stable product comparing to JASON-3, in terms of RMSE and standard deviation, therefore becoming a reliable vTEC source as indicated by literature search. The very same can be said about the model: considering that it is a forecast product, it keeps the error at low level. This conclusion will be confirmed in the next section.  The assessment corresponding to the selected dates for the validation included in Table 3 is presented in this section.
As in previous subsection, the reference (i.e. benchmark) is provided by JASON-3 measurements.
We notice again the reader that the comparison with JASON-3 vTEC measurement is slightly affected by the missing topside-plasmaspheric component in the altimeter  C. Cesaroni et al.: J. Space Weather Space Clim. 2020, 10, 11 measurements with respect to the GNSS vTEC. However, this bias is of little entity (few TECu). Nevertheless, the current results are compatible with the study performed in Hernández-Pajares et al. (2016) where seven reliable sources of vTEC maps were compared against altimeter measurements. In that case, more than 190 million of inter-comparisons were made during 15 years, showing that most part of the biases were from 1 up to few TECu (in absolute value). As the results show, the model predictions are compatible with such results in Hernández-Pajares et al. (2016). Considering the RMSE and sigma TEC, the latitudinal distribution shows that the largest values (i.e. the worst results) are between [À20°, 20°] (Fig. 16) as expected due to the quite unpredictable behavior of the crests of the EIA during disturbed period. The hourly distribution of the TEC differences depicted in Figure 17 shows a quite independent behavior of the mean TEC, sigma TEC and RMSE with respect to the hour of the day even if a slightly positive bias (between 1 and 2 TECu) is highlighted. These results reported in Figures 14 and 15 are totally compatible with Hernández-Pajares et al. (2016) where the sigma was typically ranging from 3 to 10 TECu. It is remarkable for a forecast product to be able to meet such results, which were obtained against post processed GIMs. In addition, Ren et al. (2019) shows a performance evaluation of the real-time GIMs provided by different centers by comparing GIM vTEC against JASON 3 vTEC. The authors report bias in the range [3.88, 5.04] TECu, standard deviation in the range [2.56, 6.02] TECu and RMS in the range [4.95, 5.71] TECu. The positive bias confirms that the topside-plasmaspheric contribution is of the order of few TECu while the standard deviation and RMS are in accordance with our results confirming that our forecasted TEC maps fully agree with the TEC maps provided by IGS.

Discussion and conclusions
The model here presented is based on the use of a nonlinear autoregressive neural network with external input and it is able to predict TEC at a global scale 24 h in advance. It leverages on a two-steps process, in which the first one is based on the vTEC prediction over 18 selected points over the globe (Fig. 2 and Table 2). This prediction in based on the NARX, trained with data covering 1 solar cycle and uses as external input the Kp index. Since the model has been conceived to operate in real time, Kp has been selected as it is, currently, the only geomagnetic index available in nowcasting and forecasting (up to 3 days in advance). The second step is based on the propagation of the forecasting from local to global level, leveraging on the joint use of the NeQuick2, properly fed by an effective R12, whose determination is obtained through a minimization process, being part of this second step. The model has been thoroughly validated to provide measures of the precision and accuracy of the prediction. The main achievements are the following: The validation with 1-year accumulated statistics against UPC GIM final product shows that a mean difference of À0.1 TECu and a RMSE of 3.5 TECu, providing the measure of the accuracy and precision of the model, respectively. The validation during five selected geomagnetic storm events (from G1 to G4) against UPC GIM final product confirms the capability of the model to describe the TEC variations following the different phases of the storms. The mean difference is in the range [À0.9, 0.4] TECu, the standard deviation and the RMSE are below 5.1 TECu. The performance of the model seems to be quite independent on the storm intensity. The worsening of the model performance is found in correspondence with the expected position of the crests of the EIA and of EIT. The validation against independent vTEC measurements from JASON-3 altimeter confirms the quality of the prediction even over the oceans, i.e. regions over which ground-based measurements are not available. In fact, the results, showing a sigma in the range from 1 to 8 C. Cesaroni et al.: J. Space Weather Space Clim. 2020, 10, 11 TECu are compatible with the performance of the UPC rapid GIM, i.e. a post-processed product (Hernández-Pajares et al., 2016) and real time GIM as reported in Ren et al. (2019).
Despite the model capability of reproducing reliable ionospheric conditions even during storm conditions, the assumptions made in designing the algorithm reflect in some limitations: Assuming that NeQuick2 model is capable to describe the overall distribution of TEC, even when actual conditions are far from a climatological behavior, allows using a single parameter (R12eff) to globally drive NeQuick2 model. This is obviously a simplification that may results into a lack of capability to reproduce the local ionospheric features, e.g. the longitudinal dependence of the effects of a storm on the crests of the EIA. The use of 1 solar cycle of data is assumed to be enough to describe all the possible geomagnetic/ionospheric conditions. Actually, as reported in Figure 3, the data available during Severe/Extreme (G4/G5) geomagnetic conditions are limited (<1%). In addition, the implicit data leakage arising when time series datasets are randomly divided for training, testing and validation of the NARX, may limit the performance of the single point TEC forecasting. This can be thought a serious limitation, but the validation provided in this paper shows that the model is capable to reproduce the overall ionospheric conditions even under geomagnetic storm. The 18 grid points in step 1 has been selected to cover different geographic sectors and local times. Nevertheless, the model could benefit for a larger number of points and/or different reference frame in which the points are selected (e.g. geomagnetic coordinates). The use of Kp as the external input to NARX is not the best choice at least for low and high/polar latitudes. The use of alternative indices/parameters will be tested as soon as they will be available as forecasted values in real-time.
Making use of the forecasted Kp provided by NOAA, the model is currently implemented as a real-time service in the Ionosphere Prediction Service (IPS) prototype (Albanese et al., 2017;Rodriguez et al., 2018;Veettil et al., 2019), available at http://ips.gsc-europa.eu. In addition, products of the realtime operational model implemented at Istituto Nazionale di Geofisica e Vulcanologia are provided to the Pan-European Consortium for Aviation Space Weather User Services (PECASUS, http://pecasus.eu/), as one of the key products in the GNSS expert group.