University of Birmingham Using the local ensemble transform Kalman filter for upper atmospheric modelling

– The Advanced Ensemble electron density (Ne) Assimilation System (AENeAS) is a new data assimilation model of the ionosphere/thermosphere. The background model is provided by the Thermosphere Ionosphere Electrodynamics General Circulation Model (TIE-GCM) and the assimilation uses the local ensemble transform Kalman ﬁ lter (LETKF). An outline derivation of the LETKF is provided and the equations are presented in a form analogous to the classic Kalman ﬁ lter. An enhancement to the ef ﬁ cient LETKF implementation to reduce computational cost is also described. In a 3 day test in June 2017, AENeAS exhibits a total electron content (TEC) RMS error of 2.1 TECU compared with 5.5 TECU for NeQuick and 6.8 for TIE-GCM (with an NeQuick topside).


Background
Comprehensive, global and timely specifications of the Earth's upper atmosphere are required to ensure the effective operation, planning and management of both radio frequency (RF) systems and satellites. The atmospheric layers most effected by space weather are the ionosphere and thermosphere. The ionosphere/thermosphere is a coupled system which spans an altitude of approximately 90-2000 km above the Earth's surface. In this region, radiation from the Sun causes photoionization of neutral molecules (thermosphere) creating a plasma (ions and free electrons, ionosphere). The thermosphere causes drag on satellites and is the dominant error in propagating low Earth orbits (below approximately 600 km). Furthermore, the ionosphere affects all RF systems that operate via or through the ionosphere at frequencies below approximately 2 GHz. Thus, systems such as global navigation satellite systems (GNSS), high frequency (HF) communications, space-based Earth observation radars and space situational awareness radars can be impaired.

Modelling
Approaches for modelling the Earth's upper atmosphere (ionosphere/thermosphere) can be broadly split into three categories: empirical modelling, physics-based modelling and data assimilation techniques. Empirical models are based upon observations and the relationships between model variables are usually found via curve fitting techniques (Llewellyn & Bent, 1973;Bilitza et al., 1988). Physics-based models solve the underlying equations of the physical processes in the atmosphere. Often these are in the form of coupled ionosphericthermospheric physics models. Finally, data assimilation models combine measurements of the atmosphere with a background model.
Data assimilation (DA) is a broad term covering a number of techniques. These include: weighted least squares (Plackett, 1950), Kalman filters (Kalman, 1960), optimal interpolation (OI) (Gandin, 1963;Eddy, 1967) and variational methods (Le Dimet & Talagrand, 1986). The background model for the DA approach can be either an empirical or physics-based model. Previous model comparison work has shown that, in terms of specifying the current ionosphere (nowcasting), DA models do best, followed by empirical and then physics-based models (Shim et al., 2011(Shim et al., , 2012McNamara et al., 2013;Elvidge et al., 2014). In particular, McNamara et al. (2013) showed that two DA models (both with an empirical background model (the Electron Density Assimilative Model, EDAM; Angling & Jackson-Booth, 2011) and the GPS Ionospheric Inversion model, GPSII;Fridman et al., 2009) perform close to the minimum achievable errors for nowcasting parameters related to the peak ionospheric density.
Since both EDAM and GPSII rely on an empirical background model (the International Reference Ionosphere 2007 (IRI2007) (Bilitza & Reinisch, 2008), in the absence of data they revert back to this model. This is a weakness suffered by all empirically-based data assimilation models and means that they are not well suited to forecasting. This weakness can be overcome by using physics-based DA models, which, when forecasting, can propagate the ionosphere/thermosphere state forward in time.
An example of a physics-based DA model is the Utah State University Global Assimilation of Ionospheric Measurements (USU-GAIM) models. These models encompass a range of different modelling approaches (Schunk et al., 2016). In particular the GAIM Gauss-Markov (GAIM-GM) (Scherliess et al., 2006) and GAIM Full Physics (GAIM-FP) (Scherliess et al., 2009) models both use physics-based background models, and whilst GAIM-GM uses a Gauss Markov Kalman filter for assimilation, GAIM-FP relies upon a ensemble Kalman filter. The background model for GAIM-FP includes six ion species (NO + , O þ 2 , N þ 2 , O + , He + , H + ), as well as ion and electron temperatures and plasma drifts (Schunk et al., 2005). However in this model the thermosphere is modelled empirically and this limits the model's ability to model satellite drag.
Coupled models such as the Thermosphere Ionosphere Electrodynamics General Circulation Model (TIE-GCM) (Qian et al., 2014) have also been used in DA schemes. TIE-GCM was developed at the National Center for Atmospheric Research (NCAR) and solves for neutral and ion species in the upper atmosphere. Using TIE-GCM in an ensemble Kalman filter modelling scheme has been shown to increase skill in specifying ionospheric parameters Chartier et al., 2013Chartier et al., , 2016. This paper introduces the Advanced Ensemble electron density (Ne) Assimilation System (AENeAS). AENeAS uses TIE-GCM as its background model and uses a variant of the ensemble Kalman filter (the local ensemble transform Kalman filter, LETKF) to assimilate data. As well as describing the theory behind the model, initial benchmark results are also described.

Introduction
Kalman filters (Kalman, 1960) have found widespread use in applications ranging from signal processing to spacecraft navigation and control. They have also been used in a variety of assimilative ionospheric models. For a full derivation of the classic Kalman filter equations see Hamilton (1994).
Given a series of states at times t, x t , the Kalman filter gives the optimal (i.e. the minimum mean square error) estimate of the combination (analysis, x t a ) of a background model (x t b ) and observations y t o . This optimality is dependent on the process being linear and error statistics being Gaussian. The filter works in two repeating steps: 1. Update; 2. Prediction.

Update step
The filter starts with a background state vector and its associated error covariance matrix (x t b and B t ). The analysis (i.e. the updated state vector, x t a ) and its associated covariance matrix (A t ) are given by where K is commonly called the Kalman gain matrix, B is the background error covariance matrix, O is the observation error covariance matrix, H is the observation operator, x b and x a are the background and analysis vectors respectively, y o is the vector of observations and I is the identity matrix.

Prediction step
Using the values from the current update step, the prediction step is used to find the background values x tþ1 b and B tþ1 for use in the next update step: where M is the forecast model which moves the state from one time step to the next and Q t is the associated error covariance matrix. This process is repeated t times. The classic Kalman filter (Kalman, 1960) specifically defines a covariance matrix for the system (Eq. (1)), and consequently the computational cost of the filter scales with the square of the dimension of the model. For a fully coupled ionosphere/thermosphere model the dimension is approximately 10 7 . This is too large for most computers to be able to manipulate in a reasonable amount of time. The ensemble Kalman Filter (EnKF) (Evensen, 2009) is a Monte Carlo approximation to the Kalman filter for high dimensional or non-linear models. Since its introduction it has gained wide spread use throughout a variety of data assimilation applications such as weather forecasting (Etherton, 2007), fireline modelling (Mandel et al., 2009), power system state tracking (Li et al., 2012) and soil moisture estimation (Reichle et al., 2002). The EnKF replaces the covariance matrix (Eq. (1)) by a sample covariance found from a set of state vectors (called the ensemble) (Evensen, 1994). The ensemble members are independent until data is assimilated, at which point all of the members are modified. This approach is useful for both large dimensional systems (where estimating the covariance matrix is necessary to reduce the computational cost) and also in complex physical systems where the exact covariances between model variables are unknown.
There is no unique, optimal, method for implementing the EnKF equations. Previous EnKF work using TIE-GCM as a background model (e.g. Lee et al., 2012;Chartier et al., 2013Chartier et al., , 2016, has used the NCAR's Data Assimilation Research Testbed (DART) (Anderson, 2009). DART provides a set of tools for applying DA techniques to existing models across a range of geophysics disciplines. As well as the EnKF, DART includes the Ensemble Adjustment Kalman Filter (EAKF) (Anderson, 2001) and the Rank Histogram filter (Anderson, 2010;Metref et al., 2014).

Local ensemble transform Kalman filter
In this work, the local ensemble transform Kalman filter (LETKF) is used as the DA scheme. An outline of the LETKF equations are presented here; for a complete derivation see Hunt et al. (2007).
As with the classic Kalman filter, background values and an associated error covariance matrix are required to start the process (we drop the time subscript t from the variables for ease of reading). The single background state vector used in the classic Kalman filter is replaced by a set of k background ensemble members {x b(i) : i = 1, 2, . . ., k} where each x b(i) contains m variables. It is assumed that the best estimate of the true state (before observations are taken into account) is the background mean: The associated error covariance matrix B can then be found in the following way. Define a matrix X b to be an (m Â k) matrix of background perturbations; that is, a matrix where the ith column is defined by x bðiÞ À x b (i.e. each column of X b is an ensemble member with the ensemble mean removed). The background error covariance matrix is then estimated by This has at most rank k À 1, and only accounts for uncertainty in k À 1 directions of the model space. This can cause issues if k is small; however this can be somewhat overcome by localizing the model space (described later in this section). After an assimilation time step an analysis ensemble ({x b(i) : i = 1, 2, . . ., k}) is returned. The analysis mean and error covariance matrix are: where X a is the matrix of analysis perturbations (defined equivalently as X a ). It is required that x a and A should estimate the best analysis of the system and its associated uncertainty. Therefore to construct the analysis (post assimilation) ensemble x a(i) the matrix X a must satisfy Equation (9). Let y o 2 R s be a vector of s observations, O 2 R sÂs be the observation error covariance matrix and H : R m ! R s be an observation operator that maps from the model space to the observation space. The Kalman gain for the LETKF is found by substituting the definition of the background error covariance (Eq. (7)) into the classic Kalman gain matrix definition (Eq. (3)): The matrix HX b is the background perturbation matrix mapped into observation space. For simplicity this is rewritten as Y b , defined similarly to X b , i.e., the ith column is Hðx bðiÞ Þ À y b . Using this, the identity P T ðPP T þ QÞ À1 ¼ ðI þ P T QPÞ À1 Q T P À1 , and setting P ¼ ðk À 1Þ 1 2 Y b and Q ¼ O, Equation (10) can be rewritten as: To find the analysis covariance matrix the LETKF Kalman gain and background error covariance matrix is substituted into Equation (1): where again Y b = HX b . This can be simplified using the identity I À (I + P) the analysis error covariance matrix. For ease of writing, defineÃ so Equation (13) becomes: UsingÃ in the Kalman gain definition for the LETKF (Eq. (11)) gives: and using Equations (15) and (16) one can define the ensemble analysis mean (Eq. (2)) as: Let w a ¼ÃðY b Þ T O À1 ðy o À y b Þ, which defines the analysis increment in observation space, and thus: Letting or equivalently W a W T a ¼ ðk À 1ÞÃ, then using this in Equation (15) gives the required definition of the analysis covariance, Equation (9). As such the the analysis ensemble can then be constructed by: For x a to be the mean of the analysis ensemble, the sum across the columns of X a must be zero, i.e. X a l T ¼ 0 (where l ¼ ð1; 1; . . . ; 1Þ, a vector of 1's). By the definition of X b , X b l T ¼ 0. Therefore for X a l T ¼ 0 it is required that l is an eigenvector of W a . Wang et al. (2004a) showed that this is true if the symmetric solution of the square root is chosen.
The complete set of LETKF equations (in form analogous to the upate step of the classic Kalman filter) are: It can be seen that all the matrix operations take place in ensemble space (i.e. on matrices defined by the ensemble) with the largest matrix dimension being s (the number of observations). For the AENeAS implementation of the LETKF it is assumed that the observations are independent and thus the observation error covariance matrix, O, is diagonal.
The prediction step of the classic Kalman filter (Eqs. (4) and (5)) is not needed. The forecast model (which moves the state from one time step to the next) is replaced by a (not necessarily linear) model which moves each ensemble member forward in time.
The final component of the LETKF is localisation. The localisation used by the LETKF is similar to that described by Ott et al. (2004), where for each model grid point a subset of the global matrices are used. First the background perturbation matrices in both model and observation space are calculated globally. Then for each grid point the observation vector and associated background perturbation matrix mapped into observation space (Y b ) are restricted to only include observations from a region around that point (Fig. 1). Similarly the perturbation matrix and associated mean vector are restricted to only include model variables from the grid point. The LETKF equations (Eqs. (21)-(26) from Sect. 2.2) are then applied to each region. The output of the equations for each region only updates the centre grid point.
At each grid point an optimum combination of ensemble members are calculated (compared to the data). These represent local multidimensional unstable manifolds (Szunyogh, 2014). The global analysis is constructed by smoothly joining each of these together, i.e. forming a global unstable manifold. Szunyogh (2014) describes how this combination can be made using a model propagator; however in the LETKF this combination relies upon the choice of the matrix square root in the construction of the analysis (Wang et al., 2004b). The choice of the symmetric square root transformation forces consistency between adjacent local analyses since the transformation is a continuous function of the analysis error covariance matrix (Hunt et al., 2007). This consistency does not occur if the transformation suggested in Bishop et al. (2001) is used. As long as subsets of observations used for neighbouring grid points overlap heavily, the local weight vectors for the grid points are very similar. When weight vectors do not change much, the analysis ensemble members are roughly linear combinations of the ensemble members, and so the changes between neighbouring grid points remain reasonably physical. The effectiveness of this smoothing as compared to the global approach (i.e. LETKF compared to ETKF) is highly dependent upon the choice of region size. The regions around each model grid point should be chosen such that their intersection is sufficiently large (in comparison to the number of observations in each region).
One issue with all EnKFs is undersampling, whereby the ensemble is too small to statistically represent the error covariance matrix. Consequences of undersampling are spurious covariances and filter divergence (Anderson, 2001). Spurious covariances between variables are ones where the variables are not physically related. These may be in parts of the state vector that are physically a large distance apart. Filter divergence is the term used to describe when the assimilation stops improving the analysis state (Anderson, 2001). This can occur if the model variances become too large or too small in relation to the observation variance.
The localisation component of the the LETKF helps to address filter divergence by improving the specification of the covariance matrix. One of the underlying assumptions of ensemble modelling is that the set of ensemble members span the space of the model variables. In reality the whole space is often not fully spanned, but it is necessary that it is a reasonable approximation. By reducing the ensemble analysis to local regions, the ensemble members only need to span the region within those regions. The dimensionality of the reduced region is significantly less than the whole space and the approximation that the ensemble spans the space is a better one. Hunt et al. (2007) used a Lorenz-96 model to investigate the effectiveness of this localisation system by comparing the LETKF and ETKF and varying the ensemble sizes for models of dimension 40 and 80. The results showed that the global ETKF required an ensemble size of half the model dimension to be effective. In contrast doubling the dimension of the model had little effect on the performance of the LETKF, and a good performance was obtained for ensemble sizes greater than or equal to 10.

Introduction
The Advanced Ensemble electron density (Ne) Assimilation System (AENeAS) is a new upper atmosphere DA model that combines a physics background model and the LETKF. Currently, the physics model is provided by TIE-GCM (described in Sect. 3.2). However AENeAS is modular so that in future other physics models can be used instead of, or as well as, TIE-GCM.
The primary motivation for developing AENeAS has been for ionospheric/thermospheric forecasting. Ionospheric/ thermospheric physics models rely upon knowledge of electron, ion and neutral densities as well as temperatures and wind speeds. However the most abundant data source in the upper atmosphere is an integrated measurement of electron density called the total electron content (TEC) (described in Sect. 4.3). The LETKF is used so that the unknown covariances can be estimated from the TIE-GCM ensemble. The optimal number of ensemble members (balancing performance and computational cost) is still to be determined. However a 15 min AENeAS timestep can be run with 100 ensemble members in real-time on a 32-core computer. Benchmarking test results have shown that using 32 ensemble members provides good agreement with independent observations.

TIE-GCM
TIE-GCM is a global three-dimensional model of the coupled thermosphere ionosphere system (Richmond et al., 1992). At each time step the continuity, energy, and momentum equations are solved for neutral and ion species using a fourth-order, centred finite difference scheme (Roble et al., 1988). The model is driven by the daily F10.7 (the solar flux at a wavelength of 10.7 cm at the Earth's orbit which is used as a proxy for solar output) (Tapping, 2013), the 81-day F10.7 average and the Kp (which indicates the severity of the magnetic disturbances in near-Earth space) (Bartels, 1957). Instead of the Kp the model can use the Earth's magnetic field cross-tail potential (CTP) and the hemispheric power (HP) which are measures of auroral activity. These values are related to Kp via (Neale, 2010 The lower boundary condition (atmospheric tides) is given by the Global Scale Wave Model (GSWM) (Hagan et al., 1999).

NeQuick
TIE-GCM only extends to 600 km in altitude (depending on solar conditions). Whereas in reality the ionosphere/ plasmasphere extends much further than this and for accurate specification of the domain higher altitudes need to be considered. To achieve this in AENeAS, a higher reaching model of the ionosphere is attached to the topside of the TIE-GCM profiles which extends the electron density grids to 25,000 km. The other species considered by the model also have their grids extended to the same height (to ensure the altitude grid is the same across all species); however all values above the TIE-GCM maximum altitude are set to zero. The current ionospheric topside model used by AENeAS is NeQuick (Nava et al., 2008).
NeQuick is an monthly median ionospheric electron density model developed at the Aeronomy and Radiopropagation Laboratory (now Telecommunications/ICT for Development Laboratory) of the Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy, and at the Institute for Geophysics, Astrophysics and Meteorology (IGAM) of the University of Graz, Austria (Nava et al., 2008). The model is based on the Di Giovanni-Radicella (DGR) model (Giovanni & Radicella, 1990). It has been designed to have continuously integrable vertical profiles which allows for rapid calculation of the total electron content for trans-ionospheric propagation applications. For use in AENeAS, the NeQuick bottomside profile is driven by parameters (height and peak density of the E, F1 and F2 ionospheric layers) taken from TIE-GCM. At altitudes above the TIE-GCM maximum the electron densities from NeQuick are used. By driving NeQuick with parameters derived from TIE-GCM ensures that the bottom and top-side profiles are smoothly connected.
As well as time and location, the model is also driven by F10.7 (like TIE-GCM). The topside of NeQuick, which is ued in AENeAS, is a simplified approximation to a diffusive equilibrium. A semi-Epstein layer represents the model topside with a height-dependent thickness parameter that has been empirically determined (Nava et al., 2008). For AENeAS, the University of Birmingham (UoB) Python version of NeQuick has been used (Angling et al., 2018).

Ensemble generation
To use the background model in the LETKF an ensemble of the model must be generated. The spread of the ensemble members should be sufficient to estimate the background covariance matrix (Sect. 3.1). For AENeAS, the F10.7, cross-tail potential (CTP) and hemispheric power (HP) are perturbed to generate a set of different background models. The CTP and HP values are chosen to be Normally distributed, where the mean of the distribution is the observed Kp converted to HP (Eq. (27)) and CTP (Eq. (28)) and the standard deviation is set at 40% of the observed value. The F10.7 values are generated using a random walk. At each time step the walk value has a uniformly distributed random number in the range [À3, 3] added to it but restricted to the range [À100, 100]. By construction the mean of the F10.7 values (across the ensemble members) is close to the observed F10.7, ensuring the long term trends of F10.7 are captured. F10.7 values are updated every 3 h (which is in line with the cadence of the Kp data) and ensures the ensemble members have time to respond to changes in conditions. The internal model time step is set to 60 s and each assimilation window is set at 15 min.

Localisation
In AENeAS localisation regions of ±10°latitude and ±20°l ongitude are used. These values have been chosen based on the estimated correlation lengths in the ionosphere from McNamara (2009) ( Table 1).

Covariance inflation
Any EnKF depends on the accuracy of the estimated background and observation error covariances and errors can effect the optimality of the data assimilation scheme (Liang et al., 2012). Unfortunately, using the ensemble to estimate the covariance matrices can result in sampling errors as the number of ensemble members is limited. Previous work has shown that small ensembles lead to underestimation of the covariances which eventually result in filter divergence (Anderson & Anderson, 1999;Constantinescu et al., 2007). Specifically the ensemble members become too similar. To mitigate this covariance inflation has been used. First proposed by Anderson and Anderson (1999) covariance inflation artificially increases the uncertainty in the background model. The required inflation value to use depends on the specific case (both domain and ensemble size) Hamill et al. (2005). Whilst there are various strategies for implementing covariance inflation (e.g. Hamill & Whitaker, 2005;Anderson, 2007Anderson, , 2009Whitaker et al., 2008;Li et al., 2009;Miyoshi, 2011;Kang et al., 2012), here multiplicative inflation Anderson and Anderson (1999) has been used.
Multiplicative inflation is more suited to the ionospherethermosphere problem than other inflation schemes due to the dynamic range of values involved. Currently the Anderson and Anderson (1999) fixed multiplicative inflation is implemented in AENeAS. This involves multiplying the background error covariance matrix, B, by an inflation factor q > 1. Since B ¼ ðk À 1Þ À1 X b X T b this is equivalent to multiplying X b by ffiffiffi q p . This can be achieved efficiently by replacing (15) (Hunt et al., 2007).

Sporadic Forcing
It is well known that physics-based general circulation models (GCMs) suffer in data assimilation schemes when sporadically forced by data (Baker et al., 1987;Lorenc et al., 1991;Macpherson, 1991;Bloom et al., 1996;Fox-Rabinovitz, 1996). The sporadic nature of the assimilated data (be it temporally or spatially) can upset the intrinsic balance of the equations (Ham et al., 2016) leading to discontinuities and high-frequency oscillations (Baker et al., 1987).
Two main approaches have been developed to deal with this problem. The first relies on the repeated insertion (RI) of data (also known as nudging). This involves using each observation multiple times in an observation window (which in the lower atmosphere community can be as long as 6 h). The observations are weighted throughout the window so that the largest weights are applied at the observation time, with decreasing weights before and after (Lorenc et al., 1991;Macpherson, 1991). This has the impact of slowly moving the background model state towards the analysis state over time. Secondly, incremental analysis updating (IAU) has been used (Baker et al., 1987). IAU distributes one analysis increment throughout the assimilation window rather than repeatedly assimilating differently weighted observations. This acts as a low pass filter to reduce high frequency oscillations (Ham et al., 2016). The key difference between IAU and RI is that IAU only modifies the analysis increment and not the background state, as RI does (Polavarapu et al., 2004).
Whilst both approaches have been used extensively in the lower atmosphere they have been developed in schemes which use long (approximately 6 h) assimilation windows. AENeAS relies on a much shorter assimilation window of 15 min, so neither RI or IAU are directly applicable. Therefore a different approach been developed; at each grid point the amount which any variable can change has been limited. For example for a single species, after calculating X a in the LETKF (X a ¼ X b ððk À 1ÞÃÞ 1 2 ; Eq. (25)) the actual value of X a (at each grid point) used to update the model is determined by: where a is a predefined constant set to the largest allowable change (i.e. a = 1.1 limits the amount of changes from À9.1% to +10%). This causes the background variables to move "in the direction" of the analysis, but slowly so as to ensure the background model equations remain stable. Initially, many grid points changes are limited by this procedure (since the difference between the background and analysis is the largest; Lee et al., 2006), however this reduces over time as the background model is moved nearer the observations.

Implementation
Hunt et al. (2007) described a computationally efficient implementation of the LETKF in seven steps. Steps 4 and 5 involve calculatingÃ (Eq. (14)) and W a (Eq. (19)). The computational cost of both calculations is proportional to the cube of the ensemble size. However the number of calculations can be reduced by first calculating which is the pre-inverse ofÃ defined in Equation (14). This can then be factorized tõ where Q is the matrix of eigenvalues ofÃ Ã and K is a diagonal matrix whose elements are the eigenvalues ofÃ Ã . The number of operations to do this is, at most, the order of the cube of the ensemble size (since it is known thatÃ Ã is symmetric). Since K is diagonal it is easy to calculate both K À1 and K À 1 2 (with the number of operations linearly proportional to size of the ensemble) and sõ This approach also allows for a simple check thatÃ is positive definite (as required by the LETKF) since all the eigenvalues will be > 0.

Initial results -AENeAS benchmarking 4.1 Overview
During the AENeAS development cycle a short benchmark test has been developed for investigating changes in performance as improvements are made. A more detailed statistical study to quantify the performance of the model is necessary in the future. This initial test compares AENeAS with its background model, TIE-GCM+ (TIE-GCM with an NeQuick topside), NeQuick and the IGS global TEC map.

Test scenario
The time period for the test scenario is between June 4th and June 7th, 2017. Figure 2 shows the solar and geomagnetic conditions during the test period. It is a solar and geomagnetically quiet time with the F10.7 only varying between 75 and 79 and the Kp between 0 and 2 (well below the storm threshold of Kp 4). This is beneficial for the benchmarking test as the models do not have to respond to any rapidly varying geophysical conditions.

Data
Signals from global navigation satellite systems (GNSS) can be used to determine the total electron content (TEC) between the satellite and a receiver on the ground (or in space for radio occultation, RO). TEC is the line integral of electron density on a path between two points (i.e. between the satellite and the receiver) where N e is the electron density. TEC is often expressed in TEC units (TECU), which is 10 16 electrons/m 2 . A full derivation of how TEC is estimated can be found in Elvidge (2014).
The TEC data has been provided by the Ionospheric determination and Navigation based on Satellite And Terrestrial (IonSAT) systems research group at the Universitat Politcnica de Catalunya (UPC). The data has been calibrated (i.e. biases have been removed) by leveling the data to the UQRG model which is produced by combining tomographic modelling of the ionosphere with Kriging interpolation (Hernández-Pajares et al., 1999;Orus et al., 2002). This model has been shown to provide excellent performance compared to independent observations and other Global Ionosphere Maps (GIMs) (Hernández-Pajares et al., 2017).
An example map of the data is shown in Figure 3. This figure shows the TEC values (which are assimilated) mapped to vertical TEC (VTEC) values (in order to more easily visualize the data). The mapping function assumed a thin-shell ionosphere at a fixed altitude of 300 km.
For testing, the models are compared in terms of their ability to reproduce the total electron content (TEC) from a receiver that has not been assimilated.

Results
Assimilating the data causes AENeAS to make global changes to the background model. A sample AENeAS post assimilation TEC map (at 1230 on June 5th) is shown in Figure 4. The corresponding TIE-GCM+ output is shown in Figure 5.
The changes result in the AENeAS TEC being generally higher than TIE-GCM+ in the northern hemisphere and lower or unchanged in the south. The exception to this is Australasia that has a higher TEC than the background model. Additional small scale structure is also created and can be seen especially clearly around the equator.  S. Elvidge andM.J. Angling: J. Space Weather Space Clim. 2019, 9, A30 (Hernández-Pajares et al., 2009). It should be observed that the general structure more closely agrees with the AENeAS output rather than TIE-GCM+. In particular the areas of increased density over Australasia, the South Atlantic and California.
The GPS station "REDU" (based in Belgium) was excluded from the assimilation so that it could be independently compared against the AENeAS results. At 1230 on June 5th the REDU receiver could see 8 satellites and therefore returned 8 slant TEC observations. As with the assimilated data, these TEC values are levelled to UQRG to provide calibrated TEC. For comparison, the modelled TEC is then found by integrating along each ray path through the electron density grids.
The observations have been compared with AENeAS (preand post-assimilation), TIE-GCM+ and NeQuick (Fig. 7). The REDU observations are plotted with black vertical lines representing the uncertainty of the observations. TIE-GCM+ is shown in green and is significantly different from the observations. On average the TIE-GCM integrated TECs are approximately 70% of the observation. The NeQuick results (yellow) are very similar to the TIE-GCM+ values and are lower than all of the observations. The pre-assimilation AENeAS results (blue) shows that the model has successfully "moved" the background TIE-GCM+ values to nearer the observed values. The pre-assimilation results include all of the AENeAS history upto the previous timestep. Assimilating data has the effect of moving the mean closer to the observation and reducing the spread of the ensemble members (shown in red). For each observation at least one fifth of the analysis ensemble member is within the uncertainty of the observation. Figure 8 shows a snapshot of electron density profiles from June 5th at 1230 from Dourbes, Belgium (longitude: 4.6, latitude: 50.1). Each individual AENeAS ensemble member (red) is shown, as well as TIE-GCM (blue) and NeQuick (yellow).    For reference the profile derived from the Dourbes ionosonde (DB049) is shown in black. It can be seen that AENeAS does very well moving from the background model (TIE-GCM) towards the Dourbes ionosonde. It is important to remember that the electron density profile has not been assimilated into AENeAS, and only slant TEC has. Even with the lack of height information from the TEC measurements, the reconstructed peak electron density from AENeAS closely matches the value from the ionosonde.
The overall statistics for the full 3 day test are shown in Table 2. It can be seen that AENeAS performs very well, especially compared to its background model, TIE-GCM+. Both the mean and standard deviation of the model errors (model minus observation) are much less for AENeAS than the other two models. NeQuick, the empirical model, also performs better than the physics-based TIE-GCM+. For further reference the statistics from the IGSG model are also included. Two important things should be noted with regards to the IGS global output. Firstly the REDU observations have not been removed from the IGSG model, and, unsurprisingly, the mean error is significantly, 1.5 TEC units, less than AENeAS. Secondly, the IGS global map is only output every 2 h, and the map for 1230 is linearly interpolated from the 1200 and 1400 maps. Whilst the mean error of the IGSG is the smallest of all of the models, it has the largest error standard deviation. This could be a result of comparing the smoothed IGSG output to the 30 s observations (Hernández-Pajares et al., 2009).
Although AENeAS performs well there are some issues with artificial boundary effects in the assimilation. This is most obvious in the region around Hawaii in Figure 4. Whilst Section 2.2 presents approaches for improving smoothness these are clearly insufficient in areas of very sparse data. Another method for ensuring smoothness has been described by Yang et al. (2009). This approach involves performing the LETKF calculations on a coarse grid and then interpolating to the required grid resolution (Fig. 9). Specifically the ensemble weight vectors are interpolated to find the high resolution grid point values. Yang et al. (2009) use a smooth bivariate    . 9. The circles are where the LETKF assimilation is performed. The higher resolution grid (crosses) is then found by interpolation of the weight vectors. Image from Yang et al. (2009). interpolation scheme by locally fitting quintic polynomials to the zonal and meridional values of the coarse analysis grid. This scheme insures that the analysis ensemble perturbations maintain the zero mean property as required for the LETKF. Whilst this method is a sub-optimal assimilation scheme it has the advantage that the smoothness is guaranteed. It will be implemented in the next iteration of AENeAS.

Conclusions
A new model, the Advanced Ensemble electron density (Ne) Assimilation System (AENeAS) has been developed. AENeAS is a physics-based data assimilation model of the Earth's upper atmosphere. Its background model is TIE-GCM and the model uses the local ensemble transform Kalman filter (LETKF) for the assimilation scheme. The LETKF is an efficient implementation of the ensemble Kalman filter which defines local regions where the assimilation is performed. The advantage of this is that it reduces the state space of the model and brings it closer to the space spanned by the ensemble members. The algorithm iterates through each grid point independently and so is naturally suitable for parallelization. A computationally efficient implementation using eigendecomposition has also been presented.
Data assimilation with physics-based models can upset the intrinsic balance of the model equations. Nudging or incremental analysis updating is used in the meteorological community to slowly move the background model to the analysis state. However in the lower atmosphere the assimilation window is usually 6 h and these approaches have been developed with that in mind. For the upper atmosphere the assimilation windows are usually much shorter (15 min) and therefore a new sporadic forcing technique is described.
Results of initial testing show that AENeAS can effectively assimilate GPS TEC data and reduce TEC errors compared to TIE-GCM and NeQuick. However a more detailed statistical study is required to quantify the performance of the model. Further methods will be required to ensure model smoothness in areas with very sparse data.