Low latitude monthly total electron content composite correlations

Spatial correlations of total electron content (TEC) variability are compared among two SAMI3 model runs and Jet Propulsion Laboratory Global Ionospheric Maps (JPL/GIM). Individual monthly correlation maps are constructed with Equatorial reference points at 12 evenly spaced longitudes and 12 universal times. TEC composite correlations (TCCs) are then calculated by averaging the individual maps, shifted zonally to synchronize local time. The TCC structures are quantified using Gaussian fits in the zonal and meridional directions. A non-zero large-scale "base correlation" is found in all three datasets for 2014, a year with high solar activity. Higher base correlations generally occur in the SAMI3 runs than in JPL/GIM. The SAMI3 run driven with climatological neutral fields shows higher correlations than the run driven with neutrals from a Whole Atmosphere Community Climate Model with thermosphere-ionosphere eXtension (WACCM-X) simulation. Base correlation values strongly correlate with monthly F10.7 standard deviations. Empirical orthogonal function (EOF) analyses confirm that large-scale correlations are usually, although not always, related to solar forcing. Strong correlations between the Ap index and EOF modes are also observed, consistent with the geomagnetic forcing of the TEC field. The widths of the correlation structures are also examined, and these vary considerably with local time, month, and dataset. Off-Equator conjugate point correlations are also calculated from each dataset and variations with the month and local time are analyzed. Analysis of TCCs for 2010, a year with low solar activity, shows that base correlations as well as correlations of the first EOF mode with F10.7 are generally weaker than in 2014.


Introduction
The goal of this paper is to explore the spatial correlations of the ionospheric state on daily timescales in order to inform ionospheric data assimilation (DA) systems on how to model the structure of the background error covariance matrix (BECM). While the full BECM contains all state variables, for simplicity we focus on the vertical total electron content (vTEC, referred to as TEC hereafter). While a number of papers have examined the correlation structure of TEC (as reviewed further below), this study is novel for two reasons. First, we compare correlation results among three different datasets. Two are year-long model runs of Sami3 is Another Model of the Ionosphere (SAMI3, Huba et al., 2017; McDonald et al., 2018; Zawdie et al., 2020) using two different specifications of the neutral atmosphere. The third is an analysis provided by the Jet Propulsion Laboratory Global Ionospheric Maps (JPL/GIMs, Mannucci et al., 1998). By comparing SAMI3 runs with JPL/GIMs, we can assess how well the model simulates the TEC correlations observed in the "real world". The second novel aspect of this study is the analysis of the structure of the BECM in terms of the leading empirical orthogonal functions (EOFs) that characterize the TEC variability. We aim to understand whether the ionosphere may be dominated by a limited number of modes that determine the structure of the ionospheric BECMs.
A defining issue with ensemble DA systems is the use of relatively small numbers of ensemble members, i.e., the number of ensemble members is far less than the dimension of the problem. Small ensemble sizes lead to spurious covariances within the BECM and subsequently to noise in the resulting state estimates. The typical way that this issue is handled is through the use of what is referred to as "localization". Localization assumes that correlations are local to a region and therefore all covariances outside of that region are then set to zero. When covariances are truly local, as in tropospheric numerical weather prediction, this "localization" paradigm can lead to large improvements in the quality of the resulting state estimates from an EnKF. However, as we will show, the correlations in TEC can be highly non-local and therefore we wish to assess the plausibility of contemporary BECM localization tools for the ionosphere.
While some of the above ionospheric DA systems do not use localization (e.g., Chartier et al., 2016; Chen et al., 2016, 2019; Codrescu et al., 2018), most localize the BECM in the horizontal, and some in the vertical as well. Localization widths vary among models. For example, Elvidge & Angling (2019) use a localization region with latitude/longitude distances of 10°/20°, based on estimated ionospheric correlation lengths from McNamara (2009) and McNamara & Wilkinson (2009). Mengist et al. (2019) use a Gaussian localization function with zonal (meridional) widths of 5° (3°) in low latitudes and 10° (5°) in high latitudes. Morozov et al. (2013) use a piecewise continuous GC (Gaspari & Cohn, 1999) function with a half-width of 30°, while Pedatella et al. (2020) use a GC function with a half-width of 11.5°. Chartier et al. (2016) examined the sensitivity of ionospheric DA to localization length, comparing simulations with localization radii of 0.2, 0.5, and 1.0 radians (11.5°, 28.6°, and 57.3°) during a geomagnetic storm simulation. They found similar errors for all three radii from 00:00 to 12:00 local time (LT). The largest localization radius performed better from 12:00 to 15:00 LT, while the smallest radius performed better after 15:00 LT, possibly indicating the ensemble was not properly representing the variability in the later stages of the storm. Forsythe et al. (2020a) derived global distributions of ionospheric correlation lengths based on IRI-2016 model errors (calculated using JPL/GIM as the "truth") and day-to-day TEC variability, noticing significant differences between the two approaches, and investigated the extent of the anisotropy of the correlation length scales. Forsythe et al. (2020b) calculated vertical correlation lengths using IRI and five incoherent scatter radars (ISR) and showed that vertical correlations also vary with solar conditions and are asymmetric, generally having larger (smaller) lengths above (below) the reference altitude.
Forsythe et al. (2021a) estimated correlation times, and Forsythe et al. (2021b) combined these with the correlation lengths for use in the IDA4D model. They showed that the vertical part of the covariance played the most important role and helped correct a tomographic issue relating to the assimilation of slant TEC data.
We note that correlation lengths are often based on a fixed threshold value of the linear correlation coefficient. For example, Klobuchar & Johanson (1977) determined the length at which the correlation coefficient drops below 0.7 as the "correlation distance" that provides a 29% uncertainty reduction in predicting one station using data from another. Shim et al. (2008) and McNamara (2009) also used a 0.7 cutoff to define correlation lengths, while Yue et al. (2007) and Liu et al. (2018) used a 0.75 cutoff, and Forsythe et al. (2020a) used a 0.8 cutoff. While this approach provides an objective method for estimating correlation lengths, care needs to be taken in applying quoted lengths for DA purposes. Many DA algorithms make use of Gaussian covariance models. The proper way to fit a Gaussian function to a correlation field is to find the distance at which the correlation equals $e^{-0.5} \approx 0.607$. Therefore, typical correlation length scales in the literature, which are based on higher cutoff values, may be too short to be applied to Gaussian covariance modeling.
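The size of this mismatch can be estimated with a short calculation. The sketch below assumes a purely Gaussian correlation, r(x) = exp(−x²/(2L²)), and converts a threshold-based length into the implied Gaussian scale L. It also assumes that the 29% figure quoted for a 0.7 correlation corresponds to the reduction in residual standard deviation under linear regression, 1 − sqrt(1 − r²). The function names are illustrative only.

```python
import math


def gaussian_scale_from_cutoff(cutoff_length, cutoff):
    """Gaussian length scale L implied by a threshold-based correlation length.

    Assumes r(x) = exp(-x**2 / (2 * L**2)), so that r(cutoff_length) == cutoff.
    """
    return cutoff_length / math.sqrt(-2.0 * math.log(cutoff))


def uncertainty_reduction(r):
    """Fractional reduction in residual standard deviation for correlation r.

    Assumption of this sketch: linear regression of one station on another.
    """
    return 1.0 - math.sqrt(1.0 - r * r)


# Ratio of the Gaussian scale to the quoted length for the cutoffs in the text.
for cutoff in (0.7, 0.75, 0.8):
    factor = gaussian_scale_from_cutoff(1.0, cutoff)
    print(f"cutoff {cutoff}: Gaussian scale = {factor:.2f} x quoted length")

print(f"uncertainty reduction at r = 0.7: {uncertainty_reduction(0.7):.1%}")
```

Under these assumptions, a length defined with a 0.7 cutoff would need to be stretched by roughly 18% to serve as a Gaussian scale, and by more for the 0.75 and 0.8 cutoffs.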
Another important use of correlation length scales in DA is the construction of the localization function for ensemble schemes. When the goal is localization, one is not trying to fit a Gaussian to the correlation structure. The goal is rather to define a function that, when multiplied against the covariances within the BECM, will keep the covariance structure near the diagonal of the BECM intact while also setting to zero far-field noise in the BECM (see Morzfeld & Hodyss, 2023, for more detailed discussion). This requires a length scale longer than the true length scale of the data. Because the width of the localization scheme depends not only on the physical correlations but also on the ensemble size, this localization length scale is typically obtained through ad-hoc tuning of the DA system. Additionally, it has been found that using too narrow a localization can lead to poor representations of multi-variate relationships between variables and therefore poor forecasts from the ensemble (Kepert, 2009; Greybush et al., 2011).
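As an illustration of this element-wise tapering, the sketch below implements the fifth-order compactly supported function of Gaspari & Cohn (1999) and applies it to a noisy sample covariance on a one-dimensional ring of 72 longitudes. The 15° Gaussian "truth", the 30-member ensemble, and the parameter c = 30° (so that the taper reaches zero at 2c = 60°) are assumptions chosen for the example and are not settings from any of the DA systems cited above.

```python
import numpy as np


def gaspari_cohn(dist, c):
    """Gaspari & Cohn (1999) fifth-order taper; equals 1 at dist=0 and 0 beyond 2*c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    out = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    out[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                  - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    out[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                  + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return out


# Toy setup (all values are assumptions of this sketch).
rng = np.random.default_rng(0)
lon = np.arange(0.0, 360.0, 5.0)                 # 72 longitudes
sep = np.abs(lon[:, None] - lon[None, :])
sep = np.minimum(sep, 360.0 - sep)               # periodic separation in degrees
true_cov = np.exp(-sep**2 / (2.0 * 15.0**2))     # assumed Gaussian truth

# A 30-member ensemble is far smaller than the state, so far-field noise appears.
members = rng.multivariate_normal(np.zeros(lon.size), true_cov, size=30)
sample_cov = np.cov(members, rowvar=False)

# Localization: element-wise (Schur) product with the taper.
localized_cov = sample_cov * gaspari_cohn(sep, c=30.0)

far = sep >= 60.0
print("max |far-field covariance| before:", np.abs(sample_cov[far]).max())
print("max |far-field covariance| after: ", np.abs(localized_cov[far]).max())
```

The taper leaves the diagonal untouched and removes all covariances beyond 2c, which is why a localization width that is too narrow would also remove any real long-range structure.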
Ideally, the BECM represents the correlation of errors across variables and across different spatial regions. In this study, we focus on the spatial correlations of vertical TEC. A number of studies have examined TEC correlations on different timescales, and many of these have also attempted to determine the physical cause of the correlations. Three main drivers of TEC variability have been examined: solar variability, geomagnetic variability, and meteorological processes (e.g., Forbes et al., 2000; Rishbeth & Mendillo, 2001; Araujo-Pradere et al., 2004; Fang et al., 2013, 2018). As explained by Fang et al. (2018), the contributions of these drivers to the total TEC or NmF2 variability strongly depend on local time, latitude, and altitude. Fang et al. (2018) used the coupled Whole Atmosphere Model (WAM) and Global Ionosphere Plasmasphere (GIP) to simulate ionospheric variability during June-July 2012, a time with strong variations in solar and geomagnetic activity. Comparing three simulations that isolated lower atmosphere processes, geomagnetic activity, and solar forcing, they found that at low latitudes solar activity dominates the daytime absolute TEC variability, while the contributions from the lower atmosphere and geomagnetic activity are approximately equal. However, when examined in terms of relative contributions to the global TEC variability, geomagnetic activity was the main contributor, followed by solar activity, and then lower atmospheric perturbations. In periods of low solar and geomagnetic activity, the lower atmosphere processes (tides, gravity waves, and planetary waves) can play a much larger role. For example, it has been shown that lower atmospheric forcing associated with sudden stratospheric warmings can significantly impact the ionosphere during geomagnetically quiet conditions (Siddiqui et al., 2021).
D.R. Allen et al.: J. Space Weather Space Clim. 2023, 13, 7
Observational studies have also confirmed that ionospheric variability depends on local time, season, latitude, solar proton flux, solar wind, geomagnetic activity, and terrestrial weather (Rush & Gibbs, 1973; Forbes et al., 2000; Rishbeth & Mendillo, 2001). Early studies of ionospheric spatial correlations have been performed using ionosonde and Global Positioning System (GPS) observations (e.g., Klobuchar & Johanson, 1977; Nisbet et al., 1981; Gail et al., 1993; Bust et al., 2001). More recent comprehensive analyses have been performed using global datasets. Yue et al. (2007) and Forsythe et al. (2020a) used JPL/GIM to analyze variations of horizontal correlation lengths with local time (LT), magnetic latitude (Mlat), and season. Shim et al. (2008) used over 1000 ground-based receivers to analyze spatial correlation variations with LT, Mlat, and season. Liu & Chen (2009) Liu et al. (2018) used JPL/GIM with detailed fitting analyses to calculate length scales as a function of LT, month, and Mlat. Several studies also examined geomagnetic conjugate correlations (e.g., Yue et al., 2007; Liu et al., 2018). These are interhemispheric correlations associated with connecting magnetic field lines. The conjugate correlations pose a problem for DA since they do not behave like typical monotonic correlation functions.
This paper will provide a detailed analysis of TEC correlations to complement previous studies and provide guidance for ionospheric DA. We note that the correlations presented here represent TEC variability about the time-mean (month-long sequences are used), which we assume reasonably represents the correlations for the monthly-varying "climatological covariance" that could be used in a DA system that does not have an ensemble capability, or could be combined with an ensemble in a hybrid 4D-Var system in order to improve the rank of the BECM. An alternate approach would be to compile day-to-day differences in TEC, ΔTEC, which have often been used for examining ionospheric variability. Assuming the TEC values are the true values, ΔTEC approximates the error that would be obtained using a 24-h "persistence" forecast model, in which the next day's forecast is identical to today's analysis. Forsythe et al. (2020a) showed that horizontal correlation lengths calculated using ΔTEC were significantly different from those calculated using estimated model error from the difference between IRI and JPL/GIM maps. Our approach provides a new method to approximate TEC correlations that can be compared to other approaches.
The paper is organized as follows. Section 2 provides descriptions of the models and data used in this study. Section 3 details the TEC correlation composite calculation and examines the base correlations for the three datasets. Section 4 uses EOFs to determine the dominant modes of TEC variability and relates these to the solar and geomagnetic indices. This section also quantifies correlation widths and examines the interhemispheric conjugate correlations. Section 5 provides a summary and conclusions. We mainly focus on 2014 in this paper, a year with high solar activity. However, for comparison, we also show in the Supplementary Material some results for 2010, a year with low solar activity.
2 Model and data description

2.1 SAMI3
SAMI3 is a physics-based model of the ionosphere that simulates the transport and chemistry of seven ion species and solves for the electron and ion temperature for three species (Huba et al., 2000). The momentum equation is split into motions along and across the magnetic field. In the configuration used for this study, the Earth's magnetic field is specified by the International Geomagnetic Reference Field (IGRF) in the magnetic apex coordinate system (Richmond, 1995; Laundal & Richmond, 2016). Details on the solver can be obtained from previous studies (see Zawdie et al., 2020 and references therein). We compare runs using two different SAMI3 configurations for the thermosphere. For "SAMI3/climo," the neutral temperature and composition are from NRLMSISE-00 (Picone et al., 2002) and the thermospheric winds are from the Horizontal Wind Model (HWM14) (Drob et al., 2015). For "SAMI3/WACCM-X," the neutral atmosphere is specified using WACCM-X temperature, composition, and winds. As explained in McDonald et al. (2018) and Zawdie et al. (2020), for this experiment, WACCM-X is run in a specified dynamics configuration in which the lower boundary is forced by the Navy Global Environmental Model-High Altitude (NAVGEM-HA, Eckermann et al., 2018). The solar forcing of SAMI3 uses a flux term that includes the average of the daily F10.7 index and the 81-day mean of the daily F10.7 index (Huba et al., 2000). In addition, for SAMI3/climo the F10.7 affects the neutral densities and temperatures via NRLMSISE-00. The geomagnetic forcing is through the Ap index, which in SAMI3/climo affects both the neutral density and temperature via NRLMSISE-00 and the neutral winds via HWM14. For the SAMI3/WACCM-X used here, there is no explicit use of the Ap index, although geomagnetic variations can implicitly affect SAMI3 via their influence on the WACCM-X neutrals (Liu et al., 2010).

JPL Global Ionospheric Maps
To compare with the SAMI3 forecasts, we use the 2-h JPL/GIMs of TEC. As described in Mannucci et al. (1998) and Iijima et al. (1999), the approach uses a Kalman filter to retrieve vertical TEC maps from global GPS measurements. The JPL/GIMs are generated from a global network of 200 GPS receivers at a longitude/latitude resolution of 5°/2.5°. The JPL/GIMs were downloaded from the Crustal Dynamics Data Information System (CDDIS) website (https://cddis.nasa.gov/archive/gnss/products/ionex). Further details are provided in Emmert et al. (2017).

2.3 Comparison of the datasets at one location
While a complete comparison of the datasets is beyond the scope of this paper, we provide in Figure 1 a sample comparison that shows the daily TEC values at one location (latitude = 0°, longitude = 0°) over 2014 at four times of the day (0000, 0600, 1200, 1800 UTC). All datasets show a strong diurnal variation in TEC, minimizing at 0600 UTC and maximizing at 1200 or 1800 UTC. They also show a semi-annual cycle with maxima in March-April and October-November, coinciding with solar variations at the Equator, in addition to a 27-day cycle associated with the solar rotation period. On daily time scales, the SAMI3/climo is smoother than the other datasets (e.g., in June/July at 1800 UTC), while SAMI3/WACCM-X shows enhanced daily variability.
The SAMI3/WACCM-X TEC is generally biased high with respect to the other datasets at 0000, 1200, and 1800 UTC. While a correction to the SAMI3/WACCM-X TEC could be applied (McDonald et al., 2018 scaled the electron density by a factor of 0.7), the TEC correlations used in this paper are not impacted by a constant scaling, so we do not apply a correction for this paper. While the 27-day cycles at 1800 UTC appear to be similar between SAMI3/climo and JPL/GIM, the amplitude of the cycle is much larger in SAMI3/WACCM-X, suggesting that there is enhanced responsiveness to solar variations. The SAMI3/climo more closely aligns with the JPL/GIM data in these time series, although there are some biases over the annual cycle. For example, at 1800 UTC, SAMI3/climo is biased high from January to April and October to December and is biased low from May to September. In the results shown below, we will focus on monthly correlations, in which the mean values are removed from the time series. Therefore, the biases among the datasets are largely removed. This allows us to examine and compare the geographic interdependence of the TEC fields for each dataset. The SAMI3/climo and JPL/GIM TEC values at latitude = 0°, longitude = 0° for 2010 are shown in Supplementary Figure S1 (SAMI3/WACCM-X data are not included in the analyses of 2010). The TEC daily values are much smaller and the variability is much weaker than in 2014 due to weaker solar activity, as shown below. JPL/GIM TEC is generally higher than SAMI3/climo TEC in 2010.
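The insensitivity of the correlations to a constant scaling, noted above, follows directly from the definition of the linear correlation coefficient. As a short worked step, with X and Y denoting the TEC time series at two grid points and a, b > 0 denoting constant scale factors (symbols introduced here for illustration only):

```latex
r(aX, bY)
  = \frac{\operatorname{cov}(aX, bY)}{\sigma_{aX}\,\sigma_{bY}}
  = \frac{ab\,\operatorname{cov}(X, Y)}{(a\,\sigma_X)(b\,\sigma_Y)}
  = r(X, Y)
```

A uniform factor such as 0.7 corresponds to a = b = 0.7 and therefore cancels. A bias that varies with season or location would not cancel in this way, although removing the monthly mean, as done in this study, takes out its constant part.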

Solar and geomagnetic indices
To gain perspective on the variability in the solar and magnetic indices, Figures 2a and 2b show the daily F10.7 and Ap values for the year 2014. There are wide swings in F10.7 over some months, while others show relatively steady F10.7. To quantify the variability, Figures 2c and 2d show the monthly standard deviations (σ) of F10.7 and Ap. The $\sigma_{F10.7}$ in January is 33, compared to 11 and 7.5 in February and March, respectively. $\sigma_{F10.7}$ is also large in July (42) and October (37). The Ap index shows very large values in February and very small values in March and July. Also plotted in Figures 2c and 2d are the monthly correlation coefficients of the 1200 UTC empirical orthogonal function amplitude with the F10.7 and Ap indices (red lines), which we will examine later. Supplemental Figure S2 shows the indices for 2010. While the F10.7 values are much smaller and have much weaker variability, the Ap index shows some months with strong variability, particularly in April, May, and August 2010.

Interpolation to regular and magnetic grid
In the results described below, the SAMI3 TEC maps are linearly interpolated to the JPL/GIM grid, which has a longitude × latitude resolution of 5° × 2.5°, to allow easier comparison. Each map is also interpolated to a 2-D (constant in altitude) magnetic grid with a resolution of 5.0° magnetic longitude (Mlon) × 2.5° magnetic latitude (Mlat). The magnetic grid is obtained by interpolating the SAMI3 "Apex" magnetic grid line locations to obtain the latitude/longitude positions where these grid lines cross the constant altitude of 110 km, as used in Figure 2 of Richmond (1995). We then interpolate horizontally to obtain the latitude/longitude coordinates associated with each Mlat/Mlon pair on the magnetic grid. In order to avoid polar processes, we limit both latitude and Mlat to ±60°. The motivation to exclude the poles is both due to limitations in the SAMI3 run settings used in this study as well as larger errors in the JPL/GIM products at high latitudes due to less coverage (Mannucci et al., 1998). Yue et al. (2007) similarly limited the data range to Mlat of ±60°. Both grids are displayed in Figure 3. Note that Mlon is defined as zero for the line that passes through the geomagnetic pole (longitude = 71°W, latitude = 79°N for this Apex grid for 2014). This position on the magnetic grid is indicated by a red line in Figure 2b. To facilitate comparisons made using the two grids, this longitude shift will need to be considered, since the local time of longitude = 0° does not coincide with the local time of Mlon = 0°. We also highlight in Figure 3 the 12 longitudes and Mlons that are used for the composite calculations discussed below.

Individual correlation maps
The main approach used in this paper is to calculate TEC composite correlations by averaging individual correlation maps taken at different times that are shifted in the longitude to synchronize the local time. The motivation for this approach is (1) to account for the significant diurnal cycle in TEC due to solar production in the daytime and nighttime recombination, and (2) to reduce spurious correlations by combining multiple maps. The mathematical formalism is provided below for clarity.
First, we define a linear correlation map that is a function of longitude and latitude (denoted by $\lambda_i$ and $\varphi_j$) for a selected reference point, $(\lambda_{i_0}, \varphi_{j_0})$, and a reference coordinated universal time (UTC), in hours. We note here that $(\lambda_i, \varphi_j)$ may refer to either geographic coordinates or magnetic coordinates. The individual correlation maps are produced by first compiling TEC maps over a given time range (one month is used in our study), with the number of available days indicated by D, and the day of the year indicated by d. We note here that we do not bin the data by geomagnetic or solar activity, so some of the variations by month will be due to variations in these parameters. Including these variations allows us to assess their impact on the correlations. This is important since there is uncertainty in geomagnetic and solar activity in real DA systems that needs to be modeled in the BECM if it is to represent the true error in the system. Because each day of the month has different solar and geomagnetic activity, a monthly dataset approximates an ensemble with uncertainty in solar and geomagnetic activity.
In Section 4, we will filter the TEC data using empirical orthogonal functions in an attempt to examine to what degree the variability in each month can be attributed to solar and geomagnetic activity. We next form an array of indexed TEC values (q) at a fixed hour (h, in UTC) as follows:

$$q_{i,j,h}(d) = \mathrm{TEC}(\lambda_i, \varphi_j, h, d), \quad d = 1, 2, \ldots, D \qquad (1)$$

A global map of the linear correlation coefficient, r, for a given reference point is made by calculating the correlations for all longitudes (total of $N_{lon}$ = 72) and latitudes (total of $N_{lat}$ = 49):

$$r_{i_0,j_0,h}(i,j) = \frac{\overline{\left(q_{i,j,h} - \overline{q}_{i,j,h}\right)\left(q_{i_0,j_0,h} - \overline{q}_{i_0,j_0,h}\right)}}{\sigma_{i,j,h}\,\sigma_{i_0,j_0,h}} \qquad (2)$$

where i = 1, 2, . . ., $N_{lon}$ and j = 1, 2, . . ., $N_{lat}$, the overbar indicates the mean over all days, and σ is the standard deviation over all days.
Sample correlation maps are provided in Figure 4, which are calculated using a fixed time of 1200 UTC and using 30 days covering 2-31 January 2014 (1 January is left off due to SAMI3's use of a "cold start," i.e., initialized with electron density specified by a Chapman layer). The reference point is indicated with a black dot on each panel and is located at the Equator, but at different longitudes in each frame. Also plotted in Figure 4 are several lines of solar zenith angle (SZA) for reference. Figures 4a, 4e and 4i show the reference point at longitude = −180° (0000 LT). There are high correlations (greater than ~0.6, indicated by red/orange contours) within the vicinity of the reference point. The SAMI3/climo also has high correlations at locations remote from the reference point, while SAMI3/WACCM-X and JPL/GIM have weaker long-distance correlations. As the reference point moves into the sunrise (Figs. 4b, 4f and 4j), the SAMI3/climo data show high correlations spreading in both the east-west and north-south directions, starting to fill the sunlit region. SAMI3/WACCM-X also has enhanced correlations, while JPL/GIM correlations are weaker. As the reference point moves to noon (Figs. 4c, 4g and 4k), the high correlations span nearly the whole sunlit region for SAMI3 data, but only partially for JPL/GIM.
As the reference point moves into the evening (Figs. 4d, 4h and 4l), these large correlations remain for SAMI3/climo, but die down and are more localized for SAMI3/WACCM-X and JPL/GIM. These correlation maps, therefore, show a strong variability with longitude, latitude, local time, and dataset. There are also apparently spurious correlations at large distances. This is unavoidable due to the small sample size (30) relative to the size of the state (72 × 49 = 3528). As shown below, composite averages help to reduce this noise. Figure 5 provides similar correlation maps, but in the magnetic coordinates at 1200 UTC. As mentioned earlier, the magnetic longitudes are shifted relative to regular longitudes, so the local time of Mlon = 0° is similarly shifted. Here we calculate the approximate local time of the magnetic grid by a shift of (−71°)/(15°/h) = −4.7 h, and these are indicated in each panel in Figure 5. The correlations using the magnetic grid show a similar diurnal cycle to the correlations using the geographic grid, with very large-scale correlations occurring in the SAMI3 data as the reference point enters the sunlit region and more compact correlations when the reference point is in the night. In fact, assuming the exact same point on the globe is used as the reference point, there would be a one-to-one correspondence between correlations using the two grids. So there is not necessarily an advantage to going to the magnetic grid for a single correlation map. However, the composites can show important differences between the two reference frames.
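The individual correlation map described in this section can be sketched in a few lines of NumPy. The example below uses synthetic data in place of real TEC maps: a common day-to-day signal plus independent noise, on a 72 × 49 grid with 30 days. The array layout, the grid indexing, and the function name `correlation_map` are assumptions of this sketch rather than the processing code used in the study.

```python
import numpy as np


def correlation_map(tec, i0, j0):
    """Pearson correlation of every grid point with the reference point (i0, j0).

    tec has shape (n_days, n_lon, n_lat): one map per day at a fixed UTC hour.
    """
    anom = tec - tec.mean(axis=0)        # remove the time mean at each grid point
    sigma = anom.std(axis=0)             # standard deviation over days (ddof=0)
    ref = anom[:, i0, j0]                # anomaly time series at the reference point
    cov = np.einsum("d,dij->ij", ref, anom) / tec.shape[0]
    return cov / (sigma * sigma[i0, j0])


# Synthetic stand-in for one month of maps (values are illustrative only).
rng = np.random.default_rng(0)
n_days, n_lon, n_lat = 30, 72, 49
common = rng.normal(size=n_days)         # shared day-to-day signal
tec = 30.0 + 5.0 * common[:, None, None] + rng.normal(size=(n_days, n_lon, n_lat))

# Assumed indexing: lon = -180 + 5*i, lat = -60 + 2.5*j, so (36, 24) is (0°, 0°).
I0, J0 = 36, 24
r_map = correlation_map(tec, I0, J0)
print(r_map.shape, r_map[I0, J0])
```

With only 30 samples per grid point, the far-field values of `r_map` scatter around their expected value, which is the sampling noise that the composite averaging is intended to reduce.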

Composite correlation maps
The next step is to average the individual maps at different UTCs but shifted horizontally so that the local times of the maps are synchronized. Defining the local time (LT hereafter) at a given longitude as $LT_i$, we can write the dependence of this variable on h and longitude:

$$LT_i = h + \lambda_i/15 \qquad (3)$$

We use a similar relationship for the magnetic local time (MLT), which accounts for the geographic longitude of the north magnetic pole $\lambda_N$:

$$MLT_i = h + (\lambda_i + \lambda_N)/15 \qquad (4)$$

where $\lambda_i$ refers to magnetic longitude. As explained in Laundal & Richmond (2016), this definition is simplified, since it does not take into consideration the magnetic longitude of the subsolar point. As a further simplification, with a magnetic pole at a longitude of −71° we note that the correction term in equation (4) is not an integer, but rather it has the value $\lambda_N$/15 = −71°/(15°/h) = −4.73 h. For our composites to be on evenly-spaced 2-h intervals, we assume that $\lambda_N$ = −60°, so that $\lambda_N$/15 = −4.00 h. This causes a 44 min time error in the MLT. For the qualitative purposes used in this paper, this simplification is adequate.
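The local-time relations above reduce to simple arithmetic, sketched here with longitudes in degrees and times in hours. Wrapping the result into the range 0 to 24 h is an assumption of this sketch, and the function names are illustrative.

```python
def local_time(h_utc, lon_deg):
    """Local time in hours for a given UTC hour and geographic longitude."""
    return (h_utc + lon_deg / 15.0) % 24.0


def magnetic_local_time(h_utc, mlon_deg, pole_lon_deg=-71.0):
    """Simplified MLT in hours: UTC plus (Mlon + pole longitude) / 15."""
    return (h_utc + (mlon_deg + pole_lon_deg) / 15.0) % 24.0


# Reference point at longitude -180° and 1200 UTC, as in Figures 4a, 4e and 4i.
print(local_time(12.0, -180.0))

# MLT at Mlon = 0° and 1200 UTC with the actual and the rounded pole longitude.
exact = magnetic_local_time(12.0, 0.0, pole_lon_deg=-71.0)
rounded = magnetic_local_time(12.0, 0.0, pole_lon_deg=-60.0)
error_minutes = abs(exact - rounded) * 60.0
print(exact, rounded, error_minutes)
```

The difference between the two pole longitudes, 11° or 11/15 h, reproduces the 44 min offset quoted in the text.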
To calculate the composite correlation for a given fixed latitude (or Mlat), we average over individual correlation maps to reduce spurious noise. In this average, we want the local time of the reference points (the "reference local time", $LT_0$) to be the same for all maps, and we want to shift in longitude to line up all of the reference points. For this composite calculation, it is helpful to specify the data intervals in longitude and time. We choose a fixed time spacing of X = 2 h. Using the longitude spacing for the TEC maps, Δ = 5°, for each time shift of X we need to shift the longitude index by 15X/Δ = 6. We therefore use individual correlation maps at 12 reference points at longitudes of 0°, 30°, 60°, . . ., and 330°. The TEC correlation composite (TCC hereafter) can then be calculated by looping over reference points and shifting in longitude in order to line up the reference points (equation (5)). Here i and j are the longitude and latitude indices for the TCC map, and $i_0$ and $j_0$ are the longitude and latitude indices of the individual reference points. In equation (5), h is the universal time that corresponds to the local time at the given reference point as detailed in equation (6). Note that when h becomes negative, we add 24 h:

$$h = LT_0 - \lambda_{i_0}/15 \qquad (6)$$

This procedure provides TCC values at all longitudes and latitudes. We can convert the resulting longitude index to a local time index by LT(i) = $LT_0$ − iΔ/15. This results in composite maps as a function of local time and latitude, $\mathrm{TCC}_{j_0, LT_0}(LT, j)$. These composites can be calculated at all latitudes, but we will focus on the Equator and 20°N. They are also calculated for the 12 reference times: $LT_0$ = 0, 2, 4, . . ., 22 h. Figure 6 shows Equatorial reference point TCC maps for 2-31 January 2014 on the regular grid. The progression of $LT_0$ in Figure 6 is similar to the progression of longitudes in the individual correlation maps in Figure 4.
The TCCs show a similar diurnal structure to the individual correlation maps, but are generally smoother and show a reduction of the long-distance spurious correlations. The TCC maps for 2-31 January 2010, provided in Supplemental Figure S3, show generally weaker correlations for SAMI3/climo and JPL/GIM than in January 2014. This is consistent with the weaker variability of the solar flux seen in Supplemental Figure S2. The SAMI3/climo TCC maps in both years exhibit very strong global correlations, particularly throughout the sunlit region. This suggests that there are only a small number of dominant physical forcings in SAMI3/climo. Compared to the SAMI3 TCCs, JPL/GIM show correlations that are generally more localized relative to the reference point.
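The compositing procedure can be sketched as follows, under several stated assumptions: TEC maps are stored as an array of shape (12, n_days, 72, n_lat) for UTC hours 0, 2, ..., 22; longitudes run 0°, 5°, ..., 355°; and the composite index i is taken as an offset from the reference point oriented so that LT(i) = LT0 − iΔ/15, matching the relation given in the text. The exact index convention of the original calculation is not recoverable here, so this is one consistent choice rather than the authors' implementation, and the synthetic random data are for illustration only.

```python
import numpy as np

N_LON, DLON, STEP_H = 72, 5.0, 2   # longitudes, spacing (deg), time spacing (h)


def correlation_map(tec, i0, j0):
    """Pearson correlation of each grid point with reference (i0, j0); tec is (days, lon, lat)."""
    anom = tec - tec.mean(axis=0)
    sigma = anom.std(axis=0)
    cov = np.einsum("d,dij->ij", anom[:, i0, j0], anom) / tec.shape[0]
    return cov / (sigma * sigma[i0, j0])


def composite_correlation(tec_by_hour, lt0, j0):
    """Average 12 longitude-shifted correlation maps sharing reference local time lt0.

    tec_by_hour: shape (12, n_days, N_LON, n_lat) for UTC 0, 2, ..., 22.
    lt0 must be a multiple of STEP_H so the required UTC falls on the 2-h grid.
    """
    shift = int(15 * STEP_H / DLON)              # 6 grid points per 2 h
    n_ref = N_LON // shift                       # 12 reference longitudes
    offsets = np.arange(N_LON)
    total = np.zeros(tec_by_hour.shape[2:])
    for n in range(n_ref):
        i0 = n * shift                           # reference longitude index (30° steps)
        h = (lt0 - i0 * DLON / 15.0) % 24.0      # UTC giving local time lt0 at this longitude
        r = correlation_map(tec_by_hour[int(h // STEP_H)], i0, j0)
        total += r[(i0 - offsets) % N_LON, :]    # line up reference points at index 0
    return total / n_ref


# Synthetic example: pure noise, so only the reference point should stand out.
rng = np.random.default_rng(2)
tec_by_hour = rng.normal(size=(12, 30, N_LON, 5))
tcc = composite_correlation(tec_by_hour, lt0=12, j0=2)
print(tcc.shape, tcc[0, 2], np.abs(tcc[1:, :]).max())
```

For uncorrelated input, averaging 12 maps reduces the spurious far-field values relative to a single map, which mirrors the noise reduction described above.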
While there is some diurnal variability (to be examined more closely below), the JPL/GIM does not exhibit the very large correlations filling the sunlit region as seen with SAMI3. This is consistent with Shim et al. (2008), who showed using ground-based GPS observations that TEC changes over most of the illuminated disk are largely uncorrelated, suggesting that "day-to-day changes in the solar illumination only add a statistically insignificant contribution to the day-to-day changes in TEC." There is an indication of spreading correlations in the meridional direction, particularly at 1800 LT 0 , but otherwise, the correlations remain quite local. One possible reason for the reduction of correlations in JPL/GIM is noise in the TEC observations themselves, which could mask some of the real variability. However, it is also likely that SAMI3 overestimates the correlations due to limitations in producing smaller-scale ionospheric phenomena.

Base correlation calculations
In order to characterize the spatial variation of the composites, we fit the TCCs in the zonal and meridional directions with Gaussian curves that are centered at, and have a fixed maximum of 1.0 at, the reference LT (or MLT) and reference latitude (or Mlat). With x referring to a given coordinate (longitude, latitude, Mlon, or Mlat) and $x_0$ to its value at the reference point, each of the fits results in two constants, $a_0$ (which we refer to as "base"), and $a_1$ (which we refer to as "width") that describe the variations,

$$f(x) = a_0 + (1 - a_0)\exp\left[-\frac{(x - x_0)^2}{2a_1^2}\right]$$

Note that if $a_0$ = 0, then $a_1$ is the half-width associated with the correlation value of $e^{-0.5}$ = 0.607. In typical meteorological correlations, the base would be zero, indicating that a single phenomenon, with a single physical correlation scale, is present. But as seen in Figure 6, the correlations generally do not go to zero at a great distance from the reference point. This is consistent with at least two physical phenomena at work, one with a very long, globally correlated structure and another with a more local structure. The Equatorial LT line plots associated with the TCCs on the regular grid (black lines) are shown in Figure 7 for 2-31 January 2014, along with the Gaussian line fits (blue dashed lines). Values of the base and width are indicated on
If the base correlations are largely due to F10.7 variability, then we would expect the monthly base values to be correlated with the monthly $\sigma_{F10.7}$. To test this, Figure 8 plots these two quantities as a function of the month in 2014 for SAMI3/WACCM-X. Since the base depends on reference local time, there are 12 lines plotted, color-coded by LT. As seen in the time series for base calculated using longitude (Fig. 8a), there is a strong relationship between base and $\sigma_{F10.7}$, and linear correlations between the two quantities range from 0.76 to 0.83. This confirms that the base correlations in SAMI3/WACCM-X are largely due to solar variations driving the photoproduction of electrons and ions.
Similar variability is seen using latitude (Fig. 8b). Note that there are two points for 0800 LT with negative correlations in July and November that are spurious. The correlations using Mlon (Fig. 8c) tend to be higher than for longitude, and the correlations with Mlat (Fig. 8d) also show correspondence with $\sigma_{F10.7}$. We show in Supplemental Figure S5 the monthly longitudinal and Mlat base values for 2014 for the three datasets and for each local time (red lines), along with the mean values over all local times (blue lines), and $\sigma_{F10.7}$ (black lines). Like SAMI3/WACCM-X, the SAMI3/climo and JPL/GIM base values appear to follow $\sigma_{F10.7}$ fairly well, with the exception of a few months for JPL/GIM. The base values for both Lon and Mlat are generally largest in SAMI3/climo and smallest in JPL/GIM, while SAMI3/WACCM-X shows values in the middle.
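As an illustration of the base and width fit described in this section, the sketch below fits a unit-peak Gaussian on a constant base, base + (1 − base)·exp(−x²/(2·width²)), to a synthetic correlation line plot. The fitting procedure is an assumption, since the text does not say how the fits were performed: here candidate widths are scanned on a grid, and for each width the best base is obtained in closed form by linear least squares. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_with_base(x, base, width):
    """Unit-peak Gaussian on a constant base: 1 at x = 0, tends to `base` far away."""
    return base + (1.0 - base) * np.exp(-x**2 / (2.0 * width**2))

def fit_base_and_width(x, corr, widths):
    """Scan candidate widths; for each, solve for the best base by linear least squares."""
    best = None
    for width in widths:
        g = np.exp(-x**2 / (2.0 * width**2))
        # corr ~ g + base * (1 - g) is linear in base once the width is fixed.
        base = np.sum((corr - g) * (1.0 - g)) / np.sum((1.0 - g) ** 2)
        sse = np.sum((corr - gaussian_with_base(x, base, width)) ** 2)
        if best is None or sse < best[2]:
            best = (base, width, sse)
    return best[0], best[1]

# Synthetic check: x is relative local time in hours, with a known base and width.
x = np.arange(-12.0, 12.0, 0.5)
corr = gaussian_with_base(x, 0.4, 3.0)
base, width = fit_base_and_width(x, corr, np.arange(0.5, 12.01, 0.25))
print(round(base, 3), round(width, 2))
```

A nonlinear least-squares routine would serve equally well; the grid scan is used here only because it keeps the example short and dependent on NumPy alone.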
The base correlation dependence on local time is examined in Figure 9 for SAMI3/WACCM-X for 2014. For longitude and Mlon (panels a and c), there is a tendency towards maxima around 0800 LT and 1800-2000 LT, corresponding to morning and evening hours, with minima around 0400 and 1400 LT. For latitude and Mlat (panels b and d), the base values generally show a strong diurnal cycle, with low values occurring from midnight to 0600 LT, a sharp increase from 0600 to 1000 LT, and a decrease from 1800 to 2200 LT. This LT variation resembles the variation of the correlation scale seen in Liu et al. (2018), who used JPL/GIM data to determine correlation scales with a correlation value cutoff method. Their results for April 2008, for example, have low-latitude correlations that rise sharply from 0600 to 1200 LT, but then fall off more slowly from 1200 to 2200 LT. They also showed strong seasonal variability in 2008 with higher zonal correlations in spring and autumn than in summer and winter, as well as strong interannual variability. The diurnal variability of the base values for all three
The Gaussian fits suggest that TCCs may often be described by a global-scale base correlation and a local correlation. In an attempt to separate these two features, we apply an empirical orthogonal function (EOF) analysis to the data in order to identify and remove the largest sources of variability. As we will show below, this is often attributable to large-scale solar forcing. The EOF analysis is equivalent to singular value decomposition, or principal component analysis, in which a matrix of n-columns by m-rows is decomposed as the product of three matrices:

$$A = U S V^{T}$$

Here $A$ is a matrix of size $N_{days} \times N_{state}$ (number of days × size of the state) for which each column contains a vector of TEC data at fixed UTC over a given date range, with the time mean of each state element removed. Here $N_{state} = N_{lon} \times N_{lat} = 3528$.
U and V are the left and right singular vectors, and S is a diagonal matrix of singular values. From this decomposition, we can determine the spatial structures of the EOFs and the amount of variance contained in each EOF "mode". We also calculate the amplitude of each mode as a function of time by projecting the TEC maps onto the EOF of each mode (i.e., a dot product of the two vectors). By correlating this amplitude with forcing indices (F10.7, Ap), we will attempt to provide insight into the physical mechanisms involved in different modes. While the approach is appealing, care must be taken to avoid overinterpretation of individual modes, since forcing mechanisms are generally not orthogonal, while the eigenfunctions of the EOFs are orthogonal by design (Monahan et al., 2009). In plotting the EOFs and the amplitudes, we divide each EOF by the global maximum absolute value, and we multiply the amplitude by the same value. The plotted EOFs are then normalized to a maximum absolute value of 1 and the amplitudes (in TECU) have magnitudes similar to the global maximum TEC anomalies.
Previous studies have also looked at TEC EOFs. For example, Chen et al. (2015) examined EOFs using monthly TEC averages over North America. They found the first mode constitutes 97.5% of the total variance and represents the spatial and diurnal variations. Talaat & Shu (2016) examined EOFs using eleven years of daily averaged TEC data and found the first (first four) EOFs to explain 89% (98%) of the total variance. Zhong et al. (2019) examined ~10 years of topside ionospheric and plasmaspheric TEC data and found the first (first five) EOFs to account for 95.5% (98.8%) of the total variance. Vaishnav et al. (2019) performed an EOF analysis of 19 years of global TEC maps and found the first three EOFs explain 99% of the total variance. Zhang et al. (2021) examined the EOF decomposition of monthly electron density using 11 years of radio occultation profiles, looking at the variations with the season as well as the solar cycle, and they found the first five EOFs to account for 98% of the variance. These studies show that TEC variability is strongly structured by a few modes. One unique aspect of our analyses is that we group the data by month, rather than long time periods, so that we avoid the large-scale variations associated with the solar cycle. Another is that we fix the UTC time, which eliminates the obviously dominant diurnal cycle. After calculating the EOFs, we can remove one or more of the modes from the TEC time series and recompute the TCCs to ascertain if the large-scale structures have been reduced.
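The decomposition, the variance fractions, the projection amplitudes, and the plotting normalization described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the code used in the study: the data are synthetic, the function name `eof_analysis` is hypothetical, and the matrix is arranged with one flattened TEC map per row (days by state elements).

```python
import numpy as np

def eof_analysis(tec):
    """tec: array (n_days, n_state) of TEC maps at a fixed UTC, one flattened map per row."""
    anomalies = tec - tec.mean(axis=0)         # remove the time mean of each state element
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)    # share of the total variance in each mode
    amplitudes = anomalies @ vt.T              # projection of each daily map onto each EOF
    # Plotting convention from the text: scale each EOF to a maximum absolute value of 1
    # and multiply the amplitude by the same factor, so their product is unchanged.
    scale = np.abs(vt).max(axis=1)
    return vt / scale[:, None], amplitudes * scale[None, :], variance_fraction

# Synthetic example: 30 days, 3528 state elements, one dominant large-scale pattern.
rng = np.random.default_rng(0)
n_days, n_state = 30, 3528
pattern = np.sin(np.linspace(0.0, np.pi, n_state))
forcing = rng.normal(size=n_days)              # stand-in for a daily forcing index
tec = (20.0 + 5.0 * forcing[:, None] * pattern[None, :]
       + 0.5 * rng.normal(size=(n_days, n_state)))
eofs, amps, frac = eof_analysis(tec)
print(f"EOF1 variance fraction: {frac[0]:.3f}")
```

In this synthetic case the first amplitude series tracks `forcing` up to sign, which is the kind of relationship that is tested against F10.7 and Ap in the analysis.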
For illustration, the first three EOFs for the 2-31 January 2014 SAMI3/WACCM-X time series at 1200 UTC are provided in Figure 10. The first mode accounts for 60.2% of the total variance in the time series, while the second and third modes account for 9.1% and 6.8%, respectively. The spatial pattern in Figure 10a shows a maximum over Africa and generally high values over most of the sunlit region. There is also a broad "tongue" of high values that extends zonally across southern Asia and the Pacific Ocean. The amplitude of this mode can be calculated by projecting the pattern in Figure 10a onto individual daily TEC maps. The result is shown with a black line in Figure 10d, which has large values in the first few days of January, sloping down to a minimum on day 17 (15 January), and sloping upward the rest of the month. Also plotted in Figure 10d are the daily F10.7 solar index (red line), linearly scaled to provide the best fit to the EOF1 amplitude, along with a similarly scaled Ap index (blue line). There is a high degree of correlation (0.967) between EOF1 amplitude and F10.7, suggesting that it is the solar forcing in SAMI3 that is largely responsible for this mode. We also correlate EOF1 amplitude with the Ap daily index and obtain a low correlation value of 0.115. The second and third EOFs are plotted in Figures 10b and 10c, and these show generally smaller-scale structures that are both positive and negative. Neither F10.7 nor Ap shows a strong correlation with the amplitudes of the second and third modes, as seen in Figures 10e and 10f. That F10.7 correlates well with TEC variations is consistent with other studies such as Liu & Chen (2009), who examined the correlation of JPL/GIM TEC with solar indices using quadratic regression. They found linear coefficients as high as 0.9, with significant diurnal and seasonal variability. Bergot et al. (2013) found similarly high correlations between TEC (from the Centre for Orbit Determination in Europe, CODE) and solar indices and derived an empirical model for TEC prediction based on F10.7. Vaishnav et al. (2019) provide a detailed EOF analysis of global TEC and found strong correlations of the first principal component with F10.7, while the third component correlated well with Ap and Kp indices. They also examined correlations with a number of other solar indices such as Mg II and He II.
Figure 11 shows the first three EOFs for SAMI3/climo at 1200 UTC for 2-31 January 2014. The leading EOF (Fig. 11a), which captures 86% of the variability, is also strongly correlated (r = 0.962) with daily F10.7, and the spatial structure is similar to the EOF1 for SAMI3/WACCM-X (Fig. 10a). Much of the remaining SAMI3/climo variability is split between EOF2 and EOF3, which account for 7.6% and 2.9%, respectively, of the variability. These two EOFs correlate fairly well with the Ap index (r = 0.670 and r = 0.627, respectively). As the magnetic indices are major drivers of the NRLMSISE-00 and HWM14 variability, we infer that it is the neutral densities and winds that are largely responsible for the second and third EOFs. As verified in offline calculations (not shown), a pure longitudinal westward propagating sine wave representing a semi-diurnal tide would be split into two EOFs in our approach. So it is likely that the same forcing is largely driving EOFs 2 and 3 in SAMI3/climo. We note that the first three EOFs combine to determine 97% of the TEC variability for SAMI3/climo for January 2014.
D.R. Allen et al.: J. Space Weather Space Clim. 2023, 13, 7
The EOFs for JPL/GIM at 1200 UTC for 2-31 January 2014 are shown in Figure 12. Unlike the SAMI3 EOFs, the JPL/GIM data show that the amplitude of the first EOF is not strongly correlated with F10.7 (r = 0.056), suggesting that there is another physical mode of variability that is dominating. The second EOF is more highly correlated with F10.7 (r = 0.703), and this mode accounts for only 15.3% of the variability. The JPL/GIM EOFs at 1800 UTC (Fig. 13), however, do show the first EOF highly correlated with F10.7 (r = 0.788), accounting for 31.7% of the variability. More detailed observational and modeling analyses would be necessary to understand the exact physical causes of each EOF.

Correlation width calculations
We next remove the leading mode from the TEC time series by subtracting the product of the EOF1 amplitude and the spatial EOF1 for each day. Then we repeat the TCC calculations with the filtered data. The resulting filtered TCCs for 2-31 January 2014 are shown in Figure 14. When compared with the original unfiltered TCCs in Figure 6, it is clear that the filtered TCCs have a much more compact structure and less diurnal variability. The baseline values are also much smaller, as seen in fits to the longitudinal TCC line plots (Supplemental Fig. S7), which show the base values now range from 0.11 to 0.35 for SAMI3/climo, −0.05 to 0.01 for SAMI3/WACCM-X, and −0.07 to 0.04 for JPL/GIM. The EOF filtering apparently removed much of the large-scale correlation structure associated with F10.7 forcing in SAMI3. We note that while F10.7 is usually the dominant forcing, as seen by inspection of individual EOF1 amplitude/F10.7 correlations, there are months when the F10.7 influence is reduced. We will examine this in more detail in Section 4.4. While Figure 14 shows more localized correlations than the TCCs made with unfiltered TEC, there are still large-scale correlations for SAMI3/climo, including negative correlations in the southern extratropics. These are likely caused by large-scale correlation structures in the tidal modes of the HWM that translate to the TEC variability.
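The filtering step can be illustrated with a short NumPy sketch: the leading mode (amplitude times spatial pattern, for each day) is subtracted from the anomalies, and the correlation of every state element with a reference element is recomputed. The data are synthetic and all names are hypothetical; the compositing over longitudes and universal times that produces the actual TCCs is not reproduced here.

```python
import numpy as np

def remove_leading_modes(tec, n_modes=1):
    """Subtract the first n_modes EOF reconstructions from the anomaly time series."""
    anomalies = tec - tec.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    leading = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]   # amplitude x pattern per day
    return anomalies - leading

def correlation_with_reference(series, ref_index):
    """Pearson correlation of every state element with the element at ref_index."""
    z = series - series.mean(axis=0)
    z = z / np.sqrt(np.sum(z**2, axis=0))
    return z.T @ z[:, ref_index]

rng = np.random.default_rng(1)
n_days, n_state = 30, 200
common = rng.normal(size=(n_days, 1))        # stand-in for a global, solar-like forcing
local = rng.normal(size=(n_days, n_state))   # uncorrelated local variability
tec = 10.0 * common + local

corr_raw = correlation_with_reference(tec, ref_index=100)
corr_filtered = correlation_with_reference(remove_leading_modes(tec), ref_index=100)
print(np.delete(corr_raw, 100).mean(), np.abs(np.delete(corr_filtered, 100)).mean())
```

With a strong common forcing the unfiltered correlations are close to 1 everywhere, which is the analogue of a large base value; after removing the leading mode only the weak sampling correlations of the local noise remain.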
We next examine the widths calculated after filtering out EOF1. The variations with month for 2014 for SAMI3/WACCM-X are shown in Figure 15 for all four coordinates. We note that all longitude widths are calculated at the Equator, and we show the widths here in degrees of great circle distance (GCD) rather than hours (1 h = 15° GCD at the Equator). There does not seem to be any correlation between these widths and $\sigma_{F10.7}$, since the EOF1 has been removed from the time series, so we have not included $\sigma_{F10.7}$ in these plots. We also looked for correlations with Ap but did not find any obvious relationships. It is likely that the variability is due to different dynamical conditions in the different months within the WACCM-X experiment. At some times (2000, 2200 LT), there is very little variability from month to month, while at other times (0000, 0200 LT) the variability is strong. Longitudinal widths range from ~20° to 80° and are generally larger than latitudinal widths, which range from ~10° to 40°. Mlon and Mlat widths show similar variations to their geographic counterparts but are generally slightly smaller.
The longitudinal and Mlat widths (after removing the EOF1) as a function of the month for all three datasets are shown in Supplemental Figure S8. Red lines indicate different local times, and the blue lines are the means over all local times. The SAMI3/WACCM-X mean longitudinal widths range from ~30° to 45°. The JPL/GIM mean widths are smaller (~30° to 40°) and show less monthly variation. SAMI3/climo mean widths are significantly larger in all months, reaching ~100° in March. This is expected due to weaker longitudinal variability in the HWM. In the Mlat direction, SAMI3/WACCM-X and JPL/GIM have similar mean widths of ~15-20°, while SAMI3/climo shows significantly larger widths, particularly in March and April.
Fig. 11. Same as Figure 10, but for SAMI3/climo.
As seen in Figure 16, there is a large variation of width with LT for the SAMI3/WACCM-X data for 2014. This can be seen most clearly in the longitude direction (Fig. 16a), in which the width peaks from 0000 to 0800 LT, but is relatively constant for the rest of the day. In the latitudinal direction (Fig. 16b), the diurnal variation depends on the month. In some months the width peaks around 1200 LT, while in other months it peaks in the late afternoon/evening hours. April is an anomaly with very high values occurring for the latitudinal widths from 0600 to 1200 LT. Similar results are seen for widths in the Mlon and Mlat directions (Figs. 15c and 15d). The diurnal variations of the widths for all datasets are shown in Supplemental Figure S9. Longitudinal widths for all three datasets are larger in the morning hours, peaking from 0200 to 0600 LT. SAMI3/WACCM-X and JPL/GIM have a broad minimum from 1200 to 2000 LT, while SAMI3/climo has a secondary peak at 1600 LT. SAMI3/WACCM-X has larger values than JPL/GIM during the morning, but similar values over the rest of the day, while SAMI3/climo has larger values at all times. The Mlat mean widths for SAMI3/climo and JPL/GIM show strong peaks at 0800-1000 MLT. SAMI3/WACCM-X also shows higher values in the late morning, but not as sharp as for the other datasets.
To summarize these results, we find that the SAMI3/WACCM-X TCCs show a structure that consists of a baseline global correlation with values that vary consistently with the amount of variation in the incident solar flux used to drive SAMI3. Much of this correlation can be described by the first EOF mode, which maximizes in the sunlit regions and has an amplitude that is highly correlated with F10.7. Removing this EOF allows us to examine the correlation structure on shorter scales. These correlations vary considerably by month and over the day. This is likely due to variations in the neutral atmosphere that is driving the ionosphere.
In contrast to baseline correlations, for which we have not been able to find references for comparison, there have been several studies that have examined correlation length scales for TEC. For example, Klobuchar & Johanson (1977) estimated the correlation distance of mean electron content using day-to-day variability of station pairs of TEC observations using VHF signals from geostationary satellites. They calculated a zonal (meridional) correlation distance of ~2900 km (~1800 km), using a correlation cutoff of 0.7. Yue et al. (2007) used JPL/GIM data to estimate correlation distances as a function of Mlat and LT for 2000 and 2005, using a correlation cutoff of 0.75. They showed a peak in the meridional correlation distances at low latitudes around sunrise (see Fig. 3b), with values ranging from ~11° to 15°. These lengths are somewhat smaller than we obtained, which may reflect differences in using the Gaussian fit versus correlation cutoff methods. In the longitudinal direction, Yue et al. (2007) showed a mean value of 20° near the Equator (Fig. 4), which is also somewhat shorter than our results. Shim et al. (2008) used >1000 ground-based TEC receivers to estimate low latitude zonal (meridional) correlation lengths of 11° (4°), using a correlation cutoff of 0.7. They also showed larger correlation lengths in daytime versus nighttime. McNamara (2009) used ionosondes in Australia and Papua New Guinea to calculate correlations and inferred zonal (meridional) correlation lengths of ~1500 km (1000 km). They also separated magnetically quiet days (Ap < 25) from magnetically disturbed days. Liu et al. (2018) examined correlation scales using JPL/GIM data for four different years (2008, 2009, 2014, and 2015). Using a Gaussian fitting method, they found correlation lengths vary with season and year. Low latitude meridional lengths in 2014 ranged from 18° to 40°, while zonal lengths ranged from 72° (winter) to 135° (summer).
The very high zonal lengths in the summer are considerably larger than those we obtained with JPL/GIM (Supplemental Fig. S8c). This is likely due to the influence of solar activity, which is highly correlated with JPL/GIM in summer 2014 (as will be shown in Sect. 4.4). Our approach of filtering out the large-scale correlations results in more localized correlation scales. Forsythe et al. (2020a) examined correlation lengths using DTEC from JPL/GIM and showed qualitatively similar diurnal variability at the Equator to what we show in Figure 16 and Supplemental Figure S9, with peaks in the morning hours. Their zonal widths peaked at ~20° (our widths peak at ~40°) and their meridional widths peaked at ~23° (our widths peak at ~30°). They also calculated correlation lengths using differences between IRI and JPL/GIM to represent model errors. Their Equatorial zonal correlation lengths vary from ~20° to 40°, while the meridional correlation lengths vary from ~10° to 20°. The Mlat lengths calculated by Forsythe et al. (2020a) at the Equator show a diurnal cycle peaking in the morning, similar to what is seen in this study.
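To make the comparison between Gaussian widths and cutoff-based correlation lengths concrete, consider an idealized case (an assumption made only for illustration) in which the correlation is a pure Gaussian with zero base and width $a_1$. The distance $L_c$ at which the correlation falls to a cutoff value $c$ is then

$$\exp\left(-\frac{L_c^2}{2a_1^2}\right) = c \quad\Longrightarrow\quad L_c = a_1\sqrt{-2\ln c}$$

so that $L_c \approx 0.84\,a_1$ for $c = 0.7$ and $L_c \approx 0.76\,a_1$ for $c = 0.75$. Under this assumption, a cutoff of 0.7 or 0.75 yields a length that is roughly 15-25% shorter than the Gaussian width of the same structure, which is in the same direction as the differences noted above. A non-zero base would alter this relation, so the factors should be read as indicative only.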

Conjugate correlations
Up to this point, we have focused on correlations with a reference point at the Equator (geographic and magnetic). If we move the reference point off the Equator, we expect to encounter correlations with grid points that have the same magnetic field line in the opposite hemisphere. These correlations show up most clearly in the magnetic coordinates. Magnetic conjugate point correlation (CPC, for short) analysis with TEC was also performed by Yue et al. (2007), using JPL/GIM data for 2000 and 2005, using three months of day-to-day differences. They showed strong CPC occurs in all seasons and years, and they also showed a diurnal cycle, peaking in the afternoon. Also, they showed a dip in CPC at sunrise and sunset at higher Mlats due to the time delay between different magnetic conjugate points (as inferred from Fig. 3). CPCs were also examined by Shim et al. (2008), using day-to-day TEC variations for four 30-day periods in 2004 based on >1000 ground-based GPS receivers, and Liu et al. (2018) using JPL/GIM in four different years (2008, 2009, 2014, and 2015).
Fig. 14. As in Figure 6, except the leading EOF has been removed from the TEC time series before calculating the TCCs.
To illustrate the CPC in SAMI3/WACCM-X, we plot in Figure 17a the TCC for reference Mlat = 20°N for 1400 MLT and date range 2-31 January 2014. While strong correlations occur throughout the daytime hours, there are two main correlation peaks. Besides the main peak at the reference Mlat, there is a secondary peak near Mlat = 20°S. A line plot through this TCC (Fig. 17b) shows the two peaks, with a red circle denoting the conjugate point. It is likely that some of this "conjugate" correlation is caused by the solar variations that simultaneously influence both Mlats, independent of the magnetic connection. As discussed above, we can reduce the solar influence by filtering out the first EOF. The resulting filtered TCC map (Fig. 17c) and line plot (Fig. 17d) show the CPC more sharply than the unfiltered results. We note that the maximum composite correlation in this plot is not exactly at the conjugate Mlat of 20°S. We hypothesize that this may be due to recombination at nighttime occurring faster at higher Mlats, thereby pushing the conjugate correlation peaks equatorward.
In order to determine the CPC value, we search for the maximum composite correlation within ±10° of the conjugate Mlat. As shown in Figure 17d, this point is shifted slightly from the conjugate Mlat. We calculated the CPC for all months and reference MLTs for Mlat = 20°S, and plot the results in Figure 17e. We see that the CPC generally has a diurnal cycle with higher daytime values, peaking at ~0.6 near 1600-1800 MLT and minimizing at ~0.4 at 0000 MLT. This diurnal variation is consistent with Mlat = 20°N CPC presented by Yue et al. (2007).
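The CPC search described above amounts to taking the maximum of the composite correlation inside a ±10° window centered on the conjugate Mlat. A minimal sketch, using a synthetic meridional cut and hypothetical function and variable names:

```python
import numpy as np

def conjugate_point_correlation(mlat, corr, ref_mlat, half_window=10.0):
    """Return (CPC, Mlat of the peak), searched within +/- half_window of the conjugate Mlat."""
    conjugate = -ref_mlat
    in_window = np.abs(mlat - conjugate) <= half_window
    idx = np.flatnonzero(in_window)[np.argmax(corr[in_window])]
    return corr[idx], mlat[idx]

# Synthetic meridional cut through a composite: main peak at 20N and a weaker
# secondary peak placed slightly equatorward of the conjugate point (18S).
mlat = np.arange(-60.0, 60.1, 2.0)
corr = (np.exp(-(mlat - 20.0) ** 2 / 200.0)
        + 0.6 * np.exp(-(mlat + 18.0) ** 2 / 200.0))
cpc, peak_mlat = conjugate_point_correlation(mlat, corr, ref_mlat=20.0)
print(cpc, peak_mlat)
```

Searching a window rather than sampling exactly at the conjugate Mlat allows for the small equatorward shift of the secondary peak noted in the text.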
In Figure 18 we compare the CPCs calculated from the three datasets for 2014. These are all calculated with the EOF1 filtering and are shown as a function of the month (top row) and MLT (bottom row). The mean monthly CPCs (Figs. 18a-18c) range from ~0.3 to 0.7 for SAMI3/climo and ~0.5 to 0.6 for SAMI3/WACCM-X. JPL/GIM has somewhat lower values and less month-to-month variability, while SAMI3/climo shows very large variability, peaking in March and June. Diurnal variations are seen in all three datasets, although the timing of the peak varies. JPL/GIM shows a peak at 1200 to 1600 LT, while SAMI3/climo peaks from 0200 to 0800 LT. Yue et al. (2007) calculated CPCs at ±20° Mlat (their This may reflect differences in solar and magnetic variability between the two years, but also may be due to differences in approach. Whereas we use daily values of TEC over monthly intervals, Yue et al. (2007) used DTEC over three-month intervals. CPCs for 2010 are shown in Supplemental Figure S10. Both JPL/GIM and SAMI3/climo show similar ranges of mean correlations and variability in 2010 as in 2014. Further analysis of the CPCs is warranted to determine the cause of the differences in year and dataset, but overall we find that robust CPCs are observed both in SAMI3 and in the JPL/GIM for most months and local times.

Correlations of EOF modes with F10.7 and Ap indices
We conclude by comparing the correlations of EOF modes with the F10.7 and Ap indices. Figure 2c showed correlations of F10.7 with the EOF1 index at 12 UTC for SAMI3/WACCM-X for 2014. With the exception of March, the correlation with F10.7 is high (>0.6) during 2014. Figure 2d shows that the correlations of Ap with EOF1 are generally weak (<0.6) except for February, which is a month with a large Ap standard deviation. Here we will further quantify how these correlations depend on the dataset, month, universal time, and EOF mode. Figure 19 plots the correlation of EOF1 and EOF2 with F10.7 for all months and times (UTC) for the three datasets. Note that at each time when the correlation is significant (at the 95% confidence level, using the p-value from a two-sided t-test), a circle is included on the line plot. Line segments that do not show circles indicate statistically insignificant correlations. The SAMI3/WACCM-X (red lines) shows significant correlations of F10.7/EOF1 at all universal times for all months except February, where three universal times (0000, 0200, and 0400 UT) are not significant. Therefore, of the 144 possible month/time combinations, 141 (98%) have the EOF1 significantly correlated with F10.7. The JPL/GIM EOF1 generally correlates very well with F10.7 in April-December but does not correlate as well in the first part of the year. In total, 114/144 or 79% of the month/time points are significant. Supplemental Figure S11 shows the correlations of EOF1 and EOF2 with F10.7 for 2010 for SAMI3/climo and JPL/GIM. Significant correlations are observed for a wide range of months and times, although the peak correlations are smaller than the peak correlations in 2014. April and June show very small correlations for both datasets, while March, May, and July show very large correlations.
We also performed a similar analysis with the Ap index.
Figure 20 shows that while EOF1 sometimes exhibits strong correlations with Ap in 2014 (e.g., SAMI3 data for February, JPL/GIM data for August), in general, the correlation is rather weak. For EOF2 the correlation is considerably stronger, showing significant correlations for all datasets in numerous months and times. The SAMI3/climo in particular shows very strong correlations of EOF2 with Ap, which is consistent with the HWM neutral winds being strongly driven by Ap, whereas SAMI3/WACCM-X is driven by more realistic neutral winds from a global circulation model. The Ap correlations with EOF1 and EOF2 for 2010 are shown in Supplemental Figure S12 for SAMI3/climo and JPL/GIM. For the SAMI3/climo the Ap/EOF1 correlations are generally small and insignificant (except for January and April), while Ap/EOF2 correlations can be quite large (e.g., May through October and December). It appears that even in this year of relatively weak variability the dominant mode is driven by F10.7, while Ap provides a strong secondary forcing. For JPL/GIM in 2010, there are some significant Ap/EOF1 correlations, and these are generally weaker than in SAMI3/climo. The Ap/EOF2 correlations are generally weak for JPL/GIM. Further analysis will be necessary to elucidate the exact physical processes involved in these EOF correlations, but these diagnostics motivate future research in understanding TEC correlations. We note that EOF analyses can be done with multiple variables to examine the variability of different components of the ionospheric system.
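The significance criterion used in this section (95% confidence, two-sided t-test) can be illustrated for a series of 30 daily values, as in the 2-31 January examples. The sketch below is an assumption-laden simplification: instead of computing a p-value, it compares the t statistic of the correlation with the tabulated two-sided 5% critical value for 28 degrees of freedom (2.048), which gives the same decision for that sample size. Months with a different number of days would need a different critical value, and the function names are hypothetical.

```python
import math
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.sum(a * b) / math.sqrt(np.sum(a**2) * np.sum(b**2)))

def t_statistic(r, n):
    """t statistic for testing zero correlation from n paired samples (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r**2))

# Two-sided 5% critical value of Student's t for 28 degrees of freedom (30 samples).
T_CRIT_DF28 = 2.048

def is_significant_30_days(r):
    """True if a correlation from 30 daily values is significant at the 95% level."""
    return abs(t_statistic(r, 30)) > T_CRIT_DF28

# Smallest |r| that passes the test with 30 samples.
r_threshold = T_CRIT_DF28 / math.sqrt(28 + T_CRIT_DF28**2)
print(f"threshold |r| for 30 samples: {r_threshold:.3f}")
```

For 30 samples the threshold is a little above |r| = 0.36, so correlations such as 0.967 or 0.788 quoted earlier are well inside the significant range, while values near 0.1 are not.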

Summary and conclusions
This study examined composites of low-latitude TEC spatial correlations as a function of (1) three datasets, (2) model grid, (3) local time, (4) month, (5) year, and (6) Equator versus sub-tropics. The results showed the SAMI3/climo, driven by empirical neutral fields, had unrealistically large correlation structures, while SAMI3/WACCM-X, driven by more realistic winds based on a dynamical model, showed correlations that agreed better with the analyses from JPL/GIM. Base correlation values showed a strong correlation with solar variability in all datasets, as well as diurnal variability, with peak values in the late morning to evening hours. Removal of the base correlations was attempted by filtering the leading EOF from the TEC dataset. This resulted in more distinct Gaussian shapes that could be fitted to zonal and meridional widths. The zonal widths showed a semi-diurnal variation, while the meridional width varied diurnally. Conjugate correlations were also examined. Both the SAMI3 runs and JPL/GIM analyses showed strong correlations between magnetic latitudes of 20°N and 20°S. The SAMI3/WACCM-X and JPL/GIM showed a weak diurnal cycle in the conjugate correlations, peaking in the late afternoon, which resembled the meridional base correlations.
The results from this study show that all three datasets examined have Equatorial correlation structures that can be characterized by a globally-constant base value and a local width. These two features can often be separated by the removal of EOFs describing correlations associated with solar variability. In application to DA, this suggests that it may be possible to break the analysis problem into a global part that solves the amplitude of one or more leading EOFs and a local part that could be solved using traditional localization methods. Off the Equator, there is a secondary issue of the conjugate correlations. As far as we know, none of the current ionospheric DA methods deals specifically with this feature, except perhaps indirectly in cases where localization scales are large enough that ensembles capture this correlation. However, DA systems are generally formulated in the geographic reference frame. Moving to a geomagnetic reference frame may provide a way forward since correlations could be carried along magnetic field lines. For SAMI3, which separates along-field and cross-field motion, this approach could be very appealing. However, characterizing the BECM based on full 3-D electron density fields rather than 2-D TEC fields will be much more challenging.
Future work will involve analyzing the TEC correlation structures in the middle and high latitudes. In addition, we would like to analyze the EOF structures in more detail to understand how these analyses and forecasts respond to different forcings, including additional magnetic indices. Finally, we plan to compare these analyses with the correlation structures of actual SAMI3 ensembles used in a DA system. The knowledge gained from the work done in this paper will help to inform how best to generate ensemble perturbations for ionospheric DA as well as localization methods for reducing spurious correlations.
Supplemental Figure S2. Supplemental Figure S3. SAMI3/climo and JPL/GIM TEC correlation composites for 2-31 January 2010 as a function of latitude and LT. The reference point is marked by the black dot. It is located on the Equator, and the reference LT varies with panel (0000, 0600, 1200, and 1800 are shown). Thick black lines indicate correlation of 0.5, 0.7, and 0.9. Thin black lines are SZA on 16 January 2010 at 10°, 20°, . . ., 90°.
Supplemental Figure S4. SAMI3/climo and JPL/GIM line plots of TCC versus relative LT (black) for 2-31 January 2010 at four reference local times (0000, 0600, 1200, and 1800 LT). The reference latitude is the Equator. The Gaussian fits are shown by the dashed blue lines, with the two fitted parameters shown on the bottom of each plot. Base is unitless and width is in hours LT.
Supplemental Figure S5. Supplemental Figure S7. SAMI3/climo, SAMI3/WACCM-X, and JPL/GIM line plots of TCC versus relative LT (black) for 2-31 January 2014 at four reference local times (0000, 0600, 1200, and 1800 LT). The leading EOF has been removed from the TEC time series before calculating the TCCs. The reference latitude is the Equator. The Gaussian fits are shown by the dashed blue lines, with the two fitted parameters shown on the bottom of each plot. Base is unitless and width is in hours LT.
Supplemental Figure S11. Correlations of the EOF1 and EOF2 amplitudes with F10.7 as a function of time (UTC) and month of 2010. If the correlation is significant (p < 0.050, i.e., the 95% confidence level) then a circle is included on the plot. The colors indicate the experiments SAMI3/climo (black) and JPL/GIM (blue). Thick lines and filled circles are for EOF1, while thin lines and open circles are for EOF2.
Supplemental Figure S12. Correlations of the EOF1 and EOF2 amplitudes with Ap as a function of time (UTC) and month of 2010. If the correlation is significant (p < 0.050, i.e., the 95% confidence level) then a circle is included on the plot. The colors indicate the experiments SAMI3/climo (black) and JPL/GIM (blue). Thick lines and filled circles are for EOF1, while thin lines and open circles are for EOF2.