Automated Coronal Hole Identification via Multi-Thermal Intensity Segmentation

Coronal holes (CH) are regions of open magnetic fields that appear as dark areas in the solar corona due to their low density and temperature compared to the surrounding quiet corona. To date, accurate identification and segmentation of CHs has been a difficult task due to their comparable intensity to local quiet Sun regions. Current segmentation methods typically rely on the use of single EUV passband and magnetogram images to extract CH information. Here, the Coronal Hole Identification via Multi-thermal Emission Recognition Algorithm (CHIMERA) is described, which analyses multi-thermal images from the Atmospheric Image Assembly (AIA) onboard the Solar Dynamics Observatory (SDO) to segment coronal hole boundaries by their intensity ratio across three passbands (171 \AA, 193 \AA, and 211 \AA). The algorithm allows accurate extraction of CH boundaries and many of their properties, such as area, position, latitudinal and longitudinal width, and magnetic polarity of segmented CHs. From these properties, a clear linear relationship was identified between the duration of geomagnetic storms and coronal hole areas. CHIMERA can therefore form the basis of more accurate forecasting of the start and duration of geomagnetic storms.


Introduction
The Sun holds a high impact for space weather phenomena through the interactions of solar wind streams with Earth's magnetosphere which can result in geomagnetic storms (Tsurutani et al., 2006). Solar wind streams can damage satellites through differential and bulk charging, and geomagnetic storms may interfere with electrical power networks via geomagnetically induced currents (Blake et al., 2016;Boteler, 2001;Huttunen et al., 2008;Marshall et al., 2012). High speed solar wind streams can be traced back to low density, dark, open magnetic field regions of the corona, known as coronal holes (CHs); (Altschuler et al., 1972;Antonucci et al., 2004;Cranmer, 2009;Fujiki et al., 2005). Here, a new detection algorithm is discussed which segments coronal boundaries from images of the solar corona to ultimately improve predictions of solar wind streams and their properties at 1 AU, and hence, their impact on the Earth's magnetosphere. When observing the corona, CHs appear as dark regions on the solar disk and off the solar limb (see Figure 1), particularly in the 193 Å range due to their low density (∼4×10 8 cm -3 ) and temperatures (∼0.7 MK) compared to the surrounding quiet Sun regions (∼1.6×10 9 cm -3 and ∼1 MK respectively; Phillips, 1995). CHs, while not visible in any other atmospheric solar layer, are indi-cators of open magnetic field regions which extend far into the heliosphere (Krieger et al., 1973). These extended fields give CHs an ill-defined cut off height and enable CHs to be observed extending far out to the outer limits of the low corona. These high altitude CHs are observed off the solar limb coinciding with near-limb, on-disk CHs, and this phenomena is particularly visible coinciding with polar CHs.
Accurate detection and segmentation of CHs is made difficult by their irregular profile and comparable intensities to the nearby local quiet Sun regions. Detection becomes further difficult due to off-disk-center CHs often becoming obscured or occulted by brighter, high-density features such as streamers or active region loops. Initially, CH segmentation was done by eye based on two-day averages of He I (10830 Å) images and magnetograms (Harvey and Recely, 2002) taken by the Kitt Peak Vacuum Telescope (KPVT). This method adopted the use of magnetograms to separate CHs from other dark coronal features due to the uni-polar nature of CHs and is still popularly used (Cranmer, 2009). Henney and Harvey (2005) attempted to automate this detection process and, in turn, created the first automated CH segmentation algorithm. Post-launch of the Solar and Heliospheric Observatory (SOHO), new automated segmentation algorithms were developed that used the clarity of CHs in the 195 Å passband for identification purposes. Scholl and Habbal (2008) implemented the primary multi-passband detection method. Contrast enhanced images in the 171 Å, 195 Å, and 304 Å passbands from the Extreme Ultraviolet Telescope (EIT) on-board SOHO were used to differentiate CH 'candidate features' from quiet Sun and active regions. These candidate features were then separated into CHs and filaments by comparison with the Harvey and Recely (2002) He I coronal hole maps and corresponding magnetograms. The Coronal Hole Automated Recognition and Monitoring (CHARM) algorithm, developed by Krista and Gallagher (2009), uses a histogrambased intensity thresholding technique to differentiate dark coronal regions from quiet Sun regions using only the 195 Å or 193 Å passbands from SOHO or the Solar Dynamics Observatory (SDO) respectively. These dark coronal candidates were separated into CH and non-hole regions through the use of magnetograms. The Spatial Possibilistic Clustering Algorithm (SPoCA; Verbeeck et al., 2014) utilizes fuzzy clustering algorithms and distinctions on area and lifetime of features to differentiate CHs from other features. This algorithm is available for viewing at helioviewer.org. Reiss et al. (2015) designed a machine learning algorithm which uses a global intensity thresholding method and the SPoCA algorithm to separate CH features from surrounding regions. Recently, Boucheron et al. (2016) developed a detection algorithm which allowed for a varying boundary to search for a maximal change in average intensity between regions inside and outside of this varying contour.
Here, a new method for the detection and segmentation of CHs is discussed. This method is based on multi-thermal intensity segmentation across three EUV wavebands (171 Å, 193 Å and 211 Å) from the Atmospheric Imaging Assembly (AIA; Lemen et al., 2012) on-board SDO (Pesnell et al., 2012). This method is used in an automated CH segmentation algorithm, here in named the Coronal Hole Identification via Multi-thermal Emission Recognition Algorithm (CHIMERA). CHIMERA analyses intensities in these three wavebands to estimate the temperature and density of individual pixels and then segments pixels with similar properties to a CH. Next, magnetogram analysis removes smaller non-CH regions that can fall within these constraints by excluding features that don't exhibit the unipolar nature of CHs. In this paper, the stability and accuracy of CHIMERA is demonstrated, and the outputs available for comparison with solar wind measurements at 1 AU are discussed. To categorize a large database of CHs, CHIMERA is automated and runs daily on solarmonitor.org, providing one CH map per day.

Observations
CHIMERA was developed for use with image sets from AIA and HMI onboard SDO, namely the wavebands: 171 Å, 193 Å, and 211 Å and line of sight magnetograms hmi.M 720s. These EUV passbands were centered on the spectral emission lines of FeIX (171 Å), FeXII (193 Å), and Fe XIV (211 Å) and cover a peak temperature range of 7×10 6 -2×10 7 K. Each passband takes 4096×4096 pixel images at a cadence of 12 seconds for AIA and 12 minutes for low noise HMI enabling accurate, time precise feature segmentation. Due to the unique characteristic temperature responses and emission across the AIA wavebands of each coronal feature, it is possible to segment an individual feature by the ratios and magnitudes of emission in each waveband relative to each other. Here, the extraction of CH boundaries from the background coronal features is described by segmenting pixels that exhibit unipolarity, low temperature and low relative density compared to surrounding quiet Sun pixels.

Data Analysis
In this Section, the operation of CHIMERA and the methods used to extract CH boundaries from the three AIA wavelengths (171 Å, 193 Å and 211 Å) and one simultaneous snapshot from the Helioseismic and Magnetic Imager (HMI; Schou et al. 2012;Scherrer et al. 2012) is discussed. Figure 2 shows a flowchart describing the segmentation and verification steps of CHIMERA, and is explained further in the following sections: Multi-thermal intensity segmentation (Section 3.1),  , and (f), respectively. Colour in these zoom-ins denotes the number of solar disk pixels emitting at these wavelengths, with orange/red implying a large number of pixels and purple/blue implying few. constructed from procedures found in the SolarSoft library (Freeland and Handy, 1998) with the Interactive Data Language (IDL) programming language.

Multi-thermal Intensity Segmentation
Due to the thermal and density properties of coronal features, each feature has a distinct emission spectra. For example, CHs have a low relative temperature and density, hence they have peak emission in the 171 Å passband and low intensity across other passbands. Figure 3 illustrates an intensity cut across a disk center CH and the contrast, calculated using the Michelson contrast equa-tion (Equation 1; Michelson, 1927), between this CH to the surrounding corona plasma, which is only visually noticeable in the 171 Å, 193 Å and 211 Å passbands.
For this equation, I max is the mean intensity of non-CH pixels near this CH, and I min is the mean intensity of CH pixels within the CH center. By comparing intensities of individual pixels across these three passbands, it is expected for coronal features to cluster by their distinct emission spectra. Figure 4 (a, c, e) shows three scattergrams comparing the intensities of individual on-disk pixels across these three passbands for a selected date, 31 October 2016. Notably, a high density of pixels, contained within the red boxes in Figure 4 (a, c, e), exhibit low levels of emission. By taking a close examination of these low intensity pixels, 2D histograms, presented in Figure 4 (b, d, f), can be created to demonstrate the clustering of points with specific intensities in the analyzed passbands, with red colours implying a higher density of points, and purple/blue implying lower densities. Two regimes appear to exist in each histogram: a large, hotter cluster exists caused by a large number of quiet Sun pixels typically present on the Sun. Smaller clusters also exist, separated from the quiet Sun clusters by valleys, which have a preferred emission in the 171 Å wavelength. This separation of regimes is as expected from the typical temperature and densities of CHs and the ambient corona.
Finding an optimum fit to the minimum valleys between the two regimes can be used to separate CH candidates from non-CH regions. This valley is found for I 193 v I 171 and I 211 v I 171 by finding a number of minima in I y for fixed values of I x . Due to the irregular shape of the valley in I 211 v I 193 , the minima for this instance are instead calculated along lines with slopes parallel to the slope of the dominant intensity-occurrence ridge over the entire I 211 v I 193 regime (implemented here as ∼I 211 =(0.3)I 193 ). Converting these 2D histograms to log-space extends the valleys existing between the clusters, as shown in Figure 5 (a,c,e). Fitting a line using a least square regression to the minima of the valley in log-space enables the construction of an optimum segmentation line, as follows in Equations 2 and 3, where c = I y (I x = 0), m represents the slope of a linear equation, I y and I x are variable intensity measurements from a solar image. A segmentation line created by this method allows a variation in both slope and shape in intensity-space, as illustrated in Figure 5 (b,d,f). Feature intensities across these wavebands can vary due to instrument degradation, or may vary with the solar cycle. To accommodate these potential variations in intensity, the segmentation line is altered relative to the average intensity of all on-disk pixels for each wavelength. In the case of low intensity in the 193 Å passband the algorithm alters the line to be more stringent with the 193 Å intensities. This is done by multiplying the line (along the I 193 axis) by: (< I 193 >)/(< I m >), where < I m > is the measured mean intensity of on-disk pixels for 31 Oct 2016 in the 193 Å passband. This method is similarly performed for the I 171 and I 211 measurements. This method gives a simple, rapid, and robust change to the line parameters to continue accurate segmentation of CHs. By taking all regions with intensity ratios below these segmentation curves as CH candidates, three segmented maps are obtained showing all regions that fit these selection criteria.  (a) displays a tri-color AIA image composed of the 171 Å (blue), 193 Å (green) and 211 Å (red) passbands for 31 October 2016, while the three segmented maps shown in Figure 6 (b, c, d) are created by the three segmentation lines from Figure 5 (b,d,f). Notably, these maps have similarly segmented regions, however each is inclusive of additional features not present in the other two segmentations, such as a large filament region being detected in both the I 211 v I 193 and I 193 v I 171 segmentations but being absent in the I 211 v I 171 segmentation. Individually, these maps are insufficient to completely segment CH regions from other corona features.

Verification Processes
As seen in Figure 6 (b, c, d), each segmented map is excessively inclusive of CH candidates. However, incorrectly segmented regions are not omni-present in these segmentation maps, while CH regions are. A logical conjunction of the three segmentations removes these extra features while maintaining the detected CH pixels. The output binary map of this process is presented in Figure 7, having removed the majority of mal-detections while maintaining CH detections.
From this newly constructed binary map potential erroneous detections are further excluded by rejecting candidates below 1000 arcsec 2 (∼30"×30") in size. Small areas could be the result of spurious detections caused by overlapping features or short lived events. This minimum area cutoff follows from the idea of CHs being expansive regions of open magnetic field. This process removes the majority of erroneous detections, however any remaining non-CH regions are removed through the use of HMI magnetograms and the unipolar nature of CHs. This is completed through Fig. 7. Through the logical conjunction of the three segmentations in Figure 6 (b, c, d), CH candidates are obtained from CHIMERA that match the thermal properties exhibited by CH regions on 31 October 2016.
an approximation of the mean magnetic field of the CH candidate. Any candidates not unipolar in nature are removed from detection, i.e when the mean radial magnetic field strength averaged over the pixels enclosed within the CH candidates boundary is approximated to zero, <B> r ≈0 G. The threshold for this cut is calculated relative to the total area of the CH candidate, with the highest stringency being placed on smaller candidates. A small CH candidate is accepted if <B> r is greater than 1 G, and a large CH candidate with an area above 60000 arcsec 2 is accepted if <B> r is greater than 0.1 G. This varying stringency prevents large candidates being excluded due to a wider range of polarities being present within their boundaries. These thresholds were found from empirical measurements of magnetic properties of the candidates that were present after the previous conjunction and minimum area cut-off. After these final verification steps, segmented CH regions are obtained, as displayed in Figure 8.

Off-Limb CH Detections
In CHIMERA's segmentations some detections exist off the solar limb. These high altitude detections are caused by the open magnetic fields that exist within CH boundaries extending far out into the high corona causing a similar low density and temperature region. These regions, known as plumes, trace out a similar surface shape as that of on-disk CH regions when projected back onto the coronal surface and have matching thermal properties to that of a CH. Furthermore, these plumes are seen to appear off the polar limbs coinciding with polar CHs and off the eastern solar limb pre-appearance of an on disk CH. Thus, it is concluded that CHIMERA is capable of detecting the high altitude component of surface CHs. This enables the use of detected regions off the eastern limb as precursors, which can give insight to the appearance, and potential latitudinal width of a CH, before it rotates onto the near side of the solar disk.

Results
In this section the accuracy, stability and applications of CHIMERA are discussed, as well as the exact properties calculated for every CH, and potential relations that can be observed between CHs and the effects of the solar wind at 1 AU.

Coronal Hole Identification
CHIMERA efficiently and effectively extracts CH boundaries in a runtime of ∼30 seconds through the use of the 171 Å, 193 Å, 211 Å passbands and HMI magnetograms, and the resulting output is shown in Figure 8. The level of accuracy CHIMERA can attain is demonstrated through a comparison between CH boundaries extracted by CHIMERA to other extractions methods for 24 October 2016 in Figure 9. This figure displays (a) a manually segmented CH map from the National Oceanic and Atmospheric Administration (NOAA) Space Weather Prediction Center (SWPC) 2 , (b) a similar manually segemented map from the Met Office Space Weather Operations Center (MOSWOC) 3 , (c) a segmented map from the Automatic Solar Synoptic Analyzer (ASSA) 4 , (d) a segmented map from the Spatial Possibilistic Clustering Algorithm (SPoCA) 5 , (e) the result of a Potential Field Source Surface (PFSS) 6 extrapolation for comparison of methods with open magnetic field regions, and (f) a segmented map from the CHIMERA 7 method of segmentation. As can be seen in this comparison, CHIMERAs detailed segmented CH boundaries follow a similar shape to that of hand-segmented maps, furthermore, CHIMERA visibly better prevents the fracturing of CH detections than that of the SPoCA algorithm. Due to this high level of correlation of CHIMERA's segmentation ability to that of historically accurate segmentation by eye methods, it is concluded that CHIMERA is capable of accurately segmenting CH from the quiet Sun. The stability of CHIMERA's detections are demonstrated in Figure 10, where three solar images segmented by CHIMERA for the consecutive days 29-31 January 2017 (a-c) show the short-term stability of CHIMERA and (d) shows the mid-term stability of CHIMERA by comparing the normalized projected CH area of six randomly selected CH regions between January 2016 to July 2017 compared to their centroid longitude. In this plot small scale variations are due to projection effects of the irregularly shape CH regions. The short-term stability images were selected due to the presence of a large, polar CH (CH1) passing through the central meridian, which caused a minor storm at 1 AU, three days later. This storm registered as a Kp5o storm, according to the World Data Centre for Geomagnetism (WDCG), Kyoto 8 . Stability in detections is clearly visible, with each image being a logical one day rotation of the previous with minor boundary morphing expected of a large CH region over a daily timescale.

Coronal Hole Property extraction
From these CH segmentations, CH properties that may hold the highest impact on solar wind properties at 1 AU can be calculated. Table 1 illustrates all properties extracted from CH regions by CHIMERA.
Each property extracted by CHIMERA gives some insight into the current and potential geoeffectivity of a CH. CH area gives possible estimation of the high speed solar wind duration (Krista and Gallagher, 2009;Krieger et al., 1973) and velocity (Nolte et al., 1976), CH extent and positioning can give further insight into duration, as well as arrival time of solar wind streams (Cranmer, 2002), and magnetic polarity and flux are commonly associated with potential geo-effectivity of solar wind streams. Making these CH properties readily available outputs of CHIMERA allows a statistical analysis of the detected CHs and the solar winds they create, an example of which is displayed in Figure 11. This figure draws a relation between CH area, geomagnetic storm duration and peak storm intensity, taken from the WDCG. Displayed is a best fit line, ∆t = 0.26±0.01A + 0.53±0.11, correlating these two properties where ∆t is the geomagnetic storm duration in days and A is the coronal hole area in percent of the solar disk. From this relationship estimations can be made on the geomagnetic storms a given coronal hole will produce, and, as expected, larger coronal holes will typically produce longer geomagnetic storms.
CHIMERA is however not without limitations, like all automated systems. A predominant weakness in CH detection algorithms is the inability to detect CH regions when occulted by brighter, denser regions. Due to pixel intensities being an integration along the line of sight, pixels observing across equal projection of CH and quiet Sun regions will be seen as quiet Sun by CHIMERA. Normally this phenomenon is of little to no importance, but due to CHs being a low density feature they are more easily occulted than other coronal features. Furthermore, with solar wind properties having a high correlation with CH extent and area, inaccurate estimations can cause incorrect estimations of geo-effectivity of CHs.  Table 1. CH properties extracted by CHIMERA. Fig. 11. A comparison of measurements of CH area, the duration of the geomagnetic storms they produced and their peak Kp index taken from the WDCG. The best fit equation is displayed in the upper left, where ∆t is the geomagnetic storm duration in days, and A is the coronal hole area in percent of the solar disk.

Conclusions
A new fast (∼30 second runtime) and robust CH identification technique has been developed which uses HMI magnetograms and the three AIA passbands 171 Å, 193 Å, and 211 Å to segment CH boundaries. The algorithm classifies every pixel in each image by the exhibited thermal and density properties across these 3 passbands. This unique classification method can be further extended to classify other coronal features such as filaments, which are similarly dark to CHs but have a peaked emission in the 211 Å passband and are bipolar in nature. Upon completion of segmentation, CHIMERA calculates the centroid, area, latitudinal and longitudinal extent, CH percentage coverage, and many other properties for comparison with solar wind measurements at 1 AU and predictions of upcoming events up to the following 10-14 days.
The primary limitation of this segmentation method is classification of partially or fully occulted CHs. Low density and dim coronal features are highly susceptible to occultation from brighter features, this is particularly prevalent closer to the solar limb where occultation is more common. This effect implies a CH can seem to disappear before it reaches the solar limb, which can cause a CH to become undetectable 1-2 days before expected. These effects can cause errors in boundary segmentation and the calculation of near-limb CHs properties, however, it is currently impossible to accommodate these effects without the use of multiple viewpoints. Unfortunately, as of yet, there are no multiple viewpoints of the Sun in the three used passbands, although, this method of segmentation could be extended to other passbands which exhibit similar contrasts between CHs and other coronal features.
CHIMERA can be used to link properties of on-disk CH regions with the properties of their high speed solar wind streams and the geomagnetic storms they produce. From the CH properties extracted by CHIMERA and measurements of geomagnetic activity it is possible to draw correlations between these two phenomenon. Figure 11 shows one possible comparison of CH properties, calculated by CHIMERA, to the properties of the geomagnetic storms they produced. These comparisons can be extended further through machine learning algorithms to draw more relations between CH properties and the properties of the solar wind streams they produce.
Further study is required on the morphology of CHs and their interplanetary effects via the highspeed solar wind. Following on from this work, the multi-thermal analysis method of classification can be applied to other coronal features, allowing the segmentation and detection of filament regions or complex active regions and loop structures. Furthermore, outputs of feature properties can be compared statistically with solar wind effects at 1 AU to improve space weather forecasting methods. With the completion of this algorithm, the future aim for this research is to combine this CH detection software with solar wind models to create a real time emulation of solar CHs and the solar wind they produce up to ∼1 AU. This combined with statistical analyses of solar wind properties will enable accurate predictions of solar wind arrival and lifetimes of geomagnetic storms affecting the Earth's magnetosphere. Furthermore, this method of segmentation will be updated to segment pixels using 3D intensity histograms and tested on GOES-16 X-ray and EUV images in the future.