Leveraging the Mathematics of Shape for Solar Magnetic Eruption Prediction

Current operational forecasts of solar eruptions are made by human experts using a combination of qualitative shape-based classification systems and historical data about flaring frequencies. In the past decade, there has been a great deal of interest in crafting machine-learning (ML) flare-prediction methods to extract underlying patterns from a training set---e.g., a set of solar magnetogram images, each characterized by features derived from the magnetic field and labeled as to whether it was an eruption precursor. These patterns, captured by various methods (neural nets, support vector machines, etc.), can then be used to classify new images. A major challenge with any ML method is the \textit{featurization} of the data: pre-processing the raw images to extract higher-level properties, such as characteristics of the magnetic field, that can streamline the training and use of these methods. It is key to choose features that are informative, from the standpoint of the task at hand. To date, the majority of ML-based solar eruption methods have used physics-based magnetic and electric field features such as the total unsigned magnetic flux, the gradients of the fields, the vertical current density, etc. In this paper, we extend the relevant feature set to include characteristics of the magnetic field that are based purely on the geometry and topology of 2D magnetogram images and show that this improves the prediction accuracy of a neural-net based flare-prediction method.


Introduction
Sunspot active regions manifest as large-scale, high-magnitude, dipolar structures in images of the magnetic field at the surface, or "photosphere," of the sun. They are the source regions for the largest solar magnetic eruptions, which produce flares, coronal mass ejections (CMEs), and energetic particle events that can drive important space weather events. Photospheric magnetograms are 2D samples of the structure of the full 3D solar magnetic field and thus can provide important clues about the increasing complexity of the magnetic field in the lead-up to a magnetic eruptioninformation that can potentially be leveraged for the purposes of prediction. While there has been a great deal of recent work on machine-learning based algorithms for predicting solar eruptions from magnetogram data, the features used by these algorithms have been predominately physics-based: taking the curl of the magnetic field to get currents, computing its gradient, summing up its absolute values, etc.
We propose a new approach to this task: computing the topology and geometry of the structures in 2D magnetograms. Instead of deriving physical quantities from these data sets on a per-pixel basis, or attempting to model the full 3D coronal magnetic field or field line connections from the 2D information that is captured in magnetograms, we formally quantify their structure using the fundamental mathematics of shape. We argue that-even though this procedure ignores the 3D structure or connectivity of the full field-it enhances the predictive skill of current methods. Indeed, it is the current operational practice for human forecasters to predict solar eruptions by examining sunspot images and/or magnetograms and using the McIntosh (McIntosh, 1990) or Hale (Hale et al., 1919) classification systems to categorize active region complexity using alphabetical designations. Each category of active region has a statistical "24-hour eruption probability" derived from historical records that is then reported (following forecaster adjustments for factors such as rate of flux emergence) as the eruption forecast for a particular active region (Crown, 2012). Recent statistical methods extend this approach by using historical flaring rates, together with a Poisson process hypothesis, to develop more complicated models. For example, Gallagher et al. (2002) use the McIntosh classification to determine the probabilities of C-, M-, or X-class flares and Wheatland (2004) use a power law distribution of the flare magnitudes to determine an empirical eruption probability. However, these methods have not seen much use in operational forecasting, primarily because they do not show greater predictive capability than the historical forecasts that use look-up tables.
Recognizing that the magnetic reconnection that triggers eruptions takes place in the upper atmosphere (corona) of the sun, attempts have been made to model the coronal magnetic field using the measured photospheric field as a boundary condition. The simplest method to extrapolate the surface field into the corona assumes zero current, so that the field potential is a solution to the Laplace equation (Barnes and Leka, 2006;Barnes et al., 2005;Wang and Sheeley Jr., 1992). However, potential fields cannot store energy-they are lowest energy states-and thus cannot model the build-up of energy leading to a CME. Other strategies include Nonlinear Force-Free Field (NLFFF) extrapolations (Wiegelmann and Sakurai, 2012;Aulanier et al., 2005;Demoulin et al., 1996;Priest and Démoulin, 1995), which are known to correlate well with the sites of X-ray flare emission from eruptions. NLFFF models are presently the most promising avenue of coronal magnetic field and eruption modeling , but they have a large number of free parameters that require extensive manual tuning. Note that the photospheric boundary conditions are not sufficient to uniquely determine the solution (DeRosa et al., 2009;Metcalf et al., 2008;Schrijver et al., 2006), and hence the utility of these modeled 3D structures for forecasting appears to be limited at present.
An alternate approach models the connectivity between opposite polarity magnetic patches in the photosphere to identify significant structures such as "nulls" and "separatrix surfaces." For example, Barnes et al. (2005) develop a "Magnetic Charge Topology" (MCT) metric that is used to characterize eruption potential. Longcope (2005) reviews the application of topology to inferred field line connections and (Tarr and Longcope, 2012;Tarr et al., 2013) apply these methods to analyze the eruption potential of active regions. While these methods are sophisticated and the structures that they extract are meaningful in the context of solar eruptions, their computational complexity limits their operational application. 1 It is important to note that the use of the term "topology" in the methods described in the previous paragraph refers to the study of the shape and/or connectivity of 3D field lines above the photosphere (e.g., Longcope, 2005). This contrasts with the more general, mathematical definition of the term, which refers to the shape and connectivity of sets of any dimension-of which field lines are one instance. Ideas and techniques from this broader field of topology can be used to address many other important and potentially meaningful properties of solar data. Mathematically, topology distinguishes sets that cannot be transformed into one another by continuous maps with continuous inverses (homeomorphisms). Though this can obscure much of what is commonly meant by structure, its roughness can also be a virtue in that it will eliminate distinctions that could be due to unimportant distortions, e.g., those due to projection of the sun's spherical surface onto a 2D magnetogram. Computational topology, also known as topological data analysis (TDA), operationalizes this highly abstract framework for use with real-world data, which may be noisy and poorly sampled. Though computing the abstract topology from this type of data is not feasible, an aspect of shape that can be computed relatively straightforwardly is homology. In homology, shapes are distinguished according to their pieces, holes, voids, etc. This calculation can be reduced to linear algebra, essentially the calculation of the dimensions of the ranges and null spaces of certain matrices (Kaczynski et al., 2004). TDA has been used in many applications, including coverage of sensor networks (de Silva and Ghrist, 2007), structures in natural images (Ghrist, 2008), neural spike train data (Singh et al., 2008), and even the large-scale structure of the universe (Xu et al., 2019).
To compute the topology of a collection of points that samples an object requires an interpolation scheme to "fill in" the gaps between the points. The theory of persistent homology leverages interpolation to compute a shape as a function of scale. The result is encoded in a plot called a persistence diagram, which can be further processed to yield useful features. For example, such a diagram naturally captures how the structures in a magnetogram change as flux emerges into an active region during the evolution towards a flaring state. By contrast, computational geometrywidely used in computer graphics and computer-aided design-addresses the problem of extracting purely geometric information (line segments, polyhedra, etc.) from a data set. The spatial relationship of the positive and negative polarities in an active region, for example-and particularly their positioning relative to strong "neutral lines" in the photospheric field configuration-are known to be important indicators of flaring (e.g., Schrijver, 2007), and computational geometry can easily capture these properties in formal ways. We will demonstrate in §2 that a combination of topo-logical and geometric analyses can extract meaningful information from a series of magnetogram examples.
In the past decade, the large increase in magnetogram data afforded by space missions and advances in data access have shifted the forefront of flare-prediction research from empirical modeling methods to "data analytic" approaches such as machine learning. Camporeale (2019) summarizes the state of the art in machine learning approaches to space weather applications. In ML-based prediction applications, characteristic "features" of the photospheric magnetic field, sometimes combined with features seen in simultaneous Extreme UltraViolet (EUV) images of the solar corona, are used in a statistical sense to "train" a computational model to predict the probability of an eruption within a given time period (usually 24 hours). One example is a support vector machine (SVM) architecture to perform a binary classification of magnetograms as flaring or non-flaring (Bobra and Couvidat, 2015;Nishizuka et al., 2017;Boucheron et al., 2015;Yang et al., 2013;Yuan et al., 2010). Nishizuka et al. (2017) have also applied decision trees and clustering to the same task. Neural networks, which go beyond simple binary classification by learning complex nonlinear relationships among their inputs, have also been used to great advantage by Nishizuka et al. (2018). A variety of other ML algorithms, such as Bayesian networks (Yu et al., 2010), radial basis model networks (Colak and Qahwaji, 2009), logistic regression (Yuan et al., 2010), LASSO regression (Campi et al., 2019), and random forests or ERTs (Nishizuka et al., 2017;Campi et al., 2019) have also been implemented for solar flare prediction with some degree of success. Nishizuka et al. (2017) and Jonas et al. (2018) include Solar Dynamics Observatory (SDO) Atmospheric Imaging Assembly and EUV image characteristics in addition to magnetogram data in a fully connected neural network architecture. Benvenuto et al. (2018) use Fuzzy C-Means-an unsupervised machine learning method-in combination with some of the mentioned supervised methods for solar flare prediction. Approaches such as Guerra et al. (2018) and Kontogiannis et al. (2018), while not machine-learning approaches themselves, provide statistical tools for evaluating the engineered features in terms of their potential advantage in machine learning models. Finally, while most of the above methods focus on developing ML models using engineered features, recent methods have employed a convolutional neural network (CNN) approach which automatically extracts features from raw magnetogram data that are important to predicting flares (Huang et al., 2018;Park et al., 2018;Zheng et al., 2019).
The sophisticated methods described in the previous paragraph represent the state-of-the-art in ML-based solar eruption prediction. However, they predominately use physics-based features, and a careful, quantitative comparison study shows that none of them are significantly more skilled, and indeed are typically less skilled, than current human-in-the-loop methods employed at operational forecasting offices (Leka et al., 2019a,b;Barnes et al., 2016). Our goal is to improve upon these results using ideas and algorithms from computational topology and computational geometry to quantify the complexity in magnetograms and/or sunspot images. In this study, we analyze magnetogram images using different flux thresholds-sub-or super-level sets, in mathematical parlance-and extract structural signatures that, we conjecture, can be effectively leveraged as elements in a feature vector for machine-learning methods. As evidence in favor of that conjecture, we use high-fidelity vector magnetic field data from the Helioseismic and Magnetic Imager (HMI) instrument (Scherrer et al., 2011) on the NASA Solar Dynamics Observatory satellite (Pesnell et al., 2011) and show that adding shape-based signatures to existing feature vectors improves the 24-hour prediction accuracy of a neural-net based method. This is, to our knowledge, the first time that systematic quantitative measures of the shape of 2D magnetic structures in the photosphere have been developed for the purposes of flare prediction. In a sense, our approach is a mathematical systemization of the current ad hoc McIntosh or Hale classification systems. Though it employs topology, it is very different from the work described above that analyzes the magnetic field-line structure. We focus on the shapes of two-dimensional sets, restricting our analysis to the photospheric magnetic field structures. Our goal is to extract formal shape characterizations that can be leveraged by ML methods to improve flare prediction. We are not attempting to model the coronal magnetic field or determine field line connections between neighboring opposite polarity structures. Our approach differs from existing work on geometric (Schrijver, 2007;McAteer et al., 2010) and topological (Knyazeva et al., 2011;Makarenko et al., 2014) analysis of magnetogram features in that the goal is to derive features that improve MLbased methods for flare prediction rather than to discover any one physical property that is more or less predictive of flaring. We believe that this approach has merit since current operational flare prediction methods-the McIntosh and/or Hale classification systems used by human experts-are fundamentally based on active-region shape and geometry. In addition, Tarr and Longcope (2012) state that 'topological changes' can be shown to precede flaring activity in a typical sunspot active region, suggesting that active region shape, and its evolution, have a fundamental, meaningful connection to the physics of magnetic eruptions.
The following section goes into more depth on how to formulate and deploy computational topology and geometry in the context of HMI magnetograms. Section 3 presents a study of how features extracted from magnetograms using those techniques can improve the prediction accuracy of a specific machine-learning method for flare prediction. We conclude in §4. Fig. 1 shows a series of line-of-sight magnetograms of an active region before and during an eruptive period. In panel (a), the active region is newly emerged and is concentrated into a relatively compact and simple configuration. 2 Such initial emergence configurations store little free energy and are rarely associated with eruptions. However, as more magnetic flux emerges and this active region evolves under the influence of the plasma flows in the photosphere, it is stretched, rotated, and sheared into the complex shape shown in panel (b). While in this complex configuration, the active region produced a strong flare that had major impacts on Earth-based radio reception. The further development shown in panel (c), later in this sunspot's series of flaring events, is characterized by intense "polarity mixing," with positive and negative field in close proximity in highly sheared and stretched shapes.

Capturing the shapes of active regions
Topological Data Analysis or TDA (Kaczynski et al., 2004;Ghrist, 2008;Zomorodian, 2012) is exactly the right foundation for extracting and codifying the spatial richness of these images. Topology is the fundamental mathematics of shape: two sets are topologically equivalent if they are homeomorphic; 3 thus, as is often said, topology does not distinguish between a doughnut and a coffee cup. Unfortunately it is not practical to compute the topology of a set even if one has a complete description of it, let alone when one only has a finite number of samples. Topological data analysis was developed to address these challenges. It formally quantifies the shape of a data set according the so-called Betti numbers: the number of components (β 0 ), holes (β 1 ), voids (β 2 ), etc., in the data. Note that every topological space has a unique set of Betti numbers, but these do not completely classify the topology; for example, they would not capture the twist of a Möbius strip. Of course, a finite collection of points does not really have a "shape." TDA handles this by defining a scale parameter, for example, creating a manifold from a set of disconnected points by enlarging each point into a ball, as sketched in Fig. 2. This leads to the notion of -connectedness: a pair of points is treated as connected if they are within a distance of each other. A set of points connected by a graph with edges of length no more than is called an component. Gauss. The co-ordinates of each point (x, y) on a persistence diagram represent the birth and death times of a β 0 feature. Note that all the features lie above the x = y line in the diagrams since a feature cannot die before it is born. In terms of number of connected components at various scales, these diagrams reveal a clear change in the topology of the field structure around the X2.2-class flare that took place at 0910 UT on 6 September 2017.
dimension: one views dimension as reflecting the scale-dependent growth of the number of data points in a ball. However, TDA does not attempt to compute shape in the limit → 0, as one would for fractal dimension; rather it views the change in shape at finite values as reflecting macroscopic properties of the data (Robins et al., 2000(Robins et al., , 1998. More generally, the connections give rise to a simplicial complex, called a Rips 4 complex; this is essentially a list of these connections and groups of connections. Connections between pairs of points give edges (red lines in Fig. 2) that correspond to 1-simplices. When there is a cycle with three vertices-i.e., three -balls pairwise overlap-the associated triangle (green) is filled in; this is a 2-simplex, etc. The shape of the complex, and the Betti numbers that describe it, vary with the scale parameter: when = 0, β 0 is equal to the number of points in the data set and the Rips complex also contains no higher-dimensional simplices so β 1 = 0. As grows, nearby points are successively joined, causing components to grow and holes to form. For example for = 2 in the figure there are three components and one hole, so β = (β 0 , β 1 ) = (3, 1), but as reaches 3 , two triangles have formed to fill in the hole and β 1 becomes 0. Every hole eventually vanishes as the complex gets filled in, and so for large enough there is a single component with no holes, i.e., β = (1, 0).
All of this information about the spatial scales of the topological features in a data set can be captured in a plot called a persistence diagram (Edelsbrunner et al., 2000). Most -components, for example, have birth and death parameter values-where they appear and disappear, respectively, from the construction. A β 0 -persistence diagram has a point at ( birth , death ) for each component. Such a diagram is shown in Fig. 3 for the series of magnetograms in Fig. 1. Components that still exist at the upper end of the calculation interval-here = 5 pixels-are represented by triangles in the figure. When multiple components have the same birth and death values, the color of the icon in Fig. 3 indicates the number of components with that lifespan. One can also plot persistence diagrams for the other Betti numbers; the large hole to the right of center in Fig. 2(d), for instance, would give a point near ( 4 , 5 ) on a β 1 -persistence diagram, where 5 is the ball radius that causes all of the -balls around it to overlap, while the short-lived hole in panel (b) would be close to the diagonal near 2 . This rich representation of information about the underlying shape that is sampled by a set of data points can, we conjecture, be effectively leveraged by ML-based flare-prediction methods. In the case of the magnetograms in Fig. 1, we have preliminary evidence for this conjecture: the β 0 -persistence diagrams in Fig. 3 reveal a change in the topology of active region AR #12673 more than 24 hours before the X-class flare that took place at 0910 UT on 6 September 2017. To the eye, the change in the overall number of points on these diagrams is quite obvious; more important is the number and location of the points that lie far from the diagonal: i.e., those that persist for wide ranges of the scale parameter . The rich, multi-scale nature of the information captured in a persistence diagram, and the subjective nature of some of the associated definitions-viz., the notion of "far" from the diagonal-makes it a challenge to develop effective formal metrics for describing that structure. This challenge is a current focus in the TDA research community; see page 13. A full treatment of the associated solutions is beyond the scope of this paper.
While the pre-flare change in topology is encouraging, there is another issue here: the threshold used in these calculations to choose which pixels in the magnetogram to treat as points in the complex. Thresholding data into categories involves an awkward choice: rarely is there a crisp boundary between "low" and "high." (Indeed, this is a large part of the drive to use machine learning in data science.) While the radial magnetic field strength in an HMI magnetogram ranges up to about 5000 Gauss, there is no clear notion of what constitutes a good threshold value for defining coronal footpoint boundaries. Moreover, any threshold should be relatively insensitive to the instrumental noise threshold (on the order of 10 G) and the small-scale, background field (on the order of 100 G).
The proof-of-concept persistence diagrams in Fig. 3 were constructed from thresholded images containing only pixels with magnetic field intensity greater than 200 G; i.e., a super-level set of the intensity. Instead of some arbitrary choice, it would be preferable to view the threshold itself as a parameter to vary, thereby obtaining a sequence of super-level sets. An elegant alternative is to simply use the threshold value, rather than the spatial scale, as the persistence parameter. To eliminate the spatial scale parameter of the Rips complex that is appropriate for arbitrary point clouds like that shown in in Fig. 2, we note that magnetograms are pixed-based data. The simplest construction of a complex in this case uses a cubical complex. Here, one simply treats two pixels as connected if they share an edge or a vertex and each represents a flux level that is below the chosen threshold (i.e., sub-level thresholding). Fig. 4 shows a schematic of such a construction; Fig. 5(a-c) shows cubical complexes constructed for three threshold values for AR #12673. Note how the holes in Fig. 5(a-c) form and then fill in as the threshold changes. This gives a different view on persistent homology and a second kind of persistence diagram: one with the threshold value, rather than the connectedness scale, on the x and y axes, as shown in Fig. 5(d). This persistence diagram is far more detailed than the ones in Fig. 3 because it captures structures at a range of thresholds, and thus is more affected by the complexity of the structure of this active region. The patterns in this plot-the Fig. 4: In a cubical complex, above-threshold pixels are treated as connected iff they are neighbors. This example has four connected components, outlined in red, and one hole, outlined in blue-i.e., β 0 = 4 and β 1 = 1. large number of short-lived holes near the diagonal and the long-lived holes further above and to the left-are an accurate formal representation of that complexity 5 .
Persistence diagrams are powerful tools, but topology alone is not a complete tool for characterizing the shape of structures in a magnetogram. Simply identifying the number of high-flux regions, for instance, says nothing about their sizes or proximities. A pair of converging high-flux regions looks the same, from the standpoint of topology until they actually touch 6 -and topology cannot measure quantitative features like the total magnetic flux in such a region. In order to capture these important, and potentially predictive, properties so that they can be used by ML algorithms, we extract geometric information as well by computing the sizes of the high-flux regions and the distances between them, summing up the flux inside them, finding their centroids, and the like. Computational geometry algorithms (Forrest, 1971;Preparata and Shamos, 1985) are widely used for these kinds of calculations across many fields of science and engineering, including astronomy-e.g., shape reconstruction for asteroids (Devogele et al., 2015) and galaxy distribution analysis (Bhavsar and Lauer, 1996).
Our approach to distilling informative, discriminating features for each active region out of solar magnetogram data builds on all of these foundations. Any or all of the topological and geometric properties described in this section might be meaningful predictors of flaring. Moreover, it is not only the topology and geometry of the positive and negative regions of the field that are indicative, but also their topological and geometric relationships to one another; so we also explore composite features, as described in the following section.

Results
As demonstration of the utility of shape-based features in machine-learning methods for solar flare prediction, we chose to work with an Artificial Neural Network or ANN (Haykin, 1998), a machinelearning approach to fitting a repertoire of nonlinear functions to the data. ANNs, also known as multilayer perceptron architectures, are both flexible and powerful; they can generally model morecomplex nonlinear functions than regression networks like SVMs or decision trees. An ANN is built by stacking together layers of nonlinear elements, known as neurons, with weighted interconnections between consecutive layers. The input layer takes the magnetogram features and feeds them into the stack. A layer-by-layer, feed-forward propagation of activations finally results in a prediction at the output layer-in this case, a binary value that classifies the magnetogram as a precursor to a flare (or not), i.e., whether that magnetogram is followed by an eruption above the X-ray class of M-flares within the next 24 hours. This classification of magnetograms is equivalent to the M1.0+/0/24 event designation in Leka et al. (2019a).
The features used in this study were derived from a set of HMI SHARP region magnetograms, described in §3.1. We preprocessed each using the techniques described in the previous section to extract a suite of geometry-and topology-based features. These feature sets, described in §3.2, were tested both as the sole input vectors to a many-layered fully-connected deep neural network system and in combination with the physics-based feature vectors used in current ML models.
The design of a ANN requires setting various parameters, including the number of layers and the number of neurons in the "hidden layers" between the input and the output layers; these details are discussed in Appendix A. The process of training the ANN involves tuning the weights in the layer interconnections so that the ANN generates the correct labels for the data, given values for all of the input features. To implement this, we divided the data into a training set and a testing set; see §3.3. The training data were used by standard optimization algorithms to tune the weights. We repeated this using various combinations of the feature sets and evaluated the resulting prediction accuracy using the True Skill Score on the testing set; results are reported in §3.4.

Data
The Helioseismic and Magnetic Imager (HMI) instrument on the NASA Solar Dynamics Observatory satellite views the entire solar disk at a nominal data cadence of 12 minutes, and has captured every active region on the Earth-facing side of the Sun since its launch in 2010. The HMI dataset contains Space Weather HMI Active Region Patches (SHARPs) that provide cut-out regions around each of these active regions, each with vector magnetic field and other derived quantities calculated for all pixels in the region (Bobra et al., 2014) as it rotates across the Earth-facing hemisphere of the Sun. For this study, we used only B r , the radial field component, of SHARPs images along with the associated metadata taken by the HMI instrument between January 2010 and December 2016. The B r component image data is available in the JSOC hmi.sharp cea 720s dataset, where the magnetic field vector B is remapped to a Lambert Cylindrical Equal-Area (CEA) projection, and decomposed into the components B r , B θ and B φ . This segment of the SHARPs dataset contains about 2.6 million data records, each approximately 2 MB in size, totaling 5 TB of data. These active regions are known to have produced around 1250 M1.0+ flares within 24 hours of the image time-see, for example, Tbl. 1 in (Schrijver, 2016). We downsampled the B r dataset to a one-hour cadence (i.e., taking every 5th magnetogram), then used the the NOAA Geostationary Operational Environment Satellite (GOES) X-ray Spectrometer (XRS) flare catalog 7 to label each one as to whether or not the associated active region produced an M1.0+ flare in the 24 hours following the time of the sample. Next, we discarded all the magnetogram images that contained invalid pixel data (NaN values). The resulting data set included 3691 active regions, of which 141 produced at least one M1.0+ flare as they crossed the Sun's disk and 3550 did not. This corresponded to 438, 539 total magnetograms, of which 5538 and 432821, respectively, were labeled as flaring and non-flaring. A large positive/negative imbalance like this is a major challenge for any machine-learning algorithm, as described further below.

Features
Values for a number of physics-based features, like the total unsigned flux in the active regions and the vertical current helicity, are included in the image metadata for all SHARPs data products; see Tbl. 1 for a full list of these quantities and jsoc.stanford.edu/doc/data/hmi/sharp/sharp.htm or Bobra et al. (2014) for details about these values and the associated calculations. This feature set-the standard in ML-based flare prediction work-is the base case for our comparison experiments.
Our procedure for computing geometry-and topology-based features from each magnetogram was as follows. To remove "topological" noise, we first filtered out pixels whose magnetic flux magnitude was below 200 G, then aggregated the resulting pixels into clusters. We computed the Percentage of pixels with a mean shear angle greater than 45 degrees percent number and area of these clusters, then discarded all clusters whose area was than 10% of the area of the maximum cluster. We performed these operations separately for the positive (> 200 G) and negative (< −200 G) fields. We then computed an interaction factor (IF) between all positive/negative pairs; this quantity is defined a manner similar to the so-called Ising Energy used by Florios et al. where B pos and B neg are the sums of the flux over the respective components and r min is the smallest distance between them. 8 We then chose the pair with the highest IF value and derived a number of secondary features from that "most interacting pair" (MIP). These features are listed in Tbl. 2. For magnetogram images with only one sign of polarity, we assigned all features involving the missing polarity a default value of 0, including those that contain the distance terms between opposite polarities.
To compute topological features from these data, we constructed cubical complexes for each magnetogram across the range of magnetic flux magnitude thresholds between 263.15-5000 G  This range covers all relevant flux levels from quiet Sun network to sunspot umbral cores. We repeated this operation separately for the positive and negative fields, yielding a total of 10 + 10 = 20 cubical complexes. We then used those complexes to construct β 1 persistence diagrams, like that in Fig. 5(d), that capture the threshold values at which each hole in that field is born and dies. Finally, we transformed these 2D diagrams to vectors that can be more-effectively leveraged by machine-learning methods, one vector each for the positive and the negative magnetic flux values.
In the past few years, there has been a growing literature on the problem of featurizing persistence diagrams. There are two classes of approaches: finite-dimensional embeddings or kernelbased methods. Persistence landscapes (Bubenik, 2015), persistence images (Adams et al., 2017) and persistence silhouettes (Chazal et al., 2014) are some examples of the first approach that fit template functions to the diagrams. Kernel-based methods (Bubenik, 2015;Reininghaus et al., 2015;Kusano et al., 2016;Carrière et al., 2017;Le and Yamada, 2018) use generalized scalar products that transform the diagrams implicitly into infinite-dimensional Hilbert Spaces. While these methods are useful in defining meaningful relationships between two persistence diagrams, their inherent compression can lead to a loss of information. To address this, Carrière et al. (2019) propose a layer for neural network architectures that encodes most of these vector representations using a set of generalized point-wise transformation functions. Here, we chose a very simple version of the first class of approaches, counting the number of "live" holes at each value of the threshold flux. Since we have ten cubical complexes for each polarity of the field, this produces two vectors with ten entries each. We concatenate these together to create a single vector of length 20-the topology-based features for our study. Tbl. 3 summarizes the three basic feature sets of interest evaluated in this study.

Feature Set
Number of features SHARPs (baseline) 19 Geometry-based 16 Topology-based 20 Table 3: The three basic feature sets evaluated in this study. The performance of a combination of some of these feature sets is also studied, as described in Section 3.4.

Training
We followed standard procedures to train and evaluate the ANN model, beginning by splitting the dataset into training and testing sets. During the training phase, the output labels produced by the ANN, working from the features derived from each magnetogram in the training set, were compared with the true labels for that magnetogram. The difference was then propagated backwards through the network to update the weights of the interconnections between the layers. Via multiple passes through the entire dataset ("epochs"), the weights were updated until the ANN learned a set of weights that sufficiently fit the data. Once the training phase was completed, the weights of the model were frozen permanently and the ANN could be used to make predictions on the testing data. Details and citations for all of the steps in this process can be found in Appendix A.
The normal approach to splitting the data into training and testing sets-random shuffling-is not appropriate in this application. Instead, we split the magnetogram images dataset into training and testing sets based on their SHARPs IDs. That is, we randomly chose 70% of the SHARPs regions and placed all feature vectors extracted from the associated magnetograms in the training set. The feature vectors extracted from magnetograms from the remaining 30% made up the testing set. This ensured that the features used in evaluating the ANN did not have any similar counterparts that were used by the model during the training phase, thereby avoiding artificial boosting of prediction scores. We repeated this random shuffling procedure with 10 different random seeds to generate statistical results across 10 different training/testing sets.

Evaluation and Discussion
To study the utility of geometry-and topology-based features in solar-flare prediction with the ANN described in the previous sections, we performed a number of experiments. After 50 training epochs in each, we computed the True Skill Statistic (Woodcock, 1976) (also known as the Hanssen-Kuiper skill score) on the testing set, which is widely used in solar-flare prediction studies: where TP and FP (true and false positives) are, respectively, the images that are classified correctly and incorrectly as flaring. Similarly TN and FN are the images classified correctly and incorrectly as non-flaring. Note that −1 ≤ TSS ≤ 1 and TSS = −1 only when TP = TN = 0, so that every prediction is wrong; and TSS = 1 only when FP = FN = 0, so that every prediction is correct. The skill score reflects an accuracy relative to a reference forecast that is designed such that both random forecasts and unskilled forecasts (always predict majority class) have a score of 0. When (a) Geometry Experiments (b) Topology Experiments Fig. 6: Box-and-whisker plots of the TSS scores for different feature sets used across the geometry and topology experiments, reported for ten different training/testing splits generated from the same dataset. The green solid and dotted lines indicate the median and the mean of the TSS scores, respectively, for each plot. In plots (a) and (b), the label "SHARPs Geom" denotes a combined feature set of SHARPs and geometry-based features, while "SHARPs Top" denotes a combined feature set of SHARPs and topology-based features respectively.
(a) Geometry Experiments (b) Topology Experiments Fig. 7: Box-and-whisker plots of the TSS score improvements for the developed feature sets over the baseline (SHARPs-only) feature set, reported for ten different training/testing splits generated from the same dataset. The green solid and dotted lines indicate the median and the mean of the TSS score improvements, respectively, for each plot. In plots (a) and (b), the label "SHARPs Geom" denotes a combined feature set of SHARPs and geometry-based features, while "SHARPs Top" denotes a combined feature set of SHARPs and topology-based features respectively. TSS = 0 the prediction no better than chance in the sense that the "hit rate" is the same as the "false alarm rate." We performed suites of ten different training/testing experiments with different combinations of these three feature sets. The baseline experiment used the corpus of the 19 SHARPs features of Tbl. 1. The mean TSS score for this was 0.68 with the range of [0.57, 0.76]. This is in line with other machine-learning based flare-prediction methods (Leka et al., 2019a), indicating that our neural net is a useful test case for this comparison study.
Training and validating the same ANN with the 16 geometry-based features listed in Tbl. 2 yielded a mean TSS value of 0.67 with a range of [0.54, 0.72], suggesting that these geometry-based features, surprisingly, do only slightly worse than the SHARPs features, which were constructed by solar physics experts to be their best characterizations of an active region for the purpose of eruption prediction. A TSS of 0.67 is also in line with the current ad hoc prediction methods that employ human experts, qualitative classifications, and historical lookup tables (Leka et al., 2019a). It is encouraging that abstract geometric features, captured automatically by an algorithm, allow a machine-learning method to match that score.
To determine how the geometry-based features worked in conjunction with the SHARPs features, we repeated these experiments using both the SHARPs metadata features and the geometry-based features (again on the same data sets), which raised the mean TSS score to 0.72, with a range of [0.60, 0.79]. That is, there is a synergy between these two feature sets: the combination works better than either one alone, indicating that when physical properties of the magnetic field are augmented with geometric properties of the active region, the predictive potential of a given magnetogram increases. The box-and-whisker plots in panel (a) of Fig. 6 provide a graphical comparison of the TSS scores reported in this paragraph and the previous one.
We then trained and tested the ANN using our topology-based features, alone and in combination with the SHARPs features. The results are summarized in Fig. 6(b). Topological features alone yielded a mean TSS of 0.72 with a range of [0.65, 0.79]. This is a clear improvement over the performance of the ANN with only the baseline SHARPs features. In combination with the SHARPs features, the topological features improved the mean TSS score to 0.73 with a range of [0.67, 0.80]. Comparing the two panels of Fig. 6, one can observe that this was also a slight improvement over the SHARPs-geometry combination.
While the above analysis describes the aggregate TSS comparison, it is useful to understand how the developed feature sets (geometry-based, topology-based and their combination feature set counterparts with SHARPs features) perform with respect to the SHARPs-only baseline in each of the ten training/testing splits. The boxplots in Fig. 7 show the improvement in TSS scores for the geometry-based, topology-based and the combination feature sets with respect to the baseline TSS. This improvement is determined by subtracting the SHARPs-only TSS score for each generated training/testing data split from the TSS score of the developed feature set using the same training/testing split. For the developed feature sets, the mean improvement of the TSS over the baseline is in line with the above analysis. The geometry-based mean TSS deteriorates by 0.01 while the SHARPs-geometry combination show a mean TSS improvement of 0.04. The topologybased features and the SHARPs-topology combination show a mean TSS improvement of 0.03 and 0.05 respectively.
The results of these experiments confirm our intuition that abstract measures of the shapes of magnetogram structures can be indicative of eruption potential. This is additionally satisfying since the historical method of assessing the eruptive potential of an active region relies on human forecasters essentially analyzing "complexity of shape" in a qualitative manner. Our results demonstrate that computational geometry and computational topology have captured this in a quantitative and repeatable algorithm. This not only points the way forward to a more robust set of features for machine-learning base eruption prediction architectures. It may even lead to new and more-effective way to classify sunspot active regions.
In this section, we have relied on TSS as a sole metric for comparing the performance of the model for different feature sets. An extension of this analysis to other metrics is described in Appendix B. The scores for these alternative metrics, presented in Tbl. B.1 and Tbl. B.2, indicate that their trends across the different feature sets are similar to the TSS trends discussed above. Additionally, Appendix C describes the predictive power of individual features from the different feature sets using the Fisher score -a univariate feature ranking method. As shown in Figures C.1(a) and Figures C.1(b), 9 of the geometry-based features make it to the top 15 highest ranking features when ranking SHARPs and geometry-based features, whereas 6 of the top 15 features belong to the topology feature set in the SHARPs-topology ranking experiment.

Conclusions
We have shown that abstract spatial properties of magnetograms can be useful in machine-learning methods for solar-flare prediction. We extract these properties from magnetograms using computational geometry and computational topology techniques, producing feature vectors that compete well with the traditional physics-based SHARPs features for the purposes of machine learning methods for flare prediction. A neural net trained on a large corpus of active regions from HMI, preprocessed using these techniques, classifies magnetograms as flare precursors with a slightly higher True Skill Statistic score than the same model using traditional SHARPs features-the educated assessment by human experts as to the best information with which to characterize an active region. Combining shape-and physics-based features further improved the TSS scores, indicating a synergy between the two types of information.
The power of these highly abstract classifications of structure may be surprising, particularly in view of the fact that magnetograms are just the boundary conditions of the full fields whose complicated dynamics are what produce eruptions. Even so, active region shape has fundamental, meaningful connections to the physics leading up to an eruption. As an active region emerges, becoming progressively larger and more complex, the shapes of the opposite polarity structures on a magnetogram capture the evolution of the photospheric field. Flares occur when that field forces a rearrangement of the fields in the corona, where magnetic reconnection occurs. Shape is what human forecasters use in their classifications, and topology can codify structural complexity in a formal and yet practical manner that makes it ideally suited for capturing this richness. In current operational practice, an expert examines sunspot images and/or magnetograms, classifies a sunspot active region according to taxonomies developed empirically, and then uses look-up tables of historical probabilities to say whether or not it will erupt within a future time period. In a sense, our approach systematizes this "human-in-the-loop" forecasting approach by applying the mathematical concepts of topology to address the shape and connectivity of sets derived from the structure of the photospheric magnetic field.
In a future paper we plan to validate our results using a number of variants of our neural network architecture. In addition, our results can be given a firmer statistical basis by using more training and testing partitions of the HMI data, different initial conditions for the network weights, and different numbers of training epochs. Finally, neural networks are only one type of machine-learning model: we will obviously need to compare with other methods, like SVMs, decision trees, and convolutional neural networks (CNNs), before making broader claims about the general utility of shape-based features in machine learning.
There are also a number of data issues that require more attention. Firstly, in our current approach, we simply discard the magnetograms containing invalid pixel data. Addressing such pixel-level anomalies in the data-e.g., by smoothing-would be a good next step here. This will help prevent loss of any potentially interesting samples and allow us to work with larger datasets. It will also be important to address the appropriate choice of a temporal resolution: is the selection of every fifth sample (i.e., hourly resolution) optimal? This could be carried out in a purely mathematical fashion, via expert analysis of persistence diagrams, but it would also be useful to evaluate the effects of different temporal cadences on the TSS of the neural-net model.
Another important issue is to address is temporal evolution: all current ML-based flare prediction methods, including those reported in this paper, use only single snapshots of the field. The dynamics of the structure of an active region-the progression through time of the shapes and their relationships-could be captured using topological approaches that track those structures through time and space, such as the CROCKER plots of Topaz et al. (2015).
Last but not least: the geometric and topological feature sets used here are only a first cut; extending them to capture different aspects of the shape of an active region could improve the results. For example, computational geometry can extract the curvatures of the boundaries of the different polarity regions. In terms of topology, the possibilities include higher-dimensional Betti numbers, alternative complex constructions, and different ways of featurizing persistence diagrams. These could give us new tools to address the structural complexity of active regions that are potentially evolving towards an eruption.

Appendix A: Artificial Neural Network: Design and Implementation
To evaluate the different feature sets, we designed a standard feedforward neural network using Keras (Chollet et al., 2015) with six densely connected layers. The input layer size is variable depending on the size of the feature set; the output layer contains two neurons corresponding to the two classes-flaring and non-flaring. The four intermediate layers contain 12, 24, 16 and 8 neurons respectively, when counting from the direction of the input to the output layer. To prevent overfitting, a Ridge Regression regularization with a penalty factor of 0.01 is used at each layer that limits the L 2 sum of all the weights. For the study reported here, we designed the neural network in a trial-and-error process that balanced complexity of representation against training and testing time. In future work, we play to formalize this approach of fine-tuning the model hyperparameters (learning rates, regularization parameters, loss weights, etc.) using a separate validation and a testing set (as opposed to only using a testing set). The validation set can be used for determining the optimal hyperparameter values, with the testing set used for a final evaluation of the model.
The layout of the network is shown in the figure below; the network parameters are summarized in Tbl. A.1. We used the standard back-propagation algorithm built into Keras to train the weights of the ANN, which are initialized using the Glorot uniform initializer (Glorot and Bengio, 2010) at the beginning of the training process. Since our problem is a two category classification one, we chose a weighted binary cross-entropy function to measure the training loss between the model output and the true target that is propagated backwards to update the network weights. Over a training instance for N data samples over C classes (here C = 2), this loss function is given by where p(y ic ) represents the target prediction probability for class c of the i th data point, whose real target value is y ic . The loss function class weights w c in this formula determine the penalty of incorrectly predicting the probability for the associated classes c. As demonstrated in Section 3.1, our  dataset is highly unbalanced, with the non-flaring class over-represented by two orders of magnitude. By choosing w no− f lare : w f lare = 1 : 100, we over-penalize the minority class (flaring magnetograms) to offset the effect of its size. This is one of the many ways to mitigate imbalance in training sets, and has been used in the flare-prediction literature (Bobra and Couvidat, 2015;Nishizuka et al., 2018). Finally, we use the Adagrad optimizer (Duchi et al., 2011) to perform the weight update during the backpropagation phase. An extension of the Stochastic Gradient Descent algorithm, the Adagrad optimizer adapts the learning rate to the individual parameters, so that more sensitive parameters which significantly affect the output are assigned a lower learning rate, whereas less sensitive parameters are updated with a greater learning rate. (Our initial choice was the Adam optimizer (Kingma and Ba, 2014), but we observed jitters in the prediction scores after convergence when using it.) We used the default parameters for the PyTorch implementation with the learning rate multiplier set to 5 × 10 −4 . All experiments were run on a NVIDIA TITAN V graphics processing unit (NVIDIA Driver Version 440.31, CUDA Version 10.2). Across all feature sets, the training time per epoch was approximately 22s. For a total of 50 epochs, the total training time then amounted to approximately 19 minutes. The average validation time across all the feature sets was approximately 13s.

Appendix B: Alternative Evaluation Metrics
Here, we report the performance of the various feature sets using some other standard metrics. Comparing the forecast against the actual event, we populate the contingency table values: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Using these four classes, we compute the five additional metrics for each experiment: Accuracy, Precision, Recall, the F-1 Score (F1) and the Heidke Skill Score (HSS): For a detailed explanation of these metrics, refer to Bobra and Couvidat (2015) (note that the HSS 2 variant of the Heidke Skill Score from Bobra and Couvidat (2015) is used here). The mean metric scores and the mean score improvements along with the standard deviations over the baseline the geometry and topology experiments are summarized in Tbl. B.1 and Tbl. B.2 respectively. The trends for these metrics match the trends in the TSS scores described in Section 3.

Appendix C: Feature Ranking
Evaluating the model performance scores using different feature sets is one way of determining how effective the engineered features are in providing an accurate model prediction. An alternative evaluation strategy is to compute the Fisher score (Gu et al., 2011) individually for each feature, as done in Bobra and Couvidat (2015). A method of univariate feature ranking, the Fisher score (or the F-score) determines the ability of the feature to separate the distributions of the classes in the dataset. For a dataset with two labels, the F-score of a feature i, as defined in Bobra and Couvidat (2015), is given by wherex i ,x + i andx − i represent the average feature value over samples of the full dataset: n + samples belonging to the positive class and n − samples belonging to the negative class respectively. The numerator in the above equation represents the inter-class distance or separability, while the denominator computes the intra-class variance. Thus, a feature with a smaller spread within each of the two classes and a higher separation between the two class means would generate a higher F-score.
For all the features in the SHARPs-plus-geometry and SHARPs-plus-topology combination feature sets from the geometry and the topology experiments, we compute the normalized F-score scaled with respect to the highest scoring feature. We use the f classif functionality in the Python Scikit-Learn package (Pedregosa et al., 2011) to compute the F-score, and report the top 15 ranking features. The results are shown in Figure  Feature ranking, as determined via the Fisher score method, for the experiments reported in this paper. The x-axis is the normalized Fisher score and the y-axis is the feature rank. Red identifies features based on topology and geometry; blue indicates SHARPs features. The features are plotted in descending order of the F-Score value on the x-axis and ascending order of the rank on the y-axis (both axes are inverted). The topological features described by the formats pos intvl n and neg intvl n represent the number of holes for the n th magnetic flux threshold for the positive and negative polarities respectively.
ing features-total photospheric energy density (TOTPOT), total unsigned flux (USFLUX) and the total area of the active pixels (AREA ACR)-belong to the SHARPs feature set. In the geometrybased experiments ( Figure C.1(a)), all but one of the top 15 properties belong to the geometry feature set. The higher-ranking of these pertain to the magnetic flux: the largest area (max pos flux, max neg flux) and the polarities belonging to the MIP (IF pos flux, IF neg flux), followed by the areas of the respective polarities (geometry features with the " area" suffix). In the topologybased experiments ( Figure C.1(b)), the feature ranking is more mixed. Six of the top 15 features belong to the topology feature set: the number of live holes at the magnetic flux values ±263.16 G (pos intvl 1, neg intvl 1), ±789.47 G (pos intvl 2, neg intvl 2) and ±1315.79 G (pos intvl 3, neg intvl 3). In terms of univariate feature ranking, the geometry (nine of the top 15 features) and topology features (six of the top 15) perform well individually when compared with SHARPs features. These scores demonstrate the predictive power of individual features in terms of their ability to discriminate the flaring and non-flaring datasets. However, this cannot be directly correlated with the TSS scores of the different feature sets described above. The Fisher score is a univariate feature ranking method and does not take into account the correlation between the different features. We leave this investigation of the correlation within and between the various feature sets as future work.