Solar flare forecasting using morphological properties of sunspot groups

We describe a new tool developed for solar flare forecasting on the base of some sunspot group properties. Assuming that the flare frequency follows the Poisson statistics, this tool uses a database containing the morphological characteristics of the suspot groups daily observed by the Equatorial Spar of INAF $-$ Catania Astrophysical Observatory since January 2002 up today. By means of a linear combination of the flare rates computed on the base of some properties of the sunspot groups, like area, number of pores and sunspots, Zurich class, relative importance between leading spot and density of the sunspot population, and type of penumbra of the main sunspot, we determine the probability percentages that a flare of a particular energy range may occur. Comparing our forecasts with the flares registered by GOES satellites in the 1$-$8 \AA{} X$-$ray band during the subsequent 24 hrs we measured the performance of our method. We found that this method, which combines some morphological parameters and a statistical technique, has the best performances for the strongest events, which are more interesting for their implications in the Earth environment.


Introduction
The solar eruptions are the main manifestation of the magnetic activity of the Sun in the heliosphere with also an impact on Earth environment. The effects of these eruptions on the Earth and anthropological activities involve more and more economical interests, considering that the technology pervades many aspects of our life. Therefore, many efforts of the scientific community have been addressed to forecast solar eruptions with enough advance to prevent damages to technological systems.
Solar eruptions involve the whole solar atmosphere and are usually accompanied by sudden and intense variation in brightness, which are named flares. They are the manifestation of the release of free magnetic energy (see Benz, 2017 for more details). Although their origin is usually located in the higher layers of the solar atmosphere, the magnetic field configuration suitable for the occurrence of these events is mostly driven by the photospheric evolution of the active regions (ARs). The emergence of new magnetic flux from the convection zone into the solar atmosphere and the rearrangement of the coronal field due to the horizontal photospheric displacements of the field line footpoints are two fundamental mechanisms which determine the AR configuration suitable for the flare occurrence (e.g., Romano & Zuccarello, 2007;Romano et al., 2015Romano et al., , 2018. In particular, these mechanisms have greater effects when they occur near the magnetic polarity inversion line (PIL).
Many approaches to flare forecasting have been produced in last decades (see Barnes et al., 2016 for an overview, see Florios et al., 2018;Kontogiannis et al., 2018;Kim et al., 2019;Korsós et al., 2019 for more recent results). Although the configuration of the magnetic field in corona is fundamental for the trigger of flares, due to the difficulties to measure directly the magnetic field in the upper layers of the solar atmosphere, many of the flare forecasting methods are based on photospheric observations of the ARs.
Using three photospheric magnetic parameters, i.e., the total unsigned magnetic flux, the length of the strong-gradient magnetic polarity inversion line, and the total magnetic energy dissipation, Yuan et al. (2010) developed a method which obtained some of the best prediction results for the X-GOES class flares. This result confirms that the photospheric configuration of an AR provides already a good indication for the occurrence of the extreme events, which are the most interesting for their impact to our life. Another method to forecast strong flares considers the free magnetic energy obtained from the gradient of the neutral lines and the total unsigned flux measured at photospheric level (Falconer et al., 2008). In fact, it has been confirmed that the magnetic flux close to the PIL is a good proxy for several methods (see Leka & Barnes, 2003;Schrijver, 2007). More recently, another forecasting method was proposed by Korsós et al. (2015). According to this method, the pre-flare behaviour of spot groups is investigated by the introduction of the weighted horizontal magnetic gradient (WG M ). This proxy enables the potential to forecast flares on the base of the magnetic gradient among all spots within an appropriately defined region close to the PIL (see also Korsós & Erdélyi, 2016). The WG M allows to forecast the flare onset time and whether a flare is followed by another event within about 18 h, with a good reliability.
The photospheric measures of the magnetic field can be also used to produce a model of the coronal field and determine the connections between each pair of magnetic sources (Barnes et al., 2005). A successful prediction method based on the coronal magnetic connectivity has been developed by Georgoulis & Rust (2007). The computation of a parameter named effective connected magnetic field strength which is derived from a connectivity matrix describes the magnetic flux distribution of an AR and provide a probability of flare occurrence. Also this method seems efficient for major solar flares.
Beside of the above mentioned methods there are also statistical methods which obtained good results: for example a method based on the flaring history of the observed ARs developed by Wheatland (2004). This method assumes that the events obey a power-low frequency-size distribution and occur on short timescales following a Poisson process with a constant mean rate. This method appeared to be more accurate at predicting X events, which are more important contributors to space weather, than M events. Another forecasting method, performed on statistics base, has been proposed by Bloomfield et al. (2012). It uses the McIntosh group classification of sunspot groups (SGs) observed in photosphere and assumes that flares are Poisson-distributed processes. This method showed that Poisson probabilities perform comparably to other more complex prediction methods.
A completely different approach to flare forecasting belongs to the Machine Learning framework. In this case, the properties utilized for prediction (named features) are automatically extracted from observations and they are ranked according to their correlation with the event occurrence. Recent examples of machine learning results have been obtained by Florios et al. (2018), Inceoglu et al. (2018), Kim et al. (2019), and Camporeale (2019).
Recently, a very useful and complete comparison among some of the above mentioned flare forecasting methods has been performed by Barnes et al. (2016), reaching the conclusion that it may be possible to obtain the best prediction by combining a method which characterizes an AR by one or more parameters and uses a statistical technique. For this reason, in this paper we report the results obtained by the development of a new method which is based mainly on the Zurich classification of the SGs observed at photospheric level and assuming the Poisson statistics for the flare occurrence. We provide an estimation of the capability to host flares of a specified energy range for an AR characterized by a particular configuration, size and fragmentation (Contarino et al., 2009). The data used to reach this goal are described in Section 2. The method is explained in Section 3. Section 4 reports the validation process used to estimate the reliability of our forecastings. Section 5 contains some conclusions.

Data description
We used data collected daily by the Equatorial Spar of INAF-Catania Astrophysical Observatory (INAF-OACt) from January 2002 up today. When the weather conditions permit, this telescope allows the observation of the photosphere in White Light (WL) by a Cooke refractor (150 mm/2230 mm) which projects a 24.5 cm diameter image of the Sun with a spatial resolution of about 1-2 arcsec depending on the seeing conditions. Every day, with a cadence of 24 h, each SG visible on the disk is cataloged registering some information about the heliographic latitude and longitude of its baricenter, the number of sunspots and pores in the group, SS, the projected area in tens M. Falco et al.: J. Space Weather Space Clim. 2019, 9, A22 of millionths of the solar hemisphere, AA, the type of penumbra of the main sunspot of the group, t 1 , the relative importance between the leading spot and density of the sunspot population, t 2 (see Table 2 of Ternullo et al., 2006 for the details about this parameter) and the group type according to Zurich classification, t 3 (see Zirin, 1998).
We considered 8598 SGs. While t 1 , t 2 , and t 3 assume predefined values on the base of the classification of the SG morphology (see Ternullo et al., 2006 for the details), SS and AA can assume a wide range of values on the base of the size and fragmentation of the groups. Their distribution in terms of the number of sunspots and pores is reported in the left panel of Figure 1. SS ranges from 1 (single sunspot or single pore) to 96 and we note a monotonic decrease of the number of SGs as the number of sunspots and pores increase. The distribution of the SGs on the base of their projected area is shown in the right panel of Figure 1. We can see that also the number of SGs is inversely proportional to their area, although we do not observe the same smoothed behavior of SS distribution (compare the zoom plots reported inside the main plots of the left and right panels of Fig. 1). The largest group observed during the considered time interval had an extension of 267 tens of millionths of the solar hemisphere. Table 1 contains the number of SGs for each morphological characteristic. We note that in our sample more than 43% of the SGs (3711) were characterized by t 1 = 2, i.e. by a symmetric main sunspot with a diameter less than 2.5°. About 1500 SGs were formed by only pores (t 1 = 0), while the number of groups characterized by other types of penumbra of the main sunspot were of the same order of magnitude. For the parameter t 2 three are the main configurations: t 2 = 0, corresponding to SGs formed only by pores or single sunspots, t 2 = 1, corresponding to SGs formed by a leading largest sunspot and following smaller pores, and t 2 = 4, corresponding to SGs formed by a leading largest sunspot and following smaller sunspots. t 3 is based on the Zurich classification and shows that SGs of C and D type are the most diffused. Instead, SGs of F and G type, corresponding to the more extended and complex SGs are less diffused, although they are usually the preferential sites for the more energetic flares.
We also used the flare records obtained by the GOES satellites, from January 2002 up today, and collected in the Space Weather Prediction Center Reports (https://www.swpc.noaa. gov/products/solar-and-geophysical-event-reports) to evaluate the performance of our forecasting method. In particular, we considered start time and class for each flare of C1.0 and greater observed by GOES. We neglected B-class flares.

INAF-OACt forecasting method
Our method to investigate the SG capability of hosting flares during the AR evolution assumes that the flare event frequency follows the Poisson statistics, as in Bloomfield et al. (2012). We compute the flare rates, FRs, of SGs characterized by specific values of the above mentioned five parameters: SS, AA, t 1 , t 2 and t 3 . While t 1 , t 2 , and t 3 assume predefined values corresponding to different classes (according to the Zurich classification), SS and AA cover a wide range of values. Therefore, taking into account the distributions of the observed SGs as function of SS and AA (right and left panels of Fig. 1), we divided the whole dataset into statistically significant and homogenous subsets (classes), considering 23 and 31 subsets for SS and AA values, respectively.
For each specific value, x, of each generic parameter, k, we compute the FR, by the ratio between the number of SGs which hosted at least a flare, Nf(x k ), and the total number of SGs characterized by that value of the parameter N(x k ): Then we compute the average among the flare rates for all parameters: providing an estimation of the capability of hosting flares in each SG characterized by a particular configuration, size and fragmentation. FR is calculated for three different ranges of flare energy: C1.0 GOES class and greater (C1.0+), M1.0 class and greater (M1.0+) and X1.0 class and greater (X1.0+).
Finally, following the Poisson statistics, the probability that a SGs host a flare is obtained by: These probabilities are daily published at http://ssa.oact.inaf. it/oact/Flare_forecasting.php, distinguishing for each SG visible on the solar disc and for each flare energy range (C1.0+, M1.0+ and X1.0+). We assume that the forecastings are valid for the subsequent 24 h following the observation time, when approximately new observations and new measurements of the parameters are available. A screenshot of the web page containing the flare probabilities computed on September 6, 2017, is shown in Figure 2. As we can see, for each AR the NOAA number and the Catania Sunspot Group Number are indicted to identify the region where the flare may occur.

An example of flare forecasting: AR NOAA 12673
In order to illustrate the application of our method for a specific case, we consider the forecasting for the flare activity of the super active region (SAR) NOAA 12673 that appeared on the Sun during the first two weeks of September 2017 and unleashed more than 40 C-, about 20 M-, and 4 X-GOES class flares during its passage over the solar disc. In particular it hosted the strongest flare of the cycle 24 on September 6, 2017 (Romano et al., 2018. The SAR complexity is visible in Figure 3, where we show the continuum filtergram taken by HMI/SDO at 617.3 nm. The SAR NOAA 12673, as observed by INAF-OACt on September 6 at 6:50 UT, was formed by many pores and sunspots (SS = 45), it was one of the largest SG of the solar cycle 24 (AA = 91), its main sunspot was characterized by an asymmetric penumbra with a diameter greater then 2.5°(t 1 = 5), the following spot was largest and the sunspot population density intermediate (t 2 = 5), its configuration belong to the D class of the Zurich classification. On the base of these morphological characteristics we computed the FR for each parameter and for each energy range, using equation (1) (see Table 2).
Then we computed their average by equation (2)  These high percentages have been confirmed during the 24 h subsequent to our forecast by 2 C-, 2 M-and 2 X-class flares, as reported in Table 3.

Performance measures
On the base of the flare records obtained by the GOES satellites and collected in the Space Weather Prediction Center Reports we evaluated the performance of our forecasting  Table 4 we summarize the statistics of the predictions considering data from 2002 until 2017. We considered 8598 SGs. Among them the SGs hosting C1.0+, M1.0+ and X1.0+ class flares were 1841, 347 and 47, respectively. We constructed the probability distribution for forecasts, f, and observations, o. In Table 4 we report the properties of these distributions. hp f i and hoi are the average of the forecast and observation probabilities over all observed SGs. We obtained the same values between them in the three energy ranges due to peculiarity of the method which is based effectively on the observed flare rates. However, the trend of hp f i and hoi over the energy ranges depends on the smaller number of events of higher energy.
We also calculated the median for p f and the standard deviation for p f and o. With hp f |o = 1i and hp f |o = 0i we denote the average of the forecast probabilities over all observed SGs where at least one flare occurred and not, respectively. Over the same subsets the medians have been computed. Both the averages, with the corresponding standard deviations, and the medians show significant differences in the probabilities obtained for flaring and not flaring SGs.
The linear association describing the correlation between p f and o provides already an indication of the reliability of our method for the three energy ranges (see Wheatland, 2004 for more details).
To evaluate in detail our method, taking into account that it provides probabilistic forecasts, it has been necessary to determine a threshold probability to build a contingency table where any forecast probability over the threshold was considered to be a forecast for an event, and anything less was considered to be a forecast for non-event. The determination of the threshold, as we see in the following, depends on the adopted method of evaluation of the performance.
In fact, in a binary forecast method the contingency table is easily built: "TP" is the number of true positives events, "TN" is the number of true negative events, "FP" is the number of false positive events and "FN" is the number of false negative events. In that case the Rate Correct provides a first estimation of the forecast performance: A for ¼ TPþTN N , where N is the total number of forecasts (N = TP + FP + FN + TN). For probabilistic forecasts, as in our case, TP, FP, FN, and TN depend on criteria to determine the threshold.
Usually, to avoid that the measure of accuracy can be misleading for very unbalanced event/no event ratio, it is necessary to normalize the performance to a reference forecast by using a so named skill score:  where A per and A ref are the accuracy of the perfect forecast and the reference method, respectively. We measured the accuracy of our probabilistic forecasts by two different skill scores (see Barnes et al., 2016

for details): the Applemans Skill Score (ApSS) and the Hanssen & Kuipers Discriminant (H&KSS).
According with the ApSS, the three accuracies used to calculate the skill score are the following: According with the H&KSS: We determined the threshold probabilities for each flare class and for each skill score and generated the binary categorical classifications, considering the values that maximize the two skill scores. These thresholds are reported in parenthesis in Table 5. In particular for the H&KSS we determined threshold probabilities from the so called Receiver Operating Characteristic (ROC) curves. The first term of A ref represents the probability of detection (POD), while the second term represents the false alarm rate (FAR). The ROC curves show POD as function of FAR by varying the threshold above which a SG is predicted to produce a flare. We built a ROC curve for each flare energy range and we determined the threshold probabilities which maximize the H&KSS, considering the points of the curves that are more distant from the dotted diagonal line (see the left panels of Fig. 4). These thresholds are 0.25, 0.12 and 0.01 for C1.0+, M1.0+ and X1.0+, respectively.
Due to the fact that our flare forecasting method predicts the probability of a flare of a given class range rather than a binary, categorical forecast, a more appropriate measurement of accuracy is the mean square error (MSE): where p f is the forecast probability, and o is the observed outcome (o = 0 for no event, o = 1 for an event). In this case A per = 0.
Using the climatological event rate as a reference forecast and its corresponding accuracy: we computed the so called Barier Skill Score (BSS): Therefore, we remark that BSS = 0 corresponds to perfect performance.
In Figure 4 we report the reliability plots, which compare the predicted probabilities for all days sorted into bins of 0.05 width and the observed event rates. The associated uncertainties are calculated using the Bayesian estimation assuming binomial statistics (see Jaynes, 2003 for more details). It is noteworthy that the increasing size of the error bars with the increase of the predicted probabilities and the flare energy ranges reflects the decreasing sample size of the events. These plots allow us to understand the performance of the method for different flare probabilities. In fact, they show a slight tendency of over-prediction (point lying below x = y) for the smaller probabilities and a tendency of under-prediction (point lying over x = y) for the higher probabilities. We also note that there are few number of points in the reliability plot for X1.0+ flares due to the small number of event in the corresponding sample (see the fourth column of Table 4).
All the results of the performances about our method are reported in Table 5. We see that all scores show a performance of our method better than the reference forecasts, although different skill scores emphasize different aspects of our performance. We note that H&KSS and BSS are characterized by better values for higher flare class ranges (we remember that perfect performances should correspond to 1 and 0 for H&KSS and BSS, respectively). In particular, H&KSS varies from 0.45 for C1.0+ flares to 0.70 for X1.0+, while the threshold probabilities decreases from 0.25 to 0.01. This means that our method seems to get the best results for the more energetic events. However, we cannot neglect that ApSS shows an opposite trend for the C1.0+ and M1.0+ classes. But, according to the results of Barnes et al. (2016), we see that the maximum forecast probability decreases with increasing event energy. This also explains the small values of ApSS for larger events due to the small sample of events to be considered a predicted event in this categorical forecast.

Discussion and conclusions
The flare forecasting method, applied to the daily INAF-OACt observations of the photosphere, provides an indication of the probabilities that each SG visible on the solar disc may host solar flares of C1.0+, M1.0+ and X1.0+ class. Taking into account the conclusions of Barnes et al. (2016), this method has the advantage to combine the characterization of each AR by some parameters and the application of a statistical analysis. The method requires only a daily observation of the solar disc at photospheric level, therefore, it is possible to collect all the information useful to compute the flare probabilities only exploiting a short temporal window of good weather. Moreover, the simple procedure to determine the parameter values from the photosphere observation and compute the flare rates allows to obtain the forecasts in few minutes after the observations. At the moment, the values of the parameters used as inputs for the method are obtained by eye. Once these values are determined, an automatized procedure consults the database and provides the flare probabilities for each AR visible on the solar disc. The probabilities are daily available at the web page of the INAF-OACT web site: http://ssa.oact.inaf.it/oact/Flare_ forecasting.php.
The performance measures of our method show that we obtain the best results and more accurate forecasts for the stronger events. We do not achieve particularly high skill scores, suggesting that there is considerable room for improvement our method. The BSS, which is suitable for probabilistic forecasts, is equal to 0.04 for X1.0+ flares. This means that the photospheric configuration of a SG is already a valid indication of the possible occurrence of a strong flare, independently from its observations in corona. This is confirmed by H&KSS = 0.70 for X1.0+ flares with a threshold value of 0.01. These results support our method for its Space Weather purposes, in fact, the prediction of extreme events is particularly important to prevent serious damages to technological systems which pervade our life. Actually, ApSS showed for C1.0+ and M1.0+ flares an opposite trend in comparison to the other skill scores, with a better performance for the sample containing also the weaker events. However, we impute this apparent conflicting results to the sensibility of ApSS to the sample size (Barnes et al., 2016). Also for this reason it was not possible to measure ApSS for the X1.0+ flares. However, taking into account that our method determines the probabilities of a flare rather than a binary, categorical forecast, we are confident that the most accurate measures are those obtained by the BSS, which shows values closer to zero (perfect performance) as the energy of the predicted events increases.
The good results obtained by our method can be also confirmed by its comparison with other forecasting methods. One of the most promising method has been developed by Korsós et al. (2015). It is based on the weighted horizontal magnetic gradient (WG M ), defined between opposite polarity umbrae at the polarity inversion line of ARs. Studying the evolution of the unsigned magnetic flux, the distance between the area-weighted barycenters of opposite polarities and the WG M , they are able to predict the flare onset time and assess whether a flare is followed by another event within about 18 h (see Korsós et al., 2015; 2019 for more details). We remark that their method is not based on a statistical base. Among the ARs used for the application of their method we can consider the AR NOAA 11504, which has been observed also by INAF-OACt. A M1.2 flare and a M1.9 flare occurred in this AR on June 13, 2012 at 13:17 UT and on June 14, 2012 at 14:35 UT, respectively. Korsós et al. (2019) measured the relative gradient of the rising phase of the WG M and the relative gradient of the distance parameter of the converging motion for this AR. They obtained a WG M maximum value, WG max M = 0.55 Â 10 6 Wb/ m, which was followed by a less step decrease phase ended with the M1.2 flare (WG flare M = 0.35 Â 10 6 Wb/m) and the M1.9 flare (WG flare M = 0.23 Â 10 6 Wb/m). They also calculated the percentage differences (WG % M ) for the first and second flare which were 34% and 61%, respectively. This values are in accordance with Korsós et al. (2015): in fact, if the WG % M is less then 42% more flares are expected to follow, otherwise other flares are not expected. Actually, the AR NOAA 11504 showed another event after the first flare of M1.2 class and no other significant event after the second flare of M1.9 class.
The AR NOAA 11504 was observed by INAF-OACt on June 13, 2012 at 07:18. It was formed by many pores and sunspots (SS = 31) with a large projected area AA = 54. The main sunspot was characterized by a symmetric penumbra with a diameter grater then 2.5 (t 1 = 4), the leading spot was largest and the sunspot population density intermediate (t 2 = 4) and its configuration belong to the E class of the Zurich classification. On the base of the morphological characteristics the AR presented a probability of 13% for the occurrence of M1.0+ flares. This percentage was widely higher than the threshold that maximizes the H&KSS for M1.0+ flares. The day after, on June 14, 2012 at 08:00 the morphological characteristics were the same: t 1 = 4, t 2 = 4 and t 3 = 5, but the number of sunspots and pores (SS = 46) and the projected area (AA = 72) increased. The corresponding probability for M1.0+ flares was higher than before (32%).
This comparison allows us to highlight some advantages of our method. For example, it is noteworthy that the percentages obtained by the observed parameters do not depend on the AR evolution and can be determined using a single observation of the photosphere. Although the method of Korsós et al. (2015) is more sophisticated, it requires a continuity in the target observations that is not always guaranteed, especially by the ground based telescopes. Moreover, it seems that the higher probabilities of the INAF-OACt method correspond to higher intensity of the occurred flares. Therefore, we can conclude that, as for the WG M method, the INAF-OACt method could be able to estimate the likelihood of a subsequent flare of the same or larger energy.
However, we think that the limits of this method at different energy ranges may be ascribed to the indirect determination of the magnetic field configuration of the SGs. In fact, up today in our method the flare rates are determined on the base of the SGs characteristics apart from the polarities of pores and sunspots. Therefore, a possible improvement of the present version of the INAF-OACt method could be the implementation of the flare rates using also the McIntosh group classification and other useful magnetic field characterization parameters.
A further margin of improvement of our forecasting method resides in the opportunity to weight the relative importance of each parameter. In fact, in equation (2) we compute the average among the flare rates. Instead, we think that the results obtained by a machine learning approach may provide a rating of each parameter and, consequently, may suggest a different multiplicative coefficient for each term of the average of equation (2) in order to get better performance.
It is noteworthy that we can improve our methods using the digital images acquired in the continuum at 656.78 nm and applying an automatic procedure to extract the considered parameters from those full disc images. In this way, we could be able to provide an upgrade of our forecasting in near real time according to the acquisition rate of the digital images (to date about 1 h). Nevertheless, taking into account the usual dynamics and evolution of the ARs at photospheric level, we are confident that the main characteristics of the sunspot groups and, consequently, the corresponding flare rates do not change significantly in 24 h.
Finally, we also plan to strength the statistics by the implementation of the database with data taken before January 2002. In fact, INAF-OACt has an archive of full disc drawings of the photosphere, containing the useful information for the SG characterization, since the beginning of the 20th century, and the GOES X-ray flux measurements at 1-8 Å are available since 1986. We believe that the update of our databse will be a further opportunity to be more confident especially for the statistics of the strongest events, which are more interesting for their contribution to space weather conditions.