Supervised classification of solar features using prior information

Context: The Sun as seen by Extreme Ultraviolet (EUV) telescopes exhibits a variety of large-scale structures. Of particular interest for space-weather applications is the extraction of active regions (AR) and coronal holes (CH). The next generation of GOESR satellites will provide continuous monitoring of the solar corona in six EUV bandpasses that are similar to the ones provided by the SDO-AIA EUV telescope since May 2010. Supervised segmentations of EUV images that are consistent with manual segmentations by for example space-weather forecasters help in extracting useful information from the raw data. Aims: We present a supervised segmentation method that is based on the Maximum A Posteriori rule. Our method allows integrating both manually segmented images as well as other type of information. It is applied on SDO-AIA images to segment them into AR, CH, and the remaining Quiet Sun (QS) part. Methods: A Bayesian classifier is applied on training masks provided by the user. The noise structure in EUV images is nontrivial, and this suggests the use of a non-parametric kernel density estimator to fit the intensity distribution within each class. Under the Naive Bayes assumption we can add information such as latitude distribution and total coverage of each class in a consistent manner. Those information can be prescribed by an expert or estimated with an Expectation-Maximization algorithm. Results: The segmentation masks are in line with the training masks given as input and show consistency over time. Introduction of additional information besides pixel intensity improves upon the quality of the final segmentation. Conclusions: Such a tool can aid in building automated segmentations that are consistent with some ground truth’ defined by the users.


Introduction
Solar images exhibit a variety of large-scale structures with potential space-weather effects. The most prominent examples are coronal holes (CH), where the high-speed solar wind departs (Krieger et al. 1973;Verbanac et al. 2011) and active regions (AR), which have the potential to produce flares and to be associated with coronal mass ejections. An accurate monitoring of AR, Quiet Sun (QS), and CH can also serve as input into solar EUV irradiance reconstruction models (Haberreiter et al. 2014).
Synoptic maps showing the various characteristics of the solar surface are produced manually on a daily basis by the forecasters at the NOAA Space Weather Prediction Center 1 . Such a partition constitutes a form of segmentation mask, where each pixel spatial coordinate is associated with some label' corresponding to a given feature.
With the ever-growing volume of data available (e.g. Solar Dynamics Observatory delivering 1TB of data per day since May 2010), automated feature detection and identification methods have rapidly developed in recent years. Segmentation procedures can be divided into two categories, depending on the type of prior knowledge used to train the process. Unsupervised methods automatically select the partitioning criteria, whereas in supervised segmentation, direct user guidance is required by means of a training set.
Unsupervised segmentation procedures are classically divided into region-based, edge-based, and hybrid methods. The first category encompasses histogram-based segmentation, where pixels are classified according to their intensity (Pettauer & Brandt 1997;Steinegger et al. 1998;Worden et al. 1999;Colak & Qahwaji 2008;Krista & Gallagher 2009;de Toma 2011;Verbeeck et al. 2014). These include clustering methods and thresholding methods with manual or automatic determination of this threshold, which can be global or local. Region-growing procedures, which use the connectivity of individual pixels to incorporate information about the local neighborhood, are also included in this category (Preminger et al. 1997;Benkhalil et al. 2003Benkhalil et al. , 2006Higgins et al. 2011;McAteer et al. 2005). Edge-based methods on the other hand focus on discontinuities and thus on locating region boundaries (Zharkov et al. 2005;Curto et al. 2008;Watson et al. 2009) These methods can be combined in many ways, and the literature on the subject is extensive. Aschwanden (2010) and Martens et al. (2012) review and summarize recent work in this area while Verbeeck et al. (2013) and Caballero & Aranda (2013) compare different segmentation procedures.
These unsupervised methods are usually designed for a specific scientific enquiry and assume a single correct answer. Boundaries of ARs or CHs are however fuzzy, and hence their precise determination is subject to a given scientific application. If a user is interested in an AR's core, and another in the bright region around the AR, they will have to tune the parameters of an unsupervised algorithm in order to meet their desired results. Although the introduction of expert knowledge is possible within this context (Barra et al. 2007), it is not straightforward. Finally, performance evaluation of unsupervised methods in the context of solar physics is still lacking (Zhang et al. 2008).
A supervised approach on the other hand is well suited when the user has prior knowledge about the segmentation, which is typically the case with EUV solar images. Ideally, it requires a ground truth', which is not readily available for the solar corona. It is however possible to rely on the large amount of scientific and operational research in interpreting solar images in order to define a desired ground truth. A Bayesian classifier attributes a class label to a pixel so as to maximize the posterior probability in a Bayesian sense. It builds upon the likelihood estimation of a pixel intensity probability density function (PDF) given a class label. This is possible thanks to the ground truth' segmentation provided at training time. Previous work on supervised classification in solar images has used a parametric PDF in the form of a (possibly multivariate) Gaussian (Dudok de Wit 2006;Rigler et al. 2012; or mixture of Gaussians (Turmon et al. 2002) for such estimation. The Bayesian framework allows introducing prior information, e.g. in the form of enforcing spatial smoothness (Turmon et al. 2002;Rigler et al. 2012).
In this paper, we propose a supervised classification scheme that is based on the Maximum A Posteriori (MAP) rule. We illustrate our method using a manually segmented image, as well as an image segmented using the unsupervised SPoCA procedure ) as training inputs. This allows us to compare the SPoCA and MAP classification, and to highlight the capability of building segmentation masks that are in line with what the user wants. For example, a forecaster could specify if only the bright core of an AR satisfies the definition of AR class (as in the SPoCA procedure) or if the extended bright region around the AR should also be included (as in our manually labelled image).
Our contribution in this paper is threefold. First, we use a non-parametric kernel density estimator to estimate the PDF of observing a pixel intensity in a given class. This choice is motivated by the complex noise structure in EUV images and by the need to provide some robustness with respect to wrongly labelled pixels in the training data. Second, unlike previous work, we do not assume that the classes are a priori equally likely. Instead, we propose to estimate the prior probability of belonging to a class through an Expectation-Maximization (EM) procedure. We use the Naive Bayes assumption to include other information such as the latitude distribution for each class to further improve the segmentation. Finally, we devise criteria to evaluate the performance of the segmentation method. The proposed framework is flexible enough to allow the inclusion of additional information in a consistent way.
Section 2 introduces the MAP classifier and Section 3 presents the dataset and training masks used in this work. Actual computation of the MAP classifier is discussed in Section 4 while its performance is evaluated in Section 5. Section 6 concludes with perspectives on future work.

Bayesian classifier
The objective of segmentation is to assign a label or class C i , i 2 {1,. . ., N}, to each pixel location x in an image. In this work, we consider mono-channel segmentation of EUV images into N = 3 classes: CH, AR, and QS. Generalization to multichannel acquisition is discussed in Section 6.
Our supervised method requires an expert to supply, at training time, a rough segmentation of one or more preprocessed images into the three classes. Assuming properly calibrated data, the classifier will segment new images and maximize the posterior probability P(C i |I(x)), that is, the probability that intensity I observed at pixel x of the dataset belongs to the class C i . Bayes' theorem states that this posterior probability is proportional to the likelihood p(I(x)|C i ) of observing pixel I(x) when class C i is assumed multiplied by the prior probability p(C i ) of being labelled C i : Assuming a uniform P(C i ) leads to a Maximum Likelihood (ML) classifier (Sect. 2.1). When a prior distribution on P(C i ) and possibly other quantities are introduced, this leads to a MAP estimator (Sect. 2.2). Section 2.3 explains how to maximize the corresponding likelihood function.

Maximum Likelihood classifier
ML classification is commonly used in remotely sensed data (Richards 1999). It assumes a uniform P(C i ) over the N classes, and hence maximizing the posterior probability P(C i |I(x)) is equivalent to maximizing the likelihood function p(I(x)|C i ) over the classes C i , i = 1, . . ., N.
The segmentation mask supplied at training time provides for each class a set of observed pixel intensities. This allows estimating the PDF p(I(x)|C i ) for each class C i . Often, a parametric PDF is assumed and its parameters are estimated via ML (Turmon et al. 2002;Dudok de Wit 2006;Rigler et al. 2012;. In this work, motivated by the noise statistics analysis of Section 3.1, and in order to be as general as possible, we estimate these PDFs using a nonparametric kernel density estimator. Let n i be the number of pixels belonging to class C i , and I j be the intensity observed in the jth pixel of class C i in the training set. The kernel density estimation for this class can be computed as: where K is a kernel or weight function. In our case, K is a standard Gaussian density function. In words, this kernel density estimation estimates a PDF by summing Gaussians centered around each data point (here the intensities observed). This is similar to a histogram, but allows some degree of smoothness to be enforced. It also ensures to always have a strictly positive density, unlike histograms which may have a probability of zero in a certain bin if no data point belongs to this bin. We choose the bandwidth h as h $ n À1=5 i , following common practice in the statistical literature: see Equation (6.19) in Scott (2015).
The ML classifier S ML (x) infers the labelling C i , i = 1, 2, or 3 of new points x as: that is, it attributes a pixel to the class C i corresponding to the highest value of the likelihood function.

Maximum A Posteriori classifier
Equation (1) implicitly assumes that all classes are equally likely, which is typically not the case: the QS covers the majority of the solar surface most of the time.
Besides the intensity distribution p(I(x)|C i ), other quantities could add pertinent information to the classification problem. An example is the distribution of latitude, p(L(x)|C i ), which is typically non-uniform since CHs and ARs appear preferentially at distinct latitudes.
In order to combine the various information, we use the Naive Bayes assumption of conditional independencies. More precisely, let us assume that intensity I(x) and latitude L(x) are statistically independent given the class labelling. The likelihood then factorizes as: Multiplying by p(C i ), the MAP classifier writes: Note that S ML is equal to S MAP with uniform p(L(x)|C i ) and p(C i ).
More properties may easily be incorporated in this framework under the Naive Bayes assumption. Examples include intensities in other wavelengths, magnetogram measurements, and optical flow velocities. Morphological and geometrical properties such as the area or properties describing the shape (Reiss et al. 2014) of connected components belonging to AR or CH classes could also be included.
In this paper, we show the performance of a MAP estimator given the intensity I(x) in one EUV channel as well as the heliographic latitude. We explain below how to estimate p(C i ) and p(L(x)|C i ) from a dataset of images.

Area coverage and latitude distribution estimation
Given a dataset of images to be segmented, the parameters of the distributions p(C i ) and p(L(x)|C i ) can be estimated using an EM procedure (Dempster et al. 1977) as follows: 1. Start with uniform p(C i ) and p(L(x)|C i ).

Compute MAP segmentations using this p(C i ) and
p(L(x)|C i ) on the dataset (Maximization step) 3. Recalculate p(C i ) and p(L(x)|C i ) from these MAP segmentations on the dataset (Expectation step) 4. Repeat steps 2 and 3 until the values converge.
One or both of these likelihood functions could be prescribed by an expert, or built manually. It is for example well known that CHs appear preferentially at high latitudes (Lowder et al. 2014) and that ARs appear in two bands on both sides of the equator (Yeates et al. 2008). Hence one could manually build p(L(x)|C i ) and simplify the above EM procedure by computing only p(C i ).
In this EM-scheme, the probability distributions p(C i ) and p(L(x)|C i ) are assumed to be stationary (i.e. not time-dependent) during the period covered by the dataset. More precisely we are computing an average distribution over that time period. As p(C i ) and p(L(x)|C i ) evolve over the solar cycle, care must be taken to estimate these distributions on appropriate time periods. When an extended time period is covered, a sliding window approach should be considered in order to handle the different regimes present in a solar cycle.
The above procedure can also integrate information on morphological or geometrical properties. Estimating the corresponding distributions would require a first segmentation to obtain connected components of AR and CH. The initial likelihood function in step 1 of the EM-scheme would then be the likelihood of these properties computed from an ML-based segmentation on the given dataset.

Fuzzy segmentation
Instead of producing a crisp segmentation, it is often useful to provide a degree of membership to every class. The posterior distribution p(C i |(I(x), L(x))) defined in Eqs.
(2)-(4) allows us to compute a degree of membership for class C i as: A fuzzy segmentation provides a degree of certainty about classified pixels. It allows trading one type of error for another. For example, one may want to include pixels having a relatively small AR membership into the AR class in order not to miss any AR pixels, even if it means misclassifying some QS pixels as AR. This could be accomplished by considering any pixel having an AR membership above a given treshold to be in the AR class. The membership value is also useful as an estimation of the accuracy of the segmentation at any given pixel.

Dataset and training masks
Since May 2010, the Atmospheric Imaging Assembly (AIA; Lemen et al. 2012) on board the Solar Dynamics Observatory (SDO) mission delivers 4096 · 4096 pixel images of the solar corona in 10 bandpasses. Since the structures we are interested in are large scale, and in order to reduce the computational and storage burdens, we use the level 1.5 calibrated and rebinned 1024 · 1024 pixel synoptic 2 images 3 .
Those synoptic AIA images are a proxy for the images to be produced by the SUVI instrument on board the next generation of GOES-R satellites. SUVI will return 1024 · 1024 images in six EUV channels at least once every minute. Those channels are similar or close to the ones of AIA. To demonstrate the properties of our algorithm we consider two datasets of SDO-AIA 19.3 nm images: A four year Dataset spanning October 1st, 2010 up to and including September 30th, 2014 at a cadence of one image per day. A three month Dataset going from January 1st, 2011 up to and including March 31st, 2011 at a higher cadence of one image per hour. When properly preprocessed as described in Section 3.2, statistics of a series of images are comparable.
The four year dataset covers roughly the ascending part of Solar Cycle 24, where ARs are relatively well defined and isolated, and CHs may extend up to low latitudes. Extending the analysis on other time periods, such as solar minima or maxima, would require training masks for these different regimes too, and to process the data using for example a sliding window approach.
The remaining part of this section presents a noise analysis of EUV images (Sect. 3.1), the preprocessing steps (Sect. 3.2), and the training masks (Sect. 3.3).

Noise statistics
The incidence of photon flux on a EUV telescope is converted to digital numbers (DN) through a series of steps, each introducing some noise: photon counting produces Poisson noise, the readout is affected by Gaussian noise, and multiplicative noise appears due to inhomogeneities in the CCD detector response or flatfield. Noisy observations can be modelled as the realization from a random variable Y, whose noise part can be decomposed into an additive, Poissonian, and multiplicative corruptions. Expectation E and Variance V of Y then verify: where r is the standard deviation of the additive component and a, b are parameters. To estimate the range where the noise tends to follow a Poisson distribution, it is useful to reparametrize the model with a ¼ r 2 =x 2 0 and b = gr 2 /x 0 leading to where x 0 is the point and g is the range around which the model behaves as Poissonian. In the first SDO-AIA 19.3 nm image recorded on February 11, 2011, we selected 24 areas in and outside the solar disk. We computed local means and variances in a 7 · 7 window within these areas and used those to fit Eq. (6). We obtain: CHs typically contain intensity values ranging between 5 and 50 DN, well within the Poissonian regime. This justifies the choice of a flexible non-parametric density estimator for the pixel intensity within each class as described in Section 2.1.

Preprocessing
The SDO-AIA synoptic images are already calibrated to level 1.5 by the instrument team. Our analysis uses the four year dataset, for which the intensity in each image is normalized to [0, 1] as follows:

Limb brightness correction
The apparent brightening of the solar corona is due to the integration over the line of sight through the optically thin material that composes the solar corona. This effect may hinder the segmentation process and hence needs to be removed. In Rigler et al. (2012), this is solved by assuming a constant height above the solar surface and computing the path length through this volume for the angle of view corresponding to each pixel. The corresponding image is then used as a pseudo-channel in a multi-channel procedure and accounts for limb brightening effects.
Our approach relies on an empirical functional fit, following the spirit of Harvey & White (1999) and Barra et al. (2009). Given that limb brightening depends only on the distance to the center of the solar disk as observed by the instrument, Barra et al. (2009) apply a polar transform to represent the image I in a (q, h) plane. They compute a profile F(q) as the integral over all angles q of I(q, h), and obtain the corrected image as the ratio between I(q, h) and F(q) multiplied by the median value of on-disk intensities. This correction is abrupt near the limb, and Verbeeck et al. (2014) proposed a smoother parametric fit to account for this effect.
In this paper, we provide a non-parametric fit to the intensity profile as follows. We apply a polar coordinate transform on each image. In helioprojective coordinates, the distance q to the disk center goes from 0 arcsec to the observed radius of the solar disk in arcseconds. The angle h goes from À180°to 180°. For each reprojected image, we compute the profile F I (q) as the median over all angles h for a given radius, i.e. F I (q) = median h I(q, h). Even though the median over all angles is taken, F I (q) will usually not be smooth due to the presence of ARs. Therefore the median of all image radial profiles is taken and used as the final radial profile to perform the limb brightness correction: F(q) = median I F I (q). The radial profile F(q) for the four year dataset is shown in Figure 1 in blue. The bump located at about 0.4 solar radii is due to the AR bands and is unwanted. To remove it, we consider a smaller dataset where relatively few ARs were present as proposed in Kraaikamp & Verbeeck (2015) and compute the radial intensity profile F(q) on this restricted dataset. We used the period from October 2010 to March 2011. The resulting radial intensity profile is shown in Figure 1 in green and is used as the final profile for the limb brightness correction.
This profile is approximately linear between 0 and 0.7 solar radii. Once this section is detrended it has a relative variance of only 0.016%. This should be small enough to avoid creating artifacts during the preprocessing. Once this profile is obtained we divide each pixel of the original image by the corresponding value of this profile to obtain a limb brightness corrected image. Since we estimate the brightness profile up to one solar radius, we mask out pixels that are off-disk.

Training masks
For the results shown in Section 4, we use as a first training mask a manually segmented preprocessed image from our four year dataset. Figure 2a shows the image taken on April 23, 2013 at 00:00:07 UT together with the AR, QS, and CH boundaries that were prescribed manually. Such masks can easily be created by a standard image manipulation program that features layered editing, for example the free and open source program GIMP.
Care must be taken to minimize overlap between masks for different classes. Indeed, although the proposed method can in theory handle such overlap, in practice they will translate into an increased overlap between intensity distribution of various classes, leading to increased classification uncertainty for the range of intensities in the overlapping region. Conversely, it is also important to avoid big gaps between the masks of neighboring classes to ensure that the decision boundary found through Eq. (4) is as desired. For example, adequate AR segmentation requires to provide masks not only for the ARs themselves, but also for the QS that is close to the desired AR boundaries.
The SPoCA-suite ) provides the SPoCA-AR and SPoCA-CH modules which are integrated into the SDO Event Detection System (EDS). Every 4 h, the EDS generates and uploads the SPoCA entries into the AR and CH catalogs of the HEK or Heliophysics Events Knowledgebase (Hurlburt et al. 2012) For evaluation purposes we used as a second training set the AR and CH masks computed for the EDS modules, see Figure 2b. The pixels that were classified as neither AR nor CH were used as the QS class.

Maximum A Posteriori segmentation
This section discusses the MAP segmentation obtained on the dataset described above.   Figure 3a displays the kernel density estimates of observed intensities per class. Those were obtained thanks to the two training masks: the manually segmented training mask that considers an extended AR (Fig. 2a), and the SPoCA segmented training mask that selects only the AR core (Fig. 2b). Each class distribution is roughly unimodal with only a small amount of overlap in the transition zones between CH and QS, and between QS and AR.
In the manually segmented image, we also observe some overlap between the dark CH and bright AR classes. This is most likely due to errors in the manual segmentation, e.g. bright points embedded in CHs that were misclassified as CH. The SPoCA training mask being computed with an automated threshold-based segmentation technique, produces a smaller overlap between the AR and QS classes. A small overlap is nevertheless still present due to post-processing, e.g. boundary smoothing and removal of small connected components from the AR class.

Distribution estimation using EM-scheme
We estimated the parameters of the distributions p(C i ) and p(L(x)|C i ) with the EM-scheme on the four year dataset, again using the manually segmented and the SPoCA segmented images. For both masks the EM-iteration converges to a maximum relative change of the values of p(C i ) of less than 1% in about five iterations. With our implementation, each iteration takes approximately 5 min of computation time on a server with two Intel Xeon E5-2680v2 processors with 10 cores each. The computation time can be decreased by subsampling the images in time. However, care must be taken that the subsampling does not interact with the solar rotation as it could bias the results toward certain Carrington longitudes. A good alternative to subsampling at regular intervals would be a random sampling. For the estimation of p(L(x)|C i ) it is important that the dataset spans over a time range that is long enough (multiple Carrington rotations) and is of a high enough cadence (at least one image per day). Once these distributions are estimated, they can be applied to all images in the dataset using Eq. (4). This way their final segmentation can be calculated in only one step. The area coverage is computed as the number of pixels belonging to a particular class to the number of on-disk pixels in an area-preserving sinusoidal reprojection of the original helioprojective image. With the manually segmented training mask, estimation of P(C i ) over the four year data set results in an area coverage of 8.55%, 81.96% and 9.49% for the AR, QS and CH class, respectively. With the SPoCA training mask, those values are 2.87%, 90.37% and 6.76%. The smaller values for the AR coverage in the latter case reflects the fact that the SPoCA-AR module is tailored to extract only the core of ARs, while our manually segmented image considered the extended part of an AR. Both values for the CH coverage are within the range found by Lowder et al. (2014) (see Fig. 9 therein).
The results for p(L(x)|C i ) are shown in Figure 3b. The wellknown AR bands are clearly defined, as well as the tendency for CHs to appear at high latitudes. Moreover, the North-South assymetry corresponding to the ascending phase of Solar Cycle 24 during 2010-2014 is noticeable in the CH and AR latitude distributions (Seaton et al. 2013). The AR latitude distribution for the segmentations using the SPoCA mask is slightly more peaked because this mask selects the core AR and their latitudinal extent is thus reduced. As the corresponding likelihood function is computed for pixels on the original helioprojective images, the number of pixels at high positive and negative latitude goes down closer to the poles, and hence the latitude distribution goes to zero at high latitudes for all three classes. The absolute values of these likelihoods are not important for our method, as we are only comparing the relative values of likelihood functions across classes. Figure 4 visualizes, for the manually segmented training image, the two-dimensional (2D) histogram computed as the product between p(I(x)|C i ) and p(L(x)|C i ) displayed in Figures 3a and 3b, respectively. For comparison, Figure 5 shows the 2D histograms of the intensity and latitude values that are observed after the MAP estimation is performed, removing thus any overlap between classes. This final estimation of the joint PDF p ((I(x), L(x))|C i ) exhibits an M-shape in the QS distribution, and shows dark features classified as CH Fig. 6. Comparison between ML (left column) and MAP (right column) classifiers on the same image as the initial training mask. Top: segmentation contours, with red being the AR class and cyan being the CH class. Bottom: CH membership, with black being 0% membership and white being 100% membership. even at low latitudes. The M-shape in the QS histogram is mainly due to the diffuse part around the AR, which produces two bands at the common AR latitudes.

Comparison between ML and MAP classifiers
The difference between the ML and MAP classifiers is highlighted in Figure 6. The ML classifier attributes to the CH class some pixels located in the NW quadrant, whereas the extra information in the MAP classifier greatly helps in decreasing the CH membership values in that region. For the image in Figure 6, the mean CH membership in the ML segmentation for all pixels that have less than 50% CH membership is 1.02 · 10 À1 with a variance of 1.15 · 10 À2 . For the MAP segmentation we obtain a mean of 2.95 · 10 À2 with a variance of 4.44 · 10 À3 . The much lower mean and variance for the MAP membership values suggest that the overall noise in low membership values is reduced, and membership values quickly drop to zero outside the desired regions, resulting in more sharply defined CHs. This is clearly visible in the bottom row of Figure 6.

Post-processing
The results that were presented here were not post-processed in any way. For some applications however, smoother boundaries and noise removal are desired. For these applications, one can use standard methods such as morphological opening and closing as shown in Verbeeck et al. (2014).

Performance evaluation
To evaluate the performance of our classifier, we first compare it against a test set of new manually labelled images (Sect. 5.1). Second, we devise some desired properties of an accurate solar image segmentation and show how our classifier performs against those properties (Sect. 5.2).

Validation using a test dataset
Performance evaluation of a supervised method is classically done against a test dataset. In our case, the test dataset is constituted of new manually segmented images. While Rigler et al. (2012) studied the effect of having different experts specify the training and the test segmentation masks, in this work the test masks were created by the same person who made the initial training mask in order to avoid personal biases from influencing the results. Note that they still do not provide an ideal ground truth since these manual segmentations may contain errors themselves.   8. Contours of manual segmentation in AR (red), QS (green), CH (cyan) and contours around regions of pixels that were classified differently by the MAP classifier (yellow). The image was taken on April 10, 2011 at 00:00:07 UT. Several misclassifications seem to be due to inaccuracies in the manual segmentation: 1. The small dark region in the SE quadrant was classified by the MAP classifier as CH instead of QS. 2. Some bright points that were manually segmented as QS or CH (when embedded in CH) but are more correctly labelled as AR by the MAP classifier. 3. Near the boundaries of the manual CH and AR segmentations some small regions were assigned to the QS class by the MAP classifier, which is arguably more correct for the cases shown in this figure.
Area coverage Time 0.06 J a n 0 5 2 0 1 1 J a n 1 9 2 0 1 1 F e b 0 2 2 0 1 1 F e b 1 6 2 0 1 1 M a r 0 2 2 0 1 1 M a r 1 6 2 0 1 1 M a r 3 0 2 0 1 1 A p r 1 3 2 0 1 1 A p r 2 7 2 0 1 1 We provided rough segmentations for nine new randomly selected images from the four year dataset. In these segmentations, we included only those regions that were least likely to result in errors in the manual segmentation, i.e. we did not segment regions that are in the transition zones between AR and QS, and between QS and CH. The results are shown in Table 1. From the results we can see that the distinction between CH and QS is more difficult to make than the distinction between AR and QS. In the case of the QS it should be noted that difficulties only arise for QS pixels that are almost AR or CH pixels. Since the manual segmentations focused primarily on regions that were not in the transition zones in order to avoid classification errors in the constructed ground truth, this is only an approximation of the true performance.   J a n 0 7 2 0 1 1 J a n 2 1 2 0 1 1 F e b 0 4 2 0 1 1 F e b 1 8 2 0 1 1 M a r 0 4 2 0 1 1 M a r 1 8 2 0 1 1 J a n 0 7 2 0 1 1 J a n 2 The shape of the membership value probability distribution for the pixels that were wrongly classified (Fig. 7) reveals that most wrongly classified pixels have a membership value around 0.5, and are likely to be in the ''fuzzy'' boundary region of each class. This suggests that the membership value is a useful indication of the classification accuracy. However, there is a second peak close to 100% membership. These pixel misclassifications are likely to be due to errors in the manual segmentation rather than to a poor performance of the algorithm. Figure 8 shows one of the test manual segmentations along with contours around pixels classified differently by the MAP classifier. It highlights several parts where misclassifications are due to inaccuracies from the manual segmentation.

Stability and consistency criteria
We saw in the previous section that creating a ground truth for a fully objective verification of this method is difficult: it would require a correct class assignment on a pixel by pixel basis. As an alternative, we look at desired properties of a segmentation method. We identify the following criteria for an accurate image segmentation method into large-scale features such as ARs and CHs: 1. Stable segmentations on short timescales in the absence of major solar activity. 2. Consistent results over longer time periods. 3. Consistent with the manually provided training masks: changes in the training mask should reflect in the resulting segmentations.
To show these properties we use the high cadence three month dataset and the same training masks as before. We reused the radial intensity profile as well as the distributions p(C i ) and p(L(x)|C i ) estimated on the four year dataset.
In the absence of major solar activity (e.g. flares), and on shorter timescales (on the order of hours) the observed size of CHs does not show large variations. The first property says that in these cases the area coverage computed from the segmentation should also show only minor variations over time. Figure 9 shows that this is indeed the case: the median variance in CH fractional area over a 24 h sliding window is 8.1 · 10 À6 , compared to a variance of 6.4 · 10 À4 for the entire three month period. This addresses the first property.
We can use the long term presence and relative stability of CHs to show the second property: similar area coverage should be observed on segmented images from one solar rotation to the next, provided the observed area of the CHs stays stable. We calculated the mean squared error between the CH area data at the original time and shifted by various offsets between 26 and 35 days, and found a minimum mean squared error at an offset of 29.22 days (Fig. 9). This rotation period corresponds to the differential rotation at a latitude of approximately 55°, which is consistent with our results for the latitude distribution of CH pixels in Figure 3b. This addresses the second property.
For the third property we compare the results from two different training masks: the manually segmented image that selects the extended AR (Fig. 2a) and the SPoCA segmented image that selects the core AR, and attributes to the QS class the diffuse part of the ARs (Fig. 2b). A comparison between the resulting segmentations in Figure 10 shows that, when using the SPoCA training mask, the AR class is reduced to only the brightest core of the ARs. Figure 11 compares area coverages resulting from the MAP segmentations (using the manually segmented training mask and the SPoCA segmented training mask) and from the unsupervised SPoCA segmentations. The area coverages of ARs produced by SPoCA are similar to the ones produced by the MAP classifier trained with the SPoCA mask. The CH area coverages produced by the MAP classifier trained on one SPoCA segmentated image are correlated with the SPoCA-CH area coverages, but with a variable offset. This offset results from the fact that the SPoCA-CH module puts a minimum lifetime requirement of three days on CH connected components, which reduces the number of pixels attributed to the CH class.

Conclusions
We proposed a flexible way of segmenting coronal EUV images into active regions, quiet Sun, and coronal holes that can be tailored to the user's needs. Our method produces crisp as well as fuzzy segmentations together with a measure of accuracy of the segmentation.
Contrary to many existing methods, our approach is not a ''black box'' and is easily adaptable to a user's needs through the manual segmentation of an example image as input. It provides a segmentation that is consistent with what the user, e.g. a space-weather forecaster, would provide. However, this also means that the method is heavily influenced by the user's skill and interests. Our method allows for special requirements on properties such as area and heliographic location distribution to be enforced via a suitable likelihood function for each parameter. These likelihood functions can be manually specified in order to e.g. exclude certain unwanted features, but can also be automatically estimated from the data by using an Expectation-Maximization procedure. This paper showed that including such information improves upon the final segmentation.
The proposed method can be extended to multiple bandpasses and instruments by using the Naive Bayes assumption and multiplying the marginal intensity likelihood of the different bandpasses. Such an approach will however not take into account the correlation between intensities observed in different bandpasses. One way to account for such correlation is to combine mono-channel results using fusion theory (Barra et al. 2007;. This allows one to define how to deal with consonant and partially conflicting information from the various channels (Bloch 1996). This method can also be extended to applications other than coronal segmentation, for example to segment the photosphere and to detect sunspot umbra and penumbra.
The coronal hole class extracted with our method possibly contains some filament channels, which also appear as dark features in EUV images. Distinguishing between filament channels and coronal holes would require to include morphological or geometrical properties such as the ones defined in Reiss et al. (2015) or magnetogram information (Scholl & Habbal 2008;Lowder et al. 2014).
The Maximum Likelihood and Maximum a posteriori segmentations presented here provide a consistent way for comparing manual segmentations done by different experts, or segmentations obtained from different automated algorithms. Given distinct training sets, the dispersion in the resulting class intensity probability density function could for example be measured using a Kullback-Leibler distance (Kullback 1959).
The segmentations resulting from the various training sets can also be compared against stability and consistency criteria such as the ones defined in Section 5.2. From such a study, one could derive a commonly accepted standard, e.g. defining how far an active region should extend. This would provide a benchmark for comparing different feature detection algorithms.
The software implementing this method along with a script to reproduce all results outlined in this paper is available at http://bitbucket.org/rubendv/bayesian_segmentation_code.