J. Space Weather Space Clim., Volume 5, 2015
Statistical Challenges in Solar Information Processing
Article Number: A34
Number of pages: 12
DOI: https://doi.org/10.1051/swsc/2015033
Published online: 27 October 2015

© R. De Visscher et al., Published by EDP Sciences 2015

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Solar images exhibit a variety of large-scale structures with potential space-weather effects. The most prominent examples are coronal holes (CH), from which the high-speed solar wind departs (Krieger et al. 1973; Verbanac et al. 2011), and active regions (AR), which have the potential to produce flares and to be associated with coronal mass ejections. Accurate monitoring of AR, Quiet Sun (QS), and CH can also serve as input into solar EUV irradiance reconstruction models (Haberreiter et al. 2014).

Synoptic maps showing the various characteristics of the solar surface are produced manually on a daily basis by the forecasters at the NOAA Space Weather Prediction Center. Such a partition constitutes a form of segmentation mask, where each pixel spatial coordinate is associated with a ‘label’ corresponding to a given feature.

With the ever-growing volume of data available (e.g. the Solar Dynamics Observatory, delivering 1 TB of data per day since May 2010), automated feature detection and identification methods have developed rapidly in recent years. Segmentation procedures can be divided into two categories, depending on the type of prior knowledge used to train the process. Unsupervised methods automatically select the partitioning criteria, whereas supervised segmentation requires direct user guidance by means of a training set.

Unsupervised segmentation procedures are classically divided into region-based, edge-based, and hybrid methods. The first category encompasses histogram-based segmentation, where pixels are classified according to their intensity (Pettauer & Brandt 1997; Steinegger et al. 1998; Worden et al. 1999; Colak & Qahwaji 2008; Krista & Gallagher 2009; de Toma 2011; Verbeeck et al. 2014). It includes clustering methods as well as thresholding methods with manual or automatic determination of the threshold, which can be global or local. Region-growing procedures, which use the connectivity of individual pixels to incorporate information about the local neighborhood, also belong to this category (Preminger et al. 1997; Benkhalil et al. 2003, 2006; Higgins et al. 2011; McAteer et al. 2005). Edge-based methods, on the other hand, focus on discontinuities and thus on locating region boundaries (Zharkov et al. 2005; Curto et al. 2008; Watson et al. 2009). These methods can be combined in many ways, and the literature on the subject is extensive. Aschwanden (2010) and Martens et al. (2012) review and summarize recent work in this area, while Verbeeck et al. (2013) and Caballero & Aranda (2013) compare different segmentation procedures.

These unsupervised methods are usually designed for a specific scientific enquiry and assume a single correct answer. Boundaries of ARs or CHs are however fuzzy, and hence their precise determination depends on the scientific application at hand. If one user is interested in an AR’s core, and another in the bright region around the AR, each will have to tune the parameters of an unsupervised algorithm to obtain the desired result. Although the introduction of expert knowledge is possible within this context (Barra et al. 2007), it is not straightforward. Finally, performance evaluation of unsupervised methods in the context of solar physics is still lacking (Zhang et al. 2008).

A supervised approach, on the other hand, is well suited when the user has prior knowledge about the segmentation, which is typically the case with EUV solar images. Ideally, it requires a ‘ground truth’, which is not readily available for the solar corona. It is however possible to rely on the large amount of scientific and operational research in interpreting solar images in order to define a desired ground truth. A Bayesian classifier attributes a class label to a pixel so as to maximize the posterior probability in a Bayesian sense. It builds upon the estimation of the likelihood, i.e. the probability density function (PDF) of a pixel intensity given a class label. This is possible thanks to the ‘ground truth’ segmentation provided at training time. Previous work on supervised classification in solar images has used a parametric PDF in the form of a (possibly multivariate) Gaussian (Dudok de Wit 2006; Rigler et al. 2012; Colak & Qahwaji 2013) or mixture of Gaussians (Turmon et al. 2002) for such estimation. The Bayesian framework allows introducing prior information, e.g. in the form of enforcing spatial smoothness (Turmon et al. 2002; Rigler et al. 2012).

In this paper, we propose a supervised classification scheme based on the Maximum A Posteriori (MAP) rule. We illustrate our method using a manually segmented image, as well as an image segmented with the unsupervised SPoCA procedure (Verbeeck et al. 2014), as training inputs. This allows us to compare the SPoCA and MAP classifications, and to highlight the capability of building segmentation masks that are in line with what the user wants. For example, a forecaster could specify whether only the bright core of an AR satisfies the definition of the AR class (as in the SPoCA procedure) or whether the extended bright region around the AR should also be included (as in our manually labelled image).

Our contribution in this paper is threefold. First, we use a non-parametric kernel density estimator to estimate the PDF of observing a pixel intensity in a given class. This choice is motivated by the complex noise structure in EUV images and by the need to provide some robustness with respect to wrongly labelled pixels in the training data. Second, unlike previous work, we do not assume that the classes are a priori equally likely. Instead, we propose to estimate the prior probability of belonging to a class through an Expectation-Maximization (EM) procedure. We use the Naive Bayes assumption to include other information such as the latitude distribution for each class to further improve the segmentation. Finally, we devise criteria to evaluate the performance of the segmentation method. The proposed framework is flexible enough to allow the inclusion of additional information in a consistent way.

Section 2 introduces the MAP classifier and Section 3 presents the dataset and training masks used in this work. Actual computation of the MAP classifier is discussed in Section 4 while its performance is evaluated in Section 5. Section 6 concludes with perspectives on future work.

2. Bayesian classifier

The objective of segmentation is to assign a label or class Ci, i ∈ {1,…, N}, to each pixel location x in an image. In this work, we consider mono-channel segmentation of EUV images into N = 3 classes: CH, AR, and QS. Generalization to multi-channel acquisition is discussed in Section 6.

Our supervised method requires an expert to supply, at training time, a rough segmentation of one or more preprocessed images into the three classes. Assuming properly calibrated data, the classifier segments new images by maximizing the posterior probability p(Ci|I(x)), that is, the probability that the intensity I observed at pixel x belongs to class Ci. Bayes’ theorem states that this posterior probability is proportional to the likelihood p(I(x)|Ci) of observing pixel intensity I(x) given class Ci, multiplied by the prior probability p(Ci) of being labelled Ci:

$$ p(C_i \mid I(x)) \propto p(I(x) \mid C_i)\, p(C_i). $$

Assuming a uniform p(Ci) leads to a Maximum Likelihood (ML) classifier (Sect. 2.1). When prior distributions on p(Ci) and possibly other quantities are introduced, this leads to a MAP estimator (Sect. 2.2). Section 2.3 explains how to estimate these prior distributions.

2.1. Maximum Likelihood classifier

ML classification is commonly used for remotely sensed data (Richards 1999). It assumes a uniform p(Ci) over the N classes, and hence maximizing the posterior probability p(Ci|I(x)) is equivalent to maximizing the likelihood function p(I(x)|Ci) over the classes Ci, i = 1, …, N.

The segmentation mask supplied at training time provides, for each class, a set of observed pixel intensities. This allows estimating the PDF p(I(x)|Ci) for each class Ci. Often, a parametric PDF is assumed and its parameters are estimated via ML (Turmon et al. 2002; Dudok de Wit 2006; Rigler et al. 2012; Colak & Qahwaji 2013). In this work, motivated by the noise statistics analysis of Section 3.1, and in order to be as general as possible, we estimate these PDFs using a non-parametric kernel density estimator. Let ni be the number of pixels belonging to class Ci, and Ij the intensity observed in the jth pixel of class Ci in the training set. The kernel density estimate for this class is

$$ p(y) = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{1}{h} K\!\left(\frac{y - I_j}{h}\right), $$

where K is a kernel or weight function; in our case, K is a standard Gaussian density. In words, the kernel density estimator builds a PDF by summing Gaussians centered on each data point (here the observed intensities). This is similar to a histogram, but enforces some degree of smoothness and always yields a strictly positive density, unlike a histogram, which assigns zero probability to any bin containing no data point. We choose the bandwidth h as $h \sim n_i^{-1/5}$, following common practice in the statistical literature: see Equation (6.19) in Scott (2015).
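For illustration, here is a minimal Python sketch of such an estimator using SciPy; the training intensities are synthetic placeholders, and SciPy's default Scott rule yields a bandwidth proportional to n^(-1/5) in one dimension, matching the choice above.

    import numpy as np
    from scipy.stats import gaussian_kde

    # Hypothetical training intensities for one class (e.g. QS), normalized to [0, 1].
    rng = np.random.default_rng(0)
    intensities = rng.beta(2.0, 8.0, size=10_000)

    # Scott's rule (SciPy's default) gives h proportional to n^(-1/5) in 1-D.
    kde = gaussian_kde(intensities, bw_method="scott")

    # Evaluate the estimated PDF p(y | C_i); it is strictly positive everywhere,
    # unlike a histogram with empty bins.
    grid = np.linspace(0.0, 1.0, 256)
    pdf = kde(grid)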

The ML classifier SML(x) infers the labelling Ci, i = 1, 2, or 3 of new points x as

$$ S^{\mathrm{ML}}(x) = \arg\max_{C_i} p(I(x) \mid C_i), \quad i = 1, \dots, 3, \tag{1} $$

that is, it attributes a pixel to the class Ci corresponding to the highest value of the likelihood function.
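A sketch of the corresponding pixel-wise rule, assuming one fitted density estimator per class; the function and the dictionary layout are ours, not the paper's.

    import numpy as np

    def ml_classify(image, kdes):
        """Eq. (1): assign each pixel to the class with the highest likelihood.
        image: 2-D intensity array; kdes: dict mapping class name -> fitted KDE."""
        names = list(kdes)
        flat = image.ravel()
        likelihood = np.stack([kdes[c](flat) for c in names])  # (n_classes, n_pixels)
        return np.asarray(names)[likelihood.argmax(axis=0)].reshape(image.shape)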

2.2. Maximum A Posteriori classifier

Equation (1) implicitly assumes that all classes are equally likely, which is typically not the case: the QS covers the majority of the solar surface most of the time.

Besides the intensity distribution p(I(x)|Ci), other quantities could add pertinent information to the classification problem. An example is the distribution of latitude, p(L(x)|Ci), which is typically non-uniform since CHs and ARs appear preferentially at distinct latitudes.

In order to combine these various pieces of information, we use the Naive Bayes assumption of conditional independence. More precisely, we assume that the intensity I(x) and the latitude L(x) are statistically independent given the class labelling. The likelihood then factorizes as

$$ p\left((I(x), L(x)) \mid C_i\right) \approx p(I(x) \mid C_i)\, p(L(x) \mid C_i). \tag{2} $$

Multiplying by p(Ci), the MAP classifier writes

$$ S^{\mathrm{MAP}}(x) = \arg\max_{C_i} p\left(C_i \mid (I(x), L(x))\right) \tag{3} $$

$$ \approx \arg\max_{C_i} p(I(x) \mid C_i)\, p(L(x) \mid C_i)\, p(C_i). \tag{4} $$

Note that SML is equal to SMAP with uniform p(L(x)|Ci) and p(Ci).
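A sketch of the MAP rule of Eq. (4), extending the ML sketch of Section 2.1; working in the log domain avoids numerical underflow when multiplying small densities, and all names are illustrative. With uniform latitude densities and priors, the extra log terms are constant across classes and the ML rule is recovered.

    import numpy as np

    def map_classify(image, latitudes, intensity_kdes, latitude_kdes, priors):
        """Eq. (4): argmax over classes of p(I|C_i) p(L|C_i) p(C_i), in log space.
        latitudes: per-pixel heliographic latitude, same shape as image."""
        names = list(priors)
        flat_i, flat_l = image.ravel(), latitudes.ravel()
        log_post = np.stack([np.log(intensity_kdes[c](flat_i))
                             + np.log(latitude_kdes[c](flat_l))
                             + np.log(priors[c]) for c in names])
        return np.asarray(names)[log_post.argmax(axis=0)].reshape(image.shape)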

More properties may easily be incorporated in this framework under the Naive Bayes assumption. Examples include intensities in other wavelengths, magnetogram measurements, and optical flow velocities. Morphological and geometrical properties such as the area or properties describing the shape (Reiss et al. 2014) of connected components belonging to AR or CH classes could also be included.

In this paper, we show the performance of a MAP estimator given the intensity I(x) in one EUV channel as well as the heliographic latitude. We explain below how to estimate p(Ci) and p(L(x)|Ci) from a dataset of images.

2.3. Area coverage and latitude distribution estimation

Given a dataset of images to be segmented, the parameters of the distributions p(Ci) and p(L(x)|Ci) can be estimated using an EM procedure (Dempster et al. 1977) as follows (a code sketch is given after the list):

  1. Start with uniform p(Ci) and p(L(x)|Ci).

  2. Compute MAP segmentations using these p(Ci) and p(L(x)|Ci) on the dataset (Maximization step).

  3. Recalculate p(Ci) and p(L(x)|Ci) from these MAP segmentations on the dataset (Expectation step).

  4. Repeat steps 2 and 3 until the values converge.
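A minimal sketch of this scheme, reusing map_classify from Section 2.2; the data layout (lists of per-image arrays) and the 1% relative-change stopping rule are assumptions consistent with Section 4.2.

    import numpy as np
    from scipy.stats import gaussian_kde

    def em_estimate(images, latitudes, intensity_kdes, class_names,
                    max_iter=20, tol=0.01):
        """Alternate MAP segmentation of the dataset (step 2) with re-estimation
        of p(C_i) and p(L|C_i) from the resulting labels (step 3)."""
        priors = {c: 1.0 / len(class_names) for c in class_names}          # step 1
        lat_kdes = {c: (lambda l: np.ones_like(l)) for c in class_names}   # uniform
        all_lats = np.concatenate([l.ravel() for l in latitudes])
        for _ in range(max_iter):
            labels = np.concatenate(
                [map_classify(i, l, intensity_kdes, lat_kdes, priors).ravel()
                 for i, l in zip(images, latitudes)])                      # step 2
            new_priors = {c: np.mean(labels == c) for c in class_names}    # step 3
            lat_kdes = {c: gaussian_kde(all_lats[labels == c])
                        for c in class_names}
            change = max(abs(new_priors[c] - priors[c]) / max(priors[c], 1e-12)
                         for c in class_names)
            priors = new_priors
            if change < tol:                                               # step 4
                break
        return priors, lat_kdes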

One or both of these likelihood functions could be prescribed by an expert, or built manually. It is for example well known that CHs appear preferentially at high latitudes (Lowder et al. 2014) and that ARs appear in two bands on both sides of the equator (Yeates et al. 2008). Hence one could manually build p(L(x)|Ci) and simplify the above EM procedure by computing only p(Ci).

In this EM-scheme, the probability distributions p(Ci) and p(L(x)|Ci) are assumed to be stationary (i.e. not time-dependent) during the period covered by the dataset. More precisely we are computing an average distribution over that time period. As p(Ci) and p(L(x)|Ci) evolve over the solar cycle, care must be taken to estimate these distributions on appropriate time periods. When an extended time period is covered, a sliding window approach should be considered in order to handle the different regimes present in a solar cycle.

The above procedure can also integrate information on morphological or geometrical properties. Estimating the corresponding distributions would require a first segmentation to obtain connected components of AR and CH. The initial likelihood function in step 1 of the EM-scheme would then be the likelihood of these properties computed from an ML-based segmentation on the given dataset.

2.4. Fuzzy segmentation

Instead of producing a crisp segmentation, it is often useful to provide a degree of membership to every class. The posterior distribution p(Ci|(I(x), L(x))) defined in Eqs. (2)–(4) allows us to compute a degree of membership for class Ci as

$$ M(x, C_i) = \frac{p\left(C_i \mid (I(x), L(x))\right)}{\sum_j p\left(C_j \mid (I(x), L(x))\right)}. \tag{5} $$

A fuzzy segmentation provides a degree of certainty about classified pixels. It allows trading one type of error for another. For example, one may want to include pixels having a relatively small AR membership into the AR class in order not to miss any AR pixels, even if it means misclassifying some QS pixels as AR. This can be accomplished by considering any pixel having an AR membership above a given threshold to be in the AR class. The membership value is also useful as an estimate of the accuracy of the segmentation at any given pixel.
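A sketch of Eq. (5) together with an illustrative thresholded decision; the 30% cut-off below is a made-up example of trading missed AR pixels for QS false positives, not a value from the paper.

    import numpy as np

    def memberships(image, latitudes, intensity_kdes, latitude_kdes, priors):
        """Eq. (5): per-class posteriors, normalized to sum to one per pixel."""
        names = list(priors)
        flat_i, flat_l = image.ravel(), latitudes.ravel()
        post = np.stack([intensity_kdes[c](flat_i) * latitude_kdes[c](flat_l)
                         * priors[c] for c in names])
        post /= post.sum(axis=0)
        return {c: post[k].reshape(image.shape) for k, c in enumerate(names)}

    # Conservative AR detection: accept every pixel with at least 30% AR membership.
    # ar_mask = memberships(img, lat, ikdes, lkdes, priors)["AR"] >= 0.3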

3. Dataset and training masks

Since May 2010, the Atmospheric Imaging Assembly (AIA; Lemen et al. 2012) on board the Solar Dynamics Observatory (SDO) mission has delivered 4096 × 4096 pixel images of the solar corona in 10 bandpasses. Since the structures we are interested in are large scale, and in order to reduce the computational and storage burden, we use the level 1.5 calibrated and rebinned 1024 × 1024 pixel synoptic images (not to be confused with the synoptic data product of HMI).

Those synoptic AIA images are a proxy for the images to be produced by the SUVI instrument on board the next generation of GOES-R satellites. SUVI will return 1024 × 1024 pixel images in six EUV channels at least once every minute; those channels are similar or close to the ones of AIA. To demonstrate the properties of our algorithm we consider two datasets of SDO-AIA 19.3 nm images:

  • A four year dataset spanning October 1st, 2010 up to and including September 30th, 2014, at a cadence of one image per day.

  • A three month dataset going from January 1st, 2011 up to and including March 31st, 2011, at a higher cadence of one image per hour.

When the images are properly preprocessed as described in Section 3.2, their statistics are comparable across the series.

The four year dataset covers roughly the ascending part of Solar Cycle 24, where ARs are relatively well defined and isolated, and CHs may extend down to low latitudes. Extending the analysis to other time periods, such as solar minima or maxima, would require training masks for these different regimes as well, and processing the data using, for example, a sliding window approach.

The remaining part of this section presents a noise analysis of EUV images (Sect. 3.1), the preprocessing steps (Sect. 3.2), and the training masks (Sect. 3.3).

3.1. Noise statistics

The photon flux incident on an EUV telescope is converted to digital numbers (DN) through a series of steps, each introducing some noise: photon counting produces Poisson noise, the read-out is affected by Gaussian noise, and multiplicative noise appears due to inhomogeneities in the CCD detector response, or flatfield. Noisy observations can be modelled as realizations of a random variable Y whose noise part decomposes into additive, Poissonian, and multiplicative corruptions. The expectation and variance of Y then verify

$$ \mathbb{E}[Y] = x \quad \text{and} \quad \mathbb{V}[Y] = \sigma^2 + \beta x + \alpha x^2, $$

where σ is the standard deviation of the additive component and α, β are parameters. To estimate the range where the noise tends to follow a Poisson distribution, it is useful to reparametrize the model with $\alpha = \sigma^2 / x_0^2$ and $\beta = \eta \sigma^2 / x_0$, leading to

$$ \mathbb{V}[Y] = \sigma^2 \left[ 1 + \eta \frac{x}{x_0} + \frac{x^2}{x_0^2} \right], \tag{6} $$

where x0 is the point, and η the range, around which the model behaves as Poissonian. In the first SDO-AIA 19.3 nm image recorded on February 11, 2011, we selected 24 areas in and outside the solar disk. We computed local means and variances in a 7 × 7 window within these areas and used those to fit Eq. (6), obtaining $\sigma = \sqrt{3}$, $x_0 = 25$, $\eta = 1$. CHs typically contain intensity values between 5 and 50 DN, well within the Poissonian regime. This justifies the choice of a flexible non-parametric density estimator for the pixel intensity within each class, as described in Section 2.1.
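A sketch of this fit under our assumptions about the procedure: 7 × 7 windows centered on hand-picked positions, and an ordinary least-squares fit in the coefficients (σ², β, α).

    import numpy as np

    def fit_noise_model(image, centers):
        """Fit V[Y] = sigma^2 + beta*x + alpha*x^2 to local (mean, variance)
        pairs, then recover the reparametrization of Eq. (6)."""
        means, variances = [], []
        for r, c in centers:                        # centers of the 7 x 7 windows
            patch = image[r - 3:r + 4, c - 3:c + 4]
            means.append(patch.mean())
            variances.append(patch.var(ddof=1))
        x = np.asarray(means)
        design = np.stack([np.ones_like(x), x, x * x], axis=1)
        sigma2, beta, alpha = np.linalg.lstsq(design, np.asarray(variances),
                                              rcond=None)[0]
        x0 = np.sqrt(sigma2 / alpha)                # alpha = sigma^2 / x0^2
        eta = beta * x0 / sigma2                    # beta = eta * sigma^2 / x0
        return np.sqrt(sigma2), x0, eta             # (sigma, x0, eta)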

3.2. Preprocessing

The SDO-AIA synoptic images are already calibrated to level 1.5 by the instrument team. Our analysis uses the four year dataset, for which the intensity in each image is normalized to [0, 1] as follows (a code sketch is given after the list):

  1. Perform limb brightness correction (see Sect. 3.2.1).

  2. Rescale to [0, 1] by dividing by the 99.9th percentile of the intensity values of one of the images (in our case the first one) and clipping the resulting values to the interval [0, 1]. This keeps rare extreme values from skewing the scaling of the images.
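A sketch of step 2, assuming the limb-corrected images are held in a list with the first image as the reference:

    import numpy as np

    def rescale(images):
        """Divide by the 99.9th percentile of the reference (first) image and
        clip to [0, 1]; rare extreme values then cannot skew the scaling."""
        scale = np.nanpercentile(images[0], 99.9)   # off-disk pixels may be NaN
        return [np.clip(img / scale, 0.0, 1.0) for img in images]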

3.2.1. Limb brightness correction

The apparent brightening of the solar corona toward the limb is due to the integration over the line of sight through the optically thin material that composes the corona. This effect may hinder the segmentation process and hence needs to be removed.

In Rigler et al. (2012), this is solved by assuming a constant height above the solar surface and computing the path length through this volume for the angle of view corresponding to each pixel. The corresponding image is then used as a pseudo-channel in a multi-channel procedure and accounts for limb brightening effects.

Our approach relies on an empirical functional fit, following the spirit of Harvey & White (1999) and Barra et al. (2009). Given that limb brightening depends only on the distance to the center of the solar disk as observed by the instrument, Barra et al. (2009) apply a polar transform to represent the image I in a (ρ, θ) plane. They compute a profile F(ρ) as the integral over all angles ρ of I(ρ, θ), and obtain the corrected image as the ratio between I(ρ, θ) and F(ρ) multiplied by the median value of on-disk intensities. This correction is abrupt near the limb, and Verbeeck et al. (2014) proposed a smoother parametric fit to account for this effect.

In this paper, we provide a non-parametric fit to the intensity profile as follows. We apply a polar coordinate transform to each image. In helioprojective coordinates, the distance ρ to the disk center goes from 0 arcsec to the observed radius of the solar disk in arcseconds, and the angle θ goes from −180° to 180°. For each reprojected image, we compute the profile FI(ρ) as the median over all angles θ for a given radius, i.e. FI(ρ) = medianθ I(ρ, θ). Even though the median over all angles is taken, FI(ρ) will usually not be smooth due to the presence of ARs. Therefore the median of all image radial profiles is taken and used as the final radial profile for the limb brightness correction: F(ρ) = medianI FI(ρ). The radial profile F(ρ) for the four year dataset is shown in blue in Figure 1. The bump located at about 0.4 solar radii is due to the AR bands and is unwanted. To remove it, we consider a smaller dataset in which relatively few ARs were present, as proposed in Kraaikamp & Verbeeck (2015), and compute the radial intensity profile F(ρ) on this restricted dataset; we used the period from October 2010 to March 2011. The resulting radial intensity profile, shown in green in Figure 1, is used as the final profile for the limb brightness correction.

Fig. 1.

Median radial intensity profile of the images from October 2010 until March 2011.

This profile is approximately linear between 0 and 0.7 solar radii; once this section is detrended, it has a relative variance of only 0.016%, which should be small enough to avoid creating artifacts during preprocessing. Once the profile is obtained, we divide each pixel of the original image by the value of the profile at the corresponding distance to obtain a limb brightness corrected image. Since we estimate the brightness profile only up to one solar radius, we mask out pixels that are off-disk.
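A sketch of the correction under assumed conventions (disk center and radius given in pixel units; the final F(ρ) is the median of the per-image profiles):

    import numpy as np

    def radial_profile(image, center, radius, n_rho=512, n_theta=720):
        """F_I(rho) = median over theta of I(rho, theta), via polar resampling."""
        rho = np.linspace(0.0, radius, n_rho)
        theta = np.linspace(-np.pi, np.pi, n_theta, endpoint=False)
        rr, tt = np.meshgrid(rho, theta, indexing="ij")
        rows = np.clip(np.round(center[0] + rr * np.sin(tt)).astype(int),
                       0, image.shape[0] - 1)
        cols = np.clip(np.round(center[1] + rr * np.cos(tt)).astype(int),
                       0, image.shape[1] - 1)
        return rho, np.median(image[rows, cols], axis=1)

    def correct_limb(image, center, radius, rho, profile):
        """Divide each on-disk pixel by F(rho) at its distance to disk center;
        off-disk pixels are masked out (NaN)."""
        yy, xx = np.indices(image.shape)
        dist = np.hypot(yy - center[0], xx - center[1])
        out = np.full(image.shape, np.nan)
        on_disk = dist <= radius
        out[on_disk] = image[on_disk] / np.interp(dist[on_disk], rho, profile)
        return out

    # Final profile over the low-activity period:
    # F = np.median(np.stack([radial_profile(im, c, R)[1] for im in images]), axis=0)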

3.3. Training masks

For the results shown in Section 4, we use as a first training mask a manually segmented preprocessed image from our four year dataset. Figure 2a shows the image taken on April 23, 2013 at 00:00:07 UT together with the AR, QS, and CH boundaries that were prescribed manually. Such masks can easily be created by a standard image manipulation program that features layered editing, for example the free and open source program GIMP.

Fig. 2.

Two training masks drawn on the SDO-AIA 19.3 nm image on April 23, 2013 at 00:00:07 UT. Red contours are the AR class, green contours are the QS class, and cyan contours are the CH class. (a) Manual segmentation that selects extended ARs. (b) SPoCA segmentation that selects the core of ARs. The QS class consists of the pixels that are neither AR nor CH.

Care must be taken to minimize overlap between masks for different classes. Although the proposed method can in theory handle such overlap, in practice it translates into increased overlap between the intensity distributions of the various classes, leading to increased classification uncertainty for the range of intensities in the overlapping region. Conversely, it is also important to avoid large gaps between the masks of neighboring classes, to ensure that the decision boundary found through Eq. (4) is as desired. For example, adequate AR segmentation requires providing masks not only for the ARs themselves, but also for the QS that is close to the desired AR boundaries.

The SPoCA-suite (Verbeeck et al. 2014) provides the SPoCA-AR and SPoCA-CH modules, which are integrated into the SDO Event Detection System (EDS). Every 4 h, the EDS generates and uploads the SPoCA entries into the AR and CH catalogs of the Heliophysics Events Knowledgebase (HEK; Hurlburt et al. 2012). For evaluation purposes we used as a second training set the AR and CH masks computed by the EDS modules; see Figure 2b. The pixels that were classified as neither AR nor CH were used as the QS class.

4. Maximum A Posteriori segmentation

This section discusses the MAP segmentation obtained on the dataset described above.

4.1. Distribution within each class

Figure 3a displays the kernel density estimates of the observed intensities per class. These were obtained using the two training masks: the manually segmented training mask that considers an extended AR (Fig. 2a), and the SPoCA segmented training mask that selects only the AR core (Fig. 2b). Each class distribution is roughly unimodal, with only a small amount of overlap in the transition zones between CH and QS, and between QS and AR.

Fig. 3.

(a) Kernel density estimate of the pixel intensity p(I(x)|Ci) for the manually segmented image and for the SPoCA segmented image; (b) EM estimate of the latitude distribution p(L(x)|Ci), as estimated from the four year dataset using the manually segmented image and the SPoCA segmented image.

In the manually segmented image, we also observe some overlap between the dark CH and bright AR classes. This is most likely due to errors in the manual segmentation, e.g. bright points embedded in CHs that were misclassified as CH. The SPoCA training mask, being computed with an automated threshold-based segmentation technique, produces a smaller overlap between the AR and QS classes. A small overlap is nevertheless still present due to post-processing, e.g. boundary smoothing and removal of small connected components from the AR class.

4.2. Distribution estimation using EM-scheme

We estimated the parameters of the distributions p(Ci) and p(L(x)|Ci) with the EM-scheme on the four year dataset, again using the manually segmented and the SPoCA segmented images. For both masks the EM iteration converges, to a maximum relative change of the values of p(Ci) of less than 1%, in about five iterations. With our implementation, each iteration takes approximately 5 min of computation time on a server with two Intel Xeon E5-2680v2 processors with 10 cores each. The computation time can be decreased by subsampling the images in time. However, care must be taken that the subsampling does not interact with the solar rotation, as this could bias the results toward certain Carrington longitudes; a good alternative to subsampling at regular intervals would be random sampling. For the estimation of p(L(x)|Ci) it is important that the dataset spans a time range that is long enough (multiple Carrington rotations) at a high enough cadence (at least one image per day). Once these distributions are estimated, they can be applied to all images in the dataset using Eq. (4), so that the final segmentations are computed in a single pass.

The area coverage is computed as the ratio of the number of pixels belonging to a particular class to the number of on-disk pixels, in an area-preserving sinusoidal reprojection of the original helioprojective image. With the manually segmented training mask, estimation of p(Ci) over the four year dataset results in an area coverage of 8.55%, 81.96%, and 9.49% for the AR, QS, and CH classes, respectively. With the SPoCA training mask, those values are 2.87%, 90.37%, and 6.76%. The smaller value for the AR coverage in the latter case reflects the fact that the SPoCA-AR module is tailored to extract only the core of ARs, while our manually segmented image considered the extended part of an AR. Both values for the CH coverage are within the range found by Lowder et al. (2014) (see Fig. 9 therein).
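As an illustration, the sketch below computes these fractions with a per-pixel weight of 1/√(1 − r²), which compensates foreshortening and is equivalent in effect to counting pixels in an area-preserving reprojection; this shortcut is ours, not necessarily the paper's implementation.

    import numpy as np

    def area_coverage(labels, center, radius, class_names):
        """Fractional solar-surface area per class from a helioprojective label map."""
        yy, xx = np.indices(labels.shape)
        r = np.hypot(yy - center[0], xx - center[1]) / radius
        on_disk = r < 0.995              # keep away from the diverging limb weight
        w = 1.0 / np.sqrt(1.0 - r[on_disk] ** 2)
        return {c: w[labels[on_disk] == c].sum() / w.sum() for c in class_names}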

The results for p(L(x)|Ci) are shown in Figure 3b. The well-known AR bands are clearly defined, as well as the tendency for CHs to appear at high latitudes. Moreover, the North-South asymmetry corresponding to the ascending phase of Solar Cycle 24 during 2010–2014 is noticeable in the CH and AR latitude distributions (Seaton et al. 2013). The AR latitude distribution for the segmentations using the SPoCA mask is slightly more peaked because this mask selects the AR cores, whose latitudinal extent is thus reduced. As the corresponding likelihood function is computed on pixels of the original helioprojective images, the number of pixels decreases toward high positive and negative latitudes, and hence the latitude distribution goes to zero at high latitudes for all three classes. The absolute values of these likelihoods are not important for our method, as we only compare the relative values of the likelihood functions across classes.

Figure 4 visualizes, for the manually segmented training image, the two-dimensional (2D) histogram computed as the product of p(I(x)|Ci) and p(L(x)|Ci) displayed in Figures 3a and 3b, respectively. For comparison, Figure 5 shows the 2D histograms of the intensity and latitude values observed after the MAP estimation is performed, thus removing any overlap between classes. This final estimate of the joint PDF p((I(x), L(x))|Ci) exhibits an M-shape in the QS distribution, and shows dark features classified as CH even at low latitudes. The M-shape in the QS histogram is mainly due to the diffuse part around the ARs, which produces two bands at the common AR latitudes.

Fig. 4.

p((I(x), L(x))|Ci) estimated with a kernel density estimator and the EM-scheme on the four year dataset using the manually segmented image.

Fig. 5.

2D normalized histogram of intensity and latitude observed in the final MAP segmentation of the four year dataset using the manually segmented image.

4.3. Comparison between ML and MAP classifiers

The difference between the ML and MAP classifiers is highlighted in Figure 6. The ML classifier attributes to the CH class some pixels located in the NW quadrant, whereas the extra information in the MAP classifier greatly helps in decreasing the CH membership values in that region.

Fig. 6.

Comparison between ML (left column) and MAP (right column) classifiers on the same image as the initial training mask. Top: segmentation contours, with red being the AR class and cyan being the CH class. Bottom: CH membership, with black being 0% membership and white being 100% membership.

For the image in Figure 6, the mean CH membership in the ML segmentation for all pixels that have less than 50% CH membership is 1.02 × 10−1 with a variance of 1.15 × 10−2. For the MAP segmentation we obtain a mean of 2.95 × 10−2 with a variance of 4.44 × 10−3. The much lower mean and variance for the MAP membership values suggest that the overall noise in low membership values is reduced, and membership values quickly drop to zero outside the desired regions, resulting in more sharply defined CHs. This is clearly visible in the bottom row of Figure 6.

4.4. Post-processing

The results presented here were not post-processed in any way. For some applications, however, smoother boundaries and noise removal are desired. For these applications, one can use standard methods such as morphological opening and closing, as shown in Verbeeck et al. (2014).
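For example, with scikit-image (the structuring-element radius is illustrative):

    from skimage.morphology import binary_closing, binary_opening, disk

    def smooth_mask(mask, radius=5):
        """Opening removes small spurious components; closing then fills small
        holes and smooths the boundary of the binary class mask."""
        selem = disk(radius)
        return binary_closing(binary_opening(mask, selem), selem)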

5. Performance evaluation

To evaluate the performance of our classifier, we first compare it against a test set of new manually labelled images (Sect. 5.1). Second, we devise some desired properties of an accurate solar image segmentation and show how our classifier performs against those properties (Sect. 5.2).

5.1. Validation using a test dataset

Performance evaluation of a supervised method is classically done against a test dataset. In our case, the test dataset consists of new manually segmented images. While Rigler et al. (2012) studied the effect of having different experts specify the training and the test segmentation masks, in this work the test masks were created by the same person who made the initial training mask, in order to avoid personal biases from influencing the results. Note that they still do not provide an ideal ground truth, since these manual segmentations may contain errors themselves.

We provided rough segmentations for nine new randomly selected images from the four year dataset. In these segmentations, we included only those regions that were least likely to result in errors in the manual segmentation, i.e. we did not segment regions in the transition zones between AR and QS, and between QS and CH. The results are shown in Table 1. They indicate that the distinction between CH and QS is more difficult to make than the distinction between AR and QS. In the case of the QS, it should be noted that difficulties only arise for QS pixels that are almost AR or CH pixels. Since the manual segmentations focused primarily on regions outside the transition zones in order to avoid classification errors in the constructed ground truth, this is only an approximation of the true performance.

Table 1.

Confusion matrix of our method on nine new manually segmented test images. Each row i is expressed as a percentage of the number of pixels that were attributed label i by the manual segmentation.
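A sketch of how such a confusion matrix can be computed from paired label arrays restricted to the manually segmented pixels (names are ours):

    import numpy as np

    def confusion_matrix(manual, predicted, class_names):
        """Row i: percentages of the pixels manually labelled i that received
        each label from the MAP classifier; each row sums to 100."""
        return np.array([[100.0 * np.mean(predicted[manual == m] == p)
                          for p in class_names] for m in class_names])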

The shape of the membership value probability distribution for the pixels that were wrongly classified (Fig. 7) reveals that most wrongly classified pixels have a membership value around 0.5, and are likely to be in the “fuzzy” boundary region of each class. This suggests that the membership value is a useful indication of the classification accuracy. However, there is a second peak close to 100% membership. These pixel misclassifications are likely to be due to errors in the manual segmentation rather than to a poor performance of the algorithm. Figure 8 shows one of the test manual segmentations along with contours around pixels classified differently by the MAP classifier. It highlights several parts where misclassifications are due to inaccuracies from the manual segmentation.

Fig. 7.

Probability distribution of membership values of pixels that were assigned a different class by the MAP classifier than by the manual segmentation.

Fig. 8.

Contours of manual segmentation in AR (red), QS (green), CH (cyan) and contours around regions of pixels that were classified differently by the MAP classifier (yellow). The image was taken on April 10, 2011 at 00:00:07 UT. Several misclassifications seem to be due to inaccuracies in the manual segmentation: 1. The small dark region in the SE quadrant was classified by the MAP classifier as CH instead of QS. 2. Some bright points that were manually segmented as QS or CH (when embedded in CH) but are more correctly labelled as AR by the MAP classifier. 3. Near the boundaries of the manual CH and AR segmentations some small regions were assigned to the QS class by the MAP classifier, which is arguably more correct for the cases shown in this figure.

5.2. Stability and consistency criteria

We saw in the previous section that creating a ground truth for a fully objective verification of this method is difficult: it would require a correct class assignment on a pixel-by-pixel basis. As an alternative, we look at desired properties of a segmentation method. We identify the following criteria for an accurate segmentation of images into large-scale features such as ARs and CHs:

  1. Stable segmentations on short timescales in the absence of major solar activity.

  2. Consistent results over longer time periods.

  3. Consistent with the manually provided training masks: changes in the training mask should be reflected in the resulting segmentations.

To show these properties we use the high cadence three month dataset and the same training masks as before. We reused the radial intensity profile as well as the distributions p(Ci) and p(L(x)|Ci) estimated on the four year dataset.

In the absence of major solar activity (e.g. flares), and on shorter timescales (on the order of hours), the observed size of CHs does not show large variations. The first property states that in these cases the area coverage computed from the segmentation should also show only minor variations over time. Figure 9 shows that this is indeed the case: the median variance in CH fractional area over a 24 h sliding window is 8.1 × 10−6, compared to a variance of 6.4 × 10−4 over the entire three month period. This addresses the first property.
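A sketch of this stability indicator for an hourly coverage series (24 samples span one day):

    import numpy as np

    def sliding_variance(coverage, window=24):
        """Variance of the coverage time series inside each sliding window."""
        return np.array([coverage[k:k + window].var()
                         for k in range(len(coverage) - window + 1)])

    # Median short-term variance versus variance over the full period:
    # np.median(sliding_variance(ch_coverage)), ch_coverage.var()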

Fig. 9.

CH area coverage at the original time, and shifted by 29.22 days (approximately one solar rotation at 55° latitude).

We can use the long term presence and relative stability of CHs to show the second property: similar area coverage should be observed on segmented images from one solar rotation to the next, provided the observed area of the CHs stays stable. We calculated the mean squared error between the CH area data at the original time and shifted by various offsets between 26 and 35 days, and found a minimum mean squared error at an offset of 29.22 days (Fig. 9). This rotation period corresponds to the differential rotation at a latitude of approximately 55°, which is consistent with our results for the latitude distribution of CH pixels in Figure 3b. This addresses the second property.
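A sketch of the offset scan, assuming an hourly series; the grid step is illustrative.

    import numpy as np

    def best_offset_days(coverage, samples_per_day=24, lo=26.0, hi=35.0, step=0.01):
        """Return the time shift (in days) minimizing the mean squared error
        between the coverage series and its shifted copy."""
        best_days, best_mse = None, np.inf
        for days in np.arange(lo, hi + step, step):
            k = int(round(days * samples_per_day))
            mse = np.mean((coverage[k:] - coverage[:-k]) ** 2)
            if mse < best_mse:
                best_days, best_mse = days, mse
        return best_days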

For the third property we compare the results from two different training masks: the manually segmented image that selects the extended AR (Fig. 2a), and the SPoCA segmented image that selects the core AR and attributes the diffuse part of the ARs to the QS class (Fig. 2b). A comparison between the resulting segmentations in Figure 10 shows that, when using the SPoCA training mask, the AR class is reduced to only the brightest core of the ARs. Figure 11 compares the area coverages resulting from the MAP segmentations (using the manually segmented training mask and the SPoCA segmented training mask) and from the unsupervised SPoCA segmentations. The area coverages of ARs produced by SPoCA are similar to the ones produced by the MAP classifier trained with the SPoCA mask. The CH area coverages produced by the MAP classifier trained on one SPoCA segmented image are correlated with the SPoCA-CH area coverages, but with a variable offset. This offset results from the fact that the SPoCA-CH module imposes a minimum lifetime of three days on CH connected components, which reduces the number of pixels attributed to the CH class.

Fig. 10.

Comparison between diffuse AR (left column) and core AR (right column) classifiers on the image taken on March 8, 2011 at 00:00:06 UT. Top: segmentation contours, with red being the AR class and cyan being the CH class. Bottom: AR membership, with black being 0% membership and white being 100% membership.

Fig. 11.

Comparison between area coverages obtained from 1. MAP segmentation trained with the manually segmented mask (in blue), 2. MAP segmentation trained with the SPoCA segmented mask (in green), 3. unsupervised SPoCA segmentation (in red). (a) Comparison of AR area coverages; (b) Comparison of CH area coverages.

6. Conclusions

We proposed a flexible way of segmenting coronal EUV images into active regions, quiet Sun, and coronal holes that can be tailored to the user’s needs. Our method produces crisp as well as fuzzy segmentations together with a measure of accuracy of the segmentation.

Contrary to many existing methods, our approach is not a “black box” and is easily adaptable to a user’s needs through the manual segmentation of an example image given as input. It provides a segmentation that is consistent with what the user, e.g. a space-weather forecaster, would produce. However, this also means that the method is heavily influenced by the user’s skill and interests. Our method allows special requirements on properties such as area and heliographic location distribution to be enforced via a suitable likelihood function for each parameter. These likelihood functions can be specified manually, e.g. to exclude certain unwanted features, but can also be estimated automatically from the data using an Expectation-Maximization procedure. This paper showed that including such information improves the final segmentation.

The proposed method can be extended to multiple bandpasses and instruments by using the Naive Bayes assumption and multiplying the marginal intensity likelihoods of the different bandpasses. Such an approach will however not take into account the correlation between intensities observed in different bandpasses. One way to account for such correlation is to combine mono-channel results using fusion theory (Barra et al. 2007; Colak & Qahwaji 2013). This allows one to define how to deal with consonant and partially conflicting information from the various channels (Bloch 1996). The method can also be extended to applications other than coronal segmentation, for example to segment the photosphere and to detect sunspot umbrae and penumbrae.

The coronal hole class extracted with our method possibly contains some filament channels, which also appear as dark features in EUV images. Distinguishing between filament channels and coronal holes would require including morphological or geometrical properties such as the ones defined in Reiss et al. (2015), or magnetogram information (Scholl & Habbal 2008; Lowder et al. 2014).

The Maximum Likelihood and Maximum A Posteriori segmentations presented here provide a consistent way of comparing manual segmentations done by different experts, or segmentations obtained from different automated algorithms. Given distinct training sets, the dispersion in the resulting class intensity probability density functions could for example be measured using a Kullback-Leibler distance (Kullback 1959). The segmentations resulting from the various training sets can also be compared against stability and consistency criteria such as the ones defined in Section 5.2. From such a study, one could derive a commonly accepted standard, e.g. defining how far an active region should extend. This would provide a benchmark for comparing different feature detection algorithms.

The software implementing this method along with a script to reproduce all results outlined in this paper is available at http://bitbucket.org/rubendv/bayesian_segmentation_code.

Acknowledgments

The authors would like to thank Benjamin Mampaey for providing the SPoCA masks. RDV and VD acknowledge support from the Belgian Federal Science Policy Office through the ESA-PRODEX program, Grant No. 4000103240. RDV further acknowledges support from the BRAIN.be program of the Belgian Federal Science Policy Office. Thanks are extended to the two referees for their pertinent comments that helped improve this manuscript. The editor thanks Michael Kirk and an anonymous referee for their assistance in evaluating this paper.




Cite this article as: De Visscher R, Delouille V, Dupont P & Deledalle C-A. Supervised classification of solar features using prior information. J. Space Weather Space Clim., 5, A34, 2015, DOI: 10.1051/swsc/2015033.
