Issue: J. Space Weather Space Clim., Volume 9, 2019
Article Number: A17
Number of page(s): 7
DOI: https://doi.org/10.1051/swsc/2019016
Published online: 17 May 2019
Research Article
Why do some probabilistic forecasts lack reliability?
National Institute of Information and Communications Technology, Tokyo 184-8795, Japan
^{*} Corresponding author: kubo@nict.go.jp
Received: 30 July 2018
Accepted: 16 April 2019
In this work, we investigate the reliability of the probabilistic binary forecast. We mathematically prove that a necessary, but not sufficient, condition for achieving a reliable probabilistic forecast is maximizing the Peirce Skill Score (PSS) at the threshold probability of the climatological base rate. The condition is confirmed by using artificially synthesized forecast–outcome pair data and previously published probabilistic solar flare forecast models. The condition gives a partial answer as to why some probabilistic forecast systems lack reliability, because a system that does not satisfy the proved condition can never be reliable. Therefore, the proved condition is very important for the developers of a probabilistic forecast system. The result implies that those who want to develop a reliable probabilistic forecast system must adjust or train the system so as to maximize PSS near the threshold probability of the climatological base rate.
Key words: probabilistic forecast / reliability / necessary condition / Peirce Skill Score / forecast model
© Y. Kubo, Published by EDP Sciences 2019
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Forecasts of space weather phenomena have become operational. There are at least two types of forecast of the occurrence of space weather phenomena, namely, deterministic and probabilistic. Because it is difficult to forecast the occurrence of natural phenomena deterministically, a probabilistic forecast is suitable for the occurrence of space weather phenomena, such as solar flares. Moreover, a deterministic forecast can easily be derived by applying a threshold to a probabilistic forecast (e.g., Jolliffe & Stephenson, 2012). Converting the probabilistic forecast to a deterministic forecast can be performed by forecast users themselves, whose threshold probabilities to determine event occurrence are different. Several authors (e.g., Murphy, 1977; Richardson, 2000; Zhu et al., 2002) showed in a framework of decision-analytic models that the relative economic value of a probabilistic forecast is higher than that of a deterministic forecast, which means that a probabilistic forecast is more useful than a deterministic forecast in the sense of economic value. Murphy (1993) mentioned in the sense of forecast consistency that “Since forecasters’ judgments necessarily contain an element of uncertainty, their forecasts must reflect this uncertainty accurately in order to satisfy the basic maxim of forecasting. In general, then, forecasts must be expressed in probabilistic terms.”
For these reasons, probabilistic forecast models for the occurrence of space weather phenomena have been developed by several authors.
Solar flare occurrence forecasts have been actively studied in the operational space weather forecast community. Recently, many articles related to solar flare occurrence forecasts have been published, which include deterministic forecasts as well as probabilistic forecasts. Examples are human-judged forecasts (Crown, 2012; Devos et al., 2014; Kubo et al., 2017; Murray et al., 2017), statistical methods (Wheatland, 2005; Falconer et al., 2011; Bloomfield et al., 2012; McCloskey et al., 2016; Steward et al., 2017; Leka et al., 2018), and machine learning forecasts (Bobra & Couvidat, 2015; Muranushi et al., 2015; Huang et al., 2018; Nishizuka et al., 2017, 2018). Many authors have assessed the performance of the forecast models. However, many of the probabilistic forecast models verify only discrimination performance, by using a relative operating characteristic curve, and do not verify other attributes such as reliability, which is one of the most important attributes to be assessed in forecast verification. Several authors (e.g., Jolliffe & Stephenson, 2012; Kubo et al., 2017) mentioned that there are many attributes to be assessed for forecast verification, such as bias, accuracy, discrimination, reliability, and skill. Murphy (1991) pointed out that a single verification measure is not enough to correctly assess the forecast performance because of the high dimensionality of the joint probability density of the outcome and forecast. For example, at least three verification measures are required in the case of a dichotomous deterministic forecast because the dimensionality in this situation is three.
Efforts to compare the performances of several forecast models have also been in progress. Barnes et al. (2016) compared eleven probabilistic solar flare forecast models and used the relative operating characteristic curve, reliability diagram, and Brier skill score as verification measures, together with some skill scores for a contingency table created using only one threshold probability of the probabilistic forecast. The reliability diagrams in Barnes et al. (2016) showed that several probabilistic solar flare forecast models lack reliability. In the terrestrial weather forecast community, an unreliable probabilistic forecast model is often calibrated (e.g., Gneiting et al., 2007; Primo et al., 2009). However, the calibration of an unreliable probabilistic forecast model is not yet popular in the space weather forecast community. Therefore, it is better that direct outputs from the probabilistic forecast model are already reliable. To realize reliable outputs of probabilistic forecast models, we must investigate the reason why some probabilistic forecast models lack reliability.
We investigate a condition for a probabilistic binary forecast to be reliable in this work. In Section 2, we investigate the condition mathematically, and derive a necessary, but not a sufficient, condition for a probabilistic binary forecast to be reliable. In Section 3, the condition will be confirmed by using artificially synthesized forecast probabilities with corresponding outcomes and several probabilistic solar flare forecast models. The discussion and conclusion are described in Sections 4 and 5, respectively.
2 Mathematical derivation of the condition
One of the important attributes to be satisfied for the probabilistic forecast system is reliability. Reliability means a coincidence between the forecast of an event occurrence probability $x$ with the probability density function $p(x)$ ($0 \le x \le 1$) and the conditional expectation value of the outcome given the probability $x$, $E[o \mid x]$ (e.g., Jolliffe & Stephenson, 2012). If a probabilistic forecast system is perfectly reliable, $E[o \mid x]$ should be equal to $x$. In the case of a binary event, as the outcome $o$ is 1 (100% probability) for an event and 0 (0% probability) for no event, $E[o \mid x]$ can be rewritten as:
$$E[o \mid x] = p(o = 1 \mid x) = \frac{p(o = 1, x)}{p(o = 1, x) + p(o = 0, x)}, \tag{1}$$
where $p(o = 1, x)$ and $p(o = 0, x)$ are joint probability densities of the outcome and forecast. Therefore, the equation,
$$\frac{p(o = 1, x)}{p(o = 1, x) + p(o = 0, x)} = x \tag{2}$$
must be satisfied for a perfectly reliable probabilistic forecast system.
By using Bayes’ theorem, the joint probability densities can be rewritten as:
$$p(o = 1, x) = p(x \mid o = 1)\,p(o = 1), \qquad p(o = 0, x) = p(x \mid o = 0)\,p(o = 0), \tag{3}$$
where $p(x \mid o = 1)$ and $p(x \mid o = 0)$ are the conditional probability density functions given the outcome of event and no event, respectively. Hereafter, we refer to $p(x \mid o = 1)$ and $p(x \mid o = 0)$ as $p_1(x)$ and $p_0(x)$, respectively. As $p(o = 1)$ is a climatological base rate, we write $p(o = 1)$ and $p(o = 0)$ as $s$ and $1 - s$, respectively. From the equations (2) and (3), the equation
$$p_0(x) = \frac{s\,(1 - x)}{(1 - s)\,x}\,p_1(x) \tag{4}$$
is derived for a perfectly reliable forecast system. Here, we define the function $f(x)$ as:
$$f(x) \equiv p_0(x) - p_1(x) = \frac{s - x}{(1 - s)\,x}\,p_1(x). \tag{5}$$
$f(x)$ takes zero for $x = s$, positive or zero for $0 \le x < s$, and negative or zero for $s < x \le 1$, because $p_1(x)$ takes a positive or zero value for $0 \le x \le 1$.
Because $p_1(x)$ and $p_0(x)$ are conditional probability density functions given the outcome of event and no event, respectively, the integrals of the functions $p_1(x)$ and $p_0(x)$ from $x$ to 1 are regarded as a Probability of Detection (POD) and a Probability of False Detection (POFD), respectively, in the forecast verification measure. Therefore, a derivative of the Peirce Skill Score^{1} (PSS = POD − POFD) by $x$ becomes $f(x)$. As already mentioned, because $f(x)$ takes zero for $x = s$, positive or zero for $0 \le x < s$, and negative or zero for $s < x \le 1$, PSS($x$) is maximum at $x = s$. In conclusion, we were able to prove that the proposition,
$$\text{the forecast is reliable} \;\Longrightarrow\; \mathrm{PSS}(x)\ \text{is maximum at}\ x = s \tag{6}$$
is true. This means that the maximization of PSS at a threshold probability, which is equal to the climatological base rate, is a necessary condition for a reliable probabilistic forecast.
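For readers who want the intermediate algebra, the step from the reliability condition to the sign of the PSS derivative can be written out as follows. This is a worked restatement of the argument above, assuming the reliability condition in the form $s\,p_1(x) / [s\,p_1(x) + (1 - s)\,p_0(x)] = x$ and the convention that an event is forecast when the probability is at or above the threshold $x$.

```latex
\begin{align*}
\mathrm{PSS}(x) &= \mathrm{POD}(x) - \mathrm{POFD}(x)
  = \int_x^1 p_1(t)\,dt - \int_x^1 p_0(t)\,dt, \\
\frac{d\,\mathrm{PSS}(x)}{dx} &= p_0(x) - p_1(x)
  = \left[\frac{s\,(1-x)}{(1-s)\,x} - 1\right] p_1(x)
  = \frac{s - x}{(1-s)\,x}\,p_1(x).
\end{align*}
```

The last expression is non-negative below $x = s$ and non-positive above it, so PSS rises up to the base rate and falls afterwards.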
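In the following, we investigate whether the derived necessary condition is sufficient. If a probabilistic forecast system is unreliable, the conditional expectation value of the outcome given a forecast of the event occurrence probability is not equal to the forecast probability, that is,
$$E[o \mid x] = g(x) \neq x \tag{7}$$
can be assumed, where $g(x)$ is a function representing a reliability curve. From the equations (1), (3), and (7), the equation
$$p_0(x) = \frac{s\,[1 - g(x)]}{(1 - s)\,g(x)}\,p_1(x) \tag{8}$$
is derived. As $p_1(x)$ and $p_0(x)$ are conditional probability density functions given the outcome of event and no event, respectively, a derivative of PSS by $x$ is written as:
$$\frac{d\,\mathrm{PSS}(x)}{dx} = p_0(x) - p_1(x) = \frac{s - g(x)}{(1 - s)\,g(x)}\,p_1(x). \tag{9}$$
If there exists a function $g(x)$ satisfying
$$g(x) \le s \ \ (0 \le x < s), \qquad g(s) = s, \qquad g(x) \ge s \ \ (s < x \le 1), \tag{10}$$
then PSS($x$) can be maximum at $x = s$, because the derivative of PSS($x$) by $x$ takes zero for $x = s$, positive or zero for $0 \le x < s$, and negative or zero for $s < x \le 1$. Actually, because a function such as the reliability curve used for case 3 in the Appendix,
$$g(x) = s^{0.4}\,x^{0.6}, \tag{11}$$
satisfies the equation (10), PSS($x$) is maximum at $x = s$ for the unreliable forecast system. Therefore, the proposition,
$$\text{the forecast is unreliable} \;\Longrightarrow\; \mathrm{PSS}(x)\ \text{is not maximum at}\ x = s \tag{12}$$
is false. This means that the contrapositive proposition,
$$\mathrm{PSS}(x)\ \text{is maximum at}\ x = s \;\Longrightarrow\; \text{the forecast is reliable} \tag{13}$$
is false, and the maximization of PSS at a threshold probability equal to the climatological base rate is a necessary, but not sufficient, condition for a reliable probabilistic forecast system.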
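As a concrete instance, the reliability curve quoted for case 3 in the Appendix, $g(x) = s^{0.4}x^{0.6}$, can be checked against this argument. The short calculation below is only a worked illustration; it assumes $d\,\mathrm{PSS}/dx = p_0(x) - p_1(x)$ and $E[o \mid x] = s\,p_1(x) / [s\,p_1(x) + (1 - s)\,p_0(x)] = g(x)$.

```latex
\begin{align*}
\frac{d\,\mathrm{PSS}(x)}{dx}
  &= \frac{s - g(x)}{(1-s)\,g(x)}\,p_1(x)
   = \frac{s - s^{0.4}x^{0.6}}{(1-s)\,s^{0.4}x^{0.6}}\,p_1(x)
   = \frac{s^{0.6} - x^{0.6}}{(1-s)\,x^{0.6}}\,p_1(x).
\end{align*}
```

The sign of this derivative changes from non-negative to non-positive at $x = s$, so PSS is maximum at the base rate, while $g(x) \neq x$ everywhere except at $x = 0$ and $x = s$, so the forecast is not reliable.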
An important point is that no assumption is made for a functional form of the probability density p _{1}(x) and p _{0}(x) when deriving the condition. This means that the condition is independent of the form of the probability density function.
3 Confirmation using forecast data and models
The necessary condition derived in the previous section is based on continuous probability density functions, which implies that it is based on an infinite number of sample data. In practice, however, only a finite number of samples is ever available. Therefore, the derived condition should be confirmed by using a finite number of sample data. In this section, we confirm the derived condition first by using artificially sampled forecast–outcome pairs and then by using several probabilistic solar flare forecast models described by Barnes et al. (2016).
3.1 Synthetic forecast data
A probabilistic binary forecast system is fully determined by defining the climatological base rate s and two conditional probability density functions of event occurrence probability, p _{1}(x) and p _{0}(x). Synthetic forecast–outcome pairs are randomly sampled from p _{1}(x) and p _{0}(x) such that the climatological base rate is s. In this article, the climatological base rate s is fixed at 0.1, which represents a somewhat rare event case. The total number of sampled forecast–outcome pairs is 10 000.
Because the independent variable of the conditional probability density functions is the probability x, the range of x must be from 0 to 1. Therefore, the beta distribution Be(x; a, b), whose definition appears in the Appendix, is employed for the probability density functions. Because a beta distribution can flexibly change its shape depending on the two parameters, it is suitable for investigating various types of situations. Three cases are investigated: (1) perfectly reliable, (2) PSS is maximum at a probability largely different from the climatological base rate, and (3) PSS is maximum at the probability equal to the climatological base rate but the forecast is unreliable. Although only specific forms of probability density function are considered in the subsequent three subsections, the results of the studies are independent of the form of the probability density function.
3.2 Case 1: Perfectly reliable forecast
In case 1, the two conditional probability density functions of event occurrence probability, p _{1}(x) and p _{0}(x), are set as Be(x; 1.1, 0.9) and [10Be(x; 0.1, 0.9) − Be(x; 1.1, 0.9)]/9, respectively, so that the two density functions satisfy the equation (4), which states that the probabilistic forecast system is reliable. Randomly sampled variates from the probability density functions are pooled as the artificial forecast–outcome pairs.
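A minimal Python sketch of how such pairs could be drawn is given below. It does not sample p _{1}(x) and p _{0}(x) separately; instead it assumes the equivalent scheme in which the forecast x is drawn from the marginal density s p _{1}(x) + (1 − s) p _{0}(x), which for case 1 reduces to Be(x; 0.1, 0.9), and the outcome is then drawn as an event with probability g(x) = x. This is not necessarily the procedure used to produce the figures; the function name `sample_pairs`, its arguments, and the seed are hypothetical.

```python
import random


def sample_pairs(marg_a, marg_b, g, n, seed=0):
    """Draw n (forecast, outcome) pairs.

    Assumption: the forecast x follows the marginal density Be(marg_a, marg_b)
    and the event occurs with probability g(x), the reliability curve.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.betavariate(marg_a, marg_b)   # forecast probability
        o = 1 if rng.random() < g(x) else 0   # outcome: 1 = event, 0 = no event
        pairs.append((x, o))
    return pairs


# Case 1: marginal Be(x; 0.1, 0.9) and perfectly reliable curve g(x) = x.
pairs = sample_pairs(0.1, 0.9, lambda x: x, 10_000)
base_rate = sum(o for _, o in pairs) / len(pairs)
print(f"sample base rate = {base_rate:.3f}")  # should be close to s = 0.1
```

Under the same assumption, cases 2 and 3 would use the marginals Be(x; 0.2, 0.4) and Be(x; 0.23, 1.19) with the reliability curves quoted in the Appendix.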
Figure 1a shows a reliability diagram for case 1. The blue dots connected by lines depict the conditional expectation values of the outcome. A perfect reliability curve is depicted by the diagonal dashed line, on which the 99% consistency bars (Bröcker & Smith, 2007; Jolliffe & Stephenson, 2012) are drawn as vertical dashes. The 99% consistency bar shows the range within which 99% of the conditional expectation values of the outcome given the probability would fall if the original data were sampled from a perfectly reliable probabilistic forecast system. The red histograms with the right axis show the number of probabilistic forecasts within bins. It is clear that all the conditional expectation values of the outcome are located within the 99% consistency bars. This means that the synthetic probabilistic forecast is almost perfectly reliable, as expected from its construction.
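The points of such a reliability diagram can be computed by binning the forecasts, as in the following sketch. It assumes ten equal-width bins and plots the mean forecast in each bin against the observed event frequency; the bin count, the function name `reliability_curve`, and the tiny data set are hypothetical, and the consistency bars of Bröcker & Smith (2007) are not computed here.

```python
def reliability_curve(pairs, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin."""
    sums_x = [0.0] * n_bins
    sums_o = [0] * n_bins
    counts = [0] * n_bins
    for x, o in pairs:
        k = min(int(x * n_bins), n_bins - 1)  # x = 1.0 goes into the last bin
        sums_x[k] += x
        sums_o[k] += o
        counts[k] += 1
    return [(sums_x[k] / counts[k], sums_o[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k] > 0]


# Tiny hand-made data set (hypothetical values, for illustration only).
demo_pairs = [(0.05, 0), (0.05, 0), (0.15, 0), (0.15, 1), (0.95, 1), (0.95, 1)]
for mean_x, freq, count in reliability_curve(demo_pairs):
    print(f"forecast {mean_x:.2f}  observed {freq:.2f}  n = {count}")
```

A reliable forecast gives points close to the diagonal, that is, an observed frequency close to the mean forecast in every bin.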
Fig. 1 (a) Reliability diagram for case 1. Blue dots connected by lines depict the conditional expectation values of outcome. A perfect reliability curve is depicted by the diagonal dashed line, on which the 99% consistency bars are drawn as vertical dashes. Red histograms with the right axis show the number of probabilistic forecasts within bins. (b) PSS versus threshold probability for case 1. 
According to the condition derived in Section 2, PSS must be maximum at the threshold probability of 0.1, which is a climatological base rate. Figure 1b shows the variation of PSS versus the various threshold probabilities calculated using the synthetic forecast–outcome pairs. We can clearly see that PSS is maximum at around the climatological base rate.
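The curve of PSS versus threshold probability can be computed from forecast–outcome pairs as sketched below. The sketch assumes that an event is forecast whenever the probability is at or above the threshold, in line with POD and POFD being integrals from x to 1; the function names and the small data set are hypothetical.

```python
def pss(pairs, threshold):
    """Peirce Skill Score (POD - POFD) when an event is forecast if x >= threshold.

    The pairs must contain at least one event and one non-event.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for x, o in pairs:
        yes = x >= threshold
        if o == 1:
            hits += yes
            misses += not yes
        else:
            false_alarms += yes
            correct_negatives += not yes
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return pod - pofd


def best_threshold(pairs, thresholds):
    """Threshold in `thresholds` with the largest PSS (the first one on ties)."""
    return max(thresholds, key=lambda t: pss(pairs, t))


# Hypothetical forecast-outcome pairs, for illustration only.
demo_pairs = [(0.02, 0), (0.05, 0), (0.08, 0), (0.12, 1), (0.30, 0), (0.60, 1)]
for t in (0.01, 0.1, 0.2, 0.5):
    print(t, pss(demo_pairs, t))
print("best threshold:", best_threshold(demo_pairs, (0.01, 0.1, 0.2, 0.5)))
```

Comparing the maximizing threshold with the sample base rate is then a direct check of the necessary condition.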
3.3 Case 2: Maximize PSS at a probability different from the climatological base rate
In case 2, p _{1}(x) and p _{0}(x) are set as Be(x; 2.2, 0.4) and [10Be(x; 0.2, 0.4) − Be(x; 2.2, 0.4)]/9, respectively, for which PSS is maximum at the threshold probability of 0.5, which is largely different from the climatological base rate.
Figure 2b depicts the plot of PSS versus various threshold probabilities. The diagram shows that PSS is maximum at the threshold probability of around 0.5 (by definition), which is far from the climatological base rate.
Fig. 2 Same as Figure 1, but for case 2. (a) Reliability diagram for case 2. (b) PSS versus threshold probability for case 2. 
According to the condition mathematically derived in Section 2, the probabilistic forecast in case 2 must be unreliable. In the following, we confirm that the forecast is unreliable by drawing the reliability diagram. Figure 2a shows a reliability diagram for case 2. The dots, lines, dashes, and histogram represent the same quantities as those in case 1. We can recognize from the figure that the conditional expectation values of the outcome are not on the perfect reliability line. This fact confirms that case 2 is an unreliable probabilistic forecast system.
3.4 Case 3: Maximize PSS at the climatological base rate but unreliable forecast
In case 3, p _{1}(x) and p _{0}(x) are set as Be(x; 0.83, 1.19) and [10Be(x; 0.23, 1.19) − Be(x; 0.83, 1.19)]/9, respectively, for which PSS is maximum at the threshold probability of the climatological base rate.
A plot of PSS versus various threshold probabilities is depicted in Figure 3b using a finite number of synthetic forecast–outcome pairs. The figure shows that PSS is maximum at around the climatological base rate (by definition). However, as proven in the previous section, the maximization of PSS at the threshold probability of the climatological base rate is not a sufficient condition for a probabilistic forecast to be reliable, so whether the forecast is reliable cannot be decided from this fact alone. To confirm this theoretically derived result, a reliability diagram for case 3 is drawn in Figure 3a. The dots, lines, dashes, and histogram represent the same quantities as those in case 1. Clearly, the conditional expectation values of the outcome do not follow the perfect reliability line. This fact shows that the probabilistic forecast is unreliable even if PSS is maximum at around the climatological base rate.
Fig. 3 Same as Figure 1, but for case 3. (a) Reliability diagram for case 3. (b) PSS versus threshold probability for case 3. 
3.5 Solar flare forecast models
As Barnes et al. (2016) plotted reliability diagrams and estimated threshold probabilities maximizing PSS^{2} for eleven solar flare forecast models, these results are used for confirming the validity of the condition derived in this study. Although they dealt with three event definitions, we refer to only one event definition (C1.0 or greater flare) because there were few flare event samples for the other two event definitions, so the error bars for the reliability diagrams were large. In this subsection, the terms “table” and “figure” denote the table and figure that appeared in Barnes et al. (2016) unless explicitly stated.
Ten models out of eleven can forecast the events of C1.0 or greater flare, and were assessed for the events (figures 11, 12, 13, 17, 19, 20, 22, 23, 25, and 26). Climatological base rates for the ten models were shown in the tables 8, 9, 10, 12, 13, 14, 15, 16, 17, and 18, respectively. Reliability diagrams (top panels) for figures 12, 13, 17, 19, 20, and 22 show that the reliabilities for these models were relatively good (of course, no model has perfect reliability). According to the condition derived in Section 2, the threshold probabilities maximizing PSS for these models should be near the climatological base rate. As the threshold probabilities maximizing PSS are shown in the bottom panels of the figures, we refer to these values. The absolute values of the difference between the climatological base rate and the threshold probability maximizing PSS for the relatively reliable forecast models were between 0.015 and 0.049, which shows that the threshold probabilities maximizing PSS were very close to the climatological base rates. On the other hand, the absolute values of the difference between the climatological base rate and the threshold probability maximizing PSS for the models shown in figures 11, 23, 25, and 26 were 0.193, 0.268, 0.393, and 0.150, respectively, which means that the threshold probabilities maximizing PSS were far from the climatological base rates. We clearly recognize from the figures 11, 23, 25, and 26 that the reliabilities for these models were relatively poor. This result shows that a model whose threshold probability maximizing PSS is far from the climatological base rate lacks reliability. These results are consistent with the mathematically derived condition and are summarized in Table 1 in this paper.
Table 1 Summary of the climatological base rate (s) and the threshold probability maximizing PSS (p _{th}) reported in Barnes et al. (2016).
From the examples shown in this section, it is confirmed that the maximization of PSS at a climatological base rate is a necessary, but not sufficient, condition for a reliable probabilistic forecast. We used beta distributions to describe the probability densities in the examples. However, we emphasize again that the confirmed result does not depend on the form of the probability density as shown in Section 3.5, so the result is quite general.
4 Discussion
The condition that PSS is maximum at a threshold probability of a climatological base rate is a necessary condition for a probabilistic forecast system to be reliable. That is, if the probabilistic forecast system is reliable, the PSS of the system is maximum at the threshold probability equal to the climatological base rate. In other words, a probabilistic forecast system whose PSS is maximum at a threshold probability largely different from the climatological base rate can never be reliable. This claim is very important for developers of probabilistic forecast systems. Those who want to develop a reliable probabilistic forecast system must adjust or train their system so that PSS is maximum at the threshold probability of the climatological base rate. Of course, the adjustment or training alone is not necessarily enough for a reliable system, because the condition is not a sufficient condition. However, if no adjustment or training is carried out, their system can never become reliable.
A joint probability density of forecast–outcome pairs can be factored into a conditional probability density and a marginal probability density. In a distribution-oriented forecast verification framework (Murphy & Winkler, 1987), two types of factorization are possible. One is a calibration-refinement factorization, which is a factorization into the conditional probability density of observation given forecast (calibration distribution) and the marginal probability density of a forecast (refinement distribution). The other is a likelihood-base rate factorization, which is a factorization into the conditional probability density of forecast given observation (likelihood distribution) and a marginal probability density of observation (base rate distribution). While the attribute of reliability is directly related to the calibration distribution, PSS is related only to the likelihood distribution, which implies that PSS alone can say nothing about reliability. That is, completely different aspects of the joint probability density are assessed by reliability and by PSS. It is interesting that, despite this fact, a reliable probabilistic forecast is directly related to the maximization of PSS. The question as to why the maximization of PSS at the threshold probability of a climatological base rate is related to a reliable probabilistic forecast system can partly be answered by considering the factorization of the joint probability density. A combination of likelihood and base rate distributions can completely describe the joint probability density of forecast–outcome pairs. This means that although the likelihood distribution alone cannot assess a calibration distribution, the combination of likelihood and base rate distributions can do so. Therefore, the combination of information on PSS and the climatological base rate is required for assessing information on reliability.
Some literature related to this study has been published in meteorological forecast verification. Richardson (2000) discussed the relative economic value of forecasts in the framework of decision-analytic models. He mentioned that the maximum relative economic value for a deterministic forecast was attained at the point where a user’s cost–loss ratio equals the climatological base rate and was given by PSS^{3}. This means that the maximum relative economic value for a probabilistic forecast is given by the maximum PSS under the condition that a user’s cost–loss ratio equals the climatological base rate. The fact that the relationship between a climatological base rate and maximum PSS appears in several kinds of situations in forecast verification is interesting. This point should be further investigated.
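The relation between relative economic value and PSS can be illustrated numerically. The sketch below assumes the static cost–loss model, in which a user pays a cost C to protect and suffers a loss L when an event occurs without protection, with all expenses expressed per unit loss; the function name `relative_value` and the verification scores are hypothetical and are not taken from Richardson (2000).

```python
def relative_value(pod, pofd, s, cost_loss):
    """Relative economic value (E_climate - E_forecast) / (E_climate - E_perfect)."""
    e_climate = min(cost_loss, s)     # always protect (C/L) or never protect (s)
    e_perfect = s * cost_loss         # protect only when an event occurs
    e_forecast = (cost_loss * (pod * s + pofd * (1 - s))  # cost paid on every "yes"
                  + (1 - pod) * s)                        # loss from missed events
    return (e_climate - e_forecast) / (e_climate - e_perfect)


pod, pofd, s = 0.8, 0.3, 0.1   # hypothetical scores and base rate
for ratio in (0.05, 0.1, 0.3):
    print(ratio, round(relative_value(pod, pofd, s, ratio), 3))
```

With these numbers the value at a cost–loss ratio equal to s coincides with POD − POFD = 0.5 up to rounding, and it is lower at the other two ratios.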
5 Conclusion
We mathematically derived a necessary condition for a probabilistic binary forecast to be reliable. The condition is that PSS is maximized at the threshold probability equal to the climatological base rate. The condition was confirmed by using artificially synthesized forecast–outcome pair data and several published probabilistic solar flare forecast models. An important point is that the condition is derived without assuming the form of the probability density function. This means that the condition generally holds. This condition is quite important for the developers of probabilistic forecast systems. When a reliable probabilistic binary forecast system is developed, the developer must adjust or train the system so as to maximize PSS at the threshold probability of the climatological base rate. The condition gives a partial answer as to why some probabilistic forecast systems lack reliability, because a system that does not satisfy the condition can never be reliable.
Acknowledgments
I would like to thank Dr. KD. Leka for a valuable comment on the use of the data in Barnes et al. (2016). I also would like to thank two anonymous referees and the editor for useful comments on the manuscript. The editor thanks two anonymous referees for their assistance in evaluating this paper. This work was supported partly by MEXT/JSPS KAKENHI Grant Number JP15H05813.
^{2} PSS is referred to as H&KSS in Barnes et al. (2016).
^{3} PSS is referred to as KS in Richardson (2000).
References
Barnes G, Leka KD, Schrijver CJ, Colak T, Qahwaji R, et al. 2016. A comparison of flare forecasting methods. I. Results from the “all-clear” workshop. Astrophys J 829: 89. DOI: 10.3847/0004-637X/829/2/89.
Bloomfield DS, Higgins PA, James McAteer RT, Gallagher P. 2012. Toward reliable benchmarking of solar flare forecasting methods. Astrophys J 747: L41. DOI: 10.1088/2041-8205/747/L41.
Bobra MG, Couvidat S. 2015. Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm. Astrophys J 798: 135. DOI: 10.1088/0004-637X/798/2/135.
Bröcker J, Smith LA. 2007. Increasing the reliability of reliability diagrams. Weather Forecast 22: 651.
Crown MD. 2012. Validation of the NOAA Space Weather Prediction Center’s solar flare forecasting look-up table and forecaster-issued probabilities. Space Weather 10: S06006. DOI: 10.1029/2011SW000760.
Devos A, Verbeeck C, Robbrecht E. 2014. Verification of space weather forecasting at the Regional Warning Center in Belgium. J Space Weather Space Clim 4: A29. DOI: 10.1051/swsc/2014025.
Falconer D, Barghouty AF, Khazanov I, Moore R. 2011. A tool for empirical forecasting of major flares, coronal mass ejections, and solar particle events from a proxy of active-region free magnetic energy. Space Weather 9: S04003. DOI: 10.1029/2009SW000537.
Gneiting T, Balabdaoui F, Raftery AE. 2007. Probabilistic forecasts, calibration and sharpness. J R Statist Soc B 69: 243.
Huang X, Wang H, Xu L, Liu J, Li R, Dai X. 2018. Deep learning based solar flare forecasting model. I. Results for line-of-sight magnetograms. Astrophys J 856: 7. DOI: 10.3847/1538-4357/aaae00.
Jolliffe IT, Stephenson DB. 2012. Forecast verification: A practitioner’s guide in atmospheric science, 2nd edn. John Wiley and Sons Ltd., Chichester, UK.
Kubo Y, Den M, Ishii M. 2017. Verification of operational solar flare forecast: Case of Regional Warning Center Japan. J Space Weather Space Clim 7: A20. DOI: 10.1051/swsc/2017018.
Leka KD, Barnes G, Wagner E. 2018. The NWRA classification infrastructure: Description and extension to the Discriminant Analysis Flare Forecasting System (DAFFS). J Space Weather Space Clim 8: A25. DOI: 10.1051/swsc/2018004.
McCloskey AE, Gallagher PT, Bloomfield DS. 2016. Flaring rates and the evolution of sunspot group McIntosh classifications. Sol Phys 291: 1711. DOI: 10.1007/s11207-016-0933-y.
Muranushi T, Shibayama T, Muranushi YH, Isobe H, Nemoto S, Komazaki K, Shibata K. 2015. UFCORIN: A fully automated predictor of solar flares in GOES X-ray flux. Space Weather 13: 778. DOI: 10.1002/2015SW001257.
Murphy AH. 1977. The value of climatological, categorical and probabilistic forecasts in the cost-loss ratio situation. Mon Weather Rev 105: 803.
Murphy AH. 1991. Forecast verification: Its complexity and dimensionality. Mon Weather Rev 119: 1590.
Murphy AH. 1993. What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather Forecast 8: 281.
Murphy AH, Winkler RL. 1987. A general framework for forecast verification. Mon Weather Rev 115: 1330.
Murray SA, Bingham S, Sharpe M, Jackson DR. 2017. Flare forecasting at the Met Office Space Weather Operations Centre. Space Weather 15: 577. DOI: 10.1002/2016SW001579.
Nishizuka N, Sugiura K, Kubo Y, Den M, Watari S, Ishii M. 2017. Solar flare prediction model with three machine-learning algorithms using ultraviolet brightening and vector magnetograms. Astrophys J 835: 156. DOI: 10.3847/1538-4357/835/2/156.
Nishizuka N, Sugiura K, Kubo Y, Den M, Ishii M. 2018. Deep Flare Net (DeFN) model for solar flare prediction. Astrophys J 858: 113. DOI: 10.3847/1538-4357/aab9a7.
Primo C, Ferro CAT, Jolliffe IT, Stephenson DB. 2009. Calibration of probabilistic forecasts of binary events. Mon Weather Rev 137: 1142. DOI: 10.1175/2008MWR2579.1.
Richardson DS. 2000. Skill and relative economic value of the ECMWF ensemble prediction system. Q J R Meteorol Soc 126: 649.
Steward G, Lobzin V, Cairns IH, Li B, Neudegg D. 2017. Automatic recognition of complex magnetic regions on the Sun in SDO magnetogram images and prediction of flares: Techniques and results for the revised flare prediction program Flarecast. Space Weather 15: 1151. DOI: 10.1002/2017SW001595.
Wheatland MS. 2005. A statistical solar flare forecast method. Space Weather 3: S07003. DOI: 10.1029/2004SW000131.
Zhu Y, Toth Z, Wobus R, Richardson D, Mylne K. 2002. The economic value of ensemble-based weather forecasting. Bull Am Meteorol Soc 83: 73.
Appendix
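The beta distribution is expressed as:
$$\mathrm{Be}(x; a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\,x^{a - 1}(1 - x)^{b - 1}, \tag{A.1}$$
where Γ denotes the gamma function, and a and b are shape parameters. When the conditional probability density functions p_{1}(x) and p_{0}(x) are defined as:
$$p_1(x) = \mathrm{Be}(x; a, b), \tag{A.2}$$
$$p_0(x) = \frac{\mathrm{Be}(x; a - \beta, b) - s\,\mathrm{Be}(x; a, b)}{1 - s}, \tag{A.3}$$
the theoretical reliability curve g(x) is derived as:
$$g(x) = \alpha x^{\beta}, \qquad \alpha = s\,\frac{\Gamma(a + b)\,\Gamma(a - \beta)}{\Gamma(a)\,\Gamma(a + b - \beta)}. \tag{A.4}$$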
In case (1) mentioned in Section 3.2, the parameters are set as a = 1.1, b = 0.9, and β = 1, from which α = 1 is derived using the equation (A.4) when s = 0.1. Therefore, the theoretical reliability curve is g(x) = x, which means perfect reliability. PSS is maximum at the threshold probability of 0.1, which is a climatological base rate. In case (2) mentioned in Section 3.3, we set a = 2.2, b = 0.4, and β = 2, and α = 0.4 is derived using the equation (A.4) when s = 0.1. These parameters yield the theoretical reliability curve g(x) = 0.4x^{2}. The maximum PSS is realized for g(x) = s, that is, at a threshold probability of 0.5. In case (3) mentioned in Section 3.4, we set a = 0.83, b = 1.19, and β = 0.6, and α ≈ 0.398 ≈ s^{0.4} is derived using the equation (A.4) when s = 0.1. In this case, the theoretical reliability curve is g(x) = s^{0.4}x^{0.6}. PSS is maximum at x = s, that is, a threshold probability of 0.1.
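The quoted values of α and of the threshold maximizing PSS can be reproduced with a few lines of Python. The sketch assumes p_{1}(x) = Be(x; a, b) and p_{0}(x) = [Be(x; a − β, b) − s Be(x; a, b)]/(1 − s), which matches the three cases in Section 3 and gives g(x) = αx^{β} with α = s Γ(a + b)Γ(a − β)/[Γ(a)Γ(a + b − β)]; the function names are hypothetical.

```python
import math


def alpha(a, b, beta, s):
    """Coefficient of the reliability curve g(x) = alpha * x**beta."""
    return s * math.gamma(a + b) * math.gamma(a - beta) / (
        math.gamma(a) * math.gamma(a + b - beta))


def pss_max_threshold(a, b, beta, s):
    """Threshold where g(x) = s, that is, where dPSS/dx changes sign."""
    return (s / alpha(a, b, beta, s)) ** (1.0 / beta)


s = 0.1
cases = [(1.1, 0.9, 1.0), (2.2, 0.4, 2.0), (0.83, 1.19, 0.6)]
for number, (a, b, beta) in enumerate(cases, start=1):
    print(f"case {number}: alpha = {alpha(a, b, beta, s):.3f}, "
          f"threshold = {pss_max_threshold(a, b, beta, s):.3f}")
```

Cases 1 and 2 give α = 1 and α = 0.4 with thresholds 0.1 and 0.5, and case 3 gives α close to 0.398 with a threshold close to 0.1, in agreement with the values stated above.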
Cite this article as: Kubo Y 2019. Why do some probabilistic forecasts lack reliability? J. Space Weather Space Clim. 9, A17.