Issue
J. Space Weather Space Clim.
Volume 14, 2024
Topical Issue - CMEs, ICMEs, SEPs: Observational, Modelling, and Forecasting Advances
Article Number 6
Number of page(s) 14
DOI https://doi.org/10.1051/swsc/2024004
Published online 29 March 2024

© R. Mugatwala et al., Published by EDP Sciences 2024

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

ICMEs (Interplanetary Coronal Mass Ejections) are eruptions of plasma and magnetic fields from the Sun’s corona that propagate in the heliosphere (Webb & Howard, 2012). These plasma and magnetic field structures are ejected from the Sun, travel through interplanetary space, and reach 1 AU within 1–5 days (Chen, 2011). In in-situ data, ICMEs can be distinguished from the ambient solar wind by their distinct signatures, such as an enhanced magnetic field, a higher particle speed, and variations in plasma density (Liu et al., 2010; Papaioannou et al., 2016). They can also be observed remotely with coronagraphs, particularly SOHO/LASCO with coronagraphs C1/C2/C3 (Brueckner et al., 1995; Domingo et al., 1995) and STEREO/SECCHI with COR1/COR2 (Kaiser et al., 2008; Howard et al., 2008), and with the Heliospheric Imagers (HI1/HI2; Eyles et al., 2009).

ICMEs are among the main drivers of Space Weather, impacting the space environment and human technologies (Tsurutani et al., 1988; Gosling et al., 1991; Schwenn, 2006; Pulkkinen, 2007; Temmer, 2021). The plasma and magnetic fields ejected from the Sun can interact with Earth’s magnetic field, leading to geospace disturbances (Koskinen & Huttunen, 2007), which affect a wide range of technological systems in space, such as satellites, telecommunications, and GNSS (Shea & Smart, 1998; Schrijver & Siscoe, 2010; Aquino & Sreeja, 2013; Piersanti et al., 2017). Present strategies to mitigate the effects of ICMEs on space-based technologies and infrastructures require knowledge of the ICME arrival time with low uncertainty, so that operators can protect their equipment by shutting it down or putting it in a safe mode (Barbieri & Mahmot, 2004; Sreeja, 2016; Veettil et al., 2019).

In the last decades, space agencies have designed and launched missions to observe the Sun, monitor the solar wind characteristics, and track CMEs and ICMEs as they travel through space, with the aim of studying their interactions with the interplanetary environment. Despite these advancements in space weather forecasting, accurately predicting the characteristics of ICMEs, such as their Time-of-Arrival (ToA) and Speed-at-Arrival (SaA) at Earth, as well as the magnitude and direction of the southward component of their magnetic field (which is crucial for determining the intensity of geomagnetic storms; Koskinen & Huttunen, 2006), remains a challenging task for the scientific community (Manchester et al., 2017; Riley et al., 2018; Vourlidas et al., 2019).

Following the evolution of numerical methods and the increase in available computational power, a number of empirical methods, physics-based analytical models, and MHD numerical simulations of ICME kinematics have been developed. MHD models derive their boundary conditions from observed magnetograms and coronagraphic images and model the propagation of the ejecta by numerically solving the magnetohydrodynamic equations (ENLIL, HAFv.2 (Hakamada-Akasofu-Fry version 2)+3DMHD, EUHFORIA (EUropean Heliospheric FORecasting Information Asset); Odstrcil et al., 2003; Wu et al., 2007; Pomoell & Poedts, 2018). These simulations explicitly include the physical processes being modelled, but they are computationally intensive and therefore expensive to run. A complete understanding of the physical processes involved in the Sun-Earth relation relies heavily on numerical modelling techniques. However, with present observation capabilities, the forecasting performance of empirical and analytical methods is comparable to, or in some cases slightly better than, that achieved with numerical methods, owing to uncertainties in the input parameters (Manchester et al., 2017). This implies that the existing empirical and analytical approaches are still effective and competitive in terms of their predictive capabilities, and that the near future of space weather forecasting lies with these computationally light approaches and Machine Learning (ML).

In general, analytical methods are computationally lighter and their parameters can be easily updated with new incoming data. Also, physics-based analytical models (e.g. Vršnak et al., 2013; Rollett et al., 2016; Paouris & Mavromichalaki, 2017; Napoletano et al., 2018) can shed light on the ICME dynamics, and this knowledge may also help to refine numerical methods. On the other hand, the relationships between ToA and SaA and various CME parameters measured at (or close to) launch have been used in empirical prediction methods (e.g. Manoharan, 2006; Gopalswamy, 2009), and most recently in a plethora of ML approaches. ML techniques are increasingly used in space weather, as recently reviewed in Camporeale (2019). In the last years, there have been many attempts to leverage ML algorithms to obtain the characteristics of an ICME at L1 from the associated CME observables (Bobra & Ilonidis, 2016; Liu et al., 2018; Wang et al., 2019, just to list a few). These ML algorithms use catalogues of CME/ICME characteristics for training, in order to set their parameters, validate their results and check their performance. Consequently, it becomes more and more important to build CME/ICME datasets with a large number of events and small uncertainties, since ML methods typically need numerous, relevant and reliable examples in the datasets in order to give accurate results (VanderPlas et al., 2012; Ivezić et al., 2014).

In this paper, we present a method to update the catalogue of CME–ICME pairs published in Napoletano et al. (2022), by using a constrained Monte-Carlo strategy to validate its entries. The constrained Monte Carlo strategy allowed us to explore the parameter space in a more effective way. We then make use of this updated catalogue to revisit the Probability Distribution Functions (PDFs) to use for the P-DBM method (Napoletano et al., 2018; Del Moro et al., 2019). Finally, we present a comparison of these PDFs for different solar wind conditions and against previous literature.

The paper is organized as follows. Section 2 describes the DBM model and the mathematical methodologies to retrieve PDFs from the catalogue. In Section 3, we analyze the results of the inversion procedure and use them to relabel the CME/ICME catalogue entries and to obtain PDFs for different ICME types. Section 4 is dedicated to conclusions and discussions. The CME–ICME dataset compiled and used in this work can be found at https://zenodo.org/record/8063404 and a description of the different column headers is provided in Appendix A.

2 Methods

2.1 Drag-Based Model (DBM)

The Drag-Based Model is one of the simplest models describing CME propagation through the heliosphere. Owing to its simplicity and calculation speed, it is one of the most popular models used in CME forecast tools. In recent years, the DBM has been used in many studies of CME propagation, which are summarised in Dumbović et al. (2021). The DBM is based on the assumption that the Lorentz force responsible for the CME launch is negligible in the upper part of the solar corona, specifically beyond a heliocentric distance of 20 R☉. However, this assumption is not always valid; for many CME events the Lorentz force is still comparable with the drag force there, and the exact distance can vary from event to event (Vršnak, 2001; Vršnak et al., 2004; Sachdeva et al., 2015, 2017). Beyond this heliocentric distance, the dynamics of the ICME are predominantly governed by its interaction with the ambient solar wind via MHD drag (Cargill, 2004; Vršnak et al., 2013). Owing to the MHD drag force, ICMEs that are faster (slower) than the solar wind tend to decelerate (accelerate) during propagation, as also supported by observations (Gopalswamy et al., 2000). The CME radial acceleration according to the DBM approach is given by:

$$a(r) = -\gamma(r)\,\bigl[v(r) - w(r)\bigr]\,\bigl|v(r) - w(r)\bigr| \tag{1}$$

where a(r) and v(r) are the instantaneous acceleration and speed of the ICME, respectively, w is the instantaneous ambient solar wind speed, and γ is the drag parameter, also called the drag efficiency. It is important to note that all the quantities in equation (1) are space- and time-dependent. However, beyond 20 R☉, γ and w may be approximated as constant throughout the heliosphere (Cargill, 2004; Vršnak et al., 2013). Under this approximation, equation (1) can be solved analytically to obtain the heliospheric distance and speed of the ICME as functions of time (Vršnak et al., 2013):

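$$r(t) = \pm\frac{1}{\gamma}\,\ln\bigl[1 \pm \gamma\,(v_0 - w)\,t\bigr] + w\,t + r_0 \tag{2}$$

$$v(t) = \frac{v_0 - w}{1 \pm \gamma\,(v_0 - w)\,t} + w \tag{3}$$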

where the ± sign distinguishes decelerated from accelerated CMEs, i.e., plus for v0 > w and minus for v0 < w. Equations (2) and (3) give the distance and speed as functions of the CME propagation time, starting from an initial distance r0 (at t = 0) and a take-off speed v0. From these, one can determine the transit time t1AU and the impact speed v1AU at 1 AU.
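As an illustration of how the constant-w, constant-γ solution can be evaluated, the following Python sketch computes r(t) and v(t) from the standard closed-form DBM solution (Vršnak et al., 2013) and finds the transit time to a target distance by bisection. The function names, the bisection solver, the 20-day search window and the constants R_SUN_KM and AU_KM are assumptions made for this example; they are not taken from the authors' code.

```python
import math

R_SUN_KM = 6.957e5       # nominal solar radius in km (assumed constant)
AU_KM = 1.495978707e8    # astronomical unit in km


def dbm_state(t, r0, v0, w, gamma):
    """Return (r, v) at time t [s] for constant w [km/s] and gamma [1/km]."""
    # Plus sign for v0 >= w (decelerating), minus sign for v0 < w (accelerating).
    sign = 1.0 if v0 >= w else -1.0
    arg = 1.0 + sign * gamma * (v0 - w) * t   # always >= 1 for t >= 0
    r = sign / gamma * math.log(arg) + w * t + r0
    v = (v0 - w) / arg + w
    return r, v


def transit_time(r_target, r0, v0, w, gamma, t_max=20 * 86400.0):
    """Bisection for the time [s] at which r(t) reaches r_target."""
    lo, hi = 0.0, t_max
    if dbm_state(hi, r0, v0, w, gamma)[0] < r_target:
        raise ValueError("target distance not reached within t_max")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dbm_state(mid, r0, v0, w, gamma)[0] < r_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `transit_time(AU_KM, 20 * R_SUN_KM, v0, w, gamma)` and then `dbm_state` at the returned time gives the modelled t1AU and v1AU for one choice of (w, γ).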

2.2 DBM inversion procedure

The DBM solution given in Vršnak et al. (2013) can be used to obtain analytical values of the free DBM parameters. If the ICME follows the DBM, and if its boundary conditions, i.e., the initial position r0, initial speed v0, ToA t1AU and impact speed v1AU, are known, then the free parameters of the model, namely the drag parameter γ and the solar wind speed w, can be obtained via a mathematical inversion of equations (2) and (3):

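$$(r_1 - r_0) - w\,t_{1AU} - \frac{(v_0 - w)(v_1 - w)\,t_{1AU}}{v_0 - v_1}\,\ln\!\left(\frac{v_0 - w}{v_1 - w}\right) = 0 \tag{4}$$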

Equation (4) is solved numerically to obtain w, then equation (5) is used to directly compute γ:

$$\gamma = \frac{|v_0 - v_1|}{(v_0 - w)(v_1 - w)\,t_{1AU}} \tag{5}$$
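A minimal sketch of this two-step inversion is given below. The residual in w is derived here by eliminating γ between the distance and speed solutions of the DBM, so its algebraic form is an assumption and may differ from the expression used by the authors. The bisection solver, the search bracket (0 to v1 for decelerating events, v1 to a hypothetical `w_max` for accelerating ones) and the function names are likewise illustrative choices. Speeds are assumed positive and in km/s, distances in km, and times in seconds.

```python
import math


def _residual(w, r0, r1, v0, v1, t):
    """Distance mismatch left after eliminating gamma from the DBM solution."""
    a, b = v0 - w, v1 - w
    return (r1 - r0) - w * t - a * b * t / (v0 - v1) * math.log(a / b)


def invert_dbm(r0, r1, v0, v1, t, w_max=2000.0, eps=1e-6):
    """Return (w, gamma), or None when no sign change brackets a root."""
    if v0 == v1:
        return None  # no net acceleration, so the drag is not constrained
    # Decelerating CMEs need w < v1 < v0; accelerating ones need w > v1 > v0.
    lo, hi = (0.0, v1 - eps) if v0 > v1 else (v1 + eps, w_max)
    f_lo = _residual(lo, r0, r1, v0, v1, t)
    f_hi = _residual(hi, r0, r1, v0, v1, t)
    if f_lo * f_hi > 0.0:
        return None  # the boundary conditions admit no solution in the bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = _residual(mid, r0, r1, v0, v1, t)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    w = 0.5 * (lo + hi)
    gamma = abs(v0 - v1) / ((v0 - w) * (v1 - w) * t)
    return w, gamma
```

Returning `None` when the residual does not change sign mirrors the situation described in the text, where a set of boundary conditions may admit no DBM solution at all.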

2.3 Mathematical framework

In search of a unique distribution for the free DBM parameters, we applied the DBM inversion procedure to the existing dataset published in the previous works of Napoletano et al. (2018, 2022). A comprehensive description of the dataset is provided in Appendix A, while a few particular quantities used in this study and the associated results are summarised in Table A.1. In the process of DBM inversion, we discovered that the majority of the CME events in the dataset lack analytical solutions of equations (4) and (5). This conflicted with our null hypothesis for the dataset: that the DBM assumptions (Vršnak et al., 2013) were generally valid for the propagation of most of the CME events in the list, and that the large experimental uncertainties associated with the ICME quantities allowed for a parameter space large enough to find at least one (γ, w) pair solving the equation system. The reason behind this discrepancy is that the errors associated with the initial position (r0), target position (r1), transit time (t1AU), impact speed (v1) and initial speed (v0) were omitted in the inversion procedure. An alternative explanation could be that the DBM does not properly describe the CME motion (e.g., w = constant is not a realistic approximation; CME–CME interaction is also possible).

However, for this study, we adhere to our null hypothesis and account for the uncertainties of the measured CME properties. To incorporate the errors associated with those quantities, we adopted a pairwise selection approach. It is important to highlight that Napoletano et al. (2022) also employed a probabilistic approach in the inversion procedure to obtain w and γ. To do so, they assumed that [r0, r1, v0, v1, t1AU] follow normal distributions and drew random samples, most of which are concentrated around the peak of the Gaussian curve. Our pairwise approach instead allowed us to explore other parts of the parameter space, where less probable values lie. We assumed that two parameters, r0 and r1, carry no error because their values are fixed: we took r0 = 20 R☉ and, for r1, the actual Sun-Earth distance at the time when the CME is at r0. The arrival speed of the CME in the dataset is calculated as the mean solar wind speed during the plasma disturbance and therefore has an intrinsic error. This error is small compared with those of the initial speed and arrival time, so it is neglected in this study. The two remaining quantities with large errors are thus v0 and t1AU. Next, for each DBM inversion iteration, we made a pairwise selection of (t1AU, v0) from the normal distributions followed by the two quantities, where μ is the observed value (“Transit_Time” and “v_r”) and σ is the error associated with the observed quantity (“Transit_time_err” and “v_r_err”). Here “v_r” is the deprojected speed of the CME at 20 R☉ taken from Napoletano et al. (2022) and “v_r_err” is its associated error. It is important to keep in mind that the tails of the normal distributions extend to 3σ. For the pairwise selection, we drew 200 samples for t1AU and 200 for v0, which gives a total of 40,000 possible pairs. After this pair selection, we performed the DBM inversion to obtain the values of w and γ.
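The pairwise selection can be sketched as follows. This example assumes that the 3σ statement means each draw is restricted to μ ± 3σ, which is implemented here by simple rejection, and that the pairs are formed as the full Cartesian product of the two sample sets. The function name `sample_pairs`, the seed handling and the rejection step are illustrative assumptions rather than the authors' implementation.

```python
import itertools
import random


def sample_pairs(t_obs, t_err, v_obs, v_err, n=200, seed=0):
    """Draw n values each of transit time and initial speed; return all n*n pairs."""
    rng = random.Random(seed)

    def draw(mu, sigma):
        # Redraw until the sample lies within mu +/- 3 sigma (assumed truncation).
        while True:
            x = rng.gauss(mu, sigma)
            if abs(x - mu) <= 3.0 * sigma:
                return x

    t_samples = [draw(t_obs, t_err) for _ in range(n)]
    v_samples = [draw(v_obs, v_err) for _ in range(n)]
    # Every transit-time sample is combined with every speed sample.
    return list(itertools.product(t_samples, v_samples))
```

With n = 200 this yields the 40,000 (t1AU, v0) pairs per event mentioned in the text; each pair would then be passed to the inversion step.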

The DBM inversion procedure is a Monte Carlo process, and after the inversion we have 40,000 possible solutions for w and γ. Many of these values are not physically feasible, for example, negative values of w. Vršnak et al. (2013) provided a brief description of γ in their work, from which we deduced that the drag parameter γ is related to the mass and cross-sectional area of the CME and to the ambient solar wind density. In our preliminary analysis, we found that the inversion procedure also returns very high values of γ, which cannot be explained by the typical ranges of CME mass, cross-sectional area and background solar wind density. Therefore, it is necessary to impose constraints on the values obtained through the inversion procedure. The constraints that we imposed are given below.

  1. 0 ≤ w ≤ 1000 km/s

Solar wind speed cannot be negative, and the typical speed of the fast solar wind in the literature is 800 km/s. It is worth noting that the range of realistic solar wind speeds adopted in Paouris et al. (2021), 300–600 km/s, is much narrower than ours.

  2. 0.1 × 10⁻⁷ ≤ γ ≤ 3.0 × 10⁻⁷ km⁻¹

It is important to note that the typical range for the γ parameter in Vršnak et al. (2013) is (0.2–2.0) × 10⁻⁷ km⁻¹, but we widened this range to accept a few more extreme solutions. Similarly, Paouris et al. (2021) allow a range of (0.01–0.59) × 10⁻⁷ km⁻¹ for a realistic drag parameter, but their obtained values lie in the range (0.21–0.42) × 10⁻⁷ km⁻¹ (see Table 4 of Paouris et al., 2021), which is comparable to our range.
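The two constraints amount to a simple filter on the inversion output. A minimal sketch, with hypothetical names and with w in km/s and γ in km⁻¹, could read:

```python
W_MIN, W_MAX = 0.0, 1000.0                # accepted solar wind speed range, km/s
GAMMA_MIN, GAMMA_MAX = 0.1e-7, 3.0e-7     # accepted drag parameter range, 1/km


def is_accepted(w, gamma):
    """True when (w, gamma) satisfies both constraints listed above."""
    return W_MIN <= w <= W_MAX and GAMMA_MIN <= gamma <= GAMMA_MAX


def filter_solutions(solutions):
    """Keep only the accepted (w, gamma) pairs from an iterable of pairs."""
    return [(w, g) for w, g in solutions if is_accepted(w, g)]
```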

After this, we derive four main quantities, namely Wmean, γmean, Wopt and γopt, from the accepted values of w and γ; the opt values correspond to the DBM input that produced the minimum deviation from the observed transit time. To evaluate the “goodness” of the inversion procedure, we define the “Acceptance Rate” as the ratio of the number of meaningful solutions to the total number of possible solutions, which is set by the number of samples:

$$\mathrm{AR} = \frac{m}{n^2} \tag{6}$$

Here, m is the number of solutions accepted after applying the constraints and n is the number of samples drawn from each of the t1AU and v0 distributions. Figure 1 shows the flow diagram of the DBM inversion procedure that we implemented on the CME dataset and how the results of the inversion procedure are analysed.
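As a small worked sketch, assuming that the total number of possible solutions is the number of (t1AU, v0) pairs, i.e. n × n with n samples per quantity, the acceptance rate can be computed as below. The function name and the input check are illustrative.

```python
def acceptance_rate(n_accepted, n_samples=200):
    """Fraction of the n_samples**2 sampled pairs that gave an accepted solution."""
    total = n_samples ** 2
    if not 0 <= n_accepted <= total:
        raise ValueError("n_accepted must lie between 0 and n_samples**2")
    return n_accepted / total
```

For example, an event with 20,000 accepted solutions out of 40,000 pairs has an acceptance rate of 0.5.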

Figure 1

Schematic of DBM inversion procedure. The values of boundary conditions are fed into the equations (4) and (5) using a pairwise approach to obtain w and γ. The obtained values are checked for selection criteria. The accepted values are used to determine the solar wind condition, the most suitable PDF of model parameters and the CME labelling scheme.

3 Results

3.1 Inversion procedure results

The inversion procedure was performed on the entire CME–ICME pair catalogue and was successful for 204 out of 213 events. At the end of the inversion procedure, we obtained 3,644,748 possible values of w and γ, which enables us to provide a statistical distribution for them. Figure 2 illustrates the (γ, w) phase space for the entire ICME dataset, in which we can identify the predictive line for a few individual CMEs. From the DBM equation (1), one can easily see that CMEs either accelerate or decelerate during their propagation. Based on these propagation conditions, we derived two different distributions, for accelerated and decelerated CMEs. Furthermore, the free DBM parameter w can also be divided into two groups, the slow and the fast solar wind, and we can therefore draw two more joint distributions based on the solar wind speed conditions.

Figure 2

Joint distribution of (γ,w) from the inversion procedure. Top Panel: (γ,w) Phase space for the whole dataset (3,644,748 values). Bottom left Panel: (γ,w) phase space for the dataset of accelerated CMEs (25,428 values). Bottom right Panel: (γ,w) phase space for the dataset of decelerated CMEs (3,619,320 values).

3.2 Determining the quality of inversion process for each CME event

It is important to note that we claimed the DBM inversion was successful for 204 events, so there should be 8,160,000 possible values of γ and w, more than double the number we obtained from the DBM inversion procedure. This discrepancy results from either the DBM inversion procedure failing or the obtained values of (γ, w) being discarded because they did not fulfil the constraints. This can also be observed in the (γ, w) phase space of different CME events. Based on the density in the (γ, w) phase space, we label each event as “Optimal Fit”, “Suboptimal Fit” or “Inadequate Fit”. This labelling helps us to determine whether the propagation of the CME events in the dataset can be described by the DBM. To stay consistent in the labelling procedure, we used the Acceptance Rate (AR) defined by equation (6). The labels are described as follows.
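As a quick check of the arithmetic, assuming n = 200 samples per quantity and the total of 3,644,748 accepted values quoted in the Figure 2 caption:

$$204 \times 200^2 = 8{,}160{,}000, \qquad \frac{3{,}644{,}748}{8{,}160{,}000} \approx 0.447, \qquad \frac{8{,}160{,}000}{3{,}644{,}748} \approx 2.24$$

so, taken over the 204 events, roughly 45% of all sampled pairs yield an accepted (γ, w) solution.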

  1. Optimal (Nice) Fit: AR > 0.5; the DBM approximation is demonstrably accurate for this kind of CME event as the inversion procedure is successful for more than 50% of the pairs. Therefore, there is a very sharp trendline in (γ,w) phase space.

  2. Suboptimal (Poor) Fit: 0.25 ≤ AR ≤ 0.5; the DBM is moderately accurate as one can still see the trendline in (γ,w) phase space.

  3. Inadequate (Bad) Fit: AR < 0.25; the DBM approximation is less applicable for the events and it is hard to find the trend line in phase space.
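The labelling rule above maps directly onto a small function. The sketch below uses the thresholds as stated in the list; the function name and the returned strings are illustrative choices.

```python
def label_event(ar):
    """Map an Acceptance Rate in [0, 1] to the fit label used in the text."""
    if ar > 0.5:
        return "Optimal Fit"
    if ar >= 0.25:
        return "Suboptimal Fit"   # covers 0.25 <= AR <= 0.5
    return "Inadequate Fit"
```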

Figure 3 shows the percentage of events assigned to each label. We want to stress that this labelling scheme is a key feature of the dataset created as a result of this work, since it indicates which CME events in the dataset are well described by the DBM. Events flagged as “Suboptimal Fit” or “Inadequate Fit” require further investigation. Hereafter, we focus only on the Optimal Fit events to obtain the PDFs for w and γ, as this helps to improve the PDFs. Eventually, these better statistics should lead to better accuracy in CME arrival forecasting.

Figure 3

Pie chart showing the percentage of events labelled “Optimal Fit”, “Suboptimal Fit” and “Inadequate Fit”.

3.3 Relabelling the solar wind condition

We found that only 28 CME events accelerate during propagation, around 13% of the entire dataset; the statistics for accelerating CMEs are therefore not well resolved. To find a distribution for the free DBM parameters, we grouped the CMEs by solar wind condition. The dataset from the previous work of Napoletano et al. (2022) contains information about the solar wind speed type (see Appendix A, column SW_type – S/F) based on the presence of coronal holes close to the source of a CME. The groups of CMEs formed from these coronal-hole data yield two completely overlapping distributions for fast and slow solar wind speeds. This overlap is inconsistent with the expected distinction between slow and fast solar wind conditions. The discrepancy arises because the PDF for the solar wind speed is built from the presence of coronal holes, without explicit consideration of the solar wind speed itself. For many events, the association between solar wind type and a coronal hole is wrong for various reasons, for example because the CME does not encounter a fast solar wind stream at all during its propagation. This leads to the overlap of the distributions: the DBM inversion returns a small value of w while the presence of a coronal hole labels the solar wind as fast. Furthermore, the large standard deviation makes the model unsuitable for precise and reliable real-time space weather forecasting applications. We therefore redefine the solar wind type associated with each CME using the threshold Wsim ≥ 500 km/s to discriminate the fast solar wind from the slow one. This threshold is similar to the one used in Napoletano et al. (2018). Figure 4 shows the (γ, w) phase space for the two solar wind labelling schemes, “SW_type” and “Wind_type”. One important point to note is that the tail of any distribution extending into the negative region is due to the plotting style, not to the presence of any such value. From now onward, we focus on this new labelling scheme for the solar wind speed.
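A minimal sketch of this relabelling is shown below. It assumes that a single representative inverted wind speed is available per event and that the labels reuse the S/F convention of the SW_type column; which per-event value of Wsim the authors compare with the threshold is not specified here, so the input is hypothetical, as are the function names.

```python
FAST_WIND_THRESHOLD = 500.0  # km/s, threshold quoted in the text


def wind_type(w_sim):
    """Return 'F' for fast wind (w_sim >= 500 km/s), otherwise 'S' for slow."""
    return "F" if w_sim >= FAST_WIND_THRESHOLD else "S"


def group_by_wind_type(events):
    """Group event identifiers by label; events is an iterable of (event_id, w_sim)."""
    groups = {"S": [], "F": []}
    for event_id, w_sim in events:
        groups[wind_type(w_sim)].append(event_id)
    return groups
```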

Figure 4

(γ, w) phase space for the different solar wind speed labelling schemes. The x-axis shows W_sim, the solar wind speed obtained from the DBM inversion, in km/s; the y-axis shows Gamma_sim, the drag parameter obtained from the inversion procedure, in km⁻¹.

3.4 PDF for solar wind speed

From the joint distribution shown in Figure 2, we can extract a distribution function for the solar wind speed w. Here, we fitted Gaussian, Student-t and Lognormal functions to the distribution, as these three functions returned the best fits among the PDFs available in the distfit package (Taskesen, 2023). Figure 5 shows the histogram obtained from the dataset together with the fitted PDFs. We used the RSS (Residual Sum of Squares) value to determine the best fit. In most cases, all three distribution functions show a similar RSS value, which is also clear from the figure. We therefore decided to select the Gaussian distribution function for the solar wind speed w, to be consistent with previous works, e.g., Dumbović et al. (2018, 2021) and Napoletano et al. (2018, 2022).
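To make the RSS comparison concrete, the sketch below fits a Gaussian and a lognormal to a sample and compares each fitted PDF with the normalised histogram. It deliberately does not use the distfit package: the parameters are estimated from simple sample moments, the Student-t case is omitted, and the 50-bin histogram and all function names are assumptions of this example. The resulting RSS values are therefore only indicative of the kind of comparison described.

```python
import math
import statistics


def histogram_density(data, bins=50):
    """Return bin centres and normalised densities (data must not be constant)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(data) * width) for c in counts]
    return centers, density


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu_log, sigma_log):
    return normal_pdf(math.log(x), mu_log, sigma_log) / x if x > 0 else 0.0


def rss(centers, density, pdf):
    """Residual sum of squares between histogram densities and a PDF."""
    return sum((d - pdf(c)) ** 2 for c, d in zip(centers, density))


def compare_fits(data):
    """RSS of moment-fitted normal and lognormal PDFs; data must be positive."""
    centers, density = histogram_density(data)
    mu, sigma = statistics.fmean(data), statistics.stdev(data)
    logs = [math.log(x) for x in data]
    mu_l, sigma_l = statistics.fmean(logs), statistics.stdev(logs)
    return {
        "norm": rss(centers, density, lambda x: normal_pdf(x, mu, sigma)),
        "lognorm": rss(centers, density, lambda x: lognormal_pdf(x, mu_l, sigma_l)),
    }
```

The distribution with the smallest RSS would be taken as the best fit; when the values are close, as reported for w, the choice between candidates is not strongly constrained by this metric alone.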

Figure 5

Probability distribution functions for solar wind speed w for accelerated and decelerated CMEs with a kernel density ρ on the y-axis. Left: w PDFs for accelerated CMEs with Optimal Fit label. Note that the normal and student-t distributions overlap with each other. Right: w PDFs for decelerated CMEs with Optimal Fit label. All three distribution functions overlap with each other. The overlapping of functions is evident through RSS values.

We categorized our dataset into Slow and Fast CMEs based on the ambient solar wind condition experienced by the CME during its propagation (using a threshold of 500 km/s to separate fast and slow solar wind conditions), as described earlier in this section, and again fitted the same three distribution functions. Unlike in the previous attempt, the RSS values of the fits differ between slow and fast solar wind conditions. For slow CMEs, the “student-t” distribution provides the best PDF, while for fast CMEs the lognormal function is the most suitable PDF. Here, we only emphasize that the “student-t” and “lognormal” distributions emerge as the best fits and that this result is strongly biased by the hard thresholding. Figure 6 shows the PDFs for the slow and fast solar wind conditions. The parameters of the fitted distributions are reported in Table 1.

Figure 6

Probability distribution functions for solar wind speed w for slow and fast CMEs. Left: w PDFs for slow CMEs with Optimal Fit label. Right: w PDFs for fast CMEs with Optimal Fit label. (In both cases the Normal and Student-t functions overlapped with each other).

Table 1

Parameters for the different functions used to model the solar wind speed distribution. For the Lognormal function, the tabulated values cannot be used directly as average and standard deviation. The transformation from the fitting parameters to the values used in the model is given by equation (B.4).

Paouris et al. (2021) studied the same 16 CME–ICME events from Dumbović et al. (2018) to compare the performance of the Effective Acceleration Model (EAM) with that of the Drag Based Ensemble Model (DBEM). They also applied an inversion technique to find the optimal values of the solar wind speed w and the drag parameter γ. Table 2 lists the optimal values of w from different studies. It is important to note that the sample size employed in Napoletano et al. (2022) and in this work is large, which helps to explain the higher value of the standard deviation.

Table 2

Optimal (mean) values for solar wind speed w from different studies.

3.5 PDF for drag parameter

For the drag parameter, we employed the same methods and distribution functions that we used for the solar wind speed to infer the PDF. The RSS values obtained from the various fits are significantly different. The lognormal distribution consistently emerges as the best fit among the considered distribution functions over a wide range of cases. Figure 7 shows the distribution fits for the accelerated and decelerated CMEs, while Figure 8 shows the PDFs for slow and fast CMEs. The fitting parameters for the different distribution functions of the drag parameter γ are tabulated in Table 3.

Figure 7

Probability distribution functions for the drag parameter γ for accelerated and decelerated CMEs. Left: γ PDFs for accelerated CMEs with Optimal Fit label. Right: γ PDFs for decelerated CMEs with Optimal Fit label.

Figure 8

Probability distribution functions for the drag parameter γ for slow and fast CMEs. Left: γ PDFs for fast CMEs with Optimal Fit label. Right: γ PDFs for slow CMEs with Optimal Fit label.

Table 3

Parameters for the different PDFs used to model the drag parameter distribution. For the Lognormal function, the tabulated values cannot be used directly as average and standard deviation. The transformation from the fitting parameters to the values used in the model is given by equation (B.4).

4 Discussion and conclusions

The CME–ICME pair catalogue published in Napoletano et al. (2022) is improved by the inclusion of the predicted DBM data, the PDF fitting parameters, and various other significant variables related to each CME–ICME occurrence (e.g., CME arrival time and speed, Dst index, source location, Bz component). By quantifying the success rate of the DBM inversion procedure, we were able to identify a subset of CME–ICME pairs that are well described by the DBM during their heliospheric propagation, and we added this information to the dataset. For the space weather community, this kind of categorization can provide significant insight into the conditions under which the DBM forecast fails to predict the correct transit time for a CME event. On the other hand, for those CME events where the DBM forecast is accurate, it can contribute information about the model parameters w and γ. All the CME–ICME entries that do not follow the DBM approximation deserve further investigation, since we cannot tell whether a “no solution”, a “suboptimal” or an “inadequate” label comes from a possible error in the initial CME–ICME association, a shortcoming in the geometrical description of the ICME, or something happening during the ICME propagation that cannot be described by the DBM (e.g. a CME–CME interaction). This, however, would require a thorough analysis of every single ICME, which is beyond the scope of this work and may be the subject of a different study. The revised CME–ICME collection we are presenting also includes additional details, such as the solar wind speed conditions experienced by propagating CME events, more parameters on the validation of the DBM hypothesis, and information about the acceleration or deceleration of the events during their propagation. The improvements over the previous version published by Napoletano et al. (2022) are summarised in Table 5. The revised dataset compiled and used in this work has been published at https://zenodo.org/record/8063404 and a description of its columns is provided in Appendix A.

As mentioned above, the subset of events where the DBM approximation holds can be employed to extract the γ and w parameters of the DBM via a Monte Carlo-like inversion procedure. In this statistical study, we consider the uncertainties associated with the measurements and observations, incorporate them as input for the model, and consider only those CME events with an acceptance rate above 50% in the inversion procedure. The reason behind this criterion is to ensure that the CME propagation is modelled by the DBM with enough confidence. It is important to highlight that the cone geometry of the CME is not included in our calculation; we simply employ the one-dimensional version of the DBM. Different versions of the DBM that take into account the different cone geometries discussed in Dumbović et al. (2021) and Schwenn (2006) can readily be implemented in the calculation in the future by including additional free variables in the DBM inversion procedure. We have retrieved γ and w for 204 out of 213 ICMEs, which enables us to obtain robust statistics. The empirical PDF for the solar wind speed w is modelled using two separate distributions for slow and fast solar wind conditions, with a threshold value of w = 500 km/s for the fast solar wind. In Dumbović et al. (2018, 2021) and Napoletano et al. (2018, 2022), a Gaussian distribution is assumed as the input PDF for w. Because of the w = 500 km/s threshold used here, a normal distribution is no longer the ideal PDF: with this threshold, the Student’s t-distribution is the best choice for most CME events. This finding is also supported by fitting PDFs for w in a single-CME approach; Figure 9 shows a histogram of the most suitable PDFs for w in the individual-CME approach. However, the Student’s t-distribution is strongly biased by the hard thresholding, and the RSS values of the Student’s t and normal distributions are fairly comparable; we therefore prefer the Gaussian PDF for the solar wind speed w. In our study, the mean value of the slow solar wind speed is wslow = 370 km/s with a standard deviation of 88 km/s, which is comparable with Napoletano et al. (2022). For the fast solar wind, the mean value is wfast = 579 ± 68 km/s, which is somewhat higher than in Napoletano et al. (2022). Notably, the median values are 386 km/s for the slow solar wind and 547 km/s for the fast solar wind. These mean values differ marginally from the medians because some extreme values are accepted in the inversion procedure: low accepted solar wind speeds shift the mean leftward for the slow solar wind, whereas high values move the mean rightward for the fast solar wind. The values of the solar wind speed w from different previous studies are compared in Table 2. The most probable values of the solar wind speed obtained for the slow, fast, decelerating and accelerating cases are provided in Table 4.

Figure 9

Histogram illustrating most suitable PDF for solar wind speed w and drag parameter γ in single CME approach. Within each PDF, various types of CME events are stratified and effectively stacked on top of one another.

Table 4

Statistical values for all CME events flagged as “Optimal Fit” analyzed together to obtain drag parameter γ and ambient solar wind speed w using DBM inversion procedure.

The appropriate PDF for γ remains under discussion in the previous works of Čalogović et al. (2021), Dumbović et al. (2018, 2021) and Napoletano et al. (2018, 2022): one group employs a lognormal function, while the other uses a Gaussian function as the input PDF. We have fitted PDFs both to the entire dataset and to single CME events, and our study sheds light on the preference between these two functions. Table 3 shows that the lognormal distribution is the most favourable PDF for the entire dataset, as its RSS value is the lowest among the PDFs tested. In contrast, in the single-CME approach the Gaussian PDF appears to be the best; Figure 9 shows a histogram of the most suitable PDFs for γ in that approach. A possible reason for this discrepancy is the extent of the dataset. We have used a large dataset of CMEs that covers a wide range of CME masses and cross-sections and spans almost two solar cycles, so that the CME propagation samples many kinds of solar wind density fluctuations. Because the γ parameter is a quantitative measure of the drag efficiency that depends on many factors, such as the mass and cross-section of the CME and the solar wind density (Vršnak et al., 2013), including all these background conditions when fitting a PDF to the whole dataset leads to the long-tailed lognormal function. The values of the drag parameter obtained in this study are notably higher than those of Čalogović et al. (2021) and Dumbović et al. (2018, 2021); these higher values result from the long-tailed lognormal function. The statistics of the drag parameter γ are tabulated in Table 4. It is important to note that the values in Table 4 cannot be used directly as model input to predict CME arrival time and speed. To determine the CME arrival time and speed using the DBM, one has to use the values provided in Table 3 together with the transformation described in Appendix B.
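The RSS-based choice between a Gaussian and a lognormal PDF can be sketched as follows. This is a minimal illustration using SciPy directly rather than distfit, and it assumes that the RSS is taken between a density histogram and the fitted PDF evaluated at the bin centres. The synthetic sample, its parameters, the bin count and the helper name `rss_for_fit` are all assumptions made for the example, not values from this study.

```python
import numpy as np
from scipy import stats


def rss_for_fit(data, dist, bins=50, **fit_kwargs):
    """Fit `dist` to `data`; return (params, RSS against the density histogram)."""
    density, edges = np.histogram(data, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    params = dist.fit(data, **fit_kwargs)
    rss = float(np.sum((density - dist.pdf(centres, *params)) ** 2))
    return params, rss


# Synthetic long-tailed sample standing in for drag-parameter values
# (arbitrary units; parameters chosen only for illustration).
rng = np.random.default_rng(0)
gamma_samples = rng.lognormal(mean=-0.7, sigma=1.0, size=5000)

norm_params, rss_norm = rss_for_fit(gamma_samples, stats.norm)
# floc=0 pins the location so only the shape s and the scale are fitted.
logn_params, rss_lognorm = rss_for_fit(gamma_samples, stats.lognorm, floc=0)

best = "lognorm" if rss_lognorm < rss_norm else "norm"
print(f"RSS normal={rss_norm:.4f}, RSS lognormal={rss_lognorm:.4f}, best={best}")
```

For a long-tailed sample such as this one the lognormal fit yields the lower RSS. A narrow sample drawn from a single event could just as well favour the Gaussian, which is consistent with the discrepancy between the whole-dataset and single-CME results discussed above.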

The refined dataset and the updated method presented in this work allowed us to explore a larger part of the (w, γ) parameter space of the P-DBM model, including extreme values. We have investigated the possibility of γ being a function of the ICME kinematic properties (i.e., accelerating or decelerating) or of the solar wind properties (i.e., fast or slow). While there seems to be some difference between accelerating and decelerating ICMEs (see Table 3 and Fig. 7), more robust statistics are needed to draw strong conclusions. Similar efforts have been carried out for the solar wind speed. A noteworthy step beyond the work of Napoletano et al. (2022) is the redetermination of the solar wind type associated with each CME propagating in the heliosphere. Using 500 km/s as the threshold for the fast solar wind, we were able to infer distinguishable PDFs for the fast and slow solar wind speeds.

We suggest that the space weather community will benefit from our findings, especially the improved CME–ICME list, since it provides a test bench for comparing how well CME arrival time and impact can be predicted. The information associated with every CME–ICME entry can also help improve the accuracy and precision of other CME propagation models, by including other relevant parameters. For example, we plan to use the Markov Chain Monte Carlo (MCMC) method in the future to constrain the PDFs for w and γ. The new entries of this catalogue are expected to play a relevant part in this work, promoting the convergence of the Markov chains and boosting the performance of our strategy.

Acknowledgments

This research work has been a part of the Space Weather Awareness Training NETwork (SWATNet) project. SWATNet has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 955620. This research has been also carried out in the framework of the CAESAR project, supported by the Italian Space Agency and the National Institute of Astrophysics through the ASI-INAF n.2020-35-HH.0 agreement for the development of the ASPIS prototype of the scientific data centre for Space Weather. This research has received financial support from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 824135 (SOLARNET). E.C. was partially supported by NASA grants 80NSSC20K1580 “Ensemble Learning for Accurate and Reliable Uncertainty Quantification” and 80NSSC20K1275 “Global Evolution and Local Dynamics of the Kinetic Solar Wind”. R.E. is grateful to STFC (UK, grant No. ST/M000826/1), NKFIH OTKA (Hungary, grant No. K142987), and the Royal Society. D.D.M. is grateful to the Italian Space Weather Community (SWICo). R.F. acknowledges support from the project “EVENTFUL” (ANR-20-CE30-0011), funded by the French “Agence Nationale de la Recherche” – ANR through the program AAPG-2020. The editor thanks two anonymous reviewers for their assistance in evaluating this paper.

Data availability statement

The ICME catalogue built as a part of this work along with data visualisation and PDF analysis modules for implementing the DBM inversion procedure can be downloaded from https://zenodo.org/record/8063404 (Mugatwala et al., 2023).

References

Appendix A

Description of revised dataset

As mentioned above, the DBM inversion procedure requires the initial position r0, target position r1AU, transit time t1AU, initial speed v0 and arrival speed va to obtain w and γ. For this work, we have used the CME–ICME dataset from Napoletano et al. (2022), which contains all the input quantities required by the DBM inversion procedure. The dataset consists of 213 CME–ICME pairs from 1997 to 2018, a time span covering solar cycles 23 and 24. In this dataset, the kinematic properties of the CMEs at launch time were retrieved from the SOHO/LASCO CME Catalog, while the arrival times and speeds of the related ICMEs were obtained from Richardson & Cane (2010).
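How these five inputs determine w and γ can be sketched with the closed-form solution of the 1-dimensional DBM (Vršnak et al., 2013). The sketch below is not the implementation distributed with the catalogue: it eliminates γ analytically and finds w by a one-dimensional root search, and the Monte Carlo step perturbs the inputs with Gaussian errors. The function names, the search bounds on w and γ, the error sizes and the example event are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import brentq


def dbm_forward(r0, v0, gamma, w, r1):
    """1D DBM: return (transit time [s], arrival speed [km/s]) at distance r1 [km]."""
    sign = 1.0 if v0 >= w else -1.0
    dv = abs(v0 - w)

    def distance_gap(t):
        return sign / gamma * np.log1p(gamma * dv * t) + w * t + r0 - r1

    t1 = brentq(distance_gap, 0.0, 30 * 86400.0)  # search up to 30 days
    return t1, w + (v0 - w) / (1.0 + gamma * dv * t1)


def dbm_invert(r0, r1, t1, v0, v1, w_bounds=(1.0, 1500.0)):
    """Solve the 1D DBM for (gamma [1/km], w [km/s]); None if no root in bounds."""
    if np.isclose(v0, v1) or t1 <= 0:
        return None
    eps = 1e-6
    # w lies below v1 for decelerating CMEs and above v1 for accelerating ones.
    lo, hi = (w_bounds[0], v1 - eps) if v0 > v1 else (v1 + eps, w_bounds[1])
    if lo >= hi:
        return None

    def gamma_of(w):
        # From the speed equation: 1 + gamma*|v0 - w|*t1 = (v0 - w)/(v1 - w)
        return (v0 - v1) / ((v1 - w) * abs(v0 - w) * t1)

    def residual(w):
        sign = 1.0 if v0 > w else -1.0
        return sign / gamma_of(w) * np.log((v0 - w) / (v1 - w)) + w * t1 - (r1 - r0)

    if residual(lo) * residual(hi) > 0:
        return None
    w = brentq(residual, lo, hi)
    return gamma_of(w), w


def acceptance_rate(r0, r1, t1, v0, v1, sig_t, sig_v0, sig_v1,
                    n=2000, gamma_bounds=(1e-9, 3e-7), seed=0):
    """Perturb the inputs n times; return (accepted fraction, accepted (gamma, w))."""
    rng = np.random.default_rng(seed)
    draws = zip(rng.normal(t1, sig_t, n), rng.normal(v0, sig_v0, n),
                rng.normal(v1, sig_v1, n))
    accepted = []
    for t_s, v0_s, v1_s in draws:
        sol = dbm_invert(r0, r1, t_s, v0_s, v1_s)
        if sol is not None and gamma_bounds[0] <= sol[0] <= gamma_bounds[1]:
            accepted.append(sol)
    return len(accepted) / n, np.array(accepted)


# Hypothetical event: launched at 20 solar radii with 1000 km/s.
R_SUN, AU = 6.957e5, 1.496e8  # km
r0, r1 = 20 * R_SUN, AU
t_true, v1_true = dbm_forward(r0, 1000.0, 2e-8, 400.0, r1)
gamma_rec, w_rec = dbm_invert(r0, r1, t_true, 1000.0, v1_true)
rate, samples = acceptance_rate(r0, r1, t_true, 1000.0, v1_true,
                                sig_t=3600.0, sig_v0=100.0, sig_v1=20.0)
print(f"recovered gamma={gamma_rec:.3e} 1/km, w={w_rec:.1f} km/s, rate={rate:.2f}")
```

With exact inputs the inversion returns the γ and w used in the forward run. With perturbed inputs, some draws have no admissible solution, and the accepted fraction plays the role of the acceptance rate used to select events.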

As mentioned in Section 2.2, the uncertainties associated with the different quantities are included in the inversion procedure. The SOHO/LASCO catalogue provides the CME speed in the plane of sky (POS), but to make the DBM forecast more accurate, the de-projected speed has been used in the calculation. The de-projected radial speed has been obtained using equation (1) of Gopalswamy (2009); a more detailed explanation is given in Appendix A2 of Napoletano et al. (2022). The associated solar wind speed type (column: SW_type) for each event is hypothesized by determining the presence of a coronal hole close to the CME source region (see Appendix A3 of Napoletano et al., 2022). The columns of the dataset and their source works are described in Table A.1.

Table A.1

Column description of the ICME dataset created as a part of this work.

Appendix B

Mathematical description of lognormal distribution

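To find the parameters of the lognormal PDF we have used the Python package distfit (Taskesen, 2023), which relies on SciPy (Virtanen et al., 2020). The standardized form of the lognormal function is given as:

$$f(x; s) = \frac{1}{s\,x\sqrt{2\pi}} \exp\left(-\frac{\ln^{2}x}{2s^{2}}\right), \qquad x > 0,\ s > 0. \tag{B.1}$$

To shift and/or scale the above distribution function, SciPy and distfit use two further input parameters, namely loc and scale. With these two parameters, the function becomes:

$$f(x; s, \mathrm{loc}, \mathrm{scale}) = \frac{1}{\mathrm{scale}}\, f(y; s), \tag{B.2}$$

where y = (x − loc)/scale. Suppose a variable X follows a normal distribution with parameters μ and σ. Then the lognormally distributed variable Y = exp(X) has μ = ln(scale) and σ = s. The simplified version of formula (B.2) is then:

$$f(x; \mu, \sigma, \mathrm{loc}) = \frac{1}{\sigma\,(x-\mathrm{loc})\sqrt{2\pi}} \exp\left(-\frac{\left[\ln(x-\mathrm{loc}) - \mu\right]^{2}}{2\sigma^{2}}\right), \tag{B.3}$$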

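while the lognormal function used by Napoletano et al. (2018) is

$$f(x; \mu, \sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}\right). \tag{B.4}$$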
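The relation μ = ln(scale), σ = s stated above can be checked numerically. The following sketch compares SciPy's `lognorm.pdf` with the shifted lognormal density written out explicitly; the parameter values and the helper names `scipy_to_mu_sigma` and `shifted_lognormal_pdf` are hypothetical and are not taken from Tables 1 or 3.

```python
import numpy as np
from scipy import stats


def scipy_to_mu_sigma(s, scale):
    """Map SciPy's lognormal (s, scale) onto (mu, sigma) of the underlying normal."""
    return np.log(scale), s


def shifted_lognormal_pdf(x, mu, sigma, loc=0.0):
    """Lognormal density of (x - loc) with log-mean mu and log-std sigma."""
    z = np.asarray(x, dtype=float) - loc
    return (np.exp(-(np.log(z) - mu) ** 2 / (2 * sigma ** 2))
            / (sigma * z * np.sqrt(2 * np.pi)))


# Hypothetical fit parameters in SciPy's convention (illustration only).
s, loc, scale = 0.9, 0.05, 0.6
mu, sigma = scipy_to_mu_sigma(s, scale)

x = np.linspace(0.1, 5.0, 50)  # all points lie above loc
pdf_scipy = stats.lognorm.pdf(x, s, loc=loc, scale=scale)
pdf_explicit = shifted_lognormal_pdf(x, mu, sigma, loc)

# Mean of the shifted lognormal, which differs from both mu and scale.
mean_y = loc + np.exp(mu + sigma ** 2 / 2)

print(np.allclose(pdf_scipy, pdf_explicit), mean_y)
```

The two densities agree point by point, and the computed mean shows why the fitted s, loc and scale cannot be read directly as an average and a standard deviation.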


Cite this article as: Mugatwala R, Chierichini S, Francisco G, Napoletano G, Foldes R, et al. 2024. A catalogue of observed geo-effective CME/ICME characteristics. J. Space Weather Space Clim. 14, 6. https://doi.org/10.1051/swsc/2024004.


