published in: Studia geophysica et geodaetica, vol. 39 (1995), No. 1, p. 84-100
language: English; includes 6 figures and 1 table


The Binary Decision Tree: The Growing Algorithm and Application to Thunderstorm Forecasting


Martin Dubrovsky
Institute of Atmospheric Physics, Hradec Kralove, Czech Republic

www.ufa.cas.cz/dub/dub@htm  


ABSTRACT:
The paper deals with the probabilistic prediction of event occurrence using a binary decision tree grown from a learning sample. The tree-growing algorithm recursively partitions the predictor space, either by single-predictor-based (SP) splits or by hyperplanes perpendicular to the best linear discriminant function (BLDF), and is intended to discriminate as effectively as possible between the elements of the learning sample with event occurrence and those without it. The predictand is thunderstorm occurrence in the afternoon in Prague. The set of predictors includes variables derived from midday single-station TEMP-A data (Perfect Prog approach), persistence predictors, and predictors related to passages of fronts across Prague. The experiments are designed to test the performance of the tree-growing algorithm, with emphasis on the indeterminateness that follows from the limited size of the learning sample, and to evaluate the predictive potential of the predictors for thunderstorm forecasting. The stability of the tree structure, the optimal size of the tree, and the related prognostic skill score all increase with increasing size of the learning sample. BLDF splits allow a quicker and more effective partition of the predictor space, provided that the predictor vector has a relatively low dimension and is 'well behaved' (preferably normally distributed). The stability indices of Faust, Showalter, and Adedokun were found to be the most effective predictors. Persistence and frontal predictors contribute only slightly to the total prediction skill of the decision tree. The optimally sized tree has only five splitting nodes and employs three thermodynamic predictors, one frontal predictor, and one persistence predictor.
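
The recursive partition by single-predictor-based (SP) splits described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not state the splitting criterion or the stopping rule, so the impurity measure n*p*(1-p), the `min_size` and `min_gain` parameters, and all names (`Node`, `impurity`, `best_sp_split`, `grow`, `predict`) are assumptions made for illustration. BLDF splits are not covered.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    n_event: int                       # learning-sample elements with event occurrence
    n_total: int                       # all learning-sample elements in this node
    predictor: Optional[int] = None    # column index of the splitting predictor (None = terminal)
    threshold: Optional[float] = None
    left: Optional["Node"] = None      # elements with x[predictor] <= threshold
    right: Optional["Node"] = None     # elements with x[predictor] > threshold

    @property
    def probability(self) -> float:
        # Prognostic probability: events / total elements in the node.
        return self.n_event / self.n_total


def impurity(n_event: int, n_total: int) -> float:
    # ASSUMPTION: n * p * (1 - p); the source does not state its criterion.
    if n_total == 0:
        return 0.0
    p = n_event / n_total
    return n_total * p * (1.0 - p)


def best_sp_split(X: Sequence[Sequence[float]], y: Sequence[int],
                  min_size: int) -> Optional[Tuple[float, int, float]]:
    """Return (gain, predictor index, threshold) of the best SP split, or None."""
    n, events = len(y), sum(y)
    parent = impurity(events, n)
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between neighbouring distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            n_left, n_right = len(left), n - len(left)
            if n_left < min_size or n_right < min_size:
                continue
            gain = (parent - impurity(sum(left), n_left)
                    - impurity(events - sum(left), n_right))
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(X: Sequence[Sequence[float]], y: Sequence[int],
         min_size: int = 2, min_gain: float = 1e-9) -> Node:
    """Recursively partition a non-empty learning sample (y holds 0/1 flags)."""
    node = Node(n_event=sum(y), n_total=len(y))
    split = best_sp_split(X, y, min_size)
    if split is None or split[0] <= min_gain:
        return node                    # terminal node (ASSUMED stopping rule)
    _, j, t = split
    go_left = [row[j] <= t for row in X]
    node.predictor, node.threshold = j, t
    node.left = grow([r for r, g in zip(X, go_left) if g],
                     [v for v, g in zip(y, go_left) if g], min_size, min_gain)
    node.right = grow([r for r, g in zip(X, go_left) if not g],
                      [v for v, g in zip(y, go_left) if not g], min_size, min_gain)
    return node


def predict(node: Node, x: Sequence[float]) -> float:
    """Descend to a terminal node and return its event probability."""
    while node.predictor is not None:
        node = node.left if x[node.predictor] <= node.threshold else node.right
    return node.probability
```

Calling `grow(X, y)` on a list of predictor vectors `X` and 0/1 event flags `y` returns the root node, and `predict(root, x)` then returns the event fraction of the terminal node that `x` falls into. The sketch grows the tree until no admissible split remains and makes no attempt to select an optimal tree size.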


Figures:


Figure 3. The optimal binary decision tree with single-predictor-based splits developed in the Perfect Prog approach (the tree was built from the learning sample, with predictor values derived from the noon aerological soundings). The tree estimates the probability of thunderstorm occurrence in Prague in the afternoon. The horizontal position of each node is proportional to the conditional probability of thunderstorm occurrence related to the node. Each terminal node gives the prognostic probability as a fraction: the number of elements with event occurrence (numerator) over the total number of elements falling into the node (denominator).
Predictors: SICP = modified Showalter index, FI = Faust index, POSSFC = energy released by a surface parcel during buoyant rise beyond the level of free convection, PERS = number of stations in Bohemia reporting TS occurrence within a 24-h interval ending at 06 GMT, F<12,18> = indicator (0 or 1) of the passage of a front (cold or occluded) across Prague within the interval <12,18> GMT.