The experiments were carried out using Progol [5] and make use of a new feature, introduced in version 4.4, which allows numerical parameters to be estimated [3]. In order to select suitable parameters, the noise and inflate settings, as well as the number of negative examples, were varied. Three noise percentages were sampled: 0, 10 and 20; three inflate rates: 100, 200 and 400; and three sets of negative examples, each containing a different multiple of the number of positive examples.
All combinations of parameters were tested on the most populated fold of each class: DNA-binding 3-helical bundle, Immunoglobulin-like β-sandwich, β/α (TIM)-barrel and Ferredoxin-like.
The combination of noise=20, inflate=200 and one of the negative-example ratios gave the best result: it minimises the sum of the number of rules and the number of remaining examples (examples which produce no compression). This combination was used throughout our tests.
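To make the selection criterion concrete, the following Python sketch enumerates the parameter grid and retains the combination minimising the sum of induced rules and uncompressed examples. It is illustrative only: run_progol is a hypothetical driver (Progol offers no such Python interface), and the negative-example ratios are stand-ins for the values used in the experiments.

```python
from itertools import product

def run_progol(fold, noise, inflate, neg_ratio):
    """Hypothetical driver: run Progol on one fold with the given settings
    and return (number of induced rules, number of remaining examples,
    i.e. examples producing no compression).  Dummy values keep the
    sketch executable."""
    return 0, 0

NOISE = [0, 10, 20]              # noise percentages sampled
INFLATE = [100, 200, 400]        # inflate rates sampled
NEG_RATIOS = ["r1", "r2", "r3"]  # stand-ins for the three negative-example ratios

FOLDS = ["DNA-binding 3-helical bundle",
         "Immunoglobulin-like beta-sandwich",
         "beta/alpha (TIM)-barrel",
         "Ferredoxin-like"]

best, best_score = None, float("inf")
for noise, inflate, ratio in product(NOISE, INFLATE, NEG_RATIOS):
    # Score a combination by summing, over the four most populated folds,
    # the number of rules plus the number of uncompressed examples.
    score = sum(sum(run_progol(f, noise, inflate, ratio)) for f in FOLDS)
    if score < best_score:
        best, best_score = (noise, inflate, ratio), score
print("best combination:", best)
```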
The number of nodes explored was set to 1000. This allowed us to keep the execution time for the whole cross-validation under two days using six of the twelve R10000 processors of our Silicon Graphics PowerChallenge computer. This node limit was reached in 33.87% of the runs (average=578, median=623). Finally, the parameter c, which controls the length of the clauses, was set to 9; this bound was never reached.
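As a minimal illustration of how the node statistics above would be computed, assuming a list of per-run explored-node counts (the values below are placeholders, not the experimental data):

```python
from statistics import mean, median

NODE_LIMIT = 1000  # bound on the number of nodes explored per run

# Placeholder per-run node counts; the real study had one count per Progol run.
nodes_per_run = [312, 1000, 623, 845, 1000, 97]

hit_rate = sum(n >= NODE_LIMIT for n in nodes_per_run) / len(nodes_per_run)
print(f"limit reached in {hit_rate:.2%} of runs, "
      f"average={mean(nodes_per_run):.0f}, median={median(nodes_per_run):.0f}")
```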
Performance analyses were carried out over cross-validation test sets. Two different tests were applied depending on how much data were available: if the total number of examples (positive plus negative) was greater than 60, a 10-fold cross-validation test was applied; otherwise a leave-one-out test was applied. In order to estimate the average value and standard error of each measure, we ran bootstrap studies [6]. For each fold we ran a cross-validation test, 10-fold or leave-one-out, gathered the results into a confusion matrix and ran 250 Monte Carlo iterations.
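The evaluation protocol can be sketched as follows. This is a reading of the procedure described above, not the authors' code: choose_folds picks the test by example count, and bootstrap_accuracy resamples the outcomes pooled in a confusion matrix 250 times to estimate the mean and standard error of a measure (accuracy is used here for illustration).

```python
import random

def choose_folds(n_pos, n_neg):
    """10-fold cross-validation when more than 60 examples are available,
    otherwise leave-one-out (as many folds as examples)."""
    n = n_pos + n_neg
    return 10 if n > 60 else n

def bootstrap_accuracy(confusion, iterations=250, seed=0):
    """Monte Carlo bootstrap over a confusion matrix given as
    {(truth, prediction): count}.  Resamples the pooled test outcomes with
    replacement and returns the mean and standard error of the accuracy.
    A sketch of the protocol, not the authors' implementation."""
    rng = random.Random(seed)
    outcomes = [cell for cell, count in confusion.items() for _ in range(count)]
    accs = []
    for _ in range(iterations):
        sample = [rng.choice(outcomes) for _ in outcomes]
        accs.append(sum(t == p for t, p in sample) / len(sample))
    m = sum(accs) / len(accs)
    se = (sum((a - m) ** 2 for a in accs) / (len(accs) - 1)) ** 0.5
    return m, se

# Hypothetical confusion matrix: keys are (truth, prediction) pairs.
cm = {("pos", "pos"): 40, ("pos", "neg"): 8,
      ("neg", "neg"): 35, ("neg", "pos"): 5}
print(bootstrap_accuracy(cm))
```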