
Cross-validation

Rules were learnt to recognise members of a fold from negative examples of the same class. The experiment is further refined by the use of integrity constraints, which ensure that every rule considered contains at least one of the following predicates: unit_len, unit_aveh, unit_hmom, coil or has_pro. Judged by our knowledge of protein structure, this adds complexity and richness to the rules.

Accuracy is defined as the number of true positives plus true negatives divided by the total number of cases, $\mathrm{accuracy} = (TP + TN)/(TP + TN + FP + FN)$. Since the dataset was constructed with twice as many negative examples as positive ones, a rule that rejects every example, fold(X,_) :- fail, would already reach an overall accuracy of about 66.7% (two thirds). The cross-validated overall accuracy in our test is 74.75%, which is significantly better than this baseline (t-test at the 99.0% confidence level); see Table 1. The overall accuracy is slightly higher for folds of the all-$\alpha$ and all-$\beta$ classes (76.29% and 76.90% on average, against 72.77% and 73.05% for the $\alpha/\beta$ and $\alpha+\beta$ classes); these proteins are in general smaller and less complex than those of the $\alpha/\beta$ and $\alpha+\beta$ classes.
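
The integrity constraint acts as a filter on the rules the learner is allowed to consider. The Python sketch below illustrates only that filtering step; representing a rule body as a plain list of predicate names is a simplification of ours, and `helix_count` is a hypothetical predicate that does not appear in this work.

```python
from typing import Iterable

# Predicates named in the text: every rule must use at least one of them.
REQUIRED_PREDICATES = frozenset(
    {"unit_len", "unit_aveh", "unit_hmom", "coil", "has_pro"}
)


def satisfies_constraint(body: Iterable[str]) -> bool:
    """True if the rule body mentions at least one required predicate.

    `body` is a hypothetical simplification: only the predicate names of
    the body literals are kept, their arguments are ignored.
    """
    return any(name in REQUIRED_PREDICATES for name in body)


candidates = [
    ["unit_len", "coil"],  # kept: uses two required predicates
    ["helix_count"],       # hypothetical predicate only: rejected
    [],                    # empty body: rejected
]
kept = [body for body in candidates if satisfies_constraint(body)]
print(kept)  # [['unit_len', 'coil']]
```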

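The reject-all baseline follows directly from the definition of accuracy: every negative becomes a true negative and every positive a false negative. A minimal Python sketch, assuming exactly two negatives per positive; the 30 positives are taken from the DNA-binding 3-helical bundle row of Table 1, while the 60 negatives are implied by the 1:2 ratio rather than read from the dataset.

```python
from fractions import Fraction


def accuracy(tp: int, tn: int, fp: int, fn: int) -> Fraction:
    """Overall accuracy: (true positives + true negatives) / all cases."""
    return Fraction(tp + tn, tp + tn + fp + fn)


def reject_all_accuracy(n_pos: int, n_neg: int) -> Fraction:
    """Accuracy of fold(X,_) :- fail, which labels every example negative."""
    return accuracy(tp=0, tn=n_neg, fp=0, fn=n_pos)


# Assumed 1:2 ratio of positives to negatives for one fold.
baseline = reject_all_accuracy(n_pos=30, n_neg=60)
print(baseline, float(baseline))  # 2/3 0.6666666666666666
```

Exact fractions keep the two-thirds baseline free of rounding error; any fold with a 1:2 ratio gives the same value.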

 

 
Table 1: The 20 folds used throughout this work. Super is the number of superfamilies, Fam is the number of families, Dom is the number of domains, i.e. the number of positive examples, Acc $\pm$ Err is the cross-validated accuracy (%), Label is the label used in subsequent tables and, finally, Fold is the name of the fold as defined in the SCOP database.

  Super   Fam   Dom  Acc $\pm$ Err (%)   Fold

All-$\alpha$:
    139   210   111                      other folds (92)
      4    17    30  81.92 $\pm$ 3.15    DNA-binding 3-helical bundle
      2     7    14  68.48 $\pm$ 5.10    EF Hand-like
      1     2    13  94.56 $\pm$ 2.54    Globin-like
      1     3    10  73.13 $\pm$ 5.67    4-helical cytokines
      1     3    10  63.37 $\pm$ 5.95    $\lambda$ repressor-like DNA-binding domains
                     76.29 $\pm$ 10.99   average

All-$\beta$:
    123   220    90                      other folds (56)
      8    12    45  71.07 $\pm$ 2.85    Immunoglobulin-like beta-sandwich
      1     4    21  81.47 $\pm$ 3.58    Trypsin-like serine proteases
      4    11    20  76.92 $\pm$ 3.99    OB-fold
      6     7    16  76.53 $\pm$ 4.52    SH3-like barrel
      1     2    14  78.50 $\pm$ 3.97    Lipocalins
                     76.90 $\pm$ 3.39    average

$\alpha/\beta$:
    131   200    88                      other folds (70)
     17    28    55  66.14 $\pm$ 2.61    $\beta/\alpha$ (TIM)-barrel
      1     7    21  78.47 $\pm$ 3.69    NAD(P)-binding Rossmann-fold domains
      1     4    14  81.21 $\pm$ 4.10    P-loop containing nucleotide triphosphate hydrolases
      1     2    13  62.94 $\pm$ 5.82    Periplasmic binding protein-like II
      1    10    12  75.08 $\pm$ 4.80    $\alpha/\beta$-Hydrolases
                     72.77 $\pm$ 7.07    average

$\alpha+\beta$:
    158   240   113                      other folds (96)
     17    21    26  80.38 $\pm$ 3.40    Ferredoxin-like
      2     8    13  56.30 $\pm$ 5.71    Zincin-like
      1     1    13  79.38 $\pm$ 4.39    SH2-like
      6     6    12  63.56 $\pm$ 5.79    beta-Grasp
      1     1     9  85.63 $\pm$ 4.42    Interleukin 8-like chemokines
                     73.05 $\pm$ 11.16   average

                     74.75 $\pm$ 8.95    overall
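
The average rows of Table 1 appear to be unweighted means over the five listed folds, with the $\pm$ term being the population standard deviation across those folds. The text does not say so, but recomputing them from the per-fold values, as in the Python sketch below, reproduces every average row and the overall row. The second part is a hypothetical significance check: the text only states that a t-test at the 99.0% level was used, so the choice of a one-sample test of the 20 per-fold accuracies against the two-thirds baseline, and the hard-coded critical value, are our assumptions.

```python
import math
from statistics import mean, pstdev, stdev

# Per-fold cross-validated accuracies (%) copied from Table 1.
ACCURACIES = {
    "all-alpha": [81.92, 68.48, 94.56, 73.13, 63.37],
    "all-beta": [71.07, 81.47, 76.92, 76.53, 78.50],
    "alpha/beta": [66.14, 78.47, 81.21, 62.94, 75.08],
    "alpha+beta": [80.38, 56.30, 79.38, 63.56, 85.63],
}


def summarise(values):
    """Unweighted mean and population standard deviation, rounded to 2 dp."""
    return round(mean(values), 2), round(pstdev(values), 2)


for cls, values in ACCURACIES.items():
    print(cls, *summarise(values))

all_values = [v for values in ACCURACIES.values() for v in values]
print("overall", *summarise(all_values))

# Hypothetical check (assumed test): one-sample t-test of the 20 per-fold
# accuracies against the reject-all baseline, using the sample std (df = 19).
baseline = 100 * 2 / 3
std_err = stdev(all_values) / math.sqrt(len(all_values))
t_stat = (mean(all_values) - baseline) / std_err
T_CRIT_99_DF19 = 2.861  # two-sided 99% critical value of Student's t, df = 19
print(f"t = {t_stat:.2f}, significant at 99%: {t_stat > T_CRIT_99_DF19}")
```

Using the population rather than the sample standard deviation is what makes the recomputed spreads match the table; the sample version is used only inside the t statistic.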




 
Marcel Turcotte
1999-10-20