Bogofilter Training: Comparing Full Training with Training-on-error, part 2

Introduction and general description:

There are two possible ways to train bogofilter starting from scratch, assuming that corpora of spam and nonspam messages have been accumulated for the purpose.  One is simply to run
bogofilter -s < spam_corpus
bogofilter -n < nonspam_corpus
which registers every message in both corpora.  The other extreme is to train on error: messages are fed to the classifier in random order, and whenever a classification is wrong or uncertain, the message in question is used for training before the next message is classified.
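
The train-on-error loop just described can be sketched as follows.  This is a minimal illustration in Python, not the randomtrain script used in the experiment: the classify and train callables, and the toy keyword classifier that exercises them, are hypothetical stand-ins for calls to bogofilter.

```python
import random

def train_on_error(messages, classify, train, seed=None):
    """Feed (text, is_spam) pairs to a classifier in random order.

    classify(text) returns True (spam), False (nonspam) or None (unsure);
    train(text, is_spam) registers the message.  Both are hypothetical
    stand-ins for calls to bogofilter.  Returns the number of messages
    that had to be registered.
    """
    order = list(messages)
    random.Random(seed).shuffle(order)
    registered = 0
    for text, is_spam in order:
        verdict = classify(text)
        if verdict is None or verdict != is_spam:
            # Wrong or uncertain: train before the next message is classified.
            train(text, is_spam)
            registered += 1
    return registered


# Toy classifier, for illustration only: it remembers words seen in training.
spam_words, good_words = set(), set()

def toy_classify(text):
    words = set(text.split())
    s, g = len(words & spam_words), len(words & good_words)
    if s == g:
        return None              # unsure
    return s > g

def toy_train(text, is_spam):
    (spam_words if is_spam else good_words).update(text.split())

corpus = [("buy pills now", True), ("meeting at noon", False),
          ("buy cheap pills", True), ("lunch meeting today", False)]
n_registered = train_on_error(corpus, toy_classify, toy_train, seed=1)
```

With this toy corpus only the first message of each class is registered, whatever the shuffle order, because the second message of each class shares a word with the first.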

I run bogofilter with the Robinson-Fisher method of calculation (described in Appendix A below), in which values of a guess a priori and a weight parameter for that guess are used in calculating individual token-probability estimates, and Fisher's method of combining probabilities is applied in calculation of the overall message "spamicity."  In an earlier experiment, I investigated three things about training for this approach:

  1. Does full training or on-error training lead to fewer classification errors?
  2. Does the number of classification errors diminish if more than one round of on-error training is performed?
  3. Does the number of classification errors diminish if a round of on-error training is performed after full training?

It seemed as though full training gave fewer, but not far fewer, classification errors than on-error training; however, not many rounds of training could be performed in that experiment.  It seemed worthwhile to conduct a second test with more messages and more rounds of training. Anecdotal reports were in circulation that suggested full training should be used until the training database was of a respectable size (whatever that might mean), and thereafter, training-on-error could be effective.  I myself had gained the impression that this might be true.  I therefore wanted, in this second test, not only to compare pure full training with pure training on error, but also to see if switching from full to error after several rounds might be beneficial.

Procedure and Results:

For this experiment I used a corpus of 32,070 nonspams and 21,170 spam emails.  The nonspams and spams were "dealt" out into ten pairs of mailboxes.  Three runs were performed; the pairs of mailboxes (spam and nonspam) were chosen in a different random sequence each time.  Pairs were selected without replacement, so that each of the ten pairs was used exactly once in each run.
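
The dealing and the per-run random sequence can be pictured with a short Python sketch.  It is only an illustration of the idea (the actual commands are listed in Appendix B); the function names deal and run_sequence are hypothetical, and integers stand in for the messages.

```python
import random

def deal(messages, n_piles=10):
    """Deal messages round-robin into n_piles lists, like dealing cards."""
    piles = [[] for _ in range(n_piles)]
    for i, msg in enumerate(messages):
        piles[i % n_piles].append(msg)
    return piles

def run_sequence(n_piles=10, seed=None):
    """Random order in which the piles are used in one run.

    Every pile appears exactly once (selection without replacement).
    """
    order = list(range(n_piles))
    random.Random(seed).shuffle(order)
    return order

nonspam_piles = deal(range(32070))   # ten piles of 3207 nonspams
spam_piles = deal(range(21170))      # ten piles of 2117 spams
sequence = run_sequence(seed=0)      # the seed is arbitrary
```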

A run consisted of ten rounds.  In round 0 only training was performed, and in round 9 only testing.  In each remaining round, the spam and nonspam files for that round were classified with the training databases from the preceding round.  After classification, the same files were used to train the databases further.

Three training databases were built.  Two of them, called "full" and "error", were started empty.  In each round, the "full" database was fully trained, as the name implies, and the "error" database was trained on error only.

After round 2's training was complete, the "full" database, which had by that point been trained with 6,351 spams and 9,621 nonspams, was copied to create a new database called "half."  In each of rounds 3 through 8, the "half" database was trained on error.
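
The message counts quoted for the "full" database follow directly from the size of each pair of mailboxes.  A small Python sketch of the arithmetic (the function name is hypothetical):

```python
SPAM_PER_ROUND = 21170 // 10       # 2117 spams in each mailbox
NONSPAM_PER_ROUND = 32070 // 10    # 3207 nonspams in each mailbox

def full_training_size(last_round):
    """Messages registered in the "full" database once training for
    rounds 0..last_round has been completed."""
    rounds = last_round + 1
    return rounds * SPAM_PER_ROUND, rounds * NONSPAM_PER_ROUND

spams, nonspams = full_training_size(2)   # state of "full" when "half" is copied
```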

The training methods, for full training and for training on error, were briefly described in the foregoing section.  For training on error, I wrote a script called randomtrain that produces a list of messages, in random order, with flags to indicate whether each message is spam or nonspam, and then uses the list to feed messages to bogofilter for classification and, when needed, for training.

Comparison of training methods: classification errors

The first table shows the percentages of misclassifications in each round from 1 to 9, for full training and for error training.  95% confidence limits are shown for each mean percentage error.


  round meanfullpc flcl95 fucl95 meanerrorpc elcl95 eucl95
1     1       8.55   7.61   9.48       12.89  11.95  13.82
2     2       7.26   6.33   8.20       11.60  10.66  12.53
3     3       6.65   5.71   7.59       10.69   9.75  11.62
4     4       6.32   5.38   7.25        9.07   8.14  10.01
5     5       5.90   4.96   6.83        8.53   7.60   9.47
6     6       5.87   4.94   6.81        8.06   7.12   8.99
7     7       5.75   4.82   6.69        7.88   6.94   8.81
8     8       5.50   4.57   6.44        7.34   6.41   8.28
9     9       5.49   4.55   6.43        7.26   6.33   8.20
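
The 95% limits in this table can be reproduced from the residual sum of squares and degrees of freedom reported by the analysis of variance in Appendix B (27.129 on 42 degrees of freedom), using the same approximation to Student's t that appears there.  The Python sketch below is only an illustration of that calculation; the function name ci_half_width is hypothetical.

```python
import math

def ci_half_width(residual_ss, residual_df, n_per_mean):
    """Half-width of the 95% confidence interval for a mean of n_per_mean values.

    Student's t is approximated as in Appendix B:
        t ~= 1.95996 + 1 / (0.412 * df - 0.423)
    """
    rms = residual_ss / residual_df                       # residual mean square
    t = 1.95996 + 1.0 / (0.412 * residual_df - 0.423)
    return t * math.sqrt(rms / n_per_mean)

# Each mean in the table is the average of three runs.
z = ci_half_width(27.129, 42, 3)
lower_round1_full = 8.55 - z
```

The half-width is the same for every row because a single pooled residual mean square is used for all of the means.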

Full training does seem to be superior to training-on-error; however, as the number of messages used in training increases, the error rates for the two training methods appear to be converging.  As might be expected, the effect on the error rate seems to diminish from round to round; full training beyond round 5 (10,585 spams and 16,035 nonspams) had relatively little effect on the error rate.

The next table covers rounds 4 to 9 and adds data for the mixed training method (three rounds of full training followed by a switch to training-on-error):

  round meanfull flcl95 fucl95 meanerror elcl95 eucl95 meanhalf hlcl95 hucl95
1     4     6.32   5.80   6.83      9.07   8.56   9.59     6.34   5.83   6.86
2     5     5.90   5.38   6.41      8.53   8.02   9.05     6.04   5.53   6.56
3     6     5.87   5.36   6.39      8.06   7.54   8.57     5.99   5.47   6.50
4     7     5.75   5.24   6.27      7.88   7.36   8.39     6.13   5.61   6.65
5     8     5.50   4.99   6.02      7.34   6.83   7.86     5.75   5.23   6.26
6     9     5.49   4.97   6.01      7.26   6.75   7.78     5.74   5.23   6.26

These results are plotted on the left-hand graph below.  It doesn't seem that switching to training-on-error after a period of full training leads to better results than are obtained by continuing to train fully.  (The error and full data points are slightly offset to the left and right respectively so that all of the confidence limits, indicated by the vertical bars, can be seen.)  It does appear, however, that switching to training-on-error is almost as effective as continuing to train fully; the difference between the full-training results (black) and the results obtained after switching (blue) is small, and lies within the limits of experimental error.

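[Figure: two graphs.  Left: "Full training vs training on error" (percent error against number of training cycles).  Right: "Error training vs mixed training" (percent registered against number of training cycles).]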

The classification error rate is also reflected in the number of messages used in training-on-error at each round.  The right-hand graph above, and the following table, show that the advantage in error rate gained by beginning with full training is preserved during further rounds of training-on-error, though the difference diminishes as more rounds of training are performed.  (To permit displaying the data with greater resolution, the graph of differences (the black line) has been offset by +4.5 on the y axis.)


  round errorreg elcl95 eucl95 halfreg hlcl95 hucl95
1     3     16.0   15.1   16.8    12.6  11.67   13.4
2     4     14.9   14.0   15.8    12.2  11.35   13.1
3     5     13.3   12.5   14.2    11.2  10.33   12.1
4     6     12.7   11.9   13.6    10.9   9.97   11.7
5     7     11.9   11.0   12.8    10.5   9.57   11.3
6     8     11.4   10.5   12.3    10.0   9.15   10.9
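
Each entry in this table is the mean, over the three runs, of the percentage of a round's 5,324 messages (2,117 spams plus 3,207 nonspams) that had to be registered.  As a worked example in Python, using the round 3 figures for the "error" database from the log in Appendix B (the function name is hypothetical):

```python
ROUND_SIZE = 3207 + 2117   # messages presented in each round

def percent_registered(spam_reg, nonspam_reg):
    """Percentage of a round's messages that were used for training."""
    return (spam_reg + nonspam_reg) * 100.0 / ROUND_SIZE

# (spams registered, nonspams registered) in round 3 of runs 0, 1 and 2
round3_error = [(490, 300), (544, 367), (520, 328)]
mean3 = sum(percent_registered(s, g) for s, g in round3_error) / len(round3_error)
```

The result, about 15.96, appears as 16.0 in the first row of the table.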

Conclusions:

  1. The Fisher-based method of calculation worked better with full training than with training-on-error when the training database was small, and the error rates tended to converge as the training database grew larger.  This would be expected, since the training vocabularies produced by the two methods should become more similar as the database grows.
  2. When a training database of 6,351 spams and 9,621 nonspams was built by full training and then maintained thereafter by training-on-error, the error rate remained similar to that obtained by continuing to train fully.  From the left-hand graph above it would appear that it might have been better to switch after four rounds rather than three; that is, with full training on about 8,500 spams and 12,800 nonspams.  One is tempted to propose a general rule-of-thumb target of 10,000 of each as the suitable point to switch over.  (The advantage in switching lies in the fact that all messages used in training need to be reviewed by a human to confirm their classification as spam or nonspam; training on error uses a much smaller number of messages, and correspondingly reduces the human effort needed to keep the training database current.)

Appendix A: Calculation methods:

Robinson's approach involves calculating a value of f(w) for each unique token in a message, based on the numbers of times that the token has been encountered in spam during training (badcount) and in nonspam (goodcount), scaled by the numbers of messages used to build the training database (bad- or goodlist_messagecount):
                     (badcount/badlist_messagecount)
p(w) = -----------------------------------------------------------------
       (badcount/badlist_messagecount + goodcount/goodlist_messagecount)

n = badcount + goodcount

f(w) = (s * x + n * p(w)) / (s + n)
An alternative used in some implementations, which gives almost the same result, is:
scalefactor = badlist_messagecount / goodlist_messagecount

f(w) = (s * x + badcount) / (s + badcount + goodcount * scalefactor)
The scale factor is the ratio of the number of messages used to make up the list of spam tokens to the number of messages used to make up the list of nonspam tokens.  The f(w) value to use when an unknown token is encountered is represented by x, and the degree to which x should have weight in the calculation of f(w) when a token has been seen only a few times before is determined by parameter s.  The number of unique tokens in the message is represented below by N.  Not obvious, but implicit in the second formula, is the replacement of n by n' = badcount + goodcount * scalefactor; this should have a negligible effect, unless the training set is extremely lopsided.
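
The two forms of f(w) can be compared numerically.  The Python sketch below is an illustration only, not bogofilter's code; the function names are hypothetical, and the values s = 1.0 and x = 0.5 are chosen for the example rather than taken from the experiment.

```python
def p_w(badcount, goodcount, bad_msgs, good_msgs):
    """Scaled estimate p(w) that a message containing the token is spam."""
    bad = badcount / bad_msgs
    good = goodcount / good_msgs
    return bad / (bad + good)

def f_w(badcount, goodcount, bad_msgs, good_msgs, s, x):
    """First form: n = badcount + goodcount."""
    n = badcount + goodcount
    if n == 0:
        return x                 # unknown token: fall back on the prior guess
    p = p_w(badcount, goodcount, bad_msgs, good_msgs)
    return (s * x + n * p) / (s + n)

def f_w_scaled(badcount, goodcount, bad_msgs, good_msgs, s, x):
    """Second form: n is implicitly replaced by
    n' = badcount + goodcount * scalefactor."""
    scalefactor = bad_msgs / good_msgs
    return (s * x + badcount) / (s + badcount + goodcount * scalefactor)

# A token seen 10 times in 2117 spams and twice in 3207 nonspams.
a = f_w(10, 2, 2117, 3207, s=1.0, x=0.5)
b = f_w_scaled(10, 2, 2117, 3207, s=1.0, x=0.5)
```

Here the two values differ only in the third decimal place, and they coincide when the spam and nonspam message counts are equal.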

Fisher's method uses an inverse chi-squared function, prbx, to get the probability associated with -2 times the sum of the logs of f(w) with 2N degrees of freedom:

P = prbx(-2 * sum(ln(1-f(w))), 2*N)
Q = prbx(-2 * sum(ln(f(w))), 2*N)
S = (1 + Q - P) / 2
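
A minimal Python sketch of this combination follows.  It is not bogofilter's implementation: prbx is written here with the closed-form series for the upper-tail chi-squared probability, which holds when the number of degrees of freedom is even (as 2N always is), and the function name spamicity is hypothetical.

```python
import math

def prbx(chi2, df):
    """Upper-tail probability of a chi-squared value, for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    m = chi2 / 2.0
    term = math.exp(-m)          # may underflow to 0 for very large chi2
    total = term
    for i in range(1, df // 2):
        term *= m / i            # term is now exp(-m) * m**i / i!
        total += term
    return min(total, 1.0)

def spamicity(fw_values):
    """Combine the f(w) values of a message's unique tokens into S.

    Every f(w) must lie strictly between 0 and 1.
    """
    n = len(fw_values)
    p = prbx(-2.0 * sum(math.log(1.0 - f) for f in fw_values), 2 * n)
    q = prbx(-2.0 * sum(math.log(f) for f in fw_values), 2 * n)
    return (1.0 + q - p) / 2.0

neutral = spamicity([0.5, 0.5])
spammy = spamicity([0.99] * 5)
hammy = spamicity([0.01] * 5)
```

Tokens with f(w) near 1 drive S toward 1, tokens with f(w) near 0 drive it toward 0, and a message whose tokens are all neutral scores 0.5.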

Appendix B: Notes on experimental procedure:

The following notes should suffice if anyone wishes to repeat the experiment:

Corpus consists of 21170 spams and 32070 nonspams:

mutt
 32070 kept, 0 deleted.
mutt
 21170 kept, 0 deleted.
grep -c '^From ' agg*
 aggregate.bad:21170
 aggregate.good:32070

The spams and nonspams were "dealt" out into ten files each:

cat ~/bin/tenths
 #! /bin/sh
 n=$((FILENO % 10))
 fname=cgx-$n
 cat >>$fname

FILENO=0 formail -s ~/bin/tenths < aggregate.good
rename gx ns cgx*
FILENO=0 formail -s ~/bin/tenths < aggregate.bad
rename gx sp cgx*

The files were moved into a separate directory and subdirectories were
created for the bogofilter training databases:

mkdir train10-2
mv csp* cns* train10-2
cd train10-2
mkdir full half error

Random sequences were created by shuffling 0-9 with this little perl
script:

cat /usr/local/bin/shuffle
 #! /usr/bin/perl
 #  shuffle -- echo stdin lines in a random order
 srand ( time() ^ ($$ + ($$ << 15)) );
 foreach $key (<>) {
     $shuf{$key} = rand;
 }
 foreach $key (sort { $shuf{$b} <=> $shuf{$a} } keys %shuf ) {
     print $key;
 }

We do three runs of ten rounds.  In round 0 we do training only, and in
round 9 testing only; otherwise, we first test the spam and nonspam
files for the current round against the training dbs from the preceding
round, and then train with the files just tested.  From round 3 onward
we train the "half" db on error, and from round 4 onward we test against
that db in addition to the other two:

cat runex
 #! /bin/bash
 seq 0 9 >sequence
 fmbf="formail -s /usr/bin/bogofilter"

 for run in 0 1 2; do
   echo "run $run"
   file=( `shuffle sequence` )
   /bin/rm full/* error/* half/*
   for round in 0 1 2 3 4 5 6 7 8 9; do
     fnam=${file[$round]}
     echo "round $round, files $fnam"
     if [ $round -gt 0 ]; then
       for method in full error; do
         $fmbf -d $method -v < csp-$fnam &> sp-$method-$run-$round
         $fmbf -d $method -v < cns-$fnam &> ns-$method-$run-$round
       done
     fi
     if [ $round -gt 3 ]; then
       $fmbf -d half -v < csp-$fnam &> sp-half-$run-$round
       $fmbf -d half -v < cns-$fnam &> ns-half-$run-$round
     fi
     if [ $round -lt 9 ]; then
       /usr/bin/bogofilter -d full -v -n < cns-$fnam
       /usr/bin/bogofilter -d full -v -s < csp-$fnam
       randomtrain error -n cns-$fnam -s csp-$fnam
       if [ $round -eq 2 ]; then
         cp full/* half
       fi
       if [ $round -gt 2 ]; then
         randomtrain half -n cns-$fnam -s csp-$fnam
       fi
     fi
   done
 done

Output went to nohup.out, which was edited to change \r to $; then:
sed 's/.*\$//' nohup.out >runex.log

cat runex.log

run 0
round 0, files 0
# 1368656 words, 3207 messages
# 1289609 words, 2117 messages
error
 spam  reg   good  reg
 2117  889   3207  882
round 1, files 2
# 1310658 words, 3207 messages
# 1305731 words, 2117 messages
error
 spam  reg   good  reg
 2117  674   3207  495
round 2, files 4
# 1548414 words, 3207 messages
# 1299363 words, 2117 messages
error
 spam  reg   good  reg
 2117  571   3207  400
round 3, files 1
# 1435450 words, 3207 messages
# 1444462 words, 2117 messages
error
 spam  reg   good  reg
 2117  490   3207  300
half
 spam  reg   good  reg
 2117  249   3207  377
round 4, files 8
# 1521052 words, 3207 messages
# 1326994 words, 2117 messages
error
 spam  reg   good  reg
 2117  430   3207  380
half
 spam  reg   good  reg
 2117  257   3207  395
round 5, files 9
# 1604435 words, 3207 messages
# 1475321 words, 2117 messages
error
 spam  reg   good  reg
 2117  390   3207  288
half
 spam  reg   good  reg
 2117  244   3207  337
round 6, files 6
# 1824156 words, 3207 messages
# 1271805 words, 2117 messages
error
 spam  reg   good  reg
 2117  422   3207  297
half
 spam  reg   good  reg
 2117  288   3207  322
round 7, files 7
# 1580733 words, 3207 messages
# 1218496 words, 2117 messages
error
 spam  reg   good  reg
 2117  387   3207  242
half
 spam  reg   good  reg
 2117  261   3207  274
round 8, files 5
# 1717412 words, 3207 messages
# 1293816 words, 2117 messages
error
 spam  reg   good  reg
 2117  364   3207  229
half
 spam  reg   good  reg
 2117  274   3207  249
round 9, files 3
run 1
round 0, files 1
# 1435450 words, 3207 messages
# 1444462 words, 2117 messages
error
 spam  reg   good  reg
 2117  900   3207  888
round 1, files 8
# 1521052 words, 3207 messages
# 1326994 words, 2117 messages
error
 spam  reg   good  reg
 2117  581   3207  511
round 2, files 9
# 1604435 words, 3207 messages
# 1475321 words, 2117 messages
error
 spam  reg   good  reg
 2117  504   3207  379
round 3, files 4
# 1548414 words, 3207 messages
# 1299363 words, 2117 messages
error
 spam  reg   good  reg
 2117  544   3207  367
half
 spam  reg   good  reg
 2117  281   3207  430
round 4, files 6
# 1824156 words, 3207 messages
# 1271805 words, 2117 messages
error
 spam  reg   good  reg
 2117  485   3207  331
half
 spam  reg   good  reg
 2117  280   3207  389
round 5, files 3
# 1340454 words, 3207 messages
# 1269049 words, 2117 messages
error
 spam  reg   good  reg
 2117  466   3207  310
half
 spam  reg   good  reg
 2117  304   3207  338
round 6, files 2
# 1310658 words, 3207 messages
# 1305731 words, 2117 messages
error
 spam  reg   good  reg
 2117  417   3207  272
half
 spam  reg   good  reg
 2117  294   3207  301
round 7, files 5
# 1717412 words, 3207 messages
# 1293816 words, 2117 messages
error
 spam  reg   good  reg
 2117  396   3207  249
half
 spam  reg   good  reg
 2117  287   3207  290
round 8, files 0
# 1368656 words, 3207 messages
# 1289609 words, 2117 messages
error
 spam  reg   good  reg
 2117  363   3207  243
half
 spam  reg   good  reg
 2117  243   3207  265
round 9, files 7
run 2
round 0, files 9
# 1604435 words, 3207 messages
# 1475321 words, 2117 messages
error
 spam  reg   good  reg
 2117  927   3207  920
round 1, files 7
# 1580733 words, 3207 messages
# 1218496 words, 2117 messages
error
 spam  reg   good  reg
 2117  603   3207  509
round 2, files 2
# 1310658 words, 3207 messages
# 1305731 words, 2117 messages
error
 spam  reg   good  reg
 2117  586   3207  402
round 3, files 4
# 1548414 words, 3207 messages
# 1299363 words, 2117 messages
error
 spam  reg   good  reg
 2117  520   3207  328
half
 spam  reg   good  reg
 2117  272   3207  396
round 4, files 3
# 1340454 words, 3207 messages
# 1269049 words, 2117 messages
error
 spam  reg   good  reg
 2117  446   3207  308
half
 spam  reg   good  reg
 2117  286   3207  346
round 5, files 1
# 1435450 words, 3207 messages
# 1444462 words, 2117 messages
error
 spam  reg   good  reg
 2117  426   3207  252
half
 spam  reg   good  reg
 2117  278   3207  290
round 6, files 8
# 1521052 words, 3207 messages
# 1326994 words, 2117 messages
error
 spam  reg   good  reg
 2117  367   3207  259
half
 spam  reg   good  reg
 2117  242   3207  287
round 7, files 5
# 1717412 words, 3207 messages
# 1293816 words, 2117 messages
error
 spam  reg   good  reg
 2117  382   3207  248
half
 spam  reg   good  reg
 2117  281   3207  277
round 8, files 6
# 1824156 words, 3207 messages
# 1271805 words, 2117 messages
error
 spam  reg   good  reg
 2117  367   3207  251
half
 spam  reg   good  reg
 2117  289   3207  282
round 9, files 0

grep '^ 2117' runex.log >errortrain

Collect the false-positive (fp) and false-negative (fn) counts:

for method in full error half; do
  for round in 1 2 3 4 5 6 7 8 9; do
    for run in 0 1 2; do
      test -f ns-$method-$run-$round \
	 && grep -c '^1' ns-$method-$run-$round >>fp-$method
      test -f sp-$method-$run-$round \
	 && grep -c -v '^1' sp-$method-$run-$round >>fn-$method
    done
  done
done

Remaining data reduction performed in R:

errortrain <- read.table("F/errortrain")
errortrain$round <- c(rep(c(0,1,2,3,3,4,4,5,5,6,6,7,7,8,8),3))
errortrain$method <- c(rep(c("error","error","error",
  rep(c("error","half"),6)),3))
errortrain
     V1  V2   V3  V4 round method
1  2117 889 3207 882     0  error
2  2117 674 3207 495     1  error
3  2117 571 3207 400     2  error
4  2117 490 3207 300     3  error
5  2117 249 3207 377     3   half
6  2117 430 3207 380     4  error
7  2117 257 3207 395     4   half
8  2117 390 3207 288     5  error
9  2117 244 3207 337     5   half
10 2117 422 3207 297     6  error
11 2117 288 3207 322     6   half
12 2117 387 3207 242     7  error
13 2117 261 3207 274     7   half
14 2117 364 3207 229     8  error
15 2117 274 3207 249     8   half
16 2117 900 3207 888     0  error
17 2117 581 3207 511     1  error
18 2117 504 3207 379     2  error
19 2117 544 3207 367     3  error
20 2117 281 3207 430     3   half
21 2117 485 3207 331     4  error
22 2117 280 3207 389     4   half
23 2117 466 3207 310     5  error
24 2117 304 3207 338     5   half
25 2117 417 3207 272     6  error
26 2117 294 3207 301     6   half
27 2117 396 3207 249     7  error
28 2117 287 3207 290     7   half
29 2117 363 3207 243     8  error
30 2117 243 3207 265     8   half
31 2117 927 3207 920     0  error
32 2117 603 3207 509     1  error
33 2117 586 3207 402     2  error
34 2117 520 3207 328     3  error
35 2117 272 3207 396     3   half
36 2117 446 3207 308     4  error
37 2117 286 3207 346     4   half
38 2117 426 3207 252     5  error
39 2117 278 3207 290     5   half
40 2117 367 3207 259     6  error
41 2117 242 3207 287     6   half
42 2117 382 3207 248     7  error
43 2117 281 3207 277     7   half
44 2117 367 3207 251     8  error
45 2117 289 3207 282     8   half

roundmethod <- function(x) {
  x[5] == r && x[6] == m
}

rerr <- function(x,rnd) {
  y <- 0
  for (i in rnd) {
    r <<- i
    y <- c(y,x[apply(errortrain,1,roundmethod)])
  }
  y[2:length(y)]
}

m <- "error"
errorspamreg <- rerr(errortrain$V2, 0:8)
errornsreg <- rerr(errortrain$V4, 0:8)
errorreg <- data.frame(
  round=c(0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8),
  run=c(rep(c(0,1,2),9)), spam=errorspamreg,nonspam=errornsreg)  
errorreg
   round run spam nonspam
1      0   0  889     882
2      0   1  900     888
3      0   2  927     920
4      1   0  674     495
5      1   1  581     511
6      1   2  603     509
7      2   0  571     400
8      2   1  504     379
9      2   2  586     402
10     3   0  490     300
11     3   1  544     367
12     3   2  520     328
13     4   0  430     380
14     4   1  485     331
15     4   2  446     308
16     5   0  390     288
17     5   1  466     310
18     5   2  426     252
19     6   0  422     297
20     6   1  417     272
21     6   2  367     259
22     7   0  387     242
23     7   1  396     249
24     7   2  382     248
25     8   0  364     229
26     8   1  363     243
27     8   2  367     251

m <- "half"
halfspamreg <- rerr(errortrain$V2, 3:8)
halfnsreg <- rerr(errortrain$V4, 3:8)

halfreg <- data.frame(round=c(3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8),
  run=c(rep(c(0,1,2),6)), spam=halfspamreg,nonspam=halfnsreg)

ro <- rep(halfreg$round, 2)
me <- c(rep("error", 18), rep("half", 18))
sp <- c(errorreg$spam[10:27], halfreg$spam)
ns <- c(errorreg$nonspam[10:27], halfreg$nonspam)
data.frame(method=me, round=ro, run=rep(halfreg$run, 2),
  spam=sp, nonspam=ns, reg=sp+ns, percent=(sp+ns)*100/(3207+2117)) ->
trainreg
print(trainreg,digits=3)
   method round run spam nonspam reg percent
1   error     3   0  490     300 790   14.84
2   error     3   1  544     367 911   17.11
3   error     3   2  520     328 848   15.93
4   error     4   0  430     380 810   15.21
5   error     4   1  485     331 816   15.33
6   error     4   2  446     308 754   14.16
7   error     5   0  390     288 678   12.73
8   error     5   1  466     310 776   14.58
9   error     5   2  426     252 678   12.73
10  error     6   0  422     297 719   13.50
11  error     6   1  417     272 689   12.94
12  error     6   2  367     259 626   11.76
13  error     7   0  387     242 629   11.81
14  error     7   1  396     249 645   12.11
15  error     7   2  382     248 630   11.83
16  error     8   0  364     229 593   11.14
17  error     8   1  363     243 606   11.38
18  error     8   2  367     251 618   11.61
19   half     3   0  249     377 626   11.76
20   half     3   1  281     430 711   13.35
21   half     3   2  272     396 668   12.55
22   half     4   0  257     395 652   12.25
23   half     4   1  280     389 669   12.57
24   half     4   2  286     346 632   11.87
25   half     5   0  244     337 581   10.91
26   half     5   1  304     338 642   12.06
27   half     5   2  278     290 568   10.67
28   half     6   0  288     322 610   11.46
29   half     6   1  294     301 595   11.18
30   half     6   2  242     287 529    9.94
31   half     7   0  261     274 535   10.05
32   half     7   1  287     290 577   10.84
33   half     7   2  281     277 558   10.48
34   half     8   0  274     249 523    9.82
35   half     8   1  243     265 508    9.54
36   half     8   2  289     282 571   10.73

regaov <- aov(percent ~ method + round, data=trainreg)
summary(regaov)
            Df Sum Sq Mean Sq F value    Pr(>F)    
method       1 41.627  41.627  73.938 6.151e-10 ***
round        1 55.207  55.207  98.059 2.068e-11 ***
Residuals   33 18.579   0.563                      

d <- c(1.95996, 0.412, 0.423)
rdf <- 33
rms <- deviance(regaov)/rdf
z <- (d[1] + 1 / (rdf * d[2] - d[3])) * sqrt(rms/3)
meanreg <- apply(array(trainreg$percent,dim=c(3,12)), 2, mean)
lcl95 <- meanreg - z          
ucl95 <- meanreg + z          

data.frame(round=c(3,4,5,6,7,8), errorreg=meanreg[1:6],
elcl95=lcl95[1:6],
  eucl95=ucl95[1:6], halfreg=meanreg[7:12], hlcl95=lcl95[7:12],
  hucl95=ucl95[7:12]) -> regresults

print(regresults, digits=3)
  round errorreg elcl95 eucl95 halfreg hlcl95 hucl95
1     3     16.0   15.1   16.8    12.6  11.67   13.4
2     4     14.9   14.0   15.8    12.2  11.35   13.1
3     5     13.3   12.5   14.2    11.2  10.33   12.1
4     6     12.7   11.9   13.6    10.9   9.97   11.7
5     7     11.9   11.0   12.8    10.5   9.57   11.3
6     8     11.4   10.5   12.3    10.0   9.15   10.9

X11(width=3.5, height=3.5)
plot(regresults$round - 0.02, regresults$errorreg,
  main="Error training vs mixed training", ylim=c(6,17),
  xlab="Number of training cycles", ylab="Percent registered",
col="red")
lines(regresults$round - 0.02, regresults$eucl95, type="h", col="red")   
lines(regresults$round - 0.02, regresults$elcl95, type="h",
col="white") 
lines(regresults$round - 0.02, regresults$errorreg, col="red")
points(regresults$round + 0.02, regresults$halfreg, col="blue")
lines(regresults$round + 0.02, regresults$hucl95, type="h", col="blue")
lines(regresults$round + 0.02, regresults$hlcl95, type="h",
col="white")
lines(regresults$round + 0.02, regresults$halfreg, col="blue")
points(regresults$round, regresults$errorreg - regresults$halfreg + 4.5)
lines(regresults$round, regresults$errorreg - regresults$halfreg + 4.5) 
text(6, 16.7, labels="error", col="red", pos=4)
text(6, 16.1, labels="half", col="blue", pos=4)
text(6, 15.5, labels="difference+4.5", pos=4)

read.table("F/fp-full") -> fpfull
read.table("F/fn-full") -> fnfull
read.table("F/fp-error") -> fperror
read.table("F/fn-error") -> fnerror
read.table("F/fp-half") -> fphalf 
read.table("F/fn-half") -> fnhalf

tenround <- data.frame(method=c(rep("full", 27), rep("error", 27)),
round=c(rep(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9),2)),
  run=c(rep(0:2, 18)), fpos=c(fpfull$V1, fperror$V1),
  fneg=c(fnfull$V1, fnerror$V1))
tenround$err <- tenround$fpos + tenround$fneg
tenround$percent <- tenround$err * 100 / (3207 + 2117)

print(tenround, digits=3)
   method round run fpos fneg err percent
1    full     1   0   89  399 488    9.17
2    full     1   1   98  332 430    8.08
3    full     1   2   85  362 447    8.40
4    full     2   0   96  312 408    7.66
5    full     2   1  100  251 351    6.59
6    full     2   2  104  297 401    7.53
7    full     3   0   72  244 316    5.94
8    full     3   1  101  270 371    6.97
9    full     3   2  102  273 375    7.04
10   full     4   0   83  235 318    5.97
11   full     4   1  112  246 358    6.72
12   full     4   2   87  246 333    6.25
13   full     5   0   82  211 293    5.50
14   full     5   1  101  239 340    6.39
15   full     5   2   78  231 309    5.80
16   full     6   0   99  235 334    6.27
17   full     6   1   87  249 336    6.31
18   full     6   2   66  202 268    5.03
19   full     7   0   90  210 300    5.63
20   full     7   1   95  215 310    5.82
21   full     7   2   92  217 309    5.80
22   full     8   0   83  203 286    5.37
23   full     8   1   80  193 273    5.13
24   full     8   2   96  224 320    6.01
25   full     9   0   92  213 305    5.73
26   full     9   1   87  207 294    5.52
27   full     9   2   82  196 278    5.22
28  error     1   0   31  709 740   13.90
29  error     1   1   24  633 657   12.34
30  error     1   2   26  635 661   12.42
31  error     2   0   27  621 648   12.17
32  error     2   1   25  514 539   10.12
33  error     2   2   27  638 665   12.49
34  error     3   0   21  512 533   10.01
35  error     3   1   25  587 612   11.50
36  error     3   2   24  538 562   10.56
37  error     4   0   22  443 465    8.73
38  error     4   1   26  470 496    9.32
39  error     4   2   26  462 488    9.17
40  error     5   0   25  379 404    7.59
41  error     5   1   28  479 507    9.52
42  error     5   2   20  432 452    8.49
43  error     6   0   20  419 439    8.25
44  error     6   1   25  427 452    8.49
45  error     6   2   18  378 396    7.44
46  error     7   0   25  394 419    7.87
47  error     7   1   23  415 438    8.23
48  error     7   2   23  378 401    7.53
49  error     8   0   22  388 410    7.70
50  error     8   1   20  366 386    7.25
51  error     8   2   16  361 377    7.08
52  error     9   0   24  372 396    7.44
53  error     9   1   21  382 403    7.57
54  error     9   2   25  336 361    6.78

sixround <- data.frame(
  method=c(rep("full",18), rep("error", 18), rep("half", 18)),
  round=c(rep(c(4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9),3)),
  run=c(rep(0:2, 18)),
  fpos=c(fpfull$V1[10:27], fperror$V1[10:27], fphalf$V1),
  fneg=c(fnfull$V1[10:27], fnerror$V1[10:27], fnhalf$V1))
sixround$err <- sixround$fpos + sixround$fneg
sixround$percent <- sixround$err * 100 / (3207 + 2117)

print(sixround, digits=3)
   method round run fpos fneg err percent
1    full     4   0   83  235 318    5.97
2    full     4   1  112  246 358    6.72
3    full     4   2   87  246 333    6.25
4    full     5   0   82  211 293    5.50
5    full     5   1  101  239 340    6.39
6    full     5   2   78  231 309    5.80
7    full     6   0   99  235 334    6.27
8    full     6   1   87  249 336    6.31
9    full     6   2   66  202 268    5.03
10   full     7   0   90  210 300    5.63
11   full     7   1   95  215 310    5.82
12   full     7   2   92  217 309    5.80
13   full     8   0   83  203 286    5.37
14   full     8   1   80  193 273    5.13
15   full     8   2   96  224 320    6.01
16   full     9   0   92  213 305    5.73
17   full     9   1   87  207 294    5.52
18   full     9   2   82  196 278    5.22
19  error     4   0   22  443 465    8.73
20  error     4   1   26  470 496    9.32
21  error     4   2   26  462 488    9.17
22  error     5   0   25  379 404    7.59
23  error     5   1   28  479 507    9.52
24  error     5   2   20  432 452    8.49
25  error     6   0   20  419 439    8.25
26  error     6   1   25  427 452    8.49
27  error     6   2   18  378 396    7.44
28  error     7   0   25  394 419    7.87
29  error     7   1   23  415 438    8.23
30  error     7   2   23  378 401    7.53
31  error     8   0   22  388 410    7.70
32  error     8   1   20  366 386    7.25
33  error     8   2   16  361 377    7.08
34  error     9   0   24  372 396    7.44
35  error     9   1   21  382 403    7.57
36  error     9   2   25  336 361    6.78
37   half     4   0   64  252 316    5.94
38   half     4   1   81  273 354    6.65
39   half     4   2   64  279 343    6.44
40   half     5   0   56  237 293    5.50
41   half     5   1   64  294 358    6.72
42   half     5   2   44  270 314    5.90
43   half     6   0   54  272 326    6.12
44   half     6   1   52  298 350    6.57
45   half     6   2   39  241 280    5.26
46   half     7   0   50  269 319    5.99
47   half     7   1   44  288 332    6.24
48   half     7   2   51  277 328    6.16
49   half     8   0   45  272 317    5.95
50   half     8   1   38  241 279    5.24
51   half     8   2   36  286 322    6.05
52   half     9   0   37  279 316    5.94
53   half     9   1   44  277 321    6.03
54   half     9   2   40  240 280    5.26

tenround$method <- factor(tenround$method)
tenround$round <- factor(tenround$round)
tenround$run <- factor(tenround$run)
tenaov <- aov(percent ~ method + round + run, data=tenround)
summary(tenaov)
            Df  Sum Sq Mean Sq  F value    Pr(>F)    
method       1 112.845 112.845 174.6996 < 2.2e-16 ***
round        8 107.077  13.385  20.7213 2.793e-12 ***
run          2   0.228   0.114   0.1765    0.8388    
Residuals   42  27.129   0.646                       

d <- c(1.95996, 0.412, 0.423)
rdf <- 42
rms <- deviance(tenaov)/rdf
z <- (d[1] + 1 / (rdf * d[2] - d[3])) * sqrt(rms/3)
meanerr <- apply(array(tenround$percent,dim=c(3,18)), 2, mean)
lcl95 <- meanerr - z          
ucl95 <- meanerr + z       

tenres <- data.frame(round=c(1:9),
  meanfullpc=meanerr[1:9], flcl95=lcl95[1:9], fucl95=ucl95[1:9],
  meanerrorpc=meanerr[10:18], elcl95=lcl95[10:18], eucl95=ucl95[10:18])

print(tenres,digits=3)
  round meanfullpc flcl95 fucl95 meanerrorpc elcl95 eucl95
1     1       8.55   7.61   9.48       12.89  11.95  13.82
2     2       7.26   6.33   8.20       11.60  10.66  12.53
3     3       6.65   5.71   7.59       10.69   9.75  11.62
4     4       6.32   5.38   7.25        9.07   8.14  10.01
5     5       5.90   4.96   6.83        8.53   7.60   9.47
6     6       5.87   4.94   6.81        8.06   7.12   8.99
7     7       5.75   4.82   6.69        7.88   6.94   8.81
8     8       5.50   4.57   6.44        7.34   6.41   8.28
9     9       5.49   4.55   6.43        7.26   6.33   8.20

sixround$method <- factor(sixround$method)
sixround$round <- factor(sixround$round)
sixround$run <- factor(sixround$run)
sixaov <- aov(percent ~ method + round + run, data=sixround)
summary(sixaov)
            Df Sum Sq Mean Sq  F value    Pr(>F)    
method       2 54.390  27.195 138.4376 < 2.2e-16 ***
round        5  7.351   1.470   7.4838 3.735e-05 ***
run          2  1.974   0.987   5.0245   0.01083 *  
Residuals   44  8.643   0.196                       

d <- c(1.95996, 0.412, 0.423)
rdf <- 44
rms <- deviance(sixaov)/rdf
z <- (d[1] + 1 / (rdf * d[2] - d[3])) * sqrt(rms/3)
meanerr <- apply(array(sixround$percent,dim=c(3,18)), 2, mean)
lcl95 <- meanerr - z          
ucl95 <- meanerr + z       

sixres <- data.frame(round=c(4:9),
  meanfull=meanerr[1:6], flcl95=lcl95[1:6], fucl95=ucl95[1:6],
  meanerror=meanerr[7:12], elcl95=lcl95[7:12], eucl95=ucl95[7:12],
  meanhalf=meanerr[13:18], hlcl95=lcl95[13:18], hucl95=ucl95[13:18])

print(sixres,digits=3)
  round meanfull flcl95 fucl95 meanerror elcl95 eucl95 meanhalf hlcl95 hucl95
1     4     6.32   5.80   6.83      9.07   8.56   9.59     6.34   5.83   6.86
2     5     5.90   5.38   6.41      8.53   8.02   9.05     6.04   5.53   6.56
3     6     5.87   5.36   6.39      8.06   7.54   8.57     5.99   5.47   6.50
4     7     5.75   5.24   6.27      7.88   7.36   8.39     6.13   5.61   6.65
5     8     5.50   4.99   6.02      7.34   6.83   7.86     5.75   5.23   6.26
6     9     5.49   4.97   6.01      7.26   6.75   7.78     5.74   5.23   6.26

X11(width=3.5, height=3.5)
plot(tenres$round - 0.06, tenres$meanerrorpc,
   main="Full training vs training on error", ylim=c(4,14),
   xlab="Number of training cycles", ylab="Percent error", col="red")
lines(tenres$round - 0.06, tenres$eucl95, type="h", col="red")
lines(tenres$round - 0.06, tenres$elcl95, type="h", col="white")
lines(tenres$round - 0.06, tenres$meanerrorpc, col="red")
points(tenres$round + 0.06, tenres$meanfullpc) 
lines(tenres$round + 0.06, tenres$fucl95, type="h")
lines(tenres$round + 0.06, tenres$flcl95, type="h", col="white")
lines(tenres$round + 0.06, tenres$meanfullpc)
points(sixres$round, sixres$meanhalf, col="blue")
lines(sixres$round, sixres$hucl95, type="h", col="blue")
lines(sixres$round, sixres$hlcl95, type="h", col="white")
lines(sixres$round, sixres$meanhalf, col="blue")
lines(c(3,4), c(tenres$meanfullpc[3], sixres$meanhalf[1]), col="blue")
text(6, 13.75, labels="error", col="red", pos=4)
text(6, 13.2, labels="full", pos=4)
text(6, 12.65, labels="half", col="blue", pos=4)
axis(1,at=c(1,2,3,4,5,6,7,8,9))

Greg Louis, 2002, 2003; last modified 2003-04-12