The bogofilter manual says:

Integration with Mail Transport Agent (MTA)
1. bogofilter can also be integrated into an MTA to filter all incoming mail. While the specific implementation is MTA dependent, the general steps are as follows
2. Install bogofilter on the mail server
3. Prime the bogofilter databases with a spam and non-spam corpus. Since bogofilter will be serving a larger community, it is important to prime it with a representative set of messages.
4. Set up the MTA to invoke bogofilter on each message. While this is an MTA specific step, you'll probably need to use the -p, -u, and -e options.
5. Set up a mechanism for users to register spam/nonspam messages, as well as to correct mis-classifications. The most generic solution is to set up alias email addresses to which users bounce messages.
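To make the "invoke bogofilter on each message" step concrete, here is a minimal sketch of what a delivery hook might do, using the -p, -u and -e options the manual mentions. The function names and the idea of calling bogofilter from a Python hook are my own assumptions, not anything the manual prescribes. A real MTA would normally call bogofilter through its own filter mechanism. The classification word in the X-Bogosity header also varies between bogofilter versions, so the sketch returns it unchanged.

```python
import subprocess
from email import message_from_bytes


def annotate(raw_message: bytes, bogofilter: str = "bogofilter") -> bytes:
    """Pipe one message through bogofilter and return the annotated copy.

    -p passes the message through with an X-Bogosity header added,
    -u updates the wordlists with the resulting classification,
    -e makes bogofilter exit 0 whenever classification succeeded,
       so check=True raises only on a genuine error.
    """
    result = subprocess.run(
        [bogofilter, "-p", "-u", "-e"],
        input=raw_message,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def bogosity(annotated: bytes) -> str:
    """Return the first word of the X-Bogosity header, or '' if it is absent."""
    header = message_from_bytes(annotated).get("X-Bogosity", "")
    return header.split(",")[0].strip()
```

A hook would call annotate() on the incoming message and then route the result according to bogosity(). What to do with each classification is a site policy decision and is left out here.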
Any sysadmin will tell you that expecting ordinary users to register spam/nonspam messages is... naïve. The most we can hope for is that, if we set up a mail alias for spam, some portion of the false negatives will be fed to it. Also fed to it, however, will be legitimate mail from people the user is mad at, and all sorts of such guff. Manual inspection will be essential. Worse: there's a strong possibility that bogofilter will be misled by the extra headers that bounced email will carry.
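One way to reduce the extra-header problem is to strip the headers the bounce added before registering the message. The sketch below assumes the user's mail client marks the bounce with the standard Resent-* fields; it is an illustration, not a tested part of my setup. It does nothing about the additional Received lines picked up on the way to the alias, which would need site-specific handling. The function name is hypothetical.

```python
from email import message_from_bytes


def strip_resent_headers(raw: bytes) -> bytes:
    """Return the message with every Resent-* header removed.

    Assumption: the bounce was made by a client that adds Resent-From,
    Resent-To, Resent-Date and similar fields. Other headers and the
    body are left alone.
    """
    msg = message_from_bytes(raw)
    for name in list(msg.keys()):
        if name.lower().startswith("resent-"):
            # Deleting by name removes every occurrence of that header.
            del msg[name]
    return msg.as_bytes()
```

The cleaned bytes could then be fed to bogofilter for registration in place of the bounced copy.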
Where does this leave us?
Now imagine a site just ten times the size of mine. Hand-checking 3500 emails every day is a bit of a pain, right? What about a site fifty times the size of mine?
We haven't yet addressed the point (it's discussed in Graham's original paper, though) that Bayesian spam filtering is likely to be most efficient when it's done on an individual basis. Nonspam emails for a large user population are, taken as a whole, less dissimilar from spam than any individual's nonspam email is likely to be. I don't know whether this effect would be significant at the 100-user, the 1000-user or the 10000-user level. It would have to be tested.
My conclusion is that bogofilter, and Bayesian filtering in general, aren't necessarily less labour-intensive, on a large scale, than rule-based spam filtering.
Note added two years later (2004-09-10)
With less rigorous training than described above, sites very much larger than mine -- 60,000 users in one case and 150,000 users in another -- are successfully using bogofilter with one wordlist for all users in their fight against spam. See Blosser, J. and Josephsen, D., Scalable Centralized Bayesian Spam Mitigation with Bogofilter, 2004 LISA XVIII - November 14-19, 2004 - Atlanta, GA, USA, in press, for an account.
[© Greg Louis, 2002; last modified 2004-09-10]