Bogofilter FAQ

Revision: $Id: bogofilter-faq.html,v 1.3 2003/03/06 20:45:23 relson Exp $
Official Version: http://bogofilter.sourceforge.net/bogofilter-faq.html
Maintainer: Adrian Otto < aotto at aotto.com >

This document is intended to answer frequently asked questions about bogofilter.


What is bogofilter?

Bogofilter is a fast Bayesian spam filter along the lines suggested by Paul Graham in his article A Plan For Spam. Gary Robinson's geometric-mean algorithm is also included, as is his modification of that algorithm to use Fisher's method.

Check the bogofilter home page for more information.


Bogofilter History

Bogofilter was written by Eric S. Raymond on August 19, 2002. It gained popularity in September 2002, and a number of other authors have started to contribute to the project.


Mailing Lists

There are currently four mailing lists for bogofilter:

List Address                          Description
bogofilter-announce@aotto.com         An announcement-only list where new versions are announced.
bogofilter@aotto.com                  A discussion list where any conversation about bogofilter may take place.
bogofilter-dev@aotto.com              A list for sharing patches, development, and technical discussions.
bogofilter-cvs@lists.sourceforge.net  A list for announcing code changes to the CVS archive.


Bogofilter Home Page

The bogofilter home page at SourceForge is the central clearinghouse for bogofilter resources.


How do I find the spam and ham counts for a token?

Use bogoutil's '-w' option to display information about a token (word). For example, "bogoutil -w $BOGOFILTER_DIR example.com" gives the good and bad counts for "example.com".


How do I tell how many messages are in my wordlists?

Use bogoutil's '-w' option to display the value of the special token .MSG_COUNT. For example, "bogoutil -w $BOGOFILTER_DIR .MSG_COUNT" shows the message counts for the spam and ham wordlists.


How do I tell how many tokens are in my wordlists?

Pipe the output of bogoutil's dump command to "wc". For example, "bogoutil -d $BOGOFILTER_DIR/spamlist.db | wc -l" displays the count for the spamlist, and "bogoutil -d $BOGOFILTER_DIR/goodlist.db | wc -l" displays the count for the goodlist.
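
When the count is needed from inside a program rather than at the shell, the same pipeline can be sketched in Python. This is a minimal sketch, not part of bogofilter: the function name count_tokens and its bogoutil parameter are hypothetical, and it assumes that bogoutil is on the PATH and that "bogoutil -d" prints one token per line, as the "wc -l" recipe implies.

```python
import subprocess


def count_tokens(wordlist_path, bogoutil="bogoutil"):
    """Count the lines printed by `bogoutil -d wordlist_path`.

    Hypothetical helper that mirrors `bogoutil -d FILE | wc -l`.
    """
    # Capture raw bytes: tokens learned from mail are not guaranteed
    # to be valid text in any single encoding.
    result = subprocess.run(
        [bogoutil, "-d", wordlist_path],
        check=True,              # raise CalledProcessError if the command fails
        stdout=subprocess.PIPE,
    )
    # wc -l counts newline characters, so do the same here.
    return result.stdout.count(b"\n")
```

For example, count_tokens(os.path.join(os.environ["BOGOFILTER_DIR"], "spamlist.db")) would count the spamlist (with "import os" added). Any special tokens that appear in the dump are counted as well, just as they are by "wc -l".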


How do I get bogofilter working on Solaris, BSD, etc.?

If you don't already have a v3.0+ version of BerkeleyDB, download it, unpack it, and run these commands in the 'dist' directory:

$ ./configure
$ make
$ make install

Next, download a portable version of bogofilter.

On Solaris

Unpack it, and then do:

$ ./configure --with-db=/usr/local/BerkeleyDB-4.0
$ make && make install

You will either want to put a symlink to libdb.so in /usr/lib, or use a modified LD_LIBRARY_PATH environment variable before you start bogofilter.

$ LD_LIBRARY_PATH=/usr/lib:/usr/local/lib:/usr/local/BerkeleyDB-4.0/lib
$ export LD_LIBRARY_PATH

On FreeBSD

Unpack it, and do:

$ ./configure --with-db=/usr/local/BerkeleyDB.4.0
$ make && make install
$ ldconfig /usr/lib /usr/local/lib \
/usr/local/BerkeleyDB.4.0/lib

The 'ldconfig' command gives the runtime linker a way to find the new libraries. Be careful not to clobber other paths you may have added using ldconfig.

On HP-UX

See the file README.hp-ux in the source distribution.


Can I share wordlists over NFS?

Yes, provided you use the correct file locking to avoid data corruption. When you compile bogofilter, verify that the configure script has set "#define HAVE_FCNTL 1" in your config.h file. All popular UNIX operating systems support this. If you are running an unusual or older operating system, make sure it supports fcntl(). If "#define HAVE_FCNTL 1" is set, comment out "#define HAVE_FLOCK 1" so that the locking code uses fcntl() locking instead of the default flock() locking. If your system does not support fcntl(), you cannot share wordlist files over NFS without risking data corruption.

Next, make sure you have NFS set up properly, with "lockd" running. Refer to your NFS documentation for more information about running "lockd" or "rpc.lockd". Most operating systems with NFS turn this on by default.
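
One way to check that fcntl()-style locks can actually be taken on the NFS mount is a small probe run on a client machine. The Python sketch below is not part of bogofilter; the function name can_take_fcntl_lock and the probe file are hypothetical. It relies on Python's fcntl.lockf(), which is documented as a wrapper around the fcntl() locking calls, so it exercises fcntl() locking rather than flock() locking.

```python
import fcntl
import os


def can_take_fcntl_lock(path):
    """Try to take and release an exclusive fcntl()-style lock on path.

    Hypothetical probe: creates the file if it does not exist.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        # Either locking is unavailable on this filesystem or another
        # process currently holds a conflicting lock.
        return False
    finally:
        os.close(fd)
```

Run it against a scratch file in the directory that holds the wordlists, not against a wordlist itself. A False result can also mean that another process holds the lock at that moment, so treat it as a hint to check "lockd" rather than as proof of a problem.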


Why does bogofilter give return codes like 0 and 256 when it's run from inside a program?

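The values are most likely wait statuses as reported by waitpid(2), which encode the exit code together with other information about how the process ended. Use the WEXITSTATUS(status) macro from sys/wait.h, or a comparable macro, to get the correct value.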
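
Many scripting languages expose the same macros. As an illustration, here is a minimal Python sketch; the helper name exit_code_from_wait_status is hypothetical, the command is a stand-in for bogofilter, and the sketch assumes a Unix system, where os.system() returns a wait()-style status rather than the exit code itself.

```python
import os


def exit_code_from_wait_status(status):
    """Return the child's exit code from a wait()-style status (hypothetical helper)."""
    if os.WIFEXITED(status):
        # Normal termination: WEXITSTATUS extracts the real exit code.
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        raise RuntimeError("child killed by signal %d" % os.WTERMSIG(status))
    raise RuntimeError("unexpected wait status %d" % status)


# Stand-in command that exits with code 1; replace with the real invocation.
raw_status = os.system("exit 1")
print("raw status:", raw_status)
print("exit code:", exit_code_from_wait_status(raw_status))
```

On Linux the raw status for exit code 1 is 256, which matches the values in the question. Checking WIFEXITED first matters because WEXITSTATUS is only meaningful when the process ended normally.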

