Comparison of Bayesian spam filters

Monday, 11 August 2003, 11:21 AM EST

Spam e-mail has become an ever-increasing problem, and these days it is next to impossible to use e-mail without receiving large amounts of it. Various techniques exist to combat the problem: keyword-based filters, source blacklists, signature blacklists, source verification, and combinations of these, to name a few. All of them have drawbacks: keyword filters need to be constantly updated by hand and are not very accurate; blacklists also need constant updating and will always lag behind spammers.

Fortunately, just as we seemed to be losing the war on spam, a new technique appeared on the scene following a paper by Paul Graham: Bayesian filters, our last, best hope for spam-free inboxes. Without going into details on how they work, they are based on statistical methods that give the probability of an e-mail belonging to a given class. Usually just two classes are used, spam and not-spam, but this is not a limitation of the technique; indeed, POPFile supports an arbitrary number of classes. The beauty of Bayesian filtering is that each individual user can train the filter simply by categorizing each received e-mail as either spam or not-spam. After the user has categorized a few e-mails, the filter begins to make this categorization by itself, usually with a very high level of accuracy. If the filter makes a mistake, the user re-categorizes the e-mail, and the filter learns from its mistake. No complicated maintenance is required after the filter is installed; it's so easy even grandma can use it.
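
To make the train-then-classify loop concrete, here is a minimal sketch of a naive Bayes text classifier in Python. It is an illustration only, not the formula from Paul Graham's paper and not the implementation used by POPFile or any other filter; the class name NaiveBayesFilter, its methods, the tokenizer, and the sample messages are all hypothetical. It assumes plain word counts with add-one smoothing, and labels are arbitrary strings, so more than two classes would work the same way.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Toy multinomial naive Bayes classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> occurrences
        self.message_counts = Counter()          # label -> messages trained
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase runs of letters, digits, $, ' and - are "words".
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, text, label):
        """Record one message the user has categorized as `label`."""
        self.message_counts[label] += 1
        for word in self.tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def probabilities(self, text):
        """Return {label: probability that `text` belongs to label}."""
        if not self.vocabulary:
            raise ValueError("train the filter on some non-empty messages first")
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary)
        words = self.tokenize(text)
        log_scores = {}
        for label, seen in self.message_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Start from the prior: how common this class has been so far.
            score = math.log(seen / total_messages)
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Convert log scores to probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}

    def classify(self, text):
        """Return the most probable label for `text`."""
        probs = self.probabilities(text)
        return max(probs, key=probs.get)


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win money now click here", "spam")
spam_filter.train("meeting agenda for monday", "not-spam")
spam_filter.train("lunch on monday with the team", "not-spam")

print(spam_filter.classify("buy cheap pills"))              # spam
print(spam_filter.classify("agenda for the team meeting"))  # not-spam
```

In this sketch, correcting a mistake is simply another call to train with the right label, which shifts the word counts toward the user's own mail over time. A real filter would also need to undo the earlier wrong classification and parse e-mail headers and encodings, which this example leaves out.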

