Table of ContentsPreviousNextIndex

Why Bayesian filtering is better

1. The Bayesian method takes the whole message into account - It recognizes keywords that identify spam, but it also recognizes words that denote valid mail. For example: not every email that contains the word "free" and "cash" is spam. The advantage of the Bayesian method is that it considers the most interesting words (as defined by their deviation from the mean) and comes up with a probability that a message is spam. The Bayesian method would find the words "cash" and "free" interesting but it would also recognize the name of the business contact who sent the message and thus classify the message as legitimate, for instance; it allows words to "balance" each other out. In other words, Bayesian filtering is a much more intelligent approach because it examines all aspects of a message, as opposed to keyword checking that classifies a mail as spam on the basis of a single word.

2. A Bayesian filter is constantly self-adapting - By learning from new spam and new valid outbound mails, the Bayesian filter evolves and adapts to new spam techniques. For example, when spammers started using "f-r-e-e" instead of "free" they succeeded in evading keyword checking until "f-r-e-e" was also included in the keyword database. On the other hand, the Bayesian filter automatically notices such tactics; in fact if the word "f-r-e-e" is found, it is an even better spam indicator, since its unlikely to occur in a ham mail. Another example would be using the word "5ex" instead of "Sex". You would probably not have a word 5ex in a ham mail, and therefore the likelihood that its spam increases.

3. The Bayesian technique is sensitive to the user - It learns the email habits of the company and understands that, for example, the word `mortgage' might indicate spam if the company running the filter is, say, a car dealership, whereas it would not indicate it as spam if the company is a financial institution dealing with mortgages.

4. The Bayesian method is multi-lingual and international - A Bayesian anti-spam filter, being adaptive, can be used for any language required. Most keyword lists are available in English only and are therefore quite useless in non English-speaking regions. The Bayesian filter also takes into account certain languages deviations or the diverse usage of certain words in different areas, even if the same language is spoken. This intelligence enables such a filter to catch more spam.

5. A Bayesian filter is difficult to fool, as opposed to a keyword filter - An advanced spammer who wants to trick a Bayesian filter can either use fewer words that usually indicate spam (such as free, Viagra, etc), or more words that generally indicate valid mail (such as a valid contact name, etc). Doing the latter is impossible because the spammer would have to know the email profile of each recipient - and a spammer can never hope to gather this kind of information from every intended recipient. Using neutral words, for example the word "public", would not work since these are disregarded in the final analysis. Breaking up words associated with spam, such as using "m-o-r-t-g-a-g-e" instead of "mortgage", will only increase the chance of the message being spam, since a legitimate user will rarely write the word "mortgage" as "m-o-r-t-g-a-g-e".

What's the catch?

Bayesian filtering, if implemented the right way and tailored to your company is by far the most effective technology to combat spam. Is there a downside? Well, in a way there is one downside, but this can easily be overcome: Before you can use and judge the Bayesian filter, you have to wait for it to learn for at least two weeks - that or create the ham or spam databases yourself. This task can be quite complex, so it is best to wait until the filter has had time to learn. Over time, the Bayesian filter becomes more and more effective as it learns more about your organization's email habits. To quote the old saying, good things come to he who waits.


Table of ContentsPreviousNextIndex