How to train the Bayesian Analysis Antispam Filter:
Download the SPAM database:
GFI provides a database containing spam e-mail information which can be used to improve the spam information in the Bayesian database. GFI updates this database with new spam types; therefore you may want to configure GFI MailEssentials to check for new updates of the spam database automatically. This will help the Bayesian filter block more spam.
Learn from Outbound mails:
When this option is enabled, GFI MailEssentials automatically uses the information in outbound emails to improve the HAM information (information on legitimate emails). This helps reduce false positives.
GFI Anti-Spam Public Folders:
Spam emails which are not blocked by the Bayesian filter, or legitimate emails which have been incorrectly blocked as spam, can be used to train the Bayesian filter. The GFI Anti-Spam Public Folders can be used for this purpose. You can drag a spam email which has not been blocked to the Public folder "This is spam email". GFI MailEssentials will use the information from these emails to improve the SPAM information in the Bayesian database.
On the other hand, a legitimate email which has been blocked can be dragged to the "This is legitimate email" public folder for the Bayesian filter to learn that such mails should not be blocked. The GFI Anti-Spam Public folders can be used with Microsoft Exchange server or any other mail server which supports IMAP.
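On a mail server that exposes these folders over IMAP, the drag-and-drop step could in principle be scripted. The following is a minimal sketch and not a GFI tool: the function name `copy_to_training_folder` is hypothetical, the folder names are taken from this article, and the actual IMAP path of the public folders on your server is an assumption that may differ. It copies rather than moves the message, so the original stays in place.

```python
# Folder names as they appear in this article. The exact IMAP path of the
# public folders on your server is an assumption and may differ.
TRAINING_FOLDERS = {
    "spam": "This is spam email",
    "ham": "This is legitimate email",
}


def copy_to_training_folder(conn, message_number, kind, source="INBOX"):
    """Copy one message into the spam or ham training folder.

    conn is an already authenticated imaplib.IMAP4 (or IMAP4_SSL) object.
    kind must be "spam" or "ham"; anything else raises KeyError.
    """
    target = TRAINING_FOLDERS[kind]
    status, _ = conn.select(source)
    if status != "OK":
        raise RuntimeError(f"cannot open mailbox {source!r}")
    # Mailbox names containing spaces must be quoted for the server.
    status, _ = conn.copy(str(message_number), f'"{target}"')
    if status != "OK":
        raise RuntimeError(f"copy to {target!r} failed")
    return target
```

A connection for this function would typically come from `imaplib.IMAP4_SSL(host)` followed by `login(user, password)`; the host and credentials are specific to your environment.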
Bayesian Wizard:
The Bayesian analysis wizard is a tool which can be used to update the Bayesian database with information from your spam and legitimate email. The wizard provides the following functionality:
Improves the ham information in the Bayesian database. This can be done by retrieving this information from the emails located in an Exchange mailbox, or in an Outlook PST file (rather than waiting for it to learn from outbound mail). This will make the Bayesian filter more effective, sooner.
If you have been collecting spam emails in one mailbox, you can use the Bayesian Wizard to scan this location (Microsoft Exchange mailbox, or Outlook PST file) and improve the SPAM information in the Bayesian database.
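To see why feeding the filter both spam and ham matters, consider a toy word-count Bayesian classifier. This is only an illustrative sketch of the general technique: the class `TinyBayes` is hypothetical, it assumes equal prior odds and add-one smoothing, and it is not how GFI MailEssentials stores or scores data in weights.bsp.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy Bayesian text classifier; not GFI's implementation."""

    def __init__(self):
        # Number of training messages per class that contained each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Learn from one message labelled "spam" or "ham"."""
        self.word_counts[label].update(set(text.lower().split()))
        self.message_counts[label] += 1

    def _word_probability(self, word, label):
        # Add-one smoothing so unseen words never give a zero probability.
        seen = self.word_counts[label][word]
        return (seen + 1) / (self.message_counts[label] + 2)

    def spam_probability(self, text):
        """Estimate the probability that text is spam (equal priors assumed)."""
        log_odds = 0.0
        for word in set(text.lower().split()):
            log_odds += math.log(self._word_probability(word, "spam"))
            log_odds -= math.log(self._word_probability(word, "ham"))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    model = TinyBayes()
    model.train("cheap pills offer", "spam")
    model.train("meeting agenda attached", "ham")
    print(model.spam_probability("cheap pills"))
    print(model.spam_probability("meeting agenda"))
```

In this sketch, every additional ham message that contains a word lowers the spam score of later messages with that word, which is the effect that training with legitimate mail aims for.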
The following points should be taken into consideration when training the Bayesian filter:
- The amount of SPAM in the Bayesian database should always be more than double the amount of HAM. This can be confirmed from the GFI MailEssentials configuration -> Anti-Spam -> Bayesian filter.
- After you enable the Bayesian filter, you need to monitor the blocked emails for false positives. You can use the false positives to teach the Bayesian filter that such mails are ham by using the GFI MailEssentials Public Folders.
- Any spam emails which are not blocked by GFI MailEssentials need to be used to teach the Bayesian filter that such emails are spam. The Public Folders can be used for this purpose.
- It is only necessary to enable automatic learning from outbound emails if you notice that the Bayesian filter is regularly producing false positives.
- Occasionally, GFI MailEssentials will compact the Bayesian database. This involves removing redundant data from the database. Therefore the size and data in the Bayesian database may eventually decrease. This process will not degrade the effectiveness of the Bayesian filter
- GFI MailEssentials will not necessarily add data from all emails which are passed on to the Bayesian database for learning. This occurs when the data in the database is already at an optimum level
- Information from outbound emails is not added immediately to the Bayesian database. The information from multiple emails is first cached, and then added to the database in one go. Therefore you may not notice the HAM count increasing immediately.
- The Bayesian filter is also effective if your legitimate emails are not in English
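The first point above is simple arithmetic and can be checked by hand or with a few lines of code. The sketch below assumes you read the SPAM and HAM counts manually from the configuration screen; the function names are hypothetical and nothing here talks to GFI MailEssentials.

```python
def spam_ham_ratio_ok(spam_count, ham_count):
    """Return True when SPAM entries are more than double the HAM entries."""
    return spam_count > 2 * ham_count


def spam_shortfall(spam_count, ham_count):
    """Return how many more SPAM entries are needed to satisfy the rule."""
    return max(0, 2 * ham_count + 1 - spam_count)


if __name__ == "__main__":
    # Example counts only; substitute the values shown in your configuration.
    spam, ham = 100, 60
    print(spam_ham_ratio_ok(spam, ham), spam_shortfall(spam, ham))
```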
How to manually update the Bayesian Database:
Occasionally, you may need to update the Bayesian database with information from a PST file or a mailbox which is not accessible from the GFI MailEssentials machine. In such cases, the Bayesian database needs to be updated through the Bayesian Analysis Wizard installed on the remote machine by:
- Installing the Bayesian Wizard on the remote machine
- Updating the Bayesian database
- Copying the Bayesian database back to the GFI MailEssentials machine
Step 1: Install the Bayesian Wizard on the remote machine
- On the GFI MailEssentials machine, browse to the ...\GFI\MailEssentials\Antispam\BSW folder
- Locate the file bayesianwiz.exe and copy it to the machine where the email information you want to update the Bayesian database with is available
- Run bayesianwiz.exe and proceed with the installation
Step 2: Update the Bayesian database
- On the GFI MailEssentials machine, browse to the ..\GFI\MailEssentials\Antispam\Data folder
- Locate the file weights.bsp and copy it to the machine where the email information you want to update the Bayesian database with is available
- On the remote machine, launch the Bayesian Analysis Wizard, installed in Step 1
- In the Bayesian SPAM Profile window, select Create or Update Bayesian Spam profile (bsp) file
- Specify the location of the weights.bsp file copied from the GFI MailEssentials machine
- Select whether to import the email data as spam or as legitimate emails and select from where to import
- Proceed through the wizard until the end
Step 3: Copy the Bayesian database back to the GFI MailEssentials machine.
- Stop the GFI MailEssentials services and the IIS Admin service (taking note of dependent services that are stopped). Note that this will stop emails from being received by IIS, and will also stop the Microsoft Exchange server if it is installed on the same machine
- Copy the weights.bsp file from the remote machine to the GFI MailEssentials machine, into the ..\GFI\MailEssentials\Antispam\Data folder
- Restart the services that were stopped in the first point of this step
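The file copy in Step 3 can be sketched as a small script. This is an illustration under stated assumptions, not a GFI utility: the function `install_database` is hypothetical, both paths are supplied by you, and keeping a timestamped backup of the existing weights.bsp is an extra precaution that the procedure above does not require. The script does not stop or start any services, so those steps still have to be done manually before and after it runs.

```python
import shutil
from datetime import datetime
from pathlib import Path


def install_database(updated_bsp, data_dir, keep_backup=True):
    """Copy an updated weights.bsp into the Antispam Data folder.

    Run this only while the GFI MailEssentials and IIS Admin services
    are stopped. Returns the backup path, or None if no backup was made.
    """
    updated = Path(updated_bsp)
    target = Path(data_dir) / "weights.bsp"
    if not updated.is_file():
        raise FileNotFoundError(updated)

    backup = None
    if keep_backup and target.exists():
        # Precaution added in this sketch: keep the previous database.
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        backup = target.with_name(f"weights.bsp.{stamp}.bak")
        shutil.copy2(target, backup)

    shutil.copy2(updated, target)
    return backup
```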
How to create a new Bayesian Database:
To start with a new Bayesian database, you will first need to remove the one that you are using.
Proceed as follows:
- Stop the GFI MailEssentials services and the IIS Admin service (taking note of dependent services that are stopped). Note that this will stop emails from being received by IIS, and will also stop the Microsoft Exchange server if it is installed on the same machine
- Browse to the ..\GFI\MailEssentials\Antispam\data folder
- Remove weights.bsp
- Open the GFI MailEssentials configuration, go to Anti-Spam > Anti-Spam Filters and select the Bayesian filter
- On the right, disable the Bayesian filter
- Enable Auto learning from outbound emails
- Delete ..\GFI\MailEssentials\Antispam\autoupdate\bayesian10.txt
- Download a new weights.bsp from
- Extract weights.bsp from the downloaded file to the ..\GFI\MailEssentials\Antispam\data folder
- Start the services that were stopped in the first step
- Download the SPAM database from our site. This can be done from the GFI MailEssentials configuration > Anti-Spam > Anti-Spam Filters > Bayesian Analysis > Updates tab. Click 'Download Updates'.
- In GFI MailEssentials, learning can be accelerated by making use of the Public folders or the IMAP folders. More information on this can be found in the GFI MailEssentials documentation.
Priyanka Bhotika