The 20 Minute E-mail Solution!

Updating the antispam-table.txt File


The antispamseeder.exe utility, located in the IMail top directory, is used to manage the spam and non-spam word counts for a host. These word counts are stored in the antispam-table.txt file, which is used by the statistical filter to determine which messages are spam. Antispamseeder.exe updates this file based on messages in a mailbox or words entered in the command line. You can use this utility to modify the antispam-table.txt file in several ways, which are described in the sections that follow.

Warning: Never update the antispam-table.txt file manually. This will cause content filtering to fail.

Command Syntax

The following parameters can be placed in any order within the antispamseeder.exe command. Note that no parameter contains spaces: each value follows its switch directly (for example, -hHost1).

antispamseeder.exe [-h<hostname>] [-m<mailbox>] [-w<word>] [-c<word count>] [-x] [-spam|-good] [-l|-lo] [-e<exclude.txt file>] [-t<antispam-table.txt>]

Parameter               Function
-c<word count>          Represents the spam count or non-spam count of a word. This can also represent the total number of times the word has occurred in all e-mail messages.
-good                   Identifies a word or mailbox as non-spam.
-h<hostname>            Represents the name of the host.
-m<mailbox>             The mailbox name or path.
-spam                   Identifies a word or mailbox as spam.
-w<word>                Represents a word. This is used in conjunction with -c to set the spam or non-spam count for a word within the antispam-table.txt file. It is also used in conjunction with -x to delete a word from the antispam-table.txt file.
-x                      Deletes the word specified by the -w parameter from the antispam-table.txt file.
-e<exclude.txt file>    Prevents a domain name from being added to the URL Domain Black List. Used when you are importing a mailbox into the URL Domain Black List that contains domain names that are not spam.
-l                      Adds a mailbox or domain to the URL Domain Black List, and updates the antispam-table.txt file. -l can only be used with the -spam parameter, not -good.
-lo                     Use this parameter in a command to update only the URL Domain Black List.
-t<antispam-table.txt>  Identifies the antispam-table.txt file that will be merged with the specified host's antispam-table.txt file.
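
As a rough illustration of the no-spaces rule, the following Python sketch assembles an antispamseeder.exe argument list from named values. The function build_seeder_args and its keyword names are hypothetical helpers invented for this example; only the switches themselves come from the table above. It covers a subset of the switches and is not part of IMail.

```python
def build_seeder_args(host=None, mailbox=None, word=None, count=None,
                      kind=None, delete=False):
    """Return an argument list for antispamseeder.exe.

    Hypothetical helper: each value is glued directly to its switch,
    because the parameters do not contain spaces.
    """
    if kind not in (None, "spam", "good"):
        raise ValueError("kind must be 'spam', 'good', or None")
    args = ["antispamseeder.exe"]
    if kind:
        args.append("-" + kind)
    if host:
        args.append("-h" + host)
    if mailbox:
        args.append("-m" + mailbox)
    if word:
        args.append("-w" + word)
    if count is not None:
        args.append("-c" + str(count))
    if delete:
        args.append("-x")
    return args


# Build the command for seeding a mailbox of non-spam messages.
print(" ".join(build_seeder_args(
    host="Host1", kind="good",
    mailbox=r"C:\IMail\Host1\users\root\good.mbx")))
```

Because the parameters can be placed in any order, the order chosen by this helper is arbitrary.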

Reading the Antispam-table.txt File

The antispam-table.txt file contains each word that content filtering uses to determine whether a message is spam. Each word is followed by three numbers. The first number is an identifier assigned by the anti-spam engine. The second number is the number of times that the word has occurred in non-spam e-mail messages. The third number is the number of times that the word has occurred in spam e-mail messages.
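
To make the layout concrete, here is a minimal Python sketch that reads one entry into named fields. It assumes the comma-separated form of the sample entry that appears later in this document (graciously,583326,62,2); the names TableEntry and parse_entry are hypothetical. The sketch only reads an entry and never writes to the file.

```python
from typing import NamedTuple


class TableEntry(NamedTuple):
    word: str
    identifier: int   # assigned by the anti-spam engine
    good_count: int   # occurrences in non-spam messages
    spam_count: int   # occurrences in spam messages


def parse_entry(line: str) -> TableEntry:
    """Split one 'word,id,non-spam,spam' line into its four fields."""
    # Split from the right so the three numeric fields are always last.
    word, identifier, good, spam = line.strip().rsplit(",", 3)
    return TableEntry(word, int(identifier), int(good), int(spam))


entry = parse_entry("graciously,583326,62,2")
print(entry.word, entry.good_count, entry.spam_count)
```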

You may find that the words and values contained in the antispam-table.txt file are not appropriate for your use. In this case, you can customize the file based on your needs by using the antispamseeder.exe utility. See also "Customizing the Antispam-table.txt File". The following image is an excerpt from the antispam-table.txt file.

Modifying the antispam-table.txt File


Resolving Incorrectly Identified E-Mail

Whenever IMail Server incorrectly identifies a mail message as spam (false positive), you can use antispamseeder.exe to add statistical information about those e-mails into the antispam-table.txt file to rebalance the spam and non-spam word counts. This increases the likelihood that the e-mail will be correctly identified in the future.

Resolving False Positives in the Antispam-table.txt File

When you have messages that are incorrectly identified as spam, you can place the messages in a mailbox and add the entire contents of the mailbox to the antispam-table.txt file at once. The following procedure explains what to do when legitimate messages have been identified as spam:

  1. Place all of the incorrectly identified e-mail (non-spam) in a single mailbox. Make sure that this mailbox contains only non-spam.
  2. Prepare the mailbox. See "Preparing Mailboxes for use with antispamseeder.exe".
  3. Alter the non-spam word counts within the file by entering the following command in the command prompt, substituting the hostname and mailbox with your host name and the name of the mailbox that contains the incorrectly identified (non-spam) messages:
    antispamseeder.exe -good -h<hostname> -m<mailbox> 
    
  4. The antispam-table.txt in the host's directory is updated with the new word counts.
Example
If you had a host named "Host1" and a mailbox named "good", you would enter the following command:
antispamseeder.exe -good -hHost1 -mC:\IMail\Host1\users\root\good.mbx 

Using one Antispam-table.txt File for All Hosts

To use the antispam-table.txt file from the primary host's directory instead of the secondary host's directory, do the following:

  1. In the left panel, expand the localhost, and then expand a host that has an IP address.
  2. Under the host, select the Antispam folder.
  3. In the right panel, click the Content Filtering tab and select Use Primary AntiSpam Table.
    Note: This option is enabled by default upon installation.
  4. The anti-spam engine reads the antispam-table.txt file from the primary host's directory each time that content filtering is performed on a message. Therefore, this file will not appear in the secondary host's directory.

Creating Separate Antispam-table.txt Files for Hosts

There may be occasions where a secondary host does not want to use the primary host's antispam-table.txt file because the primary host considers words to be spam that the secondary host doesn't. In that case, the secondary host needs a separate antispam-table.txt file that applies only to that host. There are two steps to this process: creating the separate file, and then customizing its word counts.

To create a separate antispam-table.txt file for a host, complete the following steps:

  1. In the left panel, expand the localhost, and then expand a host that has an IP address.
  2. Under the host, select the Antispam folder.
  3. In the right panel, click the Content Filtering tab and clear the Use Primary AntiSpam Table check box.
    When this option is cleared, the antispam-table.txt file is placed in the secondary host's directory and is ready for modification. This file is a copy of the primary host's antispam-table.txt file that was created during the installation process.
  4. Click OK at the bottom of the Content Filtering tab to save your changes.
  5. Now that you have created a new antispam-table.txt file, modify the word counts contained in it by using antispamseeder.exe. For information on how to do this, see "Customizing the Antispam-table.txt File" below.
    Note: If a host's directory already contains an antispam-table.txt file, you must delete it before clearing the Use Primary AntiSpam Table option. If you do not delete it, the new antispam-table.txt file will not be copied to the directory and the word counts will not be updated. You can back up this file to another location in case you decide to revert to it later.

Customizing the Antispam-table.txt File

There are several reasons why you may want to customize the antispam-table.txt file. For example, perhaps the administrator for the primary host is not satisfied with the antispam-table.txt file that ships with the product, and wants to modify it. Or, maybe a secondary host wants to define certain words as spam that the primary host wants to define as non-spam. In these cases, the antispam-table.txt file will need to be altered. The following procedure explains how to do this.

To create new word counts within the antispam-table.txt file, you must use words from spam and non-spam e-mail messages.

  1. Before you begin, identify which mailboxes you will use to create the antispam-table.txt file. You will need at least two mailboxes: one that contains only spam messages, and one that contains only non-spam messages. Make sure that each mailbox contains roughly the same number of e-mails. See also "Preparing Mailboxes for use with antispamseeder.exe".
    Note: If one mailbox contains substantially more e-mail messages than the other, the word counts will be skewed and content filtering may not function correctly.
  2. Create the spam word counts within the file by entering the following command in the command prompt, substituting the hostname and mailbox with your host name and the name of the mailbox that contains spam messages:
    antispamseeder.exe -spam -h<hostname> -m<mailbox>  
    
Example
If the host's name is "Host1", and the mailbox name is "spam", you would enter the following command:
antispamseeder.exe -spam -hHost1 -mC:\IMail\Host1\users\root\spam.mbx

Note: The mailboxes should be placed in the same directory as antispamseeder.exe. If the mailboxes are in a separate directory, you must enter the full mailbox path.
  3. Create the non-spam word counts within the file by entering the following command in the command prompt, substituting the hostname and mailbox with your host name and the name of the mailbox that contains non-spam messages:
    antispamseeder.exe -good -h<hostname> -m<mailbox> 
    
Example
If the host is named "Host1" and the mailbox is named "good", you would enter the following command:
antispamseeder.exe -good -hHost1 -mC:\IMail\Host1\users\root\good.mbx
 

The antispam-table.txt in the host's directory is now updated with the new word counts.

Entering New Words into the Antispam-table.txt File

You can use antispamseeder.exe to enter new words into the antispam-table.txt file and to assign spam and non-spam word counts to them. You may want to do this if you know that a specific word should be in the antispam-table.txt file but is not there.

To enter a new word into the antispam-table.txt file and assign word counts to it, do the following:

  1. From the command prompt, enter the following command:
    antispamseeder.exe -w<word> -c<word count> [-spam/-good] -h<hostname>
    
    The word you enter should not already exist in the file. For the word count, enter any value between 1 and the value that you have set for the "Treat as a new word until its total occurrences exceeds" option on the Advanced Statistical Filtering dialog box.
  2. When this is done, the queue manager is notified and the word counts contained in the antispam-table.txt file are automatically reloaded to include the word that you entered in the above command.

Changing word counts for individual words

The antispam-table.txt file that is installed by default is appropriate for most users. However, you may need to alter this file if we have identified words as spam that you do not consider to be spam, or vice versa. For example, the word "mortgage" is identified as spam because in our tests it occurred 430 times in non-spam and 9610 times in spam. However, at financial institutions, the word "mortgage" is a non-spam word that occurs frequently. In this case, you need to alter the antispam-table.txt file so that the anti-spam engine recognizes the word "mortgage" as non-spam.

  1. From the command prompt, enter the following command:
    antispamseeder.exe -w<word> -c<word count> [-spam/-good] -h<hostname> 
    
  2. When this is done, the queue manager is notified and the word counts contained in the antispam-table.txt file are automatically reloaded.
    Note: The word count must be positive.

Example

If you want to alter the entry for the word "graciously" in the antispam-table.txt file so that it is treated as spam, enter the following command, where 10 represents the number of times the word "graciously" will be treated as if it had appeared in spam messages, "Host1" represents the hostname, and "graciously" is the word:
antispamseeder.exe -wgraciously -c10 -spam -hHost1
In essence, you are altering the entry for the word "graciously" in the antispam-table.txt file, thus increasing the likelihood that this word will be identified as spam in future e-mails.
Before running the above command, the entry for this word looked like this in the antispam-table.txt file:
graciously,583326,62,2
After running the above command, the entry looks like this:
graciously,583326,62,10
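
The change shown above can be modeled in a few lines of Python. This is only a sketch for reasoning about what the command does to an entry, assuming the word,identifier,non-spam,spam layout of the example and assuming that the count replaces the existing value; the function apply_word_count is hypothetical. The file itself should still be changed only through antispamseeder.exe.

```python
def apply_word_count(entry: str, count: int, kind: str) -> str:
    """Model the effect of -w<word> -c<count> with -spam or -good on one entry.

    Assumes the 'word,id,non-spam,spam' layout shown in the example above.
    """
    if count <= 0:
        raise ValueError("the word count must be positive")
    if kind not in ("spam", "good"):
        raise ValueError("kind must be 'spam' or 'good'")
    word, identifier, good, spam = entry.strip().rsplit(",", 3)
    if kind == "spam":
        spam = str(count)   # -spam sets the third number
    else:
        good = str(count)   # -good sets the second number
    return ",".join([word, identifier, good, spam])


print(apply_word_count("graciously,583326,62,2", 10, "spam"))
# prints: graciously,583326,62,10
```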

Deleting Words from the Antispam-table.txt File

You can use antispamseeder.exe to delete words that occur infrequently from a host's antispam-table.txt file. You may want to delete these words to save space and speed up processing. This command works by eliminating all words that have occurred fewer than the number of times specified.

To determine if you need to perform this procedure, open the antispam-table.txt file, located in the host's directory, and see if there are a significant number of words that have occurred infrequently. See "Reading the Antispam-table.txt File" for information on how to determine this.

  1. From the command prompt, enter the following command:
    antispamseeder.exe -x -c<total word count> -h<hostname>
    
    
    Note: The number entered for the total word count must be positive.
  2. The words that have occurred fewer times than the total word count entered in the command are removed from the antispam-table.txt file.
Example
If you want to remove all words from the antispam-table.txt file that have occurred fewer than five times in all e-mail messages, enter the following command, where "Host" is the name of the host:
antispamseeder.exe -x -c5 -hHost
 

After running the above command and reopening the antispam-table.txt file, you will notice that all words that had previously occurred fewer than five times are gone.
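
To preview which entries such a command would keep, the following Python sketch applies the same threshold to a list of entries. It assumes that the total word count is the sum of the non-spam and spam counts, which this document does not state explicitly. The function prune_entries and the sample entries other than "graciously" are made up for illustration. It is a model only and does not touch the real file.

```python
def prune_entries(lines, min_total):
    """Keep only entries whose non-spam + spam total is at least min_total."""
    if min_total <= 0:
        raise ValueError("the total word count must be positive")
    kept = []
    for line in lines:
        word, identifier, good, spam = line.strip().rsplit(",", 3)
        # Assumption: "total" means non-spam count plus spam count.
        if int(good) + int(spam) >= min_total:
            kept.append(line.strip())
    return kept


sample = [
    "graciously,583326,62,2",   # total 64, kept
    "rareword,100001,1,2",      # total 3, removed (made-up entry)
    "oddity,100002,0,4",        # total 4, removed (made-up entry)
]
print(prune_entries(sample, 5))
# prints: ['graciously,583326,62,2']
```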

Merging antispam-table.txt Files

You can use the antispamseeder.exe utility to merge two antispam-table.txt files. This is useful when you have modified your antispam-table.txt file, but you want to download the latest updated file from the Ipswitch website, or for combining the antispam-table.txt files of several domains. Using the procedure below, you can retain your customizations while gaining new statistical information from more recent spam.

To Merge two antispam-table.txt Files:

  1. Identify which antispam-table.txt files you want to merge.
  2. Merge the two files by entering the following command in the command prompt, substituting the hostname with the name of your mail host, and substituting antispam-table.txt with the name of the antispam table that you want to merge with that of the specified host:
    antispamseeder.exe -t<antispam-table.txt> -h<hostname>

Antispamseeder reads the specified antispam-table.txt file, and compares it to the antispam-table.txt file for the specified host. Words that are not listed in the host's file are added to it. Since the spam and non-spam word counts for each antispam-table.txt file are different, the antispamseeder utility recalculates the counts for each word that is added. Therefore, new words are added with their existing word counts, and existing words are recalculated to balance the word counts of the two files.

Example

Suppose that at installation, you chose to store the updated word statistics in the antispam-table-ini.txt file, and now you want to merge them with your existing antispam-table.txt file. Assuming that your host is named "Host1", you would enter the following command:

antispamseeder.exe -tantispam-table-ini.txt -hHost1 

Notes:
  • The antispam-table.txt files should be placed in the same directory as antispamseeder.exe. If they are in separate directories, you must enter the full path name for the files (for example, C:\IMail\Host2\antispam-table.txt).
  • You can rename the second file if you like (for example, antispam-table2.txt). This is only necessary if you want both files to reside in the same directory.

Creating a URL Domain Black List From a Mailbox

The easiest method to create a URL Domain Black List is to use the antispamseeder.exe utility. Antispamseeder will extract the domain names from the HTML code of collected spam messages. The procedure for doing this is described below.

Enter the following command:

antispamseeder.exe -lo [-e<exclude>] -h<hostname> -m<mailbox>
 

"Exclude" represents the exclude file. You must create an exclude text file if a mailbox contains domain names that you do not want to include in your URL Domain Black List (i.e. your domain name). The exclude file must be a text document and contain only one entry per line. It can contain both domain names and IP addresses, and must be placed in the host's top directory.

Example

Suppose you have a host named "Host1", and want to update the URL Domain Black List using the messages in a mailbox called "spam". You have also created an exclude file called excludedomains.txt. You would enter the following command:

antispamseeder.exe -lo -eexcludedomains.txt -hHost1 -mC:\Imail\Host1\Users\root\spam.mbx 

The new domain names will now be displayed in the URL Domain Black List box on the Content Filtering (HTML) tab.
Notes:
  • You do not need to enter www. before a domain name, as it is dropped when the domain name is put into the URL Domain Black List.
  • It is advised that you enter your domain name into the exclude file.
  • Unless you are certain that a domain name does not exist in the mailbox you are using with antispamseeder, you should include the -e<exclude> parameter every time you run a mailbox through antispamseeder with the -l or -lo parameter.

Antispamseeder examines each message in the "spam" mailbox for HTML code, specifically HREF and IMG SRC tags. When one of these tags is found, the primary domain name is extracted and added to the URL Domain Black List. The new URL domain names then appear under the URL Domain Black List on the Content Filtering (HTML) tab.
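
The extraction described above can be approximated with Python's standard library. This is a sketch under stated assumptions, not the utility's actual logic: it takes the host name from each HREF and IMG SRC value, drops a leading "www.", and skips anything listed in the exclude entries. It does not reduce a subdomain to its primary domain, and the names LinkDomainParser and extract_domains, as well as the .test domains, are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkDomainParser(HTMLParser):
    """Collect host names from <a href=...> and <img src=...> tags."""

    def __init__(self, excluded=()):
        super().__init__()
        self.excluded = {e.lower() for e in excluded}
        self.domains = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name != wanted or not value:
                continue
            try:
                host = (urlsplit(value).hostname or "").lower()
            except ValueError:
                continue                 # skip malformed URLs
            if host.startswith("www."):
                host = host[4:]          # "www." is dropped from the list
            if host and host not in self.excluded and host not in self.domains:
                self.domains.append(host)


def extract_domains(html, excluded=()):
    parser = LinkDomainParser(excluded)
    parser.feed(html)
    parser.close()
    return parser.domains


body = ('<a href="http://www.spam-example.test/buy">Buy</a>'
        '<img src="http://img.spam-example.test/x.gif">'
        '<a href="http://www.mycompany.test/">Home</a>')
print(extract_domains(body, excluded=["mycompany.test"]))
# prints: ['spam-example.test', 'img.spam-example.test']
```

Passing your own domain in the excluded list mirrors the advice to put your domain name in the exclude file.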

Creating a URL Domain Black List and antispam-table.txt

You can create an antispam-table.txt file and URL Domain Black List at the same time, by using the same mailbox to accomplish both tasks. Enter the following command:

antispamseeder.exe -l -e<exclude.txt> -h<hostname> -m<mailbox> 

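Where <exclude.txt> is the exclude file, <hostname> is the name of the host, and <mailbox> is the mailbox name or path.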

Configuring the Anti-Spam Engine to Identify Wildcards

When the anti-spam engine scans an e-mail, it breaks the e-mail down into individual words. Each character in each word is checked to make sure it is valid. The anti-spam engine does not recognize non-alphabetic characters (except hyphens) or numbers. When comparing words to the antispam-table.txt file, non-alphabetic characters and numbers are treated as a "-". So, if the word 2Sexy is found in an e-mail, it is treated as -sexy when it is compared to the word list.

If you want the anti-spam engine to identify such words as spam or non-spam, you must enter them into the antispam-table.txt file, using antispamseeder.exe. To do this, complete the following steps:

  1. From the command prompt, enter the following command:
    antispamseeder.exe -w<word> -c<word count> [-spam/-good] -h<hostname>  
     
    
    See "Command Syntax" for explanations of each parameter.
  2. The word that you entered in the above command will now be identified as either spam or non-spam, depending on which parameter you entered.
    Note: The word count must be positive.

Example 1

If you want the anti-spam engine to identify the word 2Sexy as spam, add it to the antispam-table.txt file by entering the following command:

antispamseeder.exe -spam -w-sexy -c100 -hHost1.com
 

This command adds the word -sexy to the antispam-table.txt file as if it had occurred 100 times in spam e-mail. The word will now be treated as a spam indicator by the content filters.

Example 2

If you want the anti-spam engine to identify the word "g00d" (with zeros) as spam, you must enter the word into the antispam-table.txt file by running the following command, substituting dashes for the non-alphabetic characters. In this example, "Host1" is the hostname and "g- -d" is the word you want to be recognized as spam:

Once you run the above command, the anti-spam engine will recognize any variant of the word g- -d as spam, such as g00d, g**d, etc. This command does not change the word count for the word "good" because it does not contain any non-alphabetic characters.
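
The character substitution described in this section can be sketched in Python as follows. The function normalize_word is hypothetical. It assumes that words are lower-cased (inferred from 2Sexy becoming -sexy), that only the letters a to z count as alphabetic, and that consecutive substitutions are written as adjacent hyphens with no space between them.

```python
def normalize_word(word: str) -> str:
    """Lower-case a word and replace every character that is not an
    ASCII letter or a hyphen with '-'."""
    out = []
    for ch in word.lower():
        if ("a" <= ch <= "z") or ch == "-":
            out.append(ch)
        else:
            out.append("-")   # digits and other symbols become a hyphen
    return "".join(out)


for raw in ("2Sexy", "g00d", "g**d", "good"):
    print(raw, "->", normalize_word(raw))
```

Under these assumptions, g00d and g**d map to the same word, while "good" is left unchanged.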

Preparing Mailboxes for use with antispamseeder.exe

Before a mailbox can be used by antispamseeder.exe to create or alter the antispam-table.txt file, several preliminary steps must be performed.

Failure to perform these steps may result in an inaccurate antispam-table.txt file, which will cause statistical filtering to malfunction.



Ipswitch, Inc.
http://www.ipswitch.com
©Ipswitch 2004