Updating the antispam-table.txt File
The antispamseeder.exe utility, located in the IMail top directory, is used to manage the spam and non-spam word counts for a host. These word counts are stored in the antispam-table.txt file, which is used by the statistical filter to determine which messages are spam. Antispamseeder.exe updates this file based on messages in a mailbox or words entered in the command line.
You can use this utility to modify the antispam-table.txt file in the following ways:
- Re-assign the word counts contained in the antispam-table.txt file when a word is incorrectly identified as spam, or vice versa.
- Create new word counts for a host-specific antispam-table.txt file.
- Delete words that do not occur often from the antispam-table.txt file to save storage space and decrease processing time.
- Add a new word to the antispam-table.txt file, and enter word counts for it.
- Enter wildcards (e.g. g* *d) into the antispam-table.txt file so that the anti-spam engine will identify such words as either spam or non-spam.
Warning: Never update the antispam-table.txt file manually. This will cause content filtering to fail.
Command Syntax
The following parameters can be placed in any order within the antispamseeder.exe command. Note that the parameters do not contain spaces.
antispamseeder.exe [-h<hostname>] [-w<word>] [-c<word count>] [-x] [-spam|-good]
Reading the Antispam-table.txt File
The antispam-table.txt file is the file that contains each word that content filtering uses to determine if a message is spam. Beside each word there are three numbers. The first number is an identifier assigned by the anti-spam engine. The second number is the number of times that the word has occurred in non-spam e-mail messages. The third number is the number of times that the word has occurred in spam e-mail messages.
You may find that the words and values contained in the antispam-table.txt file are not appropriate for your use. In this case, you can customize the file based on your needs by using the antispamseeder.exe utility. See also "Customizing the Antispam-table.txt File". The following image is an excerpt from the antispam-table.txt file.
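As a rough illustration of the layout described above, the following Python sketch splits one entry into its four fields. The comma-separated form is taken from the "graciously" entry shown in "Changing word counts for individual words"; the TableEntry and parse_entry names are hypothetical and are not part of IMail. The sketch only reads values and never writes to the file.

```python
from typing import NamedTuple


class TableEntry(NamedTuple):
    word: str
    identifier: int   # assigned by the anti-spam engine
    good_count: int   # occurrences in non-spam messages
    spam_count: int   # occurrences in spam messages


def parse_entry(line: str) -> TableEntry:
    """Split one 'word,id,good,spam' line into typed fields."""
    # rsplit from the right so the three numeric fields are always the last three.
    word, identifier, good, spam = line.strip().rsplit(",", 3)
    return TableEntry(word, int(identifier), int(good), int(spam))


entry = parse_entry("graciously,583326,62,2")
print(entry.word, entry.good_count, entry.spam_count)
```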
Modifying the antispam-table.txt File
The antispamseeder.exe utility, located in the IMail top directory, is used to manage the spam and non-spam word counts for a host. These word counts are stored in the antispam-table.txt file, which is used by the statistical filter to determine which messages are spam. Antispamseeder.exe updates this file based on messages in a mailbox or words entered in the command line.
You can use this utility to modify the antispam-table.txt file in the following ways:
- Merge the contents of two antispam-table.txt files into one.
- Create a URL Domain Black List by extracting domain names from the messages in a mailbox.
- Re-assign the word counts contained in the antispam-table.txt file when a word is incorrectly identified as spam, or vice versa.
- Create new word counts for a host-specific antispam-table.txt file.
- Delete words that do not occur often from the antispam-table.txt file to save storage space and decrease processing time.
- Add a new word to the antispam-table.txt file, and enter word counts for it.
- Simultaneously create a URL Domain Black List and an antispam-table.txt file using the messages in a mailbox.
- Enter wildcards (e.g. g* *d) into the antispam-table.txt file so that the anti-spam engine will identify such words as either spam or non-spam.
Warning: Never update the antispam-table.txt file manually. This will cause content filtering to fail.
Resolving Incorrectly Identified E-Mail
Whenever IMail Server incorrectly identifies a mail message as spam (false positive), you can use antispamseeder.exe to add statistical information about those e-mails into the antispam-table.txt file to rebalance the spam and non-spam word counts. This increases the likelihood that the e-mail will be correctly identified in the future.
Resolving False Positives in the Antispam-table.txt File
When you have messages that are incorrectly identified as spam, you can place the messages in a mailbox and add the entire contents of the mailbox to the antispam-table.txt file at once. The following procedure explains what to do when legitimate messages have been identified as spam:
- Place all of the incorrectly identified e-mail (non-spam) in a single mailbox. Make sure that this mailbox contains only non-spam.
- Prepare the mailbox. See "Preparing Mailboxes for use with antispamseeder.exe".
- Alter the non-spam word counts within the file by entering the following command at the command prompt, substituting the hostname and mailbox with your host name and the name of the mailbox that contains the incorrectly identified (non-spam) messages:
antispamseeder.exe -good -h<hostname> -m<mailbox>
- The antispam-table.txt in the host's directory is updated with the new word counts.
- Example
- If you had a host named "Host1" and a mailbox named "good", you would enter the following command:
antispamseeder.exe -good -hHost1 -mC:\IMail\Host1\users\root\good.mbx
Using one Antispam-table.txt File for All Hosts
To use the antispam-table.txt file from the primary host's directory instead of the secondary host's directory, do the following:
- In the left panel, expand the localhost, and then expand a host that has an IP address.
- Under the host, select the Antispam folder.
- In the right panel, click the Content Filtering tab and select Use Primary AntiSpam Table.
Note: This option is enabled by default upon installation.
- The anti-spam engine reads the antispam-table.txt file from the primary host's directory each time that content filtering is performed on a message. Therefore, this file will not appear in the secondary host's directory.
Creating Separate Antispam-table.txt Files for Hosts
There may be occasions where a secondary host does not want to use the primary host's antispam-table.txt file because the primary host considers words to be spam that the secondary host doesn't. In that case, the secondary host needs a separate antispam-table.txt file that applies only to that host. There are two steps to this process:
- Set up IMail Server to use a copy of the antispam-table-ini.txt file for the host. This file is a copy of the spam and non-spam word counts that are created during installation (antispam-table.txt). It contains the initial word values only; it does not contain changes that were made by the primary host.
- If you want to further refine the new spam and non-spam word counts, you can use the antispamseeder.exe utility to customize the antispam-table.txt file for the host. This option takes the above option a step further, by customizing the spam and non-spam word counts specifically for the secondary host.
To create a separate antispam-table.txt file for a host, complete the following steps:
- In the left panel, expand the localhost, and then expand a host that has an IP address.
- Under the host, select the Antispam folder.
- In the right panel, click the Content Filtering tab and clear the Use Primary AntiSpam Table check box.
- When the above option is cleared, the antispam-table.txt file is placed in the secondary host's directory, and is ready for modification. This file is a copy of the primary host's antispam-table.txt file that was created during the installation process.
- Click OK at the bottom of the Content Filtering tab to save your changes.
- Now that you have created a new antispam-table.txt file, you need to modify the word counts contained in it. To modify the word counts, you must use antispamseeder.exe. For information on how to do this, see "Customizing the Antispam-table.txt File" below.
Customizing the Antispam-table.txt File
There are several reasons why you may want to customize the antispam-table.txt file. For example, perhaps the administrator for the primary host is not satisfied with the antispam-table.txt file that ships with the product, and wants to modify it. Or, maybe a secondary host wants to define certain words as spam that the primary host wants to define as non-spam. In these cases, the antispam-table.txt file will need to be altered. The following procedure explains how to do this.
To create new word counts within the antispam-table.txt file, you must use words from spam and non-spam e-mail messages.
- Before you begin, identify which mailboxes you will use to create the antispam-table.txt file. You will need at least two mailboxes: one that contains only spam messages, and one that contains only non-spam messages. Make sure that the mailboxes contain roughly the same number of e-mails. See also "Preparing Mailboxes for use with antispamseeder.exe".
Note: If one mailbox contains substantially more e-mail messages than the other, the word counts will be skewed and content filtering may not function correctly.
- Create the spam word values within the file by entering the following command at the command prompt, substituting the hostname and mailbox with your host name and the name of the mailbox that contains spam messages:
antispamseeder.exe -spam -h<hostname> -m<mailbox>
- Example
- If the host's name is "Host1", and the mailbox name is "spam", you would enter the following command:
antispamseeder.exe -spam -hHost1 -mC:\IMail\Host1\users\root\spam.mbx
Note: The mailboxes should be placed in the same directory as antispamseeder.exe. If the mailboxes are in a separate directory, you must enter the full mailbox path.
- Create the non-spam word counts within the file by entering the following command in the command prompt, substituting the hostname and mailbox with your host name and the name of the mailbox that contains non-spam messages:
antispamseeder.exe -good -h<hostname> -m<mailbox>
- Example
- If the host is named "Host1" and the mailbox is named "good", you would enter the following command:
antispamseeder.exe -good -hHost1 -mC:\IMail\Host1\users\root\good.mbx
The antispam-table.txt in the host's directory is now updated with the new word counts.
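The two seeding runs in this procedure can be scripted. The following Python sketch is one possible way to assemble the argument lists. It uses only the -spam, -good, -h, and -m parameters shown in this section; the seeder_args helper and the paths are illustrative assumptions, and the commands are printed rather than executed.

```python
from typing import List


def seeder_args(kind: str, hostname: str, mailbox: str) -> List[str]:
    """Build the argument list for one seeding run (kind is 'spam' or 'good')."""
    if kind not in ("spam", "good"):
        raise ValueError("kind must be 'spam' or 'good'")
    # Each value is attached directly to its flag, as in the examples above.
    return ["antispamseeder.exe", f"-{kind}", f"-h{hostname}", f"-m{mailbox}"]


root = r"C:\IMail\Host1\users\root"  # example path from this section
runs = [
    seeder_args("spam", "Host1", root + r"\spam.mbx"),
    seeder_args("good", "Host1", root + r"\good.mbx"),
]
for args in runs:
    print(" ".join(args))
    # To execute on the mail server instead: subprocess.run(args, check=True)
```

Building a list instead of a single string avoids quoting problems if the list is later handed to subprocess.run.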
Entering New Words into the Antispam-table.txt File
You can use antispamseeder.exe to enter new words into the antispam-table.txt file, as well as assign spam and non-spam word counts to the word. You may want to do this if you know that a specific word should be in the antispam-table.txt file that is not there.
To enter a new word into the antispam-table.txt file and assign word counts to it, do the following:
- From the command prompt, enter the following command:
antispamseeder.exe -w<word> -c<word count> <-spam/-good> -h<hostname>
- The word you enter should be a word that does not currently exist in the file. For the word count, enter any value between 1 and the value that you have set for the Treat as a new word until its total occurrences exceeds option on the Advanced Statistical Filtering dialog box.
- When this is done, the queue manager is notified and the word counts contained in the antispam-table.txt file are automatically reloaded to include the word that you entered in the above command.
Changing word counts for individual words
The antispam-table.txt file that is installed by default is appropriate for most users. However, you may need to alter this file if we have identified words as spam that you do not consider to be spam, or vice versa. For example, the word "mortgage" is identified as spam because, in our tests, it occurred 430 times in non-spam and 9610 times in spam. However, at financial institutions, the word "mortgage" is a non-spam word that occurs frequently. In this case, you need to alter the antispam-table.txt file so that the anti-spam engine recognizes the word "mortgage" as non-spam.
- From the command prompt, enter the following command:
antispamseeder.exe -w<word> -c<word count> [-spam/-good] -h<hostname>
- When this is done, the queue manager is notified and the word counts contained in the antispam-table.txt file are automatically reloaded.
Note: The word count must be positive.
- If you want to alter the entry for the word "graciously" in the antispam-table.txt file so that it is treated as spam, enter the following command, where 10 represents the number of times the word "graciously" will be treated as if it had appeared in spam messages, "Host1" represents the hostname, and "graciously" is the word:
antispamseeder.exe -wgraciously -c10 -spam -hHost1
- In essence, you are altering the entry for the word "graciously" in the antispam-table.txt file, thus increasing the likelihood that this word will be identified as spam in future e-mails.
- Before running the above command, the entry for this word looked like this in the antispam-table.txt file:
- graciously,583326,62,2
- After running the above command, the entry looks like this:
- graciously,583326,62,10
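To see why the change makes the word look more like spam, the short Python sketch below compares the share of spam occurrences before and after the command. The ratio used here is a deliberately naive stand-in; the formula that the statistical filter actually applies is not described in this document, so treat spam_fraction as a hypothetical helper.

```python
def spam_fraction(good_count: int, spam_count: int) -> float:
    """Share of a word's recorded occurrences that came from spam."""
    return spam_count / (good_count + spam_count)


before = spam_fraction(62, 2)    # graciously,583326,62,2
after = spam_fraction(62, 10)    # graciously,583326,62,10
print(f"before: {before:.3f}, after: {after:.3f}")
# prints "before: 0.031, after: 0.139"
```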
Deleting Words from the Antispam-table.txt File
You can use antispamseeder.exe to delete infrequently occurring words from a host's antispam-table.txt file. You may want to delete these words to save space and speed up processing. This command works by eliminating all words that have occurred fewer than the number of times specified.
To determine if you need to perform this procedure, open the antispam-table.txt file, located in the host's directory, and see if there are a significant number of words that have occurred infrequently. See "Reading the Antispam-table.txt File" for information on how to determine this.
- From the command prompt, enter the following command:
antispamseeder.exe -x -c<total word count> -h<hostname>
Note: The number entered for the total word count must be positive.
- The words that have occurred fewer times than the total word count entered in the command are removed from the antispam-table.txt file.
- Example
- If you want to remove all words from the antispam-table.txt file that have occurred fewer than five times in all e-mail messages, enter the following command, where "Host" is the name of the host:
antispamseeder.exe -x -c5 -hHost
After running the above command and reopening the antispam-table.txt file, you will notice that all words that had previously occurred fewer than five times are gone.
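Before running the delete command, it can help to preview which words a given threshold would remove. The Python sketch below does this on a read-only basis. It assumes that the "total word count" is the sum of the non-spam and spam counts and that entries use the word,identifier,non-spam,spam layout of the "graciously" example. The sample entries and the words_below_threshold helper are made up for illustration.

```python
from typing import Iterable, List


def words_below_threshold(lines: Iterable[str], threshold: int) -> List[str]:
    """Return words whose combined count is lower than threshold."""
    removed = []
    for line in lines:
        word, _identifier, good, spam = line.strip().rsplit(",", 3)
        # Assumption: total occurrences = non-spam count + spam count.
        if int(good) + int(spam) < threshold:
            removed.append(word)
    return removed


# Hypothetical entries, not taken from a real antispam-table.txt file.
sample = [
    "alpha,1001,3,1",   # total 4: below 5
    "beta,1002,2,3",    # total 5: kept
    "gamma,1003,0,2",   # total 2: below 5
]
print(words_below_threshold(sample, 5))
```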
Merging antispam-table.txt Files
You can use the antispamseeder.exe utility to merge two antispam-table.txt files. This is useful when you have modified your antispam-table.txt file, but you want to download the latest updated file from the Ipswitch website, or for combining the antispam-table.txt files of several domains. Using the procedure below, you can retain your customizations while gaining new statistical information from more recent spam.
To Merge two antispam-table.txt Files:
- Identify which antispam-table.txt files you want to merge.
- Merge the two files by entering the following command at the command prompt, substituting the hostname with the name of your mail host, and substituting antispam-table.txt with the name of the antispam table that you want to merge with that of the specified host:
antispamseeder.exe -t<antispam-table.txt> -h<hostname>
Antispamseeder reads the specified antispam-table.txt file, and compares it to the antispam-table.txt file for the specified host. Words that are not listed in the host's file are added to it. Since the spam and non-spam word counts for each antispam-table.txt file are different, the antispamseeder utility recalculates the counts for each word that is added. Therefore, new words are added with their existing word counts, and existing words are recalculated to balance the word counts of the two files.
Suppose that at installation, you chose to store the updated word statistics in the antispam-table-ini.txt file, and now you want to merge them with your existing antispam-table.txt file. Assuming that your host is named "Host1", you would enter the following command:
antispamseeder.exe -tantispam-table-ini.txt -hHost1
Creating a URL Domain Black List From a Mailbox
The easiest method to create a URL Domain Black List is to use the antispamseeder.exe utility. Antispamseeder will extract the domain names from the HTML code of collected spam messages. The procedure for doing this is described below.
antispamseeder.exe -lo [-e<exclude>] -h<hostname> -m<mailbox>
"Exclude" represents the exclude file. You must create an exclude text file if a mailbox contains domain names that you do not want to include in your URL Domain Black List (for example, your own domain name). The exclude file must be a text document and contain only one entry per line. It can contain both domain names and IP addresses, and must be placed in the host's top directory.
Suppose you have a host named "Host1", and want to update the URL Domain Black List using the messages in a mailbox called "spam". You have also created an exclude file called excludedomains.txt. You would enter the following command:
antispamseeder.exe -lo -eexcludedomains.txt -hHost1 -mC:\Imail\Host1\Users\root\spam.mbx
The new domain names will now be displayed in the URL Domain Black List box on the Content Filtering (HTML) tab.
Antispamseeder examines each message in the "spam" mailbox for HTML code, specifically HREF and IMG SRC tags. When one of these tags is found, the primary domain name is extracted and added to the URL Domain Black List. The new URL domain names then appear under the URL Domain Black List on the Content Filtering (HTML) tab.
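The extraction step described above can be approximated in a few lines of Python. This is only a sketch of the general idea, not the utility's actual logic: it collects the host name from each HREF and IMG SRC attribute and drops any host listed in an exclude set. It does not attempt to reduce a host name to its primary domain, because the rule antispamseeder.exe uses for that is not described here. The class name, function name, and sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlparse


class LinkDomainParser(HTMLParser):
    """Collect host names from <a href=...> and <img src=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.domains: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                host = urlparse(value).hostname
                if host and host not in self.domains:
                    self.domains.append(host)


def extract_domains(html: str, exclude: Set[str]) -> List[str]:
    """Return host names found in the HTML, minus excluded entries."""
    parser = LinkDomainParser()
    parser.feed(html)
    return [d for d in parser.domains if d not in exclude]


body = (
    '<a href="http://spam.example/buy">Buy</a>'
    '<img src="http://img.spam.example/x.gif">'
    '<a href="http://mycompany.example/">Home</a>'
)
print(extract_domains(body, exclude={"mycompany.example"}))
```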
Creating a URL Domain Black List and antispam-table.txt
You can create an antispam-table.txt file and URL Domain Black List at the same time, by using the same mailbox to accomplish both tasks. Enter the following command:
antispamseeder.exe -l -e<exclude.txt> -h<hostname> -m<mailbox>
- Exclude represents the Exclude file.
- Hostname is the hostname of the host for which you are updating the antispam-table.txt file and the URL Domain Black List.
- Mailbox is the mailbox that contains the spam messages you want to use to create the URL Domain Black List, and the antispam-table.txt file. The mailbox must contain only spam messages, because all domain names in the URL Domain Black List are considered spam domains.
Configuring the Anti-Spam Engine to Identify Wildcards
When the anti-spam engine scans an e-mail, it breaks the e-mail down into individual words. Each character in each word is checked to make sure it is valid. The anti-spam engine does not recognize numbers or non-alphabetic characters (except hyphens). When comparing words to the antispam-table.txt file, non-alphabetic characters and numbers are treated as a "-". So, if the word 2Sexy is found in an e-mail, it is treated as -sexy when it is compared to the word list.
If you want the anti-spam engine to identify such words as spam or non-spam, you must enter them into the antispam-table.txt file, using antispamseeder.exe. To do this, complete the following steps:
- From the command prompt, enter the following command:
antispamseeder.exe -w<word> -c<word count> [-spam/-good] -h<hostname>
- See "Command Syntax" for explanations of each parameter.
- The word that you entered in the above command will now be identified as either spam or non-spam, depending on which parameter you entered.
Note: The word count must be positive.
If you want the anti-spam engine to identify the word 2Sexy as spam, add it to the antispam-table.txt file by entering the following command:
antispamseeder.exe -spam -w-sexy -c100 -hHost1.com
This command adds the word -sexy to the antispam-table.txt file as if it had occurred 100 times in spam e-mail. The word will now be treated as a spam indicator by the content filters.
If you want the anti-spam engine to identify the word "g00d" (with zeros) as spam, you must enter the word into the antispam-table.txt file by running the following command, substituting dashes for the non-alphabetic characters. In this example, "Host1" is the hostname and "g- -d" is the word you want to be recognized as spam:
![]()
Once you run the above command, the anti-spam engine will recognize any variant of the word g- -d as spam, such as g00d, g**d, etc. This command does not change the word count for the word "good" because it does not contain any non-alphabetic characters.
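The substitution rule described in this section can be sketched in Python as follows. The sketch assumes that every character other than a letter is replaced by its own "-" and that words are lowercased, which is inferred from the 2Sexy example; the exact rules the anti-spam engine applies are not spelled out here, and normalize_word is a hypothetical name.

```python
import re


def normalize_word(word: str) -> str:
    """Lowercase a word and replace each non-letter character with '-'."""
    # Hyphens match the pattern too, but replacing '-' with '-' leaves them intact.
    return re.sub(r"[^a-z]", "-", word.lower())


for raw in ("2Sexy", "g00d", "good"):
    print(raw, "->", normalize_word(raw))
```

A helper like this can be used to work out which form of a word to pass to the -w parameter.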
Preparing Mailboxes for use with antispamseeder.exe
Before a mailbox can be used by antispamseeder.exe to create or alter the antispam-table.txt file, several preliminary steps must be performed.
- First, make sure that each mailbox contains the same type of e-mail messages. For example, one mailbox should contain only spam messages, and another mailbox should contain only non-spam messages.
- Next, make sure that all mailboxes contain roughly the same number of e-mail messages. If one mailbox contains substantially more e-mail messages than the other, the word counts will be skewed and content filtering may not function correctly.
- Finally, you need to clean up any forwarded e-mail messages. Sometimes, a mailbox will contain messages that were forwarded to you by a user (i.e. false positives that the user wants added to the non-spam word counts). If this is the case, you will need to examine each forwarded e-mail and remove any information that was not included in the original e-mail, before using the mailbox with antispamseeder.exe. Information that must be removed is anything that was inserted by the user's e-mail client when the message was forwarded, such as the following:
Failure to complete the steps above may result in an inaccurate antispam-table.txt file, which will cause statistical filtering to malfunction.
Ipswitch, Inc. http://www.ipswitch.com
©Ipswitch 2004