More Mail Filtering with procmail

Welcome to the fourth and final installment of our look at administering electronic mail. Last month, we began talking about procmail, a powerful general-purpose mail-filtering facility, and its ability to sort (and possibly reject) incoming messages based on any criteria you desire. This month we’re going to look at some more advanced uses of procmail, such as identifying spam messages and scanning incoming mail for viruses.

Using procmail to Discard Spam

In order for procmail to successfully identify spam messages, you must be able to describe the characteristics of the messages you want to treat as spam and then write recipes accordingly.

Let’s take a look at several recipes that may be useful as models for handling spam. They come from my own .procmailrc file, and so are applied only to my mail. As an administrator, you can choose to deal with spam at several levels: through the transport agent (e.g., checking against blacklists), at the system level, and/or on a per-user basis. In the case of procmail-based filtering, anti-spam recipes can be used in a system-wide procmailrc file or made available to users wanting to filter their own mail.

The following recipe is useful at the beginning of any procmail configuration file, as it formats mail headers into a predictable format:

# Make sure there's a space after header names
:0 fhw
| formail -z

The next two recipes provide simple examples of one approach to handling spam:

# Mail from mailing lists I subscribe to
:0:
* ^From: RISKS List Owner|\
^From: Mark Russinovich
to-read

# Any other mail not addressed to me is spam
# Warning: may discard BCC's to me
:0
* !To: .*aefrisch
/dev/null

Spam is discarded by the second recipe, which defines it as mail not addressed to me. The first recipe saves mail from a couple of specific senders to the file to-read. It defines exceptions to the second recipe, since it saves messages from these two senders regardless of whom they are addressed to. I include it because I want to keep the mail from the mailing lists corresponding to these senders, which does not arrive addressed to me. In fact, other recipes fall between these two, since quite a few exceptions must be handled before I can discard every message not addressed to me. Here are two of them:

# Mail not addressed to me that I know I want
* !To: .*aefrisch
* ^From: .*oreilly\.com|\
^From: .*marj@zoas\.org|\
^From: aefrisch

# Keep these just in case

The first recipe saves mail sent from the specified domain and the remote user marj@zoas.org via the first two condition lines. I include this recipe because I receive mail that is not addressed to me from these sources (and thus can resemble spam because of the way their mailer programs handle personal mailing lists). I also retain messages that I send to myself, which result from a CC or BCC on an outgoing message.

The second recipe saves messages addressed to any variant of “Undisclosed Recipients” to a file called spam. Such mail is almost always spam, but once in a while I discover an exception.
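
A minimal sketch of how these two recipes might be completed. The conditions of the first come from the listing above; its destination file (to-read here) and the “Undisclosed Recipients” pattern in the second are assumptions, since neither is spelled out in the text.

```procmail
# Mail not addressed to me that I know I want.
# ASSUMPTION: the destination "to-read" is a guess; the text does not name it.
:0:
* !To: .*aefrisch
* ^From: .*oreilly\.com|\
^From: .*marj@zoas\.org|\
^From: aefrisch
to-read

# Keep these just in case.
# ASSUMPTION: this pattern is one plausible way to catch "any variant"
# of Undisclosed Recipients; procmail matches without regard to case.
:0:
* ^To:.*undisclosed.*recipients
spam
```

The trailing colon in `:0:` requests a local lockfile, which is worth having whenever the action writes to an ordinary mail file.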

The next few recipes in my configuration file handle mail that is addressed to me but is still spam. This recipe discards mail with any of the specified strings anywhere in the message headers:

# Vendors who won't let me unsubscribe
:0
* cdw buyer|spiegel|ebizmart|bluefly gifts|examcram
/dev/null

Such messages are spam sent by vendors from which I once bought something and who ignore my requests to stop sending me e-mail.

The next two recipes identify other spam messages based on the Subject: header:

# Assume screaming headers are spam
:0 D
* ^Subject: [-A-Z0-9\?!._ ]*$
/dev/null

# More spam patterns
:0
* ^Subject: .*(\?\?|!!|\$\$|viagra|make.*money|out.*debt)
/dev/null

The first recipe discards messages whose subject consists entirely of uppercase letters, numbers, and a few other characters. Its D flag makes the match case sensitive; procmail otherwise ignores case, and the pattern would match lowercase subjects as well. The second recipe discards messages whose subject lines contain two consecutive exclamation marks, question marks, or dollar signs, the word Viagra, make followed by money, or out followed by debt (with or without any intervening text in the latter two cases).

It is also possible to check mail senders against the contents of an external file containing spam addresses, partial addresses, or any other patterns to be matched:

# Check my blacklist (a la Timo Salmi)
* ? formail -x"From" -x"From:" -x"Sender:" -x"X-Sender:" \
-x"Reply-To:" -x"Return-Path" -x"To:" | \
egrep -i -f $HOME/.spammers

This recipe is a slightly simplified one from Timo Salmi. It uses formail to extract just the text from selected headers; it then pipes the resulting output into the egrep command, taking the patterns to match from the file specified to its -f option (-i makes matches case insensitive).

My spam identification techniques are very simple and therefore quite aggressive. Some circumstances call for more restraint than I am inclined to use. There are several ways of tempering such a drastic approach. The most obvious is to save spam messages to a file rather than simply discarding them. Another is to write more detailed recipes for identifying spam. Here is an example:

# Discard if From: = To:
SENTBY=`formail -z -x"From:"`
:0
* ! ^To: aefrisch
* $ ^To: .*$SENTBY
/dev/null

This recipe discards messages whose sender and recipient addresses are the same (a classic spam characteristic) and whose recipient is not my address. The contents of the From: header are extracted to the SENTBY variable via the backquoted formail command. The second condition then examines the To: header for the same string; its leading $ tells procmail to expand variables in the condition before matching. More complex versions of such a test are also possible (e.g., one could examine more headers than just From:).
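
The gentler alternative mentioned earlier, saving suspected spam instead of discarding it, is a small change to any of the discarding recipes. Here is a sketch using the “More spam patterns” test; the folder name spam is borrowed from the Undisclosed Recipients recipe, and you may prefer a different one.

```procmail
# Same Subject: test as the "More spam patterns" recipe, but the
# message is kept in a folder for later review instead of discarded.
# ":0:" (with the trailing colon) asks procmail to use a local
# lockfile, which matters once the action is a real file.
:0:
* ^Subject: .*(\?\?|!!|\$\$|viagra|make.*money|out.*debt)
spam
```

Skimming such a folder for a week or two is a cheap way to find out whether a pattern is too aggressive before switching the action to /dev/null.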

Many spam recipes created by various people are available on the Web. An excellent set was written by Jon Peterson (see http://www.eskimo.com/~rjb2/UNIX/procmail.html). It’s easy to include an external filter using the procmail configuration file’s INCLUDERC keyword:

# Jon Petersen’s filters

These entries define the SPAMBOX variable (used as the spam message destination by the filters) and invoke the various jon* filter files. (I did find filters 1c and 1d too aggressive and had to comment them out.)
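
A sketch of what entries of this kind might look like. The SPAMBOX value, the directory, and the file names below are all hypothetical; substitute the names used by the filter set as you downloaded it.

```procmail
# Where the included filters should deliver what they catch.
# ASSUMPTION: this path is illustrative only.
SPAMBOX=$HOME/Mail/spam

# Each INCLUDERC line reads the named file as if its recipes
# appeared at this point in the configuration file.
# ASSUMPTION: directory and file names are hypothetical.
INCLUDERC=$HOME/.procmail/jon-filter-a.rc
INCLUDERC=$HOME/.procmail/jon-filter-b.rc
# INCLUDERC=$HOME/.procmail/jon-filter-c.rc    # too aggressive; disabled
```

Commenting out an INCLUDERC line is the simplest way to disable one filter file without touching the others.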

See the links below for other filter sources as well as general procmail-related information.

Using procmail for Security Scanning

procmail’s pattern matching and message disposition features can also be used to scan incoming mail messages for security purposes: for viruses, unsafe macros, and so on. You can create your own recipes to do so, or you can take advantage of the ones that other people have written and generously made available.

In this brief section, we will take a look at Bjarni Einarsson’s Anomy Sanitizer (see http://mailtools.anomy.net/sanitizer.html). This package is written in Perl and requires a basic knowledge of Perl regular expressions to configure. Once configured, you can run the program via procmail using a recipe like this one:


This recipe uses the sanitizer.pl script as a filter on all messages (run synchronously), using the configuration file given as the script’s argument.
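
A minimal sketch of a recipe matching that description. The location of sanitizer.pl is an assumption (adjust it to your installation); the configuration file path is the conventional one.

```procmail
# Run every message through the sanitizer before later recipes see it.
#   f = the program is a filter: its output replaces the message
#   w = wait for it to finish and check its exit code
# No conditions are given, so the recipe applies to all mail.
# ASSUMPTION: /usr/local/bin is a hypothetical install location.
:0 fw
| /usr/local/bin/sanitizer.pl /etc/sanitizer.cfg
```

Because this is a filter rather than a delivering recipe, procmail continues with the recipes that follow it, which then operate on the sanitized message.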

The package’s configuration file, conventionally /etc/sanitizer.cfg, contains two types of entries: general parameters indicating desired features and program behavior, and definitions of file/attachment types and the way they should be examined and modified.

Figure One illustrates a few examples of the first sort of configuration file entries.

Figure One: Sample Configuration File Entries – General Parameters

# Global parameters
feat_log_inline = 1 # Append log to modified messages.
feat_log_stderr = 0 # Don’t log to standard error also.
feat_verbose = 0 # Keep logging brief.
feat_scripts = 1 # Sanitize incoming shell scripts.
feat_html = 1 # Sanitize active HTML content.
feat_forwards = 1 # Sanitize forwarded messages.

# Template for saved file names
file_name_tpl = /var/quarantine/saved-$F-$T.$$

The first group of entries specifies various aspects of sanitizer.pl’s behavior, including level of detail and destinations for its log messages, as well as whether certain types of message content should be “sanitized” (examined and potentially transformed to avoid security problems). The final entry specifies the location of the package’s quarantine area: the directory where potentially dangerous parts of mail messages are stored after being removed.

The next set of entries, shown in Figure Two, enables file/attachment-extension-based scanning and specifies the number of groups of extensions that will be defined, as well as default actions for all other types:

Figure Two: Sample Configuration File Entries – Definitions

feat_files = 1 # Use type-based scanning.
file_list_rules = 3 # We will define 3 groups.
# Set defaults for all other types
file_default_policy = defang # Rewrite risky constructs.
file_default_filename = unnamed.file # Use if no file name given.

A sanitizer policy indicates how a mail part/attachment will be treated when it is encountered. Table One contains the most important defined sanitizer policies.

Table One: Defined Sanitizer Policies

Policy    Description
mangle    Rewrite the file name to avoid reference to a potentially dangerous extension (e.g., something of the form DEFANGED-nnnnn).
defang    Rewrite the file content and rename it to eliminate potentially dangerous items. For example, JavaScript in HTML attachments is neutralized by rewriting the opening tag of the script: <DEFANGED_SCRIPT language=JavaScript>
accept    Accept the attachment as is.
drop      Delete the attachment without saving it.
save      Remove the attachment but save it to the quarantine directory.

We’ll now turn to some example file type definitions. The set of entries in Figure Three defines the first file type as the filename winmail.dat (the composite mail message and attachment archive generated by some Microsoft mailers) and all files with the extensions .exe, .vbs, .vbe, .com, .chm, .bat, .sys, or .scr.

Figure Three: Defining File Types – Part I

# Always quarantine these file types
file_list_1_scanner = 0
file_list_1_policy = save
file_list_1 = (?i)(winmail\.dat
file_list_1 += |\.(exe|vb[es]|c(om|hm)|bat|s(ys|cr))*)$

Notice that the file_list_1 parameter defines the list of file names and extensions using Perl regular expression syntax. The policy for this group of files is save, meaning that files of these types are always removed from the mail message and saved to the quarantine area. The attachment is replaced by some explanatory text within the modified mail message:

NOTE: An attachment was deleted from this part of the message because it failed one or more checks by the virus scanning system. The file has been quarantined on the mail server, with the following file name:


This message is a bit inaccurate, since in this case the attachment was not actually scanned for viruses but merely identified by its file type. Still, it includes the information that the user will need.

Clearly, it will be necessary to inform users about any attachment removal and/or scanning policies that you institute. It will also be helpful to provide them with alternative methods for getting files of prohibited types that they may actually need. For example, they can be taught to send and receive word processing documents as Rich Text Format files rather than, say, MS Word documents.

Figure Four shows two more examples of file group definitions. The first section of entries defines some file types that can be passed through unexamined (via the accept policy). The second group defines some extensions for which we want to perform explicit content scanning for dangerous items, including viruses and embedded macros in Microsoft documents. The file_list_3 extension list includes extensions corresponding to various Microsoft documents and templates (e.g., .doc, .xls, .dot, .ppt, and so on) as well as a variety of popular archive extensions.

Figure Four: Defining File Types – Part II

# Allow these file types through: images, music, sound, etc.
file_list_2_scanner = 0
file_list_2_policy = accept
file_list_2 = (?i)\.(gif|jpe?g|pn[mg]
file_list_2 += |x[pb]m|dvi|e?ps|p(df|cx)|bmp
file_list_2 += |mp[32]|wav|au|ram?
file_list_2 += |avi|mov|mpe?g)*$

# Scan these file types for macros, viruses
file_list_3_scanner = 0:1:2:builtin 25
file_list_3_policy = accept:save:save:defang
file_list_3 = (?i)\.(xls|d(at|oc|ot)|p(pt|l)|rtf
file_list_3 += |ar[cj]|lha|[tr]ar|rpm|deb|slp|tgz
file_list_3 += |(\.g?z|\.bz\d?))*$

The scanner and policy parameters for this file group now contain four entries. The file_list_3_scanner parameter’s four colon-separated subfields define four sets of return values from the scanning program: 0, 1, 2, and all other values. Each of the first three subfields can hold a single return value or a comma-separated list of them. The final subfield does double duty: it names the program to run (here the keyword builtin, which requests sanitizer.pl’s built-in scanning routines, with the argument 25), and it serves as a placeholder for all return values not explicitly named in the earlier subfields.

The subfields of the file_list_3_policy parameter define the policy to be applied when each return value is received. In this case, we have the behavior shown in Figure Five.

Figure Five: Return Policies

Return Value    Action
0               Accept the attachment.
1               Remove and save the attachment.
2               Remove and save the attachment.
all others      Modify the attachment to munge any dangerous constructs.

By default, the sanitizer.pl script checks macros in Microsoft documents for dangerous operations (e.g., attempting to modify the system registry or the normal template). However, I want to be more conservative and quarantine all documents containing any macros. In order to do so, one must modify the script’s source code. Figure Six shows a quick and dirty solution to my problem, which consists of adding a single line to the script.

Figure Six: Quarantine All Macros

# Lots of while loops here - we replace the leading \000 boundary
# with 'x' characters to ensure this eventually completes.
$score += 99 while ($buff =~ s/\000(Macro recorded)/x$1/i);
$score += 99 while ($buff =~ s/\000(VirusProtection)/x$1/i);

The first line incrementing $score was added to detect macros that have been recorded by the user within the document. This solution is not ideal; there are other methods of creating macros which would avoid this string, but it illustrates what is involved in extending this script if needed.

Debugging procmail

Setting up procmail configuration files can be both addictive and time-consuming. To make debugging easier, procmail provides some logging capabilities, specified with these configuration file entries:


These variables set the path to the log file and specify that all messages directed to files be logged. If you would like even more information, including a recipe-by-recipe summary for each incoming message, add this entry also:
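
A sketch of the logging entries described here, using procmail’s standard LOGFILE, LOGABSTRACT, and VERBOSE variables. The log file path is an assumption; any file you can write to will do.

```procmail
# Path to the log file.
# ASSUMPTION: this location is illustrative only.
LOGFILE=$HOME/.procmail.log

# Log an abstract for every message delivered by a recipe.
LOGABSTRACT=all

# Optional: add a recipe-by-recipe trace for each incoming message.
# The log grows quickly with this on, so turn it off again when
# you have finished debugging.
VERBOSE=yes
```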


Here are some additional hints for debugging procmail recipes:

* Isolate everything you can from the real mail system. Use a test directory as MAILDIR when developing new recipes to avoid clobbering any real mail, and place them in a separate configuration file. Similarly, use a test file of messages rather than any real mail file by using a command like this one:

cat file | formail -s procmail rcfile

This command allows you to use the prepared message file and also to specify the alternate configuration file.

* When testing spam-related recipes, send messages to a file while you are debugging rather than to /dev/null.

* If you are trying to test the matching conditions part of a recipe, then use a simple, uniquely-named file as the destination and incorporate the more complex destination expression only when you have verified that the conditions are constructed correctly.
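
The hints above can be combined into a small scratch configuration file. This is only a sketch: the file name test.rc, the directory, and the folder names are all hypothetical, and the directory must exist before you run the test.

```procmail
# test.rc: scratch configuration for trying out new recipes.
# ASSUMPTION: all paths and folder names here are illustrative.

# Keep everything inside a test directory, away from real mail.
MAILDIR=$HOME/procmail-test

# Messages that match no recipe land here instead of in the
# real system mailbox.
DEFAULT=$MAILDIR/unmatched

# Log everything while debugging.
LOGFILE=$MAILDIR/log
VERBOSE=yes

# The recipe under test: simple, uniquely named destination first;
# switch to the real action only once the conditions are right.
:0:
* ^Subject: .*(\?\?|!!)
matched
```

Run it from the directory holding the file as `cat file | formail -s procmail ./test.rc`. The leading ./ matters, since procmail otherwise looks for a relative rcfile name in your home directory.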

You can also run the sanitizer.pl script to test your configuration with a command like this:

# cat mail-file | /path/sanitizer.pl config-file

You will also want to include this line within the configuration file:

feat_verbose = 1       # Produce maximum detail in log messages.

Wrapping Up

As you can see, procmail is an incredibly powerful and flexible utility that provides a mail administrator with a tremendous amount of control over his environment. After four months of looking at e-mail administration, we’re ready to move on to new topics. Tune in next month, when we’ll take an in-depth look at adding disks to your system and managing filesystems. In the meantime, have fun experimenting with procmail.

Sources of Additional Information

Here are some other useful procmail-related Web pages:

Nancy McGough/Infinite Ink’s “Procmail Quick Start” page


Timo Salmi’s wonderful “Procmail Tips and Recipes” page


The official procmail FAQ page


A very large collection of procmail-related links


Æleen Frisch is the author of Essential System Administration. She can be reached at aefrisch@lorentzian.com.
