Slicing Up Spam

1. I’m sick of spam! How can I get rid of it?

One popular method for filtering unsolicited commercial email, or spam, is SpamAssassin (http://www.spamassassin.org). SpamAssassin is a rule-based mail filtering program that identifies spam. By itself, or combined with Razor (http://razor.sourceforge.net), a collaborative spam-tracking database, SpamAssassin can effectively eliminate spam from your daily doses of email.

SpamAssassin is written in Perl, so the initial installation is as simple as installing a new Perl module from the Comprehensive Perl Archive Network (CPAN). As root, run the following commands:


# perl -MCPAN -e shell
cpan> o conf prerequisites_policy ask
cpan> install Mail::SpamAssassin
cpan> quit
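After the CPAN session finishes, one quick way to confirm that the module is loadable is to ask Perl to import it. The following is a minimal sketch; the helper name have_perl_module is made up for illustration and is not part of CPAN or SpamAssassin.

```shell
#!/bin/sh
# have_perl_module NAME: succeed if Perl can load the module NAME.
# (Illustrative helper name, not part of CPAN or SpamAssassin.)
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module Mail::SpamAssassin; then
    echo "Mail::SpamAssassin is installed"
else
    echo "Mail::SpamAssassin is not installed"
fi
```

The check only proves that Perl can find and load the module; it says nothing about whether the rule files or spamd are set up.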

According to the install document, “Perl 5.8 now uses Unicode internally by default, which causes trouble for SpamAssassin (and almost all other reasonably complex pieces of Perl code!)” So, if you’re using Red Hat, you should set the LANG environment variable to “U.S. English” (i.e., export LANG=en_US) before SpamAssassin is invoked.

Once SpamAssassin is installed, you can use it in one of two ways: through procmail, or as part of a mail transport agent (MTA) such as sendmail, postfix, or qmail. Let’s focus on the first choice. (Read the SpamAssassin documentation for help with MTAs.)

Because SpamAssassin is written in Perl, launching it each time that it’s needed can be very costly, especially if an individual or a system receives large amounts of email. For higher-volume traffic, it’s recommended that you use spamd, the SpamAssassin daemon, and spamc, the SpamAssassin client.

So, assuming you use procmail and spamc, filtering your spam is quite simple: just add a couple of lines to the beginning of your .procmailrc file:


# Filter my email, looking for spam
:0fw
* < 256000
| /usr/bin/spamc

# Save spam in a file to verify, in case
# of false positives!
:0:
* ^X-Spam-Flag: YES
/home/mascio/SPAM
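To make the logic of the two recipes easier to follow, here is a rough shell sketch of what they do: pass small messages through spamc, then look for the X-Spam-Flag header. The function names filter_message and is_flagged_spam are hypothetical, and in a real setup procmail does this work itself.

```shell
#!/bin/sh
# Illustrative only: mirrors the two procmail recipes as shell functions.

SPAMC=${SPAMC:-/usr/bin/spamc}   # path taken from the recipe above
MAX_BYTES=256000                 # same limit as the "* < 256000" condition

# filter_message FILE: print FILE, run through spamc when the message is
# below the size limit and spamc is available; otherwise print it unchanged.
filter_message() {
    size=$(($(wc -c < "$1")))
    if [ "$size" -lt "$MAX_BYTES" ] && [ -x "$SPAMC" ]; then
        "$SPAMC" < "$1"
    else
        cat "$1"
    fi
}

# is_flagged_spam: read a message on stdin and succeed if its header
# section (everything before the first blank line) has "X-Spam-Flag: YES".
is_flagged_spam() {
    sed '/^$/q' | grep -q '^X-Spam-Flag: YES'
}
```

The size check exists because scanning very large messages is expensive and spam is rarely that big, so oversized mail skips the filter entirely.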

Why save spam? Because the default installation of SpamAssassin may incorrectly flag some of your legitimate email as spam. These errors are called false positives. If you save your spam, you can use the false positives to fine-tune SpamAssassin so it won’t treat the email you’re interested in as spam. See the man page for sa-learn, the SpamAssassin training utility, for more information on fine-tuning SpamAssassin.
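As a rough sketch of what training might look like, the wrapper below feeds an mbox file to sa-learn as either spam or ham (legitimate mail). The function name train_from_mbox and the DRY_RUN switch are assumptions made for this example; the --spam, --ham, and --mbox options are sa-learn's own, but check the man page for your installed version.

```shell
#!/bin/sh
# train_from_mbox KIND MBOX: feed an mbox file to sa-learn.
# KIND is "spam" or "ham". Set DRY_RUN=1 to print the command
# instead of running it. (Hypothetical wrapper, not part of SpamAssassin.)
train_from_mbox() {
    kind=$1
    mbox=$2
    case $kind in
        spam|ham) ;;
        *) echo "usage: train_from_mbox spam|ham MBOX" >&2; return 2 ;;
    esac
    if [ -n "$DRY_RUN" ]; then
        echo "sa-learn --$kind --mbox $mbox"
    else
        sa-learn "--$kind" --mbox "$mbox"
    fi
}
```

For example, DRY_RUN=1 train_from_mbox spam /home/mascio/SPAM prints the command that would be run on the saved spam file. False positives should be moved to a separate mbox first and learned with ham rather than spam.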

If you want to configure system-wide defaults for SpamAssassin, the spamassassin directory (found in one of /etc/mail, /usr/etc/mail, /usr/etc, /usr/local/etc, /usr/share, /usr/pkg/etc, or /etc) contains a collection of .cf files that set SpamAssassin’s tunable parameters. Read the documentation carefully before modifying any of the .cf files. Each user also has his or her own personal configuration files in the directory .spamassassin in their home directory. The file user_prefs allows the user to tune actions to recognize legitimate email and spam.

The simplest configuration changes are white lists and black lists. A white list is a list of email addresses or domains to pass through without checking. For example, some mailing lists from yahoogroups.com, groups.msn.com, or communities.msn.com have a tendency to be flagged as spam. By adding…


whitelist_to *@communities.msn.com
whitelist_to *@groups.msn.com
whitelist_to *@yahoogroups.com

… to either the system-wide or user-specific configuration files, SpamAssassin treats any email to addresses ending in those domains as legitimate email. However, note that any spam sent to the list is passed through as well!

The directive whitelist_from can be used to pass through email messages with a definite from address.

A black list is a list of domains and email addresses that SpamAssassin should block unconditionally. The syntax is similar to the white list directives:


blacklist_from *@spam.com

The previous line blocks all email from spam.com.
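If you add white list and black list entries often, a small helper can keep the preferences file free of duplicate lines. This is only a sketch: the function name add_pref is invented here, and the file path in the usage note assumes the per-user location described earlier.

```shell
#!/bin/sh
# add_pref FILE DIRECTIVE PATTERN: append "DIRECTIVE PATTERN" to FILE
# unless that exact line is already there.
# (Hypothetical helper, not part of SpamAssassin.)
add_pref() {
    file=$1
    line="$2 $3"
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -x matches whole lines only, -F treats the text as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

A call such as add_pref "$HOME/.spamassassin/user_prefs" whitelist_from 'friend@example.com' would add one whitelist_from line, and repeating the same call leaves the file unchanged. The address friend@example.com is a placeholder.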

Again, it’s a good idea to save spam and review it from time to time to make sure nothing unexpected is happening, and to train the SpamAssassin rule set.
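For the periodic review, a quick scan of the saved spam file is often enough to spot a false positive. The sketch below assumes the file is in mbox format, where each message begins with a line starting with "From " (which is how procmail writes to a plain file); the function names are illustrative.

```shell
#!/bin/sh
# count_messages MBOX: print the number of messages in an mbox file,
# counted by the "From " separator line that starts each message.
count_messages() {
    grep -c '^From ' "$1"
}

# list_subjects MBOX: print the Subject: header of each message.
# Only lines in a header section (between a "From " separator and the
# next blank line) are considered, so body text is ignored.
list_subjects() {
    awk '/^From / { hdr = 1; next }
         hdr && /^$/ { hdr = 0 }
         hdr && /^Subject:/ { print }' "$1"
}
```

Running list_subjects /home/mascio/SPAM gives a one-line-per-message overview that is much faster to skim than the full file.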

2. How can I make rm, cp, and mv friendlier to novices?

Since rm, cp, and mv each affect your files, it can be helpful, especially if you’re a novice, to make the commands interactive so that they prompt for confirmation before changing the file system. Luckily, that’s easily done: all three commands have a -i or --interactive option. When prompted, simply respond with either a “y” or “n.”
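To see the prompt in action without risking real files, you can try the option in a scratch directory. The sketch below supplies the answers on standard input instead of typing them; the file names are arbitrary.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
touch "$scratch/red" "$scratch/green"

# Answer "n": rm -i leaves the file in place.
printf 'n\n' | rm -i "$scratch/red"

# Answer "y": rm -i removes the file.
printf 'y\n' | rm -i "$scratch/green"

# Only "red" is left.
ls "$scratch"
```

Typing the answers by hand at a terminal behaves the same way; the pipe is used here only so the example can run unattended.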

To make the option automatic, create three shell aliases in your shell start-up file (.profile for bash or .login for tcsh) like so:


# for bash, add these to .profile
alias rm='rm --interactive'
alias cp='cp --interactive'
alias mv='mv --interactive'

# for tcsh, add these to .login
alias rm 'rm --interactive'
alias cp 'cp --interactive'
alias mv 'mv --interactive'

With these aliases, every time you use rm, cp, or mv, each command prompts you before doing anything. Again, just type “y” or “n” to continue.

One caveat: shell aliases are ignored whenever you execute a command via its fully-qualified path name, such as /bin/rm. In that case, you must type /bin/rm -i …. The following commands demonstrate the usefulness of the aliases:


% ls
bar blue foo green grok red
% rm red
remove red? n
% rm red
remove red? y
% mv green grok
overwrite grok? y
% rm b*
remove bar? y
remove blue? y
% /bin/rm *
% ls
%

As you can see, the interactive option can prevent frustrating mistakes such as removing all of your files.



John R. S. Mascio is a systems and network manager. You can submit your questions to John at mascio@ryu.com.
