Overwhelmed by email? Or is your setup okay? We'll bet you can do better. Here are some power tools to turn your post from a pain to a pleasure.
If you’re like most IT professionals, you get floods of email. Some is probably worthless, but you need to read, organize, and store all the rest. All email programs — sometimes referred to as Mail User Agents or MUAs — let you read and handle incoming mail and send new messages. Most modern MUAs have lots of handy features, but most of them are also crippled by their “one-size-fits-all” interface, with only the features that the MUA’s developers put there. (And if Microsoft wrote the MUA, you’re even more restricted by a proprietary storage format and vulnerabilities that you can’t fix yourself.)
There’s no one “perfect” solution for every email user, from casual to pro. So, in this article, we won’t try to choose one. Instead, we’ll help you make a system that fits you. By learning the fundamentals of how email is transmitted, received, edited, and stored, you can build your own custom solution. Or, if you use an “off-the-shelf” package, you’ll still be able to understand what’s going on “under the hood.” Let’s dig in!
What’s in a Message?
To automate email processing, it helps to know how a message is constructed and delivered. Armed with that information, you can then parse messages, search through them, and edit them with a script or a mail-filtering system.
An email message is a series of lines of characters. It has two parts, the header and the body, which are separated by an empty line. The entire message is transmitted in an envelope, typically via the SMTP or ESMTP protocol.
An email message header carries meta-information about the message as a series of header fields. Listing One shows some fields from a typical header. The header begins on the first line of the message. Each header field starts in the first character position of a line, and long fields can be split across lines as long as the continuation lines are indented with space or tab characters, as the Received:, cc:, and Content-Type: fields are in Listing One. The header ends with an empty line (no whitespace allowed). (If you’re wondering why there’s a plus sign + in the cc: header, see the sidebar “Power Tip for Mail Administrators.”)
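The header rules above (fields start in column one, continuation lines are indented, an empty line ends the header) can be sketched in a few lines of Python. This is a minimal illustration, not a full parser; the function name split_message and the example.com addresses are made up for the example.

```python
def split_message(raw: str):
    """Split a raw message into a list of (field name, value) pairs and the body."""
    # The first empty line separates the header from the body.
    head, _, body = raw.partition("\n\n")
    fields = []
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and fields:
            # An indented line continues the previous header field.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            fields.append((name, value.strip()))
    return fields, body


# Hypothetical message with a cc: field split across two lines.
sample = (
    "From: sender@example.com\n"
    "cc: alice@example.com,\n"
    "\tbob@example.com\n"
    "Subject: Test\n"
    "\n"
    "Hello.\n"
)
fields, body = split_message(sample)
```

Here the two cc: lines come back as a single field, and body holds everything after the empty line.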
The other part of a message is the body. Originally, a message body was just a series of lines of plain seven-bit ASCII text. The MIME standard extended the seven-bit email world to carry non-English languages and data reliably. If a message header has a MIME-Version field, the body is in MIME format; Content-Type describes the body format.
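As a rough sketch of that check, Python’s standard email package can report whether a message declares MIME-Version and what its Content-Type says. The sample message below is invented for illustration.

```python
from email import message_from_string

# Hypothetical message; the Content-Type field is folded onto two lines.
raw = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain;\n"
    "\tcharset=\"utf-8\"\n"
    "\n"
    "Hello.\n"
)

msg = message_from_string(raw)

# A MIME-Version field signals that the body is in MIME format.
is_mime = msg["MIME-Version"] is not None

# Content-Type describes the body: its media type and any parameters.
content_type = msg.get_content_type()
charset = msg.get_content_charset()
```

A filtering script could branch on content_type, for instance to treat multipart messages differently from plain text.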
Mail Folders in mbox Format
Most email systems store multiple messages in a folder or mailbox — which, on many systems, is actually a plain-text file in mbox format. Each message begins with a separator line composed of the word From followed by a space, the envelope sender address, and the date received. For example, From jpeek Thu Jun 26 08:15:09 2003 is a separator line. Messages often end with an empty line, but this isn’t required. Listing Two shows a typical folder in mbox format with three (short) messages.
Listing Two: Three messages in an mbox-format folder
From email@example.com Thu Jun 26 05:03:59 2003
Received: from mail.foo.xyz ([220.127.116.11])
    by mail.jpeek.com (Postfix) with ESMTP
…most of header and body omitted…
From jpeek Thu Jun 26 08:15:09 2003
Received: from jpeek.com (kumquat.jpeek.com
…most of header and body omitted…
and thanks for the great report, Joe.
>From now on, let’s do them just like this!
From firstname.lastname@example.org Thu Jun 26 08:22:33 2003
Received: from msater.ru ([18.104.22.168])
…most of header and body omitted…
to get rich <EM>fast</EM>!!!!!!
Notice that the last line of the second message starts with a greater-than (>) character. This line was escaped upon delivery to the folder because the original message line started with From and a space; left unchanged, this line would (wrongly) act as a message separator. This is one problem with the mbox format. The sidebar “Mail, MH-style” explains one solution.
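To see why the escaping matters, here is a minimal Python sketch of a naive mbox reader and the matching escape step. The function names escape_from_lines and split_mbox and the sample folder are assumptions made for this example; real delivery agents and MUAs differ in the details.

```python
def escape_from_lines(body: str) -> str:
    """Prefix '>' to any body line that starts with 'From ' (mbox escaping)."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )


def split_mbox(text: str):
    """Naively split mbox text: every line starting with 'From ' begins a message."""
    messages = []
    for line in text.splitlines():
        if line.startswith("From "):
            messages.append([line])
        elif messages:
            messages[-1].append(line)
    return ["\n".join(lines) for lines in messages]


# Hypothetical two-message folder; the second body has a line starting "From ".
folder_text = (
    "From sender@example.com Thu Jun 26 05:03:59 2003\n"
    "Subject: first\n"
    "\n"
    "Hello.\n"
    "\n"
    "From jpeek Thu Jun 26 08:15:09 2003\n"
    "Subject: second\n"
    "\n"
    + escape_from_lines("Thanks for the report.\nFrom now on, do them like this!")
    + "\n"
)
messages = split_mbox(folder_text)
```

Without the call to escape_from_lines, the same reader would see three messages instead of two, because the body line would look like a separator.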
Mail, MH-style
The mbox folder format, and the giant monolithic MUAs that have grown up to handle those folders, have disadvantages. Folders with large messages, or with lots of messages, become huge files that take time to parse and rearrange. Message lines starting with “From” and a space must be escaped. Meta-information must be kept somewhere else. For instance, some IMAP servers create a first message with a subject like “Do not remove this message” and store folder information in that message.
An alternative with a lot of advantages is the MH format. MH stores each message in a separate file, so there’s no separator line needed. An MH message folder is simply a Linux directory full of message files, where the filename (1, 2, etc.) is the message number. Using MH format, removing and reordering messages can be done with simple Linux utilities like rm and mv; you can also use MH-specific utilities to delete and renumber individual messages.
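Because an MH folder is just a directory of numbered files, ordinary file operations are enough to manage it. The Python sketch below builds a throwaway folder, deletes a message, and renumbers the rest. The helpers list_messages and pack_folder are hypothetical names for this example, not MH commands.

```python
import os
import tempfile


def list_messages(folder: str):
    """Return the message numbers in an MH-style folder, sorted numerically."""
    # Only all-digit filenames are messages; anything else is skipped.
    return sorted(int(name) for name in os.listdir(folder) if name.isdigit())


def pack_folder(folder: str):
    """Renumber messages to 1..N with no gaps, keeping their order."""
    for new, old in enumerate(list_messages(folder), start=1):
        if new != old:
            os.rename(os.path.join(folder, str(old)),
                      os.path.join(folder, str(new)))


# Build a temporary folder holding messages 1, 2, and 3.
folder = tempfile.mkdtemp()
for n in (1, 2, 3):
    with open(os.path.join(folder, str(n)), "w") as f:
        f.write(f"Subject: message {n}\n\nbody\n")

os.remove(os.path.join(folder, "2"))   # the equivalent of rm
after_delete = list_messages(folder)

pack_folder(folder)                    # the equivalent of a series of mv commands
after_pack = list_messages(folder)
```

Renumbering in ascending order is safe here because each target number is always either the file’s own number or a gap left by an earlier move.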
Just as directories can have subdirectories, an MH folder can have subfolders to an unlimited depth. Thanks to Linux filesystem hard links, MH can also store the same message in multiple folders, letting you organize messages into various orderings without using any additional disk space.
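The hard-link idea can be tried out with Python’s os.link. This sketch assumes a filesystem that supports hard links; the folder names inbox and reports are invented for the example.

```python
import os
import tempfile

# Two hypothetical MH-style folders under a temporary mail directory.
root = tempfile.mkdtemp()
inbox = os.path.join(root, "inbox")
reports = os.path.join(root, "reports")
os.makedirs(inbox)
os.makedirs(reports)

# Store message 1 in the inbox folder.
original = os.path.join(inbox, "1")
with open(original, "w") as f:
    f.write("Subject: weekly report\n\nAll systems normal.\n")

# Hard-link the same message into the reports folder.
linked = os.path.join(reports, "1")
os.link(original, linked)

# Both names refer to one file, so the message text is stored only once.
same_file = os.path.samefile(original, linked)
link_count = os.stat(original).st_nlink

# Removing one name leaves the message available under the other.
os.remove(original)
still_there = os.path.exists(linked)
```

The message data is freed only when the last link to it is removed.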
MH is so different that not all of its uses are obvious at first: you have to “break the mold,” the mindset that comes from using other MUAs. It’s also 25 years old, and it’s showing its age in some ways. Still, if you have a lot of email, MH is worth careful consideration. There’s more at http://www.jpeek.com/email/mh.
Linux Magazine / July 2003 / FEATURES