D. J. Bernstein
Internet mail
Internet mail message header format

Address lists

An address list is a tokenizable field value representing a series of zero or more Internet mail addresses. It also includes a great deal of user-interface fluff. The exact format is described below.

Address lists are the most painful part of 822. It would have been easy to encode each address as a quoted string, but the 822 format is vastly more complicated.

Words

A word is a single token, either an atom or a quoted string.

Encoded box names

An encoded box name is a series of words and dots, with at least one dot between any two adjacent words. It represents a string, given by concatenating the strings represented by the words and a dot for each dot.

For example, here are six encoded box names, each representing the 8-byte string "John.Doe":

     John.Doe
     "John".Doe
     John."Doe"
     "John"."Doe"
     "John.Doe"
     "\J\o\h\n\.\D\o\e"
Beware that sendmail incorrectly distinguishes between these names.

822 says that an encoded box name must start with a word, alternate between words and dots, and end with a word. MH is unable to handle encoded box names with adjacent dots or ending with a dot. Such names nevertheless appear on occasion; I recommend that readers be prepared for any combination of dots.
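
As an illustration of the decoding rule, here is a minimal Python sketch of a reader that turns an encoded box name into the string it represents. The function name decode_box_name is hypothetical. The sketch assumes the input contains only atoms, quoted strings, and dots (no spaces or comments), and it deliberately accepts any combination of dots rather than enforcing the strict 822 alternation.

```python
import re

# One token: a quoted string, a dot, or an atom (a run of ordinary characters).
_TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|(\.)|([^\s".()<>@,;:\\\[\]]+)', re.S)

def decode_box_name(encoded):
    """Return the string represented by an encoded box name."""
    parts = []
    pos = 0
    while pos < len(encoded):
        m = _TOKEN.match(encoded, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        quoted, dot, atom = m.groups()
        if quoted is not None:
            # Inside a quoted string, backslash-X stands for X.
            parts.append(re.sub(r'\\(.)', r'\1', quoted, flags=re.S))
        elif dot is not None:
            parts.append('.')
        else:
            parts.append(atom)
        pos = m.end()
    return ''.join(parts)
```

All six encodings listed earlier decode to the same string under this sketch, which is the behavior a reader should have.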

Subdomains

A subdomain is a single token, either an atom or a domain literal.

Encoded domain names

An encoded domain name is a series of subdomains and dots, with at least one dot between any two adjacent subdomains. It represents a string, given by concatenating the strings represented by the subdomains and a dot for each dot.

For example, here are some encoded domain names:

     heaven.af.mil
     [127.0.0.1]
     []
     [\[].af.mil
The last example represents the 10-byte string "[[].af.mil". Beware that many mailers have trouble with domain literals. 822bis prohibits domain literals inside domain names except when the domain literal is the entire domain name.

822 says that an encoded domain name must start with a subdomain, alternate between subdomains and dots, and end with a subdomain. In practice, domain names sometimes show up ending with a dot; readers have to be prepared for this.

Encoded addresses

An encoded address is an encoded box name, an @, and an encoded domain name. It represents the Internet mail address formed by the box name's string, an @, and the domain name's string. For example,
     God@heaven.af.mil
is an encoded address, representing the 17-byte Internet mail address "God@heaven.af.mil"; and
     "\"quote" . "and space" @[]  (dot).[\[].yp.  to
is an encoded address, representing a 29-byte Internet mail address starting with a double quote.
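
To make the second example concrete, here is a rough Python sketch of a reader that tokenizes an encoded address, discards spaces and comments, and concatenates the strings the tokens represent. The names tokenize and decode_address are hypothetical, and error handling is minimal. The treatment of domain literals (brackets kept, backslashes removed) is inferred from the "[[].af.mil" example above.

```python
SPECIALS = '.@<>,;:'
SPACE = ' \t\r\n'

def tokenize(text):
    """Split text into (kind, string) tokens, dropping spaces and comments."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in SPACE:
            i += 1
        elif c == '(':
            # Comments nest; a backslash quotes the next character.
            depth = 0
            while i < n:
                if text[i] == '\\':
                    i += 1
                elif text[i] == '(':
                    depth += 1
                elif text[i] == ')':
                    depth -= 1
                i += 1
                if depth == 0:
                    break
        elif c in '"[':
            close = '"' if c == '"' else ']'
            chars = []
            i += 1
            while i < n and text[i] != close:
                if text[i] == '\\' and i + 1 < n:
                    i += 1  # backslash-X stands for X
                chars.append(text[i])
                i += 1
            i += 1  # skip the closing delimiter
            body = ''.join(chars)
            if c == '"':
                tokens.append(('quoted', body))
            else:
                # Per the example above, a domain literal keeps its brackets.
                tokens.append(('literal', '[' + body + ']'))
        elif c in SPECIALS:
            tokens.append((c, c))
            i += 1
        else:
            j = i + 1
            while j < n and text[j] not in SPACE + '()"[' + SPECIALS:
                j += 1
            tokens.append(('atom', text[i:j]))
            i = j
    return tokens

def decode_address(text):
    """Return the mail address represented by an encoded address."""
    tokens = tokenize(text)
    # Only a bare @ token separates box name from domain name.
    at = [k for k, (kind, _) in enumerate(tokens) if kind == '@']
    if len(at) != 1:
        raise ValueError('expected exactly one @ token')
    box, domain = tokens[:at[0]], tokens[at[0] + 1:]
    if any(kind not in ('atom', 'quoted', '.') for kind, _ in box):
        raise ValueError('bad token in box name')
    if any(kind not in ('atom', 'literal', '.') for kind, _ in domain):
        raise ValueError('bad token in domain name')
    return ''.join(string for _, string in tokens)
```

Because every token already carries the string it represents, decoding reduces to concatenation once spaces and comments are gone.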

Domain names must be fully qualified: e.g., heaven.af.mil, not just heaven. If a recipient sees a message from God@heaven, for example, how does he know that his reply should go to God@heaven.af.mil?

On the other hand, many MUAs let the user type unqualified domain names in outgoing messages. Many MUAs also support an extension to the syntax for encoded addresses in outgoing messages: they allow an encoded box name without an @ or a domain name, so users at heaven.af.mil can simply type God instead of God@heaven.af.mil. The MUA has to add a fully qualified domain name before it sends the message.
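
A minimal Python sketch of that qualification step follows. The function qualify and its local_domain parameter are hypothetical. The sketch works on simple, already decoded addresses, and it assumes that an unqualified host name is a sibling of the local host; a real MUA may need a different rule.

```python
def qualify(address, local_domain):
    """Return the address with a fully qualified domain name.

    local_domain is a hypothetical setting: the sender's own fully
    qualified domain name, such as "heaven.af.mil".
    """
    box, at, domain = address.rpartition('@')
    if not at:
        # No @ at all: treat the whole string as a box name at the local domain.
        return address + '@' + local_domain
    if '.' not in domain and not domain.startswith('['):
        # Unqualified host name. Assumption: it is a sibling of the local
        # host, so it shares the local domain's parent.
        parent = local_domain.partition('.')[2]
        return box + '@' + domain + '.' + parent
    return address
```

With local_domain set to "heaven.af.mil", both "God" and "God@heaven" become "God@heaven.af.mil", and an already qualified address is left alone.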

How to create an encoded address

Here is some advice to writers on how to encode an address as a series of tokens.

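Encode the box part as a series of atoms and dots if it can be written that way under the 822 rule above, that is, as atoms alternating with single dots, starting and ending with an atom.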

Otherwise, encode the box part as a single quoted string, using backslashes as necessary.

I also recommend against putting any spaces, tabs, or comments between the tokens in an encoded address; they are handled incorrectly by Pine, mm, and some versions of mailx. Put them before or after the address:

     "\"quote.and space"@[].[\[].yp.to    (Might work)
And consider switching to an address that doesn't need any quoted strings or domain literals.
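
Here is a small Python sketch of a writer that follows this advice for the box part. The function name encode_box is hypothetical. The exact test for when atoms and dots suffice is an assumption of this sketch: every dot-separated piece must be nonempty and drawn from a conservative set of atom characters.

```python
import re

# Assumption: a conservative set of characters treated as safe inside an atom.
_ATOM = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+")

def encode_box(box):
    """Encode a box name as atoms and dots if possible, else as one quoted string."""
    pieces = box.split('.')
    if all(_ATOM.fullmatch(piece) for piece in pieces):
        # No leading, trailing, or adjacent dots, and only atom characters.
        return box
    # Otherwise use a single quoted string, backslashing " and \.
    return '"' + re.sub(r'(["\\])', r'\\\1', box) + '"'
```

The encoded address is then encode_box(box) + '@' + domain, with no spaces or comments between the tokens. The sketch leaves the domain part untouched.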

Routes

An at-domain is an @ followed by an encoded domain name. A route is one or more at-domains, separated by commas. For example,
     @heaven.af.mil,@uucp.local
is a route.

Routes were discouraged in 822 and heavily discouraged in RFC 1123. Their function is actively subverted by most Internet mailers. Writers should not generate them. However, they still show up occasionally; readers have to be able to parse the route syntax.

Bracketed addresses

A bracketed address is
  1. a < token;
  2. optionally, a route followed by a colon;
  3. an encoded address; and
  4. a > token.
For example,
     <@gateway.af.mil:God@heaven.af.mil>
is a bracketed address. The < and > are required; for example,
     @proxy.research.att.com:God@heaven.af.mil   (WRONG)
is not a valid bracketed address.

I recommend against inserting spaces, tabs, or comments inside a bracketed address:

     From: < innocent.user@heaven.af.mil >  (spotted in 1998)
When Microsoft Outlook 8.5 sees this, it incorrectly includes the spaces in the address.

Phrases

A phrase is a series of one or more tokens, each token being a word, a dot, or an @. For example:
     J. Q. Public
822 does not allow dots or @s in phrases, and MH doesn't understand them. I recommend that writers limit each phrase to a single word, preferably a quoted string:
     "J. Q. Public"
However, many messages include phrases with unquoted dots, so readers have to be prepared to handle them.

Targets

A target is either
  1. an encoded address or
  2. an optional phrase followed by a bracketed address.
There is one encoded address inside a target, representing one Internet mail address.

The optional phrase in a target is normally used for a human-comprehensible name. Two examples:

     "The Boss" <God@heaven.af.mil>
     "Angels mailing list" <angels@heaven.af.mil>
Many writers instead put names into comments, since that's the standard USENET name format:
     God@heaven.af.mil (The Boss)
822bis discourages all use of comments inside targets.

Sun's mailx is unable to handle a bracketed address without a phrase:

     <God@heaven.af.mil>
I recommend that writers avoid using a bracketed address when there is no phrase.

Microsoft Outlook 2000 reportedly aborts header processing when it sees a phrase consisting of a single space:

     " " <God@heaven.af.mil>
I do not know what other phrases trigger this bug.
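
The recommendations to writers in this section and the previous one can be combined into a short Python sketch. The function name format_target is hypothetical, and the address argument is assumed to be an already encoded address. The sketch emits a single quoted-string phrase when there is a name, and a bare address with no brackets otherwise. Treating a whitespace-only name as no name at all is a choice made here to stay clear of the single-space phrase shown above.

```python
def format_target(address, name=None):
    """Build a target from an encoded address and an optional display name."""
    if name is None or not name.strip():
        # No usable phrase: emit the bare encoded address, without < and >.
        return address
    # Encode the whole name as one quoted string, backslashing \ and ".
    quoted = name.replace('\\', '\\\\').replace('"', '\\"')
    # No spaces inside the brackets.
    return '"%s" <%s>' % (quoted, address)
```

For example, format_target('God@heaven.af.mil', 'The Boss') produces the first example target in this section.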

Target lists

A target list is a series of one or more targets, with one or more commas between each pair of adjacent targets. For example, the following target list contains two targets:
     The Boss <God@heaven.af.mil>, angels@heaven.af.mil
822 allows any number of commas between and around targets:
     ,,,God@heaven.af.mil,,,angels@heaven.af.mil,,,
Pine throws away the rest of the recipient list when it sees an extra comma. 822bis prohibits extra commas.

Users enjoy being able to leave out the commas, so it's helpful if the MUA can convert

     God@heaven.af.mil angels@heaven.af.mil      (WRONG)
into the correct target list:
     God@heaven.af.mil, angels@heaven.af.mil
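
A deliberately simple Python sketch of that conversion follows. The function name normalize_target_list is hypothetical. It assumes that every target is a bare encoded address with no phrases, quoted strings, comments, or embedded spaces; anything richer needs a real tokenizer.

```python
import re

def normalize_target_list(text):
    """Rejoin bare addresses with exactly one comma between adjacent targets."""
    # Assumption: any run of commas and whitespace separates two targets.
    targets = [t for t in re.split(r'[,\s]+', text) if t]
    return ', '.join(targets)
```

Under the same assumption, this also removes the extra commas that 822 allows and that Pine mishandles.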

Address groups

An address group is a phrase, a colon, an optional target list, and a semicolon. Two examples:
     the gang: angels@heaven.af.mil, saints@heaven.af.mil;
     recipient list not shown: ;
Beware that pre-1996 versions of sendmail will corrupt "phrase longer than one word: ;" into "phrase longer than one word:;@the.sendmail.host".

Address lists

An address list is a series of one or more targets or address groups, each adjacent pair separated by one or more commas. For example:
     people who asked: ;, other people who should know: ;

How to parse address lists

A useful (though accidental) feature of the 822 address list syntax is that address lists are easy to parse from right to left, with no backtracking.
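
One possible right-to-left parser is sketched below in Python. It is an illustration under stated assumptions, not a definitive algorithm: the names parse_address_list, take, and take_address are hypothetical, comments are assumed to have been removed already, and malformed input is handled only loosely. Each token is examined once, from the last to the first.

```python
import re

_TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"            # quoted string
  | \[(?:\\.|[^\]\\])*\]         # domain literal
  | [.@<>,;:]                    # special token
  | [^\s"\[\]()<>@,;:\\.]+       # atom
''', re.X | re.S)

def parse_address_list(text):
    """Parse an address list right to left; return entries in left-to-right order."""
    tokens = _TOKEN.findall(text)
    i = len(tokens) - 1
    entries = []

    def take(stop):
        # Collect tokens leftward until a token in `stop` or the start of the list.
        nonlocal i
        taken = []
        while i >= 0 and tokens[i] not in stop:
            taken.append(tokens[i])
            i -= 1
        return taken[::-1]

    def take_address(stop):
        # Read an encoded address leftward: domain name, @ token, box name.
        nonlocal i
        domain = take(stop | {'@'})
        if i >= 0 and tokens[i] == '@':
            i -= 1
            return ''.join(take(stop)) + '@' + ''.join(domain)
        return ''.join(domain)  # no @ token: a bare box name

    while i >= 0:
        tok = tokens[i]
        if tok == ',':
            i -= 1
        elif tok == ';':
            i -= 1
            entries.append(('group-end',))
        elif tok == ':':
            # Outside < >, a colon ends a group's phrase.
            i -= 1
            entries.append(('group-start', ' '.join(take({',', ';'}))))
        elif tok == '>':
            i -= 1
            address = take_address({':', '<'})
            route = ''
            if i >= 0 and tokens[i] == ':':
                i -= 1
                route = ''.join(take({'<'}))
            i -= 1  # skip the < token
            phrase = ' '.join(take({',', ':', ';'}))
            entries.append(('target', phrase, route, address))
        else:
            entries.append(('target', '', '', take_address({',', ':', ';'})))
    return entries[::-1]
```

The design point is that a > token or a final subdomain tells the parser immediately what kind of target it is in, so a phrase containing dots or @ tokens, as in "a@b <c@d>", never has to be reinterpreted. Addresses are returned in encoded form; decoding them is a separate step.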