This documentation is for Dovecot v2.x, see wiki1 for v1.x documentation.

Mbox Mailbox Format

Usually UNIX systems are configured by default to deliver mails to /var/mail/username or /var/spool/mail/username mboxes. In IMAP world these files are called INBOX mailboxes. IMAP protocol supports multiple mailboxes however, so there needs to be a place for them as well. Typically they're stored in ~/mail/ or ~/Mail/ directories.

The mbox file contains all the messages of a single mailbox. Because of this, the mbox format is typically thought of as a slow format. However with Dovecot's indexing this isn't true. Only expunging messages from the beginning of a large mbox file is slow with Dovecot, most other operations should be fast. Also because all the mails are in a single file, searching is much faster than with maildir.

Modifications to mbox may require moving data around within the file, so interruptions (eg. power failures) can cause the mbox to break more or less badly. Although Dovecot tries to minimize the damage by moving the data in a way that data should never get lost (only duplicated), mboxes still aren't recommended to be used for important data.

Locking

Locking is a mess with mboxes. There are multiple different ways to lock a mbox, and software often uses incompatible locking. See MboxLocking for how to check what locking methods some commonly used programs use.

There are at least four different ways to lock a mbox:

Dotlock

Another problem with dotlocks is that if the mailboxes exist in /var/mail/, the user may not have write access to the directory, so the dotlock file can't be created. There are a couple of ways to work around this:

Deadlocks

If multiple lock methods are used, which is usually the case since dotlocks aren't typically used for read locking, the order in which the locking is done is important. Consider if two programs were running at the same time, both use dotlock and fcntl locking but in different order:

Now both of them are waiting for each others locks. Finally after a couple of minutes they time out and fail the operation.

Directory Structure

By default, when listing mailboxes, Dovecot simply assumes that all files it sees are mboxes and all directories mean that they contain sub-mailboxes. There are two special cases however which aren't listed:

Because it's not possible to have a file which is also a directory, it's not normally possible to create a mailbox and child mailboxes under it.

However if you really want to be able to have mailboxes containing both messages and child mailboxes under mbox, then Dovecot can be configured to do this, subject to certain provisos; see MboxChildFolders.

Dovecot's Metadata

Dovecot uses C-Client (ie. UW-IMAP, Pine) compatible headers in mbox messages to store metadata. These headers are:

Whenever any of these headers exist, Dovecot treats them as its own private metadata. It does sanity checks for them, so the headers may also be modified or removed completely. None of these headers are sent to IMAP/POP3 clients when they read the mail.

The MTA, MDA or LDA should strip all these headers case-insensitively before writing the mail to the mbox.

Only the first message contains the X-IMAP or X-IMAPbase header. The difference is that when all the messages are deleted from mbox file, a "pseudo message" is written to the mbox which contains X-IMAP header. This is the "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA" message which you hate seeing when using non-C-client and non-Dovecot software. This is however important to prevent abuse, otherwise the first mail which is received could contain faked X-IMAPbase header which could cause trouble.

If message contains X-Keywords header, it contains a space-separated list of keywords for the mail. Since the same header can come from the mail's sender, only the keywords are listed in X-IMAP header are used.

The UID for a new message is calculated from "last used UID" in X-IMAP header + 1. This is done always, so fake X-UID headers don't really matter. This is also why the pseudo message is important. Otherwise the UIDs could easily grow over 231 which some clients start treating as negative numbers, which then cause all kinds of problems. Also when 232 is exceeded, Dovecot will also start having some problems.

Content-Length is used as long as another valid mail starts after that many bytes. Because the byte count must be exact, it's quite unlikely that abusing it can cause messages to be skipped (or rather appended to the previous message's body).

Status and X-Status headers are trusted completely, so it's pretty good idea to filter them in LDA if possible.

Dovecot's Speed Optimizations

Updating messages' flags and keywords can be a slow operation since you may have to insert a new header (Status, X-Status, X-Keywords) or at least insert data in the header's value. Some mbox MUAs do this simply by rewriting all of the mbox after the inserted data. If the mbox is large, this can be very slow. Dovecot optimizes this by always leaving some space characters after some of its internal headers. It can use this space to move only minimal amount of data necessary to get the necessary data inserted. Also if data is removed, it just grows these spaces areas.

mbox_lazy_writes setting works by adding and/or updating Dovecot's metadata headers only after closing the mailbox or when messages are expunged from the mailbox. C-Client works the same way. The upside of this is that it reduces writes because multiple flag updates to same message can be grouped, and sometimes the writes don't have to be done at all if the whole message is expunged. The downside is that other processes don't notice the changes immediately (but other Dovecot processes do notice because the changes are in index files).

mbox_dirty_syncs setting tries to avoid re-reading the mbox every time something changes. Whenever the mbox changes (ie. timestamp or size), it first checks if the mailbox's size changed. If it didn't, it most likely meant that only message flags were changed so it does a full mbox read to find it. If the mailbox shrunk, it means that mails were expunged and again Dovecot does a full sync. Usually however the only thing besides Dovecot that modifies the mbox is the LDA which appends new mails to the mbox. So if the mbox size was grown, Dovecot first checks if the last known message is still where it was last time. If it is, Dovecot reads only the newly added messages and goes into a "dirty mode". As long as Dovecot is in dirty mode, it can't be certain that mails are where it expects them to be, so whenever accessing some mail, it first verifies that it really is the correct mail by finding its X-UID header. If the X-UID header is different, it fallbacks to a full sync to find the mail's correct position. The dirty mode goes away after a full sync. If mbox_lazy_writes was enabled and the mail didn't yet have X-UID header, Dovecot uses MD5 sum of a couple of headers to compare the mails.

mbox_very_dirty_syncs does the same as mbox_dirty_syncs, but the dirty state is kept also when opening the mailbox. Normally opening the mailbox does a full sync if it had been changed outside Dovecot.

From Escaping

In mboxes a new mail always begins with a "From " line, commonly referred to as From_-line. To avoid confusion, lines beginning with "From " in message bodies are usually prefixed with '>' character while the message is being written to in mbox.

Dovecot doesn't currently do this escaping however. Instead it prevents this confusion by adding Content-Length headers so it knows later where the next message begins. Dovecot doesn't either remove the '>' characters before sending the data to clients. Both of these will probably be implemented later.

Mbox Variants

There are a few minor variants of this format:

mboxo is the name of original mbox format originated with Unix System V. Messages are stored in a single file, with each message beginning with a line containing "From SENDER DATE". If "From " (case-sensitive, with the space) occurs at the beginning of a line anywhere in the email, it is escaped with a greater-than sign (to ">From "). Lines already quoted as such, for example ">From " or ">>>From " are not quoted again, which leads to irrecoverable corruption of the message content.

mboxrd was named for Raul Dhesi in June 1995, though several people came up with the same idea around the same time. An issue with the mboxo format was that if the text ">From " appeared in the body of an email (such as from a reply quote), it was not possible to distinguish this from the mailbox format's quoted ">From ". mboxrd fixes this by always quoting already quoted "From " lines (e.g. ">From ", ">>From ", ">>>From ", etc.) as well, so readers can just remove the first ">" character. This format is used by qmail and getmail (>=4.35.0).

mboxcl format was originated with Unix System V Release 4 mail tools. It adds a Content-Length field which indicates the number of bytes in the message. This is used to determine message boundaries. It still quotes "From " as the original mboxo format does (and not as mboxrd does it).

mboxcl2 is like mboxcl but does away with the "From " quoting.

MMDF (Multi-channel Memorandum Distribution Facility mailbox format) was originated with the MMDF daemon. The format surrounds each message with lines containing four control-A's. This eliminates the need to escape From: lines.

Dovecot currently uses mboxcl2 format internally, but it's planned to move to combination of mboxrd and mboxcl.

How a message is read stored in mbox extension ?

References

MailboxFormat/mbox (last edited 2014-06-05 10:45:58 by PhilCarmody)