Dovecot's index files
The basic idea behind Dovecot's index files is that it makes reading the mailboxes a lot faster. The index files consist of the following files:
- dovecot.index: Main index file
- dovecot.index.cache: Cached mailbox data
- dovecot.index.log: Transaction log file
- dovecot.index.log.2: .log file is rotated to .log.2 file when it grows too large.
Each mailbox has its own separate index files. If the index files are disabled, the same structures are still kept in the memory, except cache file is disabled completely (because the client probably won't fetch the same data twice within a connection).
If index files are missing, Dovecot creates them automatically when the mailbox is opened. If at any point creating a file or growing a file gives "not enough disk space" error, the indexes are transparently moved to memory for the rest of the session. This isn't done with mailbox formats that rely on index files (e.g. dbox).
See Design/Indexes for more technical information how the index files are handled.
The main index contains the following information for each message:
- IMAP UID
- Current flags and keywords
- Pointer to cache file
- mbox-only: mbox file offset
- mbox-only: MD5 sum of some of the message headers, intended to help find the message when its X-UID: header hasn't yet been written
- Other extensions in Dovecot v1.1+, such as mailbox sorting data
This is the same information that most other IMAP servers keep in memory while the mailbox is open, but Dovecot has the advantage of keeping the information permanently stored so it's easy to get it when opening the mailbox.
The index file's header also contains some summary information, such as how many messages exist, how many of them are unseen and how many are marked with \Deleted flag. Opening mailboxes and answering to STATUS IMAP commands can be usually done simply by getting the required information from the index file's header. This is why these operations are extremely fast with Dovecot compared to other servers that don't use an equivalent index file.
The main index's header also contains mailbox syncing state:
- Maildir: cur/ and new/ directories' timestamps
- mbox: mbox file's mtime and size
The index file is synchronized against mailbox only if the syncing information changes.
Cache file may contain the following information for messages:
- Message headers (some, not all)
- Sent date (parsed Date: header)
- Received date (IMAP's INTERNALDATE field)
- Physical and virtual message sizes
- Message's parsed MIME structure, allowing to quickly read only a specific MIME part (IMAP's FETCH BODY[1.2.3] command)
- IMAP's BODY and BODYSTRUCTURE fields
- If both are used, only BODYSTRUCTURE is saved, since BODY can be generated from it
- IMAP's ENVELOPE isn't cached currently. Instead the headers used to build it are cached directly.
IMAP clients can work in many different ways. There are basically 2 types:
- Online clients that ask for the same information multiple times (eg. webmails, Pine)
- Offline clients that usually download first some of the interesting message headers and only after that the message bodies (possibly automatically, or possibly only when the user opens the mail). Most IMAP clients behave like this.
Cache file is extremely helpful with the type 1 clients. The first time that client requests message headers or some other metadata they're stored into the cache file. The second time they ask for the same information Dovecot can now get it quickly from the cache file instead of opening the message and parsing the headers.
For type 2 clients the cache file is helpful if they use multiple clients or if the data was cached while the message was being saved (Dovecot v1.1+ can do this). Some of the information is helpful in any case, for example it's required to know the message's virtual size when downloading the message. Without the virtual size being in cache Dovecot first has to read the whole message to calculate it.
Only the mailbox metadata that client(s) have asked for earlier are stored into cache file. This allows Dovecot to be adaptive to different clients' needs and still not waste disk space (and cause extra disk I/O!) for fields that client never needs.
Dovecot can cache fields either permanently or temporarily. Temporarily cached fields are dropped from the cache file after about a week. Dovecot uses two rules to determine when data should be cached permanently instead of temporarily:
- Client accessed messages in non-sequential order within this session. This most likely means it doesn't have a local cache.
- Client accessed a message older than one week.
Design/Indexes/Cache explains the reasons for these rules.
All changes to the main index go through transaction log first. This has two advantages when the mailbox is accessed using multiple simultaneous connections:
- It allows getting a list of changes quickly so that IMAP clients can be notified of the changes. An alternative would be to do a comparison of two index mappings, which is what most other IMAP servers do.
mmap_disable=yes implementation relies on the transaction log. Instead of re-reading the whole main index file after each change it's necessary to only read a few bytes from the transaction log.
In Dovecot v1.1+ the transaction log plays an even more important role. The main index file is updated only "once in a while" to reduce disk writes, so it is common to first read the main index and then apply new changes from the transaction log on top of that. With empty mailboxes (eg. download+delete POP3 users) it would even be possible to delete the whole main index and keep only the transaction log (although this isn't done currently).