This documentation is for Dovecot v2.x, see wiki1 for v1.x documentation.

Dsync Design

FIXME: This describes the design for v2.0/v2.1. The v2.2 design is somewhat different.

Two-way synchronization

dsync attempts to preserve all changes done by both sides of the synced mailboxes.

Mailbox list

Mailboxes have 128 bit globally unique IDs, which are used for figuring out when two mailboxes should actually be synchronized. This solves two major problems:

Then there's the problem of how to correctly sync mailbox renames and deletions. How do you know which side of the sync has the most recent name for the mailbox? How do you know if one side had deleted mailbox, or if the other side had created it? To solve these problems, Dovecot v2.0 created a "mailbox log", which adds a record with mailbox GUID and timestamp whenever mailbox is renamed or deleted. So:

dsync writes timestamps to changelog using the original timestamps, so that dsync's changes won't override changes done by user during sync.

Mailbox

When saving new mails, dsync preserves all of their immutable state:

It also attempts preserve IMAP UID. This works as long as the other side hasn't already used the UID for another mail. If it has, dsync doesn't attempt to preserve the UID, because an IMAP client might have already seen the UID and cached another mail's contents for it. IMAP requires that message's contents must never change, so UIDs can't be reused. So whenever an UID conflict happens, dsync gives messages in both sides a new UID, because it can't know which message the client had seen, or perhaps user used two clients and both saw a different message. (This assumes a master/slave replication use case for dsync.)

The mutable metadata that dsync preserves is:

Flags and keywords are synced based on modseqs. Whichever side has a higher modseq for the message, its flags and keywords are synced to the other side. Currently there's no per-flag or per-keyword synchronization, so that if one side had added \Seen flag and other side had added \Answered flag, one of them would be dropped.

Finding what to sync

dsync can run in full mode or fast mode. Full mode means it goes through all messages in all mailboxes, making sure everything is fully synchronized. In fast mode it relies on uidvalidity, uid-next and highest-modseq values to find out changes. If any of the values changed, the mailbox is included in sync.

FIXME: A superfast mode should still be implemented, where once a mailbox is selected for syncing, it should sync only mails whose modseq is higher than a given one. This would improve performance and network traffic with large mailboxes.

Copy optimizations

Before dsync actually starts syncing anything, it first fetched a list of all to-be-synced messages and adds them to a GUID -> message hash table. Whenever dsync needs to sync a new message to the other side, it first checks if the message's GUID already exists on the other side. If it does, it starts a message copy operation instead of a full save. It's possible that this copy operation fails if the message just gets expunged from the other side, so there needs to be fallback handling for this. If the message exists in multiple mailboxes, a copy from the next mailbox is attempted. If all of them fail, dsync fallbacks to saving the message.

FIXME: This optimization currently works only in full sync mode. If this were to work in fast sync mode, the full mailbox list would have to be looked up from local side. And this would slow it down..

Design/Dsync (last edited 2014-10-05 10:32:10 by TimoSirainen)