Benchmarking mbox versus maildir

Table Of Contents

Introduction - Phase I
Test environment - Phase I
Benchmark results on low-end hardware
Benchmark results on high-end hardware
Memory usage comparison
Introduction - Phase II
Test environment - Phase II
Benchmark results
Memory usage comparison
Introduction - Phase III
Test environment - Phase III
Benchmark results
Memory usage comparison
Graphs
Final Analysis

Last updated on: March 25, 2003

Introduction - Phase I

This paper presents the results of a series of benchmarks that compare the relative performance of mbox-based and maildir-based mail access in several different environments. The comparative advantages of each mail storage format are a recurring subject that comes up whenever someone asks which IMAP server is best for them.

mbox mail storage format

This is the traditional way to store mail on UNIX-based mail servers. Individual messages are simply concatenated together, and saved in a single file. A special marker is placed where one message ends and the next message begins. Only one process can access the mbox file in read/write mode. Concurrent access requires a locking mechanism. Anytime someone needs to update the mbox file, everyone else must wait for the update to complete.
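
As a minimal sketch of what that marker looks like in practice: in the common mbox variants, each message begins with a separator line that starts with "From " (the word, then a space). Assuming that convention, and assuming that body lines beginning with "From " were escaped on delivery, the hypothetical helper below counts the messages in a file. Note that it has to scan the entire file to do so.

```sh
#!/bin/sh
# Count the messages in an mbox file by counting separator lines.
# Assumption: every message starts with a line beginning with "From ",
# and body lines that would match were escaped (typically as ">From ").
# mbox_count is an illustrative name, not part of any mail package.
mbox_count() {
    grep -c '^From ' "$1"
}
```

For example, `mbox_count /var/spool/mail/$USER` prints the number of messages in a user's spool file on a system that delivers to that location.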

maildir mail storage format

Maildirs were originally implemented in the Qmail mail server, supposedly to address the inadequacies of mbox files. Individual messages are saved in separate files, one file per message. There is a defined method for naming each file. There's a defined procedure for adding new messages to the maildir. No locking is required. Multiple processes can use maildirs at the same time.
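
The delivery procedure can be sketched roughly as follows. This is a simplified illustration, not the code of any particular mail server: the function name and the way the unique file name is built are assumptions, and a real delivery agent uses a stronger uniqueness scheme and more careful error handling.

```sh
#!/bin/sh
# Deliver one message, read from standard input, into a maildir.
# Assumptions: the maildir has the conventional tmp/, new/ and cur/
# subdirectories; the name (time.pid.host) is only an illustrative
# way to get a unique file name.
maildir_deliver() {
    dir=$1
    name="$(date +%s).$$.$(uname -n)"
    # Step 1: write the whole message under tmp/, where readers never look.
    cat > "$dir/tmp/$name" || return 1
    # Step 2: move it into new/. A rename within one filesystem is atomic,
    # so a reader sees either no file or the complete message, and no
    # lock is needed.
    mv "$dir/tmp/$name" "$dir/new/$name"
}
```

A call such as `maildir_deliver "$HOME/Maildir" < message.txt` would leave the message in new/, ready for a mail reader to pick up.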

mbx mail storage format

This is a slightly modified version of the original mbox format that's offered by the UW-IMAP server. mbx mailboxes still require locking. The main difference from the mbox format is that each message in the file is preceded by a record that carries some message-specific metadata. As such, certain operations that used to require the entire mbox file to be rewritten can now be implemented by updating the fixed-size header record.

This benchmark focuses mainly on the mbox and maildir formats. In March of 2003 an unrelated party conducted a similar benchmark for the mbx format. See http://www.decisionsoft.com/pdw/mailbench.html for more details.

Documentation included with the University of Washington IMAP server (UW-IMAP) states that maildirs have many "performance disadvantages" and that the maildir format "doesn't scale." Furthermore, maildirs are supposedly vulnerable to "filesystem trashing" due to multiple "open() and stat()" calls, because "just about every filesystem in existence serializes" file creation and access^[1]. The document concludes that this results in performance degradation for "moderately sized" mailboxes of about 2,000 messages.

Painting "just about" every filesystem in existence with the same brush, and assuming that every filesystem works pretty much in the same way, is very misleading. Many contemporary high-performance filesystems are designed explicitly for parallel access. For example, consider the SGI XFS filesystem:

The free space and inodes within each AG are managed independently and in parallel so multiple processes can allocate free space throughout the file system simultaneously.^[2]

It took me about 6 months to write the first revision of the maildir-based Courier-IMAP server. The absence of maildir support in the UW-IMAP server is the reason I wrote it. Many people have found that it needed less memory, and was faster than UW-IMAP. Many people observed that upgrading to Courier-IMAP lowered their overall system load, and increased performance. Large mail clusters with a network-based, fault-tolerant, scalable architecture frequently have problems deploying mbox-based mailboxes, due to many documented problems with file locking (which mbox-based mailboxes require) on network-based filesystems.^[3] As referenced in ^[3], maildirs have no issues with NFS (the most common type of network-based filesystem) since maildirs do not use locking.

After looking around for some time, I did not find any independent benchmarks that directly measured the relative performance of mboxes and maildirs. Therefore I decided to run some actual benchmarks myself. I defined the test conditions according to UW-IMAP server's documentation. I created a test environment that stacked the deck in favor of mboxes. This was done in accordance with the claimed shortcomings of maildirs as stated in UW-IMAP server's documentation, in order to accurately measure the magnitude of the claimed problems.

Test environment - Phase I

For this benchmark, I used the UW-IMAP 2000 server, which uses mbox files, and the Courier-IMAP 1.3.6 server, which uses maildirs. Initially I created a mailbox with 100 messages, and ran the same benchmarking script for each server. I then reran the same script a second time, with 2,000 messages. This benchmarking script put each IMAP server through several tasks. Each task was profiled with the time command.

The benchmarks initially ran on very low-end, obsolete hardware, and were then repeated on a more robust, modern server. This makes it possible to observe both kinds of scalability: larger mailboxes, and faster hardware.

Here's the script that generated the test data for the benchmarks:

#!/bin/sh

n=0
while test $n -lt 100
do

    dd if=/dev/urandom bs=3k count=1 | uuencode - | \
            mail -s "Test message $n" `whoami`

    n=`expr $n + 1`
done

The mail server was configured to deliver mail either to /var/spool/mail or to $HOME/Maildir, for its respective IMAP server. This test script created 100 messages, each one approximately 4.5 KB in size. In the second half of the benchmark, the script was modified to create 2,000 messages.

After the mailbox was primed with dummy messages, the following script benchmarked each IMAP server:

#!/bin/sh

# For IMAP-2000:
#
# PATH=/usr/sbin:$PATH
# export PATH

# For Courier-IMAP 1.3.6:

# PATH=/usr/lib/courier-imap/bin:$PATH
# MAILDIR=$HOME/Maildir
# export PATH
# export MAILDIR

echo "=============="
echo "SELECT.1"
echo ""
time imapd <<EOF
001 SELECT INBOX
002 LOGOUT
EOF
echo "=============="
echo ""
echo "SELECT.2"
echo ""
time imapd <<EOF
001 SELECT INBOX
002 LOGOUT
EOF
echo ""
echo "=============="
echo ""
echo "DELETE.1"
echo ""
time imapd <<EOF
001 SELECT INBOX
002 STORE 50 +FLAGS.SILENT (\Deleted)
003 EXPUNGE
004 LOGOUT
EOF
echo ""
echo "=============="
echo ""
echo "FETCH.1"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 FETCH 1:* (BODYSTRUCTURE)
003 EXPUNGE
004 LOGOUT
EOF
echo ""
echo "=============="
echo ""
echo "FETCH.2"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 FETCH 1:* (BODYSTRUCTURE)
003 EXPUNGE
004 LOGOUT
EOF
echo "=============="
echo ""
echo "SEARCH.1"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 SEARCH 1:* TEXT "This text will not be found"
003 EXPUNGE
004 LOGOUT
EOF
echo "=============="
echo ""
echo "SEARCH.2"
echo ""
time imapd >/dev/null <<EOF
001 SELECT INBOX
002 SEARCH 1:* TEXT "This text will not be found"
003 EXPUNGE
004 LOGOUT
EOF

Here's a brief explanation for those who are not familiar with the IMAP protocol syntax. These tests carried out the following tasks:

SELECT.1 - open a mailbox with 100 or 2,000 new messages.
SELECT.2 - open a mailbox with 100 or 2,000 messages that have already been seen.
DELETE.1 - delete a message from the mailbox.
FETCH.1 - retrieve the MIME structure of all messages in the mailbox.
FETCH.2 - same command as FETCH.1
SEARCH.1 - search all messages for a text string.
SEARCH.2 - same command as SEARCH.1

Benchmark results on low-end hardware

Hardware:

Pentium 200 MHz CPU, 80 MB RAM
IDE hard disk in PIO mode (no DMA)
Linux kernel 2.4.2

This slow hardware was chosen to highlight any bottlenecks or performance problems that are inherent to maildirs. The raw results are given below, followed by the analysis.

	UW-IMAP 2000		Courier-IMAP 1.3.6
	100 messages	2,000 messages	100 messages	2,000 messages
SELECT.1	real 0m1.552s user 0m0.090s sys 0m0.300s	real 0m9.069s user 0m2.120s sys 0m2.440s	real 0m0.313s user 0m0.150s sys 0m0.100s	real 0m4.408s user 0m0.160s sys 0m4.210s
SELECT.2	real 0m0.208s user 0m0.080s sys 0m0.060s	real 0m1.068s user 0m0.630s sys 0m0.380s	real 0m0.030s user 0m0.010s sys 0m0.020s	real 0m0.169s user 0m0.150s sys 0m0.020s
DELETE.1	real 0m0.710s user 0m0.120s sys 0m0.020s	real 0m5.250s user 0m1.190s sys 0m1.510s	real 0m0.040s user 0m0.010s sys 0m0.030s	real 0m0.362s user 0m0.330s sys 0m0.030s
FETCH.1	real 0m0.455s user 0m0.260s sys 0m0.120s	real 0m6.061s user 0m5.060s sys 0m0.890s	real 0m0.728s user 0m0.200s sys 0m0.080s	real 0m27.713s user 0m3.250s sys 0m1.860s
FETCH.2	real 0m0.455s user 0m0.270s sys 0m0.120s	real 0m6.219s user 0m5.220s sys 0m0.890s	real 0m0.246s user 0m0.140s sys 0m0.110s	real 0m4.466s user 0m2.930s sys 0m1.500s
SEARCH.1	real 0m0.551s user 0m0.410s sys 0m0.080s	real 0m7.935s user 0m6.450s sys 0m1.380s	real 0m0.482s user 0m0.350s sys 0m0.140s	real 0m9.251s user 0m7.480s sys 0m1.760s
SEARCH.2	real 0m0.553s user 0m0.400s sys 0m0.100s	real 0m8.167s user 0m6.920s sys 0m1.140s	real 0m0.484s user 0m0.390s sys 0m0.090s	real 0m9.246s user 0m7.300s sys 0m1.870s

Analysis

The time command reports the following data:

real - this is the total amount of time the process ran.
user - the total CPU time expended by the process.
sys - the total CPU time expended by the kernel, on behalf of the process.

Together, user and sys are the total amount of CPU time the process took to execute. The difference between their sum and the real time is the time the process spent waiting for a pending I/O operation to complete (or the time the system was busy with something else) before it continued to execute. user is the time spent executing the program's own code, while sys is the time the kernel spent executing on behalf of the process. Typical examples are the kernel code that opens or closes files, or reads and writes their contents.
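
As a worked example of that arithmetic, the sketch below converts the timings printed by time into seconds and computes the part of real that was not spent on the CPU. The function names are hypothetical helpers, not part of the benchmark script.

```sh
#!/bin/sh
# Convert a timing in the "2m26.612s" form to plain seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Time not spent on the CPU: real - (user + sys), all in seconds.
# This is I/O wait plus any time the system spent on other work.
wait_time() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.3f\n", r - (u + s) }'
}
```

Applied to the FETCH.1 results for 2,000 messages in the table above, `wait_time 27.713 3.250 1.860` gives 22.603 seconds for Courier-IMAP, while `wait_time 6.061 5.060 0.890` gives 0.111 seconds for UW-IMAP.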

SELECT.1

Here, the IMAP server opened a mailbox with a bunch of messages it never saw before. Even with 2,000 messages, maildirs are twice as fast as mboxes. Why? The IMAP server must assign unique message identifiers, UIDs, to each message. The UW-IMAP server saves UIDs in the mbox file, and must essentially read the entire mailbox, assign UIDs, and then save the UIDs in the mbox file. The Courier-IMAP server doesn't need to read the contents of each message. It only needs to rename each file in the maildir (note the high sys time). The Courier-IMAP server keeps track of UIDs separately.
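
The rename step can be sketched as below. It assumes the usual maildir convention, in which unseen messages sit in new/ and are moved to cur/ with a ":2," suffix that later carries the message flags. The function name is hypothetical, and this is an illustration of the idea rather than Courier-IMAP's actual code.

```sh
#!/bin/sh
# Move every message from new/ to cur/ without opening any of them.
# On a single filesystem each mv comes down to one rename() call, so
# the cost is kernel (sys) time, not time spent reading message bodies.
maildir_take_new() {
    dir=$1
    for f in "$dir"/new/*; do
        [ -e "$f" ] || continue    # new/ was empty, the glob did not match
        mv "$f" "$dir/cur/$(basename "$f"):2,"
    done
}
```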

SELECT.2

The same IMAP folder is reopened, and closed. The UW-IMAP server runs much faster this time, because it doesn't have to rewrite the mailbox, and the contents of the mailbox file are already cached in memory by the operating system (note that the process almost never waits for I/O - the sum of user and sys is almost the same as real). But Courier-IMAP is still faster.

According to the raw numbers, Courier-IMAP is about seven times faster than UW-IMAP, but this ratio should be considered as a mere approximation. The total execution time, in both cases, is very small, and the actual timings are less meaningful because of the granularity of the system clock. Other factors include the context switch time, and the behavior of the operating system process scheduler.

DELETE.1

UW-IMAP's performance noticeably deteriorates with a 2,000-message mailbox. This is because deleting a message also requires the entire mbox file to be rewritten. The UW-IMAP process spends half of its time waiting for pending I/O to complete.

Courier-IMAP doesn't need to do much I/O here. It only needs to rename and then delete a single file from the maildir.

FETCH.1

Courier-IMAP's execution time degrades drastically in this test, especially with a 2,000-message mailbox. This is the first time Courier-IMAP needs to read the contents of the entire mailbox, and the slow IDE disk really grinds things to a virtual halt. Note that the CPU time (user plus sys) is roughly the same for both UW-IMAP and Courier-IMAP. The difference is almost entirely in the I/O time. UW-IMAP already had to read the mailbox several times earlier in this benchmark, and the operating system already had the mailbox's contents cached. Courier-IMAP managed to avoid reading the mailbox's contents, so far. But, it can't avoid the inevitable, and it's time to pay the piper.

FETCH.2

Same exact task as FETCH.1, but this time Courier-IMAP is faster than UW-IMAP by a small margin. There's no disk I/O this time, so both servers are on equal footing and have the exact same task at hand. Why is Courier-IMAP slightly faster?

I do not believe that the differences between mboxes and maildirs are a direct factor. I believe that the internal design of each IMAP server is in play here. The UW-IMAP server has a number of internal abstraction and indirection layers, in order to be able to support many different mail storage formats. All that translates into additional overhead, and a less optimal internal design. Courier-IMAP is designed to support maildirs only, and its internal code is optimized, in most places, for the maildir format. Note that Courier-IMAP's user time is consistently half of UW-IMAP's. That shows the much smaller internal execution path in Courier-IMAP, which is entirely based on the way that maildirs store mail. UW-IMAP's execution path is much longer. At its top level, the execution path is more generic, and is not particularly geared for any mail storage format. Eventually, it winds its way down to the driver for each particular mailbox, and its specific code. With all things being equal, Courier-IMAP's much simpler internal architecture saves enough process time to make up for the larger number of I/O calls.

SEARCH.1 and SEARCH.2

Both benchmarks show more or less equivalent results. With everything cached at this point, UW-IMAP is faster by about a second, with 2,000 messages in the mailbox. Courier-IMAP uses a slightly more sophisticated search algorithm that can find alternate encodings of the same search string (in alternate encodings of the same base character set). This additional complexity results in a slight performance penalty.

Benchmark results on high-end hardware

Hardware:

Abit BP-6 motherboard, dual 500 MHz Celeron CPUs, 256 MB PC-100 SDRAM
Ultra2 SCSI hard disk, 40MB/s DMA mode.
Linux kernel 2.4.2

Raw results:

	UW-IMAP 2000		Courier-IMAP 1.3.6
	100 messages	2,000 messages	100 messages	2,000 messages
SELECT.1	real 0m0.198s user 0m0.020s sys 0m0.030s	real 0m3.068s user 0m0.560s sys 0m0.240s	real 0m0.026s user 0m0.030s sys 0m0.010s	real 0m2.147s user 0m0.030s sys 0m2.120s
SELECT.2	real 0m0.035s user 0m0.030s sys 0m0.010s	real 0m0.201s user 0m0.130s sys 0m0.060s	real 0m0.009s user 0m0.010s sys 0m0.000s	real 0m0.052s user 0m0.040s sys 0m0.010s
DELETE.1	real 0m0.135s user 0m0.020s sys 0m0.030s	real 0m2.195s user 0m0.220s sys 0m0.250s	real 0m0.014s user 0m0.000s sys 0m0.010s	real 0m0.113s user 0m0.090s sys 0m0.020s
FETCH.1	real 0m0.093s user 0m0.060s sys 0m0.030s	real 0m1.359s user 0m1.140s sys 0m0.220s	real 0m0.057s user 0m0.050s sys 0m0.000s	real 0m1.004s user 0m0.800s sys 0m0.200s
FETCH.2	real 0m0.093s user 0m0.080s sys 0m0.020s	real 0m1.358s user 0m1.110s sys 0m0.240s	real 0m0.058s user 0m0.050s sys 0m0.010s	real 0m0.994s user 0m0.790s sys 0m0.200s
SEARCH.1	real 0m0.111s user 0m0.110s sys 0m0.000s	real 0m1.729s user 0m1.460s sys 0m0.270s	real 0m0.115s user 0m0.100s sys 0m0.020s	real 0m2.198s user 0m1.870s sys 0m0.330s
SEARCH.2	real 0m0.112s user 0m0.090s sys 0m0.010s	real 0m1.712s user 0m1.450s sys 0m0.250s	real 0m0.115s user 0m0.100s sys 0m0.010s	real 0m2.201s user 0m1.910s sys 0m0.290s

Analysis

Things look very different on faster hardware. It should be noted that the hardware used in this benchmark -- although much more powerful -- was not even considered state of the art at the time these benchmarks were performed. Modern mail servers usually have two or four Pentium III (or Xeon) CPUs running at 800 MHz or higher; at least half a gigabyte of PC-133 SDRAM; and wide-SCSI hard drives running at 160MB/s DMA.

SELECT.1, SELECT.2, DELETE.1

Courier-IMAP continues to maintain its performance edge over the UW-IMAP server, pretty much by the same margin as it did on low-end hardware.

FETCH.1, FETCH.2, SEARCH.1, SEARCH.2

With better hardware, Courier-IMAP was faster than UW-IMAP on both FETCH benchmarks, about even on the SEARCH benchmarks with 100 messages, and somewhat slower on the SEARCH benchmarks with 2,000 messages. The severe performance degradation in the FETCH.1 benchmark with 2,000 messages -- which was caused by a slow IDE disk and a limited amount of RAM -- is nowhere to be found. Therefore, Courier-IMAP technically scaled better than UW-IMAP when moving from low-end to high-end hardware.

Memory usage comparison

These benchmarks show the memory usage of each IMAP server. The memory usage numbers were obtained by:

Running the IMAP server manually, in the same manner as the benchmark.sh script did.

Entering the following commands:

001 SELECT INBOX
002 FETCH 1:* (BODYSTRUCTURE)

Reading the /proc/pid/status file, which contains various per-process data, including its memory usage.
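
The last step can be sketched as a small filter. It assumes a Linux system, where /proc/<pid>/status holds one "Name: value" pair per line; the function name is a hypothetical helper, and the process ID has to be looked up separately (for example with ps) while the IMAP server is still waiting for its next command.

```sh
#!/bin/sh
# Print the memory-related fields from a /proc/<pid>/status file.
# The argument is the path to the status file, so the same filter also
# works on a saved copy of that file.
vm_usage() {
    grep -E '^Vm(Size|RSS|Data|Stk|Exe|Lib):' "$1"
}
```

For a server running as process 1234, `vm_usage /proc/1234/status` would print the six lines shown in the tables below.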

The raw results:

UW-IMAP 2000	Courier-IMAP 1.3.6
100 messages	2,000 messages	100 messages	2,000 messages
VmSize: 3832 kB VmRSS: 1656 kB VmData: 192 kB VmStk: 28 kB VmExe: 688 kB VmLib: 2788 kB	VmSize: 5344 kB VmRSS: 3168 kB VmData: 1704 kB VmStk: 28 kB VmExe: 688 kB VmLib: 2788 kB	VmSize: 1596 kB VmRSS: 688 kB VmData: 92 kB VmStk: 28 kB VmExe: 160 kB VmLib: 1284 kB	VmSize: 2340 kB VmRSS: 1444 kB VmData: 832 kB VmStk: 32 kB VmExe: 160 kB VmLib: 1284 kB

Analysis

These numbers report the following information:

VmSize - the total in-memory size of the running process
VmData - the size of the process's data segment
VmRSS - the resident set size: the portion of the process that is currently held in physical memory
VmStk - size of the process's stack
VmExe - size of the process's code
VmLib - size of runtime libraries loaded by the process

Multiple instances of the same program share a single copy of the VmExe and VmLib segments. Therefore it's the VmData, VmRSS, and VmStk figures that determine how many processes can be running before the server runs out of memory.

Courier-IMAP's memory needs grew at a slightly faster pace than UW-IMAP's. However, Courier-IMAP needs much less memory than UW-IMAP to open a folder. Even at 2,000 messages, Courier-IMAP's VmData and VmRSS were less than half of UW-IMAP's. A mail server should be able to support at least twice as many IMAP clients with Courier-IMAP, before running out of RAM. This assumes that other system resources (filesystem handles, maximum number of processes, etc.) are not exhausted before then.
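
To make the "twice as many clients" estimate concrete, here is a rough back-of-the-envelope sketch. It rests on loud assumptions: only VmData plus VmStk is counted as private, per-process memory; every client has a 2,000-message folder open; and the whole amount of RAM passed in is available to IMAP processes, which is never true on a real server. The function name is hypothetical.

```sh
#!/bin/sh
# Rough upper bound on the number of IMAP processes that fit in RAM.
# Arguments: available RAM, VmData and VmStk, all in kB.
# Shared code and libraries (VmExe, VmLib) are deliberately ignored.
max_sessions() {
    ram_kb=$1
    data_kb=$2
    stk_kb=$3
    echo $(( ram_kb / (data_kb + stk_kb) ))
}
```

With a hypothetical 256 MB (262144 kB) of RAM and the 2,000-message figures above, `max_sessions 262144 1704 28` prints 151 for UW-IMAP and `max_sessions 262144 832 32` prints 303 for Courier-IMAP. The absolute numbers mean little, but the ratio is the factor of two described above.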

Introduction - Phase II

The parameters for Phase II were defined after reviewing the results of Phase I. The same benchmarking script was used for Phase II, except that the INBOX folder was loaded with 10,000 random messages. The total size of INBOX was approximately 40 megabytes.

In Phase I, maildirs showed some weaknesses on low-end hardware, but achieved slightly better scalability than mbox files on high-end hardware. Phase II tries to determine whether this scaling trend continues with even larger mail folders.

Test environment - Phase II

The same scripts generated the test data. The scripts were modified to generate 10,000 messages, about 4 KB per message. Phase II used the same test machines as in Phase I. Refer to Phase I for the specifications of each test machine.

Benchmark results

	UW-IMAP 2000		Courier-IMAP 1.3.6
	Low-end hardware	High-end hardware	Low-end hardware	High-end hardware
SELECT.1	real 1m16.516s user 0m40.260s sys 0m25.680s	real 0m27.873s user 0m15.780s sys 0m2.060s	real 1m32.598s user 0m0.850s sys 1m30.220s	real 1m2.735s user 0m0.320s sys 1m1.880s
SELECT.2	real 0m5.221s user 0m2.360s sys 0m2.080s	real 0m0.721s user 0m0.470s sys 0m0.260s	real 0m0.786s user 0m0.720s sys 0m0.070s	real 0m0.240s user 0m0.180s sys 0m0.050s
DELETE.1	real 0m24.241s user 0m4.470s sys 0m14.320s	real 0m8.690s user 0m0.890s sys 0m1.200s	real 0m1.756s user 0m1.570s sys 0m0.190s	real 0m0.553s user 0m0.460s sys 0m0.090s
FETCH.1	real 0m26.383s user 0m22.330s sys 0m4.000s	real 0m5.921s user 0m5.000s sys 0m0.920s	real 2m26.612s user 0m12.670s sys 0m34.310s	real 0m3.898s user 0m2.770s sys 0m1.060s
FETCH.2	real 0m26.401s user 0m22.370s sys 0m3.970s	real 0m5.867s user 0m4.820s sys 0m1.050s	real 2m28.187s user 0m12.140s sys 0m34.500s	real 0m3.756s user 0m2.840s sys 0m0.920s
SEARCH.1	real 0m36.359s user 0m30.050s sys 0m5.980s	real 0m8.938s user 0m7.210s sys 0m1.660s	real 2m56.652s user 0m29.760s sys 0m37.050s	real 0m7.520s user 0m6.160s sys 0m1.360s
SEARCH.2	real 0m35.390s user 0m30.070s sys 0m5.280s	real 0m8.784s user 0m7.430s sys 0m1.360s	real 2m55.936s user 0m29.450s sys 0m37.320s	real 0m7.572s user 0m6.290s sys 0m1.280s

Analysis

Phase II's results were consistent with Phase I's. Maildirs continued to fall behind on low-end hardware. Mboxes lagged behind maildirs on high-end hardware. Except for the SELECT.1 benchmark, maildirs scaled much better (from 2,000 messages) on high-end hardware than mboxes. In fact, in terms of absolute numbers, maildirs were faster than mbox files. The Courier-IMAP server even managed to beat UW-IMAP on both the FETCH and SEARCH benchmarks. The most likely explanation is that Courier-IMAP's smaller code size means that a larger percentage of its code can be kept in the CPU's on-chip cache, which is comparatively small on Celerons.

The SELECT.1 benchmark involved opening a folder with 10,000 new messages. In this test, the UW-IMAP server only needed to rewrite the mbox file. The Courier-IMAP server had to rename every one of the 10,000 files in the maildir. Note that the maildir results show almost no CPU user time. All the CPU time came from the kernel.

Memory usage comparison

UW-IMAP 2000	Courier-IMAP 1.3.6
VmSize: 9656 kB VmRSS: 7520 kB VmData: 6036 kB VmStk: 28 kB VmExe: 688 kB VmLib: 2752 kB	VmSize: 5488 kB VmRSS: 4620 kB VmData: 4008 kB VmStk: 32 kB VmExe: 160 kB VmLib: 1256 kB

Even with 10,000 messages in a folder, Courier-IMAP needed much less memory than UW-IMAP.

Introduction - Phase III

The parameters for Phase III were designed to determine a different kind of scalability. Phases I and II had a large number of small messages in the folder. In Phase III the mail folder had a small number of large messages. This environment is more like a corporate environment than an ISP environment, with middle-management constantly exchanging large documents and presentation files. The parameters for Phase III were defined after reviewing the results of Phase I and Phase II. The same benchmarking script was used for Phase III. The INBOX folder in Phase III was about the same size as in Phase II - about 40 megabytes - except that it contained 200 messages, and each message was about 200 KB long.

Test environment - Phase III

The same scripts generated the test data. The scripts were modified to generate 200 messages, about 200 KB per message. Phase III used the same test machines as in Phase I. Refer to Phase I for the specifications of each test machine.

Benchmark results

	UW-IMAP 2000		Courier-IMAP 1.3.6
	Low-end hardware	High-end hardware	Low-end hardware	High-end hardware
SELECT.1	real 0m43.238s user 0m2.820s sys 0m27.310s	real 0m13.712s user 0m1.130s sys 0m1.610s	real 0m0.141s user 0m0.030s sys 0m0.110s	real 0m0.290s user 0m0.140s sys 0m0.140s
SELECT.2	real 0m3.393s user 0m1.760s sys 0m1.510s	real 0m0.518s user 0m0.350s sys 0m0.160s	real 0m0.036s user 0m0.030s sys 0m0.010s	real 0m0.013s user 0m0.010s sys 0m0.000s
DELETE.1	real 0m3.335s user 0m1.790s sys 0m1.410s	real 0m7.139s user 0m0.940s sys 0m1.260s	real 0m0.059s user 0m0.050s sys 0m0.010s	real 0m0.022s user 0m0.010s sys 0m0.010s
FETCH.1	real 0m21.806s user 0m18.280s sys 0m3.430s	real 0m4.587s user 0m4.010s sys 0m0.580s	real 0m25.223s user 0m3.530s sys 0m14.080s	real 0m1.171s user 0m1.000s sys 0m0.180s
FETCH.2	real 0m21.783s user 0m18.260s sys 0m3.490s	real 0m4.586s user 0m4.020s sys 0m0.560s	real 0m5.404s user 0m3.370s sys 0m0.800s	real 0m1.201s user 0m0.970s sys 0m0.240s
SEARCH.1	real 0m32.841s user 0m27.560s sys 0m5.220s	real 0m6.582s user 0m5.690s sys 0m0.880s	real 0m18.875s user 0m16.960s sys 0m1.900s	real 0m4.899s user 0m4.480s sys 0m0.420s
SEARCH.2	real 0m32.841s user 0m27.560s sys 0m5.220s	real 0m6.609s user 0m5.600s sys 0m0.990s	real 0m18.878s user 0m17.300s sys 0m1.580s	real 0m4.961s user 0m4.470s sys 0m0.490s

Analysis

With large messages, maildirs did better than mboxes pretty much all across the board, on both low-end and high-end hardware. Expensive disk I/O on low-end hardware dragged down maildirs on the FETCH.1 benchmark, though. Recall that FETCH.1 is the first benchmark where the Courier-IMAP server actually has to read the entire mailbox. The remaining benchmarks reflect the fact that the operating system caches a few large files better than many small files. The Courier-IMAP process didn't spend much time in kernel space even on low-end hardware, indicating that virtually no disk I/O took place.

One unexpected result is the UW-IMAP server's poor performance in the SEARCH and FETCH benchmarks. It appears that the server has some kind of problem scaling to mailboxes that contain large messages. Note that the UW-IMAP server spends nearly all of its time in the "user" state, with very little system activity, and that user-space time is entirely responsible for its poor performance.

Memory usage comparison

UW-IMAP 2000	Courier-IMAP 1.3.6
VmSize: 4036 kB VmRSS: 1900 kB VmData: 416 kB VmStk: 28 kB VmExe: 688 kB VmLib: 2752 kB	VmSize: 1604 kB VmRSS: 712 kB VmData: 128 kB VmStk: 28 kB VmExe: 160 kB VmLib: 1256 kB

Graphs

The following graphs visually represent the performance data gathered in Phases I-III. They were derived using the following process.

For each individual benchmark, the user and sys times were added together to obtain the total CPU time used in the benchmark. Then, the average of the total CPU time and the real time was computed. Essentially, the formula was (real+user+sys)/2. Justification: user+sys represents the total CPU time, which is a factor in how many mail clients the server can support; the real time is the apparent performance from the mail client's point of view. Both measurements are reasonable factors in determining the overall system performance. A small CPU time means that the system can handle more processes. But if the real time is 2 minutes - for example - the fact that the total CPU time is only a couple of seconds isn't going to play very well with a mail client that now must wait 2 minutes for a response. Averaging them together computes a metric where both factors are given equal weight. That is, the real time and the CPU time are considered equally in evaluating the overall system performance.
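
The formula is simple enough to sketch as a one-line helper. The function name is hypothetical, the arguments are in seconds, and the result is rounded to two decimal places purely for readability.

```sh
#!/bin/sh
# Combined metric used for the graphs: (real + user + sys) / 2.
# Arguments are real, user and sys times in seconds.
metric() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (r + u + s) / 2 }'
}
```

For instance, the Phase I FETCH.1 result for Courier-IMAP on low-end hardware with 2,000 messages (real 27.713, user 3.250, sys 1.860) gives `metric 27.713 3.250 1.860`, which prints 16.41, and the corresponding Phase II result with 10,000 messages (real 146.612, user 12.670, sys 34.310) gives 96.80.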

Phases I and II

The following graphs represent the combined results of Phases I and II. The computed metric is on the Y axis, and the number of messages in the mailbox is on the X axis. The steeper a line, the poorer the scalability it represents. A nearly horizontal line represents nearly perfect, constant scalability.

Phase III

The following graphs represent the scalability from Phase III, with Phase I as a reference point. The same formula computed the metric for an individual benchmark. The initial value on the graph is the metric from Phase I, with 100 messages each approximately 4 KB long, for a total mailbox size of about 400 KB. The final value on the graph is the metric from the 40 MB mailbox from Phase III (200 messages, 200 KB each).

Final Analysis

These results easily reject an absolute claim that maildirs always fail to scale to large mail folders. These benchmarks show that a big factor is the underlying hardware and the operating system. The ext2 filesystem, as implemented by the Linux kernel, is known for its speed and good performance.^[4]

Maildirs will not scale very well on servers that use old, slow hardware. Maildirs will also do poorly with an inefficient filesystem that stores very large folders which are frequently searched for specific content. However, maildirs' performance should be adequate even on slow machines with very large folders, as long as the mail activity is just occasional read/write access, and browsing. Even with large folders containing unread messages, maildirs will place less load on the system than mboxes. On fast hardware, these benchmarks indicate that maildirs scale better more often than not. Maildirs scale much better with mail folders that contain large messages. Even with folders that have a large number of smaller messages, maildirs did better than mboxes on many benchmarks.

It should be noted that some of these numbers reflect the overall system performance that may differ from the apparent performance seen by a mail client. When running the benchmark, the UW-IMAP server did not actually take much longer to open a 2,000 message folder than Courier-IMAP -- it postponed the mbox file rewrite until the folder was closed. However, this benchmark takes both measurements into account. From the user's standpoint, some of the delay in opening a large folder is postponed until the folder is closed. This results in a slightly faster response when opening a folder, but from the system's viewpoint the load's the same. This is why both measurements are important. Whether you take the load up front, or spread it around, the grand total is still the same. The decision to postpone rewriting the mbox file can result in some savings in time (mostly by consolidating multiple rewrites into one). However, there's also a down side to this approach. An IMAP server can always be killed by an abnormal system event, for example. When that happens to the UW-IMAP server, any unsaved changes to the folder will be lost.

Mail clients that do not cache IMAP metadata may also degrade maildir performance. The Pine mail client doesn't do any caching; it pretty much reads the message index every time it opens the folder, which is usually an expensive operation for maildirs. Most Windows mail clients cache IMAP metadata extensively. IMAP mail clients that support offline use MUST cache IMAP metadata. Netscape Mail, Outlook, and Outlook Express usually cache everything they receive from the IMAP server. They will not ask for the entire message index again, and therefore avoid most of maildir's message index penalty. If they open a folder and see no changes since the last IMAP session, they will do absolutely nothing. Therefore, another factor to consider is the mail client software that will be used to access the mailbox.

The final conclusion is that -- except in some specific instances -- maildirs will be just as fast as -- and sometimes much faster than -- mbox files, while placing less of a load on the rest of the mail system. The claims in the UW-IMAP server's documentation regarding maildir performance can be supported only in certain, specific, very narrowly-defined conditions. There is no simple answer on which mail storage format is better. Much depends on variables that differ widely from one situation to the next. Besides the raw benchmarks shown above, other factors include the mail server software being used, what kind of storage is being used, and the available network bandwidth. The final answer depends on all of the above.

References

^[1] http://www.washington.edu/imap/documentation/formats.txt.html.

^[2] "Scalability and Performance in Modern File Systems", SGI.

^[3] A Google search on "nfs locking errors" provides plenty of reading material. See also "Using sendmail in a NFS safe way".

^[4] Independent benchmarks show that Linux's ext2 filesystem outperforms Solaris's tmpfs RAM-based filesystem!

http://www.courier-mta.org