The Looseleaf Papers

fdatasync makes update-mime-database too slow

Lately, the update-mime-database step of installing Debian packages has been taking way too long (on the order of three or four minutes).

What is it doing?

update-mime-database is passed the mime directory containing the packages subdirectory which was modified as its only argument. It scans all the XML files in the packages subdirectory, combines the information in them, and creates a number of output files.

—X Desktop Group, Shared MIME-info Database spec, 29 July 2002

https://specifications.freedesktop.org/shared-mime-info-spec/0.9/ar01s03.html

So it parses a bunch of XML files in /usr/share/mime/packages/, then spits out a bunch of files into /usr/share/mime. For example, on my machine it does roughly this:

  • Step 1: read a 295-byte XML file called /usr/share/mime/packages/apt.xml into memory, effectively

    $ mypackages=$(cat /usr/share/mime/packages/apt.xml)
    
  • Step 2: write a slightly different 385-byte XML file called /usr/share/mime/text/x-apt-sources-list.xml.new, effectively

    $ tmp=/usr/share/mime/text/x-apt-sources-list.xml.new
    $ printf '%s' "$mynewdata" > "$tmp"
    
  • Step 3: rename this file to /usr/share/mime/text/x-apt-sources-list.xml, effectively

    $ mv /usr/share/mime/text/x-apt-sources-list.xml.new \
         /usr/share/mime/text/x-apt-sources-list.xml
    

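Step 3, the final rename, is an atomic operation, but step 2, the write, is not. If update-mime-database is interrupted partway through step 2, it leaves behind an incomplete .new file. Worse, the kernel may commit the rename to disk before the file's data, so a crash shortly after step 3 can leave the final name pointing at a truncated or empty XML file.

To prevent this, update-mime-database calls fdatasync() on the new file before renaming it: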

In order to implement fully atomic upgrades, the OSTree system expects that “triggers” such as this tool ensure that any data they generate are on durable storage. Thus, we use fdatasync() before calling rename().

https://bugs.freedesktop.org/show_bug.cgi?id=61472

— Colin Walters, 2013-02-25

https://cgit.freedesktop.org/xdg/shared-mime-info/commit/?id=bc7658182f1922d49f33acf614f408a9d3f1f9f2

OSTree is designed to implement fully atomic and safe upgrades; more generally, atomic transitions between lists of bootable deployments. If the system crashes or you pull the power, you will have either the old system, or the new one.

https://ostree.readthedocs.io/en/latest/manual/atomic-upgrades/
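
In the same "effectively" spirit as the three steps above, the durable version of steps 2 and 3 might look like the sketch below. It is only an approximation: the real program calls fdatasync() from C, whereas this assumes GNU coreutils 8.24 or later, whose sync command accepts --data and a file argument, and falls back to a full sync otherwise. The function name atomic_write is made up for illustration.

```shell
# Hypothetical helper, not part of shared-mime-info.
# Usage: atomic_write DEST DATA
atomic_write() {
    dest=$1
    tmp=$dest.new
    # Step 2: write the new contents under a temporary name.
    printf '%s' "$2" > "$tmp" || return 1
    # Flush the file's data to disk before the new name becomes visible.
    # "sync --data FILE" (an fdatasync on that file) is an assumption about
    # the installed coreutils; older versions fall back to a global sync.
    sync --data "$tmp" 2>/dev/null || sync || return 1
    # Step 3: atomically replace the old file with the new one.
    mv -f "$tmp" "$dest"
}
```

Called as atomic_write /some/dir/example.xml "$mynewdata", it either leaves the old file in place or replaces it with the complete new one.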

That is all well and good. But on my machine, running update-mime-database can take three or four minutes, far longer than any other step for installing a package.

$ /usr/bin/time --output="time.log" --verbose sudo /usr/bin/update-mime-database.real /usr/share/mime
$ head -n 5 "time.log"
    Command being timed: "sudo /usr/bin/update-mime-database.real /usr/share/mime"
    User time (seconds): 0.60
    System time (seconds): 0.47
    Percent of CPU this job got: 0%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 4:04.84

Note that the combined user and system time is brief (only about a second), but the wall clock time is about two hundred times longer. Why? Because on my machine, update-mime-database has to write and sync 1200 files, and each fdatasync() might be slow, often more than 100 milliseconds, as we can see using strace.

$ sudo strace -o "strace.log" -cw /usr/bin/update-mime-database.real /usr/share/mime
$ head -n 7 "strace.log"
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.40  169.224237      141020      1200           fdatasync
  1.81    3.177132        9929       320           read
  1.09    1.910018        1592      1200           rename
  0.22    0.394739       12336        32           unlink
  0.18    0.317509         110      2878       316 open

I am using the -w flag to get wall clock time rather than system time. Note that fdatasync averages 141020 microseconds per call, or about 140 milliseconds. This is a different run than the previous example, and since there was less disk load it finished faster, but it still took more than 175.5 seconds (almost 3 minutes) to complete.
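
As a sanity check on those numbers, the summary table can be reduced with a little awk. This is a sketch that assumes the column layout shown above (% time, seconds, usecs/call, calls, an optional errors column, then the syscall name); summarize_strace is a made-up helper name.

```shell
# Hypothetical helper: print calls and mean milliseconds per call for each
# syscall row of an "strace -c" summary file.
summarize_strace() {
    # Data rows start with a numeric "% time" column; this skips the header
    # and the dashed rule. The trailing "total" row is skipped by name.
    awk '$1 ~ /^[0-9.]+$/ && $NF != "total" {
        # $2 = total seconds, $4 = number of calls, $NF = syscall name
        printf "%-10s %6d calls  %8.2f ms/call\n", $NF, $4, $2 / $4 * 1000
    }' "$1"
}
```

Running summarize_strace strace.log on the table above reports 1200 fdatasync calls at about 141 ms each, which accounts for roughly 169 of the 175 seconds.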

Now, unfortunately, strace -c only shows the average and gives no insight into the distribution of call durations. The average (mean) is not a resistant measure of central tendency; it is susceptible to outliers. What if fdatasync is fast most of the time, but maybe 1% of the time it takes 30 seconds instead? We could use a more robust measure like the median, but why not plot the whole distribution as a histogram?

For this, we will need to use the -T flag to strace, since otherwise we don’t know how much time is spent in the actual call.

-T
Show the time spent in system calls. This records the time difference between the beginning and the end of each system call.

http://www.man7.org/linux/man-pages/man1/strace.1.html

$ sudo strace -o $(date +%s).log -rTC /usr/bin/update-mime-database.real /usr/share/mime

And here’s a histogram of the results:

[Figure: histogram of fdatasync call durations. Y-axis: number of calls; x-axis: time in syscall (seconds).]

We can see that while fdatasync does occasionally take about a second, most of the time it’s around 100 to 200 milliseconds. So it’s not a case of a few huge outliers.
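
For reference, a histogram like this can be tabulated straight from the log. A minimal sketch, assuming each fdatasync line in the strace -T output ends with the elapsed time in angle brackets (for example <0.141020>), and using 100-millisecond buckets that I picked arbitrarily; fdatasync_histogram is a made-up helper name.

```shell
# Hypothetical helper: count fdatasync calls per 100 ms bucket in an
# "strace -T" log file.
fdatasync_histogram() {
    awk '/fdatasync\(/ && $NF ~ /^<[0-9.]+>$/ {
        t = $NF                   # last field, e.g. "<0.141020>"
        gsub(/[<>]/, "", t)       # strip the angle brackets
        bucket = int(t * 10)      # 0 = 0-99 ms, 1 = 100-199 ms, ...
        count[bucket]++
        if (bucket > max) max = bucket
    }
    END {
        # Print every bucket up to the slowest call, including empty ones.
        for (b = 0; b <= max; b++)
            printf "%d-%d ms: %d\n", b * 100, b * 100 + 99, count[b]
    }' "$1"
}
```

Pass it the log file written by the strace command above; the output has one line per bucket, which is easy to feed to a plotting tool.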

OK, but maybe I just happen to have a poorly performing hard drive on this machine. What did other people have to say after the new version reached mainline Linux distros?

With shared-mime-info-1.1, this command takes only 1 second to complete:

/usr/bin/update-mime-database /usr/share/mime

With shared-mime-info-1.2 it takes over two minutes. There’s no CPU load, only disk I/O.

—Nikos Chantziaras, “x11-misc/shared-mime-info-1.2 updates database very slowly”, 2013-10-10

https://bugs.gentoo.org/show_bug.cgi?id=487504

fdatasync fix is causing a massive slowdown when updating mime cache

—Fryderyk Dziarmagowski, 2013-10-28

https://bugs.freedesktop.org/show_bug.cgi?id=61472

Lately pacman is acting a bit strange here. Some packages takes several minutes to install, even if they are small packages.

— Box0, “Pacman installation very slow for some packages”, 2013-11-09

https://bbs.archlinux.org/viewtopic.php?id=172645

The update-mime-database program in f20 takes about 20 minutes to run on my system. Judging from https://bugs.freedesktop.org/show_bug.cgi?id=70366 the reason is that it deliberately turns on synchronous disc I/O.

— Tom Horsley, “fdatasync() in update-mime-database, results in large performance degradation”, 2014-01-13

https://bugzilla.redhat.com/show_bug.cgi?id=1052173

At some point, I assume after an update in the last few weeks, the update-mime-database process has become agonizingly slow, taking 5 minutes or more to run. This is especially awful when running a system update that may call it multiple times.

— TheRealTachyon, “update-mime-database has become agonizingly slow in recent weeks”, 7 December 2016

https://forums.opensuse.org/showthread.php/521542-update-mime-database-has-become-agonizingly-slow-in-recent-weeks?s=540c7bf98f5c6ae0ff7b356012064c48

Clearly, it’s not just a problem on my machine.

Aside from patching the code, what can we do about this? The quick and dirty solution is to install eatmydata, a program that enables us to live life faster and more dangerously.

libeatmydata is a small LD_PRELOAD library designed to (transparently) disable fsync (and friends, like open(O_SYNC)). This has two side-effects: making software that writes data safely to disk a lot quicker and making this software no longer crash safe.

https://www.flamingspork.com/projects/libeatmydata/

Does it help?

$ sudo strace -o "eatmydata.log" -cw eatmydata /usr/bin/update-mime-database.real /usr/share/mime
$ head -n 7 eatmydata.log
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 51.04    1.734689        4663       372           read
 28.89    0.981759         777      1264           rename
  7.56    0.257037          84      3062       341 open
  3.38    0.114791        3280        35           unlink
  2.83    0.096046        4002        24           getdents

Yes, this version is positively celeritous by comparison. The total time varies by disk load and installed packages, but in these two runs the time spent in system calls dropped from about 175 seconds to about 3.4 seconds, a roughly fifty-fold improvement.

I made the workaround permanent by editing this line of /usr/bin/update-mime-database:

exec update-mime-database.real "$@"

so that it reads this instead:

exec eatmydata update-mime-database.real "$@"
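
One caveat with that edit: if eatmydata is ever uninstalled, the wrapper stops working. A slightly more defensive variant might look like the sketch below, which falls back to the plain command when eatmydata is not on the PATH. The function name run_unsynced is made up for illustration, and unlike the original line it does not use exec.

```shell
# Hypothetical helper for the wrapper script: run a command under eatmydata
# when it is installed, otherwise run the command unchanged.
run_unsynced() {
    if command -v eatmydata >/dev/null 2>&1; then
        eatmydata "$@"
    else
        "$@"
    fi
}
```

In the wrapper, the last line would then become run_unsynced update-mime-database.real "$@".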

Now for a slightly uncomfortable question: would it be possible to actually improve the performance of update-mime-database instead of intentionally subverting its design decisions?

Maybe. There is a little discussion on the bug trackers:

As far as I’m aware, update-mime-database currently doesn’t have smarts like gtk-update-icon-cache to know when it’s cache is current or not, to take advantage of any %posttrans optimization to run only once per transaction

— Rex Dieter, 2014-05-20

https://bugzilla.redhat.com/show_bug.cgi?id=1052173

We (should) only run it in packages that do add new packages XML files.

— Bastien Nocera, 2014-06-27

https://bugs.freedesktop.org/show_bug.cgi?id=70366#c26

we can’t break existing users of update-mime-database (which expect update-mime-database to update the database, even if the mtime of the version file isn’t newer than files in packages/).

— Bastien Nocera 2014-06-27

https://bugs.freedesktop.org/show_bug.cgi?id=70366#c46

Personally, I think a more drastic change might be in order. Heck, /usr/share/mime/mime.cache is already a binary format; if atomic updates and data consistency are the goal, why not go all the way and do it with real database transactions?

The ext3 filesystem developers appear to recommend this course of action:

Applications are expected to use fsync() or fdatasync(), and if that impacts their performance too much, to use a single berkdb or other binary database file, and not do something stupid with hundreds of tiny text files that only hold a few bytes of data in each text file.

— Theodore Ts’o, 2009-03-07

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

Harsh, I know. And in the end, even the _good_ applications will decide that it’s not worth the performance penalty of doing an fsync(). In git, for example, where we generally try to be very very very careful, ‘fsync()’ on the object files is turned off by default.

Why? Because turning it on results in unacceptable behavior on ext3. Now, admittedly, the git design means that a lost new DB file isn’t deadly, just potentially very very annoying and confusing - you may have to roll back and re-do your operation by hand, and you have to know enough to be able to do it in the first place.

— Linus Torvalds, March 25, 2009

https://lwn.net/Articles/326505/

I’ve heard SQLite is widely available and well-maintained, but we could also stick with classic Unix tools like dbm / GNU dbm.

https://news.ycombinator.com/item?id=3377423
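
To make the idea concrete, here is a rough sketch of what a transactional update could look like with the sqlite3 command-line shell. Everything in it is hypothetical: the table name, the column names, and the placeholder XML are invented for illustration and have nothing to do with the real mime.cache format. The point is only that all rows are committed together, so SQLite needs a handful of syncs per commit rather than one per output file.

```shell
# Hypothetical sketch: write all generated entries to one SQLite file in a
# single transaction. Requires the sqlite3 command-line tool.
# Usage: build_mime_db DBFILE
build_mime_db() {
    sqlite3 "$1" <<'EOF'
BEGIN IMMEDIATE;
CREATE TABLE IF NOT EXISTS mime_types (
    name TEXT PRIMARY KEY,
    xml  TEXT NOT NULL
);
INSERT OR REPLACE INTO mime_types (name, xml)
VALUES ('text/x-apt-sources-list', '<mime-type type="text/x-apt-sources-list"/>');
INSERT OR REPLACE INTO mime_types (name, xml)
VALUES ('text/plain', '<mime-type type="text/plain"/>');
COMMIT;
EOF
}
```

If the process dies before COMMIT, readers still see the previous contents of the database, which is the same guarantee the rename trick is trying to provide one file at a time.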

The real culprit here is really “fsync is slow”. A couple of months ago I write a little test that basically wrote a 1MB file, in increments of 16 byte write() calls, followed by either fsync() fdatasync() or.. nothing. That is, basically:

while(written < 1MB) {
   written += write(fd,buf,chunksz);
   //fsync(fd);
   //fdatasync(fd);
}

Here’s some of the times it took to write a 1MB file (Fedora 14, old hard drive)

chunksz = 16:
No sync'ing: 926ms
fdatasync: 727114ms
fsync: 3024498ms
(yes, THAT slow)

chunksz = 256:
No sync'ing: 65ms
fdatasync: 53786ms
fsync: 191553ms

chunksz=1024:
No sync'ing: 20ms
fdatasync: 20232ms
fsync: 48039ms

— noselasd, August 22, 2011

https://news.ycombinator.com/item?id=2913705
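
To put those totals on the same footing as the strace numbers earlier, they can be divided by the number of write() calls. This is back-of-the-envelope arithmetic, assuming that 1MB means 2^20 bytes and that every write() wrote a full chunk; per_call_ms is a made-up helper name.

```shell
# Hypothetical helper: mean milliseconds per sync call in the quoted test.
# Usage: per_call_ms CHUNK_BYTES TOTAL_MS
per_call_ms() {
    awk -v chunk="$1" -v total="$2" 'BEGIN {
        calls = 1048576 / chunk   # assumes 1 MB = 2^20 bytes, full writes
        printf "%d calls, %.1f ms per call\n", calls, total / calls
    }'
}

per_call_ms 16 727114     # fdatasync, 16-byte chunks
per_call_ms 16 3024498    # fsync, 16-byte chunks
```

For the 16-byte case this works out to 65536 calls at about 11 ms per fdatasync and about 46 ms per fsync.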

I don’t see any technical obstacles to this route, though I admit the social coordination problems are thorny and numerous. Adding new dependencies is not to be taken lightly, particularly for a mimetype library used by GNOME, KDE, and XFCE. Clearly they have tried to avoid extra dependencies.

$ apt-cache depends shared-mime-info
shared-mime-info
  Depends: libc6
  Depends: libglib2.0-0
  Depends: libxml2

Regardless of the ultimate solution, the update-mime-database maintainers made a unilateral decision that caused a drastic change in performance. That was a mistake. Worse, they released it without a runtime option to disable the new behavior.

Just a few months after the original patch, circumstances had changed and its author no longer needed it for his own project.

BTW, while I wrote this patch, I actually no longer need it for gnome-continuous, because we now run “triggers” on the build server side. But it does make sense for dpkg/rpm type systems that are operating on live roots.

— Colin Walters, 2013-10-11

https://bugs.freedesktop.org/show_bug.cgi?id=70366

The discussion on the bug tracker becomes increasingly contentious because the maintainers don’t want to budge.

I’d seen a number of real-world issues with corrupted icon caches historically, and haven’t seen any more recently than that.

Enough people lose power or have their kernel crash that it makes sense to default to safety.

— Colin Walters, 2013-10-11

https://bugs.freedesktop.org/show_bug.cgi?id=70366#c6

we want to update-mime-database as soon as possible (immediately after each package gets installed), and not force the user to wait for the chromium build to finish before he can run any of the other applications which were already updated.

— Alexandre Rostovtsev, 2013-10-11

https://bugs.freedesktop.org/show_bug.cgi?id=70366#c7

It’s weird to see individual tools try to improve safety though. I can only imagine the chaos and pain that would be caused if every single program on my system would do a sync after every output operation, be it my compression utility, word processor or system logger.

— Nikos Chantziaras 2013-10-12

https://bugs.freedesktop.org/show_bug.cgi?id=70366#c10

Yes, so that update-mime-database is correct, rather than fast by default. That seems to be the right way to do it.

— Bastien Nocera, 2014-05-23

https://bugs.freedesktop.org/show_bug.cgi?id=70366#c26

I’m of a strong mind that the cost-benefit of fsync isn’t worth it by default currently, so making this opt-in is really the way to go:

  • the slowdown is significant, x40 (or more) times slower
  • it’s easy to recover (e.g. run update-mime-info again).

— Rex Dieter, 2014-05-23

https://bugs.freedesktop.org/show_bug.cgi?id=70366#c27

My (biased) take on our positions: I insist that performance be restored by default asap while reasonable alternative solutions are sought out. I argue this data isn’t that critical to warrant sync writes anyway, certainly not anything like an rpm database (where corruption could be fatal), and is easily (and quickly!) able to be regenerated as needed.

— Rex Dieter, June 16, 2014

https://pagure.io/fesco/issue/1318

Maybe this will be fixed eventually, but as of August 2018, update-mime-database still calls fdatasync() by default every time it renames a file.

if (fdatasync(fd) == -1)
{
        set_error_from_errno(error);
        return -1;
}

https://github.com/freedesktop/xdg-shared-mime-info/blob/55a88f1c59beecd4e706e4340e8183336daf5e5f/update-mime-database.c#L969

Note that this is not the first time an ill-advised fsync() has degraded performance.

What happened with Firefox 3.0 was that the primary user interface thread called the sqllite library each time the user clicked on a link to go to a new page. The sqllite library called fsync(), which in ext3’s data=ordered mode, caused a large, visible latency which was visible to the user if there was a large file copy happening by another process.

[ … ]

An fsync() call every 15, 30, or 60 minutes, done by a thread which doesn’t block the application’s UI, would have never been noticed and would have not started the firestorm on Firefox’s bugzilla #421482. Very often, after a little thinking, a small change in the application is all that’s necessary for to really optimize the application’s fsync() usage.

— Theodore Ts’o, “Don’t fear the fsync!”, March 15, 2009

https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
