arighi's blog: libnosync: are *sync really necessary?

I was asking myself why user applications should care about synchronizing their buffers. I would expect that to be a job for the operating system, which actually knows what is best for the system. Looking at the manpage of FSYNC(2) we can see that:


NAME
       fsync,  fdatasync  -  synchronize  a file's complete in-core state with
       that on disk

[snip]

DESCRIPTION
       fsync copies all in-core parts of a file to disk, and waits  until  the
       device  reports  that all parts are on stable storage.  It also updates
       metadata stat information. It does  not  necessarily  ensure  that  the
       entry  in the directory containing the file has also reached disk.  For
       that an explicit fsync on the file descriptor of the directory is  also
       needed.

[snip]

OK, but... why? I wrote a simple glibc wrapper (see below) that provides "fake" fsync() and fdatasync() calls. It does not touch the plain sync(), so you can continue to run the famous `sync; sync; sync` if you're paranoid enough ;-) I was impressed by how heavily user applications use these calls... and by the speed-up you get if you disable them.

In fact, if you have a journaled filesystem (hey! otherwise you should really consider moving to one!), all those metadata flushes cause a lot of writes to the journal (for example, in ext3 a single fsync() causes *everything* to be written), and that's a lot of I/O for your PC. It is also a disadvantage in terms of power consumption.

So, where is the trick?! After some thought I realized that the main reason is to be *really* sure that the internal metadata of an application (a DBMS, for example) built on top of the filesystem has been correctly written to the backing store. Everything that implements its own concept of "journal" should use the *sync() functions. Otherwise, if a crash occurs right in the middle of an "important" write, well... after the restart the metadata of your filesystem will be OK, but the metadata of the application (stored in the filesystem as ordinary file data) could be corrupted. So, for a robust desktop it's surely better to leave those syscalls enabled.
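To make the point concrete, here is a minimal sketch of the kind of "important" write that really does depend on fsync(): replacing a file so that after a crash you find either the old or the new content, never a half-written one. The function name `durable_replace` and the `.tmp` suffix are my own illustrative choices, not taken from any real application. Note the second fsync() on the directory, as the manpage excerpt above requires.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): replace dir/name with the given
 * data and try to make the result durable.
 * Returns 0 on success, -1 on error.
 */
int durable_replace(const char *dir, const char *name,
                    const void *data, size_t len)
{
        char tmp[4096], dst[4096];
        const char *p = data;
        int fd, dfd, n;

        n = snprintf(tmp, sizeof(tmp), "%s/%s.tmp", dir, name);
        if (n < 0 || (size_t)n >= sizeof(tmp))
                return -1;
        n = snprintf(dst, sizeof(dst), "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= sizeof(dst))
                return -1;

        /* 1. write the new content to a temporary file */
        fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
                return -1;
        while (len > 0) {
                ssize_t w = write(fd, p, len);
                if (w < 0)
                        goto fail;
                p += w;
                len -= (size_t)w;
        }

        /* 2. force the file data and metadata to stable storage */
        if (fsync(fd) < 0)
                goto fail;
        if (close(fd) < 0) {
                unlink(tmp);
                return -1;
        }

        /* 3. atomically switch the name to the new content */
        if (rename(tmp, dst) < 0) {
                unlink(tmp);
                return -1;
        }

        /* 4. fsync the directory so the new entry reaches the disk too */
        dfd = open(dir, O_RDONLY);
        if (dfd < 0)
                return -1;
        if (fsync(dfd) < 0) {
                close(dfd);
                return -1;
        }
        return close(dfd);

fail:
        close(fd);
        unlink(tmp);
        return -1;
}
```

A caller would use it as `durable_replace("/var/lib/myapp", "state.db", buf, len)` (a made-up path). With fake fsync() calls, steps 2 and 4 become no-ops and the ordering guarantee is gone, which is exactly the risk described above.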

OK, but is this really important for *all* your applications??? For example, I don't think it's important for amarok... Try running a simple `strace -qfe trace=fdatasync,fsync amarok`. On my system I can see 36 *sync syscalls!!! And that is too many... BTW I have nothing against amarok, it's a great application & my favourite music player :-)

The *sync() lib wrapper follows. Use it (as always, without any warranty) if you want to run your non-critical applications faster. [IDEA] It would be interesting to run your apps with the wrapper and execute a `sync; sync; sync` just before the screensaver starts... :-)


/*
 *  libnosync
 *
 *  Copyright (C) 2007 Andrea Righi 
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 * Compile (assuming this file is saved as libnosync.c):
 *     gcc -fPIC -Wall -O2 -g -shared -Wl,-soname,libnosync.so.0 \
 *     -o libnosync.so.0.1 libnosync.c -lc -ldl
 *
 * Use:
 *     export LD_PRELOAD=`pwd`/libnosync.so.0.1
 *
 * Remove:
 *     unset LD_PRELOAD
 */

#define _GNU_SOURCE
#define __USE_GNU

#include <stdio.h>      /* fprintf(), used by DPRINTF when DEBUG is defined */

#ifdef DEBUG
#define DPRINTF(format, args...) fprintf(stderr, "debug: " format, ##args)
#else
#define DPRINTF(format, args...)
#endif

int fdatasync(int) __attribute__ ((weak, alias("wrap_fsync")));
int fsync(int) __attribute__ ((weak, alias("wrap_fsync")));

int wrap_fsync(int fd)
{
        DPRINTF("called fsync/fdatasync on fd = %d\n", fd);
        return 0;
}
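A quick way to check whether the wrapper is actually in effect for a process is sketched below. It relies on one visible difference in behaviour: the real fsync() rejects an invalid file descriptor with EBADF, while the wrapper above ignores its argument and always returns 0. The function name `fsync_is_stubbed` is hypothetical, chosen just for this example.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): returns 1 if fsync() appears to
 * be stubbed out (for example by LD_PRELOAD-ing libnosync), 0 if the real
 * implementation is active.
 */
int fsync_is_stubbed(void)
{
        errno = 0;

        /* -1 is never a valid descriptor: the real call fails with EBADF */
        if (fsync(-1) == 0)
                return 1;

        return 0;
}
```

Called from a small test program, this should return 0 when run normally and 1 when run with LD_PRELOAD pointing at libnosync.so.0.1.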

Wednesday, May 16, 2007
