A process that writes something to a file generates dirty pages in the page cache. Dirty pages must be kept in sync with their backing store (the file on the block device).
In the Linux kernel the frequency of dirty-page writeback is controlled by two parameters: vm.dirty_ratio and vm.dirty_background_ratio. Both are expressed as a percentage of dirtyable memory, that is, free memory + reclaimable memory (the active and inactive pages on the LRU lists).
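The current values of the two knobs can be inspected (and, as root, changed) through procfs or sysctl(8):

```shell
# Read the current writeback thresholds (values are percentages
# of dirtyable memory):
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio

# Equivalent via sysctl(8):
sysctl vm.dirty_ratio vm.dirty_background_ratio
```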
The first parameter controls when a process will itself start writing out dirty data; the second controls when the [pdflush] kernel threads must be woken up to start writing out dirty data globally on behalf of all processes (dirty_background_ratio must always be less than dirty_ratio; if dirty_background_ratio >= dirty_ratio, the kernel automatically sets it to dirty_ratio / 2).
Unfortunately, both percentages are plain integers, and the kernel doesn't even allow setting them below 5%. This means that on large-memory machines those limits are far too coarse. On a machine with 1GB of dirtyable memory the kernel will start to write back dirty pages in chunks of at least 50MB (!!!) (with dirty_ratio = 5).
While this may be fine for batch or server machines, it can be unpleasant on desktops or in latency-sensitive environments, where a large writeback burst is perceived as a lack of responsiveness in the whole system.
IMHO we really need an interface to define fine-grained limits (to write back small amounts of data, often), and the best solution that doesn't break compatibility with the old interface seems to be introducing a new interface that accepts pcm (milli-percent) values.
At least this would solve the problem for today's machines... until 1TB memory servers become popular...