Sunday, March 25, 2007

disk I/O per-process accounting

A common problem on Linux is finding the most I/O-intensive process when the system is under heavy disk activity. In some cases you may want to kill the runaway process that caused it.

Plenty of Linux tools report system-wide statistics (top, sar, dstat, iostat, vmstat, ...), but unfortunately none of them can show the disk activity of each individual process.

The following kernel patch gives userspace tools access to per-process I/O statistics (WARNING: I have only tested it against vanilla 2.6.18.3!):

--- include/linux/sched.h.orig 2007-03-25 21:42:50.000000000 +0200
+++ include/linux/sched.h 2007-03-25 21:42:56.000000000 +0200
@@ -990,6 +990,12 @@
 	struct rcu_head rcu;
 
 	/*
+	 * disk I/O accounting information
+	 */
+	unsigned long long acct_disk_read;
+	unsigned long long acct_disk_write;
+
+	/*
 	 * cache last used pipe for splice
 	 */
 	struct pipe_inode_info *splice_pipe;
--- block/ll_rw_blk.c.orig 2007-03-25 18:05:51.000000000 +0200
+++ block/ll_rw_blk.c 2007-03-25 18:12:51.000000000 +0200
@@ -2586,6 +2586,12 @@
 		disk_round_stats(rq->rq_disk);
 		rq->rq_disk->in_flight++;
 	}
+
+	if (rw == READ) {
+		current->acct_disk_read += nr_sectors;
+	} else {
+		current->acct_disk_write += nr_sectors;
+	}
 }
 
 /*
--- fs/proc/array.c.orig 2007-03-25 18:13:07.000000000 +0200
+++ fs/proc/array.c 2007-03-25 18:15:00.000000000 +0200
@@ -412,7 +412,7 @@

 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %llu %llu %llu\n",
 		task->pid,
 		tcomm,
 		state,
@@ -457,7 +457,9 @@
 		task_cpu(task),
 		task->rt_priority,
 		task->policy,
-		(unsigned long long)delayacct_blkio_ticks(task));
+		(unsigned long long)delayacct_blkio_ticks(task),
+		task->acct_disk_read,
+		task->acct_disk_write);
 	if(mm)
 		mmput(mm);
 	return res;

The patch appends two entries to the end of the process status line (see /usr/src/linux/fs/proc/array.c):
  1. the I/O read activity of the process, counted in sectors
  2. the I/O write activity of the process, counted in sectors
You can access them via the proc filesystem: the status line is in /proc/[pid]/stat (see `man 5 proc`), where the new entries are fields 43 and 44 on this kernel.
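
Since the two counters sit at the very end of the line, a parser can pick them up from the end rather than counting fields from the front, which also copes with process names that contain spaces or parentheses. Here is a minimal Python sketch, assuming a kernel carrying the patch above; the function name `parse_disk_fields` and the shortened sample line are made up for illustration:

```python
def parse_disk_fields(stat_line):
    """Return (pid, comm, sectors_read, sectors_written) from one
    /proc/[pid]/stat line of a kernel carrying the patch above."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split around the first "(" and the last ")".
    head, _, rest = stat_line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    # The patch appends its counters after every stock field, so they
    # are the last two values on the line.
    return int(head), comm, int(fields[-2]), int(fields[-1])


# Hypothetical, shortened line: a real one has 44 fields on this kernel.
sample = "4242 (my (odd) name) S 1 4242 0 120 34"
print(parse_disk_fields(sample))
```

On an unpatched kernel the last two fields mean something else entirely, so the result is only meaningful with the patch applied.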

For example, the following command lists the ten most I/O-intensive processes on my system:

$ cat /proc/[0-9]*/stat | awk '{print $2 ":" $43 + $44}' | sort -rn -t : -k 2 | head
(pdflush):275240
(reiserfs/0):179064
(thunderbird-bin):74376
(cupsd):18904
(firefox-bin):15640
(Xorg):13632
(netstat):13512
(gaim):9096
(kswapd0):6032
(syslog-ng):4568

As expected, pdflush (the worker thread that writes back filesystem data) comes first, followed by the reiserfs/0 worker thread... but obviously you can't kill them: they're kernel threads! So in my case the most I/O-intensive userspace process is thunderbird! ;-)
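
To leave the kernel threads out of the ranking, one option is to skip every process whose /proc/[pid]/cmdline is empty, which is the case for kernel threads (and also for zombies). The Python sketch below does that and converts the total to KiB, since the block layer counts in 512-byte sectors. It assumes the patched kernel; `top_userspace_io` and its `proc_root` parameter are names invented for this example:

```python
import glob
import os


def top_userspace_io(proc_root="/proc", limit=10):
    """Rank userspace processes by sectors read + written.

    Assumes the patched kernel from this post, where the two counters
    are the last fields of /proc/[pid]/stat.
    """
    totals = []
    for stat_path in glob.glob(os.path.join(proc_root, "[0-9]*", "stat")):
        pid_dir = os.path.dirname(stat_path)
        try:
            with open(stat_path) as f:
                line = f.read()
            with open(os.path.join(pid_dir, "cmdline"), "rb") as f:
                cmdline = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if not cmdline:
            continue  # empty cmdline: kernel thread (or zombie), skip it
        comm = line[line.index("(") + 1:line.rindex(")")]
        fields = line[line.rindex(")") + 1:].split()
        totals.append((int(fields[-2]) + int(fields[-1]), comm))
    totals.sort(reverse=True)
    return totals[:limit]


if __name__ == "__main__":
    for sectors, comm in top_userspace_io():
        # Sectors are 512 bytes, so two sectors make one KiB.
        print("%-20s %10d sectors %10d KiB" % (comm, sectors, sectors // 2))
```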

You can also write your own top-like userspace tool to monitor the I/O rate of each process, or a program to see whether your processes are doing more reads or writes, etc.
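
As a rough idea of what such a tool could look like, this Python sketch takes two snapshots of the counters and turns the difference into per-process read and write rates. It again assumes the patched kernel, with the counters as the last two fields of /proc/[pid]/stat; the names `snapshot` and `io_rates` and the 2-second interval are arbitrary choices for the example:

```python
import glob
import time


def snapshot(proc_root="/proc"):
    """Map pid -> (comm, sectors_read, sectors_written)."""
    snap = {}
    for path in glob.glob(proc_root + "/[0-9]*/stat"):
        try:
            with open(path) as f:
                line = f.read()
        except OSError:
            continue  # process went away between glob and open
        pid = int(line[:line.index("(")])
        comm = line[line.index("(") + 1:line.rindex(")")]
        fields = line[line.rindex(")") + 1:].split()
        snap[pid] = (comm, int(fields[-2]), int(fields[-1]))
    return snap


def io_rates(before, after, seconds):
    """Per-process read/write rates in sectors per second, busiest first."""
    rates = []
    for pid, (comm, rd, wr) in after.items():
        if pid not in before:
            continue  # started during the interval; no baseline yet
        _, rd0, wr0 = before[pid]
        rates.append(((rd - rd0) / seconds, (wr - wr0) / seconds, pid, comm))
    rates.sort(key=lambda r: r[0] + r[1], reverse=True)
    return rates


if __name__ == "__main__":
    interval = 2.0  # seconds between samples, an arbitrary choice
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    for rd, wr, pid, comm in io_rates(before, after, interval)[:10]:
        print("%6d %-20s read %8.1f  write %8.1f sectors/s" % (pid, comm, rd, wr))
```

Keeping reads and writes in separate columns also answers the "more reads or writes" question at a glance.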
