Sunday, August 11, 2019

Kernel debugging using QEMU/KVM, virtme and crash

Introduction

Kernel development can be very time consuming, not only for the time required to compile the kernel itself (especially with a beefy .config), but also the tasks of deploying, testing and debugging represent a large portion of the work.

In this article we will explore a slightly different approach that allows to speed up kernel development, testing and debugging using QEMU/KVM.

Virtualized environment

Some times we can speed up this process modularizing the portion of code that we need to analize. However, this approach is not always doable (we may not know yet the specific portion of code to debug or simply it's just not possible to modularize it - i.e., core kernel components).

Moreover, working with a kernel module can potentially crash or seriously compromise the system, causing loss of work and extra work to reboot and restore the previous session.

Using a virtualized environment is definitely better from this point of view: the testing environment can be easily re-deployed, restarted and easily resumed in case of failures. Moreover, the development environment is never compromised.

The first tool that we are going to use is virtme (https://github.com/amluto/virtme).

This tool is a wrapper on top of QEMU/KVM that allows to quickly spin up an instance on a kernel build directory, creating a live sandbox of a just compiled kernel that shares the $HOME directory (in read-only) with the host.

This gives the big advantage of being able to easily deploy additional files into the testing instance, simply by copying them into a folder inside your $HOME.

Testing the kernel

We are going to create a helper script on top of virtme to quickly start a kernel to test:

righiandr@xps-13:~$ cat bin/kernel-test
#!/bin/bash
virtme-run --kdir . $* -a "nokaslr" --qemu-opts -m 1024 -smp 4 -s -qmp tcp:localhost:4444,server,nowait
righiandr@xps-13:~$

Assuming we have just compiled the kernel in $HOME/linux, we can quickly start it using:

righiandr@xps-13:~/linux$ kernel-test
[    0.000000] Linux version 5.3.0-rc3+ (righiandr@xps-13) (gcc version 9.1.0 (Ubuntu 9.1.0-9ubuntu2)) #43 SMP Sun Aug 11 08:46:51 CEST 2019
[    0.000000] Command line: earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps "virtme_stty_con=rows 28 cols 104 iutf8" TERM=xterm-256color rootfstype=9p rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro nokaslr init=/bin/sh -- -c "mount -t tmpfs run /run;mkdir -p /run/virtme/guesttools;/bin/mount -n -t 9p -o ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools /run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init"
...
virtme-init: console is ttyS0
root@(none):/#

Now we can start running our tests using the new kernel ("CTRL+a x" to close the session).

We can also assign a new block device to virtual instance, example (assign a 1GB raw disk):

righiandr@xps-13:~/linux$ dd if=/dev/zero of=/tmp/disk.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.414057 s, 2.6 GB/s
righiandr@xps-13:~/linux$ kernel-test --disk "disk1=/tmp/disk.img"
[    0.000000] Linux version 5.3.0-rc3+ (righiandr@xps-13) (gcc version 9.1.0 (Ubuntu 9.1.0-9ubuntu2)) #43 SMP Sun Aug 11 08:46:51 CEST 2019
...
virtme-init: console is ttyS0
root@(none):/# fdisk -l /dev/sda
Disk /dev/sda: 1 GiB, 1073741824 bytes, 2097152 sectors
Disk model: disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@(none):/#

Having a separate disk dedicated to the instance is really useful to test I/O / filesystem features.

Debugging kernel modules

A downside of using "virtme --kdir" is that we're going to miss the kernel module directory in the usual location /lib/modules/uname -r. For example, if we try to load xfs (compiled as an external module), we get the following error:

root@(none):/# modprobe xfs
modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.3.0-rc3+/modules.dep.bin'
modprobe: FATAL: Module xfs not found in directory /lib/modules/5.3.0-rc3+

To resolve this limitation we can follow the following simple steps:

Install the kernel modules in a temporary directory in $HOME (for example /home/righiandr/tmp/kmod):

righiandr@xps-13:~/linux$ make modules_install INSTALL_MOD_PATH=~/tmp/kmod

Start the instance:

righiandr@xps-13:~/linux$ kernel-test --disk "disk1=/tmp/disk.img"
...
virtme-init: console is ttyS0
root@(none):/#

Mount (bind) the temporary kernel modules directory to the standard module path (run this inside the instance):

root@(none):/# mount --bind /home/righiandr/tmp/kmod/lib/modules /lib/modules

At this point the kernel is able to load external modules as usual, example:

root@(none):/# mount --bind /home/righiandr/tmp/kmod/lib/modules/ /lib/modules
root@(none):/# modprobe xfs
[   17.257358] SGI XFS with security attributes, no debug enabled
root@(none):/#

Kernel debugging using virtme and crash

Now let's see how we can track down a soft lockup bug (explicitly added) in xfs. Let's assume we have explicitly introduced a soft lockup bug in xfs and we have an easy way to reproduce the bug:

root@(none):/# mkfs.xfs /dev/sda
meta-data=/dev/sda               isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@(none):/# mount /dev/sda /mnt
[  293.523348] XFS (sda): Mounting V5 Filesystem
[  293.529194] XFS (sda): Ending clean mount
root@(none):/# df /mnt/
[  324.109386] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [df:328]
[  324.110158] Modules linked in: xfs
[  324.110522] irq event stamp: 64386
[  324.110896] hardirqs last  enabled at (64385): [<ffffffff810038fa>] trace_hardirqs_on_thunk+0x1a/0x20
[  324.111843] hardirqs last disabled at (64386): [<ffffffff8100391a>] trace_hardirqs_off_thunk+0x1a/0x20
[  324.112756] softirqs last  enabled at (64384): [<ffffffff81e00338>] __do_softirq+0x338/0x435
[  324.113592] softirqs last disabled at (64377): [<ffffffff810a03ae>] irq_exit+0xbe/0xd0
[  324.114377] CPU: 0 PID: 328 Comm: df Not tainted 5.3.0-rc3+ #43
[  324.114940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[  324.115820] RIP: 0010:xfs_fs_statfs+0x87/0x90 [xfs]
[  324.116290] Code: 38 e8 7d b5 52 e1 48 8d bb d0 01 00 00 e8 71 b5 52 e1 48 8d bb 38 02 00 00 e8 65 b5 52 e1 48 8d bb 20 01 00 00 e8 49 fa af e1 <eb> fe 0f 1f 80 00 00 00 00 0f 1f 44 00 00 85 f6 75 03 31 c0 c3 55
[  324.118115] RSP: 0018:ffffc90000217df0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[  324.118829] RAX: 0000000000000000 RBX: ffff88803ba4e000 RCX: 0000000000000000
[  324.119506] RDX: ffff88803dc176e0 RSI: 8888888888888889 RDI: 0000000000000246
[  324.120214] RBP: ffffc90000217df8 R08: 0000000000000000 R09: 0000000000000001
[  324.120943] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88803b3ec640
[  324.121673] R13: ffffc90000217e90 R14: 0000000000000002 R15: 0000000000000000
[  324.122409] FS:  00007fd7c6ccf580(0000) GS:ffff88803dc00000(0000) knlGS:0000000000000000
[  324.123239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  324.123832] CR2: 00007fd7c6bc20d0 CR3: 000000003a91a002 CR4: 0000000000360ef0
[  324.124567] Call Trace:
[  324.124841]  statfs_by_dentry+0x73/0xa0
[  324.125245]  vfs_statfs+0x1b/0xc0
[  324.125592]  user_statfs+0x5b/0xb0
[  324.125956]  __do_sys_statfs+0x28/0x60
[  324.126357]  __x64_sys_statfs+0x16/0x20
[  324.126766]  do_syscall_64+0x65/0x1d0
[  324.127153]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  324.127679] RIP: 0033:0x7fd7c6bee1fb
[  324.128053] Code: c3 66 0f 1f 44 00 00 48 8b 05 91 8c 0d 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 65 8c 0d 00 f7 d8 64 89 01 48
[  324.129961] RSP: 002b:00007ffe140f5ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
[  324.130739] RAX: ffffffffffffffda RBX: 00007ffe140f7f7f RCX: 00007fd7c6bee1fb
[  324.131473] RDX: 00000000ffffffff RSI: 00007ffe140f5ef0 RDI: 00007ffe140f7f7f
[  324.132206] RBP: 00007ffe140f5ef0 R08: 00007ffe140f6013 R09: 0000000000000032
[  324.132940] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe140f5f90
[  324.133672] R13: 0000000000000000 R14: 000056187bdb37c0 R15: 000056187bdb3780

At this point the system is unresponsive due to the soft lockup, but we got a kernel oops at least that already tells a lot about the bug. However we are unable to run any other command inside the instance.

In cases like this it can be pretty useful to generate a memory dump of the instance and use a separate tool (i.e., crash) to analize the state of the system after the crash.

In our environment we can use the QEMU Machine Protocol (QMP) to generate a memory dump. If we pay attention to the options used in kernel-test to spin up the QEMU/KVM instance (using virtme-run) we can see that we appended the following: "-qmp tcp:localhost:4444,server,nowait". With these options QEMU/KVM starts a QMP server listening on port 4444 that can accept QMP commands (see also https://wiki.qemu.org/Documentation/QMP).

To easily generate a memory dump of our test instance we are going to create an additional helper script called kernel-dump:

righiandr@xps-13:~$ cat bin/kernel-dump
#!/usr/bin/expect -f

if { $argc < 1 } {
 send_user "usage: kernel-dump vmcore.img\n"
 exit
}
set out_file [lindex $argv 0]

spawn telnet localhost 4444
send "{ \"execute\": \"qmp_capabilities\" }\r"
expect "{\"return\": {}}"
send "{\"execute\":\"dump-guest-memory\",\"arguments\":{\"paging\":false,\"protocol\":\"file:$out_file\"}}\r"
expect "{\"return\": {}}"

Now we can simply use this helper script to generate the memory dump in /tmp/vmcore.img:

righiandr@xps-13:~/linux$ kernel-dump  /tmp/vmcore.img
spawn telnet localhost 4444
{ "execute": "qmp_capabilities" }
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 4}, "package": "Debian 1:4.0+dfsg-0ubuntu5"}, "capabilities": ["oob"]}}
{"return": {}}
{"execute":"dump-guest-memory","arguments":{"paging":false,"protocol":"file:/tmp/vmcore.img"}}
{"timestamp": {"seconds": 1565520296, "microseconds": 278432}, "event": "STOP"}
{"timestamp": {"seconds": 1565520296, "microseconds": 963140}, "event": "DUMP_COMPLETED", "data": {"result": {"total": 1074003968, "status": "completed", "completed": 1074003968}}}
{"timestamp": {"seconds": 1565520296, "microseconds": 963240}, "event": "RESUME"}
{"return": {}}

At this point we have a regular memory dump (similar to "virsh dump --memory-only" if you are more familiar with libvirt instances) and we can use "crash" to analize it:

righiandr@xps-13:~/linux$ crash /tmp/vmcore.img vmlinux
...
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: vmlinux
    DUMPFILE: /tmp/vmcore.img
        CPUS: 4
        DATE: Sun Aug 11 12:44:55 2019
      UPTIME: 00:05:10
LOAD AVERAGE: 1.55, 0.41, 0.14
       TASKS: 86
    NODENAME: (none)
     RELEASE: 5.3.0-rc3+
     VERSION: #43 SMP Sun Aug 11 08:46:51 CEST 2019
     MACHINE: x86_64  (1992 Mhz)
      MEMORY: 1 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff8261b800  (1 of 4)  [THREAD_INFO: ffffffff8261b800]
         CPU: 0
       STATE: TASK_RUNNING
     WARNING: panic task not found

crash>

Now we are able to dig into the system even after the crash, for example we can still look at the kernel log (running dmesg), list the processes (ps) and even get a backtrace of a target PID:

crash> dmesg | tail -36
[  324.109386] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [df:328]
[  324.110158] Modules linked in: xfs
[  324.110522] irq event stamp: 64386
[  324.110896] hardirqs last  enabled at (64385): [<ffffffff810038fa>] trace_hardirqs_on_thunk+0x1a/0x20
[  324.111843] hardirqs last disabled at (64386): [<ffffffff8100391a>] trace_hardirqs_off_thunk+0x1a/0x20
[  324.112756] softirqs last  enabled at (64384): [<ffffffff81e00338>] __do_softirq+0x338/0x435
[  324.113592] softirqs last disabled at (64377): [<ffffffff810a03ae>] irq_exit+0xbe/0xd0
[  324.114377] CPU: 0 PID: 328 Comm: df Not tainted 5.3.0-rc3+ #43
[  324.114940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[  324.115820] RIP: 0010:xfs_fs_statfs+0x87/0x90 [xfs]
[  324.116290] Code: 38 e8 7d b5 52 e1 48 8d bb d0 01 00 00 e8 71 b5 52 e1 48 8d bb 38 02 00 00 e8 65 b5 52 e1 48 8d bb 20 01 00 00 e8 49 fa af e1 <eb> fe 0f 1f 80 00 00 00 00 0f 1f 44 00 00 85 f6 75 03 31 c0 c3 55
[  324.118115] RSP: 0018:ffffc90000217df0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[  324.118829] RAX: 0000000000000000 RBX: ffff88803ba4e000 RCX: 0000000000000000
[  324.119506] RDX: ffff88803dc176e0 RSI: 8888888888888889 RDI: 0000000000000246
[  324.120214] RBP: ffffc90000217df8 R08: 0000000000000000 R09: 0000000000000001
[  324.120943] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88803b3ec640
[  324.121673] R13: ffffc90000217e90 R14: 0000000000000002 R15: 0000000000000000
[  324.122409] FS:  00007fd7c6ccf580(0000) GS:ffff88803dc00000(0000) knlGS:0000000000000000
[  324.123239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  324.123832] CR2: 00007fd7c6bc20d0 CR3: 000000003a91a002 CR4: 0000000000360ef0
[  324.124567] Call Trace:
[  324.124841]  statfs_by_dentry+0x73/0xa0
[  324.125245]  vfs_statfs+0x1b/0xc0
[  324.125592]  user_statfs+0x5b/0xb0
[  324.125956]  __do_sys_statfs+0x28/0x60
[  324.126357]  __x64_sys_statfs+0x16/0x20
[  324.126766]  do_syscall_64+0x65/0x1d0
[  324.127153]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  324.127679] RIP: 0033:0x7fd7c6bee1fb
[  324.128053] Code: c3 66 0f 1f 44 00 00 48 8b 05 91 8c 0d 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 65 8c 0d 00 f7 d8 64 89 01 48
[  324.129961] RSP: 002b:00007ffe140f5ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
[  324.130739] RAX: ffffffffffffffda RBX: 00007ffe140f7f7f RCX: 00007fd7c6bee1fb
[  324.131473] RDX: 00000000ffffffff RSI: 00007ffe140f5ef0 RDI: 00007ffe140f7f7f
[  324.132206] RBP: 00007ffe140f5ef0 R08: 00007ffe140f6013 R09: 0000000000000032
[  324.132940] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe140f5f90
[  324.133672] R13: 0000000000000000 R14: 000056187bdb37c0 R15: 000056187bdb3780
crash> 

crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      0      0   0  ffffffff8261b800  RU   0.0       0      0  [swapper/0]
>     0      0   1  ffff88803da48000  RU   0.0       0      0  [swapper/1]
>     0      0   2  ffff88803da4b2c0  RU   0.0       0      0  [swapper/2]
>     0      0   3  ffff88803da532c0  RU   0.0       0      0  [swapper/3]
      1      0   0  ffff88803d9c8000  IN   0.2    4084   2028  virtme-init
      2      0   3  ffff88803d9cb2c0  IN   0.0       0      0  [kthreadd]
      3      2   0  ffff88803d9eb2c0  ID   0.0       0      0  [rcu_gp]
      4      2   0  ffff88803d9e8000  ID   0.0       0      0  [rcu_par_gp]
      5      2   0  ffff88803d9f0000  ID   0.0       0      0  [kworker/0:0]
      6      2   0  ffff88803d9f32c0  ID   0.0       0      0  [kworker/0:0H]
      8      2   0  ffff88803d9fb2c0  ID   0.0       0      0  [mm_percpu_wq]
      9      2   0  ffff88803da00000  IN   0.0       0      0  [ksoftirqd/0]
     10      2   2  ffff88803da032c0  ID   0.0       0      0  [rcu_sched]
     11      2   0  ffff88803da08000  RU   0.0       0      0  [migration/0]
     12      2   0  ffff88803da0b2c0  RU   0.0       0      0  [kworker/0:1]
     13      2   0  ffff88803da50000  IN   0.0       0      0  [cpuhp/0]
     14      2   1  ffff88803da632c0  IN   0.0       0      0  [cpuhp/1]
     15      2   1  ffff88803da60000  IN   0.0       0      0  [migration/1]
     16      2   1  ffff88803da68000  IN   0.0       0      0  [ksoftirqd/1]
     18      2   1  ffff88803da732c0  ID   0.0       0      0  [kworker/1:0H]
     19      2   2  ffff88803da70000  IN   0.0       0      0  [cpuhp/2]
     20      2   2  ffff88803da80000  IN   0.0       0      0  [migration/2]
     21      2   2  ffff88803da832c0  IN   0.0       0      0  [ksoftirqd/2]
     22      2   2  ffff88803dad32c0  ID   0.0       0      0  [kworker/2:0]
     23      2   2  ffff88803dad0000  ID   0.0       0      0  [kworker/2:0H]
     24      2   3  ffff88803dad8000  IN   0.0       0      0  [cpuhp/3]
     25      2   3  ffff88803dadb2c0  IN   0.0       0      0  [migration/3]
     26      2   3  ffff88803dae32c0  IN   0.0       0      0  [ksoftirqd/3]
     27      2   3  ffff88803dae0000  ID   0.0       0      0  [kworker/3:0]
     28      2   3  ffff88803daf0000  ID   0.0       0      0  [kworker/3:0H]
     29      2   1  ffff88803daf32c0  IN   0.0       0      0  [kdevtmpfs]
     30      2   1  ffff88803dbab2c0  ID   0.0       0      0  [netns]
     31      2   2  ffff88803dba8000  IN   0.0       0      0  [kauditd]
     32      2   2  ffff88803d0732c0  IN   0.0       0      0  [khungtaskd]
     33      2   2  ffff88803d070000  IN   0.0       0      0  [oom_reaper]
     34      2   2  ffff88803d078000  ID   0.0       0      0  [writeback]
     35      2   3  ffff88803d07b2c0  IN   0.0       0      0  [kcompactd0]
     36      2   3  ffff88803d0c32c0  IN   0.0       0      0  [ksmd]
     37      2   3  ffff88803d0c0000  IN   0.0       0      0  [khugepaged]
     39      2   2  ffff88803d0cb2c0  ID   0.0       0      0  [kworker/u8:1]
     43      2   3  ffff88803d138000  ID   0.0       0      0  [kworker/3:1]
     48      2   1  ffff88803d150000  ID   0.0       0      0  [kworker/1:1]
    131      2   1  ffff88803d370000  ID   0.0       0      0  [kintegrityd]
    132      2   0  ffff88803d3932c0  ID   0.0       0      0  [kblockd]
    133      2   2  ffff88803d390000  ID   0.0       0      0  [blkcg_punt_bio]
    134      2   3  ffff88803d0c8000  ID   0.0       0      0  [tpm_dev_wq]
    135      2   1  ffff88803d158000  ID   0.0       0      0  [ata_sff]
    136      2   0  ffff88803d15b2c0  ID   0.0       0      0  [md]
    137      2   2  ffff88803d128000  ID   0.0       0      0  [edac-poller]
    138      2   3  ffff88803d12b2c0  ID   0.0       0      0  [devfreq_wq]
    139      2   0  ffff88803d18b2c0  IN   0.0       0      0  [watchdogd]
    140      2   2  ffff88803d188000  ID   0.0       0      0  [kworker/2:1]
    143      2   1  ffff88803d17b2c0  IN   0.0       0      0  [kswapd0]
    144      2   0  ffff88803d168000  ID   0.0       0      0  [kworker/u9:0]
    145      2   1  ffff88803d16b2c0  IN   0.0       0      0  [ecryptfs-kthrea]
    148      2   1  ffff88803d13b2c0  ID   0.0       0      0  [kthrotld]
    149      2   2  ffff88803bcab2c0  ID   0.0       0      0  [acpi_thermal_pm]
    150      2   3  ffff88803bca8000  IN   0.0       0      0  [scsi_eh_0]
    151      2   2  ffff88803bf88000  ID   0.0       0      0  [scsi_tmf_0]
    152      2   1  ffff88803bf8b2c0  ID   0.0       0      0  [nvme-wq]
    153      2   3  ffff88803b8a0000  ID   0.0       0      0  [nvme-reset-wq]
    154      2   3  ffff88803b8a32c0  ID   0.0       0      0  [nvme-delete-wq]
    155      2   3  ffff88803b920000  IN   0.0       0      0  [scsi_eh_1]
    156      2   3  ffff88803b9232c0  ID   0.0       0      0  [scsi_tmf_1]
    157      2   3  ffff88803b9432c0  IN   0.0       0      0  [scsi_eh_2]
    158      2   1  ffff88803b940000  ID   0.0       0      0  [scsi_tmf_2]
    159      2   2  ffff88803b9932c0  ID   0.0       0      0  [kworker/u8:2]
    161      2   0  ffff88803ba38000  ID   0.0       0      0  [ipv6_addrconf]
    173      2   2  ffff88803ba532c0  ID   0.0       0      0  [kstrp]
    192      2   2  ffff88803bb9b2c0  ID   0.0       0      0  [charger_manager]
    193      2   2  ffff88803bb98000  ID   0.0       0      0  [kworker/2:1H]
    213      1   1  ffff88803bb90000  IN   0.3   18312   2804  systemd-udevd
    296      1   3  ffff88803bb4b2c0  IN   0.2    4216   2496  bash
    299      2   0  ffff88803bb932c0  ID   0.0       0      0  [kworker/0:1H]
    302      2   2  ffff88803b708000  ID   0.0       0      0  [xfsalloc]
    303      2   0  ffff88803b70b2c0  ID   0.0       0      0  [xfs_mru_cache]
    308      2   1  ffff88803bb732c0  ID   0.0       0      0  [kworker/1:2]
    312      2   1  ffff88803d3332c0  ID   0.0       0      0  [kworker/1:1H]
    321      2   1  ffff88803bb70000  ID   0.0       0      0  [xfs-buf/sda]
    322      2   3  ffff88803d3732c0  ID   0.0       0      0  [xfs-conv/sda]
    323      2   2  ffff88803d1532c0  ID   0.0       0      0  [xfs-cil/sda]
    324      2   0  ffff88803b9db2c0  ID   0.0       0      0  [xfs-reclaim/sda]
    325      2   3  ffff88803b9d8000  ID   0.0       0      0  [xfs-eofblocks/s]
    326      2   0  ffff88803b9d0000  ID   0.0       0      0  [xfs-log/sda]
    327      2   1  ffff88803b9d32c0  IN   0.0       0      0  [xfsaild/sda]
>   328    296   0  ffff88803d330000  RU   0.1    2556   1076  df

crash> bt 328
PID: 328    TASK: ffff88803d330000  CPU: 0   COMMAND: "df"
    [exception RIP: xfs_fs_statfs+135]
    RIP: ffffffffa008fd17  RSP: ffffc90000217df0  RFLAGS: 00000282
    RAX: 0000000000000000  RBX: ffff88803ba4e000  RCX: 0000000000000000
    RDX: ffff88803dc176e0  RSI: 8888888888888889  RDI: 0000000000000246
    RBP: ffffc90000217df8   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000001  R11: 0000000000000001  R12: ffff88803b3ec640
    R13: ffffc90000217e90  R14: 0000000000000002  R15: 0000000000000000
    CS: 0010  SS: 0018
 #0 [ffffc90000217e00] statfs_by_dentry at ffffffff81352353
 #1 [ffffc90000217e20] vfs_statfs at ffffffff8135265b
 #2 [ffffc90000217e40] user_statfs at ffffffff8135275b
 #3 [ffffc90000217e88] __do_sys_statfs at ffffffff813527d8
 #4 [ffffc90000217f20] __x64_sys_statfs at ffffffff81352826
 #5 [ffffc90000217f30] do_syscall_64 at ffffffff81004745
 #6 [ffffc90000217f50] entry_SYSCALL_64_after_hwframe at ffffffff81c00091
    RIP: 00007fd7c6bee1fb  RSP: 00007ffe140f5ee8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffe140f7f7f  RCX: 00007fd7c6bee1fb
    RDX: 00000000ffffffff  RSI: 00007ffe140f5ef0  RDI: 00007ffe140f7f7f
    RBP: 00007ffe140f5ef0   R8: 00007ffe140f6013   R9: 0000000000000032
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007ffe140f5f90
    R13: 0000000000000000  R14: 000056187bdb37c0  R15: 000056187bdb3780
    ORIG_RAX: 0000000000000089  CS: 0033  SS: 002b
crash> 

Looking at all these details we can see that the lockup happened at xfs_fs_statfs+135. We can use gdb inside crash to list the source code where the lockup happened, unfortunately in our case it seems that the symbol cannot be found:

crash> gdb list *(xfs_fs_statfs+135)
No symbol "xfs_fs_statfs" in current context.
gdb: gdb request failed: list *(xfs_fs_statfs+135)
crash> 

This is because xfs_fs_statfs is defined in an external module and crash doesn't know where to find it. So we can help it to find the missing symbols using the command "mod":

crash> mod
     MODULE       NAME     SIZE  OBJECT FILE
ffffffffa00f5340  xfs  1327104  (not loaded)  [CONFIG_KALLSYMS]
crash> mod -s xfs /home/righiandr/tmp/kmod/lib/modules/5.3.0-rc3+/kernel/fs/xfs/xfs.ko
     MODULE       NAME     SIZE  OBJECT FILE
ffffffffa00f5340  xfs  1327104  /home/righiandr/tmp/kmod/lib/modules/5.3.0-rc3+/kernel/fs/xfs/xfs.ko 
crash> 

Now we can use "gdb list" to look at the exact line of code that caused the lockup:

crash> gdb set listsize 3
crash> gdb list *(xfs_fs_statfs+135)
0xffffffffa008fd17 is in xfs_fs_statfs (fs/xfs/xfs_super.c:1102).
1101            spin_lock(&mp->m_sb_lock);
1102            while (1);
1103            statp->f_bsize = sbp->sb_blocksize;
crash> 

There it is! In this case it's pretty obvious to notice the endless while(1) loop after a spin_lock(), some times it can be way harder to spot a lockup, but the virtualized environment in combination with crash can really help a lot to speed up the tests and, often, it allows get additional information that would have been impossible to see in a typical development/testing environment.

Conclusion

In this article we have seen a possible workflow to test and debug the kernel in a virtualized environment. This makes kernel development safer, faster and easier, compared to the usual workflow of developing and testing a kernel on the same development system.

This is all possible simply using QEMU/KVM, virtme and crash with a couple of small helper scripts.

I hope anyone searching for kernel development tips find this useful. Let me know your kernel development tips and tricks in the comments below!

1 comment:

Unknown said...

ooh thanks for this.
its a piece of the puzzle.