Debugging a Linux Hard Lockup

I’m building the linux-wireless-testing tree described in https://wireless.wiki.kernel.org/en/developers/documentation/git-guide because I have multiple MediaTek 7612 USB cards I’d like to get working. There’s exciting work going on in the wireless subsystem to bring MediaTek Wi-Fi support into the mainline kernel.

I built the kernel on a Shuttle, booted into it, plugged in the Alfa AWUS036ACM dongle and started wpa_supplicant and my scripts. The desktop became unresponsive: a hard lockup.

After a power cycle I went looking for evidence. I’m on Fedora 28, which uses systemd and therefore journald, but journalctl -b -1 (the journal from the previous boot) didn’t show any kernel panic.

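Debugging a hard lockup caused by a kernel module seems easier than debugging one during kernel boot, since the system and its network are already up when the module loads. The netconsole module (https://www.kernel.org/doc/Documentation/networking/netconsole.txt) sends kernel log messages over UDP to another machine, so I can see the messages leading up to the lockup even if they never make it to disk.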

I also need to look into the kernel’s lockup watchdogs: https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt
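As a starting point for that, here is a small helper that prints whichever lockup-detector settings the running kernel exposes under /proc/sys/kernel. It is a sketch of my own: the function name show_watchdog_knobs and the SYSCTL_KERNEL override (there only so it can be tried against a scratch directory) are hypothetical, and the knob names come from the kernel’s sysctl documentation. Knobs this kernel doesn’t provide are simply skipped.

```shell
# Print the lockup-detector sysctls that exist on this kernel.
# SYSCTL_KERNEL defaults to /proc/sys/kernel; override it for a dry run.
show_watchdog_knobs() {
    dir=${SYSCTL_KERNEL:-/proc/sys/kernel}
    for knob in watchdog nmi_watchdog soft_watchdog watchdog_thresh \
                softlockup_panic hardlockup_panic; do
        # Not every kernel config provides every knob; skip missing ones.
        if [ -r "$dir/$knob" ]; then
            printf '%s = %s\n' "$knob" "$(cat "$dir/$knob")"
        fi
    done
}
```

If hardlockup_panic is present, setting it to 1 asks the kernel to panic when the hard-lockup detector fires, which may give netconsole a chance to push the final messages out:

sudo sysctl -w kernel.hardlockup_panic=1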

Netconsole.

% sudo modprobe netconsole @/enp3s0,5566@172.19.10.254/

[69677.819810] netconsole: unknown parameter '@/enp3s0,5566@172' ignored
[69677.819903] console [netcon0] enabled
[69677.819904] netconsole: network logging started

OK, what did I screw up? And is there a way to tickle the kernel into logging a message so I can test my configuration? I didn’t know of a way to do it from userspace, but a Google search found the following useful post. (Oh, StackOverflow and friends, is there anything you can’t do?)

https://serverfault.com/questions/140354/how-to-add-message-that-will-be-read-with-dmesg/140358#140358
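One way to do it from userspace is to write to /dev/kmsg: each write becomes a record in the kernel log, and an optional <N> prefix sets its log level (0 is emerg, 7 is debug). Writing requires root. A minimal sketch; the function name kmsg_test and the KMSG override (for a dry run against an ordinary file) are my own hypothetical additions.

```shell
# Inject one line into the kernel log through /dev/kmsg (needs root).
# $1 is the message, $2 an optional syslog level 0-7 (default 4, warning).
# KMSG defaults to /dev/kmsg; point it at a regular file for a dry run.
kmsg_test() {
    printf '<%s>%s\n' "${2:-4}" "$1" > "${KMSG:-/dev/kmsg}"
}
```

Run as root, kmsg_test "netconsole test" should show up in dmesg; whether it also reaches netconsole depends on the console log level.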

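Unloading and reloading the module still didn’t show any messages. Oh, right, I always forget this step (https://elinux.org/Debugging_by_printing#Netconsole_resources): raise the console log level so that every kernel message is printed to the consoles, netconsole included.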

sudo sh -c 'echo 8 > /proc/sys/kernel/printk'
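The first of the four numbers in /proc/sys/kernel/printk is the current console log level; only messages whose level is numerically lower reach the consoles, netconsole included, so 8 lets everything through. A hypothetical helper to check the value before and after the change (PRINTK is an override I added for dry runs):

```shell
# Print the current console log level: the first of the four fields in
# /proc/sys/kernel/printk (current, default message, minimum, default console).
# PRINTK defaults to /proc/sys/kernel/printk; override it for a dry run.
console_loglevel() {
    read -r current _rest < "${PRINTK:-/proc/sys/kernel/printk}"
    echo "$current"
}
```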

Skipping over the “unknown parameter” problem for now, I set the parameters manually through configfs instead. Here is what I have:

...trillian:coconut% cd /sys/kernel/config/netconsole/
...trillian:netconsole% ls
arthurdent/
...trillian:netconsole% cd arthurdent/
...trillian:arthurdent% ls
dev_name enabled extended local_ip local_mac local_port remote_ip remote_mac remote_port
...trillian:arthurdent% cat remote_mac
70:88:6b:81:ac:64
...trillian:arthurdent% cat remote_ip
172.16.17.181
...trillian:arthurdent% cat remote_port
5566
...trillian:arthurdent% cat local_ip
172.16.17.92
...trillian:arthurdent% cat local_port
6665
...trillian:arthurdent% cat dev_name
enp0s31f6
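For reference, a target like this one is created by making a directory under /sys/kernel/config/netconsole and writing its attributes, as described in the “Dynamic reconfiguration” section of netconsole.txt. That needs root and a kernel built with CONFIG_NETCONSOLE_DYNAMIC, and attributes can only be changed while the target is disabled. The sketch below is mine: add_netconsole_target is a hypothetical name, and NETCONSOLE_CFS is an override added so it can be tried against a scratch directory. local_port is left at its default of 6665, and local_mac is read-only, so neither is written.

```shell
# Create and enable a netconsole target through configfs (needs root).
# NETCONSOLE_CFS defaults to /sys/kernel/config/netconsole; override for a dry run.
# usage: add_netconsole_target NAME DEV LOCAL_IP REMOTE_IP REMOTE_PORT [REMOTE_MAC]
add_netconsole_target() {
    target=${NETCONSOLE_CFS:-/sys/kernel/config/netconsole}/$1
    # In configfs, mkdir creates a disabled target with its attribute files.
    mkdir "$target" || return 1
    # Attributes can only be written while the target is disabled.
    echo "$2" > "$target/dev_name"
    echo "$3" > "$target/local_ip"
    echo "$4" > "$target/remote_ip"
    echo "$5" > "$target/remote_port"
    # Without a remote MAC, netconsole falls back to broadcast.
    if [ -n "$6" ]; then
        echo "$6" > "$target/remote_mac"
    fi
    # Enable last, once everything else is in place.
    echo 1 > "$target/enabled"
}
```

From a root shell, with the function sourced, the target above would be:

add_netconsole_target arthurdent enp0s31f6 172.16.17.92 172.16.17.181 5566 70:88:6b:81:ac:64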

Next, run netcat on arthurdent, my other machine, to listen for the UDP packets:

nc -l -u 5566

I can test the netcat listener from trillian by sending it a UDP packet:

ls | nc -u 172.16.17.181 5566
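Back to the “unknown parameter” error from the modprobe attempt. The likely culprit: modprobe takes module options as name=value pairs, and the netconsole option that carries the target is itself named netconsole, so the string has to be written as netconsole=.... Without that prefix the kernel treated (part of) the string as the name of an unknown parameter. netconsole.txt gives the format as netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr], where empty fields fall back to defaults. The helper below (netconsole_param is a hypothetical name of mine) just assembles that string from its six parts, which makes the field order harder to get wrong.

```shell
# Build the netconsole= module option from its parts, following netconsole.txt:
#   netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Pass '' for any optional field to leave it empty and use the default.
# usage: netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

With the values from my failed attempt, netconsole_param '' '' enp3s0 5566 172.19.10.254 '' prints netconsole=@/enp3s0,5566@172.19.10.254/, so the load command becomes:

sudo modprobe netconsole netconsole=@/enp3s0,5566@172.19.10.254/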

Now it’s time to turn the crank and watch the chaos unfold. I started wpa_supplicant and my wpa_cli script and, boom, crash. This time the oops made it to arthurdent:

[ 546.841947] wlp0s20f0u1u2: authenticate with c4:b9:cd:dc:48:40
[ 547.140202] wlp0s20f0u1u2: send auth to c4:b9:cd:dc:48:40 (try 1/3)
[ 547.140226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
[ 547.140228] PGD 800000083e025067 P4D 800000083e025067 PUD 83e072067 PMD 0
[ 547.140233] Oops: 0000 [#1] SMP PTI
[ 547.140236] CPU: 2 PID: 2503 Comm: wpa_supplicant Not tainted 4.19.0-rc2-wt #1
[ 547.140238] Hardware name: Shuttle Inc. SZ170/FZ170, BIOS 2.09 08/01/2017
[ 547.140258] RIP: 0010:ieee80211_wake_txqs+0x1e3/0x3d0 [mac80211]
[ 547.140261] Code: 4c 89 fe 4c 89 ef 48 8b 92 b0 02 00 00 e8 45 d6 2e f4 48 8b 3c 24 e8 0c bc 00 f4 48 83 c5 08 48 3b 6c 24 08 74 a0 4c 8b 7d 00 <41> 0f b6 57 11 3b 54 24 18 75 e6 4d 8d a7 28 ff ff ff f0 49 0f ba
[ 547.140263] RSP: 0018:ffff95038ea83ed0 EFLAGS: 00010293
[ 547.140265] RAX: ffff95038419e978 RBX: ffff95038419e000 RCX: ffff950384b18760
[ 547.140267] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff950384b18828
[ 547.140269] RBP: ffff95038419e970 R08: 00039e862fcdb16e R09: ffff95038a9b4230
[ 547.140271] R10: 0000000000000420 R11: ffffb8efc46c78d0 R12: ffff9503798251d0
[ 547.140273] R13: ffff950384b18760 R14: 0000000000000000 R15: 0000000000000000
[ 547.140275] FS: 00007f42ed60cdc0(0000) GS:ffff95038ea80000(0000) knlGS:0000000000000000
[ 547.140277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.140279] CR2: 0000000000000011 CR3: 0000000835b30001 CR4: 00000000003606e0
[ 547.140281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 547.140283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 547.140284] Call Trace:

(Crash truncated here.) I posted it to the linux-wireless mailing list and am now getting some help.
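The oops itself already says a fair amount. Decoded by hand, the faulting bytes <41> 0f b6 57 11 are movzbl 0x11(%r15),%edx, a one-byte load from R15 + 0x11; R15 is 0, which matches both CR2 and the “NULL pointer dereference at 0000000000000011” line. To go further, the kernel tree ships scripts/faddr2line, which maps symbol+offset to a source line (it needs a build with CONFIG_DEBUG_INFO), and scripts/decodecode, which disassembles a Code: line read on standard input. The wrappers below are a hypothetical sketch: the function names are mine, KSRC is assumed to point at the kernel build tree, and the log file is assumed to be a saved copy of the netconsole output.

```shell
# Extract "symbol+offset/size" from the first RIP: line of a saved oops.
oops_rip() {
    sed -n 's/.*RIP: [0-9a-f]*:\([^ ]*\).*/\1/p' "$1" | head -n 1
}

# Resolve that RIP to file:line with the kernel's faddr2line helper.
# $1: saved oops log, $2: module object (e.g. net/mac80211/mac80211.ko).
# KSRC: kernel build tree (default: current directory).
oops_to_source() {
    "${KSRC:-.}/scripts/faddr2line" "$2" "$(oops_rip "$1")"
}

# Disassemble the Code: bytes around the fault with scripts/decodecode.
oops_decode() {
    grep 'Code:' "$1" | "${KSRC:-.}/scripts/decodecode"
}
```

From the build tree, with the oops saved to a file (oops.log here is just an example name), something like this should point at the offending line of ieee80211_wake_txqs:

oops_to_source oops.log net/mac80211/mac80211.ko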
