Tag Archives: Linux

Linux Kernel Module Build Flags

Building out-of-tree Linux kernel modules is pretty easy. The page https://www.kernel.org/doc/html/latest/kbuild/modules.html covers the basics. Kbuild also has some great ways to tinker with the CFLAGS (and other build flags).

The kernel module build uses rules/macros in the linux/scripts/Makefile.lib.

# deprecated old-style kernel build flags
EXTRA_CFLAGS+=-DEXTRA_CFLAGS

The kernel best practice for setting build flags is the ccflags-y variable.

# these flags added to all C compiles
ccflags-y += -Wall -Werror

File-specific flags can also be set.

# Set DEBUG flag when compiling hello.c
CFLAGS_hello.o += -DDEBUG

A useful feature is also removing flags. I’m currently working with a set of vendor code that has a warning triggered by a newer version of GCC. The compile is failing because the vendor makefiles use -Werror. I can’t modify the code due to license restrictions but I can modify the makefile.

# turn off the -Werror flag
ccflags-remove-y += -Werror

Similarly, I can remove flags for a specific file.

# turn off -Werror just for hello.c
CFLAGS_REMOVE_hello.o += -Werror

SSH Over Serial Port

What started as a rant on Mastodon has turned into an actual solution. The problem I needed to solve was an out-of-band (non-network) connection between two machines. I also needed a super secure mechanism to connect to a local Linux box which was VPN’d into a corporate network. (I didn’t want a second network interface because I’m not smart enough to set up security to properly protect a dual-homed machine. The most secure connection is no connection.)

I started thinking: what if I could run a terminal-aware protocol such as telnet over a serial port? Rather than have a raw character connection over the RS232, let’s carry something that has a terminal control plane. My first thought was hacking the telnetd/telnet code to route over a serial port. But a colleague (hi, KyleS!) suggested socat.

I’ve used socat before. It’s an amazing tool that can connect disparate socket/file descriptors together.

I prototyped the idea using my Fedora Linux laptop on one side and a Raspberry-Pi on the other. The two machines are connected via USB FTDI serial and a null modem adapter. I am using the R-Pi as the ssh server side. Rather than worry about telnet, I used the already running ssh.

On the Raspberry-Pi (server side):

 socat GOPEN:/dev/ttyUSB0,cfmakeraw,b115200 TCP:localhost:22,reuseaddr

On the Fedora Linux (client side):

 socat TCP-LISTEN:2222,bind=localhost,reuseaddr GOPEN:/dev/ttyUSB0,cfmakeraw,b115200

And then the connection from the client to the server is as simple as:

ssh -p 2222 pi@localhost

And now my serial connection can run Vim with full glorious syntax highlighting. The terminal resize SIGWINCH works. The connection is at 115200 so slightly slower than even 10Mbps Ethernet. But I have an out of band (non-Ethernet) connection between two machines so if (when) I bork the network config (often), I can recover the remote. I can edit code through the serial connection (edit my borked network scripts).

Next logical step is to set up the same socat/Dropbear trick with router firmware.

UPDATE 20230207. The initial connection can be very quirky if there is anything left over in the serial port. Unlike a file handle or a network socket, closing the serial port will leave detritus available to read. If the ssh connection fails for some reason (in my case, hitting problems with known_hosts and ‘localhost’), the trash on the serial port will confuse the ssh at each end.

I’m tinkering with some solutions. I’m hoping calls to ‘stty -F /dev/ttyUSB0 sane’ will help. Will update this post when I’ve found a reliable solution.

Command Line JSON

I just stumbled across a wonderful tool: command line JSON validation/pretty-print.

https://docs.python.org/3/library/json.html#module-json.tool

I often work with JSON with our routers. I use curl to read from our API endpoints which return JSON. Getting back large blobs of JSON is useful but hard to read all jumbled together.

% curl --basic --user admin:${CP_PASSWORD} http://172.16.22.1/api/status/wlan/radio/0

(Image: command line output of the router wifi survey API.)

Now instead I can pipe to python3 -m json.tool and the JSON will be cleanly formatted and human-readable.

C ‘enum’ is Annoying.

Writing a blog post is hard. “I’ll do it later,” I keep thinking. Maybe it’s like flossing–“I’ll do it later.” Next thing I know I’m spraying blood onto the ceiling while the hygienist tut-tuts about my bad habits. I floss (write) now and my future self will thank me.

I’m tinkering with nl80211, the successor to WEXT. WEXT is the Linux Wireless Extensions, a set of standardized ioctls that let userspace communicate with the kernel-level wireless drivers. WEXT is amazing but limited so the smart people got together and created a much more flexible system around Netlink.

As I’m tinkering with nl80211, I discover a frustratingly extensive use of C enum. Using C enum in a network protocol is frustrating because with a large enum (say, greater than 10 elements), converting from an integer in a debug message back to the actual enum symbol is well nigh impossible.

Case in point, the nl80211.h enum nl80211_attrs is ~400 lines and about 260 elements. In my little nl80211 baby-steps code, I fetch NL80211_CMD_GET_INTERFACE and get back an array filled with attribute + value.

        int i;
        for (i=0 ; i<NL80211_ATTR_MAX ; i++ ) {
                if (tb_msg[i]) {
                        printf("%d=%p type=%d len=%d\n", i, (void *)tb_msg[i], nla_type(tb_msg[i]), nla_len(tb_msg[i]));
                        hex_dump("msg", (unsigned char *)tb_msg[i], nla_len(tb_msg[i]));
                }
        }

Again this is baby steps code. I have no idea what I’m actually doing. I’m poking the box.

46=0x14db108 type=46 len=4
msg 0x00001080 08 00 2e 00 ....
206=0x14db110 type=206 len=1
msg 0x00001090 05 . 
217=0x14db120 type=217 len=4 
msg 0x000010a0 08 00 d9 00 ....
256=0x14db118 type=256 len=4
msg 0x000010b0 08 00 00 01 ....
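A side note on those hex dumps: since tb_msg[i] points at the attribute itself rather than its payload, the bytes appear to be the start of struct nlattr, a 16-bit length followed by a 16-bit type. Here is a minimal sketch of decoding that header by hand, assuming a little-endian host (which matches the output above). The demo_ names are made up for illustration and are not part of libnl.

```c
#include <stdint.h>

/* Hypothetical helper type; the layout mirrors struct nlattr
 * (16-bit length, then 16-bit type). */
struct demo_nlattr_hdr {
    uint16_t len;   /* header (4 bytes) plus payload */
    uint16_t type;  /* attribute id with the two flag bits masked off */
};

/* Decode the first four bytes of an attribute, assuming little-endian. */
struct demo_nlattr_hdr demo_parse_nlattr(const unsigned char *bytes)
{
    struct demo_nlattr_hdr hdr;

    hdr.len = (uint16_t)(bytes[0] | (bytes[1] << 8));
    /* The top two bits of the type field are flags (nested, byte order). */
    hdr.type = (uint16_t)((bytes[2] | (bytes[3] << 8)) & 0x3fff);
    return hdr;
}
```

Feeding it 08 00 d9 00 gives a length of 8 (4 bytes of header plus the 4 bytes of payload that nla_len() reported) and a type of 217. Feeding it 08 00 00 01 gives a type of 256. Both match the type= values printed above.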

Now I have a list of attributes coming back from the call. What the foo is 217? 256? A visit to gdb will give me answers! The last element in the enum is NL80211_ATTR_PORT_AUTHORIZED so what is its value? Printing the bare symbol just gives me the symbol back. Printing as hex (p/x) or as decimal (p/d) shows me the numerical value.

(gdb) p NL80211_ATTR_PORT_AUTHORIZED
$1 = NL80211_ATTR_PORT_AUTHORIZED
(gdb) p/x NL80211_ATTR_PORT_AUTHORIZED 
$2 = 0x103
(gdb) p/d NL80211_ATTR_PORT_AUTHORIZED 
$3 = 259

To find the symbol for my integer, I can do the reverse in gdb. The typecast will convert an integer to the enum type.

(gdb) p (enum nl80211_attrs)46
$6 = NL80211_ATTR_GENERATION
(gdb) p (enum nl80211_attrs)206
$7 = NL80211_ATTR_MAX_CSA_COUNTERS
(gdb) p (enum nl80211_attrs)217
$8 = NL80211_ATTR_EXT_FEATURES
(gdb) p (enum nl80211_attrs)256
$9 = NL80211_ATTR_SCHED_SCAN_MAX_REQS

Having to dig into gdb for every enum in nl80211.h will be tiring. The problems grow in other header files that are nests of #ifdefs.

I personally prefer using #define for symbols like this. The explicit link of a symbol to a value in source form is helpful.
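One way to get both the explicit symbol-to-value link and an integer-to-name lookup for debug messages, without a trip to gdb, is an X-macro table. This is only a sketch: the DEMO_ATTR_ names and demo_attr_name() are hypothetical stand-ins, not anything in nl80211.h, and the four values are the ones gdb reported above.

```c
#include <stdio.h>

/* Hypothetical table: one line per symbol, value spelled out in source. */
#define DEMO_ATTR_LIST(X)                  \
    X(DEMO_ATTR_GENERATION, 46)            \
    X(DEMO_ATTR_MAX_CSA_COUNTERS, 206)     \
    X(DEMO_ATTR_EXT_FEATURES, 217)         \
    X(DEMO_ATTR_SCHED_SCAN_MAX_REQS, 256)

/* Expand the table into an enum with explicit values. */
enum demo_attr {
#define X(name, value) name = value,
    DEMO_ATTR_LIST(X)
#undef X
};

/* Expand the same table into an integer-to-name lookup. */
const char *demo_attr_name(int id)
{
    switch (id) {
#define X(name, value) case value: return #name;
    DEMO_ATTR_LIST(X)
#undef X
    default:
        return "(unknown)";
    }
}

/* Example debug helper: print the id together with its symbol. */
void demo_print_attr(int id)
{
    printf("%d=%s\n", id, demo_attr_name(id));
}
```

With this, demo_print_attr(217) prints the number and the symbol on one line, so the debug message carries its own decoding. The cost is that every entry lives in the macro table, which is a style choice a shared header like nl80211.h may not want to make.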

 

Continuing to Debug an OpenSSL Build

I’m continuing to tinker with the OpenSSL problems I’ve been having. I have a chunk of software whose build requires the older version of OpenSSL to be installed. But I want the newest OpenSSL dev package on my machine. We’re not going to get along.

I discovered I could selectively static link some of the libraries into the program. This was a completely mind-blowing feature to me. I started using the GCC linker way way back and was quite disconcerted by the -lfoo syntax which would find libfoo.a for linking.

In trying to force this software to build against a local copy of the older OpenSSL, I found problems with the final program trying to find the OpenSSL dynamic libraries. LD_LIBRARY_PATH would allow me to aim the software at my local lib but I was hoping there would be something simpler (that didn’t require setting that env var every time). Could I static link the executable? It’s been a while since I tried to static link a Linux app.

Quick google and found an even better answer:

https://stackoverflow.com/questions/6578484/telling-gcc-directly-to-link-a-library-statically

In the Makefile, I simply had to use -l:filename to link directly to a specific static library. With the -L flag, I could aim at a specific library location.

LDFLAGS += -lffi -lutil -lz -lm -lpthread -l:libssl.a -l:libcrypto.a -lrt -ldl

I’ve been using the GNU compilers for so many years and had no idea the linker could do this.

 

 

Debugging a Linux Hard Lockup

I’m building the linux-wireless-testing tree described in https://wireless.wiki.kernel.org/en/developers/documentation/git-guide. I have multiple MediaTek 7612 USB cards I’d like to get working. There’s exciting work going on in wireless to bring the MediaTek wifi into the mainline kernel.

I built the kernel on a Shuttle, loaded my new kernel. Plugged in the Alfa AWUS036ACM dongle, loaded up wpa_supplicant and my scripts. Desktop was unresponsive: hard lock up.

Power cycle. I’m on Fedora 28 which is systemd which uses journald. Running journalctl -b -1 didn’t show any kernel panic.

Debugging a hard lock up of a kernel module seems easier than a kernel boot. The netconsole module https://www.kernel.org/doc/Documentation/networking/netconsole.txt will let me see the kernel log messages leading up to the lock-up.

I also need to investigate the watchdogs https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt

Netconsole.

% sudo modprobe netconsole @/enp3s0,5566@172.19.10.254/

[69677.819810] netconsole: unknown parameter '@/enp3s0,5566@172' ignored
[69677.819903] console [netcon0] enabled
[69677.819904] netconsole: network logging started

OK, what did I screw up? Is there a way I can tickle the kernel into logging a message to test my config? Not from userspace, but a Google search found the following useful post. (Oh, StackOverflow (and related) is there nothing you can’t do?)

https://serverfault.com/questions/140354/how-to-add-message-that-will-be-read-with-dmesg/140358#140358

Load/unload the module. Still not seeing any messages. Oh, right. I always forget this step. https://elinux.org/Debugging_by_printing#Netconsole_resources

echo 8 > /proc/sys/kernel/printk

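Skipping over the “unknown parameter” problem for now and just setting the parameters manually. (The netconsole documentation writes the module parameter with a netconsole= prefix, so the missing prefix in my modprobe line is the likely culprit.) Here is what I have: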

...trillian:coconut% cd /sys/kernel/config/netconsole/
...trillian:netconsole% ls
arthurdent/
...trillian:netconsole% cd arthurdent/
...trillian:arthurdent% ls
dev_name enabled extended local_ip local_mac local_port remote_ip remote_mac remote_port
...trillian:arthurdent% cat remote_mac
70:88:6b:81:ac:64
...trillian:arthurdent% cat remote_ip
172.16.17.181
...trillian:arthurdent% cat remote_port
5566
...trillian:arthurdent% cat local_ip
172.16.17.92
...trillian:arthurdent% cat local_port
6665
...trillian:arthurdent% cat dev_name
enp0s31f6

So now run netcat on arthurdent, my other machine.

nc -l -u 5566

I can test my netcat from trillian by sending a UDP packet:

ls | nc -u 172.16.17.181 5566

And now time to turn the crank and watch the chaos unfold. Start up my wpa_supplicant and wpa_cli script and boom crash.

[ 546.841947] wlp0s20f0u1u2: authenticate with c4:b9:cd:dc:48:40
[ 547.140202] wlp0s20f0u1u2: send auth to c4:b9:cd:dc:48:40 (try 1/3)
[ 547.140226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
[ 547.140228] PGD 800000083e025067 P4D 800000083e025067 PUD 83e072067 PMD 0
[ 547.140233] Oops: 0000 [#1] SMP PTI
[ 547.140236] CPU: 2 PID: 2503 Comm: wpa_supplicant Not tainted 4.19.0-rc2-wt #1
[ 547.140238] Hardware name: Shuttle Inc. SZ170/FZ170, BIOS 2.09 08/01/2017
[ 547.140258] RIP: 0010:ieee80211_wake_txqs+0x1e3/0x3d0 [mac80211]
[ 547.140261] Code: 4c 89 fe 4c 89 ef 48 8b 92 b0 02 00 00 e8 45 d6 2e f4 48 8b 3c 24 e8 0c bc 00 f4 48 83 c5 08 48 3b 6c 24 08 74 a0 4c 8b 7d 00 <41> 0f b6 57 11 3b 54 24 18 75 e6 4d 8d a7 28 ff ff ff f0 49 0f ba
[ 547.140263] RSP: 0018:ffff95038ea83ed0 EFLAGS: 00010293
[ 547.140265] RAX: ffff95038419e978 RBX: ffff95038419e000 RCX: ffff950384b18760
[ 547.140267] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff950384b18828
[ 547.140269] RBP: ffff95038419e970 R08: 00039e862fcdb16e R09: ffff95038a9b4230
[ 547.140271] R10: 0000000000000420 R11: ffffb8efc46c78d0 R12: ffff9503798251d0
[ 547.140273] R13: ffff950384b18760 R14: 0000000000000000 R15: 0000000000000000
[ 547.140275] FS: 00007f42ed60cdc0(0000) GS:ffff95038ea80000(0000) knlGS:0000000000000000
[ 547.140277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.140279] CR2: 0000000000000011 CR3: 0000000835b30001 CR4: 00000000003606e0
[ 547.140281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 547.140283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 547.140284] Call Trace:

(Crash truncated here.) I posted to the linux-wireless mailing list and am now getting some help.