The kernel best practice for setting build flags is the ccflags-y variable.
# these flags added to all C compiles
ccflags-y += -Wall -Werror
File-specific flags can also be set.
# Set DEBUG flag when compiling hello.c
CFLAGS_hello.o += -DDEBUG
Another useful feature is removing flags. I’m currently working with a set of vendor code that triggers a warning in a newer version of GCC. The compile fails because the vendor makefiles use -Werror. I can’t modify the code due to license restrictions, but I can modify the makefile.
# turn off the -Werror flag
ccflags-remove-y += -Werror
Similarly, flags can be removed for a specific file.
# turn off -Werror just for hello.c
CFLAGS_REMOVE_hello.o += -Werror
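To make the interaction of the four variables concrete, here is a toy Python model of how the final flag list for one object file could be assembled. It is a sketch, not kbuild itself: the function name effective_flags and its parameters are made up for illustration, and it assumes removals behave like Make's filter-out on whole words, with the directory-wide removal applied before the per-file flags are appended.

```python
def effective_flags(ccflags, file_flags=(), ccflags_remove=(), file_remove=()):
    """Toy model of the flags one object file ends up with.

    Assumption (not copied from kbuild): removals are exact, whole-word
    matches, like Make's $(filter-out ...).
    """
    # Directory-wide flags (ccflags-y), minus ccflags-remove-y.
    flags = [f for f in ccflags if f not in ccflags_remove]
    # Per-file additions (CFLAGS_<file>.o) come next.
    flags += list(file_flags)
    # Per-file removals (CFLAGS_REMOVE_<file>.o) are applied last.
    return [f for f in flags if f not in file_remove]


# Vendor makefile sets -Wall -Werror; strip -Werror for the whole directory.
print(effective_flags(["-Wall", "-Werror"], ccflags_remove=["-Werror"]))

# Or keep -Werror everywhere except hello.o, which also gets -DDEBUG.
print(effective_flags(["-Wall", "-Werror"],
                      file_flags=["-DDEBUG"],
                      file_remove=["-Werror"]))
```

Under that assumed ordering, a flag added through the per-file variable survives a directory-wide removal, so the per-file removal is the one to reach for in that case.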
What started as a rant on Mastodon has turned into an actual solution. The problem I needed to solve was an out-of-band (non-network) connection between two machines: I needed a very secure way to connect to a local Linux box that was VPN’d into a corporate network. (I didn’t want a second network interface because I’m not smart enough to set up security to properly protect a dual-homed machine. The most secure connection is no connection.)
I started thinking: what if I could run a terminal-aware protocol such as telnet over a serial port? Rather than have a raw character connection over the RS-232 link, let’s carry something that has a terminal control plane. My first thought was hacking the telnetd/telnet code to route over a serial port. But a colleague (hi, KyleS!) suggested socat.
I’ve used socat before. It’s an amazing tool that can connect disparate socket/file descriptors together.
I prototyped the idea using my Fedora Linux laptop on one side and a Raspberry-Pi on the other. The two machines are connected via USB FTDI serial and a null modem adapter. I am using the R-Pi as the ssh server side. Rather than worry about telnet, I used the already running ssh.
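Conceptually, each end of the link is just a relay: bytes arriving on a TCP socket are copied to the serial device and bytes from the serial device are copied back. The Python below is a rough sketch of that idea only. It is not socat and not the commands used on these machines; the names relay and bridge_tcp_to_device are hypothetical, and it assumes the serial port has already been configured (baud rate, raw mode) by something like stty.

```python
import os
import select
import socket


def relay(fd_a, fd_b, chunk_size=4096):
    """Copy bytes both ways between two file descriptors until one side
    reports end-of-file. Returns (bytes copied a->b, bytes copied b->a)."""
    peer = {fd_a: fd_b, fd_b: fd_a}
    copied = {fd_a: 0, fd_b: 0}
    while True:
        readable, _, _ = select.select([fd_a, fd_b], [], [])
        for fd in readable:
            data = os.read(fd, chunk_size)
            if not data:
                return copied[fd_a], copied[fd_b]
            # os.write may accept fewer bytes than offered; loop until done.
            view = memoryview(data)
            while view:
                written = os.write(peer[fd], view)
                view = view[written:]
            copied[fd] += len(data)


def bridge_tcp_to_device(listen_port, device_path):
    """Accept one TCP connection on localhost and relay it to a device file.

    Assumes the device (e.g. a USB serial port) is already set up for raw
    I/O at the right speed; this sketch does no termios configuration.
    """
    with socket.create_server(("127.0.0.1", listen_port)) as server:
        conn, _ = server.accept()
        with conn:
            dev_fd = os.open(device_path, os.O_RDWR | os.O_NOCTTY)
            try:
                return relay(conn.fileno(), dev_fd)
            finally:
                os.close(dev_fd)
```

On the client side, something like bridge_tcp_to_device(2222, "/dev/ttyUSB0") would give ssh a local port to connect to. The server side would need the mirror image (serial device to the local sshd port), which this sketch leaves out.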
And then the connection from the client to the server is as simple as:
ssh -p 2222 pi@localhost
And now my serial connection can run Vim with full glorious syntax highlighting. Terminal resizing (SIGWINCH) works. The connection runs at 115200 baud, so slightly slower than even 10Mbps Ethernet. But I have an out-of-band (non-Ethernet) connection between two machines, so if (when) I bork the network config (often), I can recover the remote. I can edit code, such as my borked network scripts, through the serial connection.
The next logical step is to set up the same socat/Dropbear trick with router firmware.
UPDATE 20230207. The initial connection can be very quirky if there is anything left over in the serial port. Unlike a file handle or a network socket, closing the serial port will leave detritus available to read. If the ssh connection fails for some reason (in my case, hitting problems with known_hosts and ‘localhost’), the trash on the serial port will confuse ssh at each end.
I’m tinkering with some solutions. I’m hoping calls to ‘stty -F /dev/ttyUSB0 sane’ will help. Will update this post when I’ve found a reliable solution.
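One direction that could be tried, sketched here as an untested assumption rather than a known fix: before starting the bridge, flush the tty queues and then read and discard whatever still trickles in until the line has been quiet for a moment. The helper name drain_stale_input and its quiet_seconds parameter are made up for this example.

```python
import os
import select
import termios


def drain_stale_input(fd, quiet_seconds=0.2, chunk_size=4096):
    """Discard pending input on fd until it stays quiet for quiet_seconds.

    Returns the number of bytes that were read and thrown away.
    """
    if os.isatty(fd):
        # Ask the tty driver to drop anything queued in either direction.
        termios.tcflush(fd, termios.TCIOFLUSH)
    discarded = 0
    while True:
        readable, _, _ = select.select([fd], [], [], quiet_seconds)
        if not readable:
            return discarded
        chunk = os.read(fd, chunk_size)
        if not chunk:  # end-of-file: nothing more will arrive
            return discarded
        discarded += len(chunk)
```

It would be called on a descriptor from os.open("/dev/ttyUSB0", os.O_RDWR | os.O_NOCTTY) on both machines before ssh starts talking. Whether that is enough to clear the leftovers in practice is exactly the open question.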
Long story: when our routers boot, they default to the IP address 192.168.0.1/24. When we do firmware upgrades using TFTP, uboot also uses 192.168.0.1/24. When I have multiple routers that need upgrading, all running with the default IP address, I cannot connect them all to the same Linux box. Confusion will reign.
My solution has been to use a Raspberry Pi connected to each router. The R-Pi main Ethernet port is connected to my cubicle LAN. A second USB Ethernet dongle connects the R-Pi to the router.
The R-Pi acts as a firewall between the prototype firmware (which can and often does have wacky bugs in my code) and the rest of my network. The R-Pi is plugged into one of the router’s LAN ports. (Usually there’s a single specific port hardwired in uboot that comes up.) The R-Pi runs the TFTP server. The R-Pi runs minicom to the router’s debug serial port. I also run a serial console to the R-Pi itself. If the router testing goes doolally and confuses the R-Pi TCP/IP stack, I can connect over serial (yay, out-of-band control!) and reboot the R-Pi.
Before the Great R-Pi Shortage during the pandemic, I purchased enough R-Pi that I can have an individual R-Pi for each development router. I find this to be a very reliable solution, except when I forget which R-Pi connects to which router.
I ran into a problem with libnl and sequence numbers while working on my wifi scanner application. My app would subscribe to the NL80211_CMD_NEW_SCAN_RESULTS and NL80211_CMD_SCHED_SCAN_RESULTS events. On receiving those events, I would send a NL80211_CMD_GET_SCAN.
Sometimes, I would start getting a -NLE_SEQ_MISMATCH on receiving the scan survey data. Once that occurred, I would receive no more scan data.
The sequence numbers are tracked per socket in the private ‘struct nl_sock’. Both are initialized to time(0) in __alloc_socket().
struct nl_sock
{
    ...
    unsigned int s_seq_next;
    unsigned int s_seq_expect;
    ...
};
The nlmsghdr contains the sequence number. Note the same sequence number can be used multiple times. When there is more data than can fit in a single nl_msg, the data is broken across multiple nl_msg, indicated by flag NLM_F_MULTI (“Multipart message, terminated by NLMSG_DONE”).
struct nlmsghdr {
    __u32 nlmsg_len;   /* Length of message including header */
    __u16 nlmsg_type;  /* Message content */
    __u16 nlmsg_flags; /* Additional flags */
    __u32 nlmsg_seq;   /* Sequence number */
    __u32 nlmsg_pid;   /* Sending process port ID */
};
The nlmsghdr->nlmsg_seq is assigned in nl_complete_msg() which is called before the nl_msg is sent to the nl_sock. The socket ‘next’ is incremented at this time.
if (nlh->nlmsg_seq == NL_AUTO_SEQ) {
    nlh->nlmsg_seq = sk->s_seq_next++;
    NL_DBG(3, "nl_complete_msg(%p): Increased next " \
              "sequence number to %d\n",
           sk, sk->s_seq_next);
}
The sequence number is checked in recvmsgs(), which is the core of libnl’s nl_msg receive handling. The recvmsgs() is responsible for calling several callbacks and for checking sequence numbers.
if (hdr->nlmsg_type == NLMSG_DONE ||
    hdr->nlmsg_type == NLMSG_ERROR ||
    hdr->nlmsg_type == NLMSG_NOOP ||
    hdr->nlmsg_type == NLMSG_OVERRUN) {
    /* We can't check for !NLM_F_MULTI since some netlink
     * users in the kernel are broken. */
    sk->s_seq_expect++;
    NL_DBG(3, "recvmsgs(%p): Increased expected " \
              "sequence number to %d\n",
           sk, sk->s_seq_expect);
}
When NLMSG_DONE is received, the expected sequence number is increased. If a DONE or ERROR is never received, the expected sequence number is never incremented.
The sequence number seq_next is advanced when an outgoing nl_msg is completed for sending in nl_complete_msg(). The sequence number seq_expect is advanced when an incoming nl_msg is DONE or ERROR (or NOOP or OVERRUN, which I haven’t encountered yet).
The sequence number checking occurs also in recvmsgs(). In my code, I’m not using the NL_CB_SEQ_CHECK callback and leaving auto-ack mode enabled, so the sequence number checking in recvmsgs() is enforced. (As I’m still learning libnl, I was using the same pattern as the iw library.) Note this check happens before the DONE+ERROR check which increments the seq_expect.
/* Sequence number checking. The check may be done by
 * the user, otherwise a very simple check is applied
 * enforcing strict ordering */
if (cb->cb_set[NL_CB_SEQ_CHECK]) {
    NL_CB_CALL(cb, NL_CB_SEQ_CHECK, msg);

/* Only do sequence checking if auto-ack mode is enabled */
} else if (!(sk->s_flags & NL_NO_AUTO_ACK)) {
    NL_DBG(3, "recvmsgs(%p) : nlmsg_seq=%d s_seq_expect=%d\n",
           sk, hdr->nlmsg_seq, sk->s_seq_expect);
    if (hdr->nlmsg_seq != sk->s_seq_expect) {
        if (cb->cb_set[NL_CB_INVALID])
            NL_CB_CALL(cb, NL_CB_INVALID, msg);
        else {
            err = -NLE_SEQ_MISMATCH;
            nl_msg_dump(msg, stdout);
            goto out;
        }
    }
}
A simple transaction could look like:
sk->seq_next  sk->seq_expect  hdr->seq
72968         72968           new nl_msg; assigned 72968; seq_next++
72969         72968           72968 (MULTI)
72969         72968           72968 (MULTI)
72969         72968           72968 (MULTI+DONE); seq match! seq_expect++
72969         72969
Moving on to the problem I encountered. The trigger is a second CMD_NEW_SCAN_RESULTS or CMD_SCHED_SCAN_RESULTS event received while I’m still reading the results of a CMD_GET_SCAN. The kernel apparently doesn’t like interleaved get-scan-results requests, so it refuses the new one with an -EBUSY error. The request increments sk->seq_next, but the sequence number of the error response doesn’t match sk->seq_expect (which is still tracking the previous request), so the nl_msg is dropped before hitting the DONE check that would increment sk->seq_expect.
Once the sequence numbers get into this state, there is no exit. The nl_socket is perpetually at the wrong sequence number. The only solution is to close/re-open the socket on receiving a -NLE_SEQ_MISMATCH.
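The stuck state is easier to see with the two counters modelled in isolation. The Python below is a toy model, not libnl: SeqTracker and run_ebusy_scenario are hypothetical names, the starting value 72968 is just the number from the walk-through above, and the model collapses recvmsgs() down to the sequence check followed by the DONE/ERROR increment.

```python
class SeqTracker:
    """Toy model of the two per-socket counters (not libnl code)."""

    def __init__(self, start):
        # Both counters start at the same value.
        self.seq_next = start
        self.seq_expect = start

    def send_request(self):
        """Assign the next sequence number to an outgoing request."""
        seq = self.seq_next
        self.seq_next += 1
        return seq

    def receive(self, seq, terminal=False):
        """Handle one incoming message; terminal means DONE or ERROR.

        Returns False where the real check would give -NLE_SEQ_MISMATCH.
        """
        if seq != self.seq_expect:
            return False  # dropped before the DONE/ERROR increment
        if terminal:
            self.seq_expect += 1
        return True


def run_ebusy_scenario(start=72968):
    sock = SeqTracker(start)
    first = sock.send_request()                          # first CMD_GET_SCAN
    results = [sock.receive(first)]                      # MULTI part, accepted
    second = sock.send_request()                         # second request, too early
    results.append(sock.receive(second, terminal=True))  # -EBUSY reply, dropped
    results.append(sock.receive(first, terminal=True))   # DONE for first fetch
    third = sock.send_request()                          # a later, normal request
    results.append(sock.receive(third, terminal=True))   # still mismatched
    return results, sock.seq_next, sock.seq_expect


print(run_ebusy_scenario())
```

In this model, the dropped error reply leaves seq_expect one behind seq_next even after the first fetch completes, so every later reply fails the check. That matches the no-exit behaviour described above.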
A better solution would be to avoid getting into this state in the first place, perhaps by not sending a new CMD_GET_SCAN while a previous fetch is still running. I’m still tinkering with solutions.
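A minimal sketch of that "one fetch at a time" idea, written in Python for brevity rather than as libnl code. ScanFetchGate and its method names are hypothetical; the assumption is that the application can tell when a fetch has finished (its DONE or ERROR arrived) and calls on_fetch_finished() at that point.

```python
class ScanFetchGate:
    """Allow only one outstanding CMD_GET_SCAN at a time (hypothetical helper)."""

    def __init__(self):
        self.fetch_in_flight = False
        self.fetch_pending = False

    def on_scan_event(self):
        """Call when a *_SCAN_RESULTS event arrives.

        Returns True if the caller should send CMD_GET_SCAN right now.
        """
        if self.fetch_in_flight:
            # Remember that fresher results exist, but don't interleave.
            self.fetch_pending = True
            return False
        self.fetch_in_flight = True
        return True

    def on_fetch_finished(self):
        """Call when the current fetch ends (DONE or ERROR).

        Returns True if a deferred fetch should be sent now.
        """
        if self.fetch_pending:
            self.fetch_pending = False
            return True  # stays in flight: the deferred fetch starts now
        self.fetch_in_flight = False
        return False
```

Any number of events arriving during a fetch collapse into a single follow-up fetch, which seems reasonable since a later CMD_GET_SCAN returns the current scan table anyway.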
I’m learning Kismet (https://www.kismetwireless.com). I need a way to do remote wireless surveys. Ideally, I can send a non-technical $customer a Magic Box that will send me a survey. WiFi problems are very environmental, so having a second opinion of the environment will help me get my head around the issues.
I’m building Kismet from their git repo. The newer versions have a web interface that seems very enticing. With an internet connected device and a nice VPN overlay, I think I could get the remote information I need.
Kismet has instructions to build under Ubuntu. But for $reasons I’m running Fedora. I’ve been tinkering around with the list of required packages and I think I’ve hit the magic list:
The authenticity of host 'nn.nn.nn.nn (nn.nn.nn.nn)' can't be established.
ECDSA key fingerprint is SHA256:wls31bhGHFgGLYT403xsznbNS53Gzjlwrda5v7OUEZ4.
Are you sure you want to continue connecting (yes/no)?
Um. Wut?
This message always annoys me. I don’t like blindly accepting unknown connections.