Category Archives: Python

Command Line JSON

I just stumbled across a wonderful tool: command line JSON validation/pretty-print.

https://docs.python.org/3/library/json.html#module-json.tool

I often work with JSON from our routers. I use curl to read from our API endpoints, which return JSON. Getting back large blobs of JSON is useful, but they are hard to read all jumbled together.

% curl --basic --user admin:${CP_PASSWORD} http://172.16.22.1/api/status/wlan/radio/0

[Screenshot: command line output of the router Wi-Fi survey API.]

Now instead I can pipe to python3 -m json.tool and the JSON will be cleanly formatted and human-readable.
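For use inside a script rather than a shell pipeline, roughly the same effect can be had with the json module directly. This is only a minimal sketch and not how json.tool itself is implemented; the pretty() helper and the sample payload (including its field names) are made up for illustration.

```python
import json


def pretty(raw: str) -> str:
    """Parse a JSON string and re-serialize it with indentation.

    json.loads() raises json.JSONDecodeError on invalid input, which
    doubles as the validation step.
    """
    return json.dumps(json.loads(raw), indent=4)


# Hypothetical payload; a real router response will look different.
sample = '{"success": true, "data": {"channel": 11, "ssid": "example"}}'
print(pretty(sample))
```

Invalid JSON fails loudly with an exception instead of printing garbage, which is usually what I want when poking at an API.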

Fun with Python Regex

I’m a computer language nerd. I like programming languages. I’m in no way an expert. I do enjoy digging into new languages and even digging into the low level jiggery-pokery of languages I use day-to-day. Like C. But US$200 for the C standard language spec? Are you kidding me? No. Digging around on the committee website I found a draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

My first pass at the C enum parser is incredibly simple. I don’t want to write a full language parser because (a) that’s a lot of work and (b) it’s already been done before. I want something that would take a couple days tops to create and let me get back to fiddling with nl80211.

I’m using Python for the parser because I know Python pretty well. I’ve tinkered with the C++ std::regex library as well but not for this project.

C comments can be either block comments surrounded by /*  */  or line comments starting with //.  For my toy parser, I’m handling comments very naively.
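Naive comment handling could look something like the sketch below. The strip_comments() name and both patterns are illustrative rather than the parser's actual code, and the sketch assumes comment markers never show up inside string literals, which is good enough for a header like nl80211.h but not for C in general.

```python
import re

# Block comments: shortest run between /* and */, allowed to span lines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Line comments: from // to the end of the line.
LINE_COMMENT = re.compile(r"//[^\n]*")


def strip_comments(source: str) -> str:
    """Remove C comments, naively (no awareness of string literals)."""
    # Replace a block comment with a space so neighboring tokens stay apart.
    without_blocks = BLOCK_COMMENT.sub(" ", source)
    return LINE_COMMENT.sub("", without_blocks)


text = "NL80211_CMD_GET_WIPHY, /* can dump */\nNL80211_CMD_SET_WIPHY, // line comment\n"
print(strip_comments(text))
```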

I’m mostly interested in the nl80211.h so I’m making some assumptions about the code format as I’m puttering with regex.

The BNF-ish grammar for a C enum (from the PDF above) is:


I’m amusing myself with WordPress’ text formatting. I get distracted way too easily. (I dug further into the doc to find the definition of enumeration-constant.)

enum-specifier:

enum identifier(opt) { enumerator-list }
enum identifier(opt) { enumerator-list , }
enum identifier

enumerator-list:

enumerator
enumerator-list , enumerator

enumerator:

enumeration-constant
enumeration-constant = constant-expression

enumeration-constant:

identifier

Focus, Dave. Focus. HTML is just another language and it’s easy to get distracted. The break between LHS and RHS above is driving me nuts but I will move on. Focus!

A C identifier can be described by the Python regex “[a-zA-Z_][a-zA-Z_0-9]*”. Python regex whitespace is “\s”, so required whitespace is “\s+”. My first naive regex to find the start of an enum was “enum\s+([a-zA-Z_][a-zA-Z_0-9]*)\s+{”. I’m assuming the enum keyword, the identifier, and the open brace are all on the same line.
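A quick sanity check of that first regex might look like this. It is only a sketch: the enum_start name is mine, and I've escaped the brace as \{ to be explicit about it being a literal.

```python
import re

identifier = r"[a-zA-Z_][a-zA-Z_0-9]*"
# enum keyword, required whitespace, captured name, required whitespace, brace
enum_start = re.compile(r"enum\s+(" + identifier + r")\s+\{")

match = enum_start.search("enum nl80211_commands {")
print(match.group(1))

# Anonymous enums have no identifier, so this naive pattern skips them.
print(enum_start.search("enum {"))
```

The required whitespace before the brace also means "enum foo{" is not matched, which is fine for nl80211.h but is exactly the kind of formatting assumption that would bite on other headers.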

The enumerator-list is another regex but more complicated because of the optional expression. I started with: “([a-zA-Z_][a-zA-Z_0-9]*)\s+,”  for a simple match. The constant-expression match would be “([a-zA-Z_][a-zA-Z_0-9]*)\s*=\s*([a-zA-Z_0-9]*)?,” and the copy paste started getting on my nerves.

I started fiddling with a Python printf-y .format() and stumbled across a brain blast. Python f-strings are amazing when used with regex. Instead of trying to build a .format() or a %s block, I can assign each piece of my regex to a variable. And I have a very readable regex. I can build up my regex piece by piece (for good or for ill).

import re

# C-style variable name
identifier = r"[a-zA-Z_][a-zA-Z_0-9]*"

# raw strings so the backslashes reach the regex engine untouched;
# f-strings (below) to save myself some confusion
open_brace = r"\{"
close_brace = r"\}"
whitespace = r"\s+"
number = r"-?[0-9]+"
operator = r"(?P<operator>\+|-|<<)"  # XXX subset of actual C operators

# I'm very sure this is not the proper use of the term 'atom'
# atom := number | identifier
atom = rf"(?:{identifier}|{number})"

# expression := atom
#            := atom operator atom
expression = rf"({atom}({whitespace}{operator}{whitespace}{atom})?)"

# enum member regex
symbol_matcher = re.compile(rf"(?P<identifier>{identifier})({whitespace}={whitespace}(?P<expression>{expression}))?")

# start of an enum declaration (XXX assumes open brace on same line as the
# 'enum' keyword)
enum_matcher = re.compile(rf"enum{whitespace}({identifier}){whitespace}{open_brace}")

 

The f-string uses variables from Python’s context. So f"{identifier}{whitespace}{operator}{whitespace}" will expand to “[a-zA-Z_][a-zA-Z_0-9]*\s+(?P<operator>\+|-|<<)\s+”. The f-string is much easier to read. The (?P<name>...) syntax is a Python regex feature that creates a named group: whatever the group matched is stored under the key “name” and can be read back with group("name") or groupdict().
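The expansion is easy to verify by printing the assembled pattern before compiling it. A small sketch, reusing the same variable names and raw strings for the pieces:

```python
import re

identifier = r"[a-zA-Z_][a-zA-Z_0-9]*"
whitespace = r"\s+"
operator = r"(?P<operator>\+|-|<<)"

# The f-string simply pastes the pieces together, left to right.
pattern = rf"{identifier}{whitespace}{operator}{whitespace}"
print(pattern)

# The named group is then available by name on the match object.
match = re.search(pattern, "NUM_NL80211_NAN_FUNC_ATTR - 1")
print(match.group("operator"))
print(match.groupdict())
```

Printing the pattern is also a cheap way to spot a missing brace or a doubled backslash when the pieces start nesting.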

s = "NL80211_NAN_FUNC_ATTR_MAX = NUM_NL80211_NAN_FUNC_ATTR - 1"
robj = symbol_matcher.search(s)
print(robj)
print(robj.groups())
print(robj.groupdict())

The code snippet gives me the following. It definitely needs a lot more testing.

<_sre.SRE_Match object; span=(0, 57), match='NL80211_NAN_FUNC_ATTR_MAX = NUM_NL80211_NAN_FUNC_>
('NL80211_NAN_FUNC_ATTR_MAX', ' = NUM_NL80211_NAN_FUNC_ATTR - 1', 'NUM_NL80211_NAN_FUNC_ATTR - 1', 'NUM_NL80211_NAN_FUNC_ATTR - 1', ' - 1', '-')
{'identifier': 'NL80211_NAN_FUNC_ATTR_MAX', 'expression': 'NUM_NL80211_NAN_FUNC_ATTR - 1', 'operator': '-'}

Parsing C ‘enum’ with Python Regex.

In my previous post I mentioned I find C ‘enum’ a bit annoying. An enum value captured in a network trace or a packet hexdump is difficult to track backwards to a symbolic value. The nl80211.h header file uses enum extensively. As I continue to learn netlink and nl80211, I’d like a quick way to convert those enum values into human-readable names.

Why not parse the header file and decode the enum? Python is my go-to for small string tasks.  (I mentioned I was parsing C enum in Python to a friend and he said, “That’s a very Dave move.” I’ll take it as a compliment, I suppose.)

C enum is straightforward: start counting at zero and auto increment. If there is an RHS expression, then that enumerator takes on that value and the auto increment continues from there.
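The counting rule fits in a few lines of Python. This is a sketch under a big simplifying assumption: explicit values are already plain integers (no expressions yet). The assign_values() helper and the member names in the example are hypothetical.

```python
def assign_values(members):
    """Apply the C enum counting rule.

    members is a list of (name, explicit_value_or_None) in declaration order.
    """
    values = {}
    next_value = 0  # the first enumerator defaults to zero
    for name, explicit in members:
        if explicit is not None:
            next_value = explicit  # an explicit value resets the counter
        values[name] = next_value
        next_value += 1  # auto increment continues from here
    return values


# Hypothetical enum: FIRST, SECOND, TENTH = 10, ELEVENTH
print(assign_values([("FIRST", None), ("SECOND", None), ("TENTH", 10), ("ELEVENTH", None)]))
```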

enum nl80211_commands {
     /* don't change the order or add anything between, this is ABI! */
     NL80211_CMD_UNSPEC,

     NL80211_CMD_GET_WIPHY, /* can dump */
     NL80211_CMD_SET_WIPHY,
     NL80211_CMD_NEW_WIPHY,
     NL80211_CMD_DEL_WIPHY,
[snip]

(At some point I wonder if I should spring for the extra WordPress plugin for code formatting. Maybe.)

In the nl80211_commands above, NL80211_CMD_UNSPEC == 0, then NL80211_CMD_GET_WIPHY == 1. Simple counter. But danger lurks.

The RHS can be an expression. The expression can be a simple value.

enum nl80211_user_reg_hint_type {
        NL80211_USER_REG_HINT_USER      = 0,
        NL80211_USER_REG_HINT_CELL_BASE = 1,
        NL80211_USER_REG_HINT_INDOOR    = 2,
};

Or a complicated expression.

enum nl80211_tdls_peer_capability {
        NL80211_TDLS_PEER_HT = 1<<0,
        NL80211_TDLS_PEER_VHT = 1<<1,
        NL80211_TDLS_PEER_WMM = 1<<2,
};

The expression can reference previous values in the same enum, as in the next example. (Emphasis added.)

enum nl80211_sched_scan_plan {
        __NL80211_SCHED_SCAN_PLAN_INVALID,
        NL80211_SCHED_SCAN_PLAN_INTERVAL,
        NL80211_SCHED_SCAN_PLAN_ITERATIONS,

        /* keep last */
        __NL80211_SCHED_SCAN_PLAN_AFTER_LAST,
        NL80211_SCHED_SCAN_PLAN_MAX =
                __NL80211_SCHED_SCAN_PLAN_AFTER_LAST - 1
};

Here’s my favorite example, showing the enum auto increment counter being reset. The example below adds a new symbol whose value is identical to an existing symbol’s value, and the auto increment continues on its merry way.

enum nl80211_commands {
[snip]
     NL80211_CMD_GET_BEACON,
     NL80211_CMD_SET_BEACON,
     NL80211_CMD_START_AP,
     NL80211_CMD_NEW_BEACON = NL80211_CMD_START_AP,
     NL80211_CMD_STOP_AP,
     NL80211_CMD_DEL_BEACON = NL80211_CMD_STOP_AP,
[snip]
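To convince myself how the aliasing plays out, here is a rough sketch that resolves values when the RHS is a tiny expression. Assumptions, loudly: it only understands "atom" or "atom operator atom" with +, - and <<, it allows optional whitespace around the operator so that 1<<0 works, the resolve() and evaluate() helpers are hypothetical, and the member names are shortened. The example also starts counting at zero, so the numbers are not the real nl80211 command values.

```python
import re

identifier = r"[a-zA-Z_][a-zA-Z_0-9]*"
number = r"-?[0-9]+"
atom = rf"(?:{identifier}|{number})"
expression = re.compile(
    rf"(?P<left>{atom})(?:\s*(?P<operator>\+|-|<<)\s*(?P<right>{atom}))?"
)


def evaluate(text, known):
    """Evaluate 'atom' or 'atom operator atom' using already-known values."""
    match = expression.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {text!r}")

    def value_of(token):
        # An earlier enumerator name, otherwise a decimal literal.
        return known[token] if token in known else int(token)

    left = value_of(match.group("left"))
    op = match.group("operator")
    if op is None:
        return left
    right = value_of(match.group("right"))
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    return left << right


def resolve(members):
    """members: list of (name, rhs_text_or_None) in declaration order."""
    values, next_value = {}, 0
    for name, rhs in members:
        if rhs is not None:
            next_value = evaluate(rhs, values)  # reset the counter
        values[name] = next_value
        next_value += 1
    return values


print(resolve([
    ("GET_BEACON", None), ("SET_BEACON", None), ("START_AP", None),
    ("NEW_BEACON", "START_AP"), ("STOP_AP", None), ("DEL_BEACON", "STOP_AP"),
]))
```

NEW_BEACON ends up equal to START_AP, and STOP_AP simply picks up at the next value, which is why two names can share one number without disturbing anything after them.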

I think it would be interesting to create a regex that can parse the enum. There are of course simpler ways to do this: I could just continue to use gdb. Most of the enums are small so not a big deal to manually count them. The large enum I could copy to a new file and manually count. But I like tinkering with regexes. And I’ve had this problem of decoding large enum for as long as I’ve used C (a long time). And it seems like a fun little project.

I’ve had co-workers do woodwork to relax. Several co-workers are mountain bikers (Boise is fantastic for mountain biking.) Video games are always a good way to relax. I like to tinker with small code projects.

Debugging an OpenSSL Version Problem.

I’m working in a system using an older version of OpenSSL. The system builds both a cross compiled version of Python and the same version of Python built for the host. Having the same version of Python on both the embedded target and the local host lets me run my Python scripts locally on the exact same version of Python. I can test my scripts locally without having to push them to the remote firmware, which is slow and expensive.

Python uses OpenSSL. The embedded Python build used a local repo copy of the 1.0.x series OpenSSL. Consequently, the host Python build also needs to use the same older 1.0.x series OpenSSL. But the host Python build used the system (default) installed dev version of OpenSSL. I would have had to remove the current version of OpenSSL and install the older version. I was troubled by that requirement.

I built OpenSSL 1.0.x and installed it to a tree under $HOME. I then had to do the hard part of aiming the system’s host Python build at the older version. Seemed simple enough: find where CFLAGS and LDFLAGS were defined, and change those to aim at my local OpenSSL.

CFLAGS+=-I$(HOME)/include/openssl

LDFLAGS+=-L$(HOME)/lib

But it wasn’t working. The Python link step complained that it couldn’t find the newer OpenSSL symbol names, so I knew the build was still using the newer version of the header files.

I needed to debug my changes to the build. Along the way, I found some useful GCC options.

Debugging a header file include problem is straightforward: sabotage the build. I added “#error foo” to the host’s /usr/include/openssl/rsa.h and “#error bar” to the old openssl/rsa.h. Run “make -B Modules/_ssl.o” and see which include file is being hit. (The -B flag forces Make to rebuild the file regardless of timestamp.) The build would fail with “error foo”. I was still getting the system header. I needed to see where GCC was finding its header files.

https://stackoverflow.com/questions/344317/where-does-gcc-look-for-c-and-c-header-files

specifically https://stackoverflow.com/a/41718748/39766

The set of paths where the compiler looks for the header files can be checked by the command:-

cpp -v

I added the ‘-v’ flag to the build, asking GCC to pass it along to the preprocessor.

gcc -Xpreprocessor -v $(CFLAGS) -c [etcetera]

Output looks like:

#include "..." search starts here:
#include <...> search starts here:
/home/dpoole/include/openssl
...
/usr/lib/gcc/x86_64-linux-gnu/7/include
/usr/local/include
/usr/lib/gcc/x86_64-linux-gnu/7/include-fixed
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.

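Oops. The first include path should have been /home/dpoole/include, not /home/dpoole/include/openssl. The headers are included as openssl/rsa.h, so the search path has to be the directory that contains the openssl directory. Fixed my CFLAGS and I’m now building Python against my local copy of OpenSSL.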
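If I had to do this often, pulling the search list out of the verbose output is easy to script. A sketch, assuming the output has the two marker lines shown above; the include_search_paths() name is made up, and the sample text is trimmed from the output in this post.

```python
def include_search_paths(verbose_output: str) -> list:
    """Return the directories in the '#include <...>' search section, in order."""
    paths = []
    in_section = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_section = True
        elif line.startswith("End of search list."):
            break
        elif in_section and line.strip():
            paths.append(line.strip())
    return paths


sample = """#include "..." search starts here:
#include <...> search starts here:
 /home/dpoole/include/openssl
 /usr/local/include
 /usr/include
End of search list.
"""

paths = include_search_paths(sample)
print(paths[0])  # the directory that is searched first
```

The first entry is the one that matters here: directories are searched in the order listed, so whichever one contains openssl/rsa.h first is the header the build picks up.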

Python crashes on startup but progress!
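Once the interpreter does start, the standard ssl module can report which OpenSSL it was built against, which is a quick way to confirm the build picked up the intended copy. A minimal sketch; the linked_openssl() wrapper is just for illustration, and the actual version string depends entirely on the build.

```python
import ssl


def linked_openssl():
    """Return the OpenSSL version string and version tuple of this interpreter."""
    return ssl.OPENSSL_VERSION, ssl.OPENSSL_VERSION_INFO


version_text, version_info = linked_openssl()
print(version_text)
print(version_info[:3])  # major, minor, fix
```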