Distro Thoughts #1

So, after some consideration, I’ve decided to resurrect my previous efforts at building a Linux distribution, mostly because I’d like to tinker with a lightweight Linux that’s easily customizable. Something that really goes “back to basics”.

My first experiments were attempting to bootstrap a Clang/Musl build variant. Ugh!

My initial environment was Ubuntu 18.04 – with a modern C++ toolchain. I thought it’d be easy to populate a chroot environment, especially without cross compiling required. The LLVM code looks generally pretty clean – big, it does a lot, but clean. The build system though? See my previous comments on build system messes. The ‘one repo’ approach assumes a rather complete environment and does not bootstrap well to a new rootfs. I thought a hacky compile from source build would be neat – but this does not seem doable.

Too much time spent on this today, time to go outside and enjoy the sun.

Exploding Complexity

I’ve been thinking a lot lately about embedded Linux, Yocto, and what constitutes an “embedded” system. My first PC was an 8086 clone with two 5 1/4-inch floppy drives and no hard drive. I had a small collection of disks I’d swap in and out for my projects.

I remember running Linux on an old 386 with 4 MB of RAM. I browsed the web with a 486 sporting 8 MB of RAM. In more recent memory, I ran a few machines with 512 MB to 1 GB of RAM that managed to browse the web and accomplish most of what I do on a day to day basis – editing code and writing.

What exactly are programs doing today that they sit eating gobs of RAM? The smallest program running on my desktop right now is the new Windows 10 terminal application – and THAT still takes up 10+ MB of RAM. The argument is “they do so much more”, but I don’t think I can buy it. Software has jumped the shark.

I stumbled upon the people at suckless.org recently. While I don’t share their love of weird tiled user interfaces, I do find myself looking over their lists of software thinking: why can’t Linux run halfway decently on Windows XP era hardware? It’s been a long, long time since I heard about restoring older PCs to service by switching operating systems.

Professionally, one of the larger code bases I deal with is still in C – and I find myself yearning for C++ as I stare at piles and piles of copy-pasted, poorly maintained crap. That said, as I look over the software landscape, I’m not sure it’s language choice that matters. Old C code generally has corners of repeated if(error) checks, polymorphism built from switch statements or function pointers, and the occasional macro-based non-template (or worse, magic with untyped pointers). We can use C++ to good effect to reduce, eliminate, or clarify these situations. But instead of enabling cleaner code, we create brand new realms of further complexity with undebuggable template metaprograms and the spiraling complexity of exception handling. At least in C it’s obvious if you didn’t check a malloc return – C++ happily throws std::bad_alloc and you’re off to the complexity farm. Bring in the even newer, happier cohorts of Perl, Java, and now Python, JavaScript, and extended Java / C#. Adding extra gas to the fire is web package management with NuGet, pip, and npm.

All of this is to “manage” the complexity of software. And yet, at the end of the day, computers today are still keyboard/mouse or, even simpler, a touchscreen. The actual complexity – high bandwidth modems, extended security concerns, increased screen resolution – could easily be dealt with using our old C/Pascal land primitives, or is already being dealt with that way.

As I sit staring at the build screen of Yocto taking hours upon hours to finish compilation, I’m starting to form a new and strong opinion on software complexity:

  1. Design failures of any given abstraction geometrically increase the complexity of successive layers that fail to address the design failure.
  2. The ability to complicate software scales linearly with the productivity gain of an abstraction.

All hope is not lost though – developers can choose to manage complexity and ease its impact with the same tools that allow that complexity to exist in the first place. Unfortunately, I don’t think this is happening. Indeed, the combination of opinions 1 and 2, together with the current state of code quality (which seems to be an industry wide pandemic), points to a final concluding observation:

Applying improved productivity technology to legacy software conceals underlying design flaws of the system.

I feel Linux in general is reaching an odd ‘critical mass’, where the ancient discussions of system flaws and design needs are catching up to us. Meanwhile, I’m starting to see the light of long term colleagues who hate C++. Going from C to C++ now seems like going from digging with a shovel to digging with a Bobcat. Sadly, I don’t think C++ generally brings the right level of discipline when considering the mess.

C++ Has Me In a Funk

After spending the past year or so developing large Python applications, I’ve returned to the C++ fold for my day to day work. For a long time, I developed primarily in C++, enough that the compiler became my tool of choice for simple automation tasks over the more logical scripting platforms. Large applications in a scripting language? Lunacy. And now, returning to C++ feels painful. Time to figure out why. I’m hoping this introspection uncovers some improved style for my C++ implementations.

Build Systems All Suck

Being interpreted, Python doesn’t require any build system. Drop in a .py file and it’s off to the races. Drop an __init__.py into a directory for a package. Simple. Python distribution utilities generally boil down to a list of required dependencies readily served up from a PyPI repository. In C++ land, we’ve got CMake, Waf, Autotools, Make, MSBuild, and any number of other alternatives. Pick any one, and quickly you’ll be staring at the monitor wishing you’d chosen differently when some IDE, library, or maintenance task gets in the way. Unfortunately, I’m not sure there’s anything to be done here outside of inventing yet another new standard.

Dynamic Typing vs Generics, Virtual Methods, and Templates

Dynamic typing with static hinting / analysis just feels more natural than the world of static types with no real introspection. Concepts may help bridge the gap here, as I find myself mostly treating all my Python functions as accepting concept inputs. Further, since all the classes are dynamically bound, there’s no need for a central module to be aware of a plugin module’s class type or to define an interface for it. Perhaps a solution here would be a combination of CRTP and C++20 concepts – or maybe helper methods? Some experimentation is needed. SFINAE substitutions may be able to aid with attribute lookup using named class members.
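To make the “concept inputs” idea concrete, here is a rough sketch of how I tend to write this on the Python side, using typing.Protocol for the static hint. The Plugin, Shouter, and dispatch names are made up for illustration; nothing here comes from a real code base.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Plugin(Protocol):
    """Hypothetical 'concept': anything with a name and a run() method."""

    name: str

    def run(self, payload: str) -> str: ...


class Shouter:
    # Does not inherit from Plugin; it only happens to match its shape.
    name = "shouter"

    def run(self, payload: str) -> str:
        return payload.upper()


def dispatch(plugin: Plugin, payload: str) -> str:
    # A static checker verifies the shape; at runtime this is plain duck typing.
    return f"{plugin.name}: {plugin.run(payload)}"


result = dispatch(Shouter(), "hello")
```

The central module only states the shape it needs, and the plugin class never has to know the Protocol exists. That is roughly the property I would want a C++20 concept to give me.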

Coroutines and Async

C++20 is hopefully going to help here – but I’ve yet to see a good example of the new coroutines spec, or to try out a compiler that supports it. Further, C++20 support here seems embryonic and tailored HIGHLY toward library authors. The more I’ve worked in Python, the more I find myself utilizing generators, and lately, async generators, and in some cases, full-on async coroutines. Database search operations, network operations – it is so much easier to use async with the ‘await’ keyword than to traverse seas of callbacks.
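As a small sketch of the pattern I mean, here is an async generator feeding an async comprehension. The fetch_row and search functions are hypothetical stand-ins, with asyncio.sleep(0) taking the place of a real database or network call.

```python
import asyncio
from typing import AsyncIterator


async def fetch_row(row_id: int) -> dict:
    # Stand-in for real I/O (assumption: a database or network round trip).
    await asyncio.sleep(0)
    return {"id": row_id, "value": row_id * row_id}


async def search(limit: int) -> AsyncIterator[dict]:
    # Async generator: yields matching rows as they arrive, no callbacks.
    for row_id in range(limit):
        row = await fetch_row(row_id)
        if row["value"] % 2 == 0:
            yield row


async def main() -> list:
    # Async comprehension consumes the generator one row at a time.
    return [row["id"] async for row in search(6)]


even_ids = asyncio.run(main())
```

The control flow reads top to bottom like synchronous code, which is exactly what gets lost once the same logic is spread across callbacks.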

File and Module Namespaces

This one has me pondering a new scheme of defining per-file namespaces in C++ and pulling them in with “using” in the top level ‘module’ directories. More thought is needed, but my initial thought here is there’s a certain niceness to each Python module polluting only its own namespace. In C++, a single “using” statement can derail a whole include train traversal. Worse, you’ve got to worry about any third party throwing their crap in as well.
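For reference, here is a throwaway sketch of the Python behavior I’m describing. It writes a tiny package to a temporary directory (demo_pkg, csv_io, and json_io are made-up names) where two modules each define their own load() without stepping on each other.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "demo_pkg"  # hypothetical package name
    pkg.mkdir()
    # An empty __init__.py is enough to mark the directory as a package.
    (pkg / "__init__.py").write_text("")
    # Two modules that both define a function called load().
    (pkg / "csv_io.py").write_text("def load():\n    return 'csv'\n")
    (pkg / "json_io.py").write_text("def load():\n    return 'json'\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        csv_io = importlib.import_module("demo_pkg.csv_io")
        json_io = importlib.import_module("demo_pkg.json_io")
    finally:
        sys.path.remove(root)

# Each load() lives in its own module namespace; neither leaks into ours.
results = (csv_io.load(), json_io.load())
```

The per-file namespace comes for free: the importing file decides which names it wants, and nothing a module defines shows up elsewhere unless it is asked for.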

Missing REPL in the Debugger

In data analysis, there’s no end to the power of a well populated workspace. The primary benefit I find in Matlab is simply the data visualization toolset and the ability to have a workspace that you’re actively manipulating and saving small chunks of to use later. My Python development often sees one window left open with a REPL, where I’m continuously trying new code segments and verifying operation.

List Comprehensions

This is right up there with generators and coroutines: the ability to build a list directly instead of appending to one. Better still, the same thing can be done internally with an iterator, allowing filtering and mapping without large intermediary data structures. I’ll also need to check this out with C++20.
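A quick sketch of the difference, using a made-up list of readings: the append loop, the equivalent comprehension, and a generator expression that filters and maps without ever building the intermediate list.

```python
readings = [3, 8, 15, 4, 23, 42]  # made-up sample data

# Append style: build the list one element at a time.
doubled_big = []
for r in readings:
    if r > 5:
        doubled_big.append(r * 2)

# Comprehension: the same filter and map, stated directly.
doubled_big_lc = [r * 2 for r in readings if r > 5]

# Generator expression: lazy, so sum() consumes values one by one
# and no intermediate list is ever allocated.
total = sum(r * 2 for r in readings if r > 5)
```

Both list versions produce the same result; the generator form is the one that matters when the input is large or streamed.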

I’ll reserve some space to whine more later – but I think this covers at least the top level points from a language standpoint. I haven’t talked about the elephant in the room that is PyPI and the current, rather sad, C++ ecosystem there. That said, with my current project, so much of the code was developed in house that using an off the shelf framework like Qt and following up with entirely custom development would be roughly comparable.

Whatever your language of choice, diving into another land and trying to bring back some ideas with you is a powerful way to improve the skill set. Good hunting.