CS Topics: Rand and Xor Magic

In this week’s episode, we dive a bit deeper into the theory behind random numbers and the importance of the XOR operator when creating hashes.

This episode does depend on pre-existing knowledge of basic boolean operators. If you aren’t familiar with them, check out this page.


CS Topics: Hash Functions

Today we cover one of the primary building blocks for blockchain – Hash Functions.


Episode 4 – Fall 2020 Programming Developments

A look at major new developments in the coding world for C/C++ – and some brief talk on Amazon BottleRocket and Microsoft’s Hyper-V advancements.

Exploding Complexity

Lately I’ve been thinking a lot about embedded Linux and Yocto and what constitutes an “embedded” system. My first PC was an 8086 clone with two 5 1/4-inch floppy drives and no hard drive. I had a small collection of disks I’d trade out for my projects.

I remember running Linux on an old 386 with 4 MB of RAM. I browsed the web with a 486 sporting 8 MB of RAM. In more recent memory, I ran a few machines with 512 MB to 1 GB of RAM that managed to browse the web and accomplish most of what I do on a day to day basis – editing code and writing.

What exactly are programs doing today that they sit eating gobs of RAM? The smallest program running on my desktop right now is the new Windows 10 terminal application – and THAT still takes up 10+ MB of RAM. The argument is “they do so much more”, but I don’t think I can buy it. Software has jumped the shark.

I stumbled upon the people at suckless.org recently. While I don’t share their love of weird tiled user interfaces – I do find myself looking over their lists of software thinking – why can’t Linux run halfway decently on Windows XP era hardware? It’s been a long, long time since I heard about restoring older PCs to service by switching over operating systems.

Professionally, one of the larger code bases I deal with is still in C – and I find myself yearning for C++ as I stare at piles and piles of copy-paste, poorly maintained crap. That said, as I look over the software landscape, I’m not sure it’s language choice that matters. Old C code generally has corners of repeated if(error) checks, switch statements / function pointer based polymorphism, and the occasional macro based non-template (or worse, magic with untyped pointers). We can use C++ to good effect to reduce / eliminate / clarify these situations. But, instead of enabling cleaner code, we create brand new realms of furthered complexity with undebuggable template meta programs and spiraling complexity of exception handling. At least in C it’s obvious if you didn’t check for a malloc return – C++ happily throws std::bad_alloc and you’re off to the complexity farm. Bring in the even newer happier cohorts of Perl, Java, and now Python, Javascript, and extended Java / C#. Adding extra gas to the fire is web package management with nuget, pip, and npm.

All of this is to “manage” the complexity of software. And yet, at the end of the day, computers today are still keyboard/mouse or even simpler, a touchscreen. The actual complexity – high bandwidth modems, extended security concerns, increased screen resolution – could easily be dealt with using our old C/Pascal land primitives OR is already being dealt with that way.

As I sit staring at the build screen of Yocto taking hours upon hours to finish compilation, I’m starting to form a new and strong opinion on software complexity:

  1. Design failures of any given abstraction geometrically increase the complexity of successive layers failing to address the design failure.
  2. The ability to complicate software scales linearly with the productivity gain of an abstraction.

All hope is not lost though – developers can choose to manage complexity and ease its impact with the same tools that allow that complexity to exist in the first place. Unfortunately, I don’t think this is happening. Indeed, the combination of opinions 1 & 2 – and the current state of code quality (that seems to be an industry-wide pandemic) – leads to a final concluding observation:

Applying improved productivity technology to legacy software conceals underlying design flaws of the system.

I feel Linux in general is reaching an odd ‘critical mass’, where the ancient discussions of system flaws and design needs are catching up to us. Meanwhile, I’m starting to see the light of long-term colleagues who hate C++. Going from C to C++ now seems to be like digging with a shovel versus a bobcat. Sadly, I don’t think C++ generally has the right level of discipline when considering the mess.

C++ Has Me In a Funk

After spending the past year or so developing large Python applications, I’ve returned to the C++ fold for my day to day work. For a long time, I developed primarily in C++, enough that the compiler became my tool of choice for simple automation tasks over the more logical scripting platforms. Large applications in a scripting language? Lunacy. And now, returning to C++ feels painful. Time to spend some time trying to figure out why. I’m hoping to perhaps uncover some improved style for my C++ implementations with this introspection.

Build Systems All Suck

Being interpreted, Python doesn’t require any build systems. Drop in a .py file and it’s off to the races. Drop __init__.py into a directory for a package. Simple. Python distribution utilities generally equate to a list of required dependencies readily served up from a PyPI repository. In C++ land, we’ve got CMake, Waf, Autotools, Makefiles, MSBuild, and any number of other alternatives. Pick any one – and quickly you’ll be staring at the monitor wishing you’d chosen differently when some IDE, library, or maintenance task gets in the way. Unfortunately, I’m not sure there’s anything to be done here outside of inventing yet another new standard.

Dynamic Typing vs Generics, Virtual Methods, and Templates

Dynamic typing with static hinting / analysis just feels more natural than the world of static types with no real introspection. Concepts may help bridge the gap here – as I find myself mostly treating all my Python functions as accepting concept inputs. Further, since all the classes are dynamically bound, there’s no need for a central module to be aware of a plugin module class type or define an interface for it. Perhaps a solution here would be a combination of CRTP and C++20 concepts – or maybe helper methods? Some experimentation is needed. SFINAE substitutions may be able to aid with attribute lookup using named class members.
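As a rough illustration of that last idea, here is a minimal sketch of SFINAE-based member detection written against plain C++11. Everything named here (void_type, has_name, describe, AudioPlugin, Anonymous) is hypothetical and exists only for the example; the point is that describe() accepts any type and picks its behaviour based on whether the type has a name() member, without a shared base class or interface.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// C++11 stand-in for std::void_t (which only arrived in C++17).
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_type = typename make_void<Ts...>::type;

// Primary template: assume the type has no name() member.
template <typename T, typename = void>
struct has_name : std::false_type {};

// Chosen only when the expression t.name() is well formed for a const T.
template <typename T>
struct has_name<T, void_type<decltype(std::declval<const T&>().name())>>
    : std::true_type {};

// Overload used for types that provide name().
template <typename T>
typename std::enable_if<has_name<T>::value, std::string>::type
describe(const T& plugin) {
    return plugin.name();
}

// Fallback for everything else.
template <typename T>
typename std::enable_if<!has_name<T>::value, std::string>::type
describe(const T&) {
    return "<unnamed>";
}

// Two unrelated types; neither derives from a common interface.
struct AudioPlugin {
    std::string name() const { return "audio"; }
};
struct Anonymous {};
```

With C++20, the has_name trait and the enable_if pair could likely be collapsed into a single concept with a requires-expression, which is closer to how the Python version reads.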

Coroutines and Async

C++20 is hopefully going to help here – but I’ve yet to see a good example of the new coroutines spec (or try out a compiler supporting it). Further, C++20 support here seems embryonic and tailored HIGHLY toward library authors. The more I’ve worked in Python, the more I find myself utilizing generators, and lately, async generators, and in some cases, full-on async coroutines. Database search operations, network operations – so much easier to utilize async with the ‘await’ keyword, versus traversing seas of callbacks.

File and Module Namespaces

This one has me pondering adopting a new scheme of defining per-file namespaces in C++ and utilizing “using” declarations in top-level ‘module’ directories. More thought is needed – but my initial thought here is there’s a certain niceness to each Python module polluting only its own namespace. In C++, a single “using” statement can derail a whole include train traversal. Worse, you’ve got to worry about any third party throwing their crap in as well.
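One way such a scheme might look, as a sketch only: each file gets its own nested namespace, and a top-level ‘module’ header re-exports the names it wants public. The file names in the comments and the names parser, tokenizer_file, lexer_file, kind and helper are all made up for illustration, and the three pieces are shown in one listing purely so it compiles on its own.

```cpp
#include <string>

// --- would live in parser/tokenizer.hpp ---
namespace parser {
namespace tokenizer_file {
    inline std::string kind() { return "tokenizer"; }
    inline int helper() { return 1; }  // not re-exported below
}
}

// --- would live in parser/lexer.hpp ---
namespace parser {
namespace lexer_file {
    // Same name as tokenizer_file::helper, but no clash:
    // each file only pollutes its own namespace.
    inline int helper() { return 2; }
}
}

// --- would live in parser/parser.hpp, the 'module' header ---
namespace parser {
    // Roughly the role of "from .tokenizer import kind" in a
    // Python package's __init__.py.
    using tokenizer_file::kind;
}
```

Callers then write parser::kind() and never see either helper() unless they reach into a file namespace on purpose.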

Missing REPL in the Debugger

In data analysis, there’s no end to the power of a well populated workspace. The primary benefit I find in Matlab is simply the data visualization toolset and the ability to have a workspace that you’re actively manipulating and saving small chunks of to use later. My Python development often sees one window left open with a REPL, where I’m continuously trying new code segments and verifying operation.

List Comprehensions

This is right up there with generators and coroutines: the ability to build a list directly instead of appending to one – especially as that can be done internally with an iterator, allowing filtering and mapping without large intermediary data structures. Will also need to check this out with C++20.

I’ll reserve some space to whine more later – but I think this covers at least the top level points from a language standpoint. I haven’t talked about the elephant in the room that is pypi and the current rather sad C++ ecosystem there. That said, with my current project, so much of the code was developed in house, that using an off the shelf framework like Qt and following up with entirely custom development would be roughly comparable.

Whatever your language of choice, diving into another land and trying to bring back some ideas with you is a powerful way to improve the skill set. Good hunting.

C++ Needs a New GUI Framework

The landscape of GUI C++ development is pain – native Windows gets third tier support from Microsoft, and Android actively discourages native API. Linux is better with Qt and GTK, but GTK on Windows is rough. My go to choice for years has been Qt.

Lately though, it seems Trolltech / Nokia / Digia / The Qt Company has an active dislike of their users. I’ve brought up the idea of Qt at my day job, but the word is they won’t cut a deal amenable to requirements. So we mush on. There’s lots of homebrew garbage out there – especially if you start looking at widget sets on top of Unity. Hey, why not yet another CSS Browser?

In the end – maybe it’s just that the demand for native code isn’t there. Web front-ends are all the rage, and Electron apps can do wonders. Why not take a gig of RAM for a text editor and chat client – RAM is cheap these days. Still, there’s something hugely missing in development work when you start looking at the interface between C++ and whatever JavaScript engine du jour you’ll be running on.

Qt is almost there for so much. Unless you want to make money or distribute an app with a GPLv3 incompatible license or environment. Given Microsoft’s amazing collection of freeware tools, you might expect a license for commercial development to be reasonable. You’d be wrong. The keepers of Qt licensing want $5k+ per developer. The community shouted. They offered a ‘small business’ package for anyone with less than $100k revenue. The community shouted again. Now, they’ve upped that to $250k. Just don’t look at the fine print if you want to distribute embedded works.

What would make a nice GUI library?

  • Some sort of DOM / Canvas model that is intuitive and easily interacts with C++
  • Scripting support with C++ tie-in
  • Stable API that plays well with “standard” C++

Hit those buttons, don’t charge me an arm and a leg, preferably be open source – GPL + commercial would be ok by me – and we’ll talk. Maybe it’s time for a Motif comeback. I miss you, X11 days.

Portable C++: Unpacking integers from binary buffers

As C++ code is so close to the metal, we often make dodgy assumptions that hurt portability. One of the ‘simplest’ problems that I’ve seen repeatedly is packing and unpacking binary data.

The C++ standard works hard to eliminate definitions that would tie us into a particular hardware architecture, and this area invites a desire to throw caution to the wind and make assumptions as to exactly what’s going on.

The new college grad (and old-hat that views this as all theoretical anyway) might write:

There’s a handful of problems here:

  1. We’re assuming the bit size of ‘int’ – it may be anywhere from 16 to 64 bits on common platforms.
  2. We’re assuming that it’s safe to read a char-aligned buffer as an integer.
  3. We’re assuming the buffer is packed with the appropriate byte order for our processor.
  4. We’re breaking the strict aliasing rule. Edit: This is wrong, see corrections…

Can we write a new version of the function to take care of these challenges? Well, with a little care:
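A sketch of what such a function could look like, built from bit shifts and ORs with std::uint8_t casts. The name unpack_le32_shift is hypothetical, and the little-endian layout of the buffer is an assumption made for the example:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: buf holds a 32-bit value stored least significant byte first.
inline std::int32_t unpack_le32_shift(const char* buf) {
    assert(buf != nullptr);
    // Go through std::uint8_t first so a negative char is not sign-extended
    // when it is widened to 32 bits.
    const std::uint32_t b0 = static_cast<std::uint8_t>(buf[0]);
    const std::uint32_t b1 = static_cast<std::uint8_t>(buf[1]);
    const std::uint32_t b2 = static_cast<std::uint8_t>(buf[2]);
    const std::uint32_t b3 = static_cast<std::uint8_t>(buf[3]);
    // Assemble in unsigned arithmetic, where shifts are fully defined.
    const std::uint32_t value = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    return static_cast<std::int32_t>(value);
}
```

Because each byte is read individually, there is no alignment requirement on buf and no dependence on the host’s byte order.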

This version was tuned to work with GCC 5 and higher. This function is highly portable – it should operate on any architecture providing 8-bit chars and a 32-bit std::int32_t. Indeed, the C++ standard defines conversion to std::uint32_t as arithmetic modulo 2^32, so it holds whether or not the platform uses two’s complement. Using bit shifts and bitwise OR defines the exact expected behavior of the construction of the 32-bit integer.

And there was much rejoicing… sort of… There’s many a blog post out there that supports this method of formatting.

Now, let’s say that this particular call is fairly performance critical (perhaps we’re doing some pixel or image manipulation – use your imagination). In my application, I was processing large data files. Modifying from the first style to the second fixed issues with ARM portability, but slowed down performance.

Most compilers see the above pattern and recognize – “hey, I can just load a 32-bit word and return, no harm / no foul.” Sadly, Visual C++ does not. No combination of optimization flags and type manipulation gets the optimizer to recognize the pattern. Even GCC is fairly sensitive about the situations where it can (hence the std::uint8_t casts throughout). To facilitate portability and performance on all my desired targets, the end result was using std::memcpy to a temporary integer. The ARM compiler happily recognizes we may be accessing unaligned memory, and all the other toolchains optimize away the memcpy to a simple load. Of course, now we’re back to handling byte order again. Ugh!
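A sketch of that memcpy approach, including the byte order handling it brings back. The names host_is_little_endian, byte_swap32 and unpack_le32_memcpy are hypothetical, the buffer is again assumed to be little-endian, and the endianness probe is just one simple way to do the check at run time:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverses the four bytes of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Assumption: buf holds a 32-bit value stored least significant byte first.
inline std::int32_t unpack_le32_memcpy(const char* buf) {
    assert(buf != nullptr);
    std::uint32_t raw = 0;
    // memcpy has no alignment requirement on buf; optimizers commonly
    // turn this into a single load where the target allows it.
    std::memcpy(&raw, buf, sizeof raw);
    if (!host_is_little_endian()) {
        raw = byte_swap32(raw);
    }
    return static_cast<std::int32_t>(raw);
}
```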

At the end of the day, maybe the grouch has it right – just worry about the processor you’re running on (hopefully just 1). It’s all fun and games until you find yourself porting to that random platform you’d never worry about.


Technically, the “char*” and “void*” are both exempted from strict aliasing rules – and as we return a copy of the original integer (and not a pointer or a reference) we do not have a situation where another object could share the same memory space with the integer.