Wyrm: Chipping away at ELF

Since Wyrm uses a bootstrapped Scheme variant, building data structures and writing files is surprisingly difficult. Scheme can express almost any construct, but without standard libraries every tiny detail must be determined by hand.

The “actual” Wyrm system will eventually provide low-level data structures and type support. Any construct used by “Mini Scheme” will require duplication across both the Wyrm Scheme implementation and the “generic” Scheme-based system. When writing the ELF decoder, the first obvious challenge to address is some form of structured data support. A normal Scheme application might use a SRFI-1 style “associative list”. Ideally, Wyrm will eventually implement an associative container with “better performance”, but for now a wrapped associative list works well. Given the newly defined “dictionary” type, the next hurdle is basic support for serializing structures; for this, a new ‘encoding’ structure is created. With a defined dictionary type and a serialization structure, basic support for the primary ELF file header is simple to implement.
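To make the idea concrete, here is a rough sketch of a wrapped associative-list dictionary plus a tiny little-endian field encoder, written in plain portable Scheme. The names (dict, dict-ref, int->le-bytes) are illustrative placeholders rather than the actual Wyrm API.

```scheme
;; Sketch: a dictionary wrapped around an associative list, plus a tiny
;; little-endian integer encoder. Placeholder names, not the real Wyrm API.

(define (dict . pairs)                  ; build a dictionary from key/value pairs
  (let loop ((ps pairs) (acc '()))
    (if (null? ps)
        (cons 'dict (reverse acc))
        (loop (cddr ps) (cons (cons (car ps) (cadr ps)) acc)))))

(define (dict-ref d key)                ; look a key up in the wrapped alist
  (cond ((assq key (cdr d)) => cdr)
        (else (error "missing key" key))))

(define (int->le-bytes value size)      ; integer -> list of `size` little-endian bytes
  (let loop ((v value) (n size) (out '()))
    (if (zero? n)
        (reverse out)
        (loop (quotient v 256) (- n 1) (cons (modulo v 256) out)))))

;; Example: serialize two fields of an ELF header description.
(define header (dict 'e-type 2 'e-machine #xB7))   ; ET_EXEC, EM_AARCH64
(define header-bytes
  (append (int->le-bytes (dict-ref header 'e-type) 2)
          (int->le-bytes (dict-ref header 'e-machine) 2)))
```

A real encoding structure would declare field order and widths once, rather than appending hand-picked byte runs like this.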

So far, basic Test Driven Development (“TDD”) practice has allowed development of substantial infrastructure without gnarly scaffolding. The current ‘Wyrm’ program remains a simple “Hello World” display, but substantial support for ELF and the common Wyrm Scheme library is already in place.

The huge challenge remains focus. Even basic decisions easily get bogged down when considering all the possible angles of a full toolchain ecosystem. Worse, there is nearly infinite complexity lurking in any attempt to expand to a more modern feature set. And without the usual arsenal of refactoring and “intelligent” coding tools, any added functionality becomes a major distraction. A worthwhile detour might be integrating Visual Studio Code support to improve quick-reference documentation.

Wyrm: Baby Steps for ELF

With the July 4th holiday, I enjoyed a 3-day weekend but intentionally limited the time I spent hacking on Wyrm. There’s a lot to unpack in creating a full operating system and toolchain (even with a limited scope). Instead of jumping into a full-fledged implementation, I took the opportunity to brainstorm and structure the project.

Given we’ve got a hugeeeee amount of work ahead in bootstrapping a kernel and toolchain, the big question becomes “where to start”. For this project, I’ll be trying to maintain a “Test Driven Development” practice. A unit test framework also creates a simplified environment for early development.
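As a rough sketch of how little machinery that requires, a single check helper is enough to start with; the names below are placeholders, not the actual Wyrm test framework.

```scheme
;; Minimal sketch of a unit-test check, assuming nothing beyond R7RS Scheme.
;; These names are placeholders, not the real Wyrm test framework.
(define failures 0)

(define (check-equal? name expected actual)
  (if (equal? expected actual)
      (display (string-append "PASS " name "\n"))
      (begin
        (set! failures (+ failures 1))
        (display (string-append "FAIL " name "\n")))))

;; Example: a test is just a named expectation.
(check-equal? "reverse a list" '(3 2 1) (reverse '(1 2 3)))
```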

The first project milestone will be a “Hello World” kernel constructed by our toolchain. Qemu supports loading a binary ELF image, and most portable toolchains will work with ELF binaries, so for this milestone the Wyrm toolchain will construct a valid ELF kernel image containing a “Hello World” assembly kernel. With ELF, the Wyrm toolchain can create images for either our fledgling OS or the Linux ecosystem. The end-goal is a fully self-hosting system – but until that point, Linux or Windows can provide a host environment.
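For reference, a “valid ELF” image begins with a fixed 16-byte identification block. The sketch below simply spells out those bytes for a 64-bit little-endian image; it is an illustration of the format, not the actual Wyrm emitter.

```scheme
;; The e_ident block every 64-bit little-endian ELF image begins with.
;; Shown for illustration; not the Wyrm toolchain's actual emitter.
(define elf-ident
  (list #x7F #x45 #x4C #x46   ; magic: 0x7F 'E' 'L' 'F'
        2                     ; EI_CLASS: 2 = 64-bit (ELFCLASS64)
        1                     ; EI_DATA: 1 = little endian
        1                     ; EI_VERSION: 1 = current
        0                     ; EI_OSABI: 0 = System V / none
        0                     ; EI_ABIVERSION
        0 0 0 0 0 0 0))       ; EI_PAD: zero padding to 16 bytes
```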

If someone forced me to select a ‘favorite’ programming language, I’d likely fall into the Python camp. Python does not, unfortunately, make for a good “system” programming language. However, both Julia and Nim advertise some degree of compiled / system programming features. For our toolchain, I’m going to pull a page out of Julia’s playbook and use the simplicity of Scheme for the compiler and runtime implementation. With a strong Scheme toolchain, I hope to experiment with a maybe-Python / maybe-Scala frontend. And with a Scheme work-alike, we can use a “proper” Scheme implementation to bootstrap the system; I’ve selected Chicken Scheme.

With the few commits this weekend, there’s a small test framework and the start of some low-level Scheme primitives for building ELF files.

Introducing Wyrm

For a long time, I’ve maintained various iterations of low level operating system logic or programming language interpreters. The earliest iterations focused on recreating QBasic and DOS. Newer iterations focused on various technology stack ideas (the last being microkernel and exokernel based approaches). The only time my software stack ever ventured out to be seen by others was… as sample code for a job interview.

I’ll be covering this project on the podcast – but before adding the glitz, I find myself wanting to sit down and write about the idea, starting with the why, the what, and the who it’s for.

I’d be lying if I didn’t admit to a strong desire to build “the next thing”. And – I’d be lying to myself if I argued Wyrm had any hope of being the next thing. Instead, the mission of Wyrm is simple: a playground for OS and programming language conceptual development. My hope is to build upon (or create) some framework similar to the hello world staples provided at the OSDev Wiki. Instead of duplicating Unix and C, my intent for Wyrm is to explore the history of Amiga, Newton, and LISP machines. And, of course, duplicate Unix and C at some point.

I do not plan on supporting many hardware platforms – only ARM, and likely only one or two available single board computers. I’m considering the Raspberry Pi 4, Asus TinkerBoard, and a QEMU AArch64 machine for starters. This does presuppose that I manage to get the language itself into a workable state. As I don’t have a lot of time to dedicate to the project, I suspect progress will be slow and may be redirected to other ARM (or RISC-V) cores as time goes on.

I’m starting with a “blank slate” for this project. My goal will be to cover the fits and starts and pain associated with birthing an operating system from scratch. There are multiple toy OS projects out there – and multiple “real” projects – but developers tend to “wait” until some mythical “beta” period before showing their work. Realistically, I don’t see myself having the time to hit such a milestone quickly, especially starting from the ground up. That said, I’ve built many toy interpreters and kernels – so I suspect there’ll be something that appears at some point. From experience, a bootable “hardware” ARM kernel is a few weekends’ worth of effort. Then again, my free weekends are few…

elfenix/wyrm: OS and Language Playground (github.com)

Code Philosophy: Why Worse is Better (Ep. 15)

The Zombie Coder is back for a new season! Today’s episode focuses on one of the Zombie’s key coding philosophies: Worse is Better. Learn about the history of the phrase and join the “Cult of Done” if you dare.

RC 0x02 (Research Comments 0x02): Linux Packages

Over lunch, my brain keeps popping up questions about back-burnered personal tasks. The peril of a software development career is the continuous return to build systems, packaging, and best practices for distributing software. Over the years, I’ve seen and maintained a number of Linux packages, build systems, embedded distributions, and containers.

With the newer releases of Ubuntu, Canonical has been heavily pushing their “snap” standard – which, as a user, I find less than desirable. The big complaint is weird breakage of basic things – like a link clicked in another application opening in a new browser instance (and worse, without my privacy extensions enabled!).

In my toy distribution, I really wanted to take an approach of absolute minimalism and experiment with containerized setup of applications – up to and including the base system layer.

In the Windows world, applications tend to rely on a system-installed C runtime (the Visual Studio Redistributable) and media library (DirectX). This system provides a strong “base layer”, but the application then must ship all its other libraries or statically link them into the executable. While I fault Microsoft for many things, the reality is that their approach to system backward compatibility and best practices for shipping apps has resulted in easy installation of programs from 10+ years ago. Indeed, I have software from even earlier (the early 2000s) that still installs and works on Windows today. The same cannot easily be said for Linux.

I believe Flatpak to be superior from my user-experience point of view – but the standard is not as well supported by major players, and integration is “subpar”. Moreover, Flatpak seems to have a very GNOME/desktop-centric focus.

I’m hoping that the base levels of Flatpak can provide a solid foundation. My main hangup at present is that the technology feels a bit limited when attempting to gather dependencies for a C/C++ application that don’t fit within the general idea of the system SDK. The “dependency soup” managed by RPM and DPKG very quickly appears and becomes difficult to manage without a nice, proper system root. From a management aspect, I really am hoping for some sort of system that provides extensive configuration and artifact management to ensure complete reproducibility of system configurations.

I’m hopeful that something like Yocto could provide a way around this, but I’m not sure I’m feeling it. I’m starting to consider some sort of new middleware layer necessary, but that’s more development effort than I’d like to spend there.

Going forward, the big questions are: is Flatpak suitable for building a distribution? And how can we ‘clean up’ the dependency soup required for building packages?

Research Comments 0x01: Modelling Text

I’ve found my growing collection of “Note.txt” files floating around in random directories to be an unbearable way of keeping notes. I’ve debated moving those into some sort of Wiki form – but at present, I’ve decided to move documents to this blog in the hopes that others might find them useful. I’ll be labelling these posts as “Research Comments” and numbering them in publish order for my own reference. These documents are not intended to be academic or authoritative in nature – they are research notes collected with links to other documents.

As I type into the fancy new “block based” WordPress editor, I’m reminded of the complexity of HTML versus the simplicity of a plain text block control. WordPress enables fancier things in blocks, but for a long time (and for multiple purposes) simple text blocks were all we had.

I’m not so sure that the CSS box layout model is really the “best” thing when it comes to supporting simple text layout – and it definitely was not originally intended to compete with the advanced manual typesetting that might be done by a publisher. Still, today’s graphic artists are forced to use a system developed by 90s-era nerds without much concern for the needs of the typesetting industry.

For forum software, this presents a unique set of challenges: we want the ability to include a substantial number of tags while at the same time limiting the feature set to preserve the overall presentation of the page. Early social media giant MySpace provided minimal filtering, allowing teenagers to customize their pages to extreme levels – at times completely replacing significant elements of the MySpace UI. Unfortunately, not much is left of the ‘old web’ to point to, but it does make for some fun discussions on current forums.

We can group the approaches used by social networks into several major camps. Facebook and Twitter provide “text only” modelling with the addition of metadata to allow a degree of enhancement (e.g., link detection, post backgrounds in FB, ‘moods’, and attached images). This model allows a substantially simplified user interface (no need to deal with formatting shortcuts) – but at the cost of the user’s ability to represent more complicated text with inline links. The second camp, used by major web forums, accepts text in some other markup language with limited functionality (e.g., BBCode). Translating that syntax into HTML removes the need to detect malformed codes and handle the extended cases that can be problematic when using HTML directly. Finally, the rare case is a system that allows direct HTML / CSS with (hopefully!) filtering.
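As a toy illustration of that second camp, translating limited markup into HTML can be little more than string substitution. The sketch below (plain Scheme, with hypothetical helper names) converts a single BBCode tag pair; a real translator would of course also escape raw HTML in the input.

```scheme
;; Toy sketch: translate one BBCode tag pair into HTML by plain string
;; substitution. Helper names are hypothetical.

(define (string-index-of hay needle)   ; first index of needle in hay, or #f
  (let ((hl (string-length hay)) (nl (string-length needle)))
    (let loop ((i 0))
      (cond ((> (+ i nl) hl) #f)
            ((string=? (substring hay i (+ i nl)) needle) i)
            (else (loop (+ i 1)))))))

(define (replace-once s old new)       ; replace the first occurrence of old
  (let ((i (string-index-of s old)))
    (if i
        (string-append (substring s 0 i)
                       new
                       (substring s (+ i (string-length old)) (string-length s)))
        s)))

(define (bbcode-bold->html s)
  (replace-once (replace-once s "[b]" "<strong>") "[/b]" "</strong>"))

;; (bbcode-bold->html "hello [b]world[/b]")  =>  "hello <strong>world</strong>"
```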

One of the more interesting ideas I’ve seen is using TeX under the hood to allow better document markup than HTML offers. When working on complex documents with extensive sourcing, the tooling TeX provides is fairly invaluable. The resulting TeX documents also tend to “look” fairly decent – so long as one is careful to use modern features and fonts when creating finished documents.

For word processor / document-based formats, multiple major techniques appear to exist. Confluence uses a subset of HTML intended to allow better “flow” of document text; this subset includes additions for elements such as images and tables. The table model itself allows nesting, but does not allow control of text flow around the table (the table is treated as a breaking paragraph). Historically, Word has hidden the underlying makeup of documents from users as much as possible, while WordPerfect exposes detailed commentary on the internal “codes” used for document markup. Adobe FrameMaker text bears a degree of similarity to HTML, but is likely better compared to TeX in operation.

After Hours and Season 2 Preview (Ep. 14)

In this episode I’ll talk a bit about plans for the next season of Zombie Coder, and review some of the lessons learned over the past several months. Look for more episodes to resume after the new year!

Music: https://audionautix.com

CS Topics: Welcome to Hash World (Ep. 13)

In this episode, I conclude a series on Merkle Trees – the key technology and ideas behind distributed systems. I hope this episode captures your imagination about the potential applications of the distributed web.

Avoiding Death March Projects

What is a death march project? Recognizing Death March projects is easier than you might expect, and avoiding them means simply setting your own boundaries.

And remember:

Don’t work for Sh*tBags!