intro to bootc
what bootc means to me
I’ve become something of a bootc fanboy lately. It started when I used bootc to help me develop and test a kernel patch for some work I was doing with one of my employer’s hardware partners. I’ve been helping my gaming friends abandon Windows problems[1] and move to Bazzite from the folks at Universal Blue for a bulletproof, performant experience. I’ve helped coworkers build custom RHCOS images to enable capabilities on new hardware inside OpenShift. I’ve helped Fedora community members migrate their Silverblue installations to their (as yet unofficial) bootc equivalents and get to enjoy easy, reliable package layering through derived builds instead of fighting rpm-ostree issues during major releases (like this week’s Fedora 44 release 🎉). Here’s how I navigated the Fedora 44 upgrade, by the way.
It’s probably safe to say that I’m a bootc mega-fan. I think that I can help convince you that it’s worth getting into, too. In fact, you shouldn’t need any convincing. Go get started right now, it’s not hard. Download Bazzite or Silverblue or, if you’re feeling adventurous, try out a bootcrew image. Most of those things have easy instructions and wizards to guide you through installation.
I don’t want to use this series just to convince people to use bootc, though. I want it to help people understand what makes bootc so much better than traditional package-based distributions. It would help to understand what a “package” is first, I suppose. Once you understand that, maybe “Linux distribution”[2] will make more sense, too. So for this, we’ll just start… I guess a little earlier than that.
why do programming languages exist
Programming languages for computers were first invented in 1952. Grace Hopper famously advocated for English-language computer programming and invented what would be recognizable as an early dialect of COBOL, then called Business Language version 0. Computers do not understand these languages natively. They are built on mathematical constructs as simple as integer addition and as cryptic to the newcomer as XOR and floating point numbers. One interesting characteristic of early computing that’s not really been an issue for decades is that different computers used to use these numbers in different ways. The alignment of numbers as the computer works on them, the types of mathematical instructions they understand, the schemes used to encode letters and other characters as numbers, even the way in which they can fundamentally order and process groups of instructions together used to be very different between one computer model and another.
This has made a lot of people very angry and has been widely regarded as a bad move.
— Douglas Adams
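To make those differences a bit more concrete, here is a minimal Python sketch of two of them. It assumes ASCII and one EBCDIC code page (`cp037`) as the two character encodings, and uses byte order as a stand-in for the way machines disagreed about laying out numbers. Those specific choices are mine for illustration, not something the history above singles out.

```python
import struct

# The same letter becomes a different number depending on the encoding scheme.
letter = "A"
ascii_byte = letter.encode("ascii")    # ASCII, common on most modern systems
ebcdic_byte = letter.encode("cp037")   # one EBCDIC code page, used on IBM mainframes

# The same 32-bit integer is laid out differently depending on byte order.
number = 1
little_endian = struct.pack("<I", number)  # least significant byte first
big_endian = struct.pack(">I", number)     # most significant byte first

print(ascii_byte.hex(), ebcdic_byte.hex())    # 41 c1
print(little_endian.hex(), big_endian.hex())  # 01000000 00000001
```

A program that wrote the letter or the number on one kind of machine and read the raw bytes back on the other would get the wrong answer unless somebody translated in between.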
Programming languages provided not just a convenient way to write software more simply, but also a way to make those programs more portable. A type of software called a compiler takes a programming language and turns it into some related language, sometimes all the way down to the mathematical notation that the computer actually executes. The nuance of linkers and assemblers doesn’t really matter for this discussion, so let’s gloss over it for a minute. Compilers weren’t a perfect solution, though - they still required writing new compiler targets for every new computer. Although writing a compiler once and recompiling all the existing software is better than just rewriting all software, compilers are relatively complicated programs and adding these new targets often takes a lot of time.
It became clear that something about the way software was written for computers had to change. The industry started working towards a better way. These characteristics of the types of numbers, instructions, and more were organized together as a set of design goals for the physical hardware of the computer to form what is called an Instruction Set Architecture or ISA. Instead of every individual computer requiring custom programming for the exact way that particular computer model works, the hardware manufacturers form a type of contract with software authors through the ISA. They say “you can program this computer this way, and we’ll handle the rest.” Within those bounds they have freedom to optimize, implementing clever tricks and brute power to fulfill the goals of the ISA faster for newer models, and sparking competition between those manufacturers. For a while, almost all modern personal computers (mainframes and embedded devices notwithstanding) used the x86 architecture from Intel before AMD popularized a backwards-compatible ISA modification called x86-64. Today, the story has gotten somewhat messier again with everything from Arm to RISC-V.
Of course, all of this nice ISA standardization went out the window when hardware manufacturers would add optional ISA extensions and expose ways to discover and consume those extensions. When I was younger, you absolutely needed an MMX-capable CPU if you wanted to be able to run the latest and greatest games. There have been some on-again, off-again attempts to standardize this kind of thing. You know how this goes.
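As one example of what “discovering” an extension can look like today, here is a small Python sketch that pulls the feature flags out of text in the format Linux exposes at `/proc/cpuinfo`. The `cpu_flags` helper and the sample text are made up for illustration. The sketch assumes the feature line is labelled `flags` (as on x86) or `Features` (as on Arm).

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags listed for the first CPU in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label this line "flags"; Arm kernels label it "Features".
        if key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


# Hypothetical sample; a real file has many more lines and many more flags.
sample = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 avx2\n"

flags = cpu_flags(sample)
has_mmx = "mmx" in flags
has_avx512 = "avx512f" in flags
print(has_mmx, has_avx512)  # True False
```

On a Linux machine you could pass it the real thing by reading `/proc/cpuinfo` as text. Software that wants an optional extension has to do some check like this and carry a fallback for CPUs that lack it.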
and then, packages
Wikipedia claims package managers were introduced in 1989, but it also says [citation needed] right now. I was born in 1986, so I wasn’t much into package managers at the time and couldn’t tell you.
Software packages, and package managers, were built to help solve the problems of distributing software. Some packages contain source code alongside the instructions to compile it, maybe with some optional ISA or extension specific optimizations. Some contain compiled versions of software ready to run on one computer or another, alongside the metadata that shows what kinds of computers it would be expected to run on.
One of the first package managers, and indeed the first one I ever encountered, was dpkg. dpkg was originally just a shell script, written by the late, great Ian Murdock for his relatively new project, Debian. Not long after came another package manager you’ve maybe heard a thing or two about, when Marc Ewing wrote rpm for Red Hat Linux. Package formats and package managers made Linux distributions what they are. When you build packages to make sharing software easier, and distribute those packages, you have made a distribution.
Packages have continued to evolve to meet the needs of the ever-modernizing software landscape in the ever-more-complex computer hardware ecosystem. Automatic dependency resolution for dynamically-linked executables as a feature contributed a lot to the package management landscape. Instead of compiling every function a program needs directly into that program, you can compile libraries of software with functions baked in that can be called by other software. The single most impactful example of this in today’s world is the GNU C Library. You can reduce transfer time and size on disk, and improve overall efficiency, by packaging and distributing a certain version of libc, then packaging and distributing lots of other programs with dynamic links to that version of that library.
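If you want to see a shared library being used by something that wasn’t compiled together with it, here is a minimal Python sketch using the standard `ctypes` module. It assumes a Linux or other POSIX-like system where the C library can be found and loaded at run time, and it calls `strlen` purely as a convenient example function.

```python
import ctypes
import ctypes.util

# Ask the system which shared object provides the C library, then load it.
# If the lookup returns None, CDLL(None) falls back to the symbols already
# loaded into this process, which on Linux includes libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe strlen's C signature so ctypes converts the argument and result.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"bootc")
print(length)  # 5
```

Nothing here ships its own copy of `strlen`. The code finds it in whatever libc is installed on the system, which is exactly why a package has to declare which library it expects to be there.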
So, modern packages then are fundamentally made up of mostly the following things:
- collections of files that make up the package content itself (usually in compiled binary form, sometimes in source, the distinction isn’t helpful here)
- metadata about how the package relates to other packages in a given distribution, most notably dependencies (such as the library example above)
- documentation, often included, on how to use the software that’s bundled in the package
- licenses pertaining to the use or distribution of software in the package
- extra files that are useful for operating the software, such as service definitions or artifacts like icons and metadata used in GUI launchers for the program
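The list above can be sketched as a small data structure. This is only an illustration, not the layout of any real package format: the `Package` class, its field names, and the `install_order` helper are all invented for this example. It also shows the simplest thing a package manager does with dependency metadata, which is work out an order where dependencies get installed first.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Package:
    """A made-up, simplified view of what a package carries."""
    name: str
    version: str
    files: list[str] = field(default_factory=list)     # package content
    requires: list[str] = field(default_factory=list)  # dependency metadata
    docs: list[str] = field(default_factory=list)      # documentation
    license: str = ""                                  # license identifier
    extras: list[str] = field(default_factory=list)    # service units, icons, etc.


def install_order(packages):
    """Return package names ordered so every dependency comes before its dependents."""
    graph = {pkg.name: set(pkg.requires) for pkg in packages}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical packages; the names and versions are placeholders.
packages = [
    Package("example-daemon", "1.0", requires=["example-lib", "libc"]),
    Package("example-lib", "2.3", requires=["libc"]),
    Package("libc", "2.40"),
]

order = install_order(packages)
print(order)
```

Real dependency resolution also has to deal with version constraints, conflicts, and alternatives, all of which this sketch leaves out.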
There’s one other feature that package managers got pretty early on, and it’s kind of the heart of the problem that atomic, transactional updates help solve. It’s not the only part of the problem, but it’s the simplest to understand.
scripts
Sometimes when you install a package, there are things you’d like to have happen just before. Maybe you want the files to be owned by a certain user, but you don’t know the UID that user will get for sure on any given system, so you need to make sure the user exists and has been assigned a UID by the system first. For this, package managers got pre-installation scripts.
Sometimes there are things you need to have happen just after you install a package. Maybe your package includes a service definition that needs to be registered with your service manager so that it can be started when the system does. For this, package managers got post-installation scripts.
Actually, most software has things like that. I got both of those examples from the first spec file I thought to go check.
Making sure that the files in a package are laid down on disk properly is easy. We can be sure we have all the right bits in a package stored in all the places they’re supposed to be. Making sure that the changes from pre-installation and post-installation scripts have run is much harder. Or at least, being very sure that they’ve run is much harder. Modern package managers dutifully record all of the things they’ve successfully done in databases to keep a running record of what was requested to happen, what did happen, etc.
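Here is a toy Python sketch of that record-keeping problem. It is not how any real package manager is implemented: the `install` function, the step names, and the simulated power cut are all invented for illustration. The only assumption is the one described above, that each step gets written to a journal once it has finished.

```python
def install(package_name, steps, journal):
    """Run each named step in order, recording only the steps that finish."""
    for step_name, action in steps:
        action()
        journal.append((package_name, step_name))


def power_cut():
    """Stand-in for the machine losing power partway through a script."""
    raise RuntimeError("power lost")


journal = []
steps = [
    ("pre-install script", lambda: None),
    ("unpack files", lambda: None),
    ("post-install script", power_cut),
]

try:
    install("example-daemon", steps, journal)
except RuntimeError:
    pass  # the install stopped partway through

completed = [step for _, step in journal]
print(completed)  # ['pre-install script', 'unpack files']
```

The journal can tell you that the post-installation script never finished. It can’t tell you how far the script got before it stopped, or what it changed on the system along the way.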
Have you ever had a file corrupt on your disk? Have you ever accidentally closed an application window while it was doing something important? Have you ever had a brief power outage during an update?
Wouldn’t it be nice to be very sure what state your computer was in?
Anyways, that’s not really the problem bootc was built to solve. That problem got solved other ways long before.
Next up, a brief development history of bootc by an outsider looking in. Then, after that, I guess I’ll get to some practical examples to get you started.
[1] Fun easter egg: hover over each letter in the word “problems” there…
[2] I’d just like to interject for a moment. What you’re referring to as Linux, is in fact, GNU/Linux, or as I’ve recently taken to calling it, GNU plus Linux. Linux is not an operating system unto itself, but rather another free component of a fully functioning GNU system made useful by the GNU corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX. Many computer users run a modified version of the GNU system every day, without realizing it. Through a peculiar turn of events, the version of GNU which is widely used today is often called “Linux”, and many of its users are not aware that it is basically the GNU system, developed by the GNU Project. There really is a Linux, and these people are using it, but it is just a part of the system they use. Linux is the kernel: the program in the system that allocates the machine’s resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Linux is normally used in combination with the GNU operating system: the whole system is basically GNU with Linux added, or GNU/Linux. All the so-called “Linux” distributions are really distributions of GNU/Linux.