I disagree that software used to be "serene and pleasant." I remember using MS-DOS for things like desktop publishing and video games, and a lot of it was fairly slow and unreliable. Like watch-the-screen-paint-the-UI slow. Crashes are probably a toss up. They've gotten less catastrophic over time, going from whole-computer crashes, to application crashes, to tab crashes, to cringe mascots. On the other hand, the world now has so much more software that I experience crashes a dozen or more times a day. We literally joke about "snow days" at work every time GitHub shits the bed; a near weekly occurrence these days.
But software used to crash enough that I developed a compulsive Ctrl-S habit from word processors and image editors that would crash and lose your work every few hours. I remember the advice to periodically close and restart Photoshop because of how buggy and unstable it was (is?). Overall, I think there's a lot of rose tinting in this assessment. You usually remember games from your childhood as these incredible experiences, but far too often, you go back and play them and many don't hold up to your now refined modern standards. More importantly, they've also all stabilized. Developers aren't updating them anymore, so they're either known to still work or not. Really popular old software may even have community patches you're expected to apply. But it's all been worked out. You know what to expect. It won't stop working tomorrow because someone pushed a broken change through CI.
I agree with the problem: there's too much code, which means too many bugs, too many security vulnerabilities, too much fiddling after launch. Monocultures are inherently a problem. Large codebases are inherently a problem. "Many eyes make bugs shallow" is a myth. A good theory, but it's not backed up empirically. Whose eyes and what they're looking for matters more. Even when the project is doing literally everything it can, the pressure to break in becomes enormous with enough users, and the larger the interface the more porous to attack it becomes. It's why I don't put OpenSSH directly on the internet.
Even still, I think his sales pitch is a bit weak. The reason operating systems consolidated (or we consolidated on operating systems, if you want) isn't hardware interoperability. It's that running on an operating system is more convenient than running on metal. Preemptive multitasking is just that good. It's one of those core computer technologies that made a computer what it's become, like networking, bitmap displays, floating point arithmetic, and most importantly, backward compatibility. Even if I have my computer set to boot to USB, having to shut down, dig through my stash of sticks, plug one in, and boot again is too much. At one point I dual booted Windows and Linux. I got rid of the Windows partition because juggling two whole systems on the same machine was annoying.
In this every-application-is-an-operating-system world, I still want to listen to music while I program and consult various sources of documentation. Then when I build and run that program, I ideally want changes I make in code to be reflected in that program as fast as possible so I can iterate quickly. I also want to be able to use one or more best in class debuggers and profilers to quickly figure out where and what the problems are. That leads the development environment back to being a platform, and you invoke jwz's law: "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can." Unix is just a developer environment that grew into an operating system. Browsers are just fancy document readers that grew into an operating system.
If you want the user experience angle, consider that I might want to check a wiki while I play some video games. I might need to consult my email while working on a spreadsheet. I want to balance my ledger against my bills and bank account. I could go on. Would any of these be possible if we relied on a single vendor to bundle all these features into their application?
Games are maybe the only completely isolated software experiences left, and even there many people want to stream podcasts, take screenshots, reference the community wiki, chat with friends, stream their session online, or install mods, all without relying on the developer's support for any of it. It's the classic TV-VCR goal dilution trap. Combo units were wildly unpopular no matter how good the TV or VCR half was, because you always knew getting each as a dedicated unit would be better. When we switched to DVD, anyone who did get the combo unit ended up having to buy a dedicated box anyway, or throw it all away and upgrade to a three-in-one. The GNSS (a.k.a. GPS) built into modern cars is universally awful because car manufacturers don't face the pressure to be as good as Google or Apple Maps. You already bought the car.
Do we expect people to buy half a dozen different computers to be able to do these at once? An MP3 player, a wiki reader, a handheld email client, a digital painting tablet? That's all called a phone and it's wildly successful because it's portable, integrated, and so simple you can park a toddler in front of it as a babysitter.
Convenience plays a massive role in purchasing decisions. This is why developers have also embraced this tower. To send you these text files, I don't have to write a font renderer. I don't have to write a video codec. I don't have to write a web server or file system or network driver or remote administration server or resource cache or smooth scrolling interface or vector graphics interpreter or hyperlink loader. Better still, you've already downloaded half that stuff in a browser and I can cobble the rest together with Neocities and PeerTube for free in an afternoon.
These towers of code are as big as they are for various human factors, just one being the allure of convenience. Others include the overhead and imprecision of communicating between people, especially over time. How easy something is to get started with and learn. The challenge of fighting entropy in a long lived software project. The temptation to add a layer of indirection or reuse something it was only kind of designed to handle originally. The fear of not making payroll and trying to find safety in doing whatever users say they want. Egos, politics, inertia, greed, ignorance, sloth; the list goes on.
A lot of factors led us here. SoCs don't seem like they're going to help much. The Raspberry Pi and the ESP32 are probably the most popular examples. You might argue it's the binary blob firmware that's holding them back. Maybe? The ESP32's Wi-Fi stack is being reverse engineered and open sourced, though. You could then review that code to create your own MAC driver for the chip. Maybe after that the dominant application delivery platform will begin to shift?
Obviously you can level the "it's not good enough" retort that's become popular these days. They're not as powerful as your average desktop computer. Sure, but if they were, you'd then want to run all your existing software on them. That would require Windows or maybe one of the big two mobile operating systems. If you're running one of those, why would you want to reboot into a game instead of launching it from within? Latency is a super niche reason. That's why Windows is still popular despite being widely loathed. It runs almost all the world's consumer software. Every game on Steam has to run on Windows according to the Steam publishing agreement. All the most popular Linux "only" software has been ported to Windows.
That's why Linux still only has single digit adoption. The console wars understood this. VHS and Betamax understood this. Content sells platforms. Nobody buys a platform for its own sake. They buy a platform to access content. Think of it as a little two by two matrix. On one side is their current platform. On the other, the new one. Between them you have forces pushing them off the old, forces pulling them toward the new, forces repelling them from the new, and forces holding them back on the old. To switch, the first two must overcome the second two. One insanely valuable application can drive the sale of a whole platform, but usually it's the sum of all the potential that drives the sale, because making something so valuable it's worth hundreds or thousands of dollars in switching costs is really hard.
With that in mind, the 3-6% or whatever Linux is currently at is almost exclusively driven by just how incompetent directors at Microsoft are in managing their cash cow, resulting in people who hate Windows enough to switch. There's very little pulling users, lots repelling them, and a mountain of things holding them back. But that mountain's been shrinking as they keep abusing their developers who've been moving to the web, the pushes have been getting stronger with ads and instability, and the repellents have been slowly easing as polished user friendly UIs and Windows compatibility layers have gotten pretty great. All that's missing is the killer app. Something you can only get on Linux and not Windows that's worth switching for.
That's also why people build on popular platforms and end up reinforcing their dominance. Gambling on being so good people would switch for you is risky. Why build on a different platform when even success isn't likely to pay off spectacularly well?
That all aside, we can see if this plays out, because who says the best market is wealthy industrialized nations? With the insanely low cost of solar that continues to fall, lots of developing nations are rapidly expanding their energy capacity. That opens up a wealth of opportunities for low cost computerization, like Raspberry Pis, even in remote areas. If there's ever been a time to see if SoC bare metal single purpose systems will take off, it started a couple years ago and will play itself out over the next decade and a half. And it kind of is playing out. Inverters and charge controllers are becoming a huge area of concern because they are network connected, embedded, single purpose, proprietary systems that also pose significant risk to nations as countries continue to expand their weaponization of supply chains.
I've considered going with a unikernel for my backend. A single binary operating system and application together. Boot the machine directly to the server shipped as a single disk image. But then I ask, why? I reboot this server a couple times a year. All that effort to save maybe 60 seconds a year waiting for the server to boot. This machine sees a normal CPU load of between 0.5% and 1.5%, mostly depending on how much connection spam I'm handling.
Sure, I could spend my time maintaining my own data storage (file system, database, and backups), network stack (TCP/IP, DNS, HTTPS, and firewall), thread scheduler, and remote administration/debugging interface. Or, I could spend my time writing these posts to collect my thoughts and spread my ideas. I'm not cursed to live forever. I can only do so much in the very limited time I have left to live. I'm not seeing an advantage to doing it myself instead of bringing together a pile of open source software. With those, I spend the equivalent of 1/5th of a full time dev (in exchange for not learning to play guitar or something) plus $0.03/hour ($20/month) building and maintaining this thing.
And that's for a target that almost amounts to an SoC. I run on a VPS, which conforms to the VirtIO machine spec running on an x86_64 Skylake chip. But why would I want to write and maintain a SCSI controller driver? The machine is already so huge for the workload, and the existing driver in Linux is good enough. Sure, there are efficiencies that could be gained by writing a controller that exposes a relational database interface directly instead of layering storage controller, file system, and database. It's also thousands of hours of work to build. It's hard to justify. This site is never going to get big. If it does, I can throw money at it and consider the tradeoffs involved in rewriting it to be more performant. Sure, writing it to handle planet scale from the start sounds great, maybe even sounds cost effective, but I'm very unlikely to correctly design such a system from first principles (scaling behaviour always surprises you), and I'd spend all my time designing a platform without content, guaranteeing I never need to scale.
Learned about her last night, or rather, her contribution to medical research for things like the polio vaccine (a disease a segment of the population seems adamant about bringing back from the edge of extinction). Sadly, not voluntarily. Go read her Wikipedia page. The least you can do is learn how much she's helped humanity, without her knowledge or consent, thanks to her unfortunate early death.
This talk is wonderful. You won't learn too much, but it's such a cool idea taken to great lengths. It's stuff like this that gets me excited about computers. I hope it brightens your week.
Quick refresher on queueing theory. Really take some time and do the math he's doing yourself, by hand. That's the big skill from this talk; if you just let him do it for you, you won't learn it. Doing it yourself builds the ability to use Little's Law in your own programming and design work. The simple summary: always limit request concurrency. Specifically, he shows how to do this by leveraging TCP's congestion control.
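For the arithmetic itself, here's a minimal sketch of Little's Law (L = λW); the service numbers are invented for illustration, not from the talk:

```python
# Little's Law: L = lambda * W
#   L    = average number of requests in the system
#   lam  = average arrival rate (requests per second)
#   W    = average time a request spends in the system (seconds)

def littles_law_concurrency(arrival_rate, avg_time_in_system):
    """Average number of in-flight requests implied by Little's Law."""
    return arrival_rate * avg_time_in_system

# A service seeing 50 req/s with a 200 ms average response time holds
# 10 requests in flight on average. A concurrency limit a little above
# that absorbs bursts; an unbounded one just lets W (latency) grow
# without bound while L balloons.
in_flight = littles_law_concurrency(50, 0.2)
print(in_flight)  # 10.0
```

The useful trick is running it backwards: given a concurrency cap and an observed arrival rate, the law tells you the latency your users will see.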
Can't agree more. The failure mode I see again and again in asynchronous services is not limiting queue sizes, including the request queue. You never want an infinite queue. Honestly, you usually don't want queues at all, you want stacks, because stacks prioritize liveness, not fairness. If someone shows up with a thousand things to do, a stack ensures everyone who shows up after them with the odd request still gets priority.
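A toy sketch of the difference (job names and counts are made up):

```python
from collections import deque

def serve(pending, n, lifo):
    """Serve n jobs from pending: pop the newest for a stack (LIFO),
    the oldest for a queue (FIFO)."""
    served = []
    for _ in range(min(n, len(pending))):
        served.append(pending.pop() if lifo else pending.popleft())
    return served

# A bulk client dumps 1000 jobs, then two small requests arrive.
work = deque([f"bulk-{i}" for i in range(1000)] + ["alice", "bob"])

# FIFO: the next slots go to the bulk client; alice and bob wait
# behind all 1000 jobs.
print(serve(work.copy(), 2, lifo=False))  # ['bulk-0', 'bulk-1']

# LIFO: the latecomers are served immediately; the bulk backlog
# degrades instead of starving everyone else (liveness over fairness).
print(serve(work.copy(), 2, lifo=True))   # ['bob', 'alice']
```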
The saga continues. I wasn't aware of RRLP and LPP. Why would you add these to the spec? At least Apple phones, by nature of Apple designing all the silicon themselves these days, are moving in a direction to curb this. It's gotten to the point that I really can't suggest leaving connections to the outside world running when you're not using them. This constant abuse and dehumanization at the hands of software has really led us to a dark place as a species. You can't readily see it, so why don't I stick a knife in your back for fun and profit?
Pretty frustrating that I have to stay connected when I'm on-call for work, though I have been looking at what sort of a paging system I can set up for myself when out and about using either APRS or Meshtastic. Frustrating that all our infrastructure is being co-opted like this to build a combination surveillance state and pig butchering farm.
It's a sales pitch, but it's also a pretty great talk about some of the design philosophy that's been going on inside TigerBeetle DB and the results they've found with designing explicitly around simulator based testing and eschewing dynamic heap allocations. Almost too much wisdom here to summarize. A lot of this has been said elsewhere, like pointing out that the maintenance costs are significantly more than building software. "If it compiles, you've just begun." One interesting thing was thinking of developers as being either like painters or sculptors, additive or subtractive. Very cool idea I'd never thought of before.
One of the great things he talks about is using assertions as they do. I've felt for a long time that this is really the direction better type checking would take. Not for more type theory, but instead having better first class support for input validation and output assertions. Most of the reason I use types is to enforce what my interface accepts, but it's so limited. I accept int, int, string. That could be so many things. Type aliasing doesn't really help. Ideally what I'd like is to be able to make more compile time assertions of what my function will and won't allow. Lock down that string for example to ensure it must be valid Base64 without having to copy all the data to a Base64 object or something. I also don't want to have to pointlessly typecast things that are already valid. Assertions used like this might just be the right solution with great locality. To be explicit at the top and bottom of your function about what you expect of the caller and the data you think you're about to return. It's more verbose, but you can compile with assertions turned off if you really care about performance more than correctness (i.e. overwriting all my data with garbage is better than being slow).
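To make that concrete, here's a rough sketch of the style in Python — not anything from the talk, just plain asserts standing in for the richer compile-time checks I'd want. Running with `python -O` strips them, the same way compiling with `NDEBUG` does:

```python
import base64
import string

# The legal Base64 alphabet (with '=' padding); misplaced padding is
# still caught below by b64decode's validate flag.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")

def decode_payload(payload: str) -> bytes:
    # Preconditions: be explicit about what the caller must hand us.
    assert isinstance(payload, str), "payload must be a str"
    assert len(payload) % 4 == 0, "valid Base64 is padded to a multiple of 4"
    assert set(payload) <= B64_ALPHABET, "payload has non-Base64 characters"

    data = base64.b64decode(payload, validate=True)

    # Postcondition: state what we believe we're about to return.
    assert len(data) <= len(payload), "decoded data never exceeds its encoding"
    return data

print(decode_payload("aGVsbG8="))  # b'hello'
```

The asserts sit right at the top and bottom of the function, so the contract has great locality: no separate Base64 wrapper type, no copying the data into a validating object just to prove it's clean.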
Tangentially, he's almost talking about Joe Armstrong's motto, "Let it crash." Both he and the Go devs are right that a reliable system doesn't crash. That's true, but the key they're both pointing at is that you need to properly handle errors to get there. Developers routinely don't properly handle errors. We have a lot of data on this. The biggest source of faults in Java programs, for example, is the exception handlers. Why? Because we don't test our error handling code. Why? Because we usually only test happy paths. Why? Because we don't usually write robust tests of all the failure cases. This is why fuzz testing is so effective. Why? Because building and breaking are different skill sets.
That's what he's getting at when he starts talking about hackers. Fuzz tests (or their simulator in TigerBeetle's case) are inherently adversarial. That's how you test something. You don't test compliance. You probe for deviance. But doing that requires you to be systematic. Not just add a unit test when you add a function. The key Joe Armstrong found was you need supervisory processes. A process given an error should just crash. Any error that can be handled by the calling code is really just control flow pretending to be an error.
No, when you run out of memory or the host is offline or the disk is full or whatever, in those cases, the code doing the thing is almost always not the code that should recover. Instead, a supervisory tree, one or more programs that are responsible for that program, should be figuring out what to do when their child process crashes. How to properly recover, like blocking new connections, or retrying the process, or halting further action and alerting a human. You can actually see us arriving at this haphazardly in container fleets that crash the instance and launch a new one during a fault. That's probably the most naive solution, but the concept is right: an explicit supervisor (the container orchestrator) handles the failure, not some half-baked try-catch block, because we know devs can't routinely get that right (or processes would never crash). That's one of Joe's greatest contributions from his thesis and language, Erlang.
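A minimal sketch of the shape of the idea — not Erlang's actual supervisor behaviour, just a one-for-one restart policy with in-process callables standing in for child processes:

```python
def supervise(child, max_restarts=3, alert=print):
    """Minimal one-for-one supervisor. The child just crashes on
    failure; the recovery policy lives here, in the supervisor."""
    restarts = 0
    while True:
        try:
            return child()            # child ran to completion
        except Exception as err:      # child "crashed"
            restarts += 1
            if restarts > max_restarts:
                alert(f"giving up after {max_restarts} restarts: {err}")
                raise                 # escalate to *our* supervisor
            # A real supervisor would back off, shed load, or page
            # a human here; restarting is just the simplest policy.

# A flaky task that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("host offline")
    return "done"

print(supervise(flaky))  # done
```

Note the child contains no try-catch at all: any error it can't turn into ordinary control flow is simply allowed to crash it, and the layer above decides what recovery means.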
There are so many lessons you learn from making software from scratch. One is that the biggest reason things are so slow is that there are too many layers of indirection. Libraries and frameworks are all so generic, trying to remain flexible, that they end up doing a lot of extra unnecessary work. It gets worse when developers then make generic things on top of these generic tools to try and leave the real problem solving to someone else or worse, to users. Lots of code spent doing work the problem doesn't require. That, or code endlessly copying memory back and forth for no reason. Deserialize to an object, copy all the data to a different object, pass that object to a function that allocates a different object which copies some or all of the data over, take that object and serialize it to pass it to a different system that deserializes it, and so on.
It's reasonable to assume that the amount of work a computer does is equal to the number of lines of code you write, but that's wildly inaccurate. Writing your own code from scratch gives you a better feel for how much work a single library or language invocation is doing for you. Things like rounded corners, drop shadows, async/anonymous functions, format parsing like JSON or SVG, or using HTTPS. So much overhead. How much of it is really useful to the actual problem you're solving? Is that utility worth the cost to performance?
He's also right about the gulf between web and desktop documentation for novices. Doubly so if you try and program for Linux. Same with tooling. Lots of great developer tools and documentation have been created for web development, but desktop development really hasn't improved that much over the last thirty years. If anything, the rent extraction being done by code signing has made it significantly worse. Doing any of the common handmade projects like making an image viewer or text editor will make it painfully obvious how much room there is here for better solutions. "It was hard for them so it should be hard for you," or whatever.
The reality is more banal. Great documentation and tooling, like any great product, requires a lot of skill and time to produce. Who's paying for it? I maintain this site for free because I'm doing it for me and I'm a bit of a freak who spent six hours after work writing and rewriting a couple paragraphs about a conference talk I watched to help collect and organize my thoughts. Most people expect to get paid for their hard work, especially when they're good at what they do. A lot of open source is driven by new developers looking to get a job in the industry. That's why every few weeks a new text to speech system comes out and a year or two later it's unsupported and dead. Every few weeks a new JavaScript framework comes out and a year or two later it's unsupported and dead. Nobody's writing developer books because developers aren't paying for books. Blogs can document whatever fits in a 500-2000 word essay by a hobbyist/freelancer because web advertising is awful for everyone involved and LLMs are killing what's left. Nobody's figured out how to make programming tutorials work on YouTube so that path to paying your bills documenting things doesn't work. The same reality plays out for tools. What's the last software development tool you bought?
Take all that together and that's why docs tend to consist of what limited documentation gets added when a new API is created (down to just some banter on a mailing list) and tools are more a byproduct than product.
Even Microsoft now treats their desktop operating system as nothing but a cash cow. That's why Valve was able to steal desktop gaming from a company with enough resources to succeed in the console wars. They have fundamentally given up on their desktop operating system along with the developers who build for it. People keep saying Linux. What about it? Have you ever created a normal desktop application for Linux? Not a command line utility or backend server. A proper application you could sell to end users. Something with graphics and audio like a video editor or game. There's a reason Valve is investing in Proton to port the entire Win32 API to Linux. The Linux kernel understands API compatibility but the rest of the ecosystem treats constant breaking changes as a feature, not a bug. "Everything should be open source," they say. Get a job doing something else all day to pay for rent and food, then volunteer to create me world class software for free. Oh, and then every few years they should be forced to completely redesign that software because the ecosystem decided to rewrite how IPC works or whatever. Or be like a handful of open source celebrities working bohemian on nothing but a thousand or so a month in donations if they're lucky.
To understand your outcomes, look at your incentive structure.
This is more of what I was talking about when discussing If You're Going to Vibe Code, Why Not Do It in C?. Though this piece has more data points and doesn't waffle on about anecdotes and open questions. Good piece to continue the thought.
Great rundown on a lot of the things going on in hardware fabs and how they could be leveraged for a supply chain attack. Should you be worried? Sure, but I'd be more worried about the rampant fraud in supply chain management over targeted implants at this point in most contexts.
From CVE-2023-38606 it became pretty clear that modern high-end chips are huge amalgams of in-house, open, and closed design components that no one person really understands. Essentially hard software. There's a good chance that chip designs are becoming subject to all the same attacks complex software has but without the ability to inspect the final artifact.
In terms of exploits, component swaps don't even have to be subtle on most devices. While data centers seem like promising targets, they also have higher security on components compared to the damage you could do in automotive, energy, manufacturing, healthcare, and consumer devices with much less scrutiny. Besides, data centers are highly concentrated and highly reliant on infrastructure to and from the outside like fiber lines, electricity, and water. There are many simpler ways to siege a fortress than subterfuge.
It's a kind of macabre comedy watching the concepts of zero trust from technology go mainstream into society. The real world has been built on trust for thousands of years, and people deliberately undermining that are fostering a fairly unhealthy ecosystem they then get to live in. Grifting's path from niche hack, to corruption, to metagame, to the core of society has been darkly interesting to watch.
Not a fan of using "stupid" to describe any of this. No point insulting anyone since even he admits (and I'd echo) we all have to go through thinking about software architectures like this. As you get better and better at software development, you can conceptualize more and more of a system in one go. At one point, breaking a string on commas was a hard problem to me. Now I can conceptualize complex stateful distributed systems.
The problem is that breaking them down by thinking in terms of serial actors is becoming an ever more inefficient way of having the hardware do the work these days. Computers are basically only getting wider and wider, not much faster, and they've been on that trajectory since around 2005.
By wider, I mean in parallel. Doing more of the same thing all in one go. Performing dozens or even hundreds of the same operation in just one execution. In this case, he's talking about memory but it applies to all shared resource contention including compute, memory, disk, and network. Doing one thing at a time just has an incredibly high overhead when you see it play out in real contexts where the code is usually tasked to operate on hundreds or thousands of that thing.
Think of it like this. You're tasked with taking groceries from your vehicle to the house. Are you going to do it one condiment at a time? Well, we often program like this to make the tasks easier to conceptualize. We design and build a system to pick out a single remaining item and carry it to the house. Then we make a system that knows where everything goes and delegates items as they arrive to systems that each know how to put an item on a specific shelf. We then connect all those systems to ship. This would be for loops to the top, conditionals to the bottom.
If you instead loaded bags as you shopped by destination shelf, you could carry many bags in together. Your dispatcher can hand a bag to each stocker as you get the next load, and each stocker can basically dump the contents on the shelf and neaten up. If your bandwidth gets wider from going to the gym, your program gets even faster until it's a single fetch and load. This would be conditionals to the top, for loops to the bottom.
You may have never thought of sorting at the store and instead sort on a counter at home. This is because you're thinking discretely about the store system and the home system. If you think about them together, a more optimal solution is possible. Instead of randomly loading your basket, you can load it by destination making it much faster later.
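Roughly, in code (a contrived sketch, with "trips" standing in for whatever the real per-item overhead is):

```python
from collections import defaultdict

items = [("fridge", "milk"), ("pantry", "rice"), ("fridge", "eggs"),
         ("pantry", "flour"), ("fridge", "butter")]

# Item-at-a-time: the dispatch decision happens inside the loop, once
# per item. For loops to the top, conditionals to the bottom.
def stock_one_at_a_time(items):
    trips = 0
    for shelf, item in items:
        trips += 1          # one trip into the house per item
    return trips

# Batched: sort into bags by destination first, then each stocker dumps
# a whole bag. Conditionals to the top, for loops to the bottom.
def stock_batched(items):
    bags = defaultdict(list)
    for shelf, item in items:        # "load by destination shelf"
        bags[shelf].append(item)
    trips = 0
    for shelf, bag in bags.items():  # one trip per shelf, not per item
        trips += 1
    return trips

print(stock_one_at_a_time(items))  # 5
print(stock_batched(items))        # 2
```

Five items across two shelves means the per-item version pays the fixed cost five times and the batched version pays it twice; with hundreds or thousands of items the gap is where most of the speed lives.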
When he's talking about arena allocators, he's talking about the overhead of deallocating hundreds of objects one at a time versus setting next_alloc_index = 0. It's significantly faster if you have a data boundary like a rendered video frame or a backend API endpoint handler. Any time you've got a distinct work boundary you really shouldn't be cleaning as you go but instead do all the setup, all the work, and then all the teardown.
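A minimal bump/arena allocator sketch (in Python for brevity, where a real one would be C or Zig; the names are mine, not TigerBeetle's):

```python
class Arena:
    """Bump allocator: allocating is a pointer bump, and freeing
    everything is a single index reset."""
    def __init__(self, size):
        self.buf = bytearray(size)
        self.next_alloc_index = 0

    def alloc(self, n):
        assert self.next_alloc_index + n <= len(self.buf), "arena exhausted"
        start = self.next_alloc_index
        self.next_alloc_index += n       # the entire allocation cost
        return memoryview(self.buf)[start:start + n]

    def reset(self):
        self.next_alloc_index = 0        # "free" every allocation at once

arena = Arena(4096)
# Per-frame / per-request work: allocate freely inside the boundary...
for _ in range(100):
    scratch = arena.alloc(16)
print(arena.next_alloc_index)  # 1600

# ...then tear the whole frame down in O(1), no per-object frees.
arena.reset()
print(arena.next_alloc_index)  # 0
```

One reset replaces a hundred individual deallocations, which is the whole pitch: when the work has a clean boundary, teardown collapses to a single assignment.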
This is why it makes sense not to explicitly close files or sockets, or even deallocate memory, if your program is a short lived application like a command line utility. The operating system will clean up everything for you all at once when you exit, much faster than you can request each be cleaned up one at a time. In effect, your program's memory is running in an arena allocator. Your file descriptors are being managed in a pool. (That's why you usually get the same ID back if you close one and open a new one.)
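You can watch the descriptor pool at work. POSIX guarantees `open()` returns the lowest-numbered free descriptor, so in a single-threaded program, close-then-open hands the same ID back:

```python
import os
import tempfile

# A throwaway file to open twice.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()

# The kernel manages descriptors as a pool and reuses freed slots:
# the second open() gets the lowest free number, which is the one we
# just released.
fd1 = os.open(tmp.name, os.O_RDONLY)
os.close(fd1)
fd2 = os.open(tmp.name, os.O_RDONLY)
print(fd1 == fd2)  # True

os.close(fd2)
os.remove(tmp.name)
```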
A great successor talk roughly 15 years in the making. I've linked before to his talk, The Coming War On General Computation. You don't need to watch that talk before this one. It's just there in case you want to go back and understand the history.
What I think has really helped the concept land in all those years is being able to list decades of concrete examples, as he does. Everyone complains about having too many texting apps. When you explain that there's no good technical reason the Discord app can't send messages to WhatsApp or WeChat users (we used to text SMS back and forth between dozens of phone manufacturers just fine), I've found some people start to get it. People live and die by stories. Explaining concepts leaves too much to the imagination. If you explained any of these now-real things to people as hypotheticals, you'd be labeled an alarmist since it doesn't align with their existing worldview. Surely such a thing could never happen because…
Well it did. And it continues to. Worse, it's accelerating.