Why Talos over OpenShift, k3s, and the rest

Every couple of years I do the same exercise: pick a Kubernetes distribution, write down what I want from a node, and see which option actually delivers it. The list is short and stubborn — immutable host, declarative config, no SSH, small attack surface, atomic upgrades, predictable behavior under power loss. The first three knock most distributions out before the second coffee.

This is the long-form version of why I landed on Talos instead of OpenShift, k3s, RKE2, or kubeadm-on-Ubuntu, with the caveats I think you should hear if you’re considering the same move.

The shortlist, briefly

OpenShift is a great product wrapped around opinions I don’t share. The platform is huge — operators on operators on operators — and a lot of the value lives in the OpenShift-flavored extensions: routes, builds, image streams, the web console. If you want those, you should probably pay for them. If you just want vanilla Kubernetes with sane defaults, you’re paying for a lot of platform you’ll fight.

k3s is delightful for edge and homelab. It’s a single binary, the install fits on a sticky note, and SQLite-as-datastore is a perfectly reasonable choice for small clusters. But the underlying host is whatever Linux you put it on, which means you still own kernel patches, package drift, SSH access, and a handful of services that have nothing to do with running containers. The cluster is tidy; the substrate isn’t.

RKE2 sits in the middle — production-shaped, FIPS-friendly, hardened defaults — but again, the host is yours.

kubeadm + Ansible is honest about the deal: you’re going to maintain a Linux distro and a Kubernetes control plane separately, and the seam between them is your problem. It works. I’ve done it. I’ve also stopped wanting to do it.

What Talos actually changes

Talos isn’t a Kubernetes distribution that runs on Linux. It’s a Linux distribution that exists to run Kubernetes, and almost nothing else. There’s no shell. There’s no package manager. There is no SSH. The OS is exposed entirely through a gRPC API.
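To make “exposed entirely through a gRPC API” concrete, here’s a sketch of what replaces the usual log-in-and-look-around session. These are real talosctl subcommands, but the node IP is a placeholder and the exact output depends on your cluster:

```bash
# Rough talosctl equivalents of SSHing in to poke around.
# 10.0.0.20 is a placeholder node IP; endpoints and credentials
# come from your talosconfig context.
talosctl -n 10.0.0.20 dmesg           # kernel ring buffer
talosctl -n 10.0.0.20 services        # service status, in place of systemctl
talosctl -n 10.0.0.20 logs kubelet    # per-service logs
talosctl -n 10.0.0.20 dashboard       # live node summary (TUI)
```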

That sounds extreme until you live with it for a month and realize how much of your operational toolkit is “log in to the box and look around.” On Talos that toolkit doesn’t exist, so you have to write it down: every config knob lives in a YAML machine config, every diagnostic comes through talosctl, every upgrade is the same atomic image swap. The blast radius of a config mistake shrinks because there’s nowhere to drift.
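For a taste of what “every config knob lives in a YAML machine config” looks like, here’s a minimal sketch of a machine-config patch. The field names follow the Talos v1alpha1 MachineConfig schema, but the values are purely illustrative:

```yaml
# Illustrative machine config patch -- values are examples, not recommendations.
machine:
  sysctls:
    net.core.somaxconn: "65535"   # declared here, not in /etc/sysctl.d
  kubelet:
    extraArgs:
      max-pods: "200"             # kubelet flags live in the same document
```

A patch like this is applied through the API (for example with talosctl apply-config or talosctl patch), and the node converges on it; there is no second place for the setting to drift to.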

The two consequences I didn’t fully appreciate before:

  1. Onboarding is a config file, not a runbook. Adding a node is talosctl apply-config against a freshly PXE-booted machine. There’s no “install these packages, edit this sysctl, disable this service, set up the kubelet” — that’s all the same MachineConfig.
  2. Upgrades are boring. talosctl upgrade --image … swaps the image, reboots, rejoins the cluster. A failed upgrade rolls back. I’ve done a dozen now and the most exciting one was when I forgot to drain the node first.
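Both workflows fit in a handful of commands. A sketch, with placeholder node IPs, node names, and image tags — substitute your own:

```bash
# 1. Onboarding: push the machine config at a freshly PXE-booted node.
#    --insecure is used only for the first contact, before the node has certs.
talosctl apply-config --insecure --nodes 10.0.0.20 --file worker.yaml

# 2. Upgrading: drain, swap the image, let the node reboot and rejoin.
kubectl drain node-20 --ignore-daemonsets --delete-emptydir-data
talosctl upgrade --nodes 10.0.0.20 \
  --image ghcr.io/siderolabs/installer:v1.7.0   # pin the version you mean
kubectl uncordon node-20
```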

The honest tradeoffs

I’m not going to pretend it’s free.

  • The mental model is unfamiliar. If you’ve spent a decade on systemctl status, the first week feels like operating with one hand. Lean into the API. The reflex to SSH in and poke at things has to die.
  • Not every workload likes immutable hosts. Anything that wants to install kernel modules at runtime, or write to /etc, or assume bash exists in a privileged container — those need rethinking. Most of them should be rethought anyway.
  • The ecosystem is smaller. Fewer Stack Overflow answers, fewer pre-baked Ansible roles, fewer “just install this RPM” workarounds. The flip side is that the people on the Talos Slack are unusually helpful.
  • You give up some flexibility for safety. That’s the whole pitch. If you need that flexibility, this isn’t the distro for you.

Where it makes sense

  • Self-managed clusters where you don’t have a platform team to maintain a fleet of base images.
  • Homelab and edge deployments where every node is a liability you don’t want to babysit.
  • Multi-region setups where you want every node to be byte-identical to every other node of its role.

Where it doesn’t

  • Anywhere your team’s muscle memory is your most expensive resource and retraining isn’t on the table.
  • Workloads that genuinely require host-level customization that doesn’t fit Talos extensions.
  • If you want a turnkey platform with a UI and a vendor on speed-dial — pay for OpenShift or a managed offering. The labor savings are real.

What I’d do differently

I’d start with Talos sooner and stop trying to make Ubuntu + kubeadm look like an immutable platform with enough Ansible. The seams between “the OS” and “Kubernetes” are where most of my historical operational pain has lived. Removing the seam, even at the cost of giving up familiar tools, has been a net win — for uptime, for upgrade velocity, and for the small number of hours per week I’d rather not spend reading systemd journals.

Your mileage will vary. But if your shortlist looks like mine, give it a weekend.
