Networking problems after installing Kubernetes or upgrading your host OS? It might be your iptables version
Have you run into a problem where installing a new version of Kubernetes breaks your worker nodes, and suddenly you can't ssh into them, or even ping them? You've probably got a conflict between the version of iptables that ships with kube-router in Kubernetes 1.25 and the one you've got installed on your host. Although this issue manifests itself with certain versions of networking components, it's actually a much more generic issue.
Let me explain what the problem is, and how you can either fix it or work around it for now.
(Before we start: if you have access to the physical host, you can run iptables --flush to undo the damage, but that's not a permanent solution. Done? OK, let's move on.)
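If --flush alone doesn't restore connectivity, here's a slightly fuller first-aid sketch. Note that --flush only clears the table it's pointed at (the filter table by default) and doesn't reset chain policies, so a couple of extra commands may be needed:

# Flush the filter table (the default), where the problematic DROP rule lives:
iptables --flush
# Flush the other common tables as well, if needed:
iptables -t nat --flush
iptables -t mangle --flush
# If a chain's default policy is DROP, reset it to ACCEPT too:
iptables -P INPUT ACCEPT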
Essentially, the problem boils down to an incompatibility between iptables 1.8.8 and older versions in terms of how rules are formatted. As a result, if the host is using iptables 1.8.8, when components like the kube-router CNI start to write their own network policies and rules with an older version of iptables (kube-router ships with iptables 1.8.7), things go wrong.
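A quick way to see what you're dealing with is to ask the iptables binary for its version; the suffix in parentheses also tells you which backend it was built for (exact output may vary slightly between builds):

# On the host:
iptables --version
# Typical output: iptables v1.8.8 (nf_tables), or (legacy) for the legacy backend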
This is because kubelet, using the iptables 1.8.8 supplied by the host, writes:
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
Then kube-router, using the older iptables 1.8.7 it ships with, does its normal iptables-save; modify/add rules; iptables-restore cycle, but doing so with the older version results in reading and re-inserting the rule as:
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -j DROP
As you can see, the rule gets “corrupted”, blocking ALL network traffic on the host. You know you're in trouble when you can’t even ping localhost anymore.
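If you suspect you've hit this, you can check the affected chain directly on the host (KUBE-FIREWALL is the chain kubelet writes this rule into):

# Print the KUBE-FIREWALL rules in iptables-save format:
iptables -S KUBE-FIREWALL
# A rule ending in "-j DROP" without the "-m mark --mark 0x8000/0x8000" match
# is the corrupted variant shown above.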
Although in this specific case we're using kube-router, all of this can actually happen with any other networking component that uses iptables in pods/containers.
In some ways, this brings up the fragility of the networking architecture in k8s (or in general, really). It's critical to ensure that every single networking component is using the same version of iptables as the host, and that all of them are using the same backend (legacy vs. nftables).
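One way to audit that is to compare what the host and your CNI pods report. The namespace and daemonset name below are illustrative, so adjust them for your cluster:

# Host side:
iptables --version
# Inside the CNI pod (kube-router shown as a hypothetical example):
kubectl -n kube-system exec ds/kube-router -- iptables --version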
So why do we need the same version? Can't Kubernetes just compensate? Well, it turns out that the netfilter team doesn't make any guarantees about version compatibility. The iptables team, in particular, specifically can't guarantee backward compatibility, given that iptables is a pre-containers-era tool, built on the assumption that there would never be more than one version of iptables on a host.
So what can you do?
In k0s, we're mitigating this issue by:
Detecting the iptables mode using the iptables-wrappers script: This gives us the best chance of getting everything working in the same mode (see the sketch after this list)
Shipping the iptables binary with k0s: This way, operating system upgrades can't break things because k0s never relies on the version provided by the OS
Shipping iptables 1.8.7 with k0s: This way we stay in sync with other components, and we actually test the combinations
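To give a flavor of what that detection looks like, here's a simplified, hypothetical sketch of the heuristic behind the iptables-wrappers script (the real script is more careful than this): the backend that already holds rules is the one the rest of the system is using.

#!/bin/sh
# Count the rules registered with each backend; both binaries are part of
# any iptables 1.8.x installation:
num_legacy=$(iptables-legacy-save 2>/dev/null | grep -c '^-')
num_nft=$(iptables-nft-save 2>/dev/null | grep -c '^-')
# Pick whichever backend already has rules in it:
if [ "${num_legacy:-0}" -gt "${num_nft:-0}" ]; then
  mode=legacy
else
  mode=nft
fi
echo "Detected iptables mode: $mode"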
If you're not using k0s, you do have a couple of other options (both sketched after this list):
Downgrade iptables on your host to 1.8.7 to eliminate the incompatibility
Run kubelet with --feature-gates=IPTablesOwnershipCleanup=true, which will cause it to not create the problematic "-j DROP" rule. As @danwinship points out on GitHub, "Of course, this is an alpha feature and you may run into problems with components that assume kubelet still creates those rules, but if you do then you can report them and help that KEP move forward 🙂."
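For concreteness, here's roughly what those two options can look like. Treat both as sketches: the package version string is hypothetical and distro-dependent, and the kubelet config path varies by installer.

# Option 1: downgrade and pin iptables on the host (Debian-style; check which
# 1.8.7 package your distro actually provides):
apt-get install --allow-downgrades iptables=1.8.7-1

# Option 2: enable the alpha feature gate via the kubelet config file
# (kubeadm-style path shown; if featureGates already exists in the file,
# merge the key instead of appending):
cat <<'EOF' >> /var/lib/kubelet/config.yaml
featureGates:
  IPTablesOwnershipCleanup: true
EOF
systemctl restart kubelet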
If you're interested in the sausage-making view, you can also check out the related upstream discussions on GitHub.
Hopefully this solves your problem. Please let us know what you find!