Mirantis patches containerd to address race condition
Mirantis recently released Mirantis Container Runtime (MCR) 23.0.10, which included the new upstream containerd v1.6.30. Shortly after our MCR patch release, as our testing and internal usage continued, we discovered that Mirantis customers on Linux risked being affected by an upstream issue in this new version of containerd (Windows users are unaffected). To remedy the situation, Mirantis has produced and made available version 1.6.30~rc.2 of containerd. This new build closely resembles the upstream version, but removes the offending change; we selected this approach to provide Mirantis customers with maximum stability and features while concurrently removing risk.
Because this issue affects only containerd, you do not need to deploy a new version of MCR to benefit from the fix. New installations of, or upgrades to, MCR 23.0.10 (or any other version) that pull in the fixed containerd package are unaffected and require no corrective action.
Symptoms of the issue
When using the upstream version of containerd 1.6.30, there is a race condition which can make arbitrary docker exec commands become unresponsive. The probability of the race condition manifesting increases when there are more concurrent execs into a single container, which can result from docker exec commands or container health checks. Larger clusters performing a greater number of operations are especially at risk.
While a hanging docker exec could manifest in a variety of ways, a simple way to determine if there are affected processes active on a given node is to use the ps command:
ubuntu@host:~ $ ps aux | grep "docker exec"
ubuntu 2815926 0.0 0.1 1623668 25080 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
ubuntu 2815961 0.0 0.1 1697592 24836 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
ubuntu 2816106 0.0 0.1 1623860 25012 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
ubuntu 2816255 0.0 0.1 1623604 24760 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
ubuntu 2816363 0.0 0.1 1697656 24932 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
ubuntu 2816912 0.0 0.1 1697400 25000 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
ubuntu 2817906 0.0 0.1 1697336 24676 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
ubuntu 2817908 0.0 0.1 1623860 24348 pts/0 Sl 05:02 0:00 docker exec nginx-1 true
You are likely experiencing this issue if the output displays either of the following conditions:
Any number of unexpected docker exec commands in sleep (Sl) state that do not change over time
A set of docker exec commands with the same or older start time
As previously mentioned, depending on the use case, the symptoms may appear in a variety of ways. Generally speaking, if operations suddenly and unexpectedly begin to report timeouts after changing your version of containerd, then this issue may be the root cause.
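To make the "do not change over time" condition easier to check, the sketch below filters a process listing by elapsed run time. It is an illustration rather than a Mirantis-provided tool: the function name stale_docker_execs and the 300-second default threshold are assumptions, and it relies on the etimes (elapsed seconds) output field available in the procps-ng version of ps.

```shell
# Hypothetical helper (not part of MCR): list "docker exec" processes that
# have been running for at least a given number of seconds.
#
# Input (stdin): one process per line as "PID ELAPSED_SECONDS COMMAND...",
#                for example from: ps -eo pid=,etimes=,args=
# Argument 1:    minimum age in seconds (default 300, an arbitrary assumption)
# Output:        tab-separated PID, elapsed seconds, and command line
stale_docker_execs() {
  awk -v min_age="${1:-300}" '
    # $3 is the executable, $4 its first argument. Matching on the executable
    # name skips unrelated commands such as: grep "docker exec"
    $3 ~ /(^|\/)docker$/ && $4 == "exec" && ($2 + 0) >= (min_age + 0) {
      cmd = $3
      for (i = 4; i <= NF; i++) cmd = cmd " " $i
      printf "%s\t%s\t%s\n", $1, $2, cmd
    }
  '
}
```

On a node, the function could be fed with ps -eo pid=,etimes=,args= | stale_docker_execs 600. Interactive sessions that were started on purpose and left open will also be listed, so treat the output as candidates to review rather than proof of the race condition.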
Determination of susceptibility
If you are using Mirantis Launchpad and/or the MCR install.sh script and have NOT updated to MCR 23.0.10 (or have done so on or after March 28, 2024), then the probability that you are impacted is low. However, rather than risk experiencing the symptoms of this issue, proactively verifying that you are not running an affected version of containerd is a straightforward task.
The version of containerd that shipped with MCR 23.0.10 contained the affected code, and an installation of MCR 23.0.10 performed prior to March 28, 2024 is likely to have installed this version of containerd. If you have used a customized installation method to install MCR, it is also possible for a previous version of MCR to have used the unpatched containerd.
To verify whether your environment is affected, check each node for the version of containerd in use:
ubuntu@host:~$ docker version
Client: Mirantis Container Runtime
 Version: 23.0.10-rc1
 API version: 1.42
 Go version: go1.21.8m1 X:boringcrypto
 Git commit: 8d04317
 Built: Wed Mar 13 21:51:54 2024
 OS/Arch: linux/amd64
 Context: default
Server: Mirantis Container Runtime (Unlicensed - not for production workloads)
 Engine:
  Version: 23.0.10-rc1
  API version: 1.42 (minimum version 1.12)
  Go version: go1.21.8m1 X:boringcrypto
  Git commit: 2eb2075
  Built: Wed Mar 13 21:51:54 2024
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.30-rc.1
  GitCommit: 934d1942d1fe36f99a4f7e65bf80db09754f0c76
 runc:
  Version: 1.1.12-m1
  GitCommit: 8ac7905
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
If the reported version of containerd is 1.6.30-rc.1, then the node could be impacted by this race condition in containerd, depending on operating circumstances.
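When many nodes need checking, reading the full docker version output by eye becomes tedious. The following sketch pulls out only the containerd version and maps it to a verdict. The function names containerd_version and classify_containerd are hypothetical, and the mapping encodes only what this advisory states: 1.6.30-rc.1 and upstream 1.6.30 carry the race condition, and 1.6.30-rc.2 is the fixed build. Any other version is reported as not covered rather than guessed at.

```shell
# Hypothetical helpers (not part of MCR).

# Print the containerd version found in plain `docker version` output on stdin.
# It looks for the "containerd:" section and takes the first "Version:" line
# after it, so the Engine and runc versions are not picked up by mistake.
containerd_version() {
  awk '
    /^[[:space:]]*containerd:/ { in_section = 1; next }
    in_section && !found && /^[[:space:]]*Version:/ { print $2; found = 1 }
  '
}

# Map a containerd version string to a verdict based on this advisory only.
classify_containerd() {
  case "$1" in
    1.6.30-rc.1|1.6.30) echo "affected" ;;   # builds that carry the race condition
    1.6.30-rc.2)        echo "patched" ;;    # Mirantis build with the change removed
    *)                  echo "not covered by this advisory" ;;
  esac
}
```

A possible invocation on a node is classify_containerd "$(docker version | containerd_version)". The same check can be repeated after remediation to confirm that the node now reports the patched build.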
Remediation of the issue
If you determine that your environment is running the affected version of containerd, then the solution is simply to upgrade your containerd package and restart the process. A restart of containerd and the docker service is required regardless of the mechanism used to apply the fix.
Using Launchpad
Note: If you are using install.sh for airgapped installations or otherwise caching the script, be sure to use the latest version before updating or installing new instances of MCR to ensure success.
launchpad apply --force-upgrade
With this command, the --force-upgrade flag is required to ensure that MCR 23.0.10 is reapplied with the new containerd package, despite this MCR version already being installed on the target system.
Using Red Hat Package Manager (RHEL, Oracle Linux, Rocky Linux)
sudo yum install -y containerd.io
Using Debian Package Manager (Ubuntu)
sudo apt-get update
sudo apt-get install -y containerd.io=1.6.30~rc.2-1
Using SUSE Package Manager
sudo zypper refresh
sudo zypper install -y containerd.io-1.6.30-2.2.rc.2.1
Restart components (containerd & engine)
sudo systemctl restart docker containerd
Upon successful update, docker version will report containerd version 1.6.30-rc.2:
$ docker version
Client: Mirantis Container Runtime
 Version: 23.0.10
 API version: 1.42
 Go version: go1.21.8m1 X:boringcrypto
 Git commit: 8d04317
 Built: Wed Mar 20 17:59:33 2024
 OS/Arch: linux/amd64
 Context: default
Server: Mirantis Container Runtime (Unlicensed - not for production workloads)
 Engine:
  Version: 23.0.10
  API version: 1.42 (minimum version 1.12)
  Go version: go1.21.8m1 X:boringcrypto
  Git commit: 2eb2075
  Built: Wed Mar 20 17:55:41 2024
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.30-rc.2
  GitCommit: 502191142248816d148ad6b5f4455afac05e8092
 runc:
  Version: 1.1.12-m1
  GitCommit: 8ac7905
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
Impact on Mirantis Kubernetes Engine (MKE) users
The use of MCR with MKE does not fundamentally affect the implications of this problem. MKE users can follow the same steps to determine if they are impacted and apply remediation (if necessary) as other consumers of MCR; no additional steps are required.