Cloud configuration challenges are tough—and getting tougher
Fairwinds makes the Polaris policy engine for Kubernetes, a useful tool for preventing the kinds of problems that its 2023 Kubernetes Benchmark Report makes pretty clear are happening out there, a lot.
Specifically, people are incompletely configuring workloads, something that was less of a problem a year ago but now apparently affects 80% of workloads in production. No memory limits or requests set (or, if set, set wrong, which leads to inefficiency and extra cost). No CPU limits. No liveness or readiness probes. Not-quite-right pull policies causing reliability issues. Lots of deployments with zero replicas.
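To make those gaps concrete, here is a minimal Python sketch that scans one Deployment manifest, already parsed into a dict, for the omissions listed above. It is not how Polaris works internally; the function name find_config_gaps and the wording of the findings are made up for illustration, and treating any pull policy other than Always as a finding is an assumption rather than a rule from the report. The field names (replicas, resources, livenessProbe, readinessProbe, imagePullPolicy) are the standard Kubernetes ones.

```python
from typing import Any


def find_config_gaps(deployment: dict[str, Any]) -> list[str]:
    """Return human-readable findings for one Deployment manifest (hypothetical helper)."""
    findings: list[str] = []
    spec = deployment.get("spec", {})

    # Kubernetes defaults replicas to 1 when the field is omitted.
    if spec.get("replicas", 1) == 0:
        findings.append("deployment has zero replicas")

    pod_spec = spec.get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})

        # Check CPU and memory under both requests and limits.
        for kind in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in resources.get(kind, {}):
                    findings.append(f"{name}: no {resource} {kind}")

        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                findings.append(f"{name}: no {probe}")

        # Assumption: anything other than "Always" is worth a second look.
        if container.get("imagePullPolicy") != "Always":
            findings.append(f"{name}: imagePullPolicy is not Always")

    return findings
```

Running a check like this in CI, before a manifest ever reaches a cluster, is one way to catch incomplete configuration early; a policy engine does the same job with a far richer rule set.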
Organizations are also doing worse than a year ago at things like disabling unused Linux features and general worker node hardening. Fairwinds sees 33% of orgs running 90%+ of workloads on insufficiently hardened hosts.
Other problems: privilege escalation is allowed all over the place, so containers can escalate their own privileges. And many workloads are configured so that running as root is allowed. 29% of organizations now have 91% or more of their workloads with this flaw.
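As a rough illustration of what those two flaws look like in a manifest, the Python sketch below inspects the securityContext settings of a pod spec that has already been parsed into a dict. The helper name find_privilege_risks is hypothetical, and the check is deliberately simplified: it looks only at the standard allowPrivilegeEscalation and runAsNonRoot fields and ignores related settings such as runAsUser.

```python
from typing import Any


def find_privilege_risks(pod_spec: dict[str, Any]) -> list[str]:
    """Flag containers that may escalate privileges or run as root (simplified sketch)."""
    findings: list[str] = []
    pod_ctx = pod_spec.get("securityContext") or {}

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        # Escalation is permitted unless the field is explicitly set to false.
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: privilege escalation allowed")

        # A container-level runAsNonRoot takes precedence over the pod-level value.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_non_root is not True:
            findings.append(f"{name}: may run as root")

    return findings
```

The point of the sketch is that both settings are opt-in: a workload that says nothing about them is flagged, which is why these findings are so common.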
Image vulnerabilities are nothing new, but the trend towards running with known CVEs is building. 62% of orgs have more than 50% of workloads affected by known vulnerabilities. Outdated container images are all over the place. Outdated Helm charts are also pervasive: a lot of organizations have almost 100% of workloads deployed with Helm charts that aren’t up to date.
Configuration confusion goes beyond Kubernetes
Elsewhere in configuration challenges, researchers at Wiz found an attack vector in Azure Active Directory that had the potential to give malicious actors access to misconfigured apps. According to Wiz’s research, around 25% of multi-tenant applications were affected, including a number of Microsoft applications. In the company’s blog on the issue, Hillai Ben-Sasson writes:
“We found several high-impact, vulnerable Microsoft applications. One of these apps is a content management system (CMS) that powers Bing.com and allowed us to not only modify search results, but also launch high-impact XSS attacks on Bing users. Those attacks could compromise users’ personal data, including Outlook emails and SharePoint documents.”
The problem—which was disclosed to Microsoft and is now addressed—derived from Azure Active Directory’s implementation of single sign-on. In short, multi-tenant authentication simply validated that users trying to log in had a token from an Azure tenant, with the onus on app owners to validate users’ identities. It wasn’t clear that this was the case, however, and this shared responsibility confusion extended even to major Microsoft apps. That meant users on other Azure tenants could log in freely to misconfigured apps. The Wiz post demonstrates this with a Bing CMS that enabled the researchers to change results, including making the 1995 “Hackers” soundtrack a response to queries for “best soundtrack.” (If you want to learn more about mitigations for this particular issue, check out the blogs by Wiz and Microsoft.)
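The missing step on the app owner's side can be sketched in a few lines of Python. This is an illustration of the idea, not the mitigation published by Wiz or Microsoft: it assumes the token's signature and expiry have already been verified by a proper JWT library, that the decoded claims are available as a mapping, and that the app keeps its own allow-list of tenant IDs. The function name is_tenant_allowed and the allow-list are hypothetical; tid is the tenant ID claim carried in Microsoft identity platform tokens.

```python
from collections.abc import Mapping


def is_tenant_allowed(claims: Mapping[str, object], allowed_tenants: set[str]) -> bool:
    """Accept a caller only if the token's tenant is on the app's own allow-list.

    Assumes `claims` comes from a token whose signature was already verified.
    """
    tenant_id = claims.get("tid")  # tenant ID claim
    return isinstance(tenant_id, str) and tenant_id in allowed_tenants
```

A token from any other tenant is still perfectly valid as far as the identity provider is concerned, which is exactly why the app has to make this decision itself.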
Confusion deriving from shared responsibility is by no means limited to this case. Shared responsibility is a fundamental concept for cloud security, encompassing increasingly complex facets of cloud deployment ranging from data security to network controls. Problems arise when ownership of a given issue is unclear—or the existence of a given issue is unclear. That makes it all the more important to have expert guidance.
Solving complex problems
On Kubernetes, complex problems sometimes call for complex solutions. In order to tackle cloud and Kubernetes configuration challenges, organizations need:
Policy automation
Kubernetes base and extended cluster configuration automation, so that when you deploy clusters, they have a locked-down and entirely predictable composition
Update automation that doesn’t break deployment discipline and wreck known-good configs
Disciplined, automated development workflows that prevent CVEs in production, keep Helm charts updated, and help developers and testers configure workloads according to the principles of least privilege and intelligent resource consumption models
That may be on top of navigating proper configuration for a given cloud provider. Ultimately, all of this may be too much for many in-house teams, as Fairwinds’ findings suggest. Organizations need ways of offloading responsibility for part or all of this to providers that are staffed and automated to take complete responsibility and operate under metrics-based SLAs.
When it comes to configuring your cloud, small errors can snowball and create big costs. As organizations navigate the increasingly complex cloud-native world, it’s clear that they will need increasingly powerful support.
Need help configuring your cloud?
Mirantis Professional Services can help you design, set up, and fine-tune your cloud for hardened security and optimized performance. Contact us to learn how we can help you today.