What is Kubernetes monitoring?
Kubernetes monitoring provides visibility into what's happening inside your cluster
Why Kubernetes monitoring is important
Kubernetes monitoring is the art of getting visibility and insight into what’s going on inside your Kubernetes clusters and applications. You’ll want to do this for several reasons, including:
Reliability and troubleshooting: Kubernetes applications — particularly those that take advantage of a cloud native or microservices architecture — can be especially complicated, and if something goes wrong, tracking down the source of the issue can be difficult. Appropriate Kubernetes visibility lets you see where issues may be occurring (or about to occur) and monitoring enables you to take action to prevent or resolve problems.
Kubernetes performance tuning: Knowing what’s going on inside your Kubernetes cluster enables you to make decisions that get the most out of your hardware without compromising the performance of your applications.
Cost management: If you’re running Kubernetes on public cloud infrastructure, it’s important to keep track of how many nodes (servers) you’re running. Even if you’re not running in the public cloud, you need to know whether you’re over-provisioned.
Chargebacks: In some situations, you’ll want to know which groups have used which resources, so Kubernetes monitoring can provide usage statistics for the purpose of chargebacks or showbacks, or simply for Kubernetes cost analysis.
Security: In today’s environment, it’s crucial to know what’s running where, to spot extra jobs that shouldn’t be there, and to detect denial-of-service (DoS) attacks. Kubernetes monitoring can’t solve all of your security issues, but without it you’re at a definite disadvantage.
To properly monitor your applications and clusters, you need to make sure you have the appropriate level of Kubernetes visibility.
What visibility is required for Kubernetes monitoring?
Of course, you can’t monitor what you can’t see, so Kubernetes visibility is a huge part of Kubernetes monitoring. What you’re looking for depends on the level at which you’re working:
Container monitoring: At the container level, there’s not much you can look into besides the basics, such as how much CPU the container is using while it’s running. Containers are ephemeral, so once a container stops, you can’t log into it to see what’s going on.
Application monitoring: Your application is, of course, written by you, so it doesn’t come with built-in monitoring hooks, but that also means you can expose whatever metrics are appropriate to the business rules of the application. You’ll want to do this in a persistent way by integrating with a monitoring system (we’ll get to that in a minute) rather than within the ephemeral environment of the container.
Pod monitoring: Pods have their own statistics, such as their state and the number of replicas running versus the number that were requested. You’ll want to keep track of that to watch for problems caused by misconfigurations or running out of resources.
Node monitoring: Your applications ultimately run on nodes, so it’s important to monitor those nodes to ensure that they’re healthy. Metrics that should be part of your Kubernetes monitoring include CPU utilization, storage availability, and network status (see the kubectl sketch after this list).
Cluster monitoring: Kubernetes monitoring at the cluster level should be more than just an aggregation of metrics from the other levels. Ideally, you should have an overall view using some sort of dashboard that enables you to make sense of utilization and identify anomalies before they become issues.
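For a quick sense of what this visibility looks like in practice, the following kubectl commands show live resource usage at the container, pod, and node levels. This is a minimal sketch: it assumes the metrics-server add-on (discussed below) is installed, and the namespace and node name are placeholders.

# Per-node CPU and memory usage (requires the metrics-server add-on)
kubectl top nodes

# Per-pod usage, broken down by container, in a hypothetical "production" namespace
kubectl top pods --containers -n production

# Node conditions such as MemoryPressure, DiskPressure, and Ready
kubectl describe node <node-name>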
Kubernetes and the Kubernetes community offer multiple ways to achieve Kubernetes visibility and monitoring.
How to do Kubernetes monitoring
It’s important to understand that while the two are related, there is a difference between Kubernetes visibility and Kubernetes monitoring. Kubernetes visibility is how the data is made available by the application; Kubernetes monitoring is how it’s made available to a human. For example, Kubernetes provides a limited set of metrics, such as CPU and memory usage, via the in-memory metrics-server. This component collects resource usage data and is how components such as the Horizontal Pod Autoscaler know what’s going on within the cluster. Kubernetes provides several ways to get this kind of “live” Kubernetes visibility, such as:
Kubernetes liveness and readiness probes: When you define a container in Kubernetes, you can also define a programmatic way to determine whether the container is ready to receive traffic, and whether it is still alive. Consider this liveness probe example from the Kubernetes documentation:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
In this case, Kubernetes runs cat /tmp/healthy inside the container every 5 seconds (after an initial 5-second delay), and if the command fails, it assumes the container is no longer healthy, kills it, and creates a new one. In this example, the container appears healthy for the first 30 seconds; once the file is removed, the probe fails and the container is replaced.
Kubernetes uses this information to determine the state of the container, but unless the probes themselves are designed to do so, their information doesn’t connect to other systems; its influence is limited to the container or pod.
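A readiness probe is defined in much the same way, under the same container spec. The sketch below is not part of the official example above; it assumes a hypothetical web application exposing a /healthz endpoint on port 8080, and tells Kubernetes to stop routing traffic to the pod whenever that HTTP check fails.

    readinessProbe:
      httpGet:
        path: /healthz        # hypothetical health endpoint exposed by the app
        port: 8080
      initialDelaySeconds: 5  # wait before the first check
      periodSeconds: 10       # check every 10 seconds

Unlike a failing liveness probe, a failing readiness probe doesn’t restart the container; it simply takes the pod out of the Service’s endpoints until the check passes again.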
Kubernetes metrics-server: This add-on component maintains an in-memory view of the Kubernetes cluster as a whole, including pod statistics, memory and CPU usage, and so on, and can serve that data to any application that asks for it (see the example below).
Kubernetes Dashboard: This is a separate component that you can install to get a live view of what’s going on inside your cluster. It lists workloads, nodes, and so on, and also enables you to take actions such as creating or destroying objects, so if you install it, make sure your security is set up properly!
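To see the kind of data the metrics-server makes available beyond the kubectl top commands shown earlier, you can query the Metrics API directly. This is a minimal sketch, and it assumes metrics-server is already installed in the cluster.

# The raw Metrics API, served by metrics-server, that components such as the
# Horizontal Pod Autoscaler consume
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"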
The problem with all of these solutions is that they provide only a “live” view of what’s going on in the cluster; they don’t save this data, so there’s no way to use it to see trends or understand what happened before a catastrophic failure. To do that, you need to export all of those metrics from Kubernetes to some sort of time-series database, such as InfluxDB, with a front end that enables you to create dashboards to see what’s going on.
One of the most popular ways to do Kubernetes monitoring is to use a tool called Prometheus with a GUI called Grafana. For example, Mirantis StackLight uses these tools together to provide visibility into your Kubernetes clusters, precisely indicating which service caused a failure. It also provides built-in alerts on anomaly and fault detection (AFD) metrics that can be extended to create custom alerts, and those alerts can be exported to other systems via standard protocols such as SNMP.
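As an illustration of how Prometheus discovers what to scrape inside a cluster, here is a minimal sketch of a scrape job that uses Kubernetes service discovery. It assumes the common (but not universal) convention of a prometheus.io/scrape annotation on pods; packaged solutions such as StackLight ship their own, more complete configurations.

# prometheus.yml (excerpt): discover pods via the Kubernetes API and scrape
# only those annotated with prometheus.io/scrape: "true"
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"

Grafana then uses Prometheus as a data source to build the dashboards and graphs you actually look at.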
Another widely used tool for monitoring and troubleshooting Kubernetes is Lens, a Kubernetes IDE that integrates with Prometheus to visualize trends in resource usage metrics for CPU, memory, network, and disk, including total capacity, actual usage, requests, and limits.
Interested in learning more about Kubernetes visibility and Kubernetes monitoring? Contact us and we’ll be happy to walk you through your options.