How we set Kubernetes limits and requests is essential to optimizing application and cluster performance.
One of the challenges of every distributed system designed to share resources between applications, like Kubernetes, is, paradoxically, how to properly share those resources. Applications were typically designed to run standalone on a machine and use all of the resources at hand. It is said that good fences make good neighbors: the new landscape requires sharing the same space with others, and that makes resource quotas a hard requirement.
Namespace quotas
Kubernetes allows administrators to set quotas, in namespaces, as hard limits for resource usage. This has an additional effect: if you set a CPU request quota in a namespace, then all pods need to set a CPU request in their definition, otherwise the API server will reject them.
Let’s look at an example:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-example
spec:
  hard:
    requests.cpu: 2
    requests.memory: 2Gi
    limits.cpu: 3
    limits.memory: 4Gi
If we apply this file to a namespace, we will set the following requirements:
- All pod containers have to declare requests and limits for CPU and memory.
- The sum of all the CPU requests can’t be higher than 2 cores.
- The sum of all the CPU limits can’t be higher than 3 cores.
- The sum of all the memory requests can’t be higher than 2 GiB.
- The sum of all the memory limits can’t be higher than 4 GiB.
If the pods in the namespace already request 1.9 cores in total and we try to create a new pod with a 200m CPU request, the quota admission controller will reject it with an exceeded quota error (if the pod comes from a Deployment, the failure will show up in the ReplicaSet events, since no pod gets created).
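To make this concrete, here is a minimal pod sketch (the pod name and sizes are hypothetical, chosen only for illustration) that would be rejected in a namespace where the quota above is applied and 1.9 cores are already requested:

apiVersion: v1
kind: Pod
metadata:
  name: quota-buster               # hypothetical name, for illustration only
spec:
  containers:
    - name: app
      image: busybox:1.28
      command: ["sleep", "3600"]   # keep the container running
      resources:
        requests:
          cpu: 200m                # 1.9 cores already requested + 200m exceeds the 2-core requests.cpu quota
          memory: 100Mi
        limits:                    # limits are mandatory too, because the quota also sets limits.cpu and limits.memory
          cpu: 300m
          memory: 200Mi

You can check how much of the quota is currently consumed with kubectl describe resourcequota mem-cpu-example -n &lt;namespace&gt;.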
Explaining pod requests and limits
Let’s consider this example of a deployment:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: redis
  labels:
    name: redis-deployment
    app: example-voting-app
spec:
  replicas: 1
  selector:
    matchLabels:
      name: redis
      role: redisdb
      app: example-voting-app
  template:
    metadata:
      labels:
        name: redis
        role: redisdb
        app: example-voting-app
    spec:
      containers:
        - name: redis
          image: redis:5.0.3-alpine
          resources:
            limits:
              memory: 600Mi
              cpu: 1
            requests:
              memory: 300Mi
              cpu: 500m
        - name: busybox
          image: busybox:1.28
          command: ["sleep", "3600"]   # keep the sidecar container running
          resources:
            limits:
              memory: 200Mi
              cpu: 300m
            requests:
              memory: 100Mi
              cpu: 100m
Let’s say we are running this on a cluster whose nodes have, for example, 4 cores and 16 GB of RAM. We can extract a lot of information:
- The pod’s effective request is 400 MiB of memory and 600 millicores of CPU, the sum of its containers’ requests. You need a node with enough free allocatable space to schedule the pod.
- CPU shares for the redis container will be 512, and 102 for the busybox container. Kubernetes always assigns 1024 shares per core, so:
- redis: 1024 * 0.5 cores ≅ 512
- busybox: 1024 * 0.1 cores ≅ 102
- The redis container will be OOM killed if it tries to allocate more than 600 MiB of RAM, most likely making the pod fail (see the sketch right after this list for a quick way to reproduce an OOM kill).
- Redis will suffer CPU throttling if it tries to use more than 100ms of CPU time in every 100ms period (since we have 4 cores, the total available time would be 400ms every 100ms), causing performance degradation.
- The busybox container will be OOM killed if it tries to allocate more than 200 MiB of RAM, resulting in a failed pod.
- Busybox will suffer CPU throttling if it tries to use more than 30ms of CPU time every 100ms period, causing performance degradation.
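If you want to see memory limit enforcement first hand, a small sketch like the one below (it borrows the stress approach from the Kubernetes documentation; the image and sizes are illustrative and not part of the deployment above) tries to allocate more memory than its limit and gets OOM killed shortly after starting:

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo                   # hypothetical name, for illustration only
spec:
  containers:
    - name: stress
      image: polinux/stress        # small image that ships the stress tool
      command: ["stress"]
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]   # try to allocate ~250 MB
      resources:
        requests:
          memory: 100Mi
          cpu: 100m
        limits:
          memory: 200Mi            # lower than what stress allocates, so the kernel OOM kills the container
          cpu: 200m

kubectl describe pod oom-demo will show the container terminated with reason OOMKilled and restarting in a loop.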
In order to detect problems, we should be monitoring:
- CPU and memory usage in the node. Memory pressure can trigger OOM kills if the node memory is full, despite all of the containers being under their limits. CPU pressure will throttle processes and affect performance.
Find these metrics in Sysdig Monitor in the dashboard: Kubernetes → Resource usage → Kubernetes node health
- Disk space in the node. If the node runs out of disk, it will try to free disk space with a fair chance of pod eviction.
Find these metrics in Sysdig Monitor in the dashboard: Kubernetes → Resource usage → Kubernetes node health
- Percentage of the CPU quota used by every container. Monitoring pod CPU usage can be misleading; remember, Kubernetes limits are per container, not per pod. Other CPU metrics, like CPU shares used, are only relevant for allocation, so don’t waste time on them if you are troubleshooting performance issues.
Find these metrics in Sysdig Monitor in the dashboard: Hosts & containers → Container limits
- Memory usage per container. You can relate this value to the limit in the same graph or analyze the percentage of the memory limit used. Don’t rely on pod memory usage alone. In the example, the pod could be using 300 MiB of RAM in total, well under the pod’s effective request (400 MiB), but if the redis container is using 100 MiB and the busybox container is using 200 MiB, busybox has already reached its own limit and the next allocation will get it OOM killed, failing the pod.
Find these metrics in Sysdig Monitor in the dashboard: Hosts & containers → Container limits
- Percentage of resource allocation in the cluster and the nodes. You can represent this as the percentage of resources allocated out of the total allocatable resources. A good warning threshold would be (n-1)/n * 100%, where n is the number of nodes; for example, 75% in a four-node cluster. Over this threshold, in case of a node failure, you wouldn’t be able to reschedule your workloads on the remaining nodes.
Find these metrics in Sysdig Monitor in the Overview feature → clusters
- Limit overcommit (for memory and CPU). The clearest way to see this is the percentage that the sum of the limits represents of the total allocatable resources; for example, limits adding up to 6 cores on a 4-core node mean 150% overcommit. This can go over 100% in normal operation.
Custom graph showing CPU usage vs. capacity vs. limits vs. requests.
Choosing pragmatic requests and limits
When you have some experience with Kubernetes, you usually understand (the hard way) that properly setting requests and limits is of utmost importance for the performance of the applications and cluster.
In an ideal world, your pods would continuously use exactly the amount of resources you requested. But the real world is a cold and fickle place, and resource usage is never regular or predictable. Consider usage staying within a 25% margin above or below the request value a good situation. If your usage is much lower than your request, you are wasting money. If it is higher, you are risking performance issues in the node.
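As a rough sketch (the image and every number below are purely illustrative, not taken from the deployment above), a container whose measured usage hovers around 400–600 millicores and roughly 250 MiB of memory could be configured like this:

apiVersion: v1
kind: Pod
metadata:
  name: web-example                # hypothetical workload, for illustration only
spec:
  containers:
    - name: web
      image: nginx:1.25-alpine
      resources:
        requests:
          cpu: 500m                # observed usage stays within roughly 25% of this value
          memory: 256Mi
        limits:
          cpu: 800m                # headroom for spikes without letting one container monopolize the node
          memory: 512Mi            # only a runaway allocation should ever hit this

The idea is simply that the request tracks typical usage, while the limit leaves room for bursts without endangering the node.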
Regarding limits, achieving a good setting is a matter of trial and error. There is no optimal value for everyone, as it depends heavily on the nature of the application, the demand model, the tolerance for errors and many other factors.
Another thing to consider is the limit overcommit you allow on your nodes.
The enforcement of these limits is up to the user, as there is no automatic mechanism to tell Kubernetes how much overcommit to allow.
Conclusion
Some lessons you should learn from this are:
- Dear developer, set requests and limits in your workloads.
- Beloved cluster admin, setting a namespace quota will force all of the workloads in the namespace to declare a request and a limit for every container.
Quotas are a necessity to properly share resources. If someone tells you that you can use any shared service without limits, they are either lying or the system will eventually collapse, through no fault of your own.
A good monitoring system like Sysdig Monitor will help you ensure your quotas are properly configured. Request a demo today!