Kubernetes pod autoscaler using custom metrics


In this post we are going to demonstrate how to deploy a Kubernetes autoscaler using a third-party metrics provider. You will learn how to expose any custom metric directly through the Kubernetes API by implementing an extension service. Dynamic scaling is not a new concept by any means, but implementing your own scaler is a rather complex and delicate task. That’s why the Kubernetes Horizontal Pod Autoscaler (HPA) is a really powerful Kubernetes mechanism: it can help you to dynamically adapt your service in a way that is reliable, predictable and easy to configure.

Why might you need custom metrics for your Kubernetes autoscaler?

If the metrics-server plugin is installed in your cluster, you will be able to see the CPU and memory values for your cluster nodes or any of the pods. These metrics are useful for internal cluster sizing, but you probably want to configure your Kubernetes autoscaler using a wider set of metrics:

  • Service latency -> net.http.request.time
  • I/O load -> file.iops.total
  • Memory usage -> memory.used.percent

You need a metrics provider that is able to provide detailed performance information, aggregated using Kubernetes metadata (deployments, services, pods). The extension API server implementation in this post uses Sysdig Monitor.

Before we start deploying our custom Kubernetes autoscaler, let’s go over the HPA basics.

Kubernetes autoscaler (HPA) basic concepts

We can start with a simple diagram:

Kubernetes autoscaler diagram

As you can see above, the HPA object will interact with a pod controller like a Deployment or ReplicaSet. It will update this object to configure the “desired” number of pods given the current metric readings and thresholds.

The pod controller, a Deployment for instance, will then terminate or create new pods as part of its reconciliation loop to reach the desired state.

The basic parameters that you will need for any HPA are:

  • Scale target: the controller that this HPA will interact with
    • minReplicas: minimum number of pods, the HPA cannot go below this value
    • maxReplicas: maximum number of pods, the HPA cannot go above this value
  • Target metric(s): metric (or metrics) used to evaluate current load and take scaling decisions
    • targetValue: threshold value for the metric. If the metric readings are above this value, and (currentReplicas < maxReplicas), HPA will scale up.

You can create a Kubernetes HPA in just one line:

$ kubectl autoscale deployment shell --min=2 --max=10 --cpu-percent=10
horizontalpodautoscaler.autoscaling/shell autoscaled
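For reference, that imperative command creates an object roughly equivalent to this declarative YAML (a sketch using the autoscaling/v1 schema; the target Deployment name shell is taken from the command above):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: shell
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: shell
  minReplicas: 2
  maxReplicas: 10
  # --cpu-percent=10 maps to this field in autoscaling/v1
  targetCPUUtilizationPercentage: 10
```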

If you generate high CPU loads in these pods, the HPA will scale up the desired number of replicas:

23s         Normal    SuccessfulRescale   HorizontalPodAutoscaler   New size: 4; reason: cpu resource utilization (percentage of request) above target

It will also scale down again when the CPU burst is over. Pretty neat, right? The official Kubernetes documentation covers the HPA algorithm and advanced configuration options in much more detail.
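The heart of that algorithm is a simple ratio, per the official documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick shell sketch with example numbers (not taken from the scenario above):

```shell
# HPA scaling math: desired = ceil(current_replicas * current_metric / target)
current_replicas=3
current_metric=9    # e.g. a measured reading of 9 req/s
target_metric=4     # the configured target value
# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
echo "$desired"     # ceil(27 / 4) = 7
```

The result is then clamped between minReplicas and maxReplicas before the controller acts on it.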

Like many other things in Kubernetes, the set of metrics available to the HPAs can be expanded by implementing an API extension. Let’s see how this is done.

Kubernetes custom metrics API

The Kubernetes HPA is able to retrieve metrics from several APIs out of the box: metrics.k8s.io, custom.metrics.k8s.io (the one that we will use in this post), and external.metrics.k8s.io.

To register custom metrics and update their values, you need to:

  • Enable the Kubernetes aggregation layer
  • Register a new APIService object that will bind the new API path to the Kubernetes service implementing it
  • Deploy the actual service implementation (a pod living inside a Kubernetes namespace in this example) that responds to the HPA requests and retrieves the metric values from the external provider

If you are using a recent Kubernetes version (1.11+), the API aggregation layer is probably enabled and configured out of the box, so you can skip this step. You can check the relevant API server parameters by describing the kube-apiserver pod living in your kube-system namespace:

$ kubectl describe pod kube-apiserver -n kube-system
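In the output, look for the aggregation-layer flags. Per the Kubernetes documentation they are the following (certificate paths shown here are the kubeadm defaults; yours may differ):

```
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
```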

In case your API server doesn’t have these flags, the Kubernetes documentation has an article explaining how to configure them.

These parameters enable the kube-aggregator, a controller in charge of two tasks:

  • Discovering and registering APIService objects, creating a link between the newly registered API path and the Kubernetes service implementing it.
  • Acting as the front-proxy / forwarder for these requests.

This is what a basic APIService object looks like:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.wardle.k8s.io
spec:
  insecureSkipTLSVerify: true
  group: wardle.k8s.io
  groupPriorityMinimum: 1000
  versionPriority: 15
  service:
    name: api
    namespace: wardle
  version: v1alpha1

Leaving aside the more advanced configuration details, this object will instruct the kube-aggregator to forward v1alpha1.wardle.k8s.io requests to the API extension service, implemented by a pod in the wardle namespace.

Now that we have covered the basic concepts, we can deploy a Horizontal Pod Autoscaler using Kubernetes custom metrics.

Kubernetes autoscaler using Sysdig’s custom metrics


Sysdig Monitor will be the external metrics provider for the implementation used in this scenario. To deploy this stack you will need a Sysdig Monitor account (and its API access token) and the Sysdig agent running in your Kubernetes cluster.

Overview diagram

Let’s start with a diagram covering the entire workflow for this integration:

Kubernetes autoscaler overview

1: The Horizontal Pod Autoscaler will be configured to manage a Kubernetes Deployment by pivoting on the value of a custom metric and its target threshold. As we explained earlier, the HPA itself is abstracted from the implementation details; it just needs to request the metrics from the Kubernetes API.

2: The number of desired pods in this Deployment will be dynamically updated by the HPA; the reconciliation loop will kill or create new pods to reach the configured state.

3: The Sysdig agent will collect metrics and metric metadata (labels) from two sources. Container metrics are collected directly from the Linux kernel – additional metadata on these metrics (for example the namespace, service or deployment associated with a container metric) is pulled from the Kubernetes API. These two metrics streams are processed, aggregated and sent to the Sysdig backend.

4: The extension API server is a pod living in the same Kubernetes cluster as the HPA. It implements the custom metrics apiserver interface and is able to dynamically pull metric values from the Sysdig backend using the API access token. The current version will post separate metric values for every Namespace, Pod and Service in your cluster, but the code can be easily modified if you need a different aggregation.

5: The kube-aggregator in this Kubernetes API server has been configured (through an APIService object) to forward custom.metrics.k8s.io requests to the extension API server. It will then return the requested value to the HPA.
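The requests forwarded in step 5 follow the custom metrics path convention, one URL per Kubernetes object and metric. For the aggregations described in step 4 they look like this (placeholders in angle brackets):

```
/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/<pod>/<metric-name>
/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/services/<service>/<metric-name>
```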


Let’s get down to work. The first step is cloning the Sysdig metrics apiserver repository:

$ git clone https://github.com/draios/kubernetes-sysdig-metrics-apiserver.git

You have the complete Golang code, Makefiles and Dockerfiles for the project in this repo, but in this article we are going to focus on the actual deployment and operation. Everything you need is inside the deploy directory.

$ cd deploy
$ ls
00-kuard.yml  01-sysdig-metrics-rbac.yml  02-sysdig-metrics-server.yml  03-kuard-hpa.yml

Target deployment

The first yaml to apply is probably the simplest one – a Kubernetes Deployment and Service to deploy kuard (a demo application found in the “Kubernetes Up and Running” book).

$ kubectl apply -f 00-kuard.yml 
deployment.extensions/kuard created
service/kuard created

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
kuard-6b6995ff77-6gxbm   1/1     Running   0          60s
kuard-6b6995ff77-cvd72   1/1     Running   0          60s
kuard-6b6995ff77-rvznt   1/1     Running   0          60s

Now you have a target Deployment to scale.

APIServer definition and RBAC permissions

The second yaml will create the APIServer definition:

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  insecureSkipTLSVerify: true
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 1000
  versionPriority: 5
  service:
    name: api
    namespace: custom-metrics
  version: v1beta1

This object will instruct the kube-aggregator to forward custom metrics requests (v1beta1 version) to the api service in the custom-metrics namespace.

This YAML will also create the custom-metrics namespace itself, the ServiceAccount to be used by the custom metrics pod and several RBAC Roles and RoleBindings.

We need these RBAC bindings to allow the custom metrics account to register itself as an API extension and also to read the list of namespaces, pods and services. The custom metrics server needs this metadata to aggregate the requests to the Sysdig backend using the same labels, for example namespace=default, service=kuard.
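For instance, the binding that lets the service account delegate authentication decisions to the main API server looks roughly like this (a sketch; the object names are taken from the apply output):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: custom-metrics
```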

Applying this second yaml file, you should see the following output:

$ kubectl apply -f 01-sysdig-metrics-rbac.yml
namespace/custom-metrics created
serviceaccount/custom-metrics-apiserver created
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-apiserver-resource-reader created
clusterrole.rbac.authorization.k8s.io/custom-metrics-getter created
clusterrolebinding.rbac.authorization.k8s.io/hpa-custom-metrics-getter created
service/api created
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created

Check that the api extension has been configured:

$ kubectl api-versions | grep "custom.metrics"
custom.metrics.k8s.io/v1beta1

Take into account that the API extension has been declared but not implemented (yet), so any request to this API will fail:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/kuard/net.http.request.count" | jq .
Error from server (ServiceUnavailable): the server is currently unable to handle the request

Sysdig metric server

Ok, so let’s deploy the custom metrics pod then. There are two parameters that you will need to configure: the Sysdig API endpoint and the access token.

Edit the file 02-sysdig-metrics-server.yml. You can find the API endpoint configured as an environment variable in the deployment definition:

  value: "https://app.sysdigcloud.com/api/"

If you are using the SaaS version, this default value should work and you can leave it as-is. If you want to connect to an on-prem backend, adjust this parameter accordingly.

You also have to add a new environment variable named CLUSTER_NAME, set to the name of your cluster in Sysdig Monitor:

  - name: CLUSTER_NAME
    value: "YourClusterName"

The access token is mounted using a Kubernetes secret – retrieve your token from the Sysdig interface Settings -> User profile and execute:

$ kubectl create secret generic --from-literal access-key=<YOUR_SYSDIG_API_TOKEN_HERE> -n custom-metrics sysdig-api

Once you have configured these two parameters, you can deploy the custom metrics server:

$ kubectl create -f 02-sysdig-metrics-server.yml 
deployment.apps/custom-metrics-apiserver created

$ kubectl get pods -n custom-metrics
NAME                                       READY   STATUS    RESTARTS   AGE
custom-metrics-apiserver-96d86694b-7shmx   1/1     Running   0          16s

You can check if the new metrics are available in the Kubernetes API using a raw request (jq is optional, used to format the output):

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/kuard/net.http.request.count" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/kuard/net.http.request.count"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "default",
        "name": "kuard",
        "apiVersion": "/__internal"
      },
      "metricName": "net.http.request.count",
      "timestamp": "2019-07-07T15:55:27Z",
      "value": "0"
    }
  ]
}
Note that the describedObject element contains the Kubernetes labels used to aggregate this metric. We also have the timestamp for the request and the metric value, which in this case is 0 because we are not sending any HTTP traffic to the kuard service (yet).
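If you want to consume this endpoint from a script, extracting the value field is straightforward. A minimal sketch using sed on a sample item (hypothetical, hardcoded here so the snippet is self-contained; jq -r '.items[0].value' on the raw response is the cleaner option when jq is available):

```shell
# A sample item from a MetricValueList response, inlined for illustration
item='{"metricName": "net.http.request.count", "timestamp": "2019-07-07T15:55:27Z", "value": "0"}'

# Pull out the "value" field with a basic sed capture group
value=$(printf '%s' "$item" | sed -n 's/.*"value": *"\([^"]*\)".*/\1/p')
echo "$value"
```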

Kubernetes autoscaler using custom metrics

Now that you have the Kubernetes custom metrics, you just need an HPA to act on them.

You can configure it to target any Sysdig metric. By default we are going to use net.http.request.count because it is a good service load indicator and is also easy to test using any HTTP client.

The HorizontalPodAutoscaler object is quite self-explanatory:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: kuard-autoscaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kuard
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: deployment;kuard
      metricName: net.http.request.count
      targetValue: 100

It will target the kuard deployment, setting a minimum of 3 replicas and a maximum of 10. The target metric is the net.http.request.count aggregated by the pods belonging to the kuard service.

The target value for this metric is 100 req/s. For testing purposes, you can change it to a much lower value, 4 for example.

Apply the last yaml file:

$ kubectl apply -f 03-kuard-hpa.yml 
horizontalpodautoscaler.autoscaling/kuard-autoscaler created

And check that it is working as you expect:

$ kubectl get hpa kuard-autoscaler
NAME               REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
kuard-autoscaler   Deployment/kuard   0/4       3         10        3          66s

The current metric reading is 0, the target value is 4, the minimum number of kuard pods is 3 and the maximum is 10. There are currently 3 pods running. The HPA is operative and ready to go; now we can test it live.

Testing the Kubernetes custom metrics HPA

Testing this deployment is fairly simple: we just need to generate HTTP requests.

First, you need to be able to access the kuard service. In a production scenario this would mean configuring an ingress controller; for this example we can just forward the HTTP port to the local host:

$ kubectl port-forward service/kuard 8080:80
Forwarding from 127.0.0.1:8080 -> 8080

Leave that command running and open a different console. Now we need to generate HTTP load; there are multiple ways to do this, for example using the ab tool from Apache:

$ ab -c 10 -t 120 http://localhost:8080/

Leave the command running for a minute. Then, describe your HPA controller and you should see something similar to this output:

$ kubectl get hpa kuard-autoscaler
NAME               REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
kuard-autoscaler   Deployment/kuard   9/4       3         10        10         37m

$ kubectl describe hpa kuard-autoscaler
  Normal   SuccessfulRescale             21s   horizontal-pod-autoscaler  New size: 6; reason: Service metric net.http.request.count above target
  Normal   SuccessfulRescale             6s    horizontal-pod-autoscaler  New size: 10; reason: Service metric net.http.request.count above target

Metric readings were above the target value, so the HPA scaled your pods, first from 3 to 6 and then from 6 to 10, which is the maximum.

Once the ab tool has finished sending requests, the custom metric reading will go back to 0. If you wait a few minutes, you should be able to check that the HPA has downsized the deployment replica count back to 3:

$ kubectl get hpa kuard-autoscaler
NAME               REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
kuard-autoscaler   Deployment/kuard   0/4       3         10        3          99m

$ kubectl describe hpa kuard-autoscaler
  Normal   SuccessfulRescale             45s (x3 over 66m)    horizontal-pod-autoscaler  New size: 3; reason: All metrics below target

You have deployed a Kubernetes custom metrics provider and a Horizontal Pod Autoscaler that is able to resize your deployments pivoting on those metrics!


One of the strong points of Kubernetes has always been its extensibility. Thanks to the aggregation layer, you can extend the API without adding extra complexity or configuration for the resource consumers (the Horizontal Pod Autoscaler in our example).

If you plan to use this integration in your organization, or just in a lab environment, we definitely want to hear from you! You can reach us using Slack or Twitter and, of course, PRs to the project are welcome.

If you would like to run this example, but don’t have a Sysdig Monitor account, we invite you to sign up for a free trial.
