
Spaghetti 🍝 (blog)

Client-side validation of Kubernetes manifests

To be honest, writing Kubernetes manifests can be tedious and prone to misconfiguration. Of course the manifests will in the end be validated server side, but we would like to catch most errors before we hand them off to the API server. This is particularly helpful when utilizing GitOps, since the changes are consumed asynchronously. To achieve this we will use the following tooling: kustomize, kubeconform and pre-commit.

Let's start with kustomize and make sure that we can actually build our manifest bundle.

kustomize build path-to-kustomization-file

We can now add this to a .pre-commit-config.yaml file in the root of the project to have it run every time we commit.

repos:
- repo: local
  hooks:
  - id: kustomize
    name: validate kustomizations
    language: system
    entry: kustomize
    args:
    - build
    - path-to-kustomization-file
    always_run: true
    pass_filenames: false

Now on to kubeconform for validating our manifests.

kubeconform -strict -skip CustomResourceDefinition,Kustomization \
  -kubernetes-version 1.33.0 \
  -schema-location default \
  -schema-location 'https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/{{.Group}}/{{.ResourceKind}}_{{.ResourceAPIVersion}}.json' \
  path-to-your-manifests

We of course depend on the CRDs catalog containing our CRs and on it being kept up to date, but it is relatively easy to contribute to the catalog; see PRs #453 and #600.

We can now also add this to our pre-commit config file like so.

repos:
...
- repo: local
  hooks:
  - id: kubeconform
    name: validate kubernetes manifests
    language: system
    entry: kubeconform
    args:
    - -strict
    - -kubernetes-version
    - 1.33.0
    - -skip
    - CustomResourceDefinition,Kustomization
    - -schema-location
    - default
    - -schema-location
    - 'https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/{{.Group}}/{{.ResourceKind}}_{{.ResourceAPIVersion}}.json'
    files: ^path-to-your-manifests/.*
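
If you haven't used pre-commit before, getting the hooks running locally only takes installing the tool and registering the git hook. A minimal sketch, assuming pip is available (Homebrew or your OS package manager works just as well):

# install pre-commit and register the git hook in this repository
pip install pre-commit
pre-commit install

# optionally run all hooks against every file once
pre-commit run --all-files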

Using pre-commit is nice for validating your commits, but it requires everybody to install it and run pre-commit install, as shown above. So to enforce the above validations we can also add a CI step in the form of a GitHub action.

name: Pre-commit
on:
  - pull_request
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
    - uses: alexellis/arkade-get@master
      with:
        kustomize: latest
        kubeconform: latest
    - uses: pre-commit/action@v3.0.1

This setup is not bulletproof, but it does add some extra confidence and it is very low effort to get going.


  1. This action is in maintenance-only mode and you should support the project by using pre-commit.ci instead. But so that everyone can follow along, the action is used here anyway.

EBS CSI driver and AL2023

After upgrading to Amazon Linux 2023 (AL2023) we started seeing errors from the aws-ebs-csi-driver running in our clusters.

ebs-plugin I0626 06:40:25.662215       1 main.go:154] "Initializing metadata"
ebs-plugin I0626 06:40:25.662374       1 metadata.go:66] "Attempting to retrieve instance metadata from IMDS"
ebs-plugin E0626 06:40:30.665263       1 metadata.go:72] "Retrieving IMDS metadata failed" err="could not get IMDS metadata: operation error ec2imds: GetInstanceIdentityDocument, canceled, context deadline exceeded"
ebs-plugin I0626 06:40:30.665357       1 metadata.go:75] "Attempting to retrieve instance metadata from Kubernetes API"

This is due to AL2023's improved security features blocking pods from calling the metadata service on the nodes, because of a network hop limit of 1. The aws-ebs-csi-driver eventually falls back to using the Kubernetes API, but we are left waiting ~5 seconds for the IMDS call to time out. With the release of aws-ebs-csi-driver v1.45.0 they have implemented a flag (--metadata-sources) allowing us to set a priority order or choose a specific way of getting metadata. In our case it would be set to "kubernetes".
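
A minimal sketch of wiring that up with a kustomize patch, assuming the driver is deployed as the usual ebs-csi-node DaemonSet in kube-system and that the ebs-plugin container is the first container in the pod spec (the controller Deployment can be patched the same way):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ebs-csi-driver.yaml # assumed name of the upstream manifests
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --metadata-sources=kubernetes
    target:
      kind: DaemonSet
      name: ebs-csi-node
      namespace: kube-system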

This should prevent the errors shown above.

Upgrade from AL2 to AL2023 learnings

Ever since AWS announced that the Amazon Linux 2023 (AL2023) AMI type is replacing Amazon Linux 2 (AL2), I have been excited about it, mainly because of the cgroup v2 upgrade and the improved security with IMDSv2. To explain it quickly:

  • cgroup v2 should provide more transparency when container sub-processes are OOM killed.
  • IMDSv2 will block pods from calling the metadata service on the nodes (and thereby getting an AWS context) due to a network hop limit (quick ways to check both are sketched below).
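
Quick ways to check both, assuming shell access to a node (e.g. via a debug pod) and the AWS CLI; the instance ID is a placeholder:

# cgroup v2 nodes report cgroup2fs, cgroup v1 nodes report tmpfs
stat -fc %T /sys/fs/cgroup/

# inspect the IMDS hop limit of an instance (a limit of 1 blocks pods behind the extra network hop)
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions.HttpPutResponseHopLimit'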

The AMI upgrade is needed for upgrading worker nodes on EKS from 1.32 to 1.33, since no AL2 AMI is built for 1.33.

Upon testing we found a few things breaking, but nothing major. The AWS load balancer controller broke, but only needed the --aws-vpc-id and --aws-region flags set to work again. We ended up removing the spot-termination-exporter (which supplied insight into spot-instance interruptions), since it relies heavily on the metadata service, which was now blocked. Sad, but we have lived without it before.
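
For reference, a rough sketch of setting the load balancer controller flags via Helm, assuming the controller is installed from the eks-charts repository; the chart's region and vpcId values should map to the --aws-region and --aws-vpc-id flags, and the region, VPC ID and cluster name are placeholders:

helm upgrade --install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=CLUSTER_NAME \
  --set region=eu-west-1 \
  --set vpcId=vpc-0123456789abcdef0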

We then went on to upgrade all clusters and worker nodes to version 1.33. The upgrade went smoothly except for one thing that we overlooked. We rely on the flux image-reflector-controller to scan container registries, and it also uses the metadata service to get the AWS context of the nodes. Luckily this was a fairly easy fix, where we ended up patching an IRSA role annotation onto the image-reflector-controller ServiceAccount in the following way.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: image-reflector-controller
        annotations:
          eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/eks_CLUSTER_NAME_flux-image-reflector
    target:
      kind: ServiceAccount
      name: image-reflector-controller

We are now enjoying AL2023 and are so far happy with the upgrade.

Vertical Pod Autoscaler: flush historical data

We recently had to roll out a new release of DCGM exporter, a tool that monitors Nvidia GPU performance and outputs metrics. It runs as a DaemonSet on all GPU Kubernetes nodes. With the new release there is a significant increase in memory consumption; normally this would be easy to handle by increasing resource requests and limits. But what happens if you have decided to let Vertical Pod Autoscaler (VPA) manage resources through its auto mode?

Introduction to Vertical Pod Autoscaler

Have you ever deployed a new and shiny thing, no matter if it's custom built or something off the shelf, and felt totally unqualified to choose resource requests and limits? This is where Vertical Pod Autoscaler comes into the picture: it can free users from setting or guessing resource requests and limits on the containers in their pods, and from updating them when requirements change.

VPA can run in two modes: recommendation or auto. Recommendation mode has a lot of value by itself, analysing current and historical resource usage, but it requires you to make manual changes to follow the recommended resource settings. Auto mode uses the recommendations, but can also adjust resources on the fly. This is great and has a lot of benefits, among them not wasting resources on services that fluctuate and cannot scale horizontally.

We run a lot of services in VPA auto mode, among them the DCGM exporter.
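
For illustration, a minimal sketch of what such a VerticalPodAutoscaler resource could look like; the namespace, target name and maxAllowed value are assumptions, not our exact configuration:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: dcgm-exporter
  namespace: dcgm-exporter
spec:
  targetRef:
    apiVersion: apps/v1
    kind: DaemonSet
    name: dcgm-exporter
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        maxAllowed:
          memory: 2Gi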

Roll out new release of DCGM exporter

We already knew from testing that the DCGM exporter had a significant increase in memory consumption, so we changed the maxAllowed.memory specification on the VerticalPodAutoscaler custom resource. The hope was that VPA would adjust the resources for the DCGM exporter rather quickly, but that didn't happen. The DCGM exporter went into OOMKill crashlooping while the recommended memory from the VPA slowly crawled upwards. The OOMKill was expected, but the slow adjustment from VPA was a surprise. There were probably many contributing factors, but the crashloop backoff didn't help.

So how did we solve it?

Flushing VPA historical data

In the end we deleted the appropriate VPACheckpoint resource and restarted the VPA recommender component to flush its in-memory state.

kubectl delete vpacheckpoint -n dcgm-exporter dcgm-exporter
kubectl delete pod -n kube-system -l app=vpa-recommender

This almost immediately got the dcgm-exporter to the appropriate resources and out of the OOMKill crashloop.

Docker Hub rate limits

Docker Hub recently announced a limit of 10 pulls/hour for unauthenticated users. This has a pretty significant impact on container orchestration, e.g. Kubernetes. I will not cover whether it is fair or not, but give credit to Docker Hub for its contributions to the community.

So how does this rate limit impact Kubernetes?

From an operator/administrator perspective it can be hard to predict how many images will be pulled when a new node joins the cluster.

How could you solve it?

We've opted for implementing an AWS ECR pull-through cache. It is easy to set up and works like a charm.
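
A rough sketch of creating the cache rule with the AWS CLI; the repository prefix and the Secrets Manager secret holding the Docker Hub credentials are assumptions, so check the ECR documentation for your setup:

aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix docker-hub \
  --upstream-registry-url registry-1.docker.io \
  --credential-arn arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:ecr-pullthroughcache/docker-hub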

Were there any side effects?

  1. All image references in manifests have to change from nginx:latest to ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/docker-hub/nginx:latest (don't use latest)

  2. Flux GitOps ImageUpdateAutomation breaks for CRD resources that reference images

  3. Renovate updates break because the cache doesn't have knowledge of new tags

I will try to cover possible solutions for the above side effects in future posts.

Run and debug Renovate locally

Last time I gave a quick introduction to Renovate and how to run it with a centralised configuration. Today we will go over how to run Renovate locally for debugging and for extending the configuration, which is very handy.

npx --yes --package renovate -- renovate --dry-run=full --token="GITHUB_TOKEN" wcarlsen/repository0

This only requires a GitHub token. To get more verbose output, set the LOG_LEVEL environment variable to debug, as shown below.
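
For example, reusing the command from above with debug logging enabled:

LOG_LEVEL=debug npx --yes --package renovate -- renovate --dry-run=full --token="GITHUB_TOKEN" wcarlsen/repository0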

Now go customise your config.js or renovate.json config files to get the best out of Renovate.

Automatic dependency updates (Renovate)

For a while now I've been running self-hosted Renovate at work to handle automatic dependency updates for my team, and I can only recommend it. It's like GitHub's Dependabot but on steroids, and very simple to set up.

Setup can be structured in two ways; I have implemented the latter.

  • per repository - flexible but not very DRY (don't repeat yourself)
  • centralised - not as flexible but very DRY

All that is needed is a config.js file.

module.exports = {
  branchPrefix: 'update/renovate/',
  username: 'your-service-account-name',
  onboarding: false,
  requireConfig: 'optional',
  platform: 'github',
  repositories: [
    'wcarlsen/repository0',
    'wcarlsen/repository1',
  ],
  packageRules: [
    {
      matchUpdateTypes: [
        'digest',
        'lockFileMaintenance',
        'patch',
        'pin',
      ],
      minimumReleaseAge: '1 day',
      automerge: false,
      matchCurrentVersion: '!/(^0|alpha|beta)/',
      dependencyDashboard: true,
    },
    {
      matchUpdateTypes: [
        'minor'
      ],
      minimumReleaseAge: '7 days',
      automerge: false,
      matchCurrentVersion: '!/(^0|alpha|beta)/',
      dependencyDashboard: true,
    },
    {
      matchUpdateTypes: [
        'major'
      ],
      minimumReleaseAge: '14 days',
      automerge: false,
      dependencyDashboard: true,
    },
  ],
};

and a GitHub action with a service account PAT.

name: Renovate
on:
  schedule:
    - cron: "15 2 * * 1-5" # Every week day at 02.15
  workflow_dispatch:
jobs:
  renovate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Self-hosted Renovate
        uses: renovatebot/github-action@02f4fdeb479bbb229caa7ad82cb5e691c07e80b3 # v41.0.14
        env:
          LOG_LEVEL: ${{ vars.LOG_LEVEL || 'info' }}
          RENOVATE_INTERNAL_CHECKS_FILTER: none
        with:
          configurationFile: config.js
          token: ${{ secrets.RENOVATE_TOKEN }}

Local overrides can be done in a repository's root with a renovate.json.

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "packageRules": [
    {
      "matchPackageNames": ["registry.k8s.io/autoscaling/cluster-autoscaler"],
      "allowedVersions": "<1.33.0"
    }
  ]
}

Enjoy those well deserved automatic dependency updates.
