
2025

Vertical Pod Autoscaler: flush historical data

We recently had to roll out a new release of DCGM exporter, a tool that monitors Nvidia GPU performance and exposes metrics. It runs as a DaemonSet on all GPU Kubernetes nodes. The new release comes with a significant increase in memory consumption. Normally this would be easy to handle by increasing resource requests and limits, but what happens if you have decided to let the Vertical Pod Autoscaler (VPA) manage resources through its auto mode?

Introduction to Vertical Pod Autoscaler

Have you ever deployed a new and shiny thing, whether custom-built or something off the shelf, and felt totally unqualified to choose its resource requests and limits? This is where the Vertical Pod Autoscaler comes into the picture: it can free users from setting or guessing resource requests and limits on the containers in their pods, and from updating them when requirements change.

VPA can run in two modes: recommendation or auto. Recommendation mode has a lot of value by itself, analysing current and historical resource usage, but requires you to manually apply the recommended resource settings. Auto mode uses the recommendations but can also adjust resources on the fly. This is great and has a lot of benefits, among them not wasting resources on services that fluctuate and cannot scale horizontally.

We run a lot of services in VPA auto mode, among them the DCGM exporter.
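
For reference, a VerticalPodAutoscaler in auto mode looks roughly like the sketch below; the names and namespace are illustrative, not our exact manifest.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: dcgm-exporter        # illustrative name
  namespace: dcgm-exporter   # illustrative namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: DaemonSet
    name: dcgm-exporter
  updatePolicy:
    updateMode: "Auto"       # "Off" would give recommendations only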

Roll out new release of DCGM exporter

We already knew from testing that the new DCGM exporter release had a significant increase in memory consumption, so we raised the maxAllowed.memory specification on the VerticalPodAutoscaler custom resource. The hope was that VPA would adjust the DCGM exporter's resources rather quickly, but that didn't happen. The DCGM exporter went into an OOMKill crashloop while the recommended memory from the VPA slowly crawled upwards. The OOMKill was expected, but the slow adjustment from VPA was a surprise. There were probably many contributing factors, but the crashloop backoff didn't help.
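
The change was essentially a resource policy along these lines on the VerticalPodAutoscaler spec; the container name and memory value below are illustrative, not our actual settings.

spec:
  resourcePolicy:
    containerPolicies:
      - containerName: "*"   # or the specific container name
        maxAllowed:
          memory: 2Gi        # illustrative value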

So how did we solve it?

Flushing VPA historical data

In the end we deleted the appropriate VerticalPodAutoscalerCheckpoint resource and restarted the VPA recommender component to flush its in-memory state.

# delete the stored checkpoint so the recommender forgets the old usage history
kubectl delete vpacheckpoint -n dcgm-exporter dcgm-exporter
# restart the recommender to flush its in-memory state
kubectl delete pod -n kube-system -l app=vpa-recommender

This almost immediately got the dcgm-exporter to the appropriate resources and out of the OOMKill crashloop.

Docker Hub rate limits

Docker Hub recently announced a limit of 10 pulls per hour for unauthenticated users. This has a pretty significant impact on container orchestration platforms such as Kubernetes. I will not cover whether it is fair or not, but I will give credit to Docker Hub for its contributions to the community.

So how does this rate limit impact Kubernetes?

From an operator/administrator perspective, it can be hard to predict how many images will be pulled when a new node joins the cluster.

How could you solve it?

We've opted for implementing an AWS ECR pull-through cache. It is easy to set up and works like a charm.
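
For illustration, the pull-through cache rule can be declared in CloudFormation roughly as below. This is just a sketch; the prefix, region, account and secret name are placeholders, and the same rule can be created with Terraform, the AWS CLI or the console.

Resources:
  DockerHubPullThroughCacheRule:
    Type: AWS::ECR::PullThroughCacheRule
    Properties:
      EcrRepositoryPrefix: docker-hub # images show up under <account>.dkr.ecr.<region>.amazonaws.com/docker-hub/...
      UpstreamRegistryUrl: registry-1.docker.io
      # Docker Hub as an upstream requires credentials stored in Secrets Manager,
      # in a secret whose name starts with "ecr-pullthroughcache/"
      CredentialArn: arn:aws:secretsmanager:eu-west-1:123456789012:secret:ecr-pullthroughcache/docker-hub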

Were there any side effects?

  1. All image references in manifests have to change from nginx:latest to ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/docker-hub/nginx:latest (don't use latest); see the manifest sketch after this list

  2. Flux GitOps ImageUpdateAutomation breaks for CRD resources that reference images

  3. Renovate updates break because the cache doesn't know about new tags
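
To make the first side effect concrete, here is a minimal sketch of a manifest after the switch; account id, region and tag are placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      # was: image: nginx:1.27.0
      image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/docker-hub/nginx:1.27.0
      # official Docker Hub images may need the library/ namespace,
      # e.g. docker-hub/library/nginx, depending on how the cache rule is set up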

I will try to cover possible solutions for above side effects in future posts.

Run and debug Renovate locally

Last time I gave a quick introduction to Renovate and how to run it with a centralised configuration. Today we will go over how to run Renovate locally for debugging and for extending the configuration, which is very handy.

npx --yes --package renovate -- renovate --dry-run=full --token="GITHUB_TOKEN" wcarlsen/repository0

This only requires a GitHub token. To get more verbose output, set the LOG_LEVEL environment variable to DEBUG.

Now go customise your config.js or renovate.json config files to get the best out of Renovate.

Automatic dependency updates (Renovate)

For a while now I've been running self-hosted Renovate at work to handle automatic dependency updates for my team, and I can only recommend it. It's like GitHub's Dependabot, but on steroids, and very simple to set up.

Setup can be structured in two ways; I have implemented the latter.

  • per repository - flexible but not very DRY (don't repeat yourself)
  • centralised - not as flexible but very DRY

All that is needed is a config.js file.

module.exports = {
  branchPrefix: 'update/renovate/',
  username: 'your-service-account-name',
  onboarding: false,
  requireConfig: 'optional',
  platform: 'github',
  repositories: [
    'wcarlsen/repository0',
    'wcarlsen/repository1',
  ],
  packageRules: [
    {
      matchUpdateTypes: [
        'digest',
        'lockFileMaintenance',
        'patch',
        'pin',
      ],
      minimumReleaseAge: '1 day',
      automerge: false,
      matchCurrentVersion: '!/(^0|alpha|beta)/',
      dependencyDashboard: true,
    },
    {
      matchUpdateTypes: [
        'minor'
      ],
      minimumReleaseAge: '7 days',
      automerge: false,
      matchCurrentVersion: '!/(^0|alpha|beta)/',
      dependencyDashboard: true,
    },
    {
      matchUpdateTypes: [
        'major'
      ],
      minimumReleaseAge: '14 days',
      automerge: false,
      dependencyDashboard: true,
    },
  ],
};

and a GitHub Actions workflow and a service account PAT.

name: Renovate
on:
  schedule:
    - cron: "15 2 * * 1-5" # Every week day at 02.15
  workflow_dispatch:
jobs:
  renovate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Self-hosted Renovate
        uses: renovatebot/github-action@02f4fdeb479bbb229caa7ad82cb5e691c07e80b3 # v41.0.14
        env:
          LOG_LEVEL: ${{ vars.LOG_LEVEL || 'info' }}
          RENOVATE_INTERNAL_CHECKS_FILTER: none
        with:
          configurationFile: config.js
          token: ${{ secrets.RENOVATE_TOKEN }}

Local overrides can be done in the repository's root with a renovate.json.

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "packageRules": [
    {
      "matchPackageNames": ["registry.k8s.io/autoscaling/cluster-autoscaler"],
      "allowedVersions": "<1.33.0"
    }
  ]
}

Enjoy those well deserved automatic dependency updates.
