hugolesta.nl


How to Upgrade EKS 1.32: Making the Switch from bootstrap.sh to nodeadm

Tags: eks, kubernetes, aws, nodeadm, devops

The upgrade to EKS 1.32 is where most teams meet the most significant node-level change in recent history: on AL2023 the legacy bootstrap.sh script is gone, replaced by nodeadm, a declarative, YAML-based node initialisation tool. If you're running AL2-based node groups, you need to act before November 26, 2025, when AWS ends support for the EKS-optimised Amazon Linux 2 AMIs.

Staying on a Kubernetes version that has moved into EKS extended support costs six times more per cluster: $0.60/hour instead of the standard $0.10/hour, which adds roughly $365/month (at 730 hours) for each outdated cluster.
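
The hourly difference can be turned into a monthly figure with a few lines of shell. This is only a sketch: it assumes an average month of 730 hours, takes the two hourly rates quoted above as inputs, and the function name is purely illustrative.

```shell
# Extra monthly cost of extended support for one cluster.
# Assumption: an average month has 730 hours; rates are USD per cluster-hour.
extra_monthly_cost() {
  local standard_rate="$1"
  local extended_rate="$2"
  local hours="${3:-730}"
  awk -v s="$standard_rate" -v e="$extended_rate" -v h="$hours" \
    'BEGIN { printf "%.2f\n", (e - s) * h }'
}

extra_monthly_cost 0.10 0.60   # prints 365.00
```

Multiply the result by the number of clusters still on an extended-support version to get the fleet-wide figure.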

Kubernetes 1.32 is the final version with AL2 AMI support. After this, you must be on AL2023 or Bottlerocket.


Why bootstrap.sh Is Obsolete

The traditional /etc/eks/bootstrap.sh script auto-discovered cluster metadata through the EKS DescribeCluster API. It was a bash script that did a lot of implicit heavy lifting.

In AL2023, that script no longer exists.

With nodeadm, you must explicitly provide the cluster name plus three parameters under spec.cluster in the YAML configuration:

  • apiServerEndpoint
  • certificateAuthority
  • cidr (the cluster's service CIDR)

This is a breaking change if you have any automation, Terraform modules, or launch templates that reference bootstrap.sh arguments.


nodeadm Fundamentals

Unlike bash-based bootstrap arguments (--kubelet-extra-args), nodeadm uses a declarative YAML spec — NodeConfigSpec — for node customisation.

A minimal nodeadm config looks like this:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://XXXXXXX.gr7.eu-west-1.eks.amazonaws.com
    certificateAuthority: LS0tLS1CRUdJTi...
    cidr: 10.100.0.0/16
  kubelet:
    config:
      maxPods: 110
    flags:
      - --node-labels=role=worker

This declarative approach also reduces API throttling during large deployments — a real problem at scale when dozens of nodes bootstrap simultaneously and all hit DescribeCluster at once.
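
One way to take advantage of this is to look the values up once, outside the node (in CI, for example), and render the config from them. The sketch below assumes the AWS CLI is installed and that the caller, not the node, is allowed to call DescribeCluster; both function names are hypothetical.

```shell
# Render a NodeConfig document from four values.
# Arguments: cluster name, API server endpoint, base64 CA data, service CIDR.
render_node_config() {
  cat <<EOF
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: $1
    apiServerEndpoint: $2
    certificateAuthority: $3
    cidr: $4
EOF
}

# Fetch endpoint, CA data and service CIDR in a single DescribeCluster call,
# then render the config. With --output text the three values come back
# tab-separated on one line, so word splitting assigns them to $1..$3.
node_config_for_cluster() {
  local cluster="$1"
  set -- $(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.[endpoint, certificateAuthority.data, kubernetesNetworkConfig.serviceIpv4Cidr]' \
    --output text)
  render_node_config "$cluster" "$1" "$2" "$3"
}
```

Something like `node_config_for_cluster my-cluster > nodeConfig.yaml` would then produce a file that every node in the group can share, so none of them needs to query the API at boot.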


OS Options Post-Migration

You have two paths after dropping AL2:

Amazon Linux 2023 (AL2023)

  • Familiar environment for teams coming from AL2
  • Secure-by-default policies with SELinux in permissive mode
  • IMDSv2-only enforcement
  • DNF package manager (replaces yum)

Bottlerocket

  • Purpose-built, container-optimised OS
  • Minimal attack surface — no package manager, read-only root filesystem
  • Automatic updates via the update operator
  • Best choice if you want to reduce node-level operational burden long-term

Pre-Migration Checklist

Before touching anything in production:

# Identify AL2-based node groups
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.amiType'

# Scan for deprecated APIs — install kubent first
kubent

# Check current cluster version (the --short flag was removed in kubectl 1.28)
kubectl version
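
Checking node groups one at a time gets tedious on clusters with many of them. A rough sketch of a loop over all managed node groups follows; it assumes the AWS CLI, ignores pagination, and uses hypothetical function names. Node groups with amiType CUSTOM are not flagged, so those still need a manual look.

```shell
# Succeed when an EKS amiType string denotes an Amazon Linux 2 AMI.
# AL2 types look like AL2_x86_64, AL2_x86_64_GPU or AL2_ARM_64;
# AL2023 types start with AL2023_ and do not match the pattern.
is_al2_ami_type() {
  case "$1" in
    AL2_*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Print "<nodegroup> <amiType>" for every managed node group still on AL2.
list_al2_nodegroups() {
  local cluster="$1" ng ami_type
  for ng in $(aws eks list-nodegroups --cluster-name "$cluster" \
      --query 'nodegroups[]' --output text); do
    ami_type=$(aws eks describe-nodegroup --cluster-name "$cluster" \
      --nodegroup-name "$ng" --query 'nodegroup.amiType' --output text)
    if is_al2_ami_type "$ami_type"; then
      echo "$ng $ami_type"
    fi
  done
}
```

Running `list_al2_nodegroups my-cluster` should print nothing once the migration is finished.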

Also:

  • Back up your Kubernetes manifests and cluster configurations (EKS manages etcd for you, so there is no etcd snapshot to take yourself)
  • Document all custom user-data scripts — these need to be rewritten for nodeadm
  • Audit any Terraform launch template resources that pass bootstrap.sh arguments
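
When rewriting user data for a launch template, keep in mind that AL2023 expects the NodeConfig inside a MIME multipart document, in a part with content type application/node.eks.aws. A minimal sketch of a wrapper is shown here; the function name and the default boundary string are arbitrary choices, not anything mandated by EKS.

```shell
# Wrap a NodeConfig document (read from stdin) in the MIME multipart
# envelope used for AL2023 user data. The boundary string is arbitrary.
wrap_node_config_user_data() {
  local boundary="${1:-BOUNDARY}"
  printf 'MIME-Version: 1.0\n'
  printf 'Content-Type: multipart/mixed; boundary="%s"\n\n' "$boundary"
  printf '%s\n' "--$boundary"
  printf 'Content-Type: application/node.eks.aws\n\n'
  cat
  printf '\n%s\n' "--$boundary--"
}
```

Usage would look like `wrap_node_config_user_data < nodeConfig.yaml > user-data.txt`. EC2 launch templates take user data base64-encoded, so encode the result before passing it to the API or to Terraform.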

Upgrading the Cluster

1. Upgrade the control plane first

aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.32

Wait for the update to complete before touching node groups:

aws eks wait cluster-active --name my-cluster

2. Update managed add-ons

Check compatibility before upgrading node groups. CoreDNS in particular has version constraints tied to the Kubernetes version:

aws eks describe-addon-versions \
  --kubernetes-version 1.32 \
  --addon-name coredns \
  --query 'addons[].addonVersions[].addonVersion'
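
Once you have picked a version from that list, the add-on can be moved to it explicitly. The following is a sketch under a few assumptions: the version is supplied by the caller (the one in the usage line is a placeholder, not a recommendation), and PRESERVE is chosen as the conflict strategy so existing custom settings are kept.

```shell
# Update a managed add-on to an explicitly chosen version, then wait
# until EKS reports the add-on as active again.
update_addon_pinned() {
  local cluster="$1" addon="$2" version="$3"
  aws eks update-addon --cluster-name "$cluster" --addon-name "$addon" \
    --addon-version "$version" --resolve-conflicts PRESERVE &&
  aws eks wait addon-active --cluster-name "$cluster" --addon-name "$addon"
}

# Example (placeholder version string):
#   update_addon_pinned my-cluster coredns v0.0.0-eksbuild.0
```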

3. Create a new AL2023 node group

aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name workers-al2023 \
  --ami-type AL2023_x86_64_STANDARD \
  --instance-types m5.large \
  --scaling-config minSize=2,maxSize=10,desiredSize=3 \
  --disk-size 50 \
  --subnets subnet-xxxx subnet-yyyy \
  --node-role arn:aws:iam::123456789012:role/eks-node-role

4. Drain and delete the old AL2 node group

# Cordon all old nodes
kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-al2 \
  -o name | xargs kubectl cordon

# Drain workloads off old nodes
kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-al2 \
  -o name | xargs kubectl drain --ignore-daemonsets --delete-emptydir-data

# Delete the old node group once new nodes are Ready
aws eks delete-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name workers-al2
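
The "once new nodes are Ready" condition can be made explicit rather than checked by eye. This sketch assumes the old nodes have already been drained, uses a 10-minute wait that you may need to adjust, and the function name is hypothetical.

```shell
# Block until every node in the new group reports Ready, then delete the
# old group and wait for the deletion to complete. Returns non-zero without
# deleting anything if the new nodes are not Ready within the timeout.
delete_old_group_when_new_ready() {
  local cluster="$1" old_group="$2" new_group="$3"
  kubectl wait --for=condition=Ready nodes \
    -l "eks.amazonaws.com/nodegroup=$new_group" --timeout=10m || return 1
  aws eks delete-nodegroup --cluster-name "$cluster" \
    --nodegroup-name "$old_group" &&
  aws eks wait nodegroup-deleted --cluster-name "$cluster" \
    --nodegroup-name "$old_group"
}

# Example:
#   delete_old_group_when_new_ready my-cluster workers-al2 workers-al2023
```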

Troubleshooting

If a node fails to join the cluster, these are your first commands:

# Validate the nodeadm config locally
nodeadm config check -c file://nodeConfig.yaml

# Run the nodeadm debug tool on the node (if your nodeadm build includes the debug subcommand)
nodeadm debug -c file://nodeConfig.yaml

# Check kubelet status
systemctl status kubelet
journalctl -u kubelet -o cat

# Verify node registration from the control plane
kubectl get nodes -o wide

Common failure modes:

  • Missing IAM permissions — the node role must have eks:DescribeCluster. This replaces the implicit call that bootstrap.sh made.
  • Network connectivity — the node must reach the API server endpoint on port 443. Check security groups and NACLs.
  • Timeout on large node groups — pass --timeout 20m0s to give nodes more time to initialise.
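
For the network connectivity case, a quick TCP probe from the node can rule out security group and NACL problems before you dig into kubelet logs. This is a sketch only: it assumes bash (for the /dev/tcp pseudo-device) and the coreutils timeout command are present on the node, and both function names are made up for illustration.

```shell
# Extract the host name from an https:// endpoint URL.
endpoint_host() {
  local host="${1#https://}"
  printf '%s\n' "${host%%/*}"
}

# Try to open a TCP connection to port 443 on the API server endpoint,
# giving up after 5 seconds. Exit status 0 means the port is reachable.
can_reach_api_server() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/443"' "$(endpoint_host "$1")"
}

# Example, using the apiServerEndpoint value from your NodeConfig:
#   can_reach_api_server "$API_SERVER_ENDPOINT" && echo reachable
```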

Post-Upgrade Validation

# All nodes should show Ready
kubectl get nodes

# Verify daemonset rollout matches node count
kubectl get daemonsets -A

# Test DNS resolution from inside a pod
kubectl run dns-test --image=busybox --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local
kubectl logs dns-test
kubectl delete pod dns-test
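
Scanning the daemonset table by eye is error-prone on busy clusters. A small filter can list only the DaemonSets whose READY count differs from DESIRED. The sketch assumes the default column order of `kubectl get daemonsets -A --no-headers` (NAMESPACE, NAME, DESIRED, CURRENT, READY, ...); the function name is illustrative.

```shell
# Read `kubectl get daemonsets -A --no-headers` output on stdin and print
# every DaemonSet whose READY count (column 5) differs from DESIRED (column 3).
unready_daemonsets() {
  awk '$3 != $5 { print $1 "/" $2 " desired=" $3 " ready=" $5 }'
}

# Example:
#   kubectl get daemonsets -A --no-headers | unready_daemonsets
```

Empty output means every DaemonSet has as many ready pods as it wants.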

What I'd Do Differently

The migration requires significant effort rewriting automation and Terraform modules. Here's what I learned:

  • Start with non-production clusters. The nodeadm YAML format is straightforward but the devil is in the IAM permissions and network rules.
  • Rewrite launch templates early. Any user_data referencing bootstrap.sh arguments needs a full rewrite — there is no compatibility shim.
  • Pin add-on versions explicitly. Don't rely on LATEST during an upgrade. Pin to a tested version and upgrade add-ons separately from node groups.
  • Use managed node groups where possible. Self-managed groups with custom user-data multiply your migration surface area.

The long-term payoff is real: better security posture with IMDSv2 enforcement, reduced API throttling at scale, and avoiding the extended support cost penalty.

Originally published on DEV Community — October 28, 2025