How to Upgrade EKS 1.32: Making the Switch from bootstrap.sh to nodeadm
EKS 1.32 introduces the most significant architectural change in recent history: the legacy bootstrap.sh script is gone, replaced by nodeadm — a declarative, YAML-based node initialisation tool. If you're running AL2-based node groups, you need to act before November 26, 2025, when AWS ends support for EKS Amazon Linux 2 AMIs.
Staying on a Kubernetes version that has slipped into EKS extended support costs six times more: $0.60/hour instead of the standard $0.10/hour. That extra $0.50/hour adds roughly $365/month per outdated cluster.
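To see how the hourly rates translate into a monthly figure, here is a quick back-of-the-envelope calculation. It is only a sketch: the 730 hours per month is my assumption (an average month), and the variable names are made up for illustration.

```shell
# Extended-support surcharge per cluster, computed in cents to avoid
# floating-point rounding. 730 hours/month is an assumed average.
standard_cents=10    # $0.10/hour, standard support
extended_cents=60    # $0.60/hour, extended support
hours_per_month=730

extra_cents=$(( (extended_cents - standard_cents) * hours_per_month ))
total_cents=$(( extended_cents * hours_per_month ))

printf 'extra per month: $%d.%02d\n' $(( extra_cents / 100 )) $(( extra_cents % 100 ))
printf 'total per month: $%d.%02d\n' $(( total_cents / 100 )) $(( total_cents % 100 ))
```

With these inputs the surcharge alone works out to $365.00 a month for each cluster, before any node or data-transfer costs.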
Kubernetes 1.32 is the final version with AL2 AMI support. After this, you must be on AL2023 or Bottlerocket.
Why bootstrap.sh Is Obsolete
The traditional /etc/eks/bootstrap.sh script auto-discovered cluster metadata through the EKS DescribeCluster API. It was a bash script that did a lot of implicit heavy lifting.
In AL2023, that script no longer exists.
With nodeadm, you must explicitly provide three parameters via YAML configuration:
- apiServerEndpoint
- certificateAuthority
- cidr (the service CIDR)
This is a breaking change if you have any automation, Terraform modules, or launch templates that reference bootstrap.sh arguments.
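If you need to look those values up yourself (for example, for a self-managed node group), one way is to query them once from a workstation or pipeline. This is a minimal sketch, assuming the AWS CLI is installed and your credentials may call eks:DescribeCluster; the function name is hypothetical.

```shell
# Print the API server endpoint, CA data and service CIDR for a cluster,
# one value per line. eks_cluster_params is an illustrative name.
eks_cluster_params() {
  cluster_name="$1"
  aws eks describe-cluster \
    --name "$cluster_name" \
    --query 'cluster.[endpoint, certificateAuthority.data, kubernetesNetworkConfig.serviceIpv4Cidr]' \
    --output text | tr '\t' '\n'
}

# Usage: eks_cluster_params my-cluster
```

The three lines map, in order, to the three fields listed above.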
nodeadm Fundamentals
Unlike bash-based bootstrap arguments (--kubelet-extra-args), nodeadm uses a declarative YAML spec — NodeConfigSpec — for node customisation.
A minimal nodeadm config looks like this:
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://XXXXXXX.gr7.eu-west-1.eks.amazonaws.com
    certificateAuthority: LS0tLS1CRUdJTi...
    cidr: 10.100.0.0/16
  kubelet:
    config:
      maxPods: 110
    flags:
      - --node-labels=role=worker
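This declarative approach also reduces API throttling during large deployments. Because each node receives the endpoint, CA, and CIDR in its config, it no longer has to look them up through DescribeCluster at boot, which was a real problem at scale when dozens of nodes bootstrapped simultaneously and all hit the API at once.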
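On AL2023 the NodeConfig is usually delivered through instance user data as a MIME multipart document with an application/node.eks.aws part. The helper below is a sketch of how you might render that from four values; the function name and the BOUNDARY string are my own placeholders, and only the cluster section is filled in.

```shell
# Render launch-template user data containing a NodeConfig.
# Arguments: cluster name, API endpoint, base64 CA data, service CIDR.
# render_nodeadm_user_data is a hypothetical helper name.
render_nodeadm_user_data() {
  cat <<EOF
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: $1
    apiServerEndpoint: $2
    certificateAuthority: $3
    cidr: $4

--BOUNDARY--
EOF
}

# Usage: render_nodeadm_user_data my-cluster "$endpoint" "$ca" "$cidr" > user-data.txt
```

Generating the file rather than hand-editing it keeps the YAML indentation consistent, which matters because a mis-indented field can leave the config incomplete or invalid.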
OS Options Post-Migration
You have two paths after dropping AL2:
Amazon Linux 2023 (AL2023)
- Familiar environment for teams coming from AL2
- Secure-by-default policies with SELinux in permissive mode
- IMDSv2-only enforcement
- DNF package manager (replaces yum)
Bottlerocket
- Purpose-built, container-optimised OS
- Minimal attack surface — no package manager, read-only root filesystem
- Automatic updates via the update operator
- Best choice if you want to reduce node-level operational burden long-term
Pre-Migration Checklist
Before touching anything in production:
# Identify AL2-based node groups
aws eks describe-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodegroup \
--query 'nodegroup.amiType'
# Scan for deprecated APIs — install kubent first
kubent
# Check current cluster version
kubectl version
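The describe-nodegroup call above checks one node group at a time. To sweep a whole cluster, something like the following could work; it is a sketch with hypothetical function names, and it assumes AL2 AMI types all start with the prefix AL2_ (for example AL2_x86_64), which keeps AL2023_ types out of the match.

```shell
# Print "<nodegroup><TAB><amiType>" for every managed node group.
list_nodegroup_ami_types() {
  cluster_name="$1"
  for ng in $(aws eks list-nodegroups --cluster-name "$cluster_name" \
                --query 'nodegroups[]' --output text); do
    ami_type=$(aws eks describe-nodegroup --cluster-name "$cluster_name" \
                 --nodegroup-name "$ng" --query 'nodegroup.amiType' --output text)
    printf '%s\t%s\n' "$ng" "$ami_type"
  done
}

# Keep only node groups still on an AL2 AMI type.
list_al2_nodegroups() {
  list_nodegroup_ami_types "$1" | awk '$2 ~ /^AL2_/ { print $1 }'
}

# Usage: list_al2_nodegroups my-cluster
```

Self-managed node groups do not show up in list-nodegroups, so those still need a separate check of their launch templates.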
Also:
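- Back up cluster resources and configurations. EKS manages etcd for you and does not expose it for snapshots, so export manifests or use a backup tool such as Velero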
- Document all custom user-data scripts — these need to be rewritten for nodeadm
- Audit any Terraform launch template resources that pass bootstrap.sh arguments
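For the audit itself, a plain recursive grep over your infrastructure repository is often enough to get a first inventory. The sketch below assumes the references are in plain text files; the function name is hypothetical.

```shell
# List every line that mentions bootstrap.sh or its --kubelet-extra-args
# flag under a directory (defaults to the current directory).
find_bootstrap_references() {
  search_dir="${1:-.}"
  grep -rnE 'bootstrap\.sh|--kubelet-extra-args' "$search_dir"
}

# Usage: find_bootstrap_references ./terraform
```

Each hit is a piece of user data or a template that will need a NodeConfig equivalent before the node group moves to AL2023.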
Upgrading the Cluster
1. Upgrade the control plane first
aws eks update-cluster-version \
--name my-cluster \
--kubernetes-version 1.32
Wait for the update to complete before touching node groups:
aws eks wait cluster-active --name my-cluster
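The waiter only tells you the cluster is ACTIVE again, not which version it ended up on. A small check like this one can confirm the control plane really reports the target version before you move on. It is a sketch; the function name is my own.

```shell
# Succeeds only if the control plane reports the expected Kubernetes version.
control_plane_at_version() {
  cluster_name="$1"
  want="$2"
  got=$(aws eks describe-cluster --name "$cluster_name" \
          --query 'cluster.version' --output text)
  [ "$got" = "$want" ]
}

# Usage: control_plane_at_version my-cluster 1.32 && echo "control plane upgraded"
```

Keep in mind that EKS upgrades one minor version at a time, so a cluster on 1.30 has to pass through 1.31 before it can reach 1.32.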
2. Update managed add-ons
Check compatibility before upgrading node groups. CoreDNS in particular has version constraints tied to the Kubernetes version:
aws eks describe-addon-versions \
--kubernetes-version 1.32 \
--addon-name coredns \
--query 'addons[].addonVersions[].addonVersion'
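Once you have picked a version from that list, the add-on can be moved to it explicitly rather than left to float. A minimal sketch, assuming the add-on is already installed as an EKS managed add-on; the function name is hypothetical and the version you pass is whatever you tested.

```shell
# Update a managed add-on to an explicit version, then wait until it is active.
# PRESERVE keeps any settings you changed on the add-on's resources.
update_addon_pinned() {
  aws eks update-addon \
    --cluster-name "$1" \
    --addon-name "$2" \
    --addon-version "$3" \
    --resolve-conflicts PRESERVE &&
  aws eks wait addon-active --cluster-name "$1" --addon-name "$2"
}

# Usage: update_addon_pinned my-cluster coredns <version-from-the-list-above>
```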
3. Create a new AL2023 node group
aws eks create-nodegroup \
--cluster-name my-cluster \
--nodegroup-name workers-al2023 \
--ami-type AL2023_x86_64_STANDARD \
--instance-types m5.large \
--scaling-config minSize=2,maxSize=10,desiredSize=3 \
--disk-size 50 \
--subnets subnet-xxxx subnet-yyyy \
--node-role arn:aws:iam::123456789012:role/eks-node-role
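create-nodegroup returns as soon as the request is accepted, so it helps to block until the group is active and its nodes have registered. The following is a sketch with a hypothetical function name; the 10 minute timeout is an arbitrary choice you may need to raise.

```shell
# Wait for a managed node group to become ACTIVE, then for its nodes
# to report the Ready condition.
wait_for_nodegroup() {
  aws eks wait nodegroup-active --cluster-name "$1" --nodegroup-name "$2" &&
  kubectl wait --for=condition=Ready nodes \
    -l "eks.amazonaws.com/nodegroup=$2" --timeout=10m
}

# Usage: wait_for_nodegroup my-cluster workers-al2023
```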
4. Drain and delete the old AL2 node group
# Cordon all old nodes
kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-al2 \
-o name | xargs kubectl cordon
# Drain workloads off old nodes
kubectl get nodes -l eks.amazonaws.com/nodegroup=workers-al2 \
-o name | xargs kubectl drain --ignore-daemonsets --delete-emptydir-data
# Delete the old node group once new nodes are Ready
aws eks delete-nodegroup \
--cluster-name my-cluster \
--nodegroup-name workers-al2
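If handing every node to a single drain call feels too aggressive for your workloads, a gentler variant drains one node at a time and stops at the first failure. This is a sketch only: the function name is hypothetical and the 5 minute per-node timeout is an assumption to tune against your PodDisruptionBudgets.

```shell
# Drain the nodes of one node group sequentially; abort on the first error
# so a stuck eviction does not cascade to the remaining nodes.
drain_nodegroup() {
  for node in $(kubectl get nodes -l "eks.amazonaws.com/nodegroup=$1" -o name); do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data \
      --timeout=5m || return 1
  done
}

# Usage: drain_nodegroup workers-al2
```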
Troubleshooting
If a node fails to join the cluster, these are your first commands:
# Validate the nodeadm config locally
nodeadm config check -c file://nodeConfig.yaml
# Run the nodeadm debug tool on the node
nodeadm debug -c file://nodeConfig.yaml
# Check kubelet status
systemctl status kubelet
journalctl -u kubelet -o cat
# Verify node registration from the control plane
kubectl get nodes -o wide
Common failure modes:
- Missing IAM permissions: the node role must have eks:DescribeCluster. This replaces the implicit call that bootstrap.sh made.
- Network connectivity: the node must reach the API server endpoint on port 443. Check security groups and NACLs.
- Timeout on large node groups: pass --timeout 20m0s to give nodes more time to initialise.
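For the network connectivity case, a quick probe from the node can separate "cannot reach the endpoint" from "reached it but was rejected". The sketch below assumes curl is available on the node; the function name is hypothetical, and TLS verification is skipped because only reachability is being tested.

```shell
# Succeeds if any HTTP response comes back from the API server within
# 5 seconds. A 401 or 403 still counts: it proves the network path works.
api_reachable() {
  curl -sk --max-time 5 -o /dev/null "$1/version"
}

# Usage: api_reachable https://XXXXXXX.gr7.eu-west-1.eks.amazonaws.com || echo "blocked"
```

A failure here points at security groups, NACLs or routing rather than at the nodeadm config.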
Post-Upgrade Validation
# All nodes should show Ready
kubectl get nodes
# Verify daemonset rollout matches node count
kubectl get daemonsets -A
# Test DNS resolution from inside a pod
kubectl run dns-test --image=busybox --restart=Never -- \
nslookup kubernetes.default.svc.cluster.local
kubectl logs dns-test
kubectl delete pod dns-test
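The first two checks above rely on reading the output by eye. If you want something scriptable, the columns can be filtered instead. This is a sketch with hypothetical function names; it relies on the default column order of kubectl get, and it deliberately reports cordoned nodes (status Ready,SchedulingDisabled) as well.

```shell
# Names of nodes whose STATUS column is anything other than plain "Ready".
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}

# DaemonSets whose READY count differs from DESIRED, as namespace/name.
lagging_daemonsets() {
  kubectl get daemonsets -A --no-headers | awk '$3 != $5 { print $1 "/" $2 }'
}

# Usage: [ -z "$(not_ready_nodes)" ] && [ -z "$(lagging_daemonsets)" ] && echo "healthy"
```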
What I'd Do Differently
The migration requires significant effort rewriting automation and Terraform modules. Here's what I learned:
- Start with non-production clusters. The nodeadm YAML format is straightforward but the devil is in the IAM permissions and network rules.
- Rewrite launch templates early. Any user_data referencing bootstrap.sh arguments needs a full rewrite; there is no compatibility shim.
- Pin add-on versions explicitly. Don't rely on LATEST during an upgrade. Pin to a tested version and upgrade add-ons separately from node groups.
- Use managed node groups where possible. Self-managed groups with custom user-data multiply your migration surface area.
The long-term payoff is real: better security posture with IMDSv2 enforcement, reduced API throttling at scale, and avoiding the extended support cost penalty.
Originally published on DEV Community — October 28, 2025