Azure AKS Troubleshooting Hands-On - Pod Failing to Start Due to Insufficient Resources
Azure Learning Path for Cloud and DevOps Engineers
📝Introduction
In this hands-on lab, we will walk through troubleshooting a real scenario in Azure Kubernetes Service (AKS) for a common issue: a Pod failing to start due to insufficient resources.
Learning objectives:
In this module, you'll learn how to:
Identify the issue
Resolve the issue
📝Log in to the Azure Management Console
Log in with your credentials and make sure you're using the right region. In my case, I am using the uksouth region in my Cloud Playground Sandbox.
📌Note: You can also run the Azure CLI from VS Code or from your local terminal.
More information on how to set it up is at the link.
📝Prerequisites:
Update to PowerShell 5.1, if needed.
Install .NET Framework 4.7.2 or later.
Visual Studio Code
Web Browser (Chrome, Edge)
Azure CLI installed
Azure subscription
Docker installed
📝Setting an Azure Storage Account to Load Bash or PowerShell
- Click the Cloud Shell icon (>_) at the top of the page.
- Click Bash (the commands in this lab use Bash syntax; if you pick PowerShell, adapt the variable assignments).
- Click Show Advanced Settings. Use the combo box under Cloud Shell region to select the region. Under Resource Group and Storage account, enter a name for each (the storage account name must be globally unique). In the box under File Share, enter a name. Click Create storage (if you don't have one yet).
📝Create an AKS Cluster
Before creating the cluster, store its name in a variable named CLUSTERNAME and the name of your resource group in a variable named RG. Then create the AKS cluster using the az aks create command:
CLUSTERNAME=<AKSClusterName>
RG=<ResourceGroupName>
az aks create -n $CLUSTERNAME -g $RG --node-vm-size Standard_D2s_v3 --node-count 2 --generate-ssh-keys
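The create command depends on both `$CLUSTERNAME` and `$RG` being set; if either is empty, the Azure CLI fails with a less obvious argument error. A minimal sketch of a guard you could run first is shown below. `require_vars` is a hypothetical helper name, not part of the Azure CLI.

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
# The body runs in a subshell, so its working variables do not leak.
require_vars() (
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing variable: $name" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Usage (fill in the placeholders first):
#   require_vars CLUSTERNAME RG && az aks create -n "$CLUSTERNAME" -g "$RG" \
#     --node-vm-size Standard_D2s_v3 --node-count 2 --generate-ssh-keys
```

Quoting the variables in the `az` call also protects against names that contain spaces.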
📝 Connect to AKS Cluster
Use the Azure Cloud Shell to check your AKS Cluster resources, by following the steps below:
Go to Azure Dashboard, and click on the Resource Group created for this Lab, looking for your AKS Cluster resource.
On the Overview tab, click on Connect to your AKS Cluster.
A new window opens with the connection instructions. Open the Azure CLI and run the following commands:
az login
az account set --subscription <your-subscription-id>
az aks get-credentials -g <nameResourceGroup> -n <nameAKSCluster> --overwrite-existing
After that, you can run some kubectl commands to check the default AKS Cluster resources.
📝Deploy the Application to AKS
Simulate the Issue:
Deploy a Sample Application: Create a deployment YAML file (nginx-deployment.yaml) with resource requests that exceed the available resources on the node:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            memory: "2Gi"
            cpu: "2"
          limits:
            memory: "2Gi"
            cpu: "2"
Apply the Deployment:
kubectl apply -f nginx-deployment.yaml
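The scheduler places a pod only on a node whose allocatable capacity, minus what other pods have already requested, covers the pod's requests. The sketch below does that arithmetic for CPU. `to_millicores` and `cpu_fits` are hypothetical helper names, and the `1900m` figure in the usage comments is an assumption for a 2-vCPU node; read the real value from `kubectl describe node`.

```shell
# Convert a Kubernetes CPU quantity ("2", "1.5", "500m") to millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cpu="$1" 'BEGIN { printf "%d\n", cpu * 1000 + 0.5 }' ;;
  esac
}

# Succeed when a CPU request fits into what is still free on one node.
# Arguments: <request> <node allocatable> <already requested by other pods>
cpu_fits() (
  request=$(to_millicores "$1")
  allocatable=$(to_millicores "$2")
  in_use=$(to_millicores "$3")
  [ "$request" -le $((allocatable - in_use)) ]
)

# Usage (illustrative numbers, not measured from a real cluster):
#   cpu_fits 2 1900m 0    && echo "fits" || echo "does not fit"
#   cpu_fits 1 1900m 500m && echo "fits" || echo "does not fit"
```

With those assumed numbers, a request of 2 CPUs does not fit even on an empty node, which is the situation this lab simulates.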
Identify the Issue:
- Check Pod Status:
kubectl get pods
- Describe the Pod:
kubectl describe pod <pod-name>
Look for events indicating why the pod is not starting. You might see messages like “Insufficient cpu” or “Insufficient memory”.
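The describe output is long, so it can help to pull out only the scheduling reasons. A minimal sketch, assuming the scheduler messages contain the phrase "Insufficient <resource>" as described above; both function names are hypothetical.

```shell
# Print the distinct "Insufficient <resource>" reasons found in text on stdin.
insufficient_reasons() {
  grep -o 'Insufficient [a-z0-9./-]*[a-z0-9]' | sort -u
}

# List only the FailedScheduling events in the current namespace.
failed_scheduling_events() {
  kubectl get events --field-selector reason=FailedScheduling
}

# Usage:
#   kubectl describe pod <pod-name> | insufficient_reasons
#   failed_scheduling_events
```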
Troubleshoot the Issue:
- Check Node Resources:
kubectl top nodes
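Verify the available CPU and memory on the nodes. Note that kubectl top shows current usage, while the scheduler decides based on resource requests against each node's allocatable capacity, so also check the "Allocated resources" section of kubectl describe node <node-name>.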
- Check Resource Quotas (if any):
kubectl get resourcequotas
- Check Cluster Autoscaler: Check whether the cluster autoscaler is enabled for each node pool:
az aks show -g <nameResourceGroup> -n <nameAKSCluster> --query "agentPoolProfiles[].enableAutoScaling"
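To compare what the nodes can offer with what is already reserved, the two kubectl queries below may be useful alongside the checks above. This is a sketch; the function names are hypothetical, and the number of lines kept after the "Allocated resources:" header is an assumption about the layout of the describe output.

```shell
# List each node's allocatable CPU and memory (what the scheduler can hand out).
node_allocatable() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# Show the summary of requests and limits already placed on one node.
node_allocated() {
  kubectl describe node "$1" | grep -A 8 'Allocated resources:'
}

# Usage:
#   node_allocatable
#   node_allocated <node-name>
```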
Resolve the Issue:
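Scale Up the Cluster: If the cluster autoscaler is not enabled, or cannot add enough capacity, you can increase the node count manually. Keep in mind that extra nodes only help when the pod's request fits on a single node; a request larger than one node's allocatable capacity stays Pending however many nodes of that size you add. To scale manually:
az aks scale -g <nameResourceGroup> -n <nameAKSCluster> --node-count <new-node-count>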
Adjust Resource Requests: Modify the deployment YAML file to request fewer resources:
resources:
  requests:
    memory: "1Gi"
    cpu: "1"
  limits:
    memory: "1Gi"
    cpu: "1"
Reapply the Deployment:
kubectl apply -f nginx-deployment.yaml
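If the autoscaler check in the troubleshooting step shows it is disabled, one option is to enable it instead of scaling by hand. A minimal sketch using `az aks update`; the wrapper name `enable_autoscaler` and the node counts in the usage line are illustrative assumptions. The autoscaler adds nodes of the same VM size, so it does not help a pod whose request is larger than a single node can offer.

```shell
# Enable the cluster autoscaler with a minimum and maximum node count.
# Arguments: <resource-group> <cluster-name> <min-count> <max-count>
enable_autoscaler() (
  rg=$1; cluster=$2; min=$3; max=$4
  if [ "$min" -gt "$max" ]; then
    echo "min-count ($min) must not exceed max-count ($max)" >&2
    exit 2
  fi
  az aks update -g "$rg" -n "$cluster" \
    --enable-cluster-autoscaler --min-count "$min" --max-count "$max"
)

# Usage (illustrative counts):
#   enable_autoscaler <nameResourceGroup> <nameAKSCluster> 2 4
```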
Verify the Resolution:
Check Pod Status Again:
kubectl get pods
Describe the Pod:
kubectl describe pod <pod-name>
Ensure there are no error messages and the pod is running.
Check Node Resources:
kubectl top nodes
Verify that the nodes have sufficient resources and the pod is running smoothly.
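Rather than polling `kubectl get pods` by hand, you can let kubectl wait for the rollout to finish. A sketch, assuming the deployment name and the `app=nginx` label from this lab; `verify_rollout` is a hypothetical wrapper and the default timeout is an arbitrary choice.

```shell
# Wait for a deployment to finish rolling out, then list its pods.
# Arguments: <deployment-name> <label-selector> [timeout, default 120s]
verify_rollout() {
  kubectl rollout status "deployment/$1" --timeout="${3:-120s}" &&
    kubectl get pods -l "$2" -o wide
}

# Usage:
#   verify_rollout nginx-deployment app=nginx
```

If the pod is still Pending when the timeout expires, the first command returns a non-zero status and the pod list is skipped.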
📌Note - At the end of each hands-on Lab, always clean up all resources previously created to avoid being charged.
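For this lab, cleaning up could look like the sketch below. It deletes only the AKS cluster; `cleanup_lab` is a hypothetical wrapper name, and whether you should also delete the resource group depends on whether you created it yourself (in a sandbox it is often pre-provisioned).

```shell
# Delete the lab's AKS cluster without prompting or waiting for completion.
# Arguments: <resource-group> <cluster-name>
cleanup_lab() (
  rg=${1:-}; cluster=${2:-}
  if [ -z "$rg" ] || [ -z "$cluster" ]; then
    echo "usage: cleanup_lab <resource-group> <cluster-name>" >&2
    exit 2
  fi
  az aks delete -g "$rg" -n "$cluster" --yes --no-wait
)

# Usage:
#   cleanup_lab <nameResourceGroup> <nameAKSCluster>
# If you created the resource group only for this lab, you could remove it too:
#   az group delete --name <nameResourceGroup> --yes --no-wait
```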
Congratulations, you have completed this hands-on lab covering the basics of troubleshooting an AKS Pod that fails to start due to insufficient resources.
Thank you for reading. I hope you understood and learned something helpful from my blog.
Please follow me on Cloud&DevOpsLearn and LinkedIn, franciscojblsouza