Top 100 Kubernetes Interview Questions and Answers

2025-06-08

This post covers 100 interview questions and answers for Kubernetes, ideal for DevOps engineers.

🚨 Critical & Frequently Asked Questions

1. What is the difference between the control plane and worker nodes?

Control Plane: Runs the Kubernetes API server, controller manager, scheduler, and etcd (the cluster’s key-value store).
Worker Nodes: Run user workloads (pods), managed by components like kubelet and kube-proxy.

2. What is RBAC in Kubernetes? Role-Based Access Control (RBAC) regulates access to cluster resources by defining what actions (verbs) users or service accounts can perform on which resources.
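
As a rough illustration, an RBAC rule can be read as a set of API groups, resources, and verbs. The sketch below models that check in plain Python. Rule and is_allowed are hypothetical names used only for illustration, not part of any Kubernetes client, and real RBAC also covers resourceNames, non-resource URLs, and role aggregation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified stand-in for one entry in a Role's `rules` list."""
    api_groups: tuple
    resources: tuple
    verbs: tuple


def _matches(allowed, requested):
    # "*" is the RBAC wildcard for groups, resources, and verbs.
    return "*" in allowed or requested in allowed


def is_allowed(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


# A read-only role for pods in the core ("") API group.
pod_reader = [Rule(api_groups=("",), resources=("pods",), verbs=("get", "list", "watch"))]

print(is_allowed(pod_reader, "", "pods", "list"))    # True
print(is_allowed(pod_reader, "", "pods", "delete"))  # False
```

RBAC is purely additive: there are no deny rules, so a request is allowed as soon as one rule matches.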

3. How do you stop communication between pods of different microservices? Use NetworkPolicies to define allowed ingress and egress traffic using labels, namespaces, and IP blocks.
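
A minimal sketch of such a policy, built as a Python dict and printed as JSON (which kubectl accepts alongside YAML). The helper name allow_only_from, the policy name, the namespace, and the labels are all made up for illustration.

```python
import json


def allow_only_from(namespace, target_labels, client_labels, port):
    """Build a NetworkPolicy that admits ingress to `target_labels` pods
    only from `client_labels` pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-selected-clients", "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Only pods carrying these labels may connect.
                "from": [{"podSelector": {"matchLabels": client_labels}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


policy = allow_only_from("shop", {"app": "orders"}, {"app": "frontend"}, 8080)
print(json.dumps(policy, indent=2))
```

Once a pod is selected by any policy with the Ingress type, all ingress not explicitly allowed is dropped. Enforcement depends on a network plugin that supports NetworkPolicy.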

4. What happens if the control plane fails? Running pods continue working, but new pod scheduling, scaling, and management won’t happen until control plane components are restored.

5. What is the difference between a Deployment, StatefulSet, and DaemonSet?

Deployment: Manages stateless pods and rolling updates.
StatefulSet: For stateful apps with persistent volumes and stable network IDs.
DaemonSet: Ensures a pod runs on all (or selected) nodes.

6. What’s the difference between a Liveness and a Readiness probe?

Liveness Probe: Checks if the app is alive. Failed checks trigger a restart.
Readiness Probe: Checks if the app is ready to serve traffic. Failing this removes the pod from the service endpoints.
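
To make the two probes concrete, here is a sketch of a container spec with both, built as a Python dict. The container name, image, the /healthz and /ready paths, and the timing values are assumptions chosen for illustration; the probe field names are the standard ones.

```python
import json


def container_with_probes(name, image, port):
    """Return a container spec with separate liveness and readiness probes."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Liveness: repeated failures make the kubelet restart the container.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": port},
            "initialDelaySeconds": 10,
            "periodSeconds": 10,
            "failureThreshold": 3,
        },
        # Readiness: failures only remove the pod from Service endpoints.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": port},
            "periodSeconds": 5,
            "failureThreshold": 3,
        },
    }


container = container_with_probes("web", "example.com/web:1.0", 8080)
print(json.dumps(container, indent=2))
```

Keeping the two endpoints separate lets an app report "alive but not ready" during warm-up without being restarted.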

7. How do you move running pods from one node to another (e.g., during upgrade)?

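Use kubectl drain <node>, which cordons the node and evicts its pods, allowing the scheduler to place them on other healthy nodes. Run kubectl uncordon <node> once the upgrade is done.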

8. What is a Service and how does it route traffic to pods? A Service in Kubernetes is an abstraction that exposes a set of pods as a network service. It uses label selectors to forward traffic to healthy pods.
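
The selection logic can be sketched in a few lines of Python. This is a simplified model, not the actual endpoints controller: the pod records and the select_endpoints helper are hypothetical.

```python
def select_endpoints(selector, pods):
    """Return IPs of ready pods whose labels contain every selector pair."""
    return [
        p["ip"]
        for p in pods
        if p["ready"] and all(p["labels"].get(k) == v for k, v in selector.items())
    ]


pods = [
    {"ip": "10.0.0.4", "ready": True, "labels": {"app": "web", "tier": "frontend"}},
    {"ip": "10.0.0.5", "ready": False, "labels": {"app": "web", "tier": "frontend"}},
    {"ip": "10.0.0.6", "ready": True, "labels": {"app": "db"}},
]

print(select_endpoints({"app": "web"}, pods))  # ['10.0.0.4']
```

The second pod matches the selector but is skipped because it is not ready, which is how a failing readiness probe takes a pod out of rotation.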

9. What is a headless service and when is it used? A headless service (with clusterIP: None) enables direct access to individual pods, commonly used in StatefulSets or when clients handle load balancing.

10. How does Ingress differ from a LoadBalancer Service?

Ingress: A Kubernetes object that defines external HTTP/S routing rules (by host and path) to services. It needs an Ingress controller running in the cluster to take effect.
LoadBalancer: Exposes the service externally using a cloud provider’s load balancer.

11. How does Horizontal Pod Autoscaler (HPA) work? It automatically scales pod replicas based on observed metrics such as CPU, memory, or custom metrics. CPU and memory metrics require the metrics server to be deployed; custom metrics need a separate metrics adapter.
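
The core of the scaling decision is a ratio: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that formula with a 10% tolerance band and min/max clamping. The function name and the default bounds are assumptions, and the real controller also handles stabilization windows, missing metrics, and not-yet-ready pods.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the HPA ratio formula, then clamp to the configured bounds."""
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 90, 60))   # 6: usage is 1.5x the target
print(desired_replicas(10, 52, 50))  # 10: within tolerance, no change
print(desired_replicas(4, 200, 50))  # 10: 16 wanted, capped at max_replicas
```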

12. What is kube-proxy? kube-proxy manages networking rules on nodes (via iptables/ipvs) to route traffic to appropriate pods behind a service.

13. What are the common ways to upgrade a Kubernetes cluster?

For managed clusters: Use cloud CLI (e.g., az aks upgrade, eksctl, gcloud).
For self-managed: Use kubeadm upgrade and drain/cordon nodes manually.

14. What are taints and tolerations? Taints repel pods from certain nodes unless they have matching tolerations. Useful for reserving nodes for specific workloads.
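
A simplified model of the matching rules, assuming taints and tolerations are plain dicts with the same keys they have in a manifest. The helper names tolerates and can_schedule are hypothetical, and soft PreferNoSchedule taints are ignored here.

```python
def tolerates(toleration, taint):
    """True if one toleration matches one taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if operator == "Exists":
        # Exists with no key tolerates every taint.
        return key is None or key == taint["key"]
    return key == taint["key"] and toleration.get("value") == taint.get("value")


def can_schedule(pod_tolerations, node_taints):
    """A pod fits only if every hard taint on the node is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in pod_tolerations) for t in hard)


gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
gpu_toleration = {"key": "dedicated", "operator": "Equal",
                  "value": "gpu", "effect": "NoSchedule"}

print(can_schedule([], [gpu_taint]))                # False
print(can_schedule([gpu_toleration], [gpu_taint]))  # True
```

A toleration only permits scheduling onto the tainted node. To also attract the pod there, combine it with a node selector or node affinity.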

15. How do init containers and sidecar containers differ?

Init: Runs before the main container, used for setup/preparation.
Sidecar: Runs alongside the main container, often used for logging, syncing, or proxying.

16. What is a PersistentVolumeClaim (PVC)? A PVC is a user's request for storage. It binds to a PersistentVolume (PV), which may be pre-created by an admin or dynamically provisioned via a StorageClass.

17. How does Prometheus discover targets in Kubernetes? It uses Kubernetes service discovery APIs to find pods and services to scrape metrics from, based on labels and annotations.

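18. What is a StatefulSet and why do pods have stable hostnames? A StatefulSet gives each pod a stable identity (an ordinal name such as <name>-0, <name>-1 that survives rescheduling) and its own persistent storage. Peers can keep addressing the same hostname after a restart, which is critical for databases and distributed apps.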
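
Assuming the StatefulSet is paired with a headless Service, each pod gets a DNS name of the form <pod>.<service>.<namespace>.svc.<cluster-domain>. The sketch below generates those names. The function name and the example values are illustrative, and cluster.local is only the common default domain.

```python
def statefulset_dns_names(name, service, namespace, replicas,
                          cluster_domain="cluster.local"):
    """Return the per-pod DNS names for a StatefulSet behind a headless Service."""
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


for host in statefulset_dns_names("db", "db-headless", "prod", 3):
    print(host)
# db-0.db-headless.prod.svc.cluster.local
# db-1.db-headless.prod.svc.cluster.local
# db-2.db-headless.prod.svc.cluster.local
```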

19. How do you expose a Kubernetes service to the public internet? Use a type: LoadBalancer service or expose an Ingress through a cloud provider or NGINX Ingress controller.

20. What is a ClusterRoleBinding? A ClusterRoleBinding grants the permissions defined in a ClusterRole to a user, group, or service account across the entire cluster.

🛠️ Operational, Troubleshooting & Edge Questions (Grouped by Category)

🔒 Security & Access Control

21. What is the purpose of NetworkPolicies? They define which pods can communicate with which other pods, based on labels, namespaces, or IP blocks. Used to isolate workloads.

22. What is the difference between Kubernetes RBAC and GCP IAM? Kubernetes RBAC controls in-cluster permissions; GCP IAM manages access to Google Cloud services. GKE uses both.

23. How can you physically restrict app communication in Kubernetes? Use NetworkPolicies, PodSecurityAdmission, node affinity/taints, and firewall rules.

24. What are the security best practices for Kubernetes clusters?

Enable RBAC
Use PodSecurity standards
Scan images
Encrypt secrets
Apply least-privilege policies

🌐 Networking

25. How do Services connect to pods? Using selectors and kube-proxy, the Service forwards traffic to pods matching the label.

26. How can pods in different namespaces communicate? Use the full DNS: service-name.namespace.svc.cluster.local
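
A small sketch of the names a client pod can use, assuming the default pod DNS search path and the common cluster.local domain. The helper name and example values are hypothetical.

```python
def service_dns_names(service, namespace, cluster_domain="cluster.local"):
    """Return the names that resolve to a Service from another namespace,
    shortest first. The short forms rely on the pod's DNS search path."""
    return [
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


print(service_dns_names("payments", "billing"))
# ['payments.billing', 'payments.billing.svc', 'payments.billing.svc.cluster.local']
```

The bare service name only resolves from inside the same namespace, so cross-namespace callers need at least the service.namespace form.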

27. What is kube-dns or CoreDNS? Internal DNS service that resolves pod and service names into IP addresses within the cluster.

28. What are the types of Kubernetes Services?

ClusterIP (default)
NodePort
LoadBalancer
ExternalName

⚙️ Workloads & Scheduling

29. How do you ensure pods run on different nodes? Use podAntiAffinity or topologySpreadConstraints to spread pods across zones or nodes.
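
For topologySpreadConstraints, the key number is the skew: the count of matching pods in a domain minus the smallest count in any domain. A sketch of the check, with a hypothetical helper name and made-up zone counts:

```python
def violates_max_skew(pods_per_domain, candidate, max_skew):
    """Would placing one more pod in `candidate` exceed `max_skew`?"""
    counts = dict(pods_per_domain)
    counts[candidate] += 1
    # Skew = pods in the candidate domain minus the global minimum.
    return counts[candidate] - min(counts.values()) > max_skew


zones = {"zone-a": 2, "zone-b": 1, "zone-c": 1}

print(violates_max_skew(zones, "zone-a", 1))  # True: 3 - 1 = 2 exceeds 1
print(violates_max_skew(zones, "zone-b", 1))  # False: 2 - 1 = 1 is allowed
```

With whenUnsatisfiable: DoNotSchedule a violating placement is rejected; with ScheduleAnyway it is only scored lower.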

30. What is node affinity vs node selector?

Node selector: Simple label match.
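Node affinity: More expressive constraints with operators (In, NotIn, Exists) and both hard (requiredDuringSchedulingIgnoredDuringExecution) and soft (preferredDuringSchedulingIgnoredDuringExecution) rules.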

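31. What happens when a node crashes? The control plane marks the node NotReady once it stops reporting. After the eviction timeout (5 minutes by default), its pods are evicted, and controllers such as Deployments recreate them on other healthy nodes if resources are available.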

32. How do taints and tolerations affect scheduling? Taints prevent pods from being scheduled unless they have matching tolerations.

33. Can you schedule pods without a scheduler? Yes, by setting the nodeName field directly in the pod spec (manual placement).

📦 Storage & Volumes

34. What’s the difference between static and dynamic provisioning?

Static: Admin pre-creates PersistentVolumes (PVs).
Dynamic: Kubernetes creates PVs on demand using a StorageClass.

35. How are PVCs handled in StatefulSets vs Deployments?

StatefulSet: Each pod gets its own PVC.
Deployment: All replicas reference the same PVC (which needs an access mode that allows sharing) or use ephemeral volumes.
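
The per-pod claims of a StatefulSet follow a predictable naming scheme: <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>. A sketch with a hypothetical helper and example names:

```python
def statefulset_pvc_names(claim_template, statefulset, replicas):
    """Return the PVC names a StatefulSet creates from one volumeClaimTemplate."""
    return [f"{claim_template}-{statefulset}-{ordinal}" for ordinal in range(replicas)]


print(statefulset_pvc_names("data", "db", 3))
# ['data-db-0', 'data-db-1', 'data-db-2']
```

Because the name is tied to the ordinal, a rescheduled db-1 pod reattaches to data-db-1 rather than getting a fresh volume.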

📈 Monitoring & Observability

36. How does Prometheus collect metrics in Kubernetes? Via service discovery, it scrapes /metrics endpoints from pods or services using label selectors.

37. What is a ServiceMonitor? A CRD from Prometheus Operator that defines how Prometheus should scrape a specific Kubernetes service.

38. What is the Prometheus Pushgateway? It lets short-lived jobs push their metrics to an intermediary gateway, which Prometheus then scrapes like any other target.

39. What is federated Prometheus? A setup where multiple Prometheus instances scrape subsets of data, and one top-level instance aggregates them.

🔄 CI/CD & GitOps

40. How does ArgoCD connect to a cluster? Use argocd cluster add <context> to register a cluster so ArgoCD can deploy apps to it.

41. What is the App of Apps pattern in ArgoCD? A pattern where one ArgoCD app manages multiple child applications, enabling modular and hierarchical deployment.

42. How do you migrate workloads from one cluster to another?

Reapply manifests
Move PVC data via backup or snapshot
Update DNS or traffic routing

43. What is the full CI/CD flow for deploying to AKS from Azure DevOps? Code → Pipeline → Container Build → Push to ACR → Deploy using Helm or manifests to AKS

☁️ Cloud Provider Specific (AKS, EKS, GKE)

44. What container runtime is used in AKS? AKS uses containerd as the default container runtime.

45. How do you upgrade a Kubernetes cluster in AKS? Use the Azure CLI: az aks upgrade --resource-group <RG> --name <cluster-name> --kubernetes-version <version>. Add --control-plane-only to upgrade the control plane without the node pools.

46. How do you enable node auto-upgrades in AKS? Set an auto-upgrade channel on the cluster: az aks update --resource-group <RG> --name <cluster-name> --auto-upgrade-channel <channel>

47. How do you deploy ArgoCD in AKS? Apply the official ArgoCD manifests or Helm chart. Set up RBAC and expose the UI via a LoadBalancer or Ingress.

48. What are Node Pools or Node Groups in managed Kubernetes? They represent collections of nodes with similar configuration, allowing scaling and upgrade at the group level.

📊 Scaling & Performance

49. What happens if the HPA isn’t sufficient under high load? Increase maxReplicas, or configure Cluster Autoscaler to add more nodes. Optimize resource requests/limits.

50. What if all node pools are full and traffic increases? Scale up existing node pools or add new ones. Ensure Cluster Autoscaler is enabled.

51. What is the difference between HPA and VPA?

HPA: Scales the number of pod replicas based on CPU, memory, or custom metrics.
VPA: Adjusts container resource requests/limits automatically.

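52. What is a 504 Gateway Timeout error in Kubernetes? The proxy in front of the app (usually the Ingress controller or load balancer) gave up waiting for a backend response. Often caused by a backend slower than the Ingress timeout, service misrouting, or unhealthy pods behind failing readiness probes. Troubleshoot with kubectl logs and kubectl describe, and check the Ingress timeout annotations.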

🧠 Advanced Design & Troubleshooting

53. What is the complete pod scheduling lifecycle?

API server receives the Pod spec.
Scheduler selects a node.
Kubelet on the node pulls the image.
Pod is started, and health checks begin.
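
The "scheduler selects a node" step can be pictured as filter-then-score. The toy below keeps only nodes with enough free CPU and memory, then prefers the one with the most CPU left. It is a deliberately simplified sketch: the real scheduler runs many filter and score plugins, and the node records and the pick_node helper are hypothetical.

```python
def pick_node(pod_request, nodes):
    """Return the name of the best-fitting node, or None if nothing fits."""
    # Filter: drop nodes that cannot satisfy the pod's resource requests.
    feasible = [
        n for n in nodes
        if n["free_cpu_m"] >= pod_request["cpu_m"]
        and n["free_mem_mi"] >= pod_request["mem_mi"]
    ]
    if not feasible:
        return None  # the pod would stay Pending
    # Score: prefer the node with the most CPU left after placement.
    best = max(feasible, key=lambda n: n["free_cpu_m"] - pod_request["cpu_m"])
    return best["name"]


nodes = [
    {"name": "node-1", "free_cpu_m": 500, "free_mem_mi": 1024},
    {"name": "node-2", "free_cpu_m": 1500, "free_mem_mi": 2048},
]

print(pick_node({"cpu_m": 250, "mem_mi": 512}, nodes))   # node-2
print(pick_node({"cpu_m": 4000, "mem_mi": 512}, nodes))  # None
```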

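54. What happens when multiple users update cluster state simultaneously? The API server uses optimistic concurrency: an update based on a stale resourceVersion is rejected with a conflict, and otherwise the last write applied wins. Best practices involve GitOps, validation pipelines, or admission controllers.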
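
For updates that carry a resourceVersion, the API server rejects a write based on a stale read with a 409 Conflict, so the client has to re-read and retry. The toy model below shows the idea. Store and Conflict are hypothetical classes, and the integer version is a simplification, since real resourceVersions are opaque strings.

```python
class Conflict(Exception):
    """Stands in for the API server's HTTP 409 response."""


class Store:
    """Toy object store with optimistic concurrency."""

    def __init__(self, spec):
        self.spec = spec
        self.resource_version = 1

    def read(self):
        return dict(self.spec), self.resource_version

    def update(self, new_spec, seen_version):
        # Reject writes that were prepared against an older version.
        if seen_version != self.resource_version:
            raise Conflict("object was modified; re-read and retry")
        self.spec = new_spec
        self.resource_version += 1


store = Store({"replicas": 2})
spec_a, version_a = store.read()  # user A reads
spec_b, version_b = store.read()  # user B reads the same version

spec_a["replicas"] = 3
store.update(spec_a, version_a)   # A's write succeeds

spec_b["replicas"] = 5
try:
    store.update(spec_b, version_b)  # B's write is stale
except Conflict as exc:
    print("rejected:", exc)

print(store.spec)  # {'replicas': 3}
```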

55. What if the scheduler is not running? No new pods are scheduled. Existing pods stay unaffected until rescheduling is needed.

56. How do you connect two containers in one pod? They share localhost and volumes. Communicate over ports or file system paths.

57. How do you restrict DB access to a specific app? Use NetworkPolicies to allow ingress only from pods with a specific label (e.g., app=my-app).

58. How do you connect to a pod without using a Service? Use kubectl port-forward <pod> <localPort>:<containerPort> or access via pod IP inside the cluster.

59. How do you restart a deployment correctly? Use: kubectl rollout restart deployment/<name>
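
Under the hood, the restart works by stamping the pod template with a kubectl.kubernetes.io/restartedAt annotation, which changes the template and so triggers a normal rolling update. A sketch of that patch on a Deployment dict, where the helper name and the example Deployment are assumptions:

```python
import copy
from datetime import datetime, timezone


def with_restart_annotation(deployment, now=None):
    """Return a copy of `deployment` whose pod template carries a fresh
    restartedAt annotation."""
    now = now or datetime.now(timezone.utc)
    patched = copy.deepcopy(deployment)
    annotations = (
        patched["spec"]["template"]
        .setdefault("metadata", {})
        .setdefault("annotations", {})
    )
    annotations["kubectl.kubernetes.io/restartedAt"] = now.isoformat()
    return patched


deployment = {"spec": {"template": {"spec": {"containers": []}}}}
patched = with_restart_annotation(deployment)
print(patched["spec"]["template"]["metadata"]["annotations"])
```

Because it goes through the rolling update path, the restart respects maxUnavailable and readiness checks instead of killing all pods at once.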

60. How do you update a running container image? Build and push a new Docker image. Update the deployment spec to use the new tag. Avoid committing changes to a running container.

95. What’s the complete path from DNS to Pod IP? DNS → External LB → Ingress → Service → Endpoint → Pod IP

96. What is the difference between ClusterIP and Headless Service?

ClusterIP: Default; load-balancing
Headless: No cluster IP; DNS resolves to individual pod IPs
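
The DNS difference can be modeled in a few lines. This is a simplified sketch of what a cluster DNS lookup returns, with a hypothetical resolve_service helper and made-up addresses:

```python
def resolve_service(service, ready_pod_ips):
    """Return the A records a cluster DNS lookup would yield for `service`."""
    if service.get("clusterIP") == "None":
        # Headless: one record per ready pod; the client picks.
        return sorted(ready_pod_ips)
    # Regular ClusterIP: a single virtual IP; kube-proxy balances behind it.
    return [service["clusterIP"]]


pod_ips = ["10.0.1.7", "10.0.1.5"]

print(resolve_service({"clusterIP": "10.96.0.12"}, pod_ips))  # ['10.96.0.12']
print(resolve_service({"clusterIP": "None"}, pod_ips))        # ['10.0.1.5', '10.0.1.7']
```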

97. How do you implement a private GKE endpoint? Use private clusters, Master Authorized Networks, and VPN or internal load balancers.

98. What’s the difference between Node Affinity and Pod Affinity?

Node Affinity: Schedule pods to specific nodes
Pod Affinity: Co-locate pods based on labels

99. What are common metrics to monitor in Kubernetes?

CPU/memory usage
Pod restarts
Node health
Service response time

100. What happens when you run out of cluster resources? New pods stay in Pending. Use Cluster Autoscaler or scale down less critical workloads.