Kubernetes is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications.
# Example: Deploying a simple nginx pod in Kubernetes
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
Developed by Google based on their internal system Borg, Kubernetes was open sourced in 2014 and is now maintained by the CNCF community.
Timeline:
- 2014: Kubernetes open sourced by Google
- 2015: Kubernetes 1.0 released; CNCF formed and Kubernetes donated
- Continuous growth with major cloud provider support
Kubernetes offers declarative configurations, self-healing, and advanced scheduling compared to simpler, script-based orchestration methods.
Traditional:
- Manual scripts
- Limited scaling & recovery
Kubernetes:
- Automated scheduling
- Rolling updates & self-healing
A Pod is the smallest deployable unit, a Node is a worker machine, and a Cluster is a set of nodes managed together.
Definitions:
- Pod: One or more containers sharing network/storage
- Node: Physical or virtual machine
- Cluster: Collection of nodes managed by Kubernetes
Kubernetes follows a master-worker architecture with a control plane managing nodes and workloads running on worker nodes.
Architecture:
- Control Plane (Master): API Server, Scheduler, Controller Manager, etcd
- Worker Nodes: Kubelet, kube-proxy, container runtime
The master manages cluster state; worker nodes run pods and containers.
Master components:
- API Server
- Scheduler
- Controller Manager
- etcd
Worker components:
- Kubelet
- kube-proxy
- Container runtime (Docker, containerd)
The API Server is the front-end that exposes the Kubernetes API and serves as the cluster’s main control point.
The kubectl command interacts with the API Server, for example:
kubectl get pods
kubectl apply -f deployment.yaml
Etcd stores all cluster data, including configuration and state, and is critical for high availability and consistency.
// Etcd is a distributed, consistent key-value store
etcdctl get /registry/pods/default/nginx-pod
The scheduler assigns pods to nodes based on resource availability, while the controller manager ensures the cluster state matches the desired configuration.
Scheduler decides:
- Which node a pod runs on
Controller Manager handles:
- Node lifecycle
- Replication controllers
- Endpoint management
Kubernetes networking provides each pod with a unique IP, and services to allow communication inside and outside the cluster.
# Sample Service YAML
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
Storage in Kubernetes is handled via Volumes, PersistentVolumes, and PersistentVolumeClaims to manage stateful applications.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Kubernetes uses declarative resource definitions in YAML/JSON files describing the desired state of objects like Pods, Services, and Deployments.
Example resource YAML snippet:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19
        ports:
        - containerPort: 80
Kubernetes uses YAML files to define resources declaratively. These files are applied using the kubectl CLI.
kubectl apply -f deployment.yaml
# A sample deployment.yaml file defines pods, replicas, containers, etc.
Minikube and Kind allow you to run Kubernetes locally for learning and development.
// Minikube: install, then start a local cluster
minikube start
// Kind: create a local cluster
kind create cluster
kubectl is the command-line tool to interact with Kubernetes clusters for deploying and managing resources.
// Common kubectl commands
kubectl get pods
kubectl describe svc nginx-service
kubectl delete pod nginx-pod
The master components control the cluster, including API Server, Scheduler, Controller Manager, and Etcd as the data store.
Master components overview:
- API Server: main cluster interface
- Scheduler: assigns pods to nodes
- Controller Manager: maintains cluster state
- etcd: key-value store for configs and state
Worker nodes run the containers and have components like Kubelet, kube-proxy, and the container runtime.
Worker components:
- Kubelet: manages pods on the node
- kube-proxy: network proxy for services
- Container runtime: runs containers (Docker, containerd)
The control plane manages the cluster's desired state; the data plane runs the actual workloads (pods) on worker nodes.
Control Plane:
- API Server, Scheduler, Controllers
Data Plane:
- Nodes running pods and containers
Nodes are grouped into node pools, allowing different machine types or configurations within the same cluster.
// Example: create a GKE node pool
gcloud container node-pools create pool1 --cluster=my-cluster --machine-type=n1-standard-1
Nodes register with the master and periodically send heartbeats. The master monitors node health and takes action if nodes fail.
// Check node status
kubectl get nodes
// A healthy node reports the Ready condition; kubelet heartbeats keep it current
Pods go through lifecycle phases: Pending, Running, Succeeded, Failed, or Unknown. The scheduler assigns pods to nodes based on constraints.
Pod phases:
- Pending
- Running
- Succeeded
- Failed
- Unknown
Scheduler assigns pods based on resource availability.
Kubernetes requires every pod to have a unique IP and supports flat networking where pods communicate transparently.
// Network plugin examples: Calico, Flannel
kubectl apply -f calico.yaml
Services provide stable DNS names and IPs for accessing pods. CoreDNS handles name resolution inside the cluster.
// Example Service DNS name
my-service.my-namespace.svc.cluster.local
CoreDNS runs as a Kubernetes service to provide DNS resolution for service names and pods within the cluster.
// Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
The CRI abstracts container runtimes so Kubernetes can use different engines like Docker or containerd interchangeably.
// Check the container runtime on a node
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
kube-proxy manages virtual IPs for services and routes traffic. Networking plugins provide pod networking and policies.
// kube-proxy runs on every node to handle traffic routing
kubectl get pods -n kube-system -l k8s-app=kube-proxy
HA setups run multiple master nodes, etcd members, and use load balancers to avoid single points of failure.
// HA clusters run multiple control-plane (master) nodes
// Example: kubeadm HA setup with stacked etcd
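As a rough sketch of the kubeadm stacked-etcd approach (the load balancer endpoint lb.example.com is a placeholder you must provision separately):
# Initialize the first control-plane node behind the load balancer
kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs
# Join additional control-plane nodes with the join command printed by kubeadm init, e.g.:
# kubeadm join lb.example.com:6443 --token <token> --discovery-token-ca-cert-hash <hash> --control-plane --certificate-key <key>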
Autoscalers dynamically adjust node count based on resource usage to optimize cost and availability.
// Example: Enable the cluster autoscaler in GKE
gcloud container clusters update my-cluster --enable-autoscaling --min-nodes=1 --max-nodes=5 --node-pool=default-pool
Managing multiple Kubernetes clusters across regions or cloud providers enables high availability and disaster recovery.
// Tools: Rancher, Google Anthos, Azure Arc
Use Prometheus, Grafana, and Kubernetes metrics API to monitor resource usage, pod status, and cluster performance.
// Sample Prometheus query: kube_pod_status_phase{phase="Running"}
A Pod is the smallest deployable unit in Kubernetes, representing one or more containers running together on a node.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
Pods go through phases: Pending, Running, Succeeded, Failed, and Unknown.
// Check pod status:
kubectl get pod example-pod
kubectl describe pod example-pod
Pods can have multiple containers that share storage/networking and coordinate closely.
spec:
  containers:
  - name: app-container
    image: myapp:latest
  - name: sidecar-container
    image: log-collector:latest
Init containers run before app containers to perform setup tasks.
spec:
  initContainers:
  - name: init-db
    image: busybox
    command: ['sh', '-c', 'setup-db.sh']
  containers:
  - name: app
    image: myapp:latest
Pod specs define the pod configuration; templates are used in controllers like Deployments.
apiVersion: apps/v1
kind: Deployment
spec:
  template:    # pod template
    spec:
      containers:
      - name: app
        image: myapp:latest
Each pod gets its own IP; containers communicate via localhost and network namespaces.
// Pods communicate using pod IP addresses directly. // Kubernetes provides flat networking within cluster.
Defines security options like user IDs and capabilities for pods or containers.
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
  containers:
  - name: app
    image: myapp:latest
Specify resource requests and limits to control pod resource usage.
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
Configure scheduling preferences like node affinity and anti-affinity for pod placement.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
Defines the number of pods that can be unavailable during maintenance or disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-example
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
Labels organize pods for selection; annotations store metadata.
metadata:
  labels:
    app: myapp
  annotations:
    description: "Pod running myapp version 1.0"
Use kubectl commands to inspect pod status, logs, and events for troubleshooting.
kubectl describe pod example-pod
kubectl logs example-pod
kubectl exec -it example-pod -- /bin/sh
View container logs and pod-related events to diagnose issues.
kubectl logs example-pod
kubectl get events --field-selector involvedObject.name=example-pod
Use sidecar containers to provide auxiliary services like logging or proxies alongside main containers.
spec:
  containers:
  - name: main-app
    image: myapp:latest
  - name: sidecar-logger
    image: log-collector:latest
Keep pods small, use resource limits, label well, and monitor continuously.
// Tips:
// - Use liveness/readiness probes
// - Avoid running as root
// - Use ConfigMaps and Secrets for config
A Service exposes a set of Pods as a network service, providing stable IPs and DNS names.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: myapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
Different service types expose pods inside or outside the cluster with varying accessibility.
# ClusterIP (default) — internal access only
spec:
  type: ClusterIP

# NodePort — exposes the service on a node port externally
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30007

# LoadBalancer — external cloud load balancer
spec:
  type: LoadBalancer
Services without a ClusterIP, useful for direct pod access or StatefulSets.
spec:
  clusterIP: None
  selector:
    app: myapp
  ports:
  - port: 80
Maps service to an external DNS name without proxying traffic through Kubernetes.
apiVersion: v1
kind: Service
metadata:
  name: external-service
spec:
  type: ExternalName
  externalName: example.com
Kubernetes DNS resolves services to their cluster IP or endpoints for discovery.
// Pods can access service by DNS name, e.g. my-service.default.svc.cluster.local
Services select pods by labels; endpoints represent actual pod IPs behind the service.
# Service selector example
selector:
  app: myapp
# View endpoints:
kubectl get endpoints my-service
Kubernetes provides DNS service inside clusters for easy service name resolution.
// DNS example:
curl http://my-service.default.svc.cluster.local
Ingress manages external HTTP/HTTPS access routing to services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
Services balance network traffic across pods for high availability.
// ClusterIP service load balances TCP/UDP requests internally automatically
Service meshes add features like traffic routing, retries, and security on top of services.
// Example: Istio injects sidecars to control traffic policies and observability
Create services to expose pods internally or externally as needed.
kubectl expose deployment myapp --type=LoadBalancer --name=my-service --port=80 --target-port=8080
Use kubectl commands to check service status, endpoints, and troubleshoot connectivity.
kubectl describe svc my-service
kubectl get endpoints my-service
kubectl logs pod-name
Network Policies restrict pod-to-pod or pod-to-service traffic for security.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-svc-traffic
spec:
  podSelector:
    matchLabels:
      app: myapp
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
Monitor services using Prometheus or other tools to track availability and performance.
// Use metrics-server, Prometheus exporters, or cloud provider monitoring
Use appropriate service types, label selectors carefully, monitor service health, and secure access.
// Keep services lean and well-labeled
// Use readiness probes to control pod availability
// Regularly audit exposed services
A Deployment in Kubernetes manages the lifecycle of Pods and ReplicaSets, enabling declarative updates to applications.
# Simple Deployment YAML snippet
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
The deployment spec defines desired state including replicas, pod template, selectors, and strategy.
# Key fields in Deployment spec
spec:
  replicas: 3              # Number of pod replicas
  selector:                # Selector for pods
    matchLabels:
      app: nginx
  template:                # Pod template spec
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
  strategy:                # Update strategy (RollingUpdate or Recreate)
    type: RollingUpdate
Deployments perform rolling updates to avoid downtime. If something goes wrong, you can rollback to a previous version.
// Rolling update command
kubectl set image deployment/nginx-deployment nginx=nginx:1.22
// Rollback to the previous revision
kubectl rollout undo deployment/nginx-deployment
RollingUpdate (default) gradually replaces pods. Recreate kills old pods before creating new ones.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 1
ReplicaSets ensure a specified number of pod replicas are running at all times.
// Get ReplicaSets for a deployment
kubectl get rs -l app=nginx
// Scale a ReplicaSet directly (not recommended)
kubectl scale rs nginx-deployment-xxxxx --replicas=5
You can scale the number of replicas manually or with autoscaling.
// Manual scale
kubectl scale deployment nginx-deployment --replicas=5
Kubernetes stores revisions of deployments to support rollbacks.
// Check rollout history
kubectl rollout history deployment/nginx-deployment
Pause a deployment to make multiple changes, then resume to apply updates.
kubectl rollout pause deployment/nginx-deployment
kubectl rollout resume deployment/nginx-deployment
Deploy a new version to a small subset of users before a full rollout to test changes safely.
// Example: Create a deployment with fewer replicas of the new version alongside the stable version (see the sketch below)
kubectl apply -f canary-deployment.yaml
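A minimal sketch of such a canary Deployment, assuming the stable Deployment and the Service both use the app: myapp label (names and image tags here are illustrative):
# canary-deployment.yaml (hypothetical): a few pods running the new version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1                  # small fraction of total capacity
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp             # same app label as the stable Deployment, so the Service also routes here
        track: canary
    spec:
      containers:
      - name: myapp
        image: myapp:2.0       # new version under test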
Maintain two separate environments (blue and green) and switch traffic between them to minimize downtime.
// Create blue and green deployments, then update the service selector accordingly
kubectl apply -f blue-deployment.yaml
kubectl apply -f green-deployment.yaml
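One way to switch traffic, assuming the Service selects pods by a version label and the two Deployments label their pods version: blue and version: green (label scheme is an assumption, not prescribed by Kubernetes):
# Point the existing Service at the green pods
kubectl patch service my-service -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'
# Roll back instantly by switching the selector back to blue
kubectl patch service my-service -p '{"spec":{"selector":{"app":"myapp","version":"blue"}}}'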
Annotations store metadata and can be used for custom tracking or tooling integrations.
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
Monitor deployment progress using rollout status command.
kubectl rollout status deployment/nginx-deployment
Automatically scale pods based on CPU or custom metrics using Horizontal Pod Autoscaler (HPA).
kubectl autoscale deployment nginx-deployment --min=2 --max=10 --cpu-percent=80
Use logs, describe commands, and events to troubleshoot deployment issues.
kubectl describe deployment nginx-deployment
kubectl logs deployment/nginx-deployment
kubectl get events
Use rolling updates, monitor health probes, limit max surge/unavailable, and enable autoscaling for reliability.
# Example rollingUpdate strategy
strategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
ConfigMaps store non-confidential configuration data as key-value pairs to be consumed by pods.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config
data:
  APP_COLOR: "blue"
  LOG_LEVEL: "debug"
Create ConfigMaps from files, directories, or literals and mount them as env variables or volumes.
// Create a ConfigMap from a literal
kubectl create configmap example-config --from-literal=APP_COLOR=blue

# Use in the pod environment
env:
- name: APP_COLOR
  valueFrom:
    configMapKeyRef:
      name: example-config
      key: APP_COLOR
You can mount ConfigMaps as files inside pods.
volumes:
- name: config-volume
  configMap:
    name: example-config
containers:
- name: app
  volumeMounts:
  - name: config-volume
    mountPath: /etc/config
ConfigMaps can inject configuration data as environment variables.
env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: example-config
      key: LOG_LEVEL
Secrets store sensitive data such as passwords, tokens, and keys, encoded in base64.
apiVersion: v1
kind: Secret
metadata:
  name: example-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=   # base64 encoded "password"
Create secrets from files or literals and use them as environment variables or volumes.
// Create a secret from a literal
kubectl create secret generic example-secret --from-literal=password=password

# Use in the pod env
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: example-secret
      key: password
Opaque is generic, TLS stores certificates, Docker-registry holds container registry credentials.
# TLS secret example
kubectl create secret tls tls-secret --cert=cert.pem --key=key.pem
Secrets can be mounted as files or environment variables to secure application data.
volumes:
- name: secret-volume
  secret:
    secretName: example-secret
containers:
- name: app
  volumeMounts:
  - name: secret-volume
    mountPath: /etc/secret
Integrate with Vault or AWS Secrets Manager for advanced secret management.
// Vault integration example: // Configure Vault Agent to inject secrets as files or env variables
Kubernetes supports encrypting secrets at rest in etcd to enhance security.
// Enable encryption at rest via an EncryptionConfiguration passed to kube-apiserver with --encryption-provider-config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>
  - identity: {}
Use RBAC to restrict who can read or modify secrets.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
Periodically update secrets to reduce risk from leaked credentials.
// Update a secret
kubectl create secret generic example-secret --from-literal=password=newpassword --dry-run=client -o yaml | kubectl apply -f -
Use kubectl describe and logs to diagnose issues with config or secret mounting.
kubectl describe configmap example-config
kubectl describe secret example-secret
kubectl logs <pod-name>
Keep secrets encrypted and access-restricted, separate config from code, and automate rotation.
// Example best practice: // Use external vaults for production secrets // Avoid hardcoding secrets in manifests
Popular vault solutions include HashiCorp Vault, AWS Secrets Manager, and Azure Key Vault to securely store and inject secrets.
// Use Vault CSI driver to mount secrets as volumes in pods // Configure Kubernetes auth method for Vault access
Kubernetes abstracts storage with volumes that persist beyond container life, ensuring data durability.
# Volumes provide data storage accessible by containers # They can be ephemeral or persistent depending on use case
PVs are cluster resources representing actual storage (e.g., disks).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
PVCs are requests for storage by users, binding to matching PVs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
StorageClasses define types of storage and allow automatic provisioning of PVs.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
Plugins allow integration with different storage backends: cloud disks, network storage, local volumes.
# Examples: # awsElasticBlockStore, gcePersistentDisk, nfs, hostPath, cephfs, iscsi, etc.
HostPath mounts a file or directory from the node; EmptyDir is ephemeral storage created for a pod's lifetime.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pod
spec:
  containers:
  - name: container
    image: nginx
    volumeMounts:
    - mountPath: /data
      name: host-volume
  volumes:
  - name: host-volume
    hostPath:
      path: /mnt/data
      type: Directory
NFS allows shared storage accessible by multiple pods across nodes.
volumes:
- name: nfs-volume
  nfs:
    server: nfs.example.com
    path: /exports/data
Cloud-specific storage volumes integrate with Kubernetes via plugins and StorageClasses.
# Example AWS EBS volume in a pod spec
volumes:
- name: ebs-volume
  awsElasticBlockStore:
    volumeID: vol-0abcd1234efgh5678
    fsType: ext4
StatefulSets use persistent storage with stable identities, often via PVC templates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
Access modes define how volumes can be used: ReadWriteOnce, ReadOnlyMany, ReadWriteMany.
accessModes:
- ReadWriteOnce   # Mounted by a single node as read-write
- ReadOnlyMany    # Mounted read-only by many nodes
- ReadWriteMany   # Mounted read-write by many nodes
Snapshots capture volume state; backup solutions protect data from loss.
# Snapshot example with CSI drivers:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-example
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: pvc-example
Encrypt data at rest and in transit, enforce RBAC for storage resources.
# Use cloud provider encryption options or third-party tools # Control access with Kubernetes RBAC on PVC/PV objects
Limit storage consumption per namespace using ResourceQuota objects.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
spec:
  hard:
    requests.storage: 50Gi
Check pod events, PVC/PV status, CSI driver logs, and node health to troubleshoot storage problems.
kubectl describe pvc pvc-example
kubectl get events --namespace=my-namespace
journalctl -u kubelet
Use dynamic provisioning, clean up unused PVs, monitor storage health, and enforce access policies.
# Always use StorageClasses to avoid manual PV management
# Monitor volume usage and expand PVCs when needed
# Secure storage with encryption and RBAC
A StatefulSet manages stateful applications, providing stable network IDs and persistent storage per pod.
# StatefulSet example: manages pods with unique identities
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
Deployments manage stateless apps with interchangeable pods; StatefulSets manage ordered, unique pods with persistent state.
# Deployments scale stateless replicas # StatefulSets guarantee stable pod names and storage
Databases, queues, and other services requiring stable identities and persistent storage.
# Examples: # MySQL, Cassandra, Kafka, Elasticsearch
Use volumeClaimTemplates to provision PVCs for each pod automatically.
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
Scale pods one at a time in order, ensuring data consistency.
kubectl scale statefulset mysql --replicas=5
# Pods start with stable names: mysql-0, mysql-1, ...
Updates happen sequentially, one pod at a time, respecting pod ordering.
kubectl rollout restart statefulset mysql
A DaemonSet ensures one pod runs on each (or selected) node for tasks like monitoring or logging.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter
Node-level agents: log collectors, monitoring, network proxies.
# Examples: # Fluentd, Prometheus Node Exporter, Calico agents
Update DaemonSets carefully as pods run on all nodes and can affect cluster stability.
kubectl rollout status daemonset node-exporter
kubectl delete daemonset node-exporter
DaemonSets update pods one at a time or in batches controlled via updateStrategy.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
Control which nodes run DaemonSet pods using node selectors and affinity rules.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/worker
                operator: In
                values:
                - "true"
Check pod status, events, logs, and describe pods for troubleshooting.
kubectl get pods -l app=mysql
kubectl describe pod mysql-0
kubectl logs mysql-0
Use appropriate update strategies, monitor resource usage, and apply node affinity thoughtfully.
# For StatefulSets:
# - Avoid scaling down without draining data
# For DaemonSets:
# - Limit resource use to avoid node overload
# - Use tolerations for scheduling on tainted nodes
Headless Services provide stable network identities for StatefulSet pods.
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None
  selector:
    app: mysql
StatefulSets for databases, DaemonSets for monitoring and logging across nodes.
# Example: # StatefulSet runs Cassandra cluster nodes # DaemonSet runs node monitoring agents on each node
A Kubernetes Job creates one or more pods and ensures that a specified number of them successfully terminate.
# Basic Job YAML example
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  backoffLimit: 4
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo", "Hello World"]
      restartPolicy: Never
Create jobs using kubectl apply and manage them with kubectl commands.
# Create a job from a yaml file
kubectl apply -f job.yaml
# Check job status
kubectl get jobs
# Delete the job
kubectl delete job example-job
Kubernetes tracks job completions and can retry failed pods up to backoffLimit times.
# backoffLimit: number of retries before marking the Job as failed
# status.conditions indicates success or failure
kubectl describe job example-job
Configure Jobs to run multiple pods in parallel using completions and parallelism fields.
spec:
  completions: 5    # total pods that must complete
  parallelism: 2    # pods run concurrently
CronJobs schedule Jobs to run periodically at fixed times, like Linux cron.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "*/5 * * * *"   # runs every 5 minutes
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "Hello from CronJob"]
          restartPolicy: OnFailure
Use standard cron format for schedule field (minute, hour, day of month, month, day of week).
# Examples: "0 0 * * *" # every day at midnight "0 */6 * * *" # every 6 hours "15 14 1 * *" # 2:15 PM on the first day of each month
By default, CronJob schedules are evaluated in the kube-controller-manager's time zone (usually UTC). Kubernetes 1.27+ supports a native spec.timeZone field; on older clusters, adjust the cron expression for the UTC offset or handle time zones inside container commands/scripts.
# Native timezone support (Kubernetes 1.27+):
spec:
  schedule: "0 9 * * *"
  timeZone: "America/New_York"
# On older clusters: adjust the cron expression for the UTC offset,
# or implement timezone logic inside container commands/scripts
Configure how many successful and failed jobs to keep with successfulJobsHistoryLimit and failedJobsHistoryLimit.
spec:
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
Use kubectl logs and describe to diagnose job failures.
# Get logs from the job's pods
kubectl logs job/example-job
# Describe the job for event info
kubectl describe job example-job
# List pods using the job's label selector
kubectl get pods --selector=job-name=example-job
Specify CPU/memory requests and limits in job pod specs for resource allocation.
spec:
  template:
    spec:
      containers:
      - name: batch
        image: my-batch-image
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1"
      restartPolicy: OnFailure
Jobs are ideal for batch tasks like ETL, backups, report generation, or image processing.
# Example batch job command command: ["python", "process_data.py"]
Keep jobs idempotent, monitor job status, clean up old jobs, and handle retries properly.
# Tips:
# - Use backoffLimit to prevent endless retries
# - Use labels for easy job filtering
# - Use the TTL controller to auto-clean finished jobs (see the sketch below)
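For the TTL cleanup tip, a short sketch: the Job spec field ttlSecondsAfterFinished deletes a finished Job (and its pods) automatically after the given delay.
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo
spec:
  ttlSecondsAfterFinished: 300   # delete the Job 5 minutes after it finishes
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo", "done"]
      restartPolicy: Never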
Control retry behavior using backoffLimit and activeDeadlineSeconds.
spec:
  backoffLimit: 3              # retry count
  activeDeadlineSeconds: 600   # max job duration in seconds
Use kubectl get jobs and watch job pods to track progress and status.
kubectl get jobs --watch
kubectl get pods --selector=job-name=example-job --watch
Common use cases include database backups, batch data import, report generation, email sending.
# Example CronJob settings for a nightly DB backup:
schedule: "0 2 * * *"
command: ["sh", "-c", "pg_dump mydb > /backup/db_$(date +%F).sql"]
Kubernetes networking assumes all pods can communicate with each other without NAT, following a flat network model.
# Key points:
# - Every pod gets its own IP address
# - Pods can talk to any other pod directly
# - Network plugins implement this model
CNI is a standard interface for configuring network interfaces in Linux containers, used by Kubernetes for pod networking.
# Popular CNIs: Calico, Flannel, Weave Net # CNIs configure IP addressing, routing, and network policies
Plugins provide pod networking, IP management, and enforce network policies.
# Examples:
# - Calico: Network policy enforcement + routing
# - Flannel: Simple overlay network
# - Weave Net: Encrypted networking
Pods communicate using IP addresses assigned by the CNI without NAT or port mapping.
# Each pod has a unique IP address
# Pods on the same node communicate via local interfaces
# Pods on different nodes communicate over overlay networks
Services provide stable IPs and DNS names, proxying traffic to backend pods.
# The Service IP is virtual
# kube-proxy forwards requests to healthy pods
# Supports ClusterIP, NodePort, LoadBalancer types
Network policies restrict which pods can communicate with each other, enhancing security.
# Define ingress/egress rules using labels and ports # Only enforced if network plugin supports policies (e.g., Calico)
Use YAML to define NetworkPolicy resources with selectors and rules.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nginx
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 80
Apply policies to isolate pods, restrict access, and comply with security requirements.
# Block all traffic by default
# Allow only specific pods or namespaces
# Enforce rules for multi-tenant clusters
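A minimal default-deny policy looks like this; it denies all ingress to every pod in the namespace, and is meant to be paired with explicit allow policies such as the one above:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}     # selects every pod in the namespace
  policyTypes:
  - Ingress           # no ingress rules listed, so all inbound traffic is denied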
Common tools include ping, traceroute, logs of network plugins, and checking policies.
# Check pod IPs and routes
kubectl exec -it podname -- ping otherpodip
# Check network plugin logs on nodes
journalctl -u calico-node
# Check NetworkPolicy logs (if enabled)
Service meshes like Istio provide advanced routing, load balancing, and security on top of Kubernetes networking.
# Inject sidecar proxies (Envoy) into pods # Manage traffic policies without changing app code
Ingress controls external HTTP/S traffic into cluster; Services route internal traffic.
# Ingress resources define rules for host/path routing # Services expose pods internally or externally (NodePort, LoadBalancer)
Use network segmentation, restrict policies, monitor traffic, and keep plugins updated.
# Use separate namespaces with policies
# Monitor traffic flows and audit logs
# Avoid wide-open network policies
Tools like Prometheus and Grafana help monitor latency, throughput, and packet loss.
# Export metrics from CNI plugin # Set alerts for anomalies
Examples: isolate dev/test from prod, allow only ingress controller to access services, limit database access.
# Example: allow frontend pod to access backend DB pods only on port 5432
Includes IPv6 dual-stack, multi-cluster networking, network encryption, and custom CNI plugins.
# IPv4/IPv6 dual-stack support
# Service mesh multi-cluster routing
# Network encryption via WireGuard or IPsec
An Ingress is a Kubernetes API object that manages external access to services, typically HTTP, routing traffic to different services based on rules.
# Example: Basic Ingress resource YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
Ingress controllers are pods that implement the Ingress API, handling the routing of requests. Examples: NGINX, Traefik, HAProxy.
# Deploying the NGINX Ingress Controller (simplified)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller.yaml
Ingress resources define routing rules: hosts, paths, services, and TLS settings.
# YAML snippet configuring path-based routing
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /app1
        pathType: Prefix
        backend:
          service:
            name: app1-service
            port:
              number: 80
      - path: /app2
        pathType: Prefix
        backend:
          service:
            name: app2-service
            port:
              number: 80
Ingress can terminate TLS connections, providing HTTPS by specifying TLS secrets with certificates.
# Example TLS config in an Ingress
spec:
  tls:
  - hosts:
    - example.com
    secretName: tls-secret
Route requests based on URL paths to different backend services, useful for hosting multiple apps behind one domain.
# Path-based routing example
paths:
- path: /api
  pathType: Prefix
  backend:
    service:
      name: api-service
      port:
        number: 80
- path: /web
  pathType: Prefix
  backend:
    service:
      name: web-service
      port:
        number: 80
Ingress routes traffic based on hostname, allowing multiple domains to be handled by one Ingress.
rules:
- host: api.example.com
  http:
    paths:
    - path: /
      pathType: Prefix
      backend:
        service:
          name: api-service
          port:
            number: 80
- host: www.example.com
  http:
    paths:
    - path: /
      pathType: Prefix
      backend:
        service:
          name: web-service
          port:
            number: 80
Annotations customize Ingress controller behavior, such as timeouts, rewrites, authentication.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
Ingress Controllers often expose a LoadBalancer service to receive traffic from outside the cluster.
# Example service type for the Ingress Controller
kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: ingress-nginx
Use TLS, limit access via IP whitelisting, enable authentication, and keep controller updated.
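A hedged example of hardening via NGINX Ingress annotations (annotation names assume the ingress-nginx controller; the CIDR ranges and the basic-auth Secret name are placeholders):
metadata:
  annotations:
    nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8,192.168.0.0/16"   # restrict client IPs
    nginx.ingress.kubernetes.io/ssl-redirect: "true"                                  # force HTTPS
    nginx.ingress.kubernetes.io/auth-type: basic                                      # enable basic auth
    nginx.ingress.kubernetes.io/auth-secret: basic-auth                               # Secret holding htpasswd data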
Check logs of Ingress controller pods, ensure service and endpoints are correct, and verify DNS settings.
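A few commands along these lines help (the namespace and label assume a standard ingress-nginx install):
# Ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
# Verify the Ingress resource, its backend Service, and Endpoints
kubectl describe ingress example-ingress
kubectl get svc my-service
kubectl get endpoints my-service
# Confirm the hostname resolves to the controller's external IP
nslookup example.com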
Ingress provides Layer 7 routing (HTTP), while Service LoadBalancers provide Layer 4 (TCP/UDP) routing.
Most popular Ingress controller, configurable via annotations and config maps.
Traefik is a modern dynamic ingress with features like automatic cert management and dashboard.
Monitor ingress traffic, error rates, and latency using Prometheus and Grafana with exporter metrics.
Support for canary releases, rate limiting, authentication, and custom error pages via annotations.
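For example, canary routing with the NGINX Ingress controller is driven by annotations on a second Ingress for the same host (annotation names assume ingress-nginx; the weight is illustrative):
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"   # send ~20% of traffic to the canary backend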
Helm is a package manager for Kubernetes that simplifies deployment of complex apps using Helm Charts.
Helm 3 consists of a client (the helm CLI) and chart repositories; the separate server component (Tiller) existed only in Helm v2 and was removed in v3.
# Install the Helm CLI (Linux/macOS)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Verify installation
helm version
Charts package Kubernetes manifests with templating for easier configuration and repeatability.
# Create a new chart scaffold
helm create my-chart
# Directory structure created for templates, values.yaml, etc.
# Add a repo
helm repo add stable https://charts.helm.sh/stable
# Update repo info
helm repo update
# Search charts
helm search repo mysql
# Install a chart release
helm install my-release stable/mysql
# Upgrade the release with new values
helm upgrade my-release stable/mysql -f custom-values.yaml
Use values.yaml to configure templates dynamically using Go templating syntax.
# Example template snippet (deployment.yaml)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
      - name: app
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
Hooks let you run jobs before/after install, upgrade, or delete to manage lifecycle events.
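As a sketch, a hook is an ordinary manifest in the chart's templates directory annotated with helm.sh/hook (the Job name and image here are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-db-migrate"
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade        # run before install/upgrade
    "helm.sh/hook-delete-policy": hook-succeeded   # clean up the Job after success
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp-migrations:latest
      restartPolicy: Never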
Define dependencies in Chart.yaml to include other charts as subcharts.
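A minimal dependencies stanza in Chart.yaml might look like this (the chart name, version range, and repository URL are illustrative):
dependencies:
- name: postgresql
  version: ">=12.0.0"
  repository: "https://charts.bitnami.com/bitnami"
  condition: postgresql.enabled   # toggle the subchart from values.yaml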
Use plugins like helm-secrets to encrypt sensitive values for secure deployments.
Keep charts modular, reusable, documented, and avoid hardcoded values.
# Render templates locally for debugging
helm template my-chart
# Dry-run an install to see actions without applying
helm install my-release my-chart --dry-run --debug
Use Helm in pipelines for automated testing, deployment, and rollback of Kubernetes apps.
Alternatives include Kustomize, Kapp, and Operators — each with different strengths in templating and customization.
Kubernetes security covers securing clusters, workloads, API access, network policies, and runtime defenses.
// Security layers:
// - API Server protection
// - Authentication & Authorization
// - Network segmentation
// - Pod & container hardening
Common authentication includes certificates, tokens, OpenID Connect, and service accounts.
// Example: Using client certificates for API access
kubectl config set-credentials user --client-certificate=cert.pem --client-key=key.pem
Role-Based Access Control (RBAC) governs who can do what within the cluster.
// Enable RBAC (default in modern Kubernetes)
--authorization-mode=RBAC
Roles define permissions in a namespace; ClusterRoles are cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
Bind roles to users, groups, or service accounts either namespace-scoped or cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
Secure with TLS, audit logging, authentication, and authorization checks.
// Start the API server with flags:
--tls-cert-file=server.crt
--tls-private-key-file=server.key
--authorization-mode=RBAC
--audit-log-path=/var/log/kube-apiserver/audit.log
Control pod communication with Network Policies restricting ingress and egress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: frontend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: backend
PodSecurityPolicy is deprecated (removed in Kubernetes 1.25 and replaced by Pod Security Admission) but was used to enforce pod security constraints like privilege escalation and capabilities.
# Example: Disallow privileged pods
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
Specify user IDs, capabilities, and SELinux labels in pod specs to harden pods.
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: myapp
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
Store sensitive info in Secrets; enable encryption at rest for Secrets.
// Create a secret
kubectl create secret generic db-password --from-literal=password='s3cr3t'
// Enable encryption at rest for etcd via an encryption config
Audit all API requests for security and compliance purposes.
// Configure audit policy yaml and start kube-apiserver with --audit-policy-file flag
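A minimal audit Policy, as a sketch (log full request/response bodies for Secret changes, metadata for everything else):
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse   # log request and response bodies
  resources:
  - group: ""
    resources: ["secrets"]
- level: Metadata          # log only metadata for all other requests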
Scan container images for vulnerabilities before deployment using tools like Trivy or Clair.
// Scan an image with Trivy
trivy image myapp:latest
Popular tools: kube-bench, kube-hunter, Falco, OPA Gatekeeper.
// Run kube-bench to check CIS Kubernetes benchmarks
kube-bench
Regularly update images, apply patches, and monitor CVEs affecting cluster components.
// Automate scanning and patching in CI/CD pipelines
Follow least privilege principle, limit host access, use namespaces, enable logging and monitoring.
// Summary:
// - Use RBAC and limit permissions
// - Encrypt secrets and enable audit logs
// - Use network policies and pod security contexts
// - Scan images and monitor clusters continuously
Monitoring ensures cluster health, performance, and alerts for anomalies.
// Monitor resource usage, availability, and events // Collect metrics from nodes, pods, and control plane
A lightweight aggregator for resource metrics used for autoscaling and monitoring.
// Deploy metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
// Check metrics
kubectl top nodes
kubectl top pods
Prometheus collects time-series metrics and supports powerful queries.
// Deploy Prometheus via Helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus
// Access metrics at the Prometheus server UI
Alertmanager manages alerts sent by Prometheus, supports grouping, silencing, and routing.
// Configure Alertmanager.yaml with alert rules and receivers // Integrate with Slack, email, PagerDuty, etc.
Grafana visualizes metrics from Prometheus and other sources in customizable dashboards.
// Deploy Grafana and add Prometheus as data source // Import Kubernetes monitoring dashboards
Logs are collected at node and cluster levels using agents and stored centrally.
// Use Fluentd or Fluent Bit as log collectors running as DaemonSets
Fluentd collects, transforms, and forwards logs to storage backends like Elasticsearch.
// Deploy Fluentd DaemonSet with configuration for log forwarding
Popular logging stack for searching, visualizing, and analyzing logs.
// Deploy Elasticsearch, Fluentd, Kibana in cluster // Kibana UI for log search and dashboards
Ensure logs are structured, searchable, and protected with retention policies.
// Use JSON format logs and index management in Elasticsearch
Tracing helps follow requests through microservices for debugging performance issues.
// Use tools like Jaeger or Zipkin integrated with Kubernetes workloads
Events provide real-time cluster notifications, metrics track resource usage and health.
// View events
kubectl get events --all-namespaces
Track health and metrics of stateful and daemon workloads to ensure stability.
// Use Prometheus exporters and pod metrics for StatefulSets/DaemonSets
Combine logs, metrics, and events for comprehensive troubleshooting.
// Use kubectl logs and Prometheus queries together
Horizontal Pod Autoscaler scales pods based on CPU or custom metrics from metrics server or Prometheus.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Common tools: Metrics Server (basic), Prometheus + Grafana (advanced), commercial SaaS options.
// Choose tools based on scale, customization needs, and budget
Operators automate complex Kubernetes application management tasks by encoding human operational knowledge.
// Operator manages app lifecycle beyond basic Kubernetes controllers, // e.g., backups, upgrades, failovers.
CRDs extend the Kubernetes API with custom objects representing domain-specific resources.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: memcacheds.cache.example.com
spec:
  group: cache.example.com
  versions:
  - name: v1alpha1
    served: true
    storage: true
  scope: Namespaced
  names:
    plural: memcacheds
    singular: memcached
    kind: Memcached
Controllers watch CRDs and manage resources, ensuring desired state matches actual state.
// Go code snippet with a client-go controller watching the Memcached CRD (simplified)
func (c *Controller) Run(stopCh <-chan struct{}) {
    // Watch Memcached resources and reconcile
}
A toolkit for building Kubernetes Operators with SDKs and tools simplifying development.
// Operator SDK CLI to create a new operator
operator-sdk init --domain example.com --repo github.com/example/memcached-operator
Deploy community Operators from OperatorHub.io to add functionality without coding.
// Example: Install the Prometheus Operator via the kube-prometheus-stack Helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
Package Operators as Helm charts for simplified deployment and management.
// Helm install operator from chart directory helm install my-operator ./operator-chart
OLM manages Operator installation, updates, and lifecycle within Kubernetes clusters.
// Install OLM on the cluster
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.20.0/crds.yaml
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.20.0/olm.yaml
Operators support multiple CRD versions to enable smooth upgrades and backward compatibility.
// CRD spec versions array includes v1beta1, v1 etc. // Use conversion webhook for version migration.
Follow reconciliation loops, idempotency, and event-driven patterns for stable Operators.
// Ensure Reconcile function is safe to run multiple times without side effects
Use logs, events, and Kubernetes API to debug operator behavior and state transitions.
// View operator logs
kubectl logs deployment/my-operator
// Check events
kubectl get events --namespace my-operator-namespace
Operators manage databases, caches, messaging systems, and complex applications on Kubernetes.
// Examples:
// - MongoDB Operator for managing database clusters
// - Kafka Operator for managing messaging systems
Operators scale by managing multiple resources and running multiple controller instances with leader election.
// Use leader election flags in operator deployment YAML
Expose Prometheus metrics and structured logs for monitoring Operator health and performance.
// Operator exposes /metrics endpoint for Prometheus scraping
Run Operators with least privilege, using Role-Based Access Control (RBAC) and secure secrets management.
# Example RBAC rule snippet for operator permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
rules:
- apiGroups: ["cache.example.com"]
  resources: ["memcacheds"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Use Operator SDK to scaffold, implement, and deploy a basic Operator managing a sample resource.
// Create the operator scaffold
operator-sdk init --domain example.com --repo github.com/example/memcached-operator
operator-sdk create api --group cache --version v1alpha1 --kind Memcached --resource --controller
// Implement reconcile logic in Go
// Build and deploy the operator container
Kubernetes schedules pods to nodes based on resource availability, policies, and constraints.
// Default scheduler decides pod placement based on resource requests and node status
Define rules to prefer or require pods to run on specific nodes or avoid certain nodes.
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
Schedule pods relative to other pods based on labels to co-locate or spread workloads.
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - store
        topologyKey: "kubernetes.io/hostname"
Prevent pods from scheduling on certain nodes unless they tolerate node taints.
// Add a taint to a node
kubectl taint nodes node1 key=value:NoSchedule

# Pod toleration example
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
Implement your own scheduler logic to customize pod placement.
// Run custom scheduler binary in cluster with specific schedulerName in pod spec
Extend default scheduler with external HTTP endpoints to influence scheduling decisions.
// Scheduler calls extender with pod and node info for filtering and scoring
Use policies like priorities, weights, and preemption to control scheduling behavior.
# PriorityClasses define scheduling priority for pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority class"
Higher priority pods can preempt lower priority pods to free resources.
// Kubernetes automatically evicts low priority pods if resources needed by higher priority pod
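To use the class defined above, reference it from the pod spec; pods in higher-priority classes may preempt lower-priority pods when resources run short:
apiVersion: v1
kind: Pod
metadata:
  name: important-pod
spec:
  priorityClassName: high-priority   # refers to the PriorityClass defined earlier
  containers:
  - name: app
    image: myapp:latest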
Use events, logs, and describe commands to diagnose scheduling issues.
// Check pod events
kubectl describe pod <pod-name>
// Look for scheduling errors or reasons pods remain Pending
Enforce resource consumption limits on namespaces to control cluster resource usage.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 16Gi
    limits.cpu: "8"
    limits.memory: 32Gi
Run multiple schedulers in a cluster for specialized scheduling.
# Specify schedulerName in the pod spec to use an alternate scheduler
spec:
  schedulerName: custom-scheduler
Use node selectors, affinity, taints, and quotas wisely to optimize resource use and availability.
// Combine affinity and tolerations to achieve workload isolation and availability
Schedule workloads requiring GPUs or hardware accelerators using node labels and resource requests.
# Request a GPU resource in the pod spec
resources:
  limits:
    nvidia.com/gpu: 1
Use federated schedulers or multi-cluster controllers to manage scheduling across clusters.
// Federated Kubernetes setup with custom schedulers
Examples include workload isolation, batch processing scheduling, and GPU-intensive job scheduling.
// Use node affinity for batch jobs to run on less busy nodes
The Kubernetes API is the central communication interface for all components, exposing cluster state and operations.
// Kubernetes API server listens on port 6443 by default // Supports RESTful requests to manage cluster resources
API resources are organized into groups (core, apps, batch) and versions (v1, v1beta1) for stability and extensibility.
// Example API paths:
// /apis/apps/v1/deployments
// /api/v1/pods
kubectl CLI interacts with the API server to manage cluster resources.
kubectl get pods
kubectl create -f deployment.yaml
kubectl delete service my-service
Supports token-based, client certificate, and OIDC authentication for secure access.
// Configure ~/.kube/config with user credentials and certificates
Allows extending the Kubernetes API by adding custom APIs served by external services.
// Register an APIService object pointing to the external API server
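A sketch of such an APIService registration (the group, version, and Service names are illustrative):
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.metrics.example.com
spec:
  group: metrics.example.com
  version: v1alpha1
  service:
    name: custom-metrics-apiserver   # Service fronting the extension API server
    namespace: custom-metrics
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true        # for illustration only; use caBundle in production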
CRDs enable users to define their own Kubernetes resource types dynamically.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
Pluggable components that intercept API requests for validation, mutation, or security enforcement.
// Examples: NamespaceLifecycle, LimitRanger, PodSecurityPolicy
Admission webhooks can dynamically validate or mutate objects during creation or update.
// Define a ValidatingWebhookConfiguration with webhook URL and rules
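A skeleton ValidatingWebhookConfiguration, as a sketch (the webhook name, backing Service, and rules are illustrative; the caBundle is elided):
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
- name: pod-policy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: pod-policy-webhook      # Service exposing the webhook server
      namespace: default
      path: /validate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]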
Webhooks enable dynamic admission decisions without changing API server code.
// Useful for policy enforcement and security compliance
Plugins like scheduler plugins, networking plugins extend cluster behavior.
// Example: Using CNI plugins for custom networking
Protect API server from overload by limiting request rates and burst sizes.
// Configurable via --max-requests-inflight and --max-mutating-requests-inflight flags
Configuration flags control authentication, authorization, auditing, and feature gates.
// Example flag: --authorization-mode=RBAC
Use kubectl verbose mode or API server logs to diagnose request issues.
kubectl get pods -v=8
// Check the API server logs, e.g. /var/log/kube-apiserver.log
Client SDKs exist for Go, Python, JavaScript, and others for programmatic access.
// Example: Using client-go for Go applications to manage resources
Interact with the API to create, update, delete, and watch resources programmatically.
// Example curl command to list pods
curl --cacert ca.crt --header "Authorization: Bearer TOKEN" https://<api-server>/api/v1/pods
CI/CD automates code integration, testing, and deployment to Kubernetes clusters.
// CI automates building and testing code // CD automates deployment to environments
GitOps uses Git repositories as the source of truth for Kubernetes deployment state.
// Tools like Argo CD and Flux sync cluster state to Git
Jenkins pipelines can build Docker images and deploy to Kubernetes using plugins.
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        sh "docker build -t myapp:${env.BUILD_ID} ."
      }
    }
    stage('Deploy') {
      steps {
        sh 'kubectl apply -f k8s/deployment.yaml'
      }
    }
  }
}
GitLab integrates tightly with Kubernetes clusters for automated builds and deployments.
// .gitlab-ci.yml example with kubectl commands for deploy
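A hedged sketch of such a .gitlab-ci.yml (the registry name, manifest names, and cluster credential setup are illustrative; it assumes a runner with docker and kubectl access):
stages:
  - build
  - deploy

build:
  stage: build
  script:
    - docker build -t registry.example.com/myapp:$CI_COMMIT_SHA .
    - docker push registry.example.com/myapp:$CI_COMMIT_SHA

deploy:
  stage: deploy
  script:
    - kubectl set image deployment/myapp myapp=registry.example.com/myapp:$CI_COMMIT_SHA
  environment: production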
Argo CD continuously monitors Git repos and applies changes to Kubernetes clusters automatically.
// Declarative deployment management with Argo CD
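As a sketch, an Argo CD Application resource points the cluster at a Git path to sync (the repo URL, path, and namespaces are illustrative):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/myapp-config.git
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift in the cluster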
Flux watches Git and container registries to automate deployments and updates.
// Install Flux in cluster and link to Git repo
Automate image builds, tagging, and pushing to container registries in CI pipelines.
docker build -t myapp:${CI_COMMIT_SHA} .
docker push myapp:${CI_COMMIT_SHA}
Use pipeline stages to deploy updated manifests or Helm charts to Kubernetes.
kubectl apply -f manifests/
helm upgrade myapp ./charts/myapp
Run integration, smoke, and end-to-end tests as part of CI/CD pipelines on Kubernetes.
// Execute tests inside Kubernetes pods or as jobs
kubectl run test-runner --image=myapp-tests -- ...
Leverage Kubernetes rollout strategies to rollback or update applications smoothly.
kubectl rollout status deployment/myapp
kubectl rollout undo deployment/myapp
Store sensitive data securely using Kubernetes Secrets integrated into pipelines.
kubectl create secret generic db-password --from-literal=password=secret
Restrict pipeline permissions, use scanned images, and manage secrets carefully.
// Use least privilege service accounts // Scan container images for vulnerabilities
Track deployment health and pipeline status with monitoring tools like Prometheus and Grafana.
// Set alerts on failed rollouts or pod crashes
Implement advanced deployment patterns to minimize downtime and risk.
// Use tools like Flagger for automated canary releases
Real-world examples of Kubernetes CI/CD setups in various industries.
// Example: eCommerce platform using Jenkins + Argo CD
Frequent problems include pod crashes, image pull errors, networking failures, and resource shortages.
Use kubectl describe pod and kubectl logs to diagnose pod/container issues.
// Describe a pod to get detailed info and events
kubectl describe pod pod-name -n namespace
// View logs from a container inside a pod
kubectl logs pod-name -n namespace
Check CNI plugin status, service endpoints, and DNS resolution within the cluster.
// Check network plugin pod status
kubectl get pods -n kube-system -l k8s-app=cni-plugin
// Test DNS resolution from a debug pod
kubectl run dnsutils --image=tutum/dnsutils -it --rm --restart=Never -- nslookup kubernetes.default
Inspect PersistentVolume (PV) and PersistentVolumeClaim (PVC) bindings and access modes.
// Check PVC status and events
kubectl describe pvc pvc-name -n namespace
// View PVs and their status
kubectl get pv
Review API server logs and check its health endpoint for errors.
// Get logs from the API server pod (usually in the kube-system namespace)
kubectl logs -n kube-system kube-apiserver-master-node
// Check API server health
curl -k https://localhost:6443/healthz
Use kubectl debug to start troubleshooting sessions inside nodes or pods.
// Start an ephemeral container in a running pod for debugging
kubectl debug pod-name -n namespace --image=busybox --target=container-name -- sh
Use kubectl get events and kubectl logs to understand cluster state changes and errors.
// View recent events in a namespace sorted by time
kubectl get events -n namespace --sort-by=.metadata.creationTimestamp
Check node conditions, resource usage, and kubelet status.
// Get node status and readiness
kubectl get nodes
// Describe a node for detailed info
kubectl describe node node-name
Check scheduler logs and events to troubleshoot pod scheduling problems.
// Get scheduler logs (usually in the kube-system namespace)
kubectl logs -n kube-system kube-scheduler-master-node
Common DNS issues include CoreDNS crash loops or misconfiguration causing resolution failures.
// Check CoreDNS pod status
kubectl get pods -n kube-system -l k8s-app=kube-dns
// Restart CoreDNS pods to fix transient issues
kubectl delete pod -n kube-system -l k8s-app=kube-dns
Check rollout status and pod statuses to identify deployment issues.
// Check deployment rollout status
kubectl rollout status deployment/deployment-name -n namespace
// Describe a StatefulSet to review pod events
kubectl describe statefulset statefulset-name -n namespace
Pods may fail to schedule or get evicted if they exceed resource limits or quotas.
// View resource quotas in a namespace
kubectl get resourcequota -n namespace
// Check pod resource requests and limits
kubectl describe pod pod-name -n namespace
Inspect Helm release status, hooks, and generated manifests for errors.
// Get Helm release status
helm status release-name
// Render templates to debug generated manifests
helm template release-name ./chart-directory
Use tools like kubectl-debug, stern, k9s, and dashboard UIs for enhanced troubleshooting.
// Install k9s for a terminal UI
brew install derailed/k9s/k9s
// Use stern for multi-pod log tailing
stern pod-name-prefix
Always check logs, events, and resource statuses; isolate problems by reproducing; use namespaces and labels to scope issues.
Kubernetes has a rich ecosystem including tools for CI/CD, monitoring, networking, storage, and security.
Tools like Helm, Prometheus, Istio, and Kubeflow complement Kubernetes functionality.
Knative enables serverless workloads on Kubernetes, abstracting scaling and event handling.
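As a rough sketch, a Knative Service declares only the container to run and lets Knative handle scaling (this assumes Knative Serving is installed; the image name is illustrative):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
      - image: ghcr.io/example/hello:latest
        env:
        - name: TARGET
          value: "World"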
Deploy lightweight Kubernetes clusters at the edge for low-latency and disconnected environments.
Service meshes provide traffic management, security, and observability for microservices.
Kubeflow and other frameworks facilitate deploying and managing AI/ML pipelines on Kubernetes.
Manage IoT devices and data streams with Kubernetes-based platforms for scale and reliability.
Managed Kubernetes services simplify cluster setup and maintenance on AWS, Google Cloud, and Azure.
Federation enables managing multiple clusters across regions or clouds as a single entity.
Run workloads distributed across multiple cloud providers to avoid vendor lock-in and increase resilience.
Combine on-premises and cloud Kubernetes clusters for flexible infrastructure management.
Features like server-side apply, ephemeral containers, and better security are on the roadmap.
Kubernetes has a vibrant open-source community with SIGs, working groups, and regular releases.
Certified Kubernetes Administrator (CKA) and Developer (CKAD) exams validate skills.
Kubernetes will continue evolving to better support edge, AI, multi-cloud, and developer experience improvements.