Kubernetes is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications.
# Example: Deploying a simple nginx pod in Kubernetes
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
Developed by Google based on their internal system Borg, Kubernetes was open sourced in 2014 and is now maintained by the CNCF community.
Timeline:
- 2014: Kubernetes open sourced by Google
- 2015: Kubernetes 1.0 released; CNCF formed and Kubernetes donated
- Continuous growth with major cloud provider support
Kubernetes offers declarative configurations, self-healing, and advanced scheduling compared to simpler, script-based orchestration methods.
Traditional:
- Manual scripts
- Limited scaling & recovery
Kubernetes:
- Automated scheduling
- Rolling updates & self-healing
A Pod is the smallest deployable unit, a Node is a worker machine, and a Cluster is a set of nodes managed together.
Definitions:
- Pod: One or more containers sharing network/storage
- Node: Physical or virtual machine
- Cluster: Collection of nodes managed by Kubernetes
Kubernetes follows a master-worker architecture with a control plane managing nodes and workloads running on worker nodes.
Architecture:
- Control Plane (Master): API Server, Scheduler, Controller Manager, etcd
- Worker Nodes: Kubelet, kube-proxy, container runtime
The master manages cluster state; worker nodes run pods and containers.
Master components:
- API Server
- Scheduler
- Controller Manager
- etcd
Worker components:
- Kubelet
- kube-proxy
- Container runtime (Docker, containerd)
The API Server is the front-end that exposes the Kubernetes API and serves as the cluster’s main control point.
The kubectl command interacts with the API Server, for example:
kubectl get pods
kubectl apply -f deployment.yaml
Etcd stores all cluster data, including configuration and state, and is critical for high availability and consistency.
// Etcd is a distributed, consistent key-value store
etcdctl get /registry/pods/default/nginx-pod
The scheduler assigns pods to nodes based on resource availability, while the controller manager ensures the cluster state matches the desired configuration.
Scheduler decides:
- Which node a pod runs on
Controller Manager handles:
- Node lifecycle
- Replication controllers
- Endpoint management
Kubernetes networking provides each pod with a unique IP, and services to allow communication inside and outside the cluster.
# Sample Service YAML
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
Storage in Kubernetes is handled via Volumes, PersistentVolumes, and PersistentVolumeClaims to manage stateful applications.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Kubernetes uses declarative resource definitions in YAML/JSON files describing the desired state of objects like Pods, Services, and Deployments.
Example resource YAML snippet:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19
        ports:
        - containerPort: 80
Kubernetes uses YAML files to define resources declaratively. These files are applied using the kubectl CLI.
kubectl apply -f deployment.yaml
# A sample deployment.yaml file defines pods, replicas, containers, etc.
Minikube and Kind allow you to run Kubernetes locally for learning and development.
// Minikube: install, then start a local cluster
minikube start
// Kind: create a local cluster
kind create cluster
kubectl is the command-line tool to interact with Kubernetes clusters for deploying and managing resources.
// Common kubectl commands
kubectl get pods
kubectl describe svc nginx-service
kubectl delete pod nginx-pod
The master components control the cluster, including API Server, Scheduler, Controller Manager, and Etcd as the data store.
Master components overview:
- API Server: main cluster interface
- Scheduler: assigns pods to nodes
- Controller Manager: maintains cluster state
- etcd: key-value store for configs and state
Worker nodes run the containers and have components like Kubelet, kube-proxy, and the container runtime.
Worker components:
- Kubelet: manages pods on the node
- kube-proxy: network proxy for services
- Container runtime: runs containers (Docker, containerd)
The control plane manages the cluster's desired state; the data plane runs the actual workloads (pods) on worker nodes.
Control Plane:
- API Server, Scheduler, Controllers
Data Plane:
- Nodes running pods and containers
Nodes are grouped into node pools, allowing different machine types or configurations within the same cluster.
// Example: create a GKE node pool
gcloud container node-pools create pool1 --cluster=my-cluster --machine-type=n1-standard-1
Nodes register with the master and periodically send heartbeats. The master monitors node health and takes action if nodes fail.
// Check node status
kubectl get nodes
// A healthy node reports the Ready condition; kubelet heartbeats keep it current
Pods go through lifecycle phases: Pending, Running, Succeeded, Failed, or Unknown. The scheduler assigns pods to nodes based on constraints.
Pod phases:
- Pending
- Running
- Succeeded
- Failed
- Unknown
Scheduler assigns pods based on resource availability.
Kubernetes requires every pod to have a unique IP and supports flat networking where pods communicate transparently.
// Network plugin examples: Calico, Flannel
kubectl apply -f calico.yaml
Services provide stable DNS names and IPs for accessing pods. CoreDNS handles name resolution inside the cluster.
// Example Service DNS name
my-service.my-namespace.svc.cluster.local
CoreDNS runs as a Kubernetes service to provide DNS resolution for service names and pods within the cluster.
// Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
The CRI abstracts container runtimes so Kubernetes can use different engines like Docker or containerd interchangeably.
// Check the container runtime on a node
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
kube-proxy manages virtual IPs for services and routes traffic. Networking plugins provide pod networking and policies.
// kube-proxy runs on every node to handle traffic routing
kubectl get pods -n kube-system -l k8s-app=kube-proxy
HA setups run multiple master nodes, etcd members, and use load balancers to avoid single points of failure.
// HA clusters run multiple control-plane (master) nodes
// Example: kubeadm HA setup with stacked etcd
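As a rough sketch of the kubeadm stacked-etcd approach (the load balancer endpoint lb.example.com is a placeholder you must provision separately):
# Initialize the first control-plane node behind the load balancer
kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs
# Join additional control-plane nodes with the join command printed by kubeadm init, e.g.:
# kubeadm join lb.example.com:6443 --token <token> --discovery-token-ca-cert-hash <hash> --control-plane --certificate-key <key>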
Autoscalers dynamically adjust node count based on resource usage to optimize cost and availability.
// Example: Enable the cluster autoscaler in GKE
gcloud container clusters update my-cluster --enable-autoscaling --min-nodes=1 --max-nodes=5 --node-pool=default-pool
Managing multiple Kubernetes clusters across regions or cloud providers enables high availability and disaster recovery.
// Tools: Rancher, Google Anthos, Azure Arc
Use Prometheus, Grafana, and Kubernetes metrics API to monitor resource usage, pod status, and cluster performance.
// Sample Prometheus query: kube_pod_status_phase{phase="Running"}
A Pod is the smallest deployable unit in Kubernetes, representing one or more containers running together on a node.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
Pods go through phases: Pending, Running, Succeeded, Failed, and Unknown.
// Check pod status:
kubectl get pod example-pod
kubectl describe pod example-pod
Pods can have multiple containers that share storage/networking and coordinate closely.
spec:
  containers:
  - name: app-container
    image: myapp:latest
  - name: sidecar-container
    image: log-collector:latest
Init containers run before app containers to perform setup tasks.
spec:
  initContainers:
  - name: init-db
    image: busybox
    command: ['sh', '-c', 'setup-db.sh']
  containers:
  - name: app
    image: myapp:latest
Pod specs define the pod configuration; templates are used in controllers like Deployments.
apiVersion: apps/v1
kind: Deployment
spec:
  template:    # pod template
    spec:
      containers:
      - name: app
        image: myapp:latest
Each pod gets its own IP; containers communicate via localhost and network namespaces.
// Pods communicate using pod IP addresses directly. // Kubernetes provides flat networking within cluster.
Defines security options like user IDs and capabilities for pods or containers.
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
  containers:
  - name: app
    image: myapp:latest
Specify resource requests and limits to control pod resource usage.
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
Configure scheduling preferences like node affinity and anti-affinity for pod placement.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
Defines the number of pods that can be unavailable during maintenance or disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-example
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp
Labels organize pods for selection; annotations store metadata.
metadata:
  labels:
    app: myapp
  annotations:
    description: "Pod running myapp version 1.0"
Use kubectl commands to inspect pod status, logs, and events for troubleshooting.
kubectl describe pod example-pod
kubectl logs example-pod
kubectl exec -it example-pod -- /bin/sh
View container logs and pod-related events to diagnose issues.
kubectl logs example-pod
kubectl get events --field-selector involvedObject.name=example-pod
Use sidecar containers to provide auxiliary services like logging or proxies alongside main containers.
spec:
  containers:
  - name: main-app
    image: myapp:latest
  - name: sidecar-logger
    image: log-collector:latest
Keep pods small, use resource limits, label well, and monitor continuously.
// Tips:
// - Use liveness/readiness probes
// - Avoid running as root
// - Use ConfigMaps and Secrets for config
A Service exposes a set of Pods as a network service, providing stable IPs and DNS names.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: myapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
Different service types expose pods inside or outside the cluster with varying accessibility.
# ClusterIP (default) — internal access only
spec:
  type: ClusterIP

# NodePort — exposes the service on a node port externally
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30007

# LoadBalancer — external cloud load balancer
spec:
  type: LoadBalancer
Services without a ClusterIP, useful for direct pod access or StatefulSets.
spec:
  clusterIP: None
  selector:
    app: myapp
  ports:
  - port: 80
Maps service to an external DNS name without proxying traffic through Kubernetes.
apiVersion: v1
kind: Service
metadata:
  name: external-service
spec:
  type: ExternalName
  externalName: example.com
Kubernetes DNS resolves services to their cluster IP or endpoints for discovery.
// Pods can access service by DNS name, e.g. my-service.default.svc.cluster.local
Services select pods by labels; endpoints represent actual pod IPs behind the service.
# Service selector example
selector:
  app: myapp
# View endpoints:
kubectl get endpoints my-service
Kubernetes provides DNS service inside clusters for easy service name resolution.
// DNS example:
curl http://my-service.default.svc.cluster.local
Ingress manages external HTTP/HTTPS access routing to services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
Services balance network traffic across pods for high availability.
// ClusterIP service load balances TCP/UDP requests internally automatically
Service meshes add features like traffic routing, retries, and security on top of services.
// Example: Istio injects sidecars to control traffic policies and observability
Create services to expose pods internally or externally as needed.
kubectl expose deployment myapp --type=LoadBalancer --name=my-service --port=80 --target-port=8080
Use kubectl commands to check service status, endpoints, and troubleshoot connectivity.
kubectl describe svc my-service
kubectl get endpoints my-service
kubectl logs pod-name
Network Policies restrict pod-to-pod or pod-to-service traffic for security.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-svc-traffic
spec:
  podSelector:
    matchLabels:
      app: myapp
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
Monitor services using Prometheus or other tools to track availability and performance.
// Use metrics-server, Prometheus exporters, or cloud provider monitoring
Use appropriate service types, label selectors carefully, monitor service health, and secure access.
// Keep services lean and well-labeled
// Use readiness probes to control pod availability
// Regularly audit exposed services
A Deployment in Kubernetes manages the lifecycle of Pods and ReplicaSets, enabling declarative updates to applications.
# Simple Deployment YAML snippet
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
The deployment spec defines desired state including replicas, pod template, selectors, and strategy.
# Key fields in Deployment spec
spec:
  replicas: 3              # Number of pod replicas
  selector:                # Selector for pods
    matchLabels:
      app: nginx
  template:                # Pod template spec
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
  strategy:                # Update strategy (RollingUpdate or Recreate)
    type: RollingUpdate
Deployments perform rolling updates to avoid downtime. If something goes wrong, you can rollback to a previous version.
// Rolling update command
kubectl set image deployment/nginx-deployment nginx=nginx:1.22
// Rollback to the previous revision
kubectl rollout undo deployment/nginx-deployment
RollingUpdate (default) gradually replaces pods. Recreate kills old pods before creating new ones.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 1
ReplicaSets ensure a specified number of pod replicas are running at all times.
// Get ReplicaSets for a deployment
kubectl get rs -l app=nginx
// Scale a ReplicaSet directly (not recommended)
kubectl scale rs nginx-deployment-xxxxx --replicas=5
You can scale the number of replicas manually or with autoscaling.
// Manual scale
kubectl scale deployment nginx-deployment --replicas=5
Kubernetes stores revisions of deployments to support rollbacks.
// Check rollout history
kubectl rollout history deployment/nginx-deployment
Pause a deployment to make multiple changes, then resume to apply updates.
kubectl rollout pause deployment/nginx-deployment
kubectl rollout resume deployment/nginx-deployment
Deploy a new version to a small subset of users before a full rollout to test changes safely.
// Example: Create a deployment with fewer replicas of the new version alongside the stable version (see the sketch below)
kubectl apply -f canary-deployment.yaml
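A minimal sketch of such a canary Deployment, assuming the stable Deployment and the Service both use the app: myapp label (names and image tags here are illustrative):
# canary-deployment.yaml (hypothetical): a few pods running the new version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1                  # small fraction of total capacity
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp             # same app label as the stable Deployment, so the Service also routes here
        track: canary
    spec:
      containers:
      - name: myapp
        image: myapp:2.0       # new version under test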
Maintain two separate environments (blue and green) and switch traffic between them to minimize downtime.
// Create blue and green deployments, then update the service selector accordingly
kubectl apply -f blue-deployment.yaml
kubectl apply -f green-deployment.yaml
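One way to switch traffic, assuming the Service selects pods by a version label and the two Deployments label their pods version: blue and version: green (label scheme is an assumption, not prescribed by Kubernetes):
# Point the existing Service at the green pods
kubectl patch service my-service -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'
# Roll back instantly by switching the selector back to blue
kubectl patch service my-service -p '{"spec":{"selector":{"app":"myapp","version":"blue"}}}'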
Annotations store metadata and can be used for custom tracking or tooling integrations.
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
Monitor deployment progress using rollout status command.
kubectl rollout status deployment/nginx-deployment
Automatically scale pods based on CPU or custom metrics using Horizontal Pod Autoscaler (HPA).
kubectl autoscale deployment nginx-deployment --min=2 --max=10 --cpu-percent=80
Use logs, describe commands, and events to troubleshoot deployment issues.
kubectl describe deployment nginx-deployment
kubectl logs deployment/nginx-deployment
kubectl get events
Use rolling updates, monitor health probes, limit max surge/unavailable, and enable autoscaling for reliability.
# Example rollingUpdate strategy
strategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
ConfigMaps store non-confidential configuration data as key-value pairs to be consumed by pods.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config
data:
  APP_COLOR: "blue"
  LOG_LEVEL: "debug"
Create ConfigMaps from files, directories, or literals and mount them as env variables or volumes.
// Create a ConfigMap from a literal
kubectl create configmap example-config --from-literal=APP_COLOR=blue

# Use in the pod environment
env:
- name: APP_COLOR
  valueFrom:
    configMapKeyRef:
      name: example-config
      key: APP_COLOR
You can mount ConfigMaps as files inside pods.
volumes:
- name: config-volume
  configMap:
    name: example-config
containers:
- name: app
  volumeMounts:
  - name: config-volume
    mountPath: /etc/config
ConfigMaps can inject configuration data as environment variables.
env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: example-config
      key: LOG_LEVEL
Secrets store sensitive data such as passwords, tokens, and keys, encoded in base64.
apiVersion: v1
kind: Secret
metadata:
  name: example-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=   # base64 encoded "password"
Create secrets from files or literals and use them as environment variables or volumes.
// Create a secret from a literal
kubectl create secret generic example-secret --from-literal=password=password

# Use in the pod env
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: example-secret
      key: password
Opaque is generic, TLS stores certificates, Docker-registry holds container registry credentials.
# TLS secret example
kubectl create secret tls tls-secret --cert=cert.pem --key=key.pem
Secrets can be mounted as files or environment variables to secure application data.
volumes:
- name: secret-volume
  secret:
    secretName: example-secret
containers:
- name: app
  volumeMounts:
  - name: secret-volume
    mountPath: /etc/secret
Integrate with Vault or AWS Secrets Manager for advanced secret management.
// Vault integration example: // Configure Vault Agent to inject secrets as files or env variables
Kubernetes supports encrypting secrets at rest in etcd to enhance security.
// Enable encryption at rest via an EncryptionConfiguration passed to kube-apiserver with --encryption-provider-config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>
  - identity: {}
Use RBAC to restrict who can read or modify secrets.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
Periodically update secrets to reduce risk from leaked credentials.
// Update a secret
kubectl create secret generic example-secret --from-literal=password=newpassword --dry-run=client -o yaml | kubectl apply -f -
Use kubectl describe and logs to diagnose issues with config or secret mounting.
kubectl describe configmap example-config
kubectl describe secret example-secret
kubectl logs <pod-name>
Keep secrets encrypted and access-restricted, separate config from code, and automate rotation.
// Example best practice: // Use external vaults for production secrets // Avoid hardcoding secrets in manifests
Popular vault solutions include HashiCorp Vault, AWS Secrets Manager, and Azure Key Vault to securely store and inject secrets.
// Use Vault CSI driver to mount secrets as volumes in pods // Configure Kubernetes auth method for Vault access
Kubernetes abstracts storage with volumes that persist beyond container life, ensuring data durability.
# Volumes provide data storage accessible by containers # They can be ephemeral or persistent depending on use case
PVs are cluster resources representing actual storage (e.g., disks).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
PVCs are requests for storage by users, binding to matching PVs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
StorageClasses define types of storage and allow automatic provisioning of PVs.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
Plugins allow integration with different storage backends: cloud disks, network storage, local volumes.
# Examples: # awsElasticBlockStore, gcePersistentDisk, nfs, hostPath, cephfs, iscsi, etc.
HostPath mounts a file or directory from the node; EmptyDir is ephemeral storage created for a pod's lifetime.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pod
spec:
  containers:
  - name: container
    image: nginx
    volumeMounts:
    - mountPath: /data
      name: host-volume
  volumes:
  - name: host-volume
    hostPath:
      path: /mnt/data
      type: Directory
NFS allows shared storage accessible by multiple pods across nodes.
volumes:
- name: nfs-volume
  nfs:
    server: nfs.example.com
    path: /exports/data
Cloud-specific storage volumes integrate with Kubernetes via plugins and StorageClasses.
# Example AWS EBS volume in a pod spec
volumes:
- name: ebs-volume
  awsElasticBlockStore:
    volumeID: vol-0abcd1234efgh5678
    fsType: ext4
StatefulSets use persistent storage with stable identities, often via PVC templates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
Access modes define how volumes can be used: ReadWriteOnce, ReadOnlyMany, ReadWriteMany.
accessModes:
- ReadWriteOnce   # Mounted by a single node as read-write
- ReadOnlyMany    # Mounted read-only by many nodes
- ReadWriteMany   # Mounted read-write by many nodes
Snapshots capture volume state; backup solutions protect data from loss.
# Snapshot example with CSI drivers:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-example
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: pvc-example
Encrypt data at rest and in transit, enforce RBAC for storage resources.
# Use cloud provider encryption options or third-party tools # Control access with Kubernetes RBAC on PVC/PV objects
Limit storage consumption per namespace using ResourceQuota objects.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
spec:
  hard:
    requests.storage: 50Gi
Check pod events, PVC/PV status, CSI driver logs, and node health to troubleshoot storage problems.
kubectl describe pvc pvc-example
kubectl get events --namespace=my-namespace
journalctl -u kubelet
Use dynamic provisioning, clean up unused PVs, monitor storage health, and enforce access policies.
# Always use StorageClasses to avoid manual PV management
# Monitor volume usage and expand PVCs when needed
# Secure storage with encryption and RBAC
A StatefulSet manages stateful applications, providing stable network IDs and persistent storage per pod.
# StatefulSet example: manages pods with unique identities
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
Deployments manage stateless apps with interchangeable pods; StatefulSets manage ordered, unique pods with persistent state.
# Deployments scale stateless replicas # StatefulSets guarantee stable pod names and storage
Databases, queues, and other services requiring stable identities and persistent storage.
# Examples: # MySQL, Cassandra, Kafka, Elasticsearch
Use volumeClaimTemplates to provision PVCs for each pod automatically.
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
Scale pods one at a time in order, ensuring data consistency.
kubectl scale statefulset mysql --replicas=5
# Pods start with stable names: mysql-0, mysql-1, ...
Updates happen sequentially, one pod at a time, respecting pod ordering.
kubectl rollout restart statefulset mysql
A DaemonSet ensures one pod runs on each (or selected) node for tasks like monitoring or logging.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter
Node-level agents: log collectors, monitoring, network proxies.
# Examples: # Fluentd, Prometheus Node Exporter, Calico agents
Update DaemonSets carefully as pods run on all nodes and can affect cluster stability.
kubectl rollout status daemonset node-exporter
kubectl delete daemonset node-exporter
DaemonSets update pods one at a time or in batches controlled via updateStrategy.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
Control which nodes run DaemonSet pods using node selectors and affinity rules.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/worker
                operator: In
                values:
                - "true"
Check pod status, events, logs, and describe pods for troubleshooting.
kubectl get pods -l app=mysql
kubectl describe pod mysql-0
kubectl logs mysql-0
Use appropriate update strategies, monitor resource usage, and apply node affinity thoughtfully.
# For StatefulSets:
# - Avoid scaling down without draining data
# For DaemonSets:
# - Limit resource use to avoid node overload
# - Use tolerations for scheduling on tainted nodes
Headless Services provide stable network identities for StatefulSet pods.
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None
  selector:
    app: mysql
StatefulSets for databases, DaemonSets for monitoring and logging across nodes.
# Example: # StatefulSet runs Cassandra cluster nodes # DaemonSet runs node monitoring agents on each node
A Kubernetes Job creates one or more pods and ensures that a specified number of them successfully terminate.
# Basic Job YAML example
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  backoffLimit: 4
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo", "Hello World"]
      restartPolicy: Never
Create jobs using kubectl apply and manage them with kubectl commands.
# Create a job from a yaml file
kubectl apply -f job.yaml
# Check job status
kubectl get jobs
# Delete the job
kubectl delete job example-job
Kubernetes tracks job completions and can retry failed pods up to backoffLimit times.
# backoffLimit: number of retries before marking the Job as failed
# status.conditions indicates success or failure
kubectl describe job example-job
Configure Jobs to run multiple pods in parallel using completions and parallelism fields.
spec:
  completions: 5    # total pods that must complete
  parallelism: 2    # pods run concurrently
CronJobs schedule Jobs to run periodically at fixed times, like Linux cron.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "*/5 * * * *"   # runs every 5 minutes
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "Hello from CronJob"]
          restartPolicy: OnFailure
Use standard cron format for schedule field (minute, hour, day of month, month, day of week).
# Examples: "0 0 * * *" # every day at midnight "0 */6 * * *" # every 6 hours "15 14 1 * *" # 2:15 PM on the first day of each month
By default, CronJob schedules are evaluated in the kube-controller-manager's time zone (usually UTC). Kubernetes 1.27+ supports a native spec.timeZone field; on older clusters, adjust the cron expression for the UTC offset or handle time zones inside container commands/scripts.
# Native timezone support (Kubernetes 1.27+):
spec:
  schedule: "0 9 * * *"
  timeZone: "America/New_York"
# On older clusters: adjust the cron expression for the UTC offset,
# or implement timezone logic inside container commands/scripts
Configure how many successful and failed jobs to keep with successfulJobsHistoryLimit and failedJobsHistoryLimit.
spec:
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
Use kubectl logs and describe to diagnose job failures.
# Get logs from the job's pods
kubectl logs job/example-job
# Describe the job for event info
kubectl describe job example-job
# List pods using the job's label selector
kubectl get pods --selector=job-name=example-job
Specify CPU/memory requests and limits in job pod specs for resource allocation.
spec:
  template:
    spec:
      containers:
      - name: batch
        image: my-batch-image
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1"
      restartPolicy: OnFailure
Jobs are ideal for batch tasks like ETL, backups, report generation, or image processing.
# Example batch job command command: ["python", "process_data.py"]
Keep jobs idempotent, monitor job status, clean up old jobs, and handle retries properly.
# Tips:
# - Use backoffLimit to prevent endless retries
# - Use labels for easy job filtering
# - Use the TTL controller to auto-clean finished jobs (see the sketch below)
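For the TTL cleanup tip, a short sketch: the Job spec field ttlSecondsAfterFinished deletes a finished Job (and its pods) automatically after the given delay.
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo
spec:
  ttlSecondsAfterFinished: 300   # delete the Job 5 minutes after it finishes
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo", "done"]
      restartPolicy: Never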
Control retry behavior using backoffLimit and activeDeadlineSeconds.
spec:
  backoffLimit: 3              # retry count
  activeDeadlineSeconds: 600   # max job duration in seconds
Use kubectl get jobs and watch job pods to track progress and status.
kubectl get jobs --watch
kubectl get pods --selector=job-name=example-job --watch
Common use cases include database backups, batch data import, report generation, email sending.
# Example CronJob settings for a nightly DB backup:
schedule: "0 2 * * *"
command: ["sh", "-c", "pg_dump mydb > /backup/db_$(date +%F).sql"]
Kubernetes networking assumes all pods can communicate with each other without NAT, following a flat network model.
# Key points:
# - Every pod gets its own IP address
# - Pods can talk to any other pod directly
# - Network plugins implement this model
CNI is a standard interface for configuring network interfaces in Linux containers, used by Kubernetes for pod networking.
# Popular CNIs: Calico, Flannel, Weave Net # CNIs configure IP addressing, routing, and network policies
Plugins provide pod networking, IP management, and enforce network policies.
# Examples:
# - Calico: Network policy enforcement + routing
# - Flannel: Simple overlay network
# - Weave Net: Encrypted networking
Pods communicate using IP addresses assigned by the CNI without NAT or port mapping.
# Each pod has a unique IP address
# Pods on the same node communicate via local interfaces
# Pods on different nodes communicate over overlay networks
Services provide stable IPs and DNS names, proxying traffic to backend pods.
# The Service IP is virtual
# kube-proxy forwards requests to healthy pods
# Supports ClusterIP, NodePort, LoadBalancer types
Network policies restrict which pods can communicate with each other, enhancing security.
# Define ingress/egress rules using labels and ports # Only enforced if network plugin supports policies (e.g., Calico)
Use YAML to define NetworkPolicy resources with selectors and rules.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nginx
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 80
Apply policies to isolate pods, restrict access, and comply with security requirements.
# Block all traffic by default
# Allow only specific pods or namespaces
# Enforce rules for multi-tenant clusters
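A minimal default-deny policy looks like this; it denies all ingress to every pod in the namespace, and is meant to be paired with explicit allow policies such as the one above:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}     # selects every pod in the namespace
  policyTypes:
  - Ingress           # no ingress rules listed, so all inbound traffic is denied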
Common tools include ping, traceroute, logs of network plugins, and checking policies.
# Check pod IPs and routes
kubectl exec -it podname -- ping otherpodip
# Check network plugin logs on nodes
journalctl -u calico-node
# Check NetworkPolicy logs (if enabled)
Service meshes like Istio provide advanced routing, load balancing, and security on top of Kubernetes networking.
# Inject sidecar proxies (Envoy) into pods # Manage traffic policies without changing app code
Ingress controls external HTTP/S traffic into cluster; Services route internal traffic.
# Ingress resources define rules for host/path routing # Services expose pods internally or externally (NodePort, LoadBalancer)
Use network segmentation, restrict policies, monitor traffic, and keep plugins updated.
# Use separate namespaces with policies
# Monitor traffic flows and audit logs
# Avoid wide-open network policies
Tools like Prometheus and Grafana help monitor latency, throughput, and packet loss.
# Export metrics from CNI plugin # Set alerts for anomalies
Examples: isolate dev/test from prod, allow only ingress controller to access services, limit database access.
# Example: allow frontend pod to access backend DB pods only on port 5432
Includes IPv6 dual-stack, multi-cluster networking, network encryption, and custom CNI plugins.
# IPv4/IPv6 dual-stack support
# Service mesh multi-cluster routing
# Network encryption via WireGuard or IPsec
An Ingress is a Kubernetes API object that manages external access to services, typically HTTP, routing traffic to different services based on rules.
# Example: Basic Ingress resource YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
Ingress controllers are pods that implement the Ingress API, handling the routing of requests. Examples: NGINX, Traefik, HAProxy.
# Deploying the NGINX Ingress Controller (simplified)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller.yaml
Ingress resources define routing rules: hosts, paths, services, and TLS settings.
# YAML snippet configuring path-based routing
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /app1
        pathType: Prefix
        backend:
          service:
            name: app1-service
            port:
              number: 80
      - path: /app2
        pathType: Prefix
        backend:
          service:
            name: app2-service
            port:
              number: 80
Ingress can terminate TLS connections, providing HTTPS by specifying TLS secrets with certificates.
# Example TLS config in an Ingress
spec:
  tls:
  - hosts:
    - example.com
    secretName: tls-secret
Route requests based on URL paths to different backend services, useful for hosting multiple apps behind one domain.
# Path-based routing example
paths:
- path: /api
  pathType: Prefix
  backend:
    service:
      name: api-service
      port:
        number: 80
- path: /web
  pathType: Prefix
  backend:
    service:
      name: web-service
      port:
        number: 80
Ingress routes traffic based on hostname, allowing multiple domains to be handled by one Ingress.
rules:
- host: api.example.com
  http:
    paths:
    - path: /
      pathType: Prefix
      backend:
        service:
          name: api-service
          port:
            number: 80
- host: www.example.com
  http:
    paths:
    - path: /
      pathType: Prefix
      backend:
        service:
          name: web-service
          port:
            number: 80
Annotations customize Ingress controller behavior, such as timeouts, rewrites, authentication.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
Ingress Controllers often expose a LoadBalancer service to receive traffic from outside the cluster.
# Example service type for the Ingress Controller
kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: ingress-nginx
Use TLS, limit access via IP whitelisting, enable authentication, and keep controller updated.
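A hedged example of hardening via NGINX Ingress annotations (annotation names assume the ingress-nginx controller; the CIDR ranges and the basic-auth Secret name are placeholders):
metadata:
  annotations:
    nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8,192.168.0.0/16"   # restrict client IPs
    nginx.ingress.kubernetes.io/ssl-redirect: "true"                                  # force HTTPS
    nginx.ingress.kubernetes.io/auth-type: basic                                      # enable basic auth
    nginx.ingress.kubernetes.io/auth-secret: basic-auth                               # Secret holding htpasswd data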
Check logs of Ingress controller pods, ensure service and endpoints are correct, and verify DNS settings.
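A few commands along these lines help (the namespace and label assume a standard ingress-nginx install):
# Ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
# Verify the Ingress resource, its backend Service, and Endpoints
kubectl describe ingress example-ingress
kubectl get svc my-service
kubectl get endpoints my-service
# Confirm the hostname resolves to the controller's external IP
nslookup example.com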
Ingress provides Layer 7 routing (HTTP), while Service LoadBalancers provide Layer 4 (TCP/UDP) routing.
Most popular Ingress controller, configurable via annotations and config maps.
Traefik is a modern dynamic ingress with features like automatic cert management and dashboard.
Monitor ingress traffic, error rates, and latency using Prometheus and Grafana with exporter metrics.
Support for canary releases, rate limiting, authentication, and custom error pages via annotations.
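For example, canary routing with the NGINX Ingress controller is driven by annotations on a second Ingress for the same host (annotation names assume ingress-nginx; the weight is illustrative):
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"   # send ~20% of traffic to the canary backend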
Helm is a package manager for Kubernetes that simplifies deployment of complex apps using Helm Charts.
Helm 3 consists of a client (the helm CLI) and chart repositories; the separate server component (Tiller) existed only in Helm v2 and was removed in v3.
# Install the Helm CLI (Linux/macOS)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Verify installation
helm version
Charts package Kubernetes manifests with templating for easier configuration and repeatability.
# Create a new chart scaffold
helm create my-chart
# Directory structure created for templates, values.yaml, etc.
# Add a repo
helm repo add stable https://charts.helm.sh/stable
# Update repo info
helm repo update
# Search charts
helm search repo mysql
# Install a chart release
helm install my-release stable/mysql
# Upgrade the release with new values
helm upgrade my-release stable/mysql -f custom-values.yaml
Use values.yaml to configure templates dynamically using Go templating syntax.
# Example template snippet (deployment.yaml)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
      - name: app
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
Hooks let you run jobs before/after install, upgrade, or delete to manage lifecycle events.
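As a sketch, a hook is an ordinary manifest in the chart's templates directory annotated with helm.sh/hook (the Job name and image here are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-db-migrate"
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade        # run before install/upgrade
    "helm.sh/hook-delete-policy": hook-succeeded   # clean up the Job after success
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp-migrations:latest
      restartPolicy: Never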
Define dependencies in Chart.yaml to include other charts as subcharts.
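A minimal dependencies stanza in Chart.yaml might look like this (the chart name, version range, and repository URL are illustrative):
dependencies:
- name: postgresql
  version: ">=12.0.0"
  repository: "https://charts.bitnami.com/bitnami"
  condition: postgresql.enabled   # toggle the subchart from values.yaml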
Use plugins like helm-secrets to encrypt sensitive values for secure deployments.
Keep charts modular, reusable, documented, and avoid hardcoded values.
# Render templates locally for debugging
helm template my-chart
# Dry-run an install to see actions without applying
helm install my-release my-chart --dry-run --debug
Use Helm in pipelines for automated testing, deployment, and rollback of Kubernetes apps.
Alternatives include Kustomize, Kapp, and Operators — each with different strengths in templating and customization.
Kubernetes security covers securing clusters, workloads, API access, network policies, and runtime defenses.
// Security layers:
// - API Server protection
// - Authentication & Authorization
// - Network segmentation
// - Pod & container hardening
Common authentication includes certificates, tokens, OpenID Connect, and service accounts.
// Example: Using client certificates for API access
kubectl config set-credentials user --client-certificate=cert.pem --client-key=key.pem
Role-Based Access Control (RBAC) governs who can do what within the cluster.
// Enable RBAC (default in modern Kubernetes)
--authorization-mode=RBAC
Roles define permissions in a namespace; ClusterRoles are cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
Bind roles to users, groups, or service accounts either namespace-scoped or cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
Secure with TLS, audit logging, authentication, and authorization checks.
// Start the API server with flags:
--tls-cert-file=server.crt
--tls-private-key-file=server.key
--authorization-mode=RBAC
--audit-log-path=/var/log/kube-apiserver/audit.log
Control pod communication with Network Policies restricting ingress and egress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: frontend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: backend
PodSecurityPolicy is deprecated (removed in Kubernetes 1.25 and replaced by Pod Security Admission) but was used to enforce pod security constraints like privilege escalation and capabilities.
# Example: Disallow privileged pods
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
Specify user IDs, capabilities, and SELinux labels in pod specs to harden pods.
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: myapp
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
Store sensitive info in Secrets; enable encryption at rest for Secrets.
// Create a secret
kubectl create secret generic db-password --from-literal=password='s3cr3t'
// Enable encryption at rest for etcd via an encryption config
Audit all API requests for security and compliance purposes.
// Configure audit policy yaml and start kube-apiserver with --audit-policy-file flag
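A minimal audit Policy, as a sketch (log full request/response bodies for Secret changes, metadata for everything else):
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse   # log request and response bodies
  resources:
  - group: ""
    resources: ["secrets"]
- level: Metadata          # log only metadata for all other requests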
Scan container images for vulnerabilities before deployment using tools like Trivy or Clair.
// Scan an image with Trivy
trivy image myapp:latest
Popular tools: kube-bench, kube-hunter, Falco, OPA Gatekeeper.
// Run kube-bench to check CIS Kubernetes benchmarks
kube-bench
Regularly update images, apply patches, and monitor CVEs affecting cluster components.
// Automate scanning and patching in CI/CD pipelines
Follow least privilege principle, limit host access, use namespaces, enable logging and monitoring.
// Summary:
// - Use RBAC and limit permissions
// - Encrypt secrets and enable audit logs
// - Use network policies and pod security contexts
// - Scan images and monitor clusters continuously
Monitoring ensures cluster health, performance, and alerts for anomalies.
// Monitor resource usage, availability, and events // Collect metrics from nodes, pods, and control plane
A lightweight aggregator for resource metrics used for autoscaling and monitoring.
// Deploy metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
// Check metrics
kubectl top nodes
kubectl top pods
Prometheus collects time-series metrics and supports powerful queries.
// Deploy Prometheus via Helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus
// Access metrics at the Prometheus server UI
Alertmanager manages alerts sent by Prometheus, supports grouping, silencing, and routing.
// Configure Alertmanager.yaml with alert rules and receivers // Integrate with Slack, email, PagerDuty, etc.
Grafana visualizes metrics from Prometheus and other sources in customizable dashboards.
// Deploy Grafana and add Prometheus as data source // Import Kubernetes monitoring dashboards
Logs are collected at node and cluster levels using agents and stored centrally.
// Use Fluentd or Fluent Bit as log collectors running as DaemonSets
Fluentd collects, transforms, and forwards logs to storage backends like Elasticsearch.
// Deploy Fluentd DaemonSet with configuration for log forwarding
Popular logging stack for searching, visualizing, and analyzing logs.
// Deploy Elasticsearch, Fluentd, Kibana in cluster // Kibana UI for log search and dashboards
Ensure logs are structured, searchable, and protected with retention policies.
// Use JSON format logs and index management in Elasticsearch
Tracing helps follow requests through microservices for debugging performance issues.
// Use tools like Jaeger or Zipkin integrated with Kubernetes workloads
Events provide real-time cluster notifications, metrics track resource usage and health.
// View events
kubectl get events --all-namespaces
Track health and metrics of stateful and daemon workloads to ensure stability.
// Use Prometheus exporters and pod metrics for StatefulSets/DaemonSets
Combine logs, metrics, and events for comprehensive troubleshooting.
// Use kubectl logs and Prometheus queries together
Horizontal Pod Autoscaler scales pods based on CPU or custom metrics from metrics server or Prometheus.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Common tools: Metrics Server (basic), Prometheus + Grafana (advanced), commercial SaaS options.
// Choose tools based on scale, customization needs, and budget
Operators automate complex Kubernetes application management tasks by encoding human operational knowledge.
// Operator manages app lifecycle beyond basic Kubernetes controllers, // e.g., backups, upgrades, failovers.
CRDs extend the Kubernetes API with custom objects representing domain-specific resources.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: memcacheds.cache.example.com
spec:
  group: cache.example.com
  versions:
  - name: v1alpha1
    served: true
    storage: true
  scope: Namespaced
  names:
    plural: memcacheds
    singular: memcached
    kind: Memcached
Controllers watch CRDs and manage resources, ensuring desired state matches actual state.
// Go code snippet with a client-go controller watching the Memcached CRD (simplified)
func (c *Controller) Run(stopCh <-chan struct{}) {
    // Watch Memcached resources and reconcile
}
A toolkit for building Kubernetes Operators with SDKs and tools simplifying development.
// Operator SDK CLI to create a new operator
operator-sdk init --domain example.com --repo github.com/example/memcached-operator
Deploy community Operators from OperatorHub.io to add functionality without coding.
// Example: Install the Prometheus Operator via the kube-prometheus-stack Helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
Package Operators as Helm charts for simplified deployment and management.
// Helm install operator from chart directory helm install my-operator ./operator-chart
OLM manages Operator installation, updates, and lifecycle within Kubernetes clusters.
// Install OLM on the cluster
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.20.0/crds.yaml
kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.20.0/olm.yaml
Operators support multiple CRD versions to enable smooth upgrades and backward compatibility.
// CRD spec versions array includes v1beta1, v1 etc. // Use conversion webhook for version migration.
Follow reconciliation loops, idempotency, and event-driven patterns for stable Operators.
// Ensure Reconcile function is safe to run multiple times without side effects
Use logs, events, and Kubernetes API to debug operator behavior and state transitions.
// View operator logs
kubectl logs deployment/my-operator
// Check events
kubectl get events --namespace my-operator-namespace
Operators manage databases, caches, messaging systems, and complex applications on Kubernetes.
// Examples:
// - MongoDB Operator for managing database clusters
// - Kafka Operator for managing messaging systems
Operators scale by managing multiple resources and running multiple controller instances with leader election.
// Use leader election flags in operator deployment YAML
Expose Prometheus metrics and structured logs for monitoring Operator health and performance.
// Operator exposes /metrics endpoint for Prometheus scraping
Run Operators with least privilege, using Role-Based Access Control (RBAC) and secure secrets management.
# Example RBAC rule snippet for operator permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
rules:
- apiGroups: ["cache.example.com"]
  resources: ["memcacheds"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Use Operator SDK to scaffold, implement, and deploy a basic Operator managing a sample resource.
// Create the operator scaffold
operator-sdk init --domain example.com --repo github.com/example/memcached-operator
operator-sdk create api --group cache --version v1alpha1 --kind Memcached --resource --controller
// Implement reconcile logic in Go
// Build and deploy the operator container
Kubernetes schedules pods to nodes based on resource availability, policies, and constraints.
// Default scheduler decides pod placement based on resource requests and node status
Define rules to prefer or require pods to run on specific nodes or avoid certain nodes.
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
Schedule pods relative to other pods based on labels to co-locate or spread workloads.
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - store
        topologyKey: "kubernetes.io/hostname"
Prevent pods from scheduling on certain nodes unless they tolerate node taints.
// Add a taint to a node
kubectl taint nodes node1 key=value:NoSchedule

# Pod toleration example
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
Implement your own scheduler logic to customize pod placement.
// Run custom scheduler binary in cluster with specific schedulerName in pod spec
Extend default scheduler with external HTTP endpoints to influence scheduling decisions.
// Scheduler calls extender with pod and node info for filtering and scoring
Use policies like priorities, weights, and preemption to control scheduling behavior.
# PriorityClasses define scheduling priority for pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority class"
Higher priority pods can preempt lower priority pods to free resources.
// Kubernetes automatically evicts low priority pods if resources needed by higher priority pod
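To use the class defined above, reference it from the pod spec; pods in higher-priority classes may preempt lower-priority pods when resources run short:
apiVersion: v1
kind: Pod
metadata:
  name: important-pod
spec:
  priorityClassName: high-priority   # refers to the PriorityClass defined earlier
  containers:
  - name: app
    image: myapp:latest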
Use events, logs, and describe commands to diagnose scheduling issues.
// Check pod events
kubectl describe pod <pod-name>
// Look for scheduling errors or reasons pods remain Pending
Enforce resource consumption limits on namespaces to control cluster resource usage.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 16Gi
    limits.cpu: "8"
    limits.memory: 32Gi
Run multiple schedulers in a cluster for specialized scheduling.
# Specify schedulerName in the pod spec to use an alternate scheduler
spec:
  schedulerName: custom-scheduler
Use node selectors, affinity, taints, and quotas wisely to optimize resource use and availability.
// Combine affinity and tolerations to achieve workload isolation and availability
Schedule workloads requiring GPUs or hardware accelerators using node labels and resource requests.
# Request a GPU resource in the pod spec
resources:
  limits:
    nvidia.com/gpu: 1
Use federated schedulers or multi-cluster controllers to manage scheduling across clusters.
// Federated Kubernetes setup with custom schedulers
Examples include workload isolation, batch processing scheduling, and GPU-intensive job scheduling.
// Use node affinity for batch jobs to run on less busy nodes
The Kubernetes API is the central communication interface for all components, exposing cluster state and operations.
// Kubernetes API server listens on port 6443 by default // Supports RESTful requests to manage cluster resources
API resources are organized into groups (core, apps, batch) and versions (v1, v1beta1) for stability and extensibility.
// Example API paths:
// /apis/apps/v1/deployments
// /api/v1/pods
kubectl CLI interacts with the API server to manage cluster resources.
kubectl get pods
kubectl create -f deployment.yaml
kubectl delete service my-service
Supports token-based, client certificate, and OIDC authentication for secure access.
// Configure ~/.kube/config with user credentials and certificates
Allows extending the Kubernetes API by adding custom APIs served by external services.
// Register an APIService object pointing to the external API server
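A sketch of such an APIService registration (the group, version, and Service names are illustrative):
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.metrics.example.com
spec:
  group: metrics.example.com
  version: v1alpha1
  service:
    name: custom-metrics-apiserver   # Service fronting the extension API server
    namespace: custom-metrics
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true        # for illustration only; use caBundle in production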
CRDs enable users to define their own Kubernetes resource types dynamically.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
Pluggable components that intercept API requests for validation, mutation, or security enforcement.
// Examples: NamespaceLifecycle, LimitRanger, PodSecurityPolicy
Admission webhooks can dynamically validate or mutate objects during creation or update.
// Define a ValidatingWebhookConfiguration with webhook URL and rules
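A skeleton ValidatingWebhookConfiguration, as a sketch (the webhook name, backing Service, and rules are illustrative; the caBundle is elided):
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
- name: pod-policy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: pod-policy-webhook      # Service exposing the webhook server
      namespace: default
      path: /validate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]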
Webhooks enable dynamic admission decisions without changing API server code.
// Useful for policy enforcement and security compliance
Plugins like scheduler plugins, networking plugins extend cluster behavior.
// Example: Using CNI plugins for custom networking
Protect API server from overload by limiting request rates and burst sizes.
// Configurable via --max-requests-inflight and --max-mutating-requests-inflight flags
Configuration flags control authentication, authorization, auditing, and feature gates.
// Example flag: --authorization-mode=RBAC
Use kubectl verbose mode or API server logs to diagnose request issues.
kubectl get pods -v=8
// Check the API server logs, e.g. /var/log/kube-apiserver.log
Client SDKs exist for Go, Python, JavaScript, and others for programmatic access.
// Example: Using client-go for Go applications to manage resources
Interact with the API to create, update, delete, and watch resources programmatically.
// Example curl command to list pods
curl --cacert ca.crt --header "Authorization: Bearer TOKEN" https://<api-server>/api/v1/pods
CI/CD automates code integration, testing, and deployment to Kubernetes clusters.
// CI automates building and testing code // CD automates deployment to environments
GitOps uses Git repositories as the source of truth for Kubernetes deployment state.
// Tools like Argo CD and Flux sync cluster state to Git
Jenkins pipelines can build Docker images and deploy to Kubernetes using plugins.
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        sh "docker build -t myapp:${env.BUILD_ID} ."
      }
    }
    stage('Deploy') {
      steps {
        sh 'kubectl apply -f k8s/deployment.yaml'
      }
    }
  }
}
GitLab integrates tightly with Kubernetes clusters for automated builds and deployments.
// .gitlab-ci.yml example with kubectl commands for deploy
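A hedged sketch of such a .gitlab-ci.yml (the registry name, manifest names, and cluster credential setup are illustrative; it assumes a runner with docker and kubectl access):
stages:
  - build
  - deploy

build:
  stage: build
  script:
    - docker build -t registry.example.com/myapp:$CI_COMMIT_SHA .
    - docker push registry.example.com/myapp:$CI_COMMIT_SHA

deploy:
  stage: deploy
  script:
    - kubectl set image deployment/myapp myapp=registry.example.com/myapp:$CI_COMMIT_SHA
  environment: production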
Argo CD continuously monitors Git repos and applies changes to Kubernetes clusters automatically.
// Declarative deployment management with Argo CD
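As a sketch, an Argo CD Application resource points the cluster at a Git path to sync (the repo URL, path, and namespaces are illustrative):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/myapp-config.git
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift in the cluster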
Flux watches Git and container registries to automate deployments and updates.
// Install Flux in cluster and link to Git repo
Automate image builds, tagging, and pushing to container registries in CI pipelines.
docker build -t myapp:${CI_COMMIT_SHA} .
docker push myapp:${CI_COMMIT_SHA}
Use pipeline stages to deploy updated manifests or Helm charts to Kubernetes.
kubectl apply -f manifests/
helm upgrade myapp ./charts/myapp
Run integration, smoke, and end-to-end tests as part of CI/CD pipelines on Kubernetes.
// Execute tests inside Kubernetes pods or as jobs
kubectl run test-runner --image=myapp-tests -- ...
Leverage Kubernetes rollout strategies to rollback or update applications smoothly.
kubectl rollout status deployment/myapp
kubectl rollout undo deployment/myapp
Store sensitive data securely using Kubernetes Secrets integrated into pipelines.
kubectl create secret generic db-password --from-literal=password=secret
Restrict pipeline permissions, use scanned images, and manage secrets carefully.
// Use least privilege service accounts // Scan container images for vulnerabilities
Track deployment health and pipeline status with monitoring tools like Prometheus and Grafana.
// Set alerts on failed rollouts or pod crashes
Implement advanced deployment patterns to minimize downtime and risk.
// Use tools like Flagger for automated canary releases
Real-world examples of Kubernetes CI/CD setups in various industries.
// Example: eCommerce platform using Jenkins + Argo CD
Frequent problems include pod crashes, image pull errors, networking failures, and resource shortages.
Use kubectl describe pod and kubectl logs to diagnose pod/container issues.
// Describe a pod to get detailed info and events
kubectl describe pod pod-name -n namespace
// View logs from a container inside a pod
kubectl logs pod-name -n namespace
Check CNI plugin status, service endpoints, and DNS resolution within the cluster.
// Check network plugin pod status
kubectl get pods -n kube-system -l k8s-app=cni-plugin
// Test DNS resolution from a debug pod
kubectl run dnsutils --image=tutum/dnsutils -it --rm --restart=Never -- nslookup kubernetes.default
Inspect PersistentVolume (PV) and PersistentVolumeClaim (PVC) bindings and access modes.
// Check PVC status and events
kubectl describe pvc pvc-name -n namespace
// View PVs and their status
kubectl get pv
Review API server logs and check its health endpoint for errors.
// Get logs from the API server pod (usually in the kube-system namespace)
kubectl logs -n kube-system kube-apiserver-master-node
// Check API server health
curl -k https://localhost:6443/healthz
Use kubectl debug to start troubleshooting sessions inside nodes or pods.
// Start an ephemeral container in a running pod for debugging
kubectl debug pod-name -n namespace --image=busybox --target=container-name -- sh
Use kubectl get events and kubectl logs to understand cluster state changes and errors.
// View recent events in a namespace sorted by time
kubectl get events -n namespace --sort-by=.metadata.creationTimestamp
Check node conditions, resource usage, and kubelet status.
// Get node status and readiness
kubectl get nodes
// Describe a node for detailed info
kubectl describe node node-name
Check scheduler logs and events to troubleshoot pod scheduling problems.
// Get scheduler logs (usually in the kube-system namespace)
kubectl logs -n kube-system kube-scheduler-master-node
Common DNS issues include CoreDNS crash loops or misconfiguration causing resolution failures.
// Check CoreDNS pod status
kubectl get pods -n kube-system -l k8s-app=kube-dns
// Restart CoreDNS pods to fix transient issues
kubectl delete pod -n kube-system -l k8s-app=kube-dns
Check rollout status and pod statuses to identify deployment issues.
// Check deployment rollout status
kubectl rollout status deployment/deployment-name -n namespace
// Describe a StatefulSet to review pod events
kubectl describe statefulset statefulset-name -n namespace
Pods may fail to schedule or get evicted if they exceed resource limits or quotas.
// View resource quotas in a namespace
kubectl get resourcequota -n namespace
// Check pod resource requests and limits
kubectl describe pod pod-name -n namespace
Inspect Helm release status, hooks, and generated manifests for errors.
// Get Helm release status
helm status release-name
// Render templates to debug generated manifests
helm template release-name ./chart-directory
Use tools like kubectl-debug, stern, k9s, and dashboard UIs for enhanced troubleshooting.
// Install k9s for a terminal UI
brew install derailed/k9s/k9s
// Use stern for multi-pod log tailing
stern pod-name-prefix
Always check logs, events, and resource statuses; isolate problems by reproducing; use namespaces and labels to scope issues.
Kubernetes has a rich ecosystem including tools for CI/CD, monitoring, networking, storage, and security.
Tools like Helm, Prometheus, Istio, and Kubeflow complement Kubernetes functionality.
Knative enables serverless workloads on Kubernetes, abstracting scaling and event handling.
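As a rough sketch, a Knative Service declares only the container to run and lets Knative handle scaling (this assumes Knative Serving is installed; the image name is illustrative):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
      - image: ghcr.io/example/hello:latest
        env:
        - name: TARGET
          value: "World"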
Deploy lightweight Kubernetes clusters at the edge for low-latency and disconnected environments.
Service meshes provide traffic management, security, and observability for microservices.
Kubeflow and other frameworks facilitate deploying and managing AI/ML pipelines on Kubernetes.
Manage IoT devices and data streams with Kubernetes-based platforms for scale and reliability.
Managed Kubernetes services simplify cluster setup and maintenance on AWS, Google Cloud, and Azure.
Federation enables managing multiple clusters across regions or clouds as a single entity.
Run workloads distributed across multiple cloud providers to avoid vendor lock-in and increase resilience.
Combine on-premises and cloud Kubernetes clusters for flexible infrastructure management.
Features like server-side apply, ephemeral containers, and better security are on the roadmap.
Kubernetes has a vibrant open-source community with SIGs, working groups, and regular releases.
Certified Kubernetes Administrator (CKA) and Developer (CKAD) exams validate skills.
Kubernetes will continue evolving to better support edge, AI, multi-cloud, and developer experience improvements.