10 Hidden Gems of Kubernetes: Supercharge Your Cluster Like a Pro


Introduction

The Ubiquity of Kubernetes

Kubernetes has become the backbone of modern infrastructure. It's everywhere---powering everything from small startups experimenting with containerized applications to global enterprises managing thousands of microservices.

Its ability to orchestrate workloads efficiently and scale applications seamlessly has made it indispensable.

But here's a question: are we really using Kubernetes to its full potential? Most of us rely on the basics---deployments, services, and maybe horizontal scaling.

That's great for getting started, but beneath the surface lies a treasure trove of lesser-known features that can revolutionize how we manage applications.


So while Kubernetes can sometimes feel like an enigma and be hard to grasp, once you get the hang of it, it solves many of your architectural pain points without breaking a sweat.

Setting the Context: Practical Benefits

The beauty of Kubernetes lies in its design---it's not just a tool for running containers; it's an ecosystem.

These hidden features often address real-world challenges: how to ensure uptime during disruptions, prioritize critical workloads, or even debug a live pod without restarting it.

When I started exploring these capabilities, I realized how much easier and more efficient cluster management could be.


These are not just "nice-to-haves"; they're practical tools that can save time, improve reliability, and reduce operational headaches.

This blog is about uncovering those gems---features that take Kubernetes from good to exceptional.

Whether you're a beginner or a Kubernetes pro, these insights will help you rethink how to get the most out of your clusters.

Let's dive in!


Feature 1: Pod Disruption Budgets (PDBs): Guaranteeing Uptime


The Problem: Voluntary Disruptions and Downtime

One of the challenges in running Kubernetes clusters is handling disruptions gracefully. These disruptions can be voluntary---like node maintenance, scaling down resources, or rolling out updates.

While Kubernetes is designed to keep your applications running, these disruptions can unintentionally lead to downtime if too many pods in a critical service are taken offline simultaneously.

Imagine running an e-commerce platform during a big sale. A node update causes half of your frontend pods to restart at the same time.


Even though the disruption is planned, the impact is real: users face degraded service or, worse, downtime. This is where Pod Disruption Budgets (PDBs) come into play.


How PDBs Solve It

Pod Disruption Budgets allow you to define the minimum availability requirements for your application during voluntary disruptions.

They ensure that Kubernetes won't evict more pods than your application can handle at any given time.

For example, let's say you have a deployment with three replicas of your application. You can set a PDB to guarantee that at least two replicas remain available, even during maintenance.

This ensures minimal impact on your service's availability while still allowing updates and scaling operations to proceed.

Here's how a simple PDB might look:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

In this configuration:

  • minAvailable: 2 ensures that at least two pods must remain running during disruptions.
  • selector targets the pods of your application based on their labels.

By setting this, Kubernetes manages evictions intelligently, respecting your application's availability requirements.
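
PDBs also accept maxUnavailable as an alternative to minAvailable (both fields can take absolute numbers or percentages). A sketch expressing the same intent for the three-replica deployment above:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1   # tolerate at most one voluntarily disrupted pod
  selector:
    matchLabels:
      app: my-app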


Real-Life Use Cases

1. Node Maintenance Without Service Downtime

Let's say your team needs to perform maintenance on a node hosting critical workloads. With PDBs in place, Kubernetes will migrate pods off the node, but only at a rate that keeps your service operational.
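
Here's a sketch of what that looks like in practice, assuming a node named node-1. kubectl drain evicts pods through the Eviction API, so any eviction that would violate a matching PDB is refused and retried until enough replacement replicas are healthy elsewhere:

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data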

2. Rolling Updates Without Overloading the System

During application updates, PDBs ensure that enough replicas stay online to handle user requests while the new version rolls out.

3. Multi-Tenant Environments

In clusters shared across teams or applications, PDBs help enforce fair resource usage. Each team can define its own availability standards, ensuring no single application monopolizes the cluster during maintenance.


Why PDBs Are a Must-Have

Without PDBs, Kubernetes doesn't have the context to prioritize your application's uptime during disruptions. This can lead to unintended downtimes or performance bottlenecks.

By using PDBs, you get granular control over how disruptions are managed, ensuring a smoother, more predictable experience for your users.

In short, PDBs are like guardrails for your cluster---keeping things running smoothly even when disruptions are unavoidable.

Whether you're managing a single application or a multi-tenant cluster, they're an essential feature to master.


Feature 2: Dynamic Admission Controllers: Enforcing Cluster-Wide Policies


What Are Admission Controllers?

Admission controllers are Kubernetes' unsung heroes. They act as gatekeepers for your cluster, intercepting API requests before they are persisted.

Think of them as the bouncers at a nightclub, ensuring only the right configurations and workloads make it into the cluster.


Kubernetes offers two types of admission controllers:

  1. Mutating Admission Controllers: Modify incoming requests dynamically.
  2. Validating Admission Controllers: Validate requests against policies and reject those that fail.

By using admission controllers, you can enforce organization-wide policies, improve security, and maintain consistency across your deployments.

For example, if you want to ensure every pod has a specific label (like team: backend), an admission controller can validate that during deployment.


Injecting Custom Validations

Dynamic Admission Controllers take this concept further by enabling webhook-based validation. This means you can create your own policies and inject them into the cluster.

Here's how it works:

  • A MutatingAdmissionWebhook can modify an object before it's saved (e.g., adding default labels or annotations).
  • A ValidatingAdmissionWebhook ensures that requests meet your custom policies (e.g., restricting images to a specific registry).

Here's an example webhook configuration:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-validation
webhooks:
  - name: validate.example.com
    clientConfig:
      service:
        name: validation-service
        namespace: default
        path: "/validate"
      caBundle: <BASE64_CA_CERT>
    rules:
      - apiGroups: ["*"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
    admissionReviewVersions: ["v1"]
    sideEffects: None

In this setup:

  • The webhook validates all pod creation and update operations.
  • Requests failing the validation logic defined in the webhook service are rejected with an appropriate error message.

Dynamic Security and Compliance

Admission controllers are particularly useful for enforcing security and compliance. Here are some practical examples:

1. Enforcing Resource Limits

Prevent pods from running without resource requests and limits, ensuring no workload starves others of CPU or memory.

2. Restricting Container Images

Allow only trusted images from specific registries, preventing malicious or unvetted containers from running.

3. Ensuring Metadata Consistency

Automatically inject labels, annotations, or environment variables into pods to standardize deployments across teams.

4. Enforcing Namespace Policies

Require specific policies, like enabling network policies for every namespace.

For instance, if your organization requires all pods to use a specific logging sidecar, a Mutating Admission Webhook can inject the sidecar configuration automatically.
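
Here's a sketch of the registration side, assuming a hypothetical webhook service named sidecar-injector (the actual JSONPatch that adds the sidecar lives in your webhook server, not in this manifest):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector
webhooks:
  - name: inject.logging.example.com
    clientConfig:
      service:
        name: sidecar-injector   # hypothetical service fronting your webhook server
        namespace: default
        path: "/mutate"
      caBundle: <BASE64_CA_CERT>
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    admissionReviewVersions: ["v1"]
    sideEffects: None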


Real-World Benefits of Dynamic Admission Controllers

  1. Improved Governance: Enforce consistent policies across clusters without relying solely on developer compliance.
  2. Stronger Security Posture: Eliminate vulnerabilities by controlling what runs in your cluster.
  3. Operational Efficiency: Reduce manual checks by automating validation and modification of API requests.

Dynamic Admission Controllers give you the power to make Kubernetes your own. By tailoring its behavior to your organization's needs, you can create a cluster that's not just secure and reliable but also aligned with your operational standards.


Feature 3: Ephemeral Containers: Debugging Like a Pro


Debug Without Redeployment

Ephemeral Containers are one of Kubernetes' most underappreciated features. They allow you to debug running pods without restarting or redeploying them.

Unlike regular containers in a pod, ephemeral containers are temporary and don't modify the pod's definition.

Think of them as your debugging toolkit, ready to be dropped into a live pod when things go south.

Here's a scenario: A production pod starts acting strangely, but the application logs aren't enough to pinpoint the issue.

In the past, you might have redeployed the pod with extra debugging tools or SSHed into the host node (if you're into living dangerously).

With ephemeral containers, you can attach a temporary container with debugging tools directly to the running pod.

To create an ephemeral container, use the kubectl debug command:

kubectl debug -it <pod-name> --image=busybox --target=<container-name>

This command adds a busybox ephemeral container to the specified pod, targeting a particular container if needed. No redeployment. No downtime. Just pure debugging power.


Common Debugging Scenarios

Ephemeral Containers shine in situations where traditional debugging methods fall short. Here are some common use cases:

1. Inspecting File Systems

Need to check or modify files inside a running container? Attach an ephemeral container with tools like ls, cat, or vi to inspect the filesystem without restarting the application.

2. Network Troubleshooting

Debugging network connectivity issues? Attach a container with tools like curl, netstat, or ping to test connectivity between pods or external services.

3. Environment Variables and Configurations

Sometimes the root cause of a bug lies in misconfigured environment variables. Use an ephemeral container to quickly inspect the running environment.

4. Crash Investigation

If a container is repeatedly crashing, an ephemeral container can be used to explore the pod before it crashes again, helping you capture valuable diagnostic data.
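
For crash loops, a handy variant is debugging a disposable copy of the pod so the original keeps its state. A sketch, assuming a pod named my-pod:

# Creates a copy of my-pod with an extra busybox container and a shared
# process namespace, letting you inspect the crashing process from the side.
kubectl debug my-pod -it --image=busybox --copy-to=my-pod-debug --share-processes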


Best Practices

While ephemeral containers are incredibly powerful, they should be used with caution. Here are some tips to ensure safe and effective debugging:

1. Use Minimal Images

Use lightweight images like busybox or alpine for ephemeral containers to minimize resource usage and avoid unnecessary overhead.

2. Avoid Sensitive Data Exposure

Ephemeral containers can access the pod's environment, including secrets and sensitive data. Be mindful of this when granting access or running debugging commands.

3. Clean Up Copied Debug Pods

Ephemeral containers cannot be removed from a pod once added; they simply exit and remain in the pod's status until the pod itself is deleted or recreated. If you debugged a copy of a pod (via --copy-to), delete the copy once you're done.

4. Use Role-Based Access Control (RBAC)

Ensure that only authorized users can add ephemeral containers. This prevents misuse or accidental changes to running workloads.

5. Document Debugging Sessions

For production-grade clusters, maintain a record of debugging sessions. Document the commands and observations to streamline post-mortem analysis.


Feature 4: Resource Quotas: The Art of Fair Allocation


The Problem of Resource Starvation

In a Kubernetes cluster shared across teams or applications, resource contention is a frequent challenge.

Without proper resource controls, a single application can hog CPU, memory, or storage, leaving other workloads starved and degraded.

Imagine a scenario where a team accidentally deploys a memory-intensive application that consumes all available resources.

Suddenly, other critical workloads in the cluster crash or underperform, leading to cascading failures.

This problem is particularly acute in multi-tenant clusters where multiple teams or applications share the same infrastructure.

This is where Resource Quotas come to the rescue.

They allow administrators to allocate and enforce resource limits at the namespace level, ensuring fair and predictable resource distribution across the cluster.


Configuring Quotas for Multi-Tenant Clusters

Resource Quotas are configured at the namespace level, specifying the total amount of resources that workloads in a namespace may consume.

These quotas apply to CPU, memory, ephemeral storage, persistent volume claims, and even object counts (e.g., number of pods or services).

Here's an example of a Resource Quota for a namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"       # Total CPU requested by all workloads
    requests.memory: 20Gi    # Total memory requested by all workloads
    limits.cpu: "20"         # Maximum CPU limit
    limits.memory: 40Gi      # Maximum memory limit
    persistentvolumeclaims: 5
    pods: 50                 # Max number of pods in the namespace

In this configuration:

  • requests.cpu and requests.memory cap the total CPU and memory that workloads in the namespace can request.
  • limits.cpu and limits.memory cap the total CPU and memory limits across all workloads in the namespace.
  • Object quotas like pods or persistentvolumeclaims enforce limits on the number of Kubernetes objects within the namespace.

This setup ensures that team-specific namespaces operate within their allocated resources, preventing resource starvation for other tenants.


Quality of Service (QoS) Classes

Kubernetes uses QoS Classes to prioritize workloads based on their resource requests and limits. These classes play a crucial role in how Resource Quotas impact workloads during resource contention.

1. Guaranteed

  • A pod falls into the Guaranteed class if it specifies equal values for both resource requests and limits for all containers.
  • Use Case: Critical workloads requiring high priority and zero disruption (e.g., payment processing services).

2. Burstable

  • A pod is classified as Burstable if it specifies resource requests but sets higher limits or omits them for some containers.
  • Use Case: Applications that can handle temporary slowdowns but benefit from resource bursts (e.g., batch processing jobs).

3. Best-Effort

  • Pods that don't specify resource requests or limits fall into the Best-Effort class.
  • Use Case: Low-priority workloads that are expendable during resource contention (e.g., log processors).
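
For reference, here's a minimal pod spec that lands in the Guaranteed class, because requests and limits are equal for every container (lower the requests, or omit them entirely, and the pod drops to Burstable or Best-Effort):

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: my-app-image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"     # equal to the request
        memory: "256Mi" # equal to the request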

Impact of Resource Quotas on QoS Classes:

  • Resource Quotas require Guaranteed and Burstable pods to declare requests and limits that fit within the namespace's allocation.
  • Best-Effort pods are the first to be evicted during resource contention, making quotas especially useful in ensuring critical workloads retain priority.

Practical Examples

1. Preventing Overprovisioning in Multi-Tenant Clusters

In a shared environment, assign each team a namespace with specific quotas to ensure no single team dominates resources:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-b-quota
  namespace: team-b
spec:
  hard:
    requests.cpu: "5"
    limits.cpu: "10"
    pods: 30

This guarantees Team B has enough resources while maintaining cluster-wide balance.

2. Enforcing Object Count Limits

Limit the number of high-resource objects (e.g., persistent volume claims) to control storage usage:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: storage-heavy-team
spec:
  hard:
    persistentvolumeclaims: 10

This configuration avoids excessive storage claims that might impact other workloads.

3. Managing Dynamic Workloads

In environments with fluctuating workloads, set generous resource limits but conservative requests to allow applications to scale dynamically without exceeding total capacity.

4. Combining Resource Quotas with Cluster Autoscalers

Use Resource Quotas in tandem with Kubernetes Cluster Autoscaler to ensure that scaling decisions respect namespace-level allocations. This prevents unexpected cluster expansions due to rogue workloads.


Why Resource Quotas Matter

Resource Quotas are essential for managing fair resource allocation in shared Kubernetes environments. By defining clear boundaries, they help:

  • Prevent resource starvation and overprovisioning.
  • Enforce fairness in multi-tenant clusters.
  • Align resource usage with organizational priorities.

Coupled with Kubernetes QoS Classes, Resource Quotas offer a robust framework for ensuring that workloads not only coexist but thrive in a shared cluster.

As Kubernetes adoption grows, understanding and leveraging Resource Quotas is crucial for building resilient and scalable infrastructure.


Feature 5: Service Topology: Optimizing Traffic Flow


Understanding Traffic Routing in Kubernetes

Kubernetes services abstract away the complexity of networking, allowing pods to communicate with each other seamlessly.

But in distributed systems, especially in multi-region or multi-zone setups, traditional round-robin load balancing may not be sufficient.

Routing traffic inefficiently can lead to high latency, unnecessary cross-zone data transfer costs, and even degraded user experiences.

Service Topology in Kubernetes addresses these challenges by enabling smarter traffic routing based on the physical or logical location of nodes and pods.

Instead of blindly routing traffic, Kubernetes can consider factors like zone, region, or specific node labels to optimize the data path.

For instance, if your cluster spans multiple availability zones, Service Topology ensures that requests from a pod in Zone A are served by pods in the same zone whenever possible.


This reduces cross-zone traffic and improves latency.


How Topology Keys Work

Service Topology leverages topology keys, which are node labels that represent the physical or logical attributes of a node, such as:

  • kubernetes.io/hostname: Refers to the specific node's hostname.
  • topology.kubernetes.io/zone: Represents the availability zone.
  • topology.kubernetes.io/region: Represents the geographic region.

By defining topology preferences in a service's configuration, you can control how traffic is routed. Here's an example of a Service configured to prioritize zone-based traffic routing:

apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: my-app
  topologyKeys:
    - "topology.kubernetes.io/zone"
    - "topology.kubernetes.io/region"
    - "kubernetes.io/hostname"
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

In this configuration:

  1. Zone-Level Preference: Traffic is routed to pods in the same zone.
  2. Region-Level Preference: If no pods are available in the same zone, traffic is routed within the same region.
  3. Host-Level Preference: As a last resort, traffic is routed to a specific node.
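
One caveat: the topologyKeys field shown above was an alpha feature that was deprecated in Kubernetes v1.21 and removed in v1.22. On current clusters, the same goal is served by Topology Aware Routing, which is enabled with an annotation instead of a spec field; a sketch:

apiVersion: v1
kind: Service
metadata:
  name: example-service
  annotations:
    service.kubernetes.io/topology-mode: Auto   # "service.kubernetes.io/topology-aware-hints" on pre-v1.27 clusters
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080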

Benefits: Latency and Fault Tolerance

1. Reduced Latency

By prioritizing local traffic (e.g., within the same zone), Service Topology minimizes the network distance between pods.

This is particularly important for latency-sensitive applications, such as real-time analytics or gaming.

2. Cost Optimization

Cross-zone traffic can incur significant costs in cloud environments. By keeping traffic within zones or regions, Service Topology reduces these expenses.

3. Improved Fault Tolerance

Service Topology adds a layer of resilience by routing traffic intelligently during failures.

For instance, if a zone becomes unavailable, traffic is automatically redirected to healthy pods in other zones or regions.

4. Enhanced Performance for Multi-Zone Deployments

Applications with global or regional user bases benefit from faster response times as traffic is routed to the nearest available pods, reducing the load on distant resources.

5. Better Utilization of Resources

Service Topology ensures more efficient use of cluster resources by reducing the overhead caused by suboptimal routing.


Real-World Use Case

Imagine an e-commerce application with users distributed across multiple regions.

Without Service Topology, requests from a user in the US might be served by pods in Europe due to random load balancing, leading to higher latency and slower responses.

With Service Topology:

  • Requests from US-based users are served by pods in the US region.
  • European users receive responses from pods in Europe.
  • In the event of a regional failure, users are routed to the next nearest region with available pods, ensuring continuity of service.

Why Service Topology Matters

As applications scale across zones and regions, efficient traffic routing becomes a critical factor for maintaining performance, cost-effectiveness, and user experience.

Service Topology gives Kubernetes the intelligence to route traffic where it makes the most sense, aligning networking behavior with your application's architecture and user base.

Mastering Service Topology is essential for organizations looking to scale globally or operate in highly distributed environments.

By leveraging topology keys, you can make your cluster not only smarter but also more resilient and cost-efficient.


Feature 6: Horizontal Pod Autoscaler (HPA) with Custom Metrics


Moving Beyond CPU and Memory Metrics

Kubernetes' Horizontal Pod Autoscaler (HPA) is a powerful tool for scaling workloads automatically based on resource utilization. By default, HPA scales pods based on CPU and memory usage, which is sufficient for many use cases. However, modern applications often require scaling based on domain-specific metrics, such as:

  • Request latency
  • Queue length
  • Active user sessions
  • Custom application performance indicators (KPIs)

Relying solely on CPU or memory metrics may not fully capture your application's workload or performance needs. For instance, a queue-based application might have low CPU usage but still require scaling to handle a surge in incoming requests. This is where custom metrics come into play, allowing you to define and act on metrics that are meaningful for your application.


Scaling on Domain-Specific Metrics

Kubernetes enables custom metrics through the Custom Metrics API, which works in conjunction with metrics providers such as the Prometheus Adapter, KEDA, or cloud-vendor adapters (e.g., Datadog, Stackdriver).

By integrating custom metrics, you can scale your pods based on specific application needs. For example:

  • An API backend can scale based on request per second (RPS).
  • A task processing service can scale based on the length of a message queue.
  • A game server can scale based on the number of active players.

For instance, consider scaling an application based on the average number of active sessions:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: active-sessions-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: active_sessions
      target:
        type: AverageValue
        averageValue: "50"

In this configuration:

  • metric.name: active_sessions refers to a custom metric tracked by the metrics provider.
  • The HPA ensures the average number of active sessions per pod does not exceed 50.

Step-by-Step Configuration

Here's how to configure HPA with custom metrics in your Kubernetes cluster:

1. Install and Configure a Metrics Adapter

Choose and set up a metrics provider that supports custom metrics. For example, if you're using Prometheus:

  • Install the Prometheus Adapter to expose custom metrics to the Kubernetes API.
  • Configure the adapter to map Prometheus queries to Kubernetes custom metrics.

Prometheus Adapter example configuration:

rules:
  - seriesQuery: 'active_sessions_total'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "active_sessions_total"
      as: "active_sessions"
    metricsQuery: 'sum(rate(active_sessions_total{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)'

This setup maps the Prometheus metric active_sessions_total to a custom Kubernetes metric active_sessions.

2. Expose the Custom Metric

Ensure your application emits the metric you want to use. For example, export active_sessions_total via Prometheus instrumentation in your application code.
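
A minimal sketch of that instrumentation in Python, assuming the prometheus_client library and hypothetical session hooks in your application:

from prometheus_client import Gauge, start_http_server

# Gauge matching the seriesQuery used in the adapter rule above.
ACTIVE_SESSIONS = Gauge("active_sessions_total", "Number of currently active sessions")

def on_session_start():
    ACTIVE_SESSIONS.inc()   # call when a user session opens

def on_session_end():
    ACTIVE_SESSIONS.dec()   # call when a user session closes

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics on port 8000 for Prometheus to scrape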

3. Verify Metric Availability

Check if the custom metric is available in Kubernetes:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/active_sessions"

You should see the current value of the metric.

4. Define an HPA with the Custom Metric

Create an HPA manifest (like the one above) specifying your custom metric as the scaling trigger.

5. Monitor Scaling Behavior

Once deployed, monitor the HPA to ensure it reacts appropriately to the custom metric. Use:

kubectl get hpa

This shows the current status, replicas, and metric values driving the scaling decisions.


Real-Life Example: Queue-Based Scaling

Consider a task processing service that uses RabbitMQ. You want to scale pods based on the queue length:

  1. Instrument RabbitMQ to export metrics (e.g., queue_length).
  2. Use a metrics adapter to expose queue_length as a custom Kubernetes metric.
  3. Create an HPA targeting the queue_length metric with a threshold value (e.g., scale if queue length exceeds 100 tasks per pod).

Example HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-based-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: task-processor
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: queue_length
      target:
        type: AverageValue
        averageValue: "100"

Why Custom Metrics Matter

Custom metrics unlock the true potential of Kubernetes' HPA by tailoring scaling logic to your application's unique demands. By scaling workloads on domain-specific metrics, you can:

  • Optimize resource usage.
  • Improve application performance.
  • Ensure predictable scaling behavior during demand spikes.

Whether you're scaling APIs, processing queues, or running real-time applications, integrating custom metrics with HPA gives you unparalleled control over your cluster's scalability.


Feature 7: Immutable ConfigMaps and Secrets: Stability First


The Risks of Mutable Configuration

In Kubernetes, ConfigMaps and Secrets are essential for storing non-sensitive and sensitive configuration data, respectively.

They decouple application configuration from the application code, allowing developers to manage configuration changes without redeploying the application.

However, mutable ConfigMaps and Secrets can introduce risks:

  • Unintended Updates: If a ConfigMap or Secret is updated, pods that consume it as a mounted volume pick up the changes automatically (values injected as environment variables only refresh on pod restart). This can lead to unexpected behavior, especially if the new configuration contains errors or is incompatible with the current application version.
  • Debugging Challenges: Mutable configurations make it harder to reproduce issues since the state of the configuration at the time of the problem may have changed.
  • Inconsistent State: In scenarios where multiple pods use the same ConfigMap or Secret, updates may propagate unevenly, causing inconsistent behavior across the application.

These risks make it essential to consider immutability for critical or stable configurations.


When to Use Immutable ConfigMaps

Immutable ConfigMaps and Secrets provide a safeguard against accidental updates. When marked immutable, their content cannot be changed after creation. Any attempt to modify an immutable ConfigMap or Secret results in an error, ensuring stability and consistency.

When should you use them?

  • Stable Configurations: Use immutable ConfigMaps for configurations that are not expected to change, such as feature flags or application settings.
  • Production Secrets: Ensure critical secrets like API keys, database passwords, and encryption keys remain immutable to avoid unintended exposure or changes.
  • Audit and Compliance: Immutable configurations support better traceability and help meet compliance requirements by ensuring configurations remain unaltered.

Practical Scenarios and Configuration

1. Creating an Immutable ConfigMap

To create an immutable ConfigMap, set the immutable field to true:

apiVersion: v1
kind: ConfigMap
metadata:
  name: immutable-config
data:
  FEATURE_FLAG: "true"
  APP_ENV: "production"
immutable: true

With this configuration:

  • The ConfigMap cannot be updated or modified.
  • If you need to make changes, you must delete and recreate the ConfigMap with the updated data.
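
A sketch of that replace flow, assuming a hypothetical manifest file with the updated data:

# Immutable objects reject in-place edits, so ship the change as a fresh object.
kubectl delete configmap immutable-config
kubectl apply -f immutable-config-v2.yaml

With versioned names (config-v1, config-v2), you can instead create the new ConfigMap alongside the old one and repoint your Deployment at it, which also gives you an instant rollback path.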

2. Creating an Immutable Secret

Similarly, for Secrets, you can enable immutability by adding the immutable field:

apiVersion: v1
kind: Secret
metadata:
  name: immutable-secret
type: Opaque
data:
  api-key: c2VjcmV0LWFwaS1rZXk=  # Base64 encoded
immutable: true

This ensures that the Secret remains unchanged once deployed.

3. Use Cases in Practice

Feature Toggles

For a microservices application, you may use a ConfigMap to enable or disable specific features:

apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-toggle
data:
  enable-feature-x: "true"
immutable: true

By making it immutable, you guarantee consistency across all services referencing this ConfigMap.

Database Configuration

For a database connection, store sensitive information in an immutable Secret:

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: ZGJfdXNlcg==  # Base64 encoded
  password: cGFzc3dvcmQxMjM=  # Base64 encoded
immutable: true

This prevents accidental changes to critical database credentials.

Debugging and Rollbacks

Immutable ConfigMaps and Secrets allow you to maintain an audit trail of configuration versions. When debugging or rolling back, you can recreate previous versions by referencing historical configurations.


Best Practices for Immutable ConfigMaps and Secrets

  1. Adopt Versioning: Use a versioned naming convention (e.g., config-v1, config-v2) to manage updates and rollbacks.
  2. Combine with Deployment Strategies: Use immutable configurations in conjunction with rolling updates or blue-green deployments to ensure seamless application updates.
  3. Enforce in CI/CD Pipelines: Integrate immutability checks in your CI/CD pipelines to prevent accidental modifications to critical configurations.
  4. Avoid Hardcoding Data: Store all environment-specific values in ConfigMaps and Secrets to ensure immutability doesn't lead to rigid application design.

Why Immutable ConfigMaps and Secrets Matter

Immutability brings stability and predictability to Kubernetes configurations, reducing the risk of unintended disruptions in your applications. By leveraging immutable ConfigMaps and Secrets, you:

  • Ensure consistent application behavior across pods.
  • Minimize downtime caused by configuration changes.
  • Enhance security by protecting critical configurations from accidental or malicious updates.

For production-grade systems, immutability is not just a best practice---it's a necessity for maintaining reliability and control over your Kubernetes deployments.


Feature 8: Kubernetes Jobs and CronJobs: Automating the Mundane


Scheduling One-Off and Recurring Tasks

In Kubernetes, Jobs and CronJobs are your go-to resources for automating tasks. They are designed to handle both one-off tasks and recurring schedules, allowing you to offload routine maintenance, data processing, and alerting tasks to Kubernetes.

Jobs: Handling One-Off Tasks

A Job ensures that a specific task runs to completion, even if the node running the task fails. Kubernetes will restart the job's pod(s) until the task completes successfully.

Here's an example of a simple job that processes a batch task:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-task
spec:
  template:
    spec:
      containers:
      - name: batch-job
        image: busybox
        command: ["sh", "-c", "echo 'Processing batch...' && sleep 10"]
      restartPolicy: OnFailure

CronJobs: Automating Recurring Tasks

CronJobs build on Jobs by introducing a scheduling mechanism, similar to cron in Linux. They are ideal for tasks that need to run at specific intervals, such as daily database backups or periodic log cleanups.

Here's an example CronJob to create daily backups:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"  # Runs daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: my-backup-tool
            command: ["sh", "-c", "backup-script.sh"]
          restartPolicy: OnFailure

With this configuration, Kubernetes ensures the job runs daily at 2 AM, retrying on failure if needed.


Common Use Cases (Backups, Cleanup, Alerts)

Jobs and CronJobs excel in automating mundane tasks, reducing manual intervention. Here are some typical scenarios:

1. Backups

Schedule periodic database or application backups using CronJobs. For instance:

  • Daily backups at a specific time.
  • Snapshots of Persistent Volumes stored in S3 or another cloud service.

2. Log Cleanup

Avoid bloating disk space by periodically cleaning up old logs or temporary files:

command: ["sh", "-c", "find /var/log -type f -mtime +30 -delete"]

This command deletes logs older than 30 days.

3. Alerts and Notifications

Trigger alerts or send periodic notifications for monitoring purposes. For example, you can run a script every 5 minutes to check the health of an external service and send alerts to a Slack channel if the service is down.
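
A sketch of such a check, assuming a hypothetical health endpoint and a Slack webhook URL stored in a Secret named alerting:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: service-health-check
spec:
  schedule: "*/5 * * * *"   # every five minutes
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: health-check
            image: curlimages/curl
            env:
            - name: SLACK_WEBHOOK_URL
              valueFrom:
                secretKeyRef:
                  name: alerting        # hypothetical Secret holding the webhook URL
                  key: slack-webhook
            command:
            - /bin/sh
            - -c
            - >
              curl -sf https://my-service.example.com/healthz ||
              curl -X POST -H 'Content-type: application/json'
              --data '{"text":"Health check failed"}' "$SLACK_WEBHOOK_URL"
          restartPolicy: OnFailure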

4. Batch Data Processing

Jobs are ideal for one-time data processing tasks, such as:

  • Generating reports.
  • Running ETL (Extract, Transform, Load) pipelines.
  • Migrating data between services.

5. System Maintenance

Use CronJobs to perform routine cluster maintenance, like scaling resources or applying security patches during off-peak hours.


Best Practices for Scaling Jobs

To maximize the efficiency and reliability of Jobs and CronJobs, follow these best practices:

1. Use restartPolicy Judiciously

For Jobs, set restartPolicy: OnFailure so failed tasks are retried, or Never if you want each failure to surface as a fresh pod for inspection (Always is not valid for Jobs). Pair this with backoffLimit to avoid infinite retry loops.

2. Limit Resource Consumption

Define CPU and memory requests/limits for job containers to avoid starving other workloads:

resources:
  requests:
    memory: "128Mi"
    cpu: "500m"
  limits:
    memory: "256Mi"
    cpu: "1"

3. Avoid CronJob Overlaps

Use the concurrencyPolicy field to handle overlapping executions:

  • Allow: Allows overlapping jobs.
  • Forbid: Ensures only one job runs at a time.
  • Replace: Stops the currently running job and replaces it with the new one.

Example:

spec:
  concurrencyPolicy: Forbid

4. Use Deadlines

Set activeDeadlineSeconds to limit the maximum runtime for jobs, preventing them from running indefinitely in case of errors.

5. Monitor and Log Outputs

Integrate Jobs and CronJobs with your logging and monitoring systems (e.g., Prometheus, Fluentd) to track their execution and debug failures.

6. Handle Job Cleanup

Jobs can accumulate over time, cluttering your cluster. Use the ttlSecondsAfterFinished field to clean up completed jobs automatically:

spec:
  ttlSecondsAfterFinished: 3600  # Job is deleted 1 hour after completion

7. Scale Jobs for Large Workloads

For jobs that require processing large datasets, use parallelism to split tasks across multiple pods:

spec:
  parallelism: 4
  completions: 8

This example ensures that the job runs 4 pods in parallel until all 8 completions are finished.

Why Jobs and CronJobs Matter

Jobs and CronJobs are indispensable tools for automating repetitive tasks in Kubernetes. By leveraging them, you:

  • Reduce manual overhead for routine operations.
  • Improve consistency and reliability in task execution.
  • Free up developer time to focus on more critical activities.

Whether you're managing backups, processing data, or performing system maintenance, mastering Jobs and CronJobs is key to creating a self-sufficient and automated Kubernetes environment.


Feature 9: Custom Resource Definitions (CRDs): Extending Kubernetes' Capabilities


Why CRDs Are Game-Changers

Kubernetes is a versatile platform, but its true power lies in its extensibility. Custom Resource Definitions (CRDs) take this flexibility to the next level by allowing you to define your own custom resources. With CRDs, Kubernetes can manage not just its built-in objects like Pods, Services, and Deployments but also your domain-specific objects, seamlessly integrating them into the Kubernetes API.

Why is this a game-changer?

  • Domain-Specific Customization: Tailor Kubernetes to your specific use case, whether it's a database, machine learning pipeline, or CI/CD workflow.
  • Unified Management: Manage custom resources alongside native Kubernetes resources using the same tools (e.g., kubectl, dashboards).
  • Automation: Combine CRDs with Custom Controllers to automate complex workflows, creating operators that act like human operators for your systems.

With CRDs, Kubernetes transforms into a platform not just for orchestrating containers but for orchestrating any system.


Building Domain-Specific Extensions

CRDs allow you to extend Kubernetes' API by defining your own resource types. A CRD is essentially a schema that tells Kubernetes how to validate, store, and manage your custom resources. Once you create a CRD, your custom resource becomes a first-class citizen in your Kubernetes cluster.

Creating a CRD

Here's an example of a simple CRD for managing a custom resource called Database:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              name:
                type: string
              engine:
                type: string
              version:
                type: string
              storageSize:
                type: string
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
    - db

In this definition:

  • group: The API group for your resource (e.g., example.com).
  • names: The resource's plural, singular, and kind names.
  • schema: Defines the structure of the custom resource (e.g., fields like name, engine, version, etc.).

Defining a Custom Resource

Once the CRD is applied, you can create instances of the Database resource:

apiVersion: example.com/v1
kind: Database
metadata:
  name: my-database
spec:
  name: my-database
  engine: postgres
  version: "13"
  storageSize: "10Gi"

This custom resource is now part of your Kubernetes cluster and can be managed using standard commands:

kubectl get databases
kubectl describe database my-database

Example: A Custom Database Operator

Operators are Kubernetes controllers that use CRDs to manage the entire lifecycle of an application or service. Let's walk through an example of building a Database Operator to automate database deployment and management.

Step 1: Define the CRD

The CRD for Database is already defined above. It allows users to specify the database engine, version, and storage size.

Step 2: Write a Custom Controller

The controller watches for changes to Database resources and performs actions to reconcile the desired state with the current state. Here's a simplified Python example using the Kubernetes Python client:

from kubernetes import client, config, watch

def create_database(name, engine, version, storage_size):
    print(f"Creating {engine} database '{name}' with {storage_size} storage, version {version}")
    # Implement database creation logic (e.g., using cloud provider APIs or Helm charts)

def main():
    config.load_kube_config()
    v1 = client.CustomObjectsApi()
    watcher = watch.Watch()
    
    for event in watcher.stream(v1.list_cluster_custom_object, group="example.com", version="v1", plural="databases"):
        obj = event['object']
        event_type = event['type']
        if event_type == "ADDED":
            spec = obj['spec']
            create_database(spec['name'], spec['engine'], spec['version'], spec['storageSize'])

if __name__ == "__main__":
    main()

This controller listens for ADDED events on Database resources and creates the corresponding database.

Step 3: Automate Updates and Backups

Extend the controller to handle updates (e.g., version upgrades) and recurring backups by responding to MODIFIED and DELETED events.

This allows you to automate the entire database lifecycle, reducing manual intervention.


Benefits of CRDs and Operators

  1. Domain-Specific Workflows: Customize Kubernetes to manage complex, domain-specific systems natively.
  2. Automation: Automate repetitive and error-prone tasks with custom controllers and operators.
  3. Scalability: CRDs and operators scale with your cluster, ensuring consistent management across environments.
  4. Community and Ecosystem: Leverage existing operators for popular systems (e.g., databases, message queues) or contribute your own.

Why CRDs Are Essential

CRDs and operators turn Kubernetes into a universal platform capable of orchestrating any workload, not just containers. By integrating your domain-specific needs directly into Kubernetes, you:

  • Simplify operational complexity.
  • Reduce the need for external tooling.
  • Create a unified system for managing all aspects of your infrastructure.

Whether you're managing custom databases, scaling machine learning pipelines, or automating CI/CD processes, CRDs unlock limitless possibilities for extending Kubernetes' capabilities.


Feature 10: Dynamic Volume Provisioning: Storage on Autopilot


Pain Points in Storage Management

Storage has always been one of the most challenging aspects of managing infrastructure, especially in dynamic environments where workloads and resource needs can change rapidly. Traditional storage management involves:

  • Manual Provisioning: Administrators need to pre-provision storage volumes, which can lead to overprovisioning (wasting resources) or underprovisioning (causing service disruptions).
  • Static Configurations: Once a storage volume is provisioned, resizing or modifying it often requires manual intervention or downtime.
  • Complex Storage Systems: Configuring storage backends for high availability, performance, and scalability can be time-consuming and error-prone.

In Kubernetes, where workloads are ephemeral and dynamic by nature, static storage management isn't a viable solution. This is where Dynamic Volume Provisioning steps in to automate and simplify storage operations.


How Kubernetes Makes It Dynamic

Kubernetes automates the lifecycle of storage volumes using Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). With Dynamic Volume Provisioning, Kubernetes can create storage volumes on-the-fly based on user requests, eliminating the need for pre-provisioned volumes.

How It Works

Dynamic provisioning leverages Storage Classes, which act as blueprints for creating storage volumes. A Storage Class specifies:

  • The storage backend (e.g., AWS EBS, Google Persistent Disk, Ceph, NFS).
  • Parameters such as replication, encryption, and performance tiers.
  • The reclaim policy (e.g., Retain or Delete).

When a PVC requests storage, Kubernetes dynamically provisions a volume that matches the requirements defined in the PVC and Storage Class.

Here's an example Storage Class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: kubernetes.io/aws-ebs   # in-tree provisioner; modern clusters typically use the ebs.csi.aws.com CSI driver
parameters:
  type: io1
  iopsPerGB: "10"   # iopsPerGB is only honored for io1 volumes, hence io1 rather than gp2 here
  encrypted: "true"
reclaimPolicy: Delete

In this configuration:

  • provisioner: Specifies the backend (e.g., AWS EBS).
  • parameters: Defines storage-specific options (e.g., performance and encryption).
  • reclaimPolicy: Determines what happens to the volume after the PVC is deleted.

Simplifying Persistent Volume Claims

PVCs allow users to request storage without worrying about the underlying details of provisioning and management. Here's how you can simplify storage operations with PVCs:

1. Requesting a Persistent Volume

A PVC specifies the required storage size, access mode, and optional storage class:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: fast-storage

In this claim:

  • accessModes: Defines how the volume can be accessed (e.g., ReadWriteOnce, ReadWriteMany).
  • resources.requests.storage: Specifies the requested size.
  • storageClassName: Links the PVC to a specific Storage Class.

When this PVC is created, Kubernetes dynamically provisions a volume using the associated Storage Class and binds it to the PVC.

2. Using the Volume in a Pod

Once a PVC is created, it can be referenced in a Pod to mount the provisioned volume:

apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app-container
    image: my-app-image
    volumeMounts:
    - mountPath: "/data"
      name: app-storage
  volumes:
  - name: app-storage
    persistentVolumeClaim:
      claimName: my-pvc

In this configuration:

  • The pod mounts the dynamically provisioned volume at /data.
  • Storage is automatically managed by Kubernetes.

Benefits of Dynamic Volume Provisioning

1. Eliminate Manual Intervention

No more pre-provisioning or manual allocation of storage volumes. Kubernetes handles everything based on the PVCs.

2. Scalability

Dynamic provisioning scales with your workloads. As applications grow or new services are deployed, Kubernetes ensures the required storage is provisioned and attached automatically.

3. Flexibility

By using Storage Classes, you can define multiple tiers of storage (e.g., high-performance SSDs, cost-effective HDDs) and let users choose based on their needs.

4. Resilience

Dynamic provisioning integrates with backend storage systems to ensure high availability, replication, and disaster recovery.

5. Simplified Management

Developers only need to interact with PVCs, abstracting away the complexities of storage backends and configurations.


Real-World Use Case

Scenario: Scaling a Database

Imagine running a MySQL database in Kubernetes. As the application scales, you need additional storage for the database.

Without Dynamic Provisioning:

  • Manually create a volume in your storage backend (e.g., AWS EBS).
  • Attach it to the cluster and create a PV.
  • Bind the PV to a PVC.

With Dynamic Provisioning:

  • Create a PVC specifying the storage size and class.
  • Kubernetes dynamically provisions the volume and binds it to the PVC.
  • The volume is ready to be used by the database pod.

Example: Expanding Storage Dynamically

If the database grows, you can resize the PVC (supported by many storage backends):

kubectl edit pvc my-pvc

Update the resources.requests.storage field, and Kubernetes will expand the underlying volume automatically.
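
Note that resizing only succeeds if the Storage Class was created with allowVolumeExpansion: true and the backend supports expansion. The same edit can also be applied non-interactively:

kubectl patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'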


Why Dynamic Provisioning Matters

Dynamic Volume Provisioning transforms Kubernetes into a storage powerhouse:

  • It reduces operational overhead by automating storage allocation and management.
  • It ensures seamless scalability and flexibility for applications with varying storage needs.
  • It abstracts away the complexities of underlying storage backends, empowering developers to focus on building applications.

Whether you're running databases, data analytics, or stateful microservices, Dynamic Volume Provisioning is an essential feature for modern Kubernetes deployments.

By leveraging it, you unlock a level of efficiency and reliability that makes managing storage in dynamic environments effortless.


Conclusion


Why Mastering These Features Matters

Kubernetes is far more than a container orchestration platform---it's a powerful ecosystem capable of revolutionizing how we manage, scale, and optimize modern applications.

However, to truly unlock its potential, you need to look beyond the basics.

Mastering the hidden gems of Kubernetes, such as Pod Disruption Budgets, Dynamic Admission Controllers, and Custom Resource Definitions, not only elevates your skills but also empowers your organization to:

  • Achieve Reliability: Minimize downtime and ensure consistent performance with features like Service Topology and Resource Quotas.
  • Boost Efficiency: Automate mundane tasks with Jobs and CronJobs, and scale dynamically using custom metrics in the Horizontal Pod Autoscaler.
  • Enhance Security: Strengthen governance with Immutable ConfigMaps and Secrets or enforce cluster-wide policies with Admission Controllers.
  • Streamline Operations: Simplify storage management through Dynamic Volume Provisioning and extend Kubernetes' capabilities with CRDs.

By adopting these advanced features, you not only address the operational pain points but also set a strong foundation for innovation and scalability.


The Competitive Edge: Kubernetes Beyond Basics

In today's fast-paced tech landscape, standing still is not an option. Businesses demand agility, resilience, and cost efficiency from their infrastructure, and Kubernetes delivers---if used to its full potential.

What Sets You Apart

  • Proactive Problem-Solving: With tools like Ephemeral Containers and CRDs, you're prepared for debugging and managing domain-specific needs efficiently.
  • Operational Excellence: Features like HPA with custom metrics and Dynamic Volume Provisioning ensure your applications are not just functional but optimized for any scenario.
  • Future-Ready Skills: As Kubernetes continues to evolve, mastering its lesser-known features positions you as a leader in cloud-native development and operations.

For Organizations

  • Reduced TCO (Total Cost of Ownership): Advanced Kubernetes features automate resource allocation, optimize workloads, and reduce manual intervention, saving time and money.
  • Improved Customer Experience: Reliable, scalable applications result in fewer disruptions and better performance, delighting end-users.
  • Faster Time-to-Market: By leveraging automation and advanced orchestration, teams can focus on delivering value rather than firefighting operational issues.

Final Thoughts: Innovate, Automate, Dominate

Kubernetes is an enigma---a platform that can appear daunting but reveals its elegance and power when explored deeply.

By mastering the hidden features discussed in this blog, you're not just becoming proficient in Kubernetes---you're setting the stage for innovation.

Key Takeaways

  • Kubernetes is more than just pods and deployments---it's an entire ecosystem designed for scalability, reliability, and automation.
  • Leveraging its hidden features transforms it from a basic orchestration tool into a robust platform capable of meeting the most demanding business needs.
  • The journey to Kubernetes mastery is continuous, but the rewards---efficiency, reliability, and competitive advantage---are worth the effort.

As you implement these features in your clusters, remember: Kubernetes is your partner in innovation. Use it not just to keep the lights on but to dominate your industry with resilient, scalable, and cutting-edge applications.

Call to Action

Ready to take the next step? Whether you're just starting or already scaling globally, Kubernetes has something to offer. Dive deep, experiment, and innovate---because the future belongs to those who automate.


That's it for 10 Hidden Gems of Kubernetes! Which feature are you most excited to try out? Connect with me on LinkedIn for more Kubernetes insights!



About the Author

Sagarnil Das

Sagarnil Das is a seasoned AI enthusiast with over 12 years of experience in Machine Learning and Deep Learning.

He has built scalable AI ecosystems for global giants like the NHS and developed innovative mental health applications that detect emotions and suicidal tendencies.

Some of his other accomplishments include:

  • Ex-NASA researcher
  • Ex-Udacity Mentor
  • Intel Edge AI scholarship winner
  • Kaggle Notebooks expert

When he's not immersed in data or crafting AI models, Sagarnil enjoys playing the guitar and doing street photography.

An avid Dream Theater fan, he believes in blending logic with creativity to solve complex problems.

You can find out more about Sagarnil here.

For guidance, questions, feedback, or challenges, you can contact him by clicking the chat icon at the bottom right of the screen.
