
Kubernetes Cost Optimization Strategies

Julian Lindner

Kubernetes has become the de facto standard for container orchestration, enabling organizations to deploy and manage applications at scale. However, without proper cost management, Kubernetes clusters can lead to significant and often unnecessary cloud expenditure. In this article, we’ll explore practical strategies for optimizing Kubernetes costs without compromising application performance or reliability.

Understanding Kubernetes Cost Drivers

Before diving into optimization strategies, it’s essential to understand the primary factors that drive Kubernetes costs:

1. Compute Resources

Compute resources—CPU and memory—typically constitute the largest portion of Kubernetes costs. These costs are driven by:

  • Node size and count: The number and size of worker nodes in your cluster
  • Resource requests and limits: How much CPU and memory your workloads request and are allowed to use
  • Resource utilization: The actual usage compared to allocated resources
  • Instance types: The specific VM types used for your nodes (e.g., general purpose vs. compute-optimized)
2. Storage Costs

Storage costs include:

  • Persistent volumes: The size, performance tier, and number of persistent volumes
  • Storage classes: Different storage classes with varying performance characteristics and costs
  • Backup storage: Storage used for backup and disaster recovery
3. Network Costs

Network costs are often overlooked but can be significant:

  • Data transfer: Costs for data moving between availability zones, regions, or to the internet (see the Service sketch at the end of this list)
  • Load balancers: Costs for load balancer provisioning and data processing
  • Network policies: Enforcement overhead that consumes node resources and can show up as indirect compute cost
4. Management Overhead

Additional costs include:

  • Control plane costs: Some managed Kubernetes services charge for the control plane
  • Monitoring and logging: Costs for storing and processing metrics and logs
  • CI/CD pipeline execution: Resources consumed during builds and deployments
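
On the data-transfer bullet above, one concrete lever is topology-aware routing, which keeps Service traffic inside the client's zone where possible. A minimal sketch, assuming Kubernetes 1.27+ (earlier versions used the service.kubernetes.io/topology-aware-hints annotation instead):

apiVersion: v1
kind: Service
metadata:
  name: example-app
  annotations:
    # Prefer same-zone endpoints to reduce cross-zone transfer charges
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: example-app
  ports:
  - port: 80
    targetPort: 8080

Topology hints are best-effort, so treat them as a cost reducer rather than a guarantee, and verify the effect against your provider's transfer pricing.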

Resource Right-Sizing Strategies

The foundation of Kubernetes cost optimization is ensuring workloads have the right amount of resources allocated—neither too much nor too little.

Resource Request Right-Sizing

Resource requests in Kubernetes determine the minimum amount of resources guaranteed to a container. Setting these correctly is crucial for cost optimization:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: example-app:1.0
        resources:
          requests:
            cpu: 100m    # 0.1 CPU cores, guaranteed
            memory: 256Mi
          limits:
            cpu: 500m    # 0.5 CPU cores, hard ceiling
            memory: 512Mi

Best practices for setting resource requests:

  1. Start with metrics: Base your resource requests on actual observed usage rather than guesswork. Monitor your applications to understand their resource consumption patterns.

  2. Consider percentile-based allocation: Rather than provisioning for peak usage, consider using a high percentile (e.g., 95th) of observed usage as your baseline (see the query sketch after this list).

  3. Account for application scaling: Consider how your application scales under load. Some applications scale CPU usage linearly with traffic, while others may have different scaling characteristics.
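
As a sketch of the percentile-based approach in item 2, assuming Prometheus is scraping cAdvisor metrics (the server address and namespace are illustrative), the 95th percentile of 5-minute-averaged CPU usage over the past week can be queried like this:

# p95 of per-container CPU usage (in cores) over 7 days
curl -s http://prometheus:9090/api/v1/query \
  --data-urlencode 'query=quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="production",container!=""}[5m])[7d:5m])'

A request near this value, plus modest headroom, is a sensible starting point.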

Implementing Vertical Pod Autoscaling

The Vertical Pod Autoscaler (VPA) automatically adjusts resource requests based on observed usage:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: example-app
  updatePolicy:
    updateMode: "Auto"  # or "Off" for recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 128Mi
        maxAllowed:
          cpu: 1000m
          memory: 1Gi

VPA can be configured in four update modes:

  • Off: Provides recommendations without applying them
  • Initial: Applies recommendations only when pods are first created
  • Recreate: Evicts running pods so they restart with the updated requests
  • Auto: Currently equivalent to Recreate; it is intended to switch to in-place updates once Kubernetes supports them for VPA

For production workloads, it’s often wise to start with “Off” mode to review recommendations before applying them.
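
With updateMode "Off", the recommendations are published on the VPA object itself and can be inspected before adopting them:

kubectl describe vpa example-app-vpa
# Under "Recommendation": "Target" is the suggested request;
# "Lower Bound" and "Upper Bound" bracket the plausible range.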

Setting Appropriate Resource Limits

Resource limits define the maximum resources a container can use. Setting these properly prevents resource hogging without unnecessarily constraining workloads:

  1. CPU limits: CPU is a compressible resource: under contention it is throttled rather than causing process failures. Set CPU limits high enough to allow occasional bursts while preventing a single container from consuming all available CPU.

  2. Memory limits: Memory is non-compressible—once allocated, it cannot be reclaimed without terminating the process. Set memory limits high enough to prevent OOMKill events but low enough to prevent a single pod from consuming all node memory.

  3. Consider limit-to-request ratio: A common practice is to set limits at 2-3x the request level for CPU and 1.5-2x for memory.
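
Applied to the earlier Deployment, a minimal sketch of those ratios (the right multipliers should ultimately come from your own usage data):

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 300m      # 3x the request: headroom for bursts
    memory: 512Mi  # 2x the request: bounded OOM exposure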

Implementing Horizontal Pod Autoscaling

Horizontal Pod Autoscaler (HPA) adjusts the number of replicas based on metrics, complementing right-sizing efforts:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

HPA enables cost-effective scaling by:

  • Scaling down during low traffic periods: Reducing the number of pods when demand is low
  • Scaling up during high traffic periods: Adding pods to handle increased load
  • Supporting custom metrics: Scaling based on application-specific metrics like queue length or request rate
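
As a sketch of custom-metric scaling, assuming a metrics adapter (such as the Prometheus adapter) exposes a per-pod http_requests_per_second metric (the metric name is illustrative), an additional entry under the HPA's metrics list would look like:

- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"  # add pods so each serves roughly 100 req/s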

Cluster Scaling and Optimization

Optimizing at the cluster level provides significant cost-saving opportunities.

Implementing Cluster Autoscaling

The Cluster Autoscaler automatically adjusts the number of nodes based on pod scheduling requirements. It is configured through command-line flags on its own deployment rather than a ConfigMap; the scale-down flags are the main cost levers:

containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # match your cluster's minor version
  command:
  - ./cluster-autoscaler
  - --scale-down-utilization-threshold=0.5  # node is a removal candidate below 50% utilization
  - --scale-down-unneeded-time=5m           # ...if it stays underutilized this long
  - --scale-down-delay-after-add=5m
  - --scale-down-delay-after-delete=0s
  - --scale-down-delay-after-failure=3m
  - --scale-down-unready-time=20m

The Cluster Autoscaler works by:

  1. Monitoring for pods that fail to schedule due to insufficient resources
  2. Adding nodes when needed to accommodate unschedulable pods
  3. Identifying and removing underutilized nodes when pods can be rescheduled to other nodes
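
Note that scale-down only proceeds when every pod on a node can be moved elsewhere. Pods that must not be disturbed can opt out explicitly, though each such pod pins its node and blocks the associated savings:

apiVersion: v1
kind: Pod
metadata:
  name: stateful-worker
  annotations:
    # Cluster Autoscaler will never remove the node hosting this pod
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: worker
    image: stateful-worker:1.0
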
Node Optimization Strategies

Selecting the right node types and configurations can significantly impact costs:

  1. Leverage spot/preemptible instances: These instances offer significant discounts (often 60-90% compared to on-demand pricing) but can be reclaimed by the cloud provider with minimal notice. They’re ideal for stateless, fault-tolerant workloads.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: batch-processor
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: batch-processor
      template:
        metadata:
          labels:
            app: batch-processor
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  # The capacity-type label varies by provider, e.g.
                  # eks.amazonaws.com/capacityType=SPOT on EKS or
                  # cloud.google.com/gke-spot="true" on GKE
                  - key: eks.amazonaws.com/capacityType
                    operator: In
                    values:
                    - SPOT
          tolerations:
          # Assumes spot nodes are tainted spot=true:NoSchedule
          - key: "spot"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
          containers:
          - name: batch-processor
            image: batch-processor:1.0
  2. Use node taints and tolerations: Mark specific nodes for certain workloads to ensure efficient resource allocation:

    # Node with a taint
    kubectl taint nodes node1 workload=batch:NoSchedule
    
    # Pod with a matching toleration
    apiVersion: v1
    kind: Pod
    metadata:
      name: batch-job
    spec:
      tolerations:
      - key: "workload"
        operator: "Equal"
        value: "batch"
        effect: "NoSchedule"
      containers:
      - name: batch-container
        image: batch-processor:1.0
  3. Implement node pools for workload types: Create separate node pools optimized for different workload characteristics, then steer workloads to them with node selectors (see the sketch after this list):

    • General purpose: For standard web applications and services
    • Compute-optimized: For CPU-intensive workloads
    • Memory-optimized: For workloads with high memory requirements
    • GPU nodes: For machine learning and other specialized workloads
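
A minimal sketch of steering a workload to a pool via a nodeSelector; the pool label below is hypothetical, since the actual label depends on your provider and node pool setup:

spec:
  template:
    spec:
      nodeSelector:
        workload-pool: compute-optimized  # hypothetical pool label
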
Pod Scheduling Optimization

How pods are scheduled across nodes significantly affects resource utilization:

  1. Pod affinity and anti-affinity: Control pod placement to optimize resource usage:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - web-app
                  topologyKey: "kubernetes.io/hostname"
          containers:
          - name: web-app
            image: web-app:1.0

    This configuration encourages spreading pods across different nodes, improving availability while preventing resource contention.

  2. Pod priority and preemption: Assign priorities to pods to ensure critical workloads get resources first:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000
    globalDefault: false
    description: "This priority class is for critical production workloads."
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: critical-service
    spec:
      priorityClassName: high-priority
      containers:
      - name: critical-service
        image: critical-service:1.0
  3. Topology spread constraints: Ensure pods are distributed across nodes, zones, or regions:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                app: web-app
          containers:
          - name: web-app
            image: web-app:1.0

Workload-Specific Optimization

Different types of workloads benefit from different optimization approaches.

Batch and Job Workloads

Batch jobs and other non-continuous workloads offer unique optimization opportunities:

  1. Use Jobs and CronJobs appropriately:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: log-analyzer
    spec:
      schedule: "0 2 * * *"  # Run at 2 AM daily
      concurrencyPolicy: Forbid
      jobTemplate:
        spec:
          ttlSecondsAfterFinished: 86400  # Auto-delete after 24 hours
          template:
            spec:
              containers:
              - name: log-analyzer
                image: log-analyzer:1.0
                resources:
                  requests:
                    cpu: 2000m
                    memory: 4Gi
              restartPolicy: OnFailure
  2. Consider specialized node pools for batch processing:

    • Use spot/preemptible instances for cost-effective batch processing
    • Schedule batch jobs during off-peak hours when resource costs may be lower
    • Implement node auto-provisioning for batch workloads
  3. Optimize job parallelism and completion indexes:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: data-processor
    spec:
      parallelism: 5
      completions: 10
      completionMode: Indexed  # injects JOB_COMPLETION_INDEX into each pod
      template:
        spec:
          containers:
          - name: processor
            image: data-processor:1.0
            command: ["processor", "--chunk-index=$(JOB_COMPLETION_INDEX)"]
          restartPolicy: Never
Stateful Workloads

Stateful applications like databases require special consideration:

  1. Optimize persistent volume claims:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: database-storage
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: standard-ssd
      resources:
        requests:
          storage: 100Gi
    • Choose the appropriate storage class based on performance requirements
    • Start with smaller volumes and leverage volume expansion when needed (see the patch example after this list)
    • Consider using volume snapshots for efficient backups
  2. Enable pod disruption budgets for stateful workloads:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: database-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: database
  3. Use appropriate StatefulSet configurations:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: database
    spec:
      serviceName: "database"
      replicas: 3
      selector:
        matchLabels:
          app: database
      updateStrategy:
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: database
        spec:
          containers:
          - name: database
            image: database:1.0
            resources:
              requests:
                cpu: 1000m
                memory: 2Gi
            volumeMounts:
            - name: data
              mountPath: /var/lib/database
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "standard-ssd"
          resources:
            requests:
              storage: 100Gi
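
On the volume-expansion point in item 1: provided the PVC's StorageClass sets allowVolumeExpansion: true, growing a volume is a one-line patch rather than an over-provisioned up-front allocation. A sketch:

# Grow database-storage from 100Gi to 200Gi in place
kubectl patch pvc database-storage \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

Depending on the CSI driver, filesystem expansion may only complete after the consuming pod restarts.
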
Service Meshes and API Gateways

Service meshes and API gateways can have significant resource implications:

  1. Right-size proxy sidecars:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-app
    spec:
      template:
        metadata:
          annotations:
            # Per-pod Istio sidecar sizing (mesh-wide defaults live under
            # global.proxy.resources in the sidecar injector's values)
            sidecar.istio.io/proxyCPU: "100m"
            sidecar.istio.io/proxyMemory: "128Mi"
            sidecar.istio.io/proxyCPULimit: "500m"
            sidecar.istio.io/proxyMemoryLimit: "512Mi"
  2. Consider control plane costs:

    • Use shared control planes across multiple clusters where appropriate
    • Implement resource limits for control plane components
    • Evaluate managed service mesh offerings versus self-managed deployments

Implementing Cost Visibility and Governance

Effective cost management requires visibility and governance mechanisms.

Cost Monitoring and Allocation

Implement tools and practices for cost visibility:

  1. Use Kubernetes labels for cost allocation:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: payment-service
      labels:
        app: payment-service
        department: finance
        environment: production
        cost-center: cc-123456
  2. Implement cost monitoring tools:

    • Kubecost: Provides Kubernetes-native cost monitoring and allocation
    • CloudHealth: Offers cross-cloud cost management
    • AWS Cost Explorer, GCP Cost Management, Azure Cost Management: Cloud-provider-specific cost tools
  3. Set up cost anomaly detection:

    Configure alerts for unexpected cost increases or resource usage patterns that may indicate inefficiency or issues.
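
A minimal sketch of such an alert, assuming Kubecost's cost-model metrics are scraped by a Prometheus Operator setup (the metric name and threshold are illustrative and may differ by Kubecost version):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-anomaly-alerts
spec:
  groups:
  - name: cost
    rules:
    - alert: ClusterHourlyCostHigh
      expr: sum(node_total_hourly_cost) > 50  # exported by Kubecost's cost-model
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Cluster node cost has exceeded $50/hour for 30 minutes"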

Implementing Cost Governance

Establish governance processes to ensure ongoing cost optimization:

  1. Resource quotas for namespaces:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-quota
      namespace: team-a
    spec:
      hard:
        requests.cpu: "10"
        requests.memory: 20Gi
        limits.cpu: "20"
        limits.memory: 40Gi
        pods: "20"
        services: "10"
        persistentvolumeclaims: "15"
  2. Limit ranges for containers:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-limits
      namespace: team-a
    spec:
      limits:
      - default:
          cpu: 500m
          memory: 512Mi
        defaultRequest:
          cpu: 100m
          memory: 256Mi
        type: Container
  3. Admission controllers for policy enforcement:

    • OPA Gatekeeper: Enforce policies on resource configurations
    • Kyverno: Policy management with audit and enforcement capabilities

    Example Gatekeeper constraint template:

    apiVersion: templates.gatekeeper.sh/v1
    kind: ConstraintTemplate
    metadata:
      name: requireresources
    spec:
      crd:
        spec:
          names:
            kind: RequireResources
      targets:
        - target: admission.k8s.gatekeeper.sh
          rego: |
            package requireresources
            violation[{"msg": msg}] {
              container := input.review.object.spec.containers[_]
              not container.resources.requests
              msg := sprintf("Container %v does not have resource requests", [container.name])
            }

    Applying the constraint:

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: RequireResources
    metadata:
      name: require-resources
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Pod"]

Advanced Cost Optimization Techniques

For organizations with mature Kubernetes deployments, consider these advanced techniques.

Multi-Cluster and Multi-Cloud Strategies

Distribute workloads across multiple clusters or clouds for cost optimization:

  1. Workload-specific clusters:

    • Development/testing clusters with cost-optimized configurations
    • Production clusters with high-availability configurations
    • Batch processing clusters using spot/preemptible instances
  2. Regional price arbitrage:

    • Deploy non-latency-sensitive workloads in regions with lower costs
    • Use global load balancing to route traffic to the most cost-effective regions
  3. Cloud-specific optimizations:

    • AWS Savings Plans or Reserved Instances
    • GCP Committed Use Discounts
    • Azure Reserved VM Instances
FinOps Practices for Kubernetes

Implement FinOps (Financial Operations) practices for ongoing optimization:

  1. Regular cost reviews: Schedule weekly or monthly reviews of Kubernetes costs.

  2. Showback and chargeback mechanisms: Implement systems to attribute costs to specific teams or departments.

  3. Cost-aware CI/CD pipelines: Integrate cost estimation into your deployment processes to prevent costly configuration changes from reaching production.

  4. Ephemeral environments: Create temporary environments for testing and development that are automatically deprovisioned when not in use.
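
A minimal sketch of automated deprovisioning, assuming preview environments live in namespaces labeled env=preview and a ServiceAccount with namespace-delete permissions exists (both hypothetical; RBAC omitted for brevity):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: preview-cleanup
spec:
  schedule: "0 20 * * 1-5"  # every weekday evening
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: preview-cleanup  # hypothetical; needs RBAC
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest
            command: ["kubectl", "delete", "namespace",
                      "-l", "env=preview", "--ignore-not-found"]
          restartPolicy: OnFailure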

Real-Time Cost Optimization

Implement systems for dynamic cost optimization:

  1. Workload rescheduling based on spot market prices:

    Use tools like AWS Spot Fleet or provider-managed spot node pools (for example, GKE Spot VM node pools) to dynamically reschedule workloads as spot capacity and prices change.

  2. Time-based scaling:

    Native HPA cannot scale on a wall-clock schedule (tools such as KEDA's cron scaler can; see the sketch after this list), but its behavior field controls how quickly replicas are added or removed, which complements predictable daily traffic patterns:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-app-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
      minReplicas: 3
      maxReplicas: 20
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 50
            periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
          - type: Percent
            value: 100
            periodSeconds: 30
          - type: Pods
            value: 5
            periodSeconds: 30
          selectPolicy: Max
  3. Cost-aware application design:

    • Implement graceful degradation capabilities
    • Design applications to function with variable resources
    • Build intelligence into applications to adapt to resource availability
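
For the genuinely schedule-driven scaling mentioned in item 2, KEDA's cron scaler is one option. A sketch assuming KEDA is installed (KEDA manages an HPA for the target itself, so the Deployment should not also carry a hand-written HPA):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-business-hours
spec:
  scaleTargetRef:
    name: web-app      # Deployment to scale
  minReplicaCount: 3   # floor outside the scheduled window
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Berlin   # illustrative
      start: 0 8 * * *          # scale up at 08:00
      end: 0 20 * * *           # scale back down at 20:00
      desiredReplicas: "10"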

Case Study: E-Commerce Platform Cost Optimization

To illustrate these principles in action, let’s examine how a fictional e-commerce company optimized their Kubernetes costs.

Initial State

The company operated a Kubernetes cluster with the following characteristics:

  • 20 nodes (m5.2xlarge on AWS) running 24/7
  • All production services deployed with generous resource requests
  • No distinction between critical and non-critical workloads
  • Separate development and testing clusters with similar configurations
  • Monthly Kubernetes infrastructure cost: $25,000
Optimization Steps Implemented
  1. Resource right-sizing:

    • Analyzed actual resource usage using Prometheus and right-sized requests/limits
    • Implemented VPA in recommendation mode, then applied suggestions after validation
    • Result: 30% reduction in resource requests across the cluster
  2. Cluster optimization:

    • Implemented Cluster Autoscaler with appropriate settings
    • Created separate node pools for different workload types
    • Used spot instances for stateless workloads
    • Result: Reduced average node count from 20 to 12, with dynamic scaling based on demand
  3. Workload-specific optimizations:

    • Moved batch processing jobs to off-peak hours
    • Implemented pod anti-affinity for critical services
    • Optimized storage classes for different workloads
    • Result: Improved resource utilization and reduced storage costs by 25%
  4. Governance and monitoring:

    • Implemented Kubecost for detailed cost monitoring
    • Established namespace quotas for different teams
    • Created monthly cost review process
    • Result: Greater cost awareness and prevention of resource sprawl
Results

After implementing these optimizations, the company achieved:

  • 45% reduction in monthly Kubernetes infrastructure costs (from $25,000 to $13,750)
  • Improved application performance due to better resource allocation
  • Greater visibility into cost drivers
  • Sustainable governance process for maintaining optimizations

Conclusion

Kubernetes cost optimization is a continuous process that requires a combination of technical implementations, governance practices, and organizational awareness. By applying the strategies outlined in this article—resource right-sizing, cluster optimization, workload-specific optimizations, and cost governance—organizations can significantly reduce their Kubernetes costs while maintaining or even improving application performance and reliability.

Remember that effective cost optimization is not a one-time exercise but an ongoing practice. Cloud providers regularly introduce new instance types and pricing models, while application requirements evolve over time. Regular review and refinement of your cost optimization strategy will ensure sustainable savings and efficient resource utilization in your Kubernetes environments.
