Kubernetes has become the de facto standard for container orchestration, enabling organizations to deploy and manage applications at scale. However, without proper cost management, Kubernetes clusters can lead to significant and often unnecessary cloud expenditure. In this article, we’ll explore practical strategies for optimizing Kubernetes costs without compromising application performance or reliability.
Understanding Kubernetes Cost Drivers
Before diving into optimization strategies, it’s essential to understand the primary factors that drive Kubernetes costs:
1. Compute Resources
Compute resources—CPU and memory—typically constitute the largest portion of Kubernetes costs. These costs are driven by:
- Node size and count: The number and size of worker nodes in your cluster
- Resource requests and limits: How much CPU and memory your workloads request and are allowed to use
- Resource utilization: The actual usage compared to allocated resources
- Instance types: The specific VM types used for your nodes (e.g., general purpose vs. compute-optimized)
2. Storage Costs
Storage costs include:
- Persistent volumes: The size, performance tier, and number of persistent volumes
- Storage classes: Different storage classes with varying performance characteristics and costs
- Backup storage: Storage used for backup and disaster recovery
3. Network Costs
Network costs are often overlooked but can be significant:
- Data transfer: Costs for data moving between availability zones, regions, or to the internet
- Load balancers: Costs for load balancer provisioning and data processing
- Network policies: Potential performance impacts and associated costs
4. Management Overhead
Additional costs include:
- Control plane costs: Some managed Kubernetes services charge for the control plane
- Monitoring and logging: Costs for storing and processing metrics and logs
- CI/CD pipeline execution: Resources consumed during builds and deployments
Resource Right-Sizing Strategies
The foundation of Kubernetes cost optimization is ensuring workloads have the right amount of resources allocated—neither too much nor too little.
Resource Request Right-Sizing
Resource requests in Kubernetes determine the minimum amount of resources guaranteed to a container. Setting these correctly is crucial for cost optimization:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: example-app:1.0
        resources:
          requests:
            cpu: 100m        # 0.1 CPU cores
            memory: 256Mi
          limits:
            cpu: 500m        # 0.5 CPU cores
            memory: 512Mi
```
Best practices for setting resource requests:

1. Start with metrics: Base your resource requests on actual observed usage rather than guesswork. Monitor your applications to understand their resource consumption patterns (a quick sketch follows this list).

2. Consider percentile-based allocation: Rather than provisioning for peak usage, consider using a high percentile (e.g., the 95th) of observed usage as your baseline.

3. Account for application scaling: Consider how your application scales under load. Some applications scale CPU usage linearly with traffic, while others have different scaling characteristics.
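In practice, this means checking real consumption before writing a single request value. A minimal starting point, assuming metrics-server is installed and using a placeholder namespace:

```bash
# Point-in-time CPU and memory usage per container (requires metrics-server)
kubectl top pod --containers -n production

# Sort pods by CPU to find the heaviest consumers
kubectl top pod -n production --sort-by=cpu
```

These commands show instantaneous usage only; for percentile-based allocation you will need historical data from a monitoring system such as Prometheus.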
Implementing Vertical Pod Autoscaling
The Vertical Pod Autoscaler (VPA) automatically adjusts resource requests based on observed usage:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: example-app
  updatePolicy:
    updateMode: "Auto"   # or "Off" for recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 50m
        memory: 128Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
```
VPA supports four update modes:
- Off: Computes and publishes recommendations without applying them
- Initial: Applies recommendations only when pods are created, never to running pods
- Recreate: Applies recommendations by evicting pods so they restart with updated requests
- Auto: Currently equivalent to Recreate; reserved for in-place updates as they mature
For production workloads, it’s often wise to start with “Off” mode to review recommendations before applying them.
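Once a VPA is deployed in “Off” mode, its recommendations can be read directly from its status. A quick sketch, assuming the example-app-vpa object defined above:

```bash
# Human-readable summary of the recommended target and bounds per container
kubectl describe vpa example-app-vpa

# Extract just the raw recommendation from the status
kubectl get vpa example-app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```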
Setting Appropriate Resource Limits
Resource limits define the maximum resources a container can use. Setting these properly prevents resource hogging without unnecessarily constraining workloads:
1. CPU limits: CPU is a compressible resource: under contention it can be throttled. Set CPU limits to allow for occasional bursts while preventing a single container from consuming all available CPU.

2. Memory limits: Memory is non-compressible: once allocated, it cannot be reclaimed without terminating the process. Set memory limits high enough to prevent OOMKill events but low enough to prevent a single pod from consuming all node memory.

3. Consider the limit-to-request ratio: A common practice is to set limits at 2-3x the request level for CPU and 1.5-2x for memory (an enforcement sketch follows this list).
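If you want the cluster to enforce such a ratio rather than relying on convention, a LimitRange can cap the limit-to-request ratio per container. A minimal sketch, using the 3x CPU and 2x memory ratios mentioned above:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limit-request-ratio
spec:
  limits:
  - type: Container
    maxLimitRequestRatio:
      cpu: "3"       # a container's CPU limit may be at most 3x its request
      memory: "2"    # a container's memory limit may be at most 2x its request
```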
Implementing Horizontal Pod Autoscaling
Horizontal Pod Autoscaler (HPA) adjusts the number of replicas based on metrics, complementing right-sizing efforts:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
HPA enables cost-effective scaling by:
- Scaling down during low traffic periods: Reducing the number of pods when demand is low
- Scaling up during high traffic periods: Adding pods to handle increased load
- Supporting custom metrics: Scaling based on application-specific metrics like queue length or request rate, as sketched below
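As a sketch of custom-metric scaling: the Pods metric type below assumes a custom metrics adapter (for example, the Prometheus Adapter) is installed and exposes a per-pod http_requests_per_second metric; the metric name here is illustrative, not a built-in:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-rps-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical; exposed via an adapter
      target:
        type: AverageValue
        averageValue: "100"              # target 100 requests/sec per pod
```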
Cluster Scaling and Optimization
Optimizing at the cluster level provides significant cost-saving opportunities.
Implementing Cluster Autoscaling
The Cluster Autoscaler automatically adjusts the number of nodes based on pod scheduling requirements:
```yaml
# The Cluster Autoscaler is configured through command-line flags on its own
# deployment rather than a standalone config file; an excerpt of the container
# spec with common scale-down tuning flags:
spec:
  containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws                     # match your cloud provider
    - --scale-down-utilization-threshold=0.5   # consider nodes below 50% utilization
    - --scale-down-unneeded-time=5m            # node must be unneeded this long
    - --scale-down-delay-after-add=5m
    - --scale-down-delay-after-delete=0s
    - --scale-down-delay-after-failure=3m
    - --scale-down-unready-time=20m
```
The Cluster Autoscaler works by:
- Monitoring for pods that fail to schedule due to insufficient resources
- Adding nodes when needed to accommodate unschedulable pods
- Identifying and removing underutilized nodes when pods can be rescheduled to other nodes
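One operational detail worth knowing: the autoscaler will not remove a node if doing so would evict a pod that declares itself unsafe to evict. A sketch of the standard annotation:

```yaml
# A pod annotated this way blocks scale-down of the node it runs on,
# so use it sparingly or it will undermine the cost savings
apiVersion: v1
kind: Pod
metadata:
  name: stateful-worker
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: worker
    image: stateful-worker:1.0
```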
Node Optimization Strategies
Selecting the right node types and configurations can significantly impact costs:
1. Leverage spot/preemptible instances: These instances offer significant discounts (often 60-90% compared to on-demand pricing) but can be reclaimed by the cloud provider with minimal notice. They’re ideal for stateless, fault-tolerant workloads. Note that the node label identifying spot capacity varies by provider:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Provider-specific label, e.g. eks.amazonaws.com/capacityType=SPOT
              # on EKS or cloud.google.com/gke-spot=true on GKE
              - key: eks.amazonaws.com/capacityType
                operator: In
                values:
                - SPOT
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: batch-processor
        image: batch-processor:1.0
```
2. Use node taints and tolerations: Mark specific nodes for certain workloads to ensure efficient resource allocation:

```bash
# Node with a taint
kubectl taint nodes node1 workload=batch:NoSchedule
```

```yaml
# Pod with a matching toleration
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  tolerations:
  - key: "workload"
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
  containers:
  - name: batch-container
    image: batch-processor:1.0
```
3. Implement node pools for workload types: Create separate node pools optimized for different workload characteristics (a scheduling sketch follows this list):
- General purpose: For standard web applications and services
- Compute-optimized: For CPU-intensive workloads
- Memory-optimized: For workloads with high memory requirements
- GPU nodes: For machine learning and other specialized workloads
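Steering workloads onto the right pool is typically done with a label plus a nodeSelector. A minimal sketch; the nodepool label here is hypothetical and would be applied to nodes when the pool is created (most managed offerings support labeling a pool at creation time):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: video-encoder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: video-encoder
  template:
    metadata:
      labels:
        app: video-encoder
    spec:
      nodeSelector:
        nodepool: compute-optimized   # hypothetical label on the pool's nodes
      containers:
      - name: encoder
        image: video-encoder:1.0
```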
Pod Scheduling Optimization
How pods are scheduled across nodes significantly affects resource utilization:
1. Pod affinity and anti-affinity: Control pod placement to optimize resource usage:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web-app
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: web-app:1.0
```
This configuration encourages spreading pods across different nodes, improving availability while preventing resource contention.
2. Pod priority and preemption: Assign priorities to pods to ensure critical workloads get resources first:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class is for critical production workloads."
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-service
spec:
  priorityClassName: high-priority
  containers:
  - name: critical-service
    image: critical-service:1.0
```
3. Topology spread constraints: Ensure pods are distributed across nodes, zones, or regions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web-app
      containers:
      - name: web-app
        image: web-app:1.0
```
Workload-Specific Optimization
Different types of workloads benefit from different optimization approaches.
Batch and Job Workloads
Batch jobs and other non-continuous workloads offer unique optimization opportunities:
1. Use Jobs and CronJobs appropriately:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-analyzer
spec:
  schedule: "0 2 * * *"   # Run at 2 AM daily
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400   # Auto-delete finished Jobs after 24 hours
      template:
        spec:
          containers:
          - name: log-analyzer
            image: log-analyzer:1.0
            resources:
              requests:
                cpu: 2000m
                memory: 4Gi
          restartPolicy: OnFailure
```
2. Consider specialized node pools for batch processing:
- Use spot/preemptible instances for cost-effective batch processing
- Schedule batch jobs during off-peak hours when resource costs may be lower
- Implement node auto-provisioning for batch workloads
3. Optimize job parallelism and completion indexes:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
spec:
  parallelism: 5
  completions: 10
  completionMode: Indexed   # required for JOB_COMPLETION_INDEX to be set
  template:
    spec:
      containers:
      - name: processor
        image: data-processor:1.0
        command: ["processor", "--chunk-index=$(JOB_COMPLETION_INDEX)"]
      restartPolicy: Never
```
Stateful Workloads
Stateful applications like databases require special consideration:
1. Optimize persistent volume claims:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-storage
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-ssd
  resources:
    requests:
      storage: 100Gi
```
- Choose the appropriate storage class based on performance requirements
- Start with smaller volumes and leverage volume expansion when needed (a storage class sketch follows this list)
- Consider using volume snapshots for efficient backups
2. Enable pod disruption budgets for stateful workloads:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: database-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: database
```
3. Use appropriate StatefulSet configurations:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
spec:
  serviceName: "database"
  replicas: 3
  selector:
    matchLabels:
      app: database
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: database
        image: database:1.0
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
        volumeMounts:
        - name: data
          mountPath: /var/lib/database
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard-ssd"
      resources:
        requests:
          storage: 100Gi
```
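On the volume expansion point above: expansion only works when the storage class permits it. A sketch of what the standard-ssd class referenced in these examples might look like, assuming the AWS EBS CSI driver (substitute your own provisioner and parameters):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-ssd
provisioner: ebs.csi.aws.com    # assumption: AWS EBS CSI driver
parameters:
  type: gp3                     # gp3 is typically cheaper per GiB than gp2
allowVolumeExpansion: true      # lets you grow PVCs in place later
reclaimPolicy: Delete
```

With this in place, growing a volume is a matter of editing the PVC’s spec.resources.requests.storage and waiting for the resize to complete.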
Service Meshes and API Gateways
Service meshes and API gateways can have significant resource implications:
1. Right-size proxy sidecars: With Istio, per-pod annotations can override the injected sidecar’s resources (a sketch; annotation support depends on your Istio version):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
      annotations:
        sidecar.istio.io/proxyCPU: "100m"
        sidecar.istio.io/proxyMemory: "128Mi"
        sidecar.istio.io/proxyCPULimit: "500m"
        sidecar.istio.io/proxyMemoryLimit: "512Mi"
    spec:
      containers:
      - name: web-app
        image: web-app:1.0
```
2. Consider control plane costs:
- Use shared control planes across multiple clusters where appropriate
- Implement resource limits for control plane components
- Evaluate managed service mesh offerings versus self-managed deployments
Implementing Cost Visibility and Governance
Effective cost management requires visibility and governance mechanisms.
Cost Monitoring and Allocation
Implement tools and practices for cost visibility:
1. Use Kubernetes labels for cost allocation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  labels:
    app: payment-service
    department: finance
    environment: production
    cost-center: cc-123456
```
2. Implement cost monitoring tools:
- Kubecost: Provides Kubernetes-native cost monitoring and allocation
- CloudHealth: Offers cross-cloud cost management
- AWS Cost Explorer, GCP Cost Management, Azure Cost Management: Cloud-provider-specific cost tools
3. Set up cost anomaly detection: Configure alerts for unexpected cost increases or resource usage patterns that may indicate inefficiency or issues (a sketch follows this list).
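A sketch of what such an alert can look like, assuming the Prometheus Operator is installed and Kubecost is exporting its cost metrics (node_total_hourly_cost below is a metric exported by Kubecost’s cost-model; verify the name against your installed version, and treat the threshold as illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-anomaly
  namespace: monitoring
spec:
  groups:
  - name: cost.rules
    rules:
    - alert: ClusterHourlyCostHigh
      expr: sum(node_total_hourly_cost) > 50   # alert above $50/hour
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Cluster node cost has exceeded $50/hour for 30 minutes"
```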
Implementing Cost Governance
Establish governance processes to ensure ongoing cost optimization:
1. Resource quotas for namespaces:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "20"
    services: "10"
    persistentvolumeclaims: "15"
```
2. Limit ranges for containers:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 256Mi
    type: Container
```
3. Admission controllers for policy enforcement:
- OPA Gatekeeper: Enforce policies on resource configurations
- Kyverno: Policy management with audit and enforcement capabilities
Example Gatekeeper constraint template:
```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: requireresources
spec:
  crd:
    spec:
      names:
        kind: RequireResources
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package requireresources

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.requests
        msg := sprintf("Container %v does not have resource requests", [container.name])
      }
```
Applying the constraint:
```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequireResources
metadata:
  name: require-resources
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
```
Advanced Cost Optimization Techniques
For organizations with mature Kubernetes deployments, consider these advanced techniques.
Multi-Cluster and Multi-Cloud Strategies
Distribute workloads across multiple clusters or clouds for cost optimization:
1. Workload-specific clusters:
- Development/testing clusters with cost-optimized configurations
- Production clusters with high-availability configurations
- Batch processing clusters using spot/preemptible instances

2. Regional price arbitrage:
- Deploy non-latency-sensitive workloads in regions with lower costs
- Use global load balancing to route traffic to the most cost-effective regions

3. Cloud-specific optimizations:
- AWS Savings Plans or Reserved Instances
- GCP Committed Use Discounts
- Azure Reserved VM Instances
FinOps Practices for Kubernetes
Implement FinOps (Financial Operations) practices for ongoing optimization:
1. Regular cost reviews: Schedule weekly or monthly reviews of Kubernetes costs.

2. Showback and chargeback mechanisms: Implement systems to attribute costs to specific teams or departments.

3. Cost-aware CI/CD pipelines: Integrate cost estimation into your deployment processes to prevent costly configuration changes from reaching production.

4. Ephemeral environments: Create temporary environments for testing and development that are automatically deprovisioned when not in use.
Real-Time Cost Optimization
Implement systems for dynamic cost optimization:
1. Workload rescheduling based on spot market prices: Use tools like AWS Spot Fleet, GCP managed instance groups with preemptible VMs, or node provisioners such as Karpenter to dynamically shift workloads toward the cheapest available capacity (a sketch follows).
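A sketch of the spot-with-fallback pattern using Karpenter on AWS; this assumes Karpenter is installed with a default EC2NodeClass, and lets the provisioner prefer spot capacity but fall back to on-demand when spot is unavailable:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-first
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumption: a default EC2NodeClass exists
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # spot preferred, on-demand as fallback
```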
2. Time-based and behavior-tuned scaling: Native HPA has no built-in schedule (tools such as KEDA’s cron scaler add time-based triggers), but its behavior field controls how quickly scaling reacts, letting you scale up fast for traffic spikes and scale down gradually:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: time-based-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 5
        periodSeconds: 30
      selectPolicy: Max
```
3. Cost-aware application design:
- Implement graceful degradation capabilities
- Design applications to function with variable resources
- Build intelligence into applications to adapt to resource availability
Case Study: E-Commerce Platform Cost Optimization
To illustrate these principles in action, let’s examine how a fictional e-commerce company optimized their Kubernetes costs.
Initial State
The company operated a Kubernetes cluster with the following characteristics:
- 20 nodes (m5.2xlarge on AWS) running 24/7
- All production services deployed with generous resource requests
- No distinction between critical and non-critical workloads
- Separate development and testing clusters with similar configurations
- Monthly Kubernetes infrastructure cost: $25,000
Optimization Steps Implemented
1. Resource right-sizing:
- Analyzed actual resource usage using Prometheus and right-sized requests/limits
- Implemented VPA in recommendation mode, then applied suggestions after validation
- Result: 30% reduction in resource requests across the cluster

2. Cluster optimization:
- Implemented Cluster Autoscaler with appropriate settings
- Created separate node pools for different workload types
- Used spot instances for stateless workloads
- Result: Reduced average node count from 20 to 12, with dynamic scaling based on demand

3. Workload-specific optimizations:
- Moved batch processing jobs to off-peak hours
- Implemented pod anti-affinity for critical services
- Optimized storage classes for different workloads
- Result: Improved resource utilization and reduced storage costs by 25%

4. Governance and monitoring:
- Implemented Kubecost for detailed cost monitoring
- Established namespace quotas for different teams
- Created monthly cost review process
- Result: Greater cost awareness and prevention of resource sprawl
Results
After implementing these optimizations, the company achieved:
- 45% reduction in monthly Kubernetes infrastructure costs (from $25,000 to $13,750)
- Improved application performance due to better resource allocation
- Greater visibility into cost drivers
- Sustainable governance process for maintaining optimizations
Conclusion
Kubernetes cost optimization is a continuous process that requires a combination of technical implementations, governance practices, and organizational awareness. By applying the strategies outlined in this article—resource right-sizing, cluster optimization, workload-specific optimizations, and cost governance—organizations can significantly reduce their Kubernetes costs while maintaining or even improving application performance and reliability.
Remember that effective cost optimization is not a one-time exercise but an ongoing practice. Cloud providers regularly introduce new instance types and pricing models, while application requirements evolve over time. Regular review and refinement of your cost optimization strategy will ensure sustainable savings and efficient resource utilization in your Kubernetes environments.