KEDA Autoscaling
Background
KEDA (Kubernetes Event-driven Autoscaling) enables pod-level autoscaling based on custom metrics, complementing our existing Karpenter node-level scaling.
The standard Kubernetes HPA (Horizontal Pod Autoscaler) only supports CPU and memory metrics out of the box. KEDA extends this with:
- Custom Prometheus metrics: Scale based on any metric in your monitoring stack
- Scale-to-zero: Reduce costs by scaling idle workloads to zero replicas
- Event-driven scaling: React to queue lengths, request rates, or custom application metrics
This is particularly useful for:
- GPU workloads that should scale based on inference queue length
- Batch processing jobs that scale with pending work
- Cost optimization by scaling down unused resources
Important Limitations
KEDA is currently NOT supported by helm-idp-advanced. You can only use KEDA with your own managed Helm charts, where you define the ScaledObject resources yourself.
How It Works
- Define a ScaledObject that references your Deployment and specifies scaling triggers
- KEDA queries your metrics source (e.g., Prometheus) at regular intervals
- KEDA creates and manages an HPA based on the ScaledObject configuration
- Pods scale up/down based on the metric values and thresholds you define
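For reference, the HPA that KEDA creates is a standard `autoscaling/v2` object backed by the external metrics API; KEDA itself handles the 0 ↔ 1 transition, and the HPA scales between 1 and the maximum. A rough sketch of its shape for the basic example below (the external metric name is KEDA-internal and shown here only for illustration):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-my-app-scaler   # KEDA names it keda-hpa-<scaledobject-name>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 1                 # an HPA cannot reach zero; KEDA performs the 0 <-> 1 step itself
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: s0-prometheus    # illustrative; the real name is generated by KEDA
        target:
          type: AverageValue
          averageValue: "5"      # mirrors the trigger threshold
```

You never apply this object yourself; it is shown only to make the mechanics concrete.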
Basic Example
Here’s a simple ScaledObject that scales a deployment based on a Prometheus metric:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
  namespace: my-namespace
spec:
  scaleTargetRef:
    name: my-deployment  # Name of the Deployment to scale
  minReplicaCount: 0     # Enable scale-to-zero
  maxReplicaCount: 10
  cooldownPeriod: 60     # Seconds to wait before scaling down
  pollingInterval: 15    # How often to check metrics
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prometheus-stack-prometheus.monitoring:9090
        query: sum(my_app_queue_length{namespace="my-namespace"})
        threshold: "5"   # Scale up when metric >= threshold
```
Prometheus Configuration
KEDA does not auto-discover Prometheus. You must specify the server address in each ScaledObject.
Use this Prometheus URL in your ScaledObjects: `http://kube-prometheus-stack-prometheus.monitoring:9090`
Scaling Behavior
The number of replicas is calculated as:

```
replicas = ceil(metric_value / threshold)
```
For example, with `threshold: "5"`:
- Metric value 0 → 0 replicas (if `minReplicaCount` is 0)
- Metric value 4 → 1 replica
- Metric value 12 → 3 replicas
- Metric value 50 → 10 replicas (capped at `maxReplicaCount`)
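If the computed scale-down is too abrupt (for example, a noisy metric causing replica flapping), a ScaledObject can pass an HPA `behavior` block through its `advanced` section. A minimal sketch; the stabilization window and policy values below are illustrative, not recommendations:

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 120  # require 2 min of consistently low metrics first
          policies:
            - type: Pods
              value: 1                     # then remove at most one pod...
              periodSeconds: 60            # ...per minute
```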
GPU Workload Example
For AI/ML workloads with GPU, you might scale based on inference queue length:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: inference-scaler
  namespace: ai-prod
spec:
  scaleTargetRef:
    name: inference-server
  minReplicaCount: 0
  maxReplicaCount: 5
  cooldownPeriod: 300  # 5 min cooldown for expensive GPU pods
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prometheus-stack-prometheus.monitoring:9090
        query: sum(inference_queue_pending_requests{namespace="ai-prod"})
        threshold: "10"
```
Available Triggers
KEDA supports many trigger types beyond Prometheus:
| Trigger | Use Case |
|---|---|
| prometheus | Scale on any Prometheus metric |
| aws-sqs-queue | Scale based on SQS queue length |
| kafka | Scale based on Kafka consumer lag |
| cron | Scheduled scaling |
| cpu/memory | Standard resource metrics |
See the [KEDA Scalers Documentation](https://keda.sh/docs/scalers/) for the full list.
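For instance, the cron trigger can pre-scale a workload during business hours without any metrics backend. A short sketch using the scaler's standard fields (the schedule, timezone, and replica count are placeholders):

```yaml
triggers:
  - type: cron
    metadata:
      timezone: Europe/Berlin  # IANA timezone name
      start: 0 8 * * *         # scale up at 08:00
      end: 0 18 * * *          # scale back down at 18:00
      desiredReplicas: "3"     # replica count held during the window
```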
Troubleshooting
Check ScaledObject status in ArgoCD
Open your application in ArgoCD and look for the ScaledObject resource. A healthy ScaledObject shows a green status.
You can also check the HPA that KEDA creates automatically; it is named `keda-hpa-<scaledobject-name>`.
Check KEDA logs in Grafana
In Grafana, go to the Explore view and query Loki for KEDA operator logs:
{namespace="keda", app="keda-operator"}
Look for scaling events like:
{"level":"info","logger":"scaleexecutor","msg":"Successfully set ScaleTarget replicas count","New Replicas Count":3}
Common issues
ScaledObject shows error status in ArgoCD
- Verify the Prometheus `serverAddress` URL is correct
- Check that your PromQL query is valid by running it in Grafana
- Ensure the metric actually exists in Prometheus
Pods not scaling to zero
- Verify `minReplicaCount: 0` is set
- Check whether the `cooldownPeriod` has elapsed; KEDA waits this long after the metric drops before scaling to zero
- Ensure the metric value is actually 0
HPA conflict
- Don’t create both a ScaledObject and a manual HPA for the same Deployment
- KEDA manages the HPA automatically
Additional Information
For questions or issues, contact the platform team via:
- Slack: Your onboarding channel or #idp-team