Kubiya Runner Helm Chart
Deploy and manage Kubiya Runners in your Kubernetes cluster using the official Helm chart.
The kubiya-runner Helm chart is the recommended way to deploy runners in production environments. It provides a fully configured, scalable deployment with monitoring and security best practices.
- Chart Name: kubiya-runner
- Repository: https://charts.kubiya.ai
- Current Version: 0.6.84
- App Version: 0.6.84
- Kubernetes Version: >=1.19.0-0
Quick Start
# Add Kubiya Helm repository
helm repo add kubiya https://charts.kubiya.ai
helm repo update

# Install the runner
helm install my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --create-namespace \
  --set organization="my-org" \
  --set uuid="your-runner-uuid" \
  --set nats.jwt="your-jwt-token"
Prerequisites
- Kubernetes: 1.19+
- Helm: 3.0+
- Resources:
  - Minimum 4 CPU cores
  - 8GB RAM
  - 50GB storage
- Network: Outbound HTTPS access
- NATS Credentials: JWT tokens from Kubiya platform
- Registry Access: Access to Kubiya container registry
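Before installing, it can help to preflight the version requirements above. The sketch below is illustrative and not part of the chart: the `meets_min` helper is a hypothetical name, and in practice you would feed it versions parsed from `kubectl version` and `helm version` output.

```shell
#!/bin/sh
# Illustrative preflight check for the chart's minimums
# (Kubernetes >= 1.19, Helm >= 3.0). Pure-shell major.minor comparison.
meets_min() {
  # Succeeds if version $1 >= version $2, comparing major.minor numerically
  have="$1"; want="$2"
  have_major=${have%%.*}; have_minor=${have#*.}; have_minor=${have_minor%%.*}
  want_major=${want%%.*}; want_minor=${want#*.}; want_minor=${want_minor%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

meets_min "1.28.3" "1.19" && echo "Kubernetes version OK"
meets_min "3.14.0" "3.0"  && echo "Helm version OK"
```

Wiring this into a CI preflight step avoids discovering an unsupported cluster only after `helm install` fails.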
Architecture Overview
Components
Core Components
Agent Manager
- Image: ghcr.io/kubiyabot/agent-manager:v0.1.14
- Purpose: Manages Kubiya agents lifecycle
- Key Features:
- Agent registration and discovery
- Health monitoring
- Automatic scaling
- NATS communication
Tool Manager
- Image: ghcr.io/kubiyabot/tool-manager:v0.3.17
- Purpose: Executes tools and integrations
- Key Features:
- Container-native tool execution
- SDK server integration
- Resource isolation
- Multi-language support
Kubiya Operator
- Image: ghcr.io/kubiyabot/kubiya-operator:runner_v2
- Purpose: Manages operational aspects
- Key Features:
- Configuration management
- Secret rotation
- Component orchestration
Workflow Engine
- Image: ghcr.io/kubiyabot/workflow-engine:main
- Purpose: Orchestrates workflow execution
- Key Features:
- DAG execution
- State management
- Event streaming
Monitoring Components
Grafana Alloy
- Image: grafana/alloy:v1.5.1
- Purpose: Metrics collection and forwarding
- Features:
- Prometheus-compatible scraping
- Azure Managed Prometheus integration
- OTEL support
- Data processing pipelines
Kube State Metrics
- Image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0
- Purpose: Kubernetes resource metrics
- Scope: Namespace-limited by default
Configuration
Minimum Required Values
Create a values-override.yaml file:
# Required: Organization configuration
organization: "my-company"
uuid: "679adc53-7068-4454-aa9f-16df30b14a50"

# Required: NATS configuration
nats:
  jwt: "eyJ0eXAiOiJKV1QiLCJhbGc..." # Primary JWT token
  secondJwt: "eyJ0eXAiOiJKV1QiLCJhbG..." # Secondary JWT token
  subject: "kubiya.agents.my-company"
  serverUrl: "nats://connect.ngs.global"

# Required for monitoring: Azure Prometheus
alloy:
  alloy:
    extraEnv:
      - name: AZURE_REMOTE_WRITE_URL
        value: "https://your-prometheus.azureprometheus.io/api/v1/write"
      - name: AZURE_CLIENT_ID
        value: "your-client-id"
      - name: AZURE_CLIENT_SECRET
        value: "your-client-secret"
      - name: AZURE_TOKEN_URL
        value: "https://login.microsoftonline.com/your-tenant/oauth2/v2.0/token"

# Optional: Resource configuration
resources:
  agentManager:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
  toolManager:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
Advanced Configuration
Resource Management
# Configure resources for each component
resources:
  agentManager:
    requests: {cpu: "500m", memory: "1Gi"}
    limits: {cpu: "2", memory: "4Gi"}
  toolManager:
    requests: {cpu: "1", memory: "2Gi"}
    limits: {cpu: "4", memory: "8Gi"}
  kubiyaOperator:
    requests: {cpu: "250m", memory: "512Mi"}
    limits: {cpu: "1", memory: "2Gi"}
  workflowEngine:
    requests: {cpu: "1", memory: "2Gi"}
    limits: {cpu: "4", memory: "8Gi"}
  alloy:
    requests: {cpu: "100m", memory: "128Mi"}
    limits: {cpu: "1", memory: "1Gi"}
Monitoring Configuration
# Alloy scrape intervals
alloy:
  scrapeIntervals:
    default: "60s"
    runnerExporters: "60s"
    alloyExporter: "60s"
    blackboxExporter: "60s"
    kubeStateMetrics: "60s"
    cadvisor: "60s" # Disabled by default

# Enable additional metrics
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
  dashboards:
    enabled: true
Security Configuration
# RBAC configuration
rbac:
  create: true

# Optional: Grant cluster-wide permissions
adminClusterRole:
  create: false # Set to true for full cluster access

# Pod Security
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

# Network Policies
networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: kubiya
Storage Configuration
# Persistent storage for components
persistence:
  toolManager:
    enabled: true
    size: "50Gi"
    storageClass: "fast-ssd"
  workflowEngine:
    enabled: true
    size: "100Gi"
    storageClass: "standard"
Installation
Standard Installation
# Create namespace
kubectl create namespace kubiya

# Install with custom values
helm install my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-override.yaml \
  --wait
Production Installation
# Install with production settings
helm install prod-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-prod.yaml \
  --set image.pullPolicy=IfNotPresent \
  --set updateStrategy.type=RollingUpdate \
  --set nodeSelector."node\.kubernetes\.io/purpose"=kubiya \
  --timeout 10m \
  --wait
Upgrade
# Upgrade existing installation
helm upgrade my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-override.yaml \
  --wait
Monitoring
Health Monitoring
The chart includes comprehensive health monitoring:
- Cumulative Health Score: Composite score from 5 signals
  - Tool Manager HTTP success rate
  - Agent Manager HTTP success rate
  - Pod ready status
  - Pod restart stability
  - Probe success rate
- Health Thresholds:
  - Healthy (Green): 100%
  - Warning (Yellow): 99-99.9%
  - Degraded (Orange): 90-98.9%
  - Down (Red): 0-89.9%
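The threshold bands above can be expressed as a small classification function. This is an illustrative sketch only; the `health_status` helper is hypothetical and simply mirrors the band boundaries listed above, taking a `runner_health_score` value in the 0-100 range.

```shell
#!/bin/sh
# Maps a runner_health_score value (0-100) to the status bands above.
health_status() {
  score="$1"
  # awk handles the floating-point comparison POSIX sh lacks
  awk -v s="$score" 'BEGIN {
    if (s >= 100)     print "Healthy (Green)"
    else if (s >= 99) print "Warning (Yellow)"
    else if (s >= 90) print "Degraded (Orange)"
    else              print "Down (Red)"
  }'
}

health_status 100   # Healthy (Green)
health_status 99.5  # Warning (Yellow)
health_status 95    # Degraded (Orange)
health_status 42    # Down (Red)
```

Note the asymmetry this encodes: anything short of a perfect composite score is already at least a Warning, so alert routing should treat Yellow as informational and reserve paging for the Degraded/Down bands.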
Grafana Dashboards
Pre-built dashboards included:
- All Runners Health State: Overview of all runners
- Runner Components Health: Component-level metrics
- Tool Manager Performance: Request latency and throughput
- Agent Manager Metrics: Agent lifecycle metrics
- Alloy Monitoring: Metrics pipeline health
Alerts
Default alerts configured:
alerts:
  - name: RunnerHealthDegraded
    expr: 'runner_health_score < 90'
    for: 5m
    severity: warning
  - name: ComponentDown
    expr: 'up{job="kubiya-runner"} == 0'
    for: 2m
    severity: critical
Security
RBAC Permissions
The chart creates minimal RBAC permissions:
# Default namespace-scoped permissions
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]

# Optional cluster-wide permissions (disabled by default)
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["*"]
Network Security
- All components run as non-root
- Network policies restrict inter-pod communication
- TLS enabled for registry communication
- Secrets mounted as volumes (not environment variables)
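The last point can be illustrated with a generic pod-spec fragment. This is a standard Kubernetes pattern, not an excerpt from the chart's templates; the names `nats-creds` and `nats-credentials` are placeholders.

```yaml
# Generic illustration: mounting a Secret as a read-only volume
# instead of exposing it through environment variables
spec:
  containers:
    - name: tool-manager
      volumeMounts:
        - name: nats-creds
          mountPath: /etc/kubiya/nats
          readOnly: true
  volumes:
    - name: nats-creds
      secret:
        secretName: nats-credentials
```

Volume-mounted secrets keep credentials out of `kubectl describe pod` output and crash dumps, and they are refreshed in place when the Secret object is updated.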
Troubleshooting
Common Issues
Debug Commands
# Get all runner resources
kubectl get all -n kubiya -l app.kubernetes.io/instance=my-runner

# View helm values
helm get values my-runner -n kubiya

# Test runner health
kubectl exec -n kubiya deploy/tool-manager -- \
  curl -s http://localhost:8080/health

# Check component versions
kubectl get pods -n kubiya -o json | \
  jq '.items[].spec.containers[].image' | sort | uniq
Maintenance
Backup
# Backup helm release
helm get values my-runner -n kubiya > runner-backup.yaml

# Backup persistent data (tar's z flag already gzips the stream)
kubectl exec -n kubiya -c tool-manager \
  deploy/tool-manager -- tar czf - /data \
  > tool-manager-data.tar.gz
Scaling
# Horizontal scaling configuration
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
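For intuition, the HorizontalPodAutoscaler this configuration enables sizes the deployment with the standard Kubernetes formula desired = ceil(currentReplicas x currentMetric / targetMetric), clamped to the min/max bounds. The sketch below just reproduces that arithmetic; the `desired_replicas` helper is illustrative and not part of the chart.

```shell
#!/bin/sh
# Reproduces the HPA scaling formula:
#   desired = ceil(current * currentUtil / targetUtil), clamped to [min, max]
desired_replicas() {
  current="$1"; current_util="$2"; target_util="$3"; min="$4"; max="$5"
  awk -v c="$current" -v u="$current_util" -v t="$target_util" \
      -v lo="$min" -v hi="$max" 'BEGIN {
    d = c * u / t
    d = (d == int(d)) ? d : int(d) + 1   # ceiling
    if (d < lo) d = lo
    if (d > hi) d = hi
    print d
  }'
}

# 2 replicas at 105% CPU against the 70% target -> ceil(3.0) = 3 replicas
desired_replicas 2 105 70 2 10
```

With the values above, a sustained spike to triple the CPU target would still be capped at 10 replicas, so size `maxReplicas` against your worst expected burst rather than typical load.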
Updates
The chart includes an automatic image updater (deprecated):
# Disable automatic updates in production
imageUpdater:
  enabled: false # Recommended for production
Migration Guide
From Previous Versions
# 1. Backup current installation
helm get values my-runner -n kubiya > backup.yaml

# 2. Review breaking changes
helm show readme kubiya/kubiya-runner --version 0.6.84

# 3. Update values file
#    Add any new required values

# 4. Upgrade
helm upgrade my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-updated.yaml \
  --wait
Support
Next Steps
1. Create Runner in Platform: Go to the Kubiya platform and create a new runner configuration.
2. Download Values: Get the generated values.yaml with your credentials.
3. Deploy Chart: Install the Helm chart in your cluster.
4. Verify Health: Check that the runner appears as healthy in the platform.
5. Test Execution: Run a test workflow using your new runner.