Kubiya Runner Helm Chart
Deploy and manage Kubiya Runners in your Kubernetes cluster using the official Helm chart.
The kubiya-runner Helm chart is the recommended way to deploy runners in production environments. It provides a fully configured, scalable deployment with monitoring and security best practices.
- Chart Name: kubiya-runner
- Repository: https://charts.kubiya.ai
- Current Version: 0.6.84
- App Version: 0.6.84
- Kubernetes Version: >=1.19.0-0
Quick Start
# Add Kubiya Helm repository
helm repo add kubiya https://charts.kubiya.ai
helm repo update

# Install the runner
helm install my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --create-namespace \
  --set organization="my-org" \
  --set uuid="your-runner-uuid" \
  --set nats.jwt="your-jwt-token"
Prerequisites
- Kubernetes: 1.19+
- Helm: 3.0+
- Resources:
  - Minimum 4 CPU cores
  - 8GB RAM
  - 50GB storage
- Network: Outbound HTTPS access
- NATS Credentials: JWT tokens from Kubiya platform
- Registry Access: Access to Kubiya container registry
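Before installing, it can help to preflight the version requirements above. The sketch below is illustrative and not part of the chart: the `meets_min` helper is a hypothetical name, and in practice you would feed it versions parsed from `kubectl version` and `helm version` output.

```shell
#!/bin/sh
# Illustrative preflight check for the chart's minimums
# (Kubernetes >= 1.19, Helm >= 3.0). Pure-shell major.minor comparison.
meets_min() {
  # Succeeds if version $1 >= version $2, comparing major.minor numerically
  have="$1"; want="$2"
  have_major=${have%%.*}; have_minor=${have#*.}; have_minor=${have_minor%%.*}
  want_major=${want%%.*}; want_minor=${want#*.}; want_minor=${want_minor%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

meets_min "1.28.3" "1.19" && echo "Kubernetes version OK"
meets_min "3.14.0" "3.0"  && echo "Helm version OK"
```

Wiring this into a CI preflight step avoids discovering an unsupported cluster only after `helm install` fails.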
Architecture Overview
Components
Core Components
Agent Manager
- Image: ghcr.io/kubiyabot/agent-manager:v0.1.14
- Purpose: Manages Kubiya agents lifecycle
- Key Features:
- Agent registration and discovery
- Health monitoring
- Automatic scaling
- NATS communication
Tool Manager
- Image: ghcr.io/kubiyabot/tool-manager:v0.3.17
- Purpose: Executes tools and integrations
- Key Features:
- Container-native tool execution
- SDK server integration
- Resource isolation
- Multi-language support
Kubiya Operator
- Image: ghcr.io/kubiyabot/kubiya-operator:runner_v2
- Purpose: Manages operational aspects
- Key Features:
- Configuration management
- Secret rotation
- Component orchestration
Workflow Engine
- Image: ghcr.io/kubiyabot/workflow-engine:main
- Purpose: Orchestrates workflow execution
- Key Features:
- DAG execution
- State management
- Event streaming
Monitoring Components
Grafana Alloy
- Image: grafana/alloy:v1.5.1
- Purpose: Metrics collection and forwarding
- Features:
- Prometheus-compatible scraping
- Azure Managed Prometheus integration
- OTEL support
- Data processing pipelines
Kube State Metrics
- Image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0
- Purpose: Kubernetes resource metrics
- Scope: Namespace-limited by default
Configuration
Minimum Required Values
Create a values-override.yaml file:
# Required: Organization configuration
organization: "my-company"
uuid: "679adc53-7068-4454-aa9f-16df30b14a50"

# Required: NATS configuration
nats:
  jwt: "eyJ0eXAiOiJKV1QiLCJhbGc..." # Primary JWT token
  secondJwt: "eyJ0eXAiOiJKV1QiLCJhbG..." # Secondary JWT token
  subject: "kubiya.agents.my-company"
  serverUrl: "nats://connect.ngs.global"

# Required for monitoring: Azure Prometheus
alloy:
  alloy:
    extraEnv:
      - name: AZURE_REMOTE_WRITE_URL
        value: "https://your-prometheus.azureprometheus.io/api/v1/write"
      - name: AZURE_CLIENT_ID
        value: "your-client-id"
      - name: AZURE_CLIENT_SECRET
        value: "your-client-secret"
      - name: AZURE_TOKEN_URL
        value: "https://login.microsoftonline.com/your-tenant/oauth2/v2.0/token"

# Optional: Resource configuration
resources:
  agentManager:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
  toolManager:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
Advanced Configuration
Resource Management
# Configure resources for each component
resources:
  agentManager:
    requests: {cpu: "500m", memory: "1Gi"}
    limits: {cpu: "2", memory: "4Gi"}
  toolManager:
    requests: {cpu: "1", memory: "2Gi"}
    limits: {cpu: "4", memory: "8Gi"}
  kubiyaOperator:
    requests: {cpu: "250m", memory: "512Mi"}
    limits: {cpu: "1", memory: "2Gi"}
  workflowEngine:
    requests: {cpu: "1", memory: "2Gi"}
    limits: {cpu: "4", memory: "8Gi"}
  alloy:
    requests: {cpu: "100m", memory: "128Mi"}
    limits: {cpu: "1", memory: "1Gi"}
Monitoring Configuration
# Alloy scrape intervals
alloy:
  scrapeIntervals:
    default: "60s"
    runnerExporters: "60s"
    alloyExporter: "60s"
    blackboxExporter: "60s"
    kubeStateMetrics: "60s"
    cadvisor: "60s" # Disabled by default

# Enable additional metrics
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
  dashboards:
    enabled: true
Security Configuration
# RBAC configuration
rbac:
  create: true

# Optional: Grant cluster-wide permissions
adminClusterRole:
  create: false # Set to true for full cluster access

# Pod Security
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

# Network Policies
networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: kubiya
Storage Configuration
# Persistent storage for components
persistence:
  toolManager:
    enabled: true
    size: "50Gi"
    storageClass: "fast-ssd"
  workflowEngine:
    enabled: true
    size: "100Gi"
    storageClass: "standard"
Installation
Standard Installation
# Create namespace
kubectl create namespace kubiya

# Install with custom values
helm install my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-override.yaml \
  --wait
Production Installation
# Install with production settings
helm install prod-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-prod.yaml \
  --set image.pullPolicy=IfNotPresent \
  --set updateStrategy.type=RollingUpdate \
  --set nodeSelector."node\.kubernetes\.io/purpose"=kubiya \
  --timeout 10m \
  --wait
Upgrade
# Upgrade existing installation
helm upgrade my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-override.yaml \
  --wait
Monitoring
Health Monitoring
The chart includes comprehensive health monitoring:
- Cumulative Health Score: Composite score from 5 signals
  - Tool Manager HTTP success rate
  - Agent Manager HTTP success rate
  - Pod ready status
  - Pod restart stability
  - Probe success rate
- Health Thresholds:
  - Healthy (Green): 100%
  - Warning (Yellow): 99-99.9%
  - Degraded (Orange): 90-98.9%
  - Down (Red): 0-89.9%
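The threshold bands above can be expressed as a small classification function. This is an illustrative sketch only; the `health_status` helper is hypothetical and simply mirrors the band boundaries listed above, taking a `runner_health_score` value in the 0-100 range.

```shell
#!/bin/sh
# Maps a runner_health_score value (0-100) to the status bands above.
health_status() {
  score="$1"
  # awk handles the floating-point comparison POSIX sh lacks
  awk -v s="$score" 'BEGIN {
    if (s >= 100)     print "Healthy (Green)"
    else if (s >= 99) print "Warning (Yellow)"
    else if (s >= 90) print "Degraded (Orange)"
    else              print "Down (Red)"
  }'
}

health_status 100   # Healthy (Green)
health_status 99.5  # Warning (Yellow)
health_status 95    # Degraded (Orange)
health_status 42    # Down (Red)
```

Note the asymmetry this encodes: anything short of a perfect composite score is already at least a Warning, so alert routing should treat Yellow as informational and reserve paging for the Degraded/Down bands.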
Grafana Dashboards
Pre-built dashboards included:
- All Runners Health State: Overview of all runners
- Runner Components Health: Component-level metrics
- Tool Manager Performance: Request latency and throughput
- Agent Manager Metrics: Agent lifecycle metrics
- Alloy Monitoring: Metrics pipeline health
Alerts
Default alerts configured:
alerts:
  - name: RunnerHealthDegraded
    expr: 'runner_health_score < 90'
    for: 5m
    severity: warning
  - name: ComponentDown
    expr: 'up{job="kubiya-runner"} == 0'
    for: 2m
    severity: critical
Security
RBAC Permissions
The chart creates minimal RBAC permissions:
# Default namespace-scoped permissions
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]

# Optional cluster-wide permissions (disabled by default)
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["*"]
Network Security
- All components run as non-root
- Network policies restrict inter-pod communication
- TLS enabled for registry communication
- Secrets mounted as volumes (not environment variables)
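The last point can be illustrated with a generic pod-spec fragment. This is a standard Kubernetes pattern, not an excerpt from the chart's templates; the names `nats-creds` and `nats-credentials` are placeholders.

```yaml
# Generic illustration: mounting a Secret as a read-only volume
# instead of exposing it through environment variables
spec:
  containers:
    - name: tool-manager
      volumeMounts:
        - name: nats-creds
          mountPath: /etc/kubiya/nats
          readOnly: true
  volumes:
    - name: nats-creds
      secret:
        secretName: nats-credentials
```

Volume-mounted secrets keep credentials out of `kubectl describe pod` output and crash dumps, and they are refreshed in place when the Secret object is updated.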
Troubleshooting
Common Issues
Debug Commands
# Get all runner resources
kubectl get all -n kubiya -l app.kubernetes.io/instance=my-runner

# View helm values
helm get values my-runner -n kubiya

# Test runner health
kubectl exec -n kubiya deploy/tool-manager -- \
  curl -s http://localhost:8080/health

# Check component versions
kubectl get pods -n kubiya -o json | \
  jq '.items[].spec.containers[].image' | sort | uniq
Maintenance
Backup
# Backup helm release
helm get values my-runner -n kubiya > runner-backup.yaml

# Backup persistent data (tar's z flag already gzips the stream)
kubectl exec -n kubiya -c tool-manager \
  deploy/tool-manager -- tar czf - /data \
  > tool-manager-data.tar.gz
Scaling
# Horizontal scaling configuration
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
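For intuition, the HorizontalPodAutoscaler this configuration enables sizes the deployment with the standard Kubernetes formula desired = ceil(currentReplicas x currentMetric / targetMetric), clamped to the min/max bounds. The sketch below just reproduces that arithmetic; the `desired_replicas` helper is illustrative and not part of the chart.

```shell
#!/bin/sh
# Reproduces the HPA scaling formula:
#   desired = ceil(current * currentUtil / targetUtil), clamped to [min, max]
desired_replicas() {
  current="$1"; current_util="$2"; target_util="$3"; min="$4"; max="$5"
  awk -v c="$current" -v u="$current_util" -v t="$target_util" \
      -v lo="$min" -v hi="$max" 'BEGIN {
    d = c * u / t
    d = (d == int(d)) ? d : int(d) + 1   # ceiling
    if (d < lo) d = lo
    if (d > hi) d = hi
    print d
  }'
}

# 2 replicas at 105% CPU against the 70% target -> ceil(3.0) = 3 replicas
desired_replicas 2 105 70 2 10
```

With the values above, a sustained spike to triple the CPU target would still be capped at 10 replicas, so size `maxReplicas` against your worst expected burst rather than typical load.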
Updates
The chart includes an automatic image updater (deprecated):
# Disable automatic updates in production
imageUpdater:
  enabled: false # Recommended for production
Migration Guide
From Previous Versions
# 1. Backup current installation
helm get values my-runner -n kubiya > backup.yaml

# 2. Review breaking changes
helm show readme kubiya/kubiya-runner --version 0.6.84

# 3. Update values file
#    Add any new required values

# 4. Upgrade
helm upgrade my-runner kubiya/kubiya-runner \
  --namespace kubiya \
  --values values-updated.yaml \
  --wait
Support
Next Steps
1. Create Runner in Platform: Go to the Kubiya platform and create a new runner configuration.
2. Download Values: Get the generated values.yaml with your credentials.
3. Deploy Chart: Install the Helm chart in your cluster.
4. Verify Health: Check that the runner appears as healthy in the platform.
5. Test Execution: Run a test workflow using your new runner.