Terraform Modules

The Kubiya Control Plane Provider includes pre-built Terraform modules to help you quickly deploy common infrastructure patterns. These modules encapsulate best practices and reduce boilerplate code.

Available Modules

engineering-org Module

A comprehensive module for creating a complete engineering organization setup. This module creates and manages all necessary resources for running AI agents in your organization.

Module Source: kubiya/control-plane//modules/engineering-org

engineering-org Module

Overview

The engineering-org module provides a complete infrastructure-as-code solution for setting up your AI agent organization. It creates all necessary resources including environments, projects, teams, agents, skills, policies, worker queues, and jobs.

Features

  • Flexible Configuration: Use maps to create multiple instances of each resource type
  • Sensible Defaults: Ready-to-use configuration for quick setup
  • Easy Extension: Add or modify resources by updating variable maps
  • Complete Setup: Creates all resources for a production-ready organization
  • Automatic Dependencies: Handles relationships between resources automatically
  • Resource References: Reference resources by name for easy relationship management

Resources Created

This module can create:
Resource Type | Description
Environments | Isolated execution environments for agents and workers
Projects | Organizational units for grouping related work
Teams | Groups of agents working together with shared configuration
Agents | AI-powered automation agents with custom LLM configurations
Skills | Reusable capabilities (shell, filesystem, docker) for agents
Policies | OPA Rego policies for governance and security
Worker Queues | Task queues for managing worker distribution
Jobs | Scheduled, webhook-triggered, or manual tasks

Usage

Minimal Example (Using Defaults)

The simplest way to use the module is with all defaults:
terraform {
  required_providers {
    controlplane = {
      source  = "kubiya/control-plane"
      version = "~> 1.0"
    }
  }
}

provider "controlplane" {
  # Authenticates with the KUBIYA_CONTROL_PLANE_API_KEY environment variable
}

module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"
}

output "summary" {
  value = module.engineering_org.summary
}
This creates:
  • 1 production environment
  • 1 platform project
  • 1 devops team
  • 2 agents (deployer, monitor)
  • 2 skills (shell, filesystem)
  • 1 security policy
  • 1 default worker queue
  • 1 daily health check job

Custom Configuration Example

Customize the module by providing your own variable maps:
module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  # Multiple environments
  environments = {
    production = {
      description = "Production environment"
      settings = jsonencode({
        region         = "us-east-1"
        max_workers    = 20
        auto_scaling   = true
        retention_days = 90
      })
      execution_environment = jsonencode({
        env_vars = {
          LOG_LEVEL = "info"
          APP_ENV   = "production"
        }
      })
    }
    staging = {
      description = "Staging environment"
      settings = jsonencode({
        region         = "us-west-2"
        max_workers    = 10
        auto_scaling   = true
        retention_days = 30
      })
      execution_environment = jsonencode({
        env_vars = {
          LOG_LEVEL = "debug"
          APP_ENV   = "staging"
        }
      })
    }
  }

  # Multiple teams
  teams = {
    devops = {
      description   = "DevOps and platform engineering team"
      runtime       = "claude_code"
      configuration = jsonencode({
        max_agents        = 15
        enable_monitoring = true
      })
    }
    sre = {
      description   = "Site reliability engineering team"
      runtime       = "claude_code"
      configuration = jsonencode({
        max_agents = 10
      })
    }
  }

  # Multiple agents with team assignments
  agents = {
    deployer = {
      description   = "Production deployment agent"
      model_id      = "kubiya/claude-sonnet-4"
      runtime       = "claude_code"
      llm_config    = jsonencode({
        temperature = 0.3
        max_tokens  = 4000
      })
      capabilities  = ["kubernetes_deploy", "helm_deploy", "rollback"]
      configuration = jsonencode({
        max_retries     = 3
        timeout         = 900
        approval_needed = true
      })
      team_name     = "devops"  # References the devops team
    }
    monitor = {
      description   = "Monitoring and alerting agent"
      model_id      = "kubiya/claude-sonnet-4"
      runtime       = "claude_code"
      llm_config    = jsonencode({
        temperature = 0.5
        max_tokens  = 2000
      })
      capabilities  = ["metrics_collection", "alerting", "log_analysis"]
      configuration = jsonencode({
        check_interval = 60
        alert_channels = ["slack", "pagerduty"]
      })
      team_name     = "sre"  # References the sre team
    }
    incident_responder = {
      description   = "Incident response agent"
      model_id      = "kubiya/claude-sonnet-4"
      runtime       = "claude_code"
      llm_config    = jsonencode({
        temperature = 0.4
        max_tokens  = 3000
      })
      capabilities  = ["incident_management", "root_cause_analysis"]
      configuration = jsonencode({
        escalation_timeout = 600
      })
      team_name     = "sre"
    }
  }

  # Multiple skills
  skills = {
    shell = {
      description   = "Shell command execution"
      type          = "shell"
      enabled       = true
      configuration = jsonencode({
        allowed_commands = ["kubectl", "helm", "aws", "terraform"]
        timeout          = 600
        working_dir      = "/app"
      })
    }
    filesystem = {
      description   = "File system operations"
      type          = "file_system"
      enabled       = true
      configuration = jsonencode({
        allowed_paths = ["/app/configs", "/app/data"]
        max_file_size = 52428800  # 50MB
        operations    = ["read", "write", "list", "delete"]
      })
    }
    docker = {
      description   = "Docker container management"
      type          = "docker"
      enabled       = true
      configuration = jsonencode({
        allowed_registries = ["docker.io", "gcr.io", "ghcr.io"]
        max_containers     = 20
        network_mode       = "bridge"
      })
    }
  }

  # Multiple policies
  policies = {
    security = {
      description    = "Security policy for production"
      enabled        = true
      policy_content = <<-EOT
        package kubiya.security

        # Deny destructive operations without approval
        deny[msg] {
          input.operation == "delete"
          input.environment == "production"
          count(input.approvals) < 2
          msg := "Delete operations in production require at least 2 approvals"
        }

        # Require MFA for sensitive operations
        deny[msg] {
          input.operation == "deploy"
          input.environment == "production"
          not input.mfa_verified
          msg := "Production deployments require MFA verification"
        }
      EOT
      tags           = ["security", "production", "compliance"]
    }
    cost_control = {
      description    = "Cost control and resource limits"
      enabled        = true
      policy_content = <<-EOT
        package kubiya.cost

        # Limit instance sizes
        deny[msg] {
          input.action == "create_instance"
          input.instance_type == "x2.32xlarge"
          msg := "Instance type too large, maximum allowed is m5.2xlarge"
        }

        # Require cost tags
        deny[msg] {
          input.action == "create_resource"
          not input.tags.cost_center
          msg := "All resources must have a cost_center tag"
        }
      EOT
      tags           = ["cost", "governance", "finops"]
    }
  }

  # Multiple worker queues
  worker_queues = {
    production-primary = {
      environment_name   = "production"
      display_name       = "Production Primary Queue"
      description        = "Primary worker queue for production workloads"
      heartbeat_interval = 60
      max_workers        = 20
      tags               = ["production", "primary", "high-priority"]
      settings = {
        region   = "us-east-1"
        tier     = "production"
        priority = "high"
      }
    }
    production-batch = {
      environment_name   = "production"
      display_name       = "Production Batch Queue"
      description        = "Batch processing queue"
      heartbeat_interval = 120
      max_workers        = 10
      tags               = ["production", "batch"]
      settings = {
        region   = "us-east-1"
        tier     = "production"
        priority = "normal"
      }
    }
  }

  # Multiple jobs
  jobs = {
    health_check = {
      description     = "Daily health check"
      enabled         = true
      trigger_type    = "cron"
      cron_schedule   = "0 9 * * *"  # 9 AM UTC daily
      cron_timezone   = "UTC"
      planning_mode   = "predefined_agent"
      entity_type     = "agent"
      entity_name     = "monitor"  # References the monitor agent
      prompt_template = "Run daily health check for all production services"
      system_prompt   = "Check the health of all production services and report any issues"
      executor_type   = "auto"
      execution_env_vars = {
        CHECK_TYPE       = "comprehensive"
        ALERT_ON_FAILURE = "true"
      }
    }
    deployment_webhook = {
      description     = "Handle deployment webhook events"
      enabled         = true
      trigger_type    = "webhook"
      planning_mode   = "predefined_agent"
      entity_type     = "agent"
      entity_name     = "deployer"  # References the deployer agent
      prompt_template = "Deploy {{service}} version {{version}} to {{environment}}"
      system_prompt   = "Process deployment request and verify prerequisites"
      executor_type   = "environment"
      environment_name = "production"
      config = jsonencode({
        timeout = 1800  # 30 minutes
        retry_policy = {
          max_attempts = 3
          backoff      = "exponential"
        }
      })
    }
    incident_response = {
      description     = "Manual incident response"
      enabled         = true
      trigger_type    = "manual"
      planning_mode   = "predefined_agent"
      entity_type     = "agent"
      entity_name     = "incident_responder"
      prompt_template = "Handle incident: {{incident_id}} - {{description}}"
      system_prompt   = "Coordinate incident response and resolution"
      executor_type   = "auto"
      execution_secrets = ["pagerduty_token", "slack_webhook"]
    }
  }
}

# Outputs
output "summary" {
  value = module.engineering_org.summary
}

output "environment_ids" {
  value = module.engineering_org.environment_ids
}

output "agent_ids" {
  value = module.engineering_org.agent_ids
}

output "webhook_urls" {
  value     = module.engineering_org.job_webhook_urls
  sensitive = true
}

Module Inputs

environments

Map of environments to create. Type:
map(object({
  description           = string
  settings              = optional(string, null)  # JSON-encoded
  execution_environment = optional(string, null)  # JSON-encoded
}))
Default: Creates a production environment

projects

Map of projects to create. Type:
map(object({
  key         = string
  description = string
  settings    = optional(string, null)  # JSON-encoded
}))
Default: Creates a platform project

teams

Map of teams to create. Type:
map(object({
  description   = string
  runtime       = optional(string, "default")
  configuration = optional(string, null)  # JSON-encoded
}))
Default: Creates a devops team

agents

Map of agents to create. Type:
map(object({
  description   = string
  model_id      = optional(string, "gpt-4")
  runtime       = optional(string, "default")
  llm_config    = optional(string, null)        # JSON-encoded
  capabilities  = optional(list(string), [])
  configuration = optional(string, null)        # JSON-encoded
  team_name     = optional(string, null)
}))
Default: Creates deployer and monitor agents

skills

Map of skills to create. Type:
map(object({
  description   = string
  type          = string
  enabled       = optional(bool, true)
  configuration = optional(string, null)  # JSON-encoded
}))
Default: Creates shell and filesystem skills

policies

Map of policies to create. Type:
map(object({
  description    = string
  enabled        = optional(bool, true)
  policy_content = string
  tags           = optional(list(string), [])
}))
Default: Creates a security policy

worker_queues

Map of worker queues to create. Type:
map(object({
  environment_name   = string
  display_name       = string
  description        = string
  heartbeat_interval = optional(number, 60)
  max_workers        = optional(number, 10)
  tags               = optional(list(string), [])
  settings           = optional(map(string), {})
}))
Default: Creates a default production queue

jobs

Map of jobs to create. Type:
map(object({
  description        = string
  enabled            = optional(bool, true)
  trigger_type       = string  # "cron", "webhook", or "manual"
  cron_schedule      = optional(string, null)
  cron_timezone      = optional(string, "UTC")
  planning_mode      = string
  entity_type        = optional(string, null)
  entity_name        = optional(string, null)
  prompt_template    = string
  system_prompt      = optional(string, null)
  executor_type      = optional(string, "auto")
  environment_name   = optional(string, null)
  execution_env_vars = optional(map(string), {})
  execution_secrets  = optional(list(string), [])
  config             = optional(string, null)  # JSON-encoded
}))
Default: Creates a daily health check job

Module Outputs

Resource Collections

  • environments - Map of created environments with full details
  • projects - Map of created projects with full details
  • teams - Map of created teams with full details
  • agents - Map of created agents with full details
  • skills - Map of created skills with full details
  • policies - Map of created policies with full details
  • worker_queues - Map of created worker queues with full details
  • jobs - Map of created jobs with full details

ID Maps

  • environment_ids - Map of environment names to IDs
  • project_ids - Map of project names to IDs
  • team_ids - Map of team names to IDs
  • agent_ids - Map of agent names to IDs
  • skill_ids - Map of skill names to IDs
  • policy_ids - Map of policy names to IDs
  • worker_queue_ids - Map of worker queue names to IDs
  • job_ids - Map of job names to IDs

Special Outputs

  • job_webhook_urls - Map of webhook job names to webhook URLs (sensitive)
  • worker_queue_task_names - Map of queue names to task queue names for worker registration
  • summary - Count of all created resources
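
As a rough illustration of how these outputs might be consumed from the calling configuration, the sketch below reads single entries from the output maps. It assumes the module is instantiated as module.engineering_org and that the deployer agent, production-primary queue, and deployment_webhook job from the custom example exist. The output names deployer_agent_id, primary_task_queue, and deploy_webhook_url are hypothetical.

```hcl
# Look up one agent ID by its map key (assumes a "deployer" agent exists).
output "deployer_agent_id" {
  value = module.engineering_org.agent_ids["deployer"]
}

# Task queue name used for worker registration
# (assumes a "production-primary" worker queue exists).
output "primary_task_queue" {
  value = module.engineering_org.worker_queue_task_names["production-primary"]
}

# job_webhook_urls is documented as sensitive, so an output derived
# from it in the root module must be marked sensitive as well.
output "deploy_webhook_url" {
  value     = module.engineering_org.job_webhook_urls["deployment_webhook"]
  sensitive = true
}
```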

Resource Relationships

The module automatically handles dependencies between resources:

Team Assignment

Agents can reference teams using the team_name field:
teams = {
  devops = {
    description = "DevOps team"
    runtime     = "claude_code"
  }
}

agents = {
  deployer = {
    description = "Deployment agent"
    team_name   = "devops"  # References the devops team
    # ...
  }
}

Environment References

Worker queues must reference an environment through environment_name, and jobs can optionally do the same:
environments = {
  production = {
    description = "Production environment"
  }
}

worker_queues = {
  prod-queue = {
    environment_name = "production"  # References the production environment
    # ...
  }
}

jobs = {
  deploy = {
    environment_name = "production"  # References the production environment
    # ...
  }
}

Entity References

Jobs reference an agent or a team through entity_type and entity_name:
agents = {
  deployer = {
    description = "Deployment agent"
    # ...
  }
}

jobs = {
  deploy_job = {
    entity_type = "agent"
    entity_name = "deployer"  # References the deployer agent
    # ...
  }
}

Best Practices

1. Start with Defaults

Begin with the default configuration and customize incrementally:
module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  # Omitted variables keep their defaults. A variable you set replaces
  # its default map entirely, so list every agent you want to keep.
  agents = {
    # Redefine the default agents here and add new ones
  }
}

2. Use Meaningful Resource Names

Map keys become part of resource names, so use clear, descriptive names:
environments = {
  production = {  # Good: clear and descriptive
    # ...
  }
  prod = {  # Avoid: abbreviated and unclear
    # ...
  }
}

3. Organize by Environment

Create separate environments for different stages:
environments = {
  production = {
    description = "Production environment"
    settings = jsonencode({
      max_workers    = 20
      retention_days = 90
    })
  }
  staging = {
    description = "Staging environment"
    settings = jsonencode({
      max_workers    = 10
      retention_days = 30
    })
  }
  development = {
    description = "Development environment"
    settings = jsonencode({
      max_workers    = 5
      retention_days = 7
    })
  }
}

4. Group Agents by Team

Organize agents into teams based on their function:
teams = {
  devops = { description = "DevOps team" }
  sre    = { description = "SRE team" }
  data   = { description = "Data team" }
}

agents = {
  deployer = { team_name = "devops" /* ... */ }
  monitor  = { team_name = "sre" /* ... */ }
  pipeline = { team_name = "data" /* ... */ }
}

5. Implement Policies Early

Define governance policies from the start:
policies = {
  security = {
    description    = "Security policy"
    enabled        = true
    policy_content = "..."
    tags           = ["security", "required"]
  }
  cost_control = {
    description    = "Cost control policy"
    enabled        = true
    policy_content = "..."
    tags           = ["cost", "governance"]
  }
}

6. Create Dedicated Worker Queues

Use separate queues for different priorities:
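worker_queues = {
  high-priority = {
    environment_name = "production"
    display_name     = "High Priority Queue"
    description      = "Queue for latency-sensitive work"
    max_workers      = 20
    tags             = ["high-priority", "realtime"]
  }
  batch-processing = {
    environment_name = "production"
    display_name     = "Batch Processing Queue"
    description      = "Queue for background batch work"
    max_workers      = 10
    tags             = ["batch", "background"]
  }
}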

Advanced Patterns

Multi-Environment Setup

Create a complete multi-environment infrastructure:
module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  environments = {
    for env in ["production", "staging", "development"] :
    env => {
      description = "${title(env)} environment"
      settings = jsonencode({
        max_workers    = env == "production" ? 20 : (env == "staging" ? 10 : 5)
        retention_days = env == "production" ? 90 : (env == "staging" ? 30 : 7)
      })
    }
  }
}
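
The nested conditionals above become harder to read as environments are added. One possible alternative, sketched here under the assumption that the environments input accepts the same shape, keeps the per-environment values in a lookup map. The local name env_sizing is hypothetical.

```hcl
locals {
  # Hypothetical sizing table: one entry per environment.
  env_sizing = {
    production  = { max_workers = 20, retention_days = 90 }
    staging     = { max_workers = 10, retention_days = 30 }
    development = { max_workers = 5, retention_days = 7 }
  }
}

module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  # Build one environment per entry in the sizing table.
  environments = {
    for env, sizing in local.env_sizing :
    env => {
      description = "${title(env)} environment"
      settings    = jsonencode(sizing)
    }
  }
}
```

Adding a fourth environment then means adding one line to env_sizing rather than extending each conditional.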

Module Composition

Combine multiple module instances:
# Base infrastructure
module "base" {
  source = "kubiya/control-plane//modules/engineering-org"

  environments = { /* ... */ }
  teams        = { /* ... */ }
}

# Additional agents using base infrastructure
module "extended" {
  source = "kubiya/control-plane//modules/engineering-org"

  # Reference base infrastructure
  agents = {
    custom_agent = {
      description = "Custom agent"
      team_name   = "devops"  # Must exist in base module
      # ...
    }
  }

  # Don't recreate base resources or the default project and job
  environments  = {}
  projects      = {}
  teams         = {}
  skills        = {}
  policies      = {}
  worker_queues = {}
  jobs          = {}
}

Troubleshooting

Resource Already Exists

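If you see “resource already exists” errors, import the existing resource into Terraform state, replacing agent-xxxxx with the ID of the existing agent: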
# Import existing resources
terraform import 'module.engineering_org.controlplane_agent.this["deployer"]' agent-xxxxx

Dependency Errors

If you see dependency errors, verify resource names match:
agents = {
  deployer = {
    team_name = "devops"  # Must match a key in teams map
  }
}

teams = {
  devops = {  # Must match team_name in agents
    description = "DevOps team"
  }
}
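
A mismatch like this can be caught in the calling configuration before the module runs. The sketch below is one possible approach, not part of the module: it keeps the maps in locals and uses a check block (Terraform 1.5 or later) to report team names that have no matching key. The local names and the sample data are hypothetical.

```hcl
locals {
  # Hypothetical inputs, defined once so they can be validated and reused.
  teams = {
    devops = { description = "DevOps team" }
  }
  agents = {
    deployer = { description = "Deployment agent", team_name = "devops" }
    monitor  = { description = "Monitoring agent", team_name = "sre" } # no such team
  }

  # team_name values that have no matching key in local.teams.
  missing_teams = setsubtract(
    toset([
      for name, agent in local.agents :
      agent.team_name if try(agent.team_name, null) != null
    ]),
    toset(keys(local.teams))
  )
}

check "agent_team_references" {
  assert {
    condition     = length(local.missing_teams) == 0
    error_message = "Agents reference undefined teams: ${join(", ", tolist(local.missing_teams))}"
  }
}
```

A failed check is reported as a warning during plan and apply and does not block the run. The same locals can then be passed to the module as teams = local.teams and agents = local.agents.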

JSON Encoding

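Use jsonencode() for every field marked JSON-encoded in Module Inputs (settings, execution_environment, configuration, llm_config, config). Fields typed as native maps or lists, such as worker queue settings and execution_env_vars, take plain HCL values instead:
# Good
configuration = jsonencode({
  key = "value"
})

# Avoid - hand-escaped JSON is hard to read and easy to get wrong
configuration = "{\"key\":\"value\"}"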

Next Steps