Agent Server Guide
The Kubiya Agent Server provides an OpenAI-compatible API that enables any AI system to create and execute workflows. It acts as a bridge between AI models and the Kubiya workflow engine.
Overview
The Agent Server:
- Provides an OpenAI-compatible /v1/chat/completions endpoint
- Supports multiple LLM providers (OpenAI, Anthropic, Together, Groq, Ollama)
- Streams workflow creation and execution in real-time
- Works with any OpenAI SDK or compatible client
Architecture
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   AI Clients    │────▶│   Agent Server   │────▶│   MCP Server    │
│ (Any OpenAI SDK)│     │   (HTTP/8000)    │     │     (stdio)     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │                        │
                                 ▼                        ▼
                        ┌──────────────────┐     ┌─────────────────┐
                        │   LLM Provider   │     │   Kubiya API    │
                        │ (GPT-4, Claude)  │     │   (Execution)   │
                        └──────────────────┘     └─────────────────┘
Starting the Server
Basic Usage
# Start with Together AI (default)
kubiya mcp agent --provider together --port 8000
# Start with OpenAI
kubiya mcp agent --provider openai --model gpt-4 --port 8000
# Start with Anthropic
kubiya mcp agent --provider anthropic --model claude-3-5-sonnet-20241022 --port 8000
Configuration Options
Option | Description | Default |
---|---|---|
--provider | LLM provider (openai, anthropic, together, groq, ollama) | Required |
--model | Specific model to use | Provider default |
--port | HTTP port to listen on | 8000 |
--host | Host to bind to | 0.0.0.0 |
--api-key | Provider API key (or use env var) | From environment |
Environment Variables
# Kubiya API (required for workflow execution)
export KUBIYA_API_KEY="your-kubiya-api-key"
# LLM Provider API Keys (based on provider)
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export TOGETHER_API_KEY="your-together-key"
export GROQ_API_KEY="your-groq-key"
# Optional configuration
export MCP_USE_ANONYMIZED_TELEMETRY=false # Disable telemetry
API Endpoints
Chat Completions
POST /v1/chat/completions
OpenAI-compatible chat endpoint for workflow generation.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="kubiya-workflow-agent",
    messages=[
        {"role": "user", "content": "Create a workflow to check system health"}
    ],
    stream=True  # Enable streaming
)
Discovery Endpoint
GET /discover
Returns server capabilities and configuration.
curl http://localhost:8000/discover
Response:
{
  "name": "Kubiya MCP Agent Server",
  "version": "1.0.0",
  "capabilities": {
    "streaming": true,
    "models": ["kubiya-workflow-agent"],
    "formats": ["sse", "vercel"]
  }
}
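For example, a client can call /discover before choosing a streaming format. A minimal sketch using the requests library (the field access mirrors the response shown above):
import requests
# Query the discovery endpoint and pick a supported streaming format
info = requests.get("http://localhost:8000/discover").json()
formats = info["capabilities"]["formats"]
streaming_format = "vercel" if "vercel" in formats else "sse"
print(f"{info['name']} v{info['version']} supports: {formats}")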
Health Check
GET /health
Simple health check endpoint.
curl http://localhost:8000/health
# Returns: {"status": "healthy"}
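In scripts or CI it can help to wait until the server reports healthy before sending requests. A minimal sketch (the polling interval and timeout are arbitrary choices):
import time
import requests
def wait_for_server(url="http://localhost:8000/health", timeout=60):
    # Poll /health until it returns {"status": "healthy"} or the timeout expires
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=2).json().get("status") == "healthy":
                return True
        except requests.RequestException:
            pass
        time.sleep(1)
    return False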
Streaming Formats
The agent server supports two streaming formats:
1. Standard SSE (Server-Sent Events)
Default format compatible with OpenAI SDK:
data: {"choices":[{"delta":{"content":"Creating workflow..."}}]}
data: {"choices":[{"delta":{"content":"Step 1: Check CPU usage"}}]}
data: [DONE]
2. Vercel AI SDK Format
Special format for Vercel AI SDK compatibility:
0:"Creating workflow..."
0:" Step 1: Check CPU"
2:{"type":"step_running","step":"check-cpu","message":"Checking CPU usage"}
2:{"type":"step_complete","step":"check-cpu","output":"CPU: 45%"}
d:{"finishReason":"stop"}
To use Vercel format, add a header:
X-Streaming-Format: vercel
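With the OpenAI Python SDK, the header can be supplied per request via extra_headers. A sketch reusing the client from the chat completions example above:
# Request the Vercel streaming format for this call only
stream = client.chat.completions.create(
    model="kubiya-workflow-agent",
    messages=[{"role": "user", "content": "Create a health check workflow"}],
    stream=True,
    extra_headers={"X-Streaming-Format": "vercel"},
)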
Provider Configuration
OpenAI
kubiya mcp agent --provider openai --model gpt-4 --port 8000
Models:
- gpt-4 (default)
- gpt-4-turbo
- gpt-3.5-turbo
Anthropic
kubiya mcp agent --provider anthropic --model claude-3-5-sonnet-20241022 --port 8000
Models:
- claude-3-5-sonnet-20241022 (default)
- claude-3-opus-20240229
- claude-3-haiku-20240307
Together AI
kubiya mcp agent --provider together --model "meta-llama/Llama-3.3-70B-Instruct-Turbo" --port 8000
Popular models:
- meta-llama/Llama-3.3-70B-Instruct-Turbo (default)
- deepseek-ai/DeepSeek-V3
- mistralai/Mixtral-8x7B-Instruct-v0.1
Groq
kubiya mcp agent --provider groq --model llama-3.3-70b-versatile --port 8000
Models:
- llama-3.3-70b-versatile (default)
- mixtral-8x7b-32768
- gemma-7b-it
Ollama (Local Models)
# First, ensure Ollama is running locally
ollama serve
# Then start the agent server
kubiya mcp agent --provider ollama --model llama3.2 --port 8000
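If the model is not available locally yet, pull it first:
# Download the model so the agent server can use it
ollama pull llama3.2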
Client Examples
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)
# Simple request
response = client.chat.completions.create(
    model="kubiya-workflow-agent",
    messages=[
        {"role": "user", "content": "Create a backup workflow"}
    ]
)
print(response.choices[0].message.content)
# Streaming request
stream = client.chat.completions.create(
    model="kubiya-workflow-agent",
    messages=[
        {"role": "user", "content": "Create and execute a monitoring workflow"}
    ],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
JavaScript/TypeScript
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'http://localhost:8000/v1',
  apiKey: 'not-needed',
});
// Create workflow
const response = await client.chat.completions.create({
  model: 'kubiya-workflow-agent',
  messages: [
    { role: 'user', content: 'Create a CI/CD pipeline' }
  ],
  stream: true,
});
// Handle streaming
for await (const chunk of response) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
cURL
# Simple request
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kubiya-workflow-agent",
    "messages": [
      {"role": "user", "content": "Create a database backup workflow"}
    ]
  }'
# Streaming request
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "kubiya-workflow-agent",
    "messages": [
      {"role": "user", "content": "Create and run a health check"}
    ],
    "stream": true
  }'
Vercel AI SDK
import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';
// Point an OpenAI-compatible provider at the agent server
const kubiya = createOpenAI({
  baseURL: 'http://localhost:8000/v1',
  apiKey: 'not-needed',
});
const result = await streamText({
  model: kubiya('kubiya-workflow-agent'),
  messages: [
    {
      role: 'user',
      content: 'Create a workflow to deploy my app',
    },
  ],
  // Automatically uses Vercel format
});
// Handle the stream
for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}
// Get the final result
const finalText = await result.text;
Workflow Execution Events
When workflows are executed, the agent server streams real-time events:
Event Types
- step_running - Step has started execution
  {
    "type": "step_running",
    "step": "check-disk",
    "message": "Checking disk usage..."
  }
- step_complete - Step finished successfully
  {
    "type": "step_complete",
    "step": "check-disk",
    "output": "Disk usage: 65%",
    "duration": "1.2s"
  }
- step_failed - Step encountered an error
  {
    "type": "step_failed",
    "step": "deploy",
    "error": "Connection timeout",
    "will_retry": true
  }
- workflow_complete - Entire workflow finished
  {
    "type": "workflow_complete",
    "workflow": "system-check",
    "status": "success",
    "duration": "45s"
  }
Parsing Events
import json
# Assumes the X-Streaming-Format: vercel header was set, so event lines
# arrive with the "2:" prefix described in the streaming formats section.
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content and content.startswith("2:"):  # Vercel format event
        event = json.loads(content[2:])
        if event["type"] == "step_complete":
            print(f"✓ {event['step']}: {event['output']}")
Production Deployment
Docker
FROM python:3.11-slim
RUN pip install kubiya-workflow-sdk[all]
ENV KUBIYA_API_KEY=""
ENV TOGETHER_API_KEY=""
EXPOSE 8000
CMD ["kubiya", "mcp", "agent", "--provider", "together", "--port", "8000"]
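A typical build-and-run sequence, with API keys supplied at runtime (the image tag is just an example):
docker build -t kubiya-agent-server .
docker run -p 8000:8000 \
  -e KUBIYA_API_KEY="your-kubiya-api-key" \
  -e TOGETHER_API_KEY="your-together-key" \
  kubiya-agent-server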
Docker Compose
version: '3.8'
services:
  agent-server:
    image: kubiya/workflow-sdk:latest
    command: kubiya mcp agent --provider anthropic --port 8000
    ports:
      - "8000:8000"
    environment:
      - KUBIYA_API_KEY=${KUBIYA_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubiya-agent-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubiya-agent
  template:
    metadata:
      labels:
        app: kubiya-agent
    spec:
      containers:
        - name: agent-server
          image: kubiya/workflow-sdk:latest
          command: ["kubiya", "mcp", "agent"]
          args: ["--provider", "together", "--port", "8000"]
          ports:
            - containerPort: 8000
          env:
            - name: KUBIYA_API_KEY
              valueFrom:
                secretKeyRef:
                  name: kubiya-secrets
                  key: api-key
            - name: TOGETHER_API_KEY
              valueFrom:
                secretKeyRef:
                  name: together-secrets
                  key: api-key
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: kubiya-agent-server
spec:
  selector:
    app: kubiya-agent
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
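The Deployment references two secrets; one way to create them (the key name must match the secretKeyRef entries above):
kubectl create secret generic kubiya-secrets --from-literal=api-key="your-kubiya-api-key"
kubectl create secret generic together-secrets --from-literal=api-key="your-together-key"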
Security Considerations
API Key Management
- Never expose API keys in client code
- Set them as environment variables on the server
- Use secrets management in production
Network Security
- Use HTTPS in production (reverse proxy)
- Implement rate limiting
- Use firewall rules to restrict access
Authentication
- The agent server itself doesn't require authentication
- API keys are used for the underlying services
- Consider adding an auth proxy for production
Example Nginx Configuration
server {
    listen 443 ssl;
    server_name agent.yourdomain.com;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    location / {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        # SSE support: disable response buffering so events stream immediately
        proxy_buffering off;
        proxy_read_timeout 86400;
    }
}
Monitoring and Logging
Metrics
The agent server logs key metrics:
- Request count and latency
- LLM API calls and tokens used
- Workflow creation/execution success rates
- Error rates by type
Structured Logging
# Enable debug logging
export LOG_LEVEL=DEBUG
kubiya mcp agent --provider together --port 8000
Log format:
2024-01-15 10:30:45,123 - INFO - Processing request with sse format: Create backup workflow
2024-01-15 10:30:46,456 - INFO - Workflow created: backup-databases
2024-01-15 10:30:47,789 - INFO - Execution started: exec-123456
Troubleshooting
Common Issues
- "Model not specified" error
  - Solution: Always provide --model or use provider defaults
- "Failed to initialize MCP agent"
  - Check API keys are set correctly
  - Verify network connectivity to LLM provider
- Streaming not working
  - Ensure client supports SSE
  - Check for proxy buffering issues
  - Verify the Accept: text/event-stream header
- Workflow execution fails
  - Verify KUBIYA_API_KEY is valid
  - Check runner availability
  - Review workflow syntax
Debug Mode
# Enable verbose logging
export LOG_LEVEL=DEBUG
export MCP_DEBUG=true
kubiya mcp agent --provider together --port 8000
Advanced Usage
Custom System Prompts
You can customize the agent’s behavior by modifying the system prompt:
# In your client code
messages = [
    {
        "role": "system",
        "content": "You are an expert in creating monitoring workflows. Always include error handling and notifications."
    },
    {
        "role": "user",
        "content": "Create a comprehensive monitoring solution"
    }
]
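The messages list is then passed to the same chat completions call shown in the client examples:
response = client.chat.completions.create(
    model="kubiya-workflow-agent",
    messages=messages,
)
print(response.choices[0].message.content)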
Workflow Templates
Pre-define common patterns:
# Request with context
messages = [
    {
        "role": "user",
        "content": """
        Create a workflow based on this template:
        - Name: daily-backup-{timestamp}
        - Runner: production-runner
        - Steps:
          1. Backup all PostgreSQL databases
          2. Compress with timestamp
          3. Upload to S3 with encryption
          4. Verify upload integrity
          5. Clean up old backups (>30 days)
          6. Send Slack notification
        """
    }
]
Batch Processing
Process multiple workflow requests:
import asyncio
from openai import AsyncOpenAI
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
async def create_workflows(descriptions):
    # Issue all requests concurrently against the agent server
    tasks = []
    for desc in descriptions:
        task = client.chat.completions.create(
            model="kubiya-workflow-agent",
            messages=[{"role": "user", "content": desc}],
            stream=False
        )
        tasks.append(task)
    results = await asyncio.gather(*tasks)
    return results
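Usage, with example descriptions:
descriptions = [
    "Create a database backup workflow",
    "Create a log rotation workflow",
]
results = asyncio.run(create_workflows(descriptions))
for result in results:
    print(result.choices[0].message.content)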
Next Steps