PodWarden
MCP Integration

Examples

Real-world examples of managing PodWarden infrastructure through AI assistants

These examples show what you can accomplish by connecting an AI assistant to PodWarden via MCP. The AI calls the appropriate tools automatically based on your natural language requests.

Getting an overview

You: "Give me an overview of my infrastructure."

The AI calls get_infrastructure_overview and returns a summary like:

You have 3 clusters with 12 hosts total. 8 hosts have GPUs (6x NVIDIA A100, 2x NVIDIA L40S) with 52 GPUs total. There are 15 stacks and 9 deployments, 7 of which are currently deployed. The last deployment was 2 hours ago to the production cluster.
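The summary above is a roll-up of counts across clusters and hosts. A minimal sketch of that aggregation, with made-up data and assumed record shapes (only the idea of clusters, hosts, and GPU counts comes from this page):

```python
# Hypothetical data shapes; real records come back from the
# get_infrastructure_overview tool, not from local dicts.
clusters = [
    {"name": "production", "hosts": [{"gpus": 8}, {"gpus": 8}, {"gpus": 0}]},
    {"name": "staging", "hosts": [{"gpus": 4}, {"gpus": 0}]},
]

def overview(clusters):
    """Roll up cluster/host records into top-level counts."""
    hosts = [h for c in clusters for h in c["hosts"]]
    return {
        "clusters": len(clusters),
        "hosts": len(hosts),
        "gpu_hosts": sum(1 for h in hosts if h["gpus"] > 0),
        "gpus": sum(h["gpus"] for h in hosts),
    }

print(overview(clusters))
# → {'clusters': 2, 'hosts': 5, 'gpu_hosts': 3, 'gpus': 20}
```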

Finding GPU capacity

You: "I need to deploy a model that requires 4x A100 GPUs. Where can I put it?"

The AI calls find_gpu_capacity and analyzes the results:

Your prod-gpu cluster has 2 hosts with A100s:

  • gpu-node-01: 8x A100 80GB, 3 currently allocated → 5 available
  • gpu-node-02: 4x A100 80GB, 1 currently allocated → 3 available

gpu-node-01 can fit your 4-GPU workload. The staging-gpu cluster also has 4x A100 but they're all allocated.
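The availability math behind this answer is simple: available = total − allocated, then filter hosts that can fit the request. A sketch using the numbers from the example (the host record structure is an assumption):

```python
# Host data mirrors the example above; field names are illustrative.
hosts = [
    {"name": "gpu-node-01", "gpu_model": "A100 80GB", "total": 8, "allocated": 3},
    {"name": "gpu-node-02", "gpu_model": "A100 80GB", "total": 4, "allocated": 1},
]

def hosts_with_capacity(hosts, needed):
    """Return names of hosts with at least `needed` unallocated GPUs."""
    return [h["name"] for h in hosts if h["total"] - h["allocated"] >= needed]

print(hosts_with_capacity(hosts, 4))  # → ['gpu-node-01']
```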

Creating and deploying a workload

You: "Create a workload for vLLM serving Llama 3.1 70B. It needs 2 A100 GPUs, 64GB RAM, and 8 CPU cores. Then deploy it to the prod-inference cluster."

The AI calls multiple tools in sequence:

  1. create_stack — Creates the template with the specified resources
  2. list_clusters — Finds the prod-inference cluster ID
  3. create_deployment — Binds the stack to the cluster
  4. check_network_compatibility — Verifies network requirements match
  5. deploy_workload — Triggers the deployment

Done. Created stack "vllm-llama-70b" and deployed to prod-inference.

  • Image: vllm/vllm-openai:latest
  • Resources: 8 CPU, 64GB RAM, 2x GPU
  • Status: deploying (check back in a minute)

Troubleshooting a failed deployment

You: "The inference-api workload is failing. What's wrong?"

The AI calls troubleshoot_workload which aggregates data from multiple sources:

The inference-api workload on prod-gpu is in CrashLoopBackOff. Here's what I found:

Pod status: 4 restarts in the last 10 minutes

Last log output:

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
(GPU 0; 39.59 GiB total capacity; 38.12 GiB already allocated)

Root cause: The stack requests 1 GPU but the model needs more VRAM than a single GPU provides.

Suggested fix: Update the stack to request 2 GPUs, then redeploy.

You: "Do it."

The AI calls update_stack to set gpu_count: 2, then deploy_workload to redeploy.
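The root-cause step above amounts to matching known failure signatures in the logs. A sketch of that classification, assuming made-up patterns and suggestions (these are not troubleshoot_workload's actual rules):

```python
import re

# Illustrative signature → fix table; real tooling would have many more rules.
RULES = [
    (r"CUDA out of memory", "Increase gpu_count on the stack and redeploy."),
    (r"ImagePullBackOff|pull access denied",
     "Check the image name and registry credentials."),
]

def suggest_fix(log_text):
    """Return the first suggested fix whose pattern matches the log."""
    for pattern, fix in RULES:
        if re.search(pattern, log_text):
            return fix
    return "No known pattern matched; inspect logs manually."

log = "RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB"
print(suggest_fix(log))  # → Increase gpu_count on the stack and redeploy.
```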

Rolling back a deployment

You: "The latest deployment broke the API. Roll it back."

The AI calls list_deployments to find the deployment history, identifies the previous working version, and calls rollback_deployment:

Rolled back api-server from v2.4.1 (deployed 30 min ago) to v2.4.0. The Kubernetes deployment is updating now. The v2.4.1 deployment had failed; the rollback target v2.4.0 had deployed successfully.
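Target selection here is: walk the deployment history (newest first) and take the most recent entry that succeeded before the current one. A sketch with assumed field names:

```python
# History mirrors the example above, newest first; shape is illustrative.
history = [
    {"version": "v2.4.1", "status": "failed"},     # current, broken
    {"version": "v2.4.0", "status": "succeeded"},
    {"version": "v2.3.9", "status": "succeeded"},
]

def rollback_target(history):
    """Return the newest succeeded version behind the current (first) entry."""
    for d in history[1:]:
        if d["status"] == "succeeded":
            return d["version"]
    return None

print(rollback_target(history))  # → v2.4.0
```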

Managing secrets

You: "Generate an SSH key pair called 'staging-deploy' and show me the public key."

The AI calls generate_ssh_key_pair then get_secret:

Generated SSH key pair:

  • Private key stored as staging-deploy_ssh_private
  • Public key stored as staging-deploy_ssh_public

Public key:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... podwarden-staging-deploy

Add this to your server's ~/.ssh/authorized_keys to allow PodWarden to connect.
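The two secret names follow a convention shown in this example: the pair name plus _ssh_private and _ssh_public suffixes. A small sketch of that mapping (the suffixes come from this page; the helper itself is illustrative):

```python
def key_pair_secret_names(name):
    """Derive the two secret names stored for a generated SSH key pair."""
    return (f"{name}_ssh_private", f"{name}_ssh_public")

print(key_pair_secret_names("staging-deploy"))
# → ('staging-deploy_ssh_private', 'staging-deploy_ssh_public')
```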

Monitoring container logs

You: "Show me the last 50 lines of logs from the worker-pool workload."

The AI calls get_workload_logs:

Pod: worker-pool-7b9f4d6c8-xk2p9 (Running, 1/1 ready)

2026-02-28 03:15:22 INFO  Starting worker pool with 4 workers
2026-02-28 03:15:23 INFO  Connected to Redis at redis.mesh:6379
2026-02-28 03:15:23 INFO  Worker 0 ready, polling queue 'jobs'
2026-02-28 03:15:23 INFO  Worker 1 ready, polling queue 'jobs'
...
2026-02-28 03:44:01 INFO  Processed 1,247 jobs in last hour
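Under the hood, a request like this is an MCP tools/call. A hypothetical payload, assuming the standard JSON-RPC 2.0 envelope MCP uses; the tool name comes from this page, but the argument names (workload, tail) are guesses:

```python
# Illustrative request body; a real MCP client library builds this for you.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "get_workload_logs",
        "arguments": {"workload": "worker-pool", "tail": 50},
    },
}
```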

Testing infrastructure connectivity

You: "Test the NFS storage connection and SMTP settings."

The AI calls test_storage_connection and test_smtp in parallel:

NFS (data-nfs): All checks passed

  • TCP port 2049: reachable (2ms)
  • RPC exports: /data/shared available
  • Mount + read/write: 450 MB/s write, 620 MB/s read

SMTP: Test email sent successfully to [email protected]

Checking network compatibility before deploy

You: "Can I deploy the public-api workload to the internal cluster?"

The AI calls check_network_compatibility:

Warning: Network mismatch detected.

  • The public-api workload requires public network access
  • The internal cluster only has mesh and lan network types
  • The workload won't be reachable from the internet on this cluster

Consider deploying to edge-cluster instead, which has public + mesh connectivity.
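The check itself reduces to set logic: the workload's required network types must all be offered by the cluster. A minimal sketch using the network type names from this example (public, mesh, lan); the function is illustrative, not check_network_compatibility's implementation:

```python
def compatible(required, cluster_networks):
    """Return (ok, missing): ok is True iff the cluster covers every
    network type the workload requires."""
    missing = set(required) - set(cluster_networks)
    return (not missing, sorted(missing))

print(compatible({"public"}, {"mesh", "lan"}))     # → (False, ['public'])
print(compatible({"public"}, {"public", "mesh"}))  # → (True, [])
```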
