Real-world examples of managing PodWarden infrastructure through AI assistants
These examples show what you can accomplish by connecting an AI assistant to PodWarden via MCP. The AI calls the appropriate tools automatically based on your natural language requests.
Getting an overview
You: "Give me an overview of my infrastructure."
The AI calls get_infrastructure_overview and returns a summary like:
You have 3 clusters with 12 hosts total. 8 hosts have GPUs (6 with NVIDIA A100s, 2 with NVIDIA L40S) for 52 GPUs total. There are 15 stacks and 9 deployments, 7 of which are currently deployed. The last deployment was 2 hours ago, to the production cluster.
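The structured result behind a summary like this might look as follows. This is a minimal sketch: the field names are assumptions for illustration, not PodWarden's actual schema.

```python
# Hypothetical shape of a get_infrastructure_overview result;
# field names are illustrative, not PodWarden's actual schema.
overview = {
    "clusters": 3,
    "hosts": 12,
    "gpu_hosts": 8,
    "gpus": {"NVIDIA A100": 48, "NVIDIA L40S": 4},
    "stacks": 15,
    "deployments": 9,
}

def summarize(o: dict) -> str:
    """Render the structured result into the kind of summary the AI produces."""
    total_gpus = sum(o["gpus"].values())
    return (f"{o['clusters']} clusters, {o['hosts']} hosts "
            f"({o['gpu_hosts']} with GPUs, {total_gpus} GPUs total), "
            f"{o['stacks']} stacks, {o['deployments']} deployments")

print(summarize(overview))
```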
Finding GPU capacity
You: "I need to deploy a model that requires 4x A100 GPUs. Where can I put it?"
The AI calls find_gpu_capacity and analyzes the results:
Your prod-gpu cluster has 2 hosts with A100s:
gpu-node-01: 8x A100 80GB, 3 currently allocated → 5 available
gpu-node-02: 4x A100 80GB, 1 currently allocated → 3 available
gpu-node-01 can fit your 4-GPU workload. The staging-gpu cluster also has 4x A100 but they're all allocated.
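The capacity check above boils down to simple arithmetic over per-host GPU counts. A minimal sketch, assuming a result shape for find_gpu_capacity (the host data mirrors the example; the schema is an assumption):

```python
# Hosts as they might appear in a find_gpu_capacity result.
# The dict layout is an assumption; the numbers match the example above.
hosts = [
    {"name": "gpu-node-01", "gpu_model": "A100 80GB", "total": 8, "allocated": 3},
    {"name": "gpu-node-02", "gpu_model": "A100 80GB", "total": 4, "allocated": 1},
]

def hosts_with_capacity(hosts: list[dict], needed: int) -> list[str]:
    """Return names of hosts whose free GPU count can fit the workload."""
    return [h["name"] for h in hosts if h["total"] - h["allocated"] >= needed]

print(hosts_with_capacity(hosts, needed=4))  # gpu-node-01 has 5 free; gpu-node-02 only 3
```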
Creating and deploying a workload
You: "Create a workload for vLLM serving Llama 3.1 70B. It needs 2 A100 GPUs, 64GB RAM, and 8 CPU cores. Then deploy it to the prod-inference cluster."
The AI calls multiple tools in sequence:
create_stack — Creates the template with the specified resources
list_clusters — Finds the prod-inference cluster ID
create_deployment — Binds the stack to the cluster
check_network_compatibility — Verifies network requirements match
deploy_workload — Triggers the deployment
Done. Created stack "vllm-llama-70b" and deployed to prod-inference.
Image: vllm/vllm-openai:latest
Resources: 8 CPU, 64GB RAM, 2x GPU
Status: deploying (check back in a minute)
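Under the hood, each step in the sequence is an MCP tools/call request. A sketch of what those payloads could look like, using the tool names from the steps above; the argument fields are assumptions, not PodWarden's actual parameters:

```python
# Sketch of the five-step sequence as MCP tools/call JSON-RPC payloads.
# Tool names come from the example; argument fields are assumptions.
import json

steps = [
    ("create_stack", {"name": "vllm-llama-70b", "image": "vllm/vllm-openai:latest",
                      "cpu": 8, "memory_gb": 64, "gpu_count": 2}),
    ("list_clusters", {}),
    ("create_deployment", {"stack": "vllm-llama-70b", "cluster": "prod-inference"}),
    ("check_network_compatibility", {"cluster": "prod-inference"}),
    ("deploy_workload", {"deployment": "vllm-llama-70b"}),
]

requests = [
    {"jsonrpc": "2.0", "id": i, "method": "tools/call",
     "params": {"name": tool, "arguments": args}}
    for i, (tool, args) in enumerate(steps, start=1)
]

print(json.dumps(requests[0], indent=2))
```

The AI issues these one at a time, feeding each result into the next call (for example, the cluster ID from list_clusters into create_deployment).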
Troubleshooting a failed deployment
You: "The inference-api workload is failing. What's wrong?"
The AI calls troubleshoot_workload which aggregates data from multiple sources:
The inference-api workload on prod-gpu is in CrashLoopBackOff. Here's what I found:
Pod status: 4 restarts in the last 10 minutes
Last log output:
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 39.59 GiB total capacity; 38.12 GiB already allocated)
Root cause: The stack requests 1 GPU but the model needs more VRAM than a single GPU provides.
Suggested fix: Update the stack to request 2 GPUs, then redeploy.
You: "Do it."
The AI calls update_stack to set gpu_count: 2, then deploy_workload to redeploy.
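The two-step fix can be sketched as follows. call_tool stands in for a real MCP client, and its interface and the argument names are assumptions:

```python
# Sketch of "fix then redeploy": bump gpu_count on the stack, and only
# redeploy if the update succeeded. call_tool is a stand-in for a real
# MCP client; its interface is an assumption.
def call_tool(name: str, arguments: dict) -> dict:
    # A real client would send an MCP tools/call request here.
    return {"tool": name, "arguments": arguments, "ok": True}

fix = call_tool("update_stack", {"stack": "inference-api", "gpu_count": 2})
redeploy = call_tool("deploy_workload", {"stack": "inference-api"}) if fix["ok"] else None

print(redeploy["tool"])
```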
Rolling back a deployment
You: "The latest deployment broke the API. Roll it back."
The AI calls list_deployments to find the deployment history, identifies the previous working version, and calls rollback_deployment:
Rolled back api-server from v2.4.1 (deployed 30 minutes ago, status: failed) to v2.4.0, the most recent successful deployment. The Kubernetes deployment is updating now.
Managing secrets
You: "Generate an SSH key pair called 'staging-deploy' and show me the public key."
The AI calls generate_ssh_key_pair then get_secret:
Created key pair "staging-deploy". Public key:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5... staging-deploy
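The flow is: generate a named key pair, then read back only the public half. A sketch of that pattern, assuming an in-memory store and record shape (a real implementation would generate an actual ed25519 key pair; the placeholder values keep the sketch self-contained):

```python
# Sketch of the secret flow behind generate_ssh_key_pair + get_secret.
# The store and record shape are assumptions; the key material is a
# placeholder, not a real key.
secrets: dict[str, dict] = {}

def generate_ssh_key_pair(name: str) -> None:
    # Placeholder values stand in for real key generation.
    secrets[name] = {"public": f"ssh-ed25519 AAAA... {name}",
                     "private": "<encrypted>"}

def get_secret(name: str, field: str = "public") -> str:
    # Only the requested field is returned; the private key stays in the store.
    return secrets[name][field]

generate_ssh_key_pair("staging-deploy")
print(get_secret("staging-deploy"))
```

Returning only the requested field means the AI can show you the public key without the private key ever leaving the secret store.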
Testing infrastructure connectivity
You: "Test the NFS storage connection and SMTP settings."
The AI calls test_storage_connection and test_smtp in parallel: