
GPU Workload Orchestration

Schedule and manage GPU workloads across multiple nodes with PodWarden — AI inference, media transcoding, rendering, and more.

GPU workloads — AI inference, media transcoding, 3D rendering, computer vision — are increasingly common in homelabs and small teams. But scheduling GPU work across multiple machines is complex. PodWarden provides GPU-aware workload orchestration on K3s, making multi-GPU infrastructure manageable.

The GPU Scheduling Problem

Running GPU workloads on a single machine with Docker is straightforward: pass through the GPU device and run your container. But as soon as you have multiple GPU machines or multiple GPU workloads, problems emerge:

  • Resource conflicts: Two workloads competing for the same GPU
  • Manual placement: You decide which workload goes on which machine
  • No resource tracking: No central view of GPU utilization across your fleet
  • No VRAM management: Workloads crash when they exceed available VRAM
  • No failover: If a GPU node goes down, workloads don't reschedule

How PodWarden Handles GPU Workloads

Hardware Discovery

When you provision a host with PodWarden, it automatically discovers GPU hardware — NVIDIA GPUs, their models, VRAM capacity, and driver versions. This information is visible in the dashboard and used for scheduling decisions.

Resource-Aware Scheduling

PodWarden's workload definitions include GPU resource requests:

  • GPU count: How many GPUs the workload needs
  • VRAM request: Minimum VRAM required

When you deploy a GPU workload, PodWarden schedules it on a node with sufficient GPU resources. If no node has available GPUs, the workload queues until resources free up — no silent failures or resource conflicts.
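On K3s, those requests ultimately map to the standard extended resource exposed by the NVIDIA device plugin. A minimal sketch of what a scheduled GPU workload looks like at the Kubernetes layer — the names and image here are illustrative, and PodWarden generates the equivalent manifest from its own workload definition:

```yaml
# Illustrative Pod spec — the K3s layer underneath a PodWarden GPU workload.
# PodWarden's workload-definition syntax may differ; names are examples.
apiVersion: v1
kind: Pod
metadata:
  name: ollama-example          # hypothetical workload name
spec:
  runtimeClassName: nvidia      # NVIDIA container runtime set up at provisioning
  containers:
    - name: ollama
      image: ollama/ollama:latest
      resources:
        limits:
          nvidia.com/gpu: 1     # GPU count: the scheduler only places this Pod
                                # on a node with an unallocated GPU
```

A Pod whose `nvidia.com/gpu` request can't be satisfied sits in `Pending` — the Kubernetes mechanism behind the queuing behavior described above.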

GPU Templates in the Catalog

PodWarden's template catalog includes pre-configured GPU workloads:

  Application              GPU Use                      Category
  -----------------------  ---------------------------  ------------
  Ollama                   LLM inference                AI
  LocalAI                  Multi-model inference        AI
  Stable Diffusion WebUI   Image generation             AI
  ComfyUI                  Image generation workflows   AI
  Plex                     Hardware transcoding         Media
  Jellyfin                 Hardware transcoding         Media
  Frigate NVR              Object detection             Surveillance
  Whisper                  Speech-to-text               AI

Each template comes with appropriate GPU resource requests, NVIDIA runtime configuration, and environment variables pre-configured.

Mixed Workload Clusters

Most infrastructure runs a mix of GPU and non-GPU workloads. PodWarden handles this naturally — GPU workloads are scheduled on GPU nodes, everything else goes on general-purpose nodes. You don't need separate clusters for GPU and non-GPU work.
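No special handling is needed on the non-GPU side: a workload that doesn't request the GPU resource can schedule on any node. An illustrative CPU-only Deployment (names are examples, not PodWarden's generated output):

```yaml
# Illustrative: in a mixed cluster, only workloads that request nvidia.com/gpu
# are constrained to GPU nodes. A CPU-only service schedules anywhere.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole-example           # hypothetical CPU-only workload
spec:
  replicas: 1
  selector:
    matchLabels: { app: pihole }
  template:
    metadata:
      labels: { app: pihole }
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:latest
          resources:
            limits:
              cpu: "1"           # ordinary CPU/memory limits only —
              memory: 512Mi      # no nvidia.com/gpu, so any node qualifies
```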

Example cluster topology:

  Node                   GPUs          Workloads
  ---------------------  ------------  ------------------------------------------
  node-1 (NUC)           None          Home Assistant, Pi-hole, PostgreSQL, Redis
  node-2 (NAS)           None          Jellyfin (CPU), Nextcloud, Immich (CPU)
  node-3 (Workstation)   RTX 4090      Ollama, Stable Diffusion, Frigate NVR
  node-4 (Server)        2x RTX 3090   Whisper, LocalAI, Plex (HW transcode)

PodWarden schedules each workload on the appropriate node based on resource requirements. If node-3 is fully utilized, a new GPU workload waits or goes to node-4 if it has capacity.

Multi-GPU Use Cases

AI Inference Serving

Run multiple LLM models across GPU nodes. Ollama on one GPU for general chat, a specialized model on another for code generation. PodWarden ensures each gets dedicated GPU resources without conflicts.

Media Processing Pipeline

Ingest → transcode → serve: Frigate captures video and runs object detection on GPU, Plex or Jellyfin uses hardware transcoding for streaming, and PodWarden schedules all of it across your GPU-capable nodes.

Render Farm

Distribute Blender or other rendering jobs across multiple GPU nodes. PodWarden's workload definitions support job-type workloads that run to completion and release GPU resources.
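At the K3s layer, run-to-completion work maps to a Kubernetes Job: the GPU is held for the duration of the run and released back to the scheduler when it finishes. An illustrative sketch (the job name, image, and Blender arguments are examples, not a PodWarden-generated manifest):

```yaml
# Illustrative Kubernetes Job — a run-to-completion GPU render task.
apiVersion: batch/v1
kind: Job
metadata:
  name: blender-render-example   # hypothetical job name
spec:
  backoffLimit: 2                # retry a failed render up to twice
  template:
    spec:
      restartPolicy: Never
      runtimeClassName: nvidia
      containers:
        - name: render
          image: linuxserver/blender:latest          # example image
          command: ["blender", "-b", "scene.blend", "-a"]  # batch-render frames
          resources:
            limits:
              nvidia.com/gpu: 1  # hold one GPU until the Job completes
```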

Getting Started with GPU Workloads

  1. Install NVIDIA drivers on your GPU hosts (before PodWarden provisioning)
  2. Provision GPU hosts into PodWarden — GPU hardware is auto-discovered
  3. Create a cluster including your GPU nodes
  4. Deploy GPU workloads from the template catalog or custom definitions
  5. Monitor GPU utilization from the PodWarden dashboard

PodWarden configures the NVIDIA container runtime and Kubernetes device plugins during provisioning — you don't need to manually set up GPU passthrough for K3s.
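A quick way to confirm the runtime and device plugin are wired up after provisioning is a one-shot smoke-test Pod that runs `nvidia-smi` (the Pod name and image tag below are examples — any CUDA base image works):

```yaml
# Illustrative smoke test: confirms the NVIDIA runtime and device plugin work.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia
  containers:
    - name: nvidia-smi
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any CUDA base image works
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

After `kubectl apply`, `kubectl logs gpu-smoke-test` should print the familiar `nvidia-smi` table; if the Pod stays `Pending`, no node is advertising an unallocated `nvidia.com/gpu`.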

Why K3s for GPU Workloads

K3s provides several advantages over plain Docker for GPU workload management:

  • Resource scheduling: Workloads are placed on nodes with available GPUs automatically
  • Health checks: GPU workloads that crash are restarted automatically
  • Resource limits: Prevent workloads from consuming more GPU/VRAM than allocated
  • Rolling updates: Update GPU workload images without downtime
  • Multi-node: Distribute GPU workloads across your fleet from one control plane

PodWarden makes K3s GPU scheduling accessible without requiring deep Kubernetes knowledge — configure GPU requirements in the template or workload definition, and PodWarden handles the rest.