
GPU Workload Orchestration

Schedule and manage GPU workloads across multiple nodes with PodWarden — AI inference, media transcoding, rendering, and more.

GPU workloads — AI inference, media transcoding, 3D rendering, computer vision — are increasingly common in homelabs and small teams. But scheduling GPU work across multiple machines is complex. PodWarden provides GPU-aware workload orchestration on K3s, making multi-GPU infrastructure manageable.

The GPU Scheduling Problem

Running GPU workloads on a single machine with Docker is straightforward: pass through the GPU device and run your container. But as soon as you have multiple GPU machines or multiple GPU workloads, problems emerge:

  • Resource conflicts: Two workloads competing for the same GPU
  • Manual placement: You decide which workload goes on which machine
  • No resource tracking: No central view of GPU utilization across your fleet
  • No VRAM management: Workloads crash when they exceed available VRAM
  • No failover: If a GPU node goes down, workloads don't reschedule

How PodWarden Handles GPU Workloads

Hardware Discovery

When you provision a host with PodWarden, it automatically discovers GPU hardware — NVIDIA GPUs, their models, VRAM capacity, and driver versions. This information is visible in the dashboard and used for scheduling decisions.

Resource-Aware Scheduling

PodWarden's workload definitions include GPU resource requests:

  • GPU count: How many GPUs the workload needs
  • VRAM request: Minimum VRAM required

When you deploy a GPU workload, PodWarden schedules it on a node with sufficient GPU resources. If no node has available GPUs, the workload queues until resources free up — no silent failures or resource conflicts.
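On K3s, those requests ultimately map to the standard extended resource exposed by the NVIDIA device plugin. A minimal sketch of what a scheduled GPU workload looks like at the Kubernetes layer — the names and image here are illustrative, and PodWarden generates the equivalent manifest from its own workload definition:

```yaml
# Illustrative Pod spec — the K3s layer underneath a PodWarden GPU workload.
# PodWarden's workload-definition syntax may differ; names are examples.
apiVersion: v1
kind: Pod
metadata:
  name: ollama-example          # hypothetical workload name
spec:
  runtimeClassName: nvidia      # NVIDIA container runtime set up at provisioning
  containers:
    - name: ollama
      image: ollama/ollama:latest
      resources:
        limits:
          nvidia.com/gpu: 1     # GPU count: the scheduler only places this Pod
                                # on a node with an unallocated GPU
```

A Pod whose `nvidia.com/gpu` request can't be satisfied sits in `Pending` — the Kubernetes mechanism behind the queuing behavior described above.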

GPU Templates in the Catalog

PodWarden's template catalog includes pre-configured GPU workloads:

  Application              GPU Use                      Category
  -----------------------  ---------------------------  ------------
  Ollama                   LLM inference                AI
  LocalAI                  Multi-model inference        AI
  Stable Diffusion WebUI   Image generation             AI
  ComfyUI                  Image generation workflows   AI
  Plex                     Hardware transcoding         Media
  Jellyfin                 Hardware transcoding         Media
  Frigate NVR              Object detection             Surveillance
  Whisper                  Speech-to-text               AI

Each template comes with appropriate GPU resource requests, NVIDIA runtime configuration, and environment variables pre-configured.

Mixed Workload Clusters

Most infrastructure runs a mix of GPU and non-GPU workloads. PodWarden handles this naturally — GPU workloads are scheduled on GPU nodes, everything else goes on general-purpose nodes. You don't need separate clusters for GPU and non-GPU work.
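No special handling is needed on the non-GPU side: a workload that doesn't request the GPU resource can schedule on any node. An illustrative CPU-only Deployment (names are examples, not PodWarden's generated output):

```yaml
# Illustrative: in a mixed cluster, only workloads that request nvidia.com/gpu
# are constrained to GPU nodes. A CPU-only service schedules anywhere.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole-example           # hypothetical CPU-only workload
spec:
  replicas: 1
  selector:
    matchLabels: { app: pihole }
  template:
    metadata:
      labels: { app: pihole }
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:latest
          resources:
            limits:
              cpu: "1"           # ordinary CPU/memory limits only —
              memory: 512Mi      # no nvidia.com/gpu, so any node qualifies
```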

Example cluster topology:

  Node                   GPUs          Workloads
  ---------------------  ------------  ------------------------------------------
  node-1 (NUC)           None          Home Assistant, Pi-hole, PostgreSQL, Redis
  node-2 (NAS)           None          Jellyfin (CPU), Nextcloud, Immich (CPU)
  node-3 (Workstation)   RTX 4090      Ollama, Stable Diffusion, Frigate NVR
  node-4 (Server)        2x RTX 3090   Whisper, LocalAI, Plex (HW transcode)

PodWarden schedules each workload on the appropriate node based on resource requirements. If node-3 is fully utilized, a new GPU workload waits or goes to node-4 if it has capacity.

Multi-GPU Use Cases

AI Inference Serving

Run multiple LLM models across GPU nodes. Ollama on one GPU for general chat, a specialized model on another for code generation. PodWarden ensures each gets dedicated GPU resources without conflicts.

Media Processing Pipeline

Ingest → transcode → serve: Frigate captures video and runs object detection on GPU, Plex or Jellyfin uses hardware transcoding for streaming, and PodWarden schedules all of it across your GPU-capable nodes.

Render Farm

Distribute Blender or other rendering jobs across multiple GPU nodes. PodWarden's workload definitions support job-type workloads that run to completion and release GPU resources.
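At the K3s layer, run-to-completion work maps to a Kubernetes Job: the GPU is held for the duration of the run and released back to the scheduler when it finishes. An illustrative sketch (the job name, image, and Blender arguments are examples, not a PodWarden-generated manifest):

```yaml
# Illustrative Kubernetes Job — a run-to-completion GPU render task.
apiVersion: batch/v1
kind: Job
metadata:
  name: blender-render-example   # hypothetical job name
spec:
  backoffLimit: 2                # retry a failed render up to twice
  template:
    spec:
      restartPolicy: Never
      runtimeClassName: nvidia
      containers:
        - name: render
          image: linuxserver/blender:latest          # example image
          command: ["blender", "-b", "scene.blend", "-a"]  # batch-render frames
          resources:
            limits:
              nvidia.com/gpu: 1  # hold one GPU until the Job completes
```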

Getting Started with GPU Workloads

  1. Install NVIDIA drivers on your GPU hosts (before PodWarden provisioning)
  2. Provision GPU hosts into PodWarden — GPU hardware is auto-discovered
  3. Create a cluster including your GPU nodes
  4. Deploy GPU workloads from the template catalog or custom definitions
  5. Monitor GPU utilization from the PodWarden dashboard

PodWarden configures the NVIDIA container runtime and Kubernetes device plugins during provisioning — you don't need to manually set up GPU passthrough for K3s.
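A quick way to confirm the runtime and device plugin are wired up after provisioning is a one-shot smoke-test Pod that runs `nvidia-smi` (the Pod name and image tag below are examples — any CUDA base image works):

```yaml
# Illustrative smoke test: confirms the NVIDIA runtime and device plugin work.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia
  containers:
    - name: nvidia-smi
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any CUDA base image works
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

After `kubectl apply`, `kubectl logs gpu-smoke-test` should print the familiar `nvidia-smi` table; if the Pod stays `Pending`, no node is advertising an unallocated `nvidia.com/gpu`.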

Why K3s for GPU Workloads

K3s provides several advantages over plain Docker for GPU workload management:

  • Resource scheduling: Workloads are placed on nodes with available GPUs automatically
  • Health checks: GPU workloads that crash are restarted automatically
  • Resource limits: Prevent workloads from consuming more GPU/VRAM than allocated
  • Rolling updates: Update GPU workload images without downtime
  • Multi-node: Distribute GPU workloads across your fleet from one control plane

PodWarden makes K3s GPU scheduling accessible without requiring deep Kubernetes knowledge — configure GPU requirements in the template or workload definition, and PodWarden handles the rest.