PodWarden
Use Cases

Video Transcoding at Scale

Build a GPU-accelerated transcoding pipeline on Jetson Orin NX clusters or traditional GPU servers — all supporting infrastructure included

Video transcoding is compute-heavy and highly parallel. Every video is independent, so more workers means more throughput. The challenge is managing the fleet: deploying workers consistently, routing input and output through shared storage, and scaling up or down as demand changes.

The right hardware for a transcoding farm isn't always a rack of A100s. For most workloads — HLS ladder generation, clip transcoding, live stream packaging — the NVIDIA Jetson Orin NX is the ideal building block: dedicated NVENC and NVDEC engines, low power draw, ARM64, and a price point that makes large clusters economically practical. PodWarden runs on ARM64 and manages Jetson nodes exactly like any other host.

The Hub catalog covers every supporting service the pipeline needs. If you already have S3, a job queue, or a monitoring stack, bring it. If you don't, everything is available.


What You Need

| Component | Bring your own | Or deploy from Hub |
|---|---|---|
| Object storage | AWS S3, GCS, Wasabi, MinIO already running | MinIO or RustFS — deploy to any node, exposes S3 API |
| Job queue | Redis, RabbitMQ, SQS, NATS | Redis or RabbitMQ — from the Hub catalog |
| Database | Existing PostgreSQL (for job tracking UI) | PostgreSQL — from Hub |
| Secrets | Vault, AWS Secrets Manager | Vault — from Hub |
| Monitoring | Existing Prometheus + Grafana | Prometheus + Grafana + DCGM Exporter |
| Container registry | Docker Hub, GHCR | Harbor or Gitea — needed if you maintain custom ARM64 worker images |

Stack Architecture


The Jetson Orin NX Advantage

The Jetson Orin NX is a system-on-module with dedicated hardware video engines that make it exceptional for transcoding:

| Feature | Orin NX 16GB | Why it matters |
|---|---|---|
| NVENC | 1× dedicated encode engine | H.264, H.265, AV1 hardware encode — zero CPU load |
| NVDEC | 1× dedicated decode engine | Hardware decode of input streams |
| CUDA cores | 1024 | Available for filters, scaling, color conversion |
| Power draw | 10–25W | Dense clusters without specialized power infrastructure |
| Architecture | ARM64 | Standard Linux, standard NVIDIA container runtime |
| Form factor | SoM (67.6 × 45 mm) | Compact carrier boards, rack-mount sleds |

The NVENC and NVDEC engines are independent of the CUDA cores — they run simultaneously without contention. A single Orin NX sustains multiple concurrent 1080p60 encode sessions while the CPU handles queue polling, format demuxing, and S3 upload.
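The engine split is visible in a typical worker invocation: NVDEC decodes, the CUDA cores scale, NVENC encodes, and the CPU only demuxes and muxes. A minimal sketch of how a worker might assemble that FFmpeg command — the function name and defaults are illustrative, and the flags are the desktop FFmpeg NVENC names (Jetson FFmpeg builds may expose different encoder names):

```python
from typing import List

def build_nvenc_cmd(src: str, dst: str, height: int = 1080,
                    preset: str = "p4") -> List[str]:
    """Assemble an ffmpeg argv: NVDEC decode, CUDA scale, NVENC encode.

    Pixel data stays on the GPU end to end; the CPU only demuxes/muxes.
    """
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",                 # decode on NVDEC
        "-hwaccel_output_format", "cuda",   # keep frames in GPU memory
        "-i", src,
        "-vf", f"scale_cuda=-2:{height}",   # scale on CUDA cores
        "-c:v", "h264_nvenc",               # encode on NVENC
        "-preset", preset,                  # p1 (fastest) .. p7 (best quality)
        "-c:a", "copy",
        dst,
    ]

cmd = build_nvenc_cmd("source.mp4", "out-720p.mp4", height=720)
```

Because decode and encode never touch system memory, several of these sessions can run per node while the CPU stays free for queue polling and S3 transfers.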

Because Orin NX modules are inexpensive, you can build clusters that would be cost-prohibitive with traditional GPU servers. A rack of Jetson nodes — each running two or three concurrent encode sessions — often outperforms a handful of A100 machines for this workload, at a fraction of the cost and power budget.

Jetson vs x86 Topology


Building the Foundation

Deploy supporting services before the transcoding workers. They're standard stacks — import from Hub, assign to your cluster, deploy.

Transcoding Pipeline Flow

1. Object Storage

Workers need to read source video and write encoded output. Register an existing S3 endpoint as a storage connection under Settings → Storage. PodWarden tests connectivity from all cluster nodes and injects credentials as environment variables at deploy time.

If you don't have S3 storage:

  • MinIO — Import from Hub. Deploy to a node with fast disk (NVMe recommended for video I/O). Exposes a full S3 API; every other component treats it identically to AWS S3.
  • RustFS — High-performance alternative, also S3-compatible. Better suited for high-throughput encode pipelines with many simultaneous workers.

Create two buckets: one for ingest (source files), one for output (encoded files).
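One workable key layout across the two buckets, sketched under the assumption that workers derive output keys from the source key and preset (the helper and naming scheme are illustrative, not a PodWarden convention) — every rendition of a job then lands under one predictable prefix:

```python
from pathlib import PurePosixPath

INPUT_BUCKET = "media-ingest"    # assumption: bucket names from this guide
OUTPUT_BUCKET = "media-output"

def output_key(source_key: str, preset: str, rendition: str) -> str:
    """Map an ingest object key to an output key.

    "uploads/promo.mp4" + preset "hls-ladder" + rendition "720p.m3u8"
    becomes "hls-ladder/uploads/promo/720p.m3u8" in OUTPUT_BUCKET.
    """
    stem = PurePosixPath(source_key).with_suffix("")  # drop the extension
    return f"{preset}/{stem}/{rendition}"
```

Grouping renditions under a per-job prefix makes cleanup and re-encodes a single prefix delete rather than a scan.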

2. Job Queue

Workers poll the queue for jobs, transcode, and report completion. Pick based on what you know:

  • Redis — Simple, fast, widely supported. Import from Hub. Workers use BLPOP or a queue library.
  • RabbitMQ — More durable, supports dead-letter queues for failed jobs. Better for high-volume pipelines where job loss is not acceptable.

If you use a cloud queue (SQS, Cloud Tasks), set the QUEUE_URL environment variable accordingly — no Hub component needed.
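Whichever queue you pick, the contract between producer and workers is just the message format. A sketch of one possible payload, assuming JSON messages on a Redis list (the field names are illustrative): the producer LPUSHes, workers BLPOP and parse.

```python
import json
from dataclasses import dataclass

@dataclass
class TranscodeJob:
    source_key: str   # object key in the ingest bucket
    preset: str       # e.g. "hls-ladder"
    job_id: str

def parse_job(raw: bytes) -> TranscodeJob:
    """Decode one queue message.

    Raises KeyError/ValueError on malformed input, so bad jobs can be
    dead-lettered instead of crashing the worker loop.
    """
    doc = json.loads(raw)
    return TranscodeJob(source_key=doc["source_key"],
                        preset=doc["preset"],
                        job_id=doc["job_id"])

# In the worker, connection details come from the template's env vars:
#   r = redis.from_url(os.environ["QUEUE_URL"])
#   _, raw = r.blpop(os.environ["QUEUE_NAME"])   # blocks until a job arrives
#   job = parse_job(raw)
```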

3. Monitoring

Import Prometheus, Grafana, and DCGM Exporter from Hub.

Deploy DCGM Exporter as a DaemonSet — it runs on every GPU node automatically and exposes per-GPU metrics including NVENC/NVDEC engine utilization (on supported drivers). On Jetson, the equivalent is Tegrastats Exporter — also available from the Hub catalog.

Grafana shows queue depth, encode throughput (frames/second per node), GPU/encoder utilization, and error rates. This tells you immediately when a node is stalled, when the queue is backing up, or when a bad source file is causing repeated failures.

4. Secrets

Store S3 credentials, queue passwords, and registry credentials in Vault (from Hub) or your existing secrets manager. PodWarden injects secrets via secret_refs at deploy time — they never appear in template definitions or deployment logs.

5. Container Registry (optional but recommended)

Jetson workers require ARM64 images built on top of NVIDIA's L4T base (nvcr.io/nvidia/l4t-base). You'll likely maintain a custom FFmpeg image for your pipeline.

Deploy Harbor or Gitea (with built-in registry) from Hub. Build your ARM64 FFmpeg image once and push it there. All Jetson workers pull from your internal registry — no external registry dependency, no rate limits.
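A sketch of what that worker image might look like. The base tag and the FFmpeg install step are assumptions — on L4T, an NVENC-capable FFmpeg typically comes from NVIDIA's apt repository or a community build rather than the stock Ubuntu package:

```dockerfile
# Sketch of a custom ARM64 worker image -- base tag and package
# names are assumptions, adjust for your L4T release.
FROM nvcr.io/nvidia/l4t-base:r36.2.0

RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg python3 \
    && rm -rf /var/lib/apt/lists/*

COPY worker.py /app/worker.py
CMD ["python3", "/app/worker.py"]
```

On an x86 build host, cross-build and push with `docker buildx build --platform linux/arm64 -t registry.internal/ffmpeg-worker:latest-arm64 --push .`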


Worker Templates

Jetson Orin NX worker (ARM64)

Kind:           Deployment
Image:          registry.internal/ffmpeg-worker:latest-arm64
GPU count:      1
VRAM:           8Gi
CPU:            4
Memory:         8Gi
Node selector:  { "nvidia.com/gpu.product": "Orin" }
Environment variables:

| Variable | Example | Description |
|---|---|---|
| QUEUE_URL | redis://redis.mesh:6379 | Job queue connection |
| QUEUE_NAME | transcode-jobs | Queue name |
| INPUT_BUCKET | s3://media-ingest | Source video bucket |
| OUTPUT_BUCKET | s3://media-output | Encoded output bucket |
| PRESET | hls-ladder | Encoding profile |
| CONCURRENCY | 3 | Parallel encode sessions per node |
| HWACCEL | cuda | Hardware acceleration |
| NVENC_PRESET | p4 | NVENC quality/speed (p1–p7) |
| S3_ENDPOINT_URL | http://minio.mesh:9000 | Internal MinIO endpoint |

Sensitive values (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, queue password) come from Vault via secret_refs.

x86 GPU server worker (amd64)

For heavier jobs — multi-stream 4K, HDR tonemapping, complex filter graphs, ProRes output — use a traditional GPU server:

Kind:           Deployment
Image:          registry.internal/ffmpeg-worker:latest-amd64
GPU count:      1
VRAM:           16Gi
CPU:            16
Memory:         32Gi
Node selector:  { "kubernetes.io/arch": "amd64" }

Same environment variables, different image architecture and node selector. Both worker types coexist in the same cluster. The queue routes job types to the appropriate workers.

Volume mounts

| Path | Volume type | Purpose |
|---|---|---|
| /tmp/transcode | emptyDir | Working directory for in-flight segments |

Source video and output are handled via S3 API calls from within the worker — no persistent mounts needed unless you're using NFS for source files.


Multi-Profile Deployments

Maintain separate stacks for each encoding profile:

| Profile | Target | Notes |
|---|---|---|
| HLS adaptive bitrate | Web playback | 1080p/720p/480p/360p ladder, fMP4 segments |
| Broadcast archive | Long-term storage | ProRes 422 HQ or DNxHR — CPU-encoded on x86 nodes |
| Social clips | Short-form platforms | H.264/AV1, vertical and square crops |
| Proxy generation | Editorial workflows | Low-res H.264, fast encode for NLE preview |

Each profile is a separate stack with a different PRESET value. Deploy all profiles to the same cluster. The queue routes job types to the matching worker — Jetson nodes handle the volume, x86 GPU servers handle the exceptions.
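Routing is just a lookup from profile to queue name and worker architecture. A minimal sketch, assuming one queue per worker type — the profile keys and queue names here are illustrative, not a PodWarden convention:

```python
from typing import NamedTuple

class Route(NamedTuple):
    queue: str   # queue the matching workers consume
    arch: str    # worker architecture that handles this profile

# Jetson nodes take the volume work; x86 takes the CPU-heavy exceptions.
ROUTES = {
    "hls-ladder":  Route("transcode-jobs",     "arm64"),
    "social-clip": Route("transcode-jobs",     "arm64"),
    "proxy":       Route("transcode-jobs",     "arm64"),
    "archive":     Route("transcode-jobs-x86", "amd64"),  # ProRes/DNxHR
}

def route(preset: str) -> Route:
    """Pick the queue for a job; unknown presets fail loudly."""
    return ROUTES[preset]
```

The producer calls `route(job.preset).queue` when enqueuing; workers never need to know about profiles they don't serve.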

Multi-Profile Routing


Scaling the Fleet

Jetson nodes are inexpensive enough that horizontal scaling is usually the right answer. Add nodes, join the cluster, workers start picking up jobs automatically. No queue reconfiguration, no storage changes.

For temporary capacity spikes, rent cloud GPU nodes, join them to the cluster with an x86 worker template, and remove them when the backlog clears.

Scaling Lifecycle

Job kind for batch processing — For a one-time migration or catalog re-encode, use kind: Job instead of a Deployment. The job processes the queue and stops when complete. PodWarden records run duration and exit code. Batch jobs don't idle after finishing — important when running on rented nodes.
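The only difference from the long-running worker is the exit condition: a batch worker pops until the queue is empty, then returns. Sketched against a generic pop function (with Redis, a non-blocking LPOP that returns None on an empty list plays this role):

```python
from collections import deque
from typing import Callable, Optional

def drain(pop: Callable[[], Optional[bytes]],
          handle: Callable[[bytes], None]) -> int:
    """Process jobs until the queue is empty, then return.

    `pop` yields None once the queue is drained (e.g. Redis LPOP).
    The Job exits 0 unless `handle` raises, so the recorded exit
    code reflects whether the batch completed cleanly.
    """
    done = 0
    while (raw := pop()) is not None:
        handle(raw)
        done += 1
    return done

# demo with an in-memory queue standing in for Redis
q = deque([b"job-1", b"job-2", b"job-3"])
processed = drain(lambda: q.popleft() if q else None, lambda raw: None)
# processed == 3; the process then exits and the Job completes
```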


Networking

Jetson nodes behind NAT (home labs, edge deployments) connect via Tailscale mesh — no public IP needed. PodWarden auto-detects Tailscale-connected nodes and tags them mesh. The MinIO, Redis, and Vault instances on your mesh are reachable from all worker nodes.

For latency-sensitive live transcoding, co-locate Jetson nodes with your ingest infrastructure on the same LAN. Tag those nodes lan and set the worker template to require lan connectivity — PodWarden schedules only on nodes that can reach the ingest source.


Hub Templates for This Stack

| Template | Role |
|---|---|
| FFmpeg worker (Jetson NVENC) | ARM64 NVENC/NVDEC worker |
| FFmpeg worker (x86 NVENC) | amd64 GPU-accelerated worker |
| FFmpeg worker (CPU) | Software encoding, any architecture |
| MinIO | S3-compatible object storage |
| RustFS | High-performance S3 object storage |
| Redis | Job queue |
| RabbitMQ | Durable job queue with dead-letter support |
| PostgreSQL | Job tracking database |
| Vault | Secrets management |
| Prometheus | Metrics collection |
| Grafana | Transcoding pipeline dashboards |
| DCGM Exporter | Per-GPU metrics for x86 nodes (DaemonSet) |
| Tegrastats Exporter | Per-GPU metrics for Jetson nodes (DaemonSet) |
| Harbor | Private container registry for custom ARM64 images |

The complete pipeline — storage, queue, workers, monitoring — runs on your own nodes, managed from one dashboard. No external dependencies unless you choose them.