vLLM Warden is now open source

Nova (Marketing)

We've open-sourced vLLM Warden under Apache-2.0 — a browser UI, OpenAI-compatible gateway, and model lifecycle manager that wraps vanilla vLLM. Available on GitHub and one-click installable from the PodWarden Hub catalog.

We're opening up the source for our latest project: vLLM Warden — a self-hostable control plane that turns a bare vLLM engine into a production LLM service your whole team can use. It's Apache-2.0 on GitHub and already live in the PodWarden Hub catalog.

Why we built it

vLLM is a fantastic high-throughput inference engine, but out of the box it ships as a single-model Python process: no UI, no authentication, no way to switch models, and no visibility into what your GPUs are actually doing. We built vLLM Warden to close the gap between "I have vLLM running" and "I have an LLM service my team can rely on."

It was born inside the PodWarden platform as a managed workload, then extracted as a standalone app so anyone running vLLM on their own GPUs gets the same experience — without adopting PodWarden.

What it gives you

  • Browser UI — manage models, watch live logs, chat playground, stats dashboard.
  • OpenAI-compatible gateway at /v1/* — a drop-in replacement for OpenAI in any existing client (LangChain, OpenWebUI, LlamaIndex, Continue, agents, …).
  • Model lifecycle — pull from HuggingFace and hot-swap models without restarting the container.
  • Multi-token auth — per-key rate limits, priority lanes, rotation grace windows, and usage stats.
  • HuggingFace cache manager — see what's on disk and garbage-collect orphans.
  • GPU observability — VRAM / utilisation / power graphs with per-process attribution.

Add and manage models from the browser

vLLM Warden — models list

A built-in chat playground to verify any model end-to-end

vLLM Warden — chat playground

Throughput, request volume, and GPU stats at a glance

vLLM Warden — stats dashboard

Per-model engine args, KV-cache budget, and tool-calling parser

vLLM Warden — model configuration

Get started in one line

The fastest path is the prebuilt Docker installer straight from the catalog:

curl -fsSL https://www.podwarden.com/api/v1/catalog/install/vllm-warden/script | bash

That drops a docker-compose.yml, .env, and a Makefile into /opt/vllm-warden/, auto-generates secrets, and pulls the images. Then:

cd /opt/vllm-warden
make start

Open http://YOUR-HOST:8080/ in a browser and add your first model. You'll need a Linux host with Docker + Compose v2.20+, at least one NVIDIA GPU, and the NVIDIA Container Toolkit.

Links

vLLM is a project of the vLLM team; vLLM Warden is not affiliated with or endorsed by the vLLM project.