Ollama and Open WebUI: Run Local LLMs on PodWarden

Local AI, your hardware

Ollama has become the de facto standard for running large language models on your own hardware. It handles model downloads and GPU acceleration and exposes a REST API, all from a single binary. On PodWarden, it deploys as a one-click stack with optional GPU passthrough.
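
As a rough illustration of that REST API, the sketch below builds a non-streaming request to Ollama's /api/generate endpoint using only the Python standard library. The address http://localhost:11434 is Ollama's default port on a local machine and is an assumption here; a PodWarden deployment will likely expose a different host, so substitute your own.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default port on this machine.
# Replace with the address your deployment actually exposes.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["response"]


# Example call, once a model has been pulled:
# print(generate("llama3.2:3b", "Explain GPU passthrough in one sentence."))
```

Setting "stream" to False asks for one JSON object instead of a stream of partial chunks, which keeps the example short; a chat interface would normally stream.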

Pair it with Open WebUI, a feature-rich chat interface that connects to Ollama (or any OpenAI-compatible API). You get:

Chat history, personas, and document RAG
Model switching: swap between Llama 3, DeepSeek, Mistral, and Qwen
User management for team access
Mobile-friendly PWA interface

Both are available now in the AI / Machine Learning category.

Why self-host your LLM stack?

Privacy: your prompts and data never leave your cluster
No per-token API fees: inference runs on your own GPUs
Offline-capable: no internet dependency once models are downloaded
Full control: swap models, fine-tune, and experiment without quotas

Get started

Open the PodWarden Hub catalog
Search for Ollama and deploy it; our template includes optional GPU support
Deploy Open WebUI and point it at your Ollama instance
Pull a model and start chatting
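
Before pointing Open WebUI at Ollama, it can help to confirm that the Ollama endpoint is reachable and to see which models it already holds. The sketch below queries Ollama's GET /api/tags endpoint, which lists locally available models. Open WebUI reads the Ollama address from its OLLAMA_BASE_URL environment variable; how the PodWarden template surfaces that setting is not covered here, and the hostname used in the example call (http://ollama:11434) is a hypothetical in-cluster address.

```python
import json
import urllib.request


def tags_url(base_url: str) -> str:
    """Return the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def model_names(tags_response: dict) -> list[str]:
    """Extract the model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_models(base_url: str) -> list[str]:
    """Ask a running Ollama server which models it has available locally."""
    with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
        return model_names(json.loads(resp.read()))


# Example call; the hostname is hypothetical, use your own service address:
# print(list_models("http://ollama:11434"))
```

If this call succeeds from inside your cluster, the same base URL should be a reasonable value to give Open WebUI. An empty list simply means no model has been pulled yet.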

We recommend starting with llama3.2:3b on CPU-only hardware, or deepseek-r1:7b if you have a GPU available.
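
The recommendation above can be sketched as a small helper that picks a model tag and builds a request to Ollama's /api/pull endpoint. This is a minimal sketch, not part of the PodWarden template: the has_gpu flag is something you supply yourself, the default address is an assumption, and older Ollama releases expected the field "name" rather than "model" in the pull payload.

```python
import json
import urllib.request

CPU_MODEL = "llama3.2:3b"     # smaller model, workable without a GPU
GPU_MODEL = "deepseek-r1:7b"  # larger model, suggested when a GPU is present


def recommended_model(has_gpu: bool) -> str:
    """Return the starter model tag suggested for the given hardware."""
    return GPU_MODEL if has_gpu else CPU_MODEL


def build_pull_request(model: str, base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST request for /api/pull that downloads the given model."""
    # "stream": False requests a single final status object instead of
    # a stream of progress updates.
    payload = {"model": model, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (requires a running server; large downloads can take a while):
# with urllib.request.urlopen(build_pull_request(recommended_model(has_gpu=False))) as resp:
#     print(json.loads(resp.read()))
```

From a shell with the Ollama CLI available, `ollama pull llama3.2:3b` does the same job; Open WebUI also lets you pull models from its own interface.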