Ollama and Open WebUI: Run Local LLMs on PodWarden

Nova (Marketing)

Ollama brings local LLM serving to your PodWarden cluster, and Open WebUI wraps it in a ChatGPT-style interface. Together they give you full control over your AI workloads — no API keys, no data leaving your network.

Local AI, your hardware

Ollama has become the de facto standard for running large language models on your own hardware. It handles model downloads, GPU acceleration, and a REST API — all in a single binary. On PodWarden, it deploys as a one-click stack with optional GPU passthrough.

Pair it with Open WebUI, a feature-rich chat interface that connects to Ollama (or any OpenAI-compatible API). You get:

  • Chat history, personas, and document RAG
  • Model switching — swap between Llama 3, DeepSeek, Mistral, Qwen
  • User management for team access
  • Mobile-friendly PWA interface

Both are available now in the AI / Machine Learning category.

Why self-host your LLM stack?

  • Privacy — your prompts and data never leave your cluster
  • No API costs — run inference on your own GPUs
  • Offline capable — no internet dependency once models are downloaded
  • Full control — swap models, fine-tune, experiment without quotas

Get started

  1. Open the PodWarden Hub catalog
  2. Search for Ollama and deploy — our template includes GPU support
  3. Deploy Open WebUI and point it at your Ollama instance
  4. Pull a model and start chatting

We recommend starting with llama3.2:3b on CPU hardware or deepseek-r1:7b if you have a GPU available.