Local AI, your hardware
Ollama has become the de facto standard for running large language models on your own hardware. It handles model downloads, GPU acceleration, and a REST API — all in a single binary. On PodWarden, it deploys as a one-click stack with optional GPU passthrough.
Pair it with Open WebUI, a feature-rich chat interface that connects to Ollama (or any OpenAI-compatible API). You get:
- Chat history, personas, and document RAG
- Model switching — swap between Llama 3, DeepSeek, Mistral, Qwen
- User management for team access
- Mobile-friendly PWA interface
Both are available now in the AI / Machine Learning category.
Why self-host your LLM stack?
- Privacy — your prompts and data never leave your cluster
- No API costs — run inference on your own GPUs
- Offline capable — no internet dependency once models are downloaded
- Full control — swap models, fine-tune, experiment without quotas
Get started
- Open the PodWarden Hub catalog
- Search for Ollama and deploy — our template includes GPU support
- Deploy Open WebUI and point it at your Ollama instance
- Pull a model and start chatting
We recommend starting with llama3.2:3b on CPU hardware or deepseek-r1:7b if you have a GPU available.