PodWarden User Manual

Provisioning

Server provisioning jobs with status tracking and real-time logs

PodWarden provisioning page
Provisioning jobs with real-time status tracking and Ansible log viewer

What you see

URL: /provisioning

The provisioning page lists all server provisioning jobs. Each job represents an Ansible playbook run that installs K3s, configures networking, and sets up a host to join a cluster. Jobs are created when you click Provision on a host.

Fields / columns

| Column | Description |
| --- | --- |
| Host | The target host being provisioned |
| Job type | The type of provisioning operation (e.g. install-k3s, join-cluster, wipe) |
| Status | Current job state (see status badges below) |
| Started | Timestamp when the job began |
| Duration | Elapsed time since the job started, or total duration if complete |
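The Duration column is derived from the job's start time (and end time, once the job finishes). A minimal sketch of such a formatter -- the function name and output format are illustrative assumptions, not PodWarden's actual code:

```python
def format_duration(seconds: float) -> str:
    """Render an elapsed or total job time as h/m/s, like the Duration column."""
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    if h:
        return f"{h}h {m}m {s}s"
    if m:
        return f"{m}m {s}s"
    return f"{s}s"
```

For running jobs the input would be "now minus Started"; for finished jobs, "Ended minus Started".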

Available actions

| Action | Where | What it does |
| --- | --- | --- |
| Cancel | Job row | Stops a running provisioning job. The host may be left in a partially configured state |
| View logs | Job row | Opens the real-time log viewer showing Ansible playbook output line by line |

Status badges

| Badge | Meaning |
| --- | --- |
| queued | Job is waiting to start |
| running | Ansible playbook is currently executing |
| completed | Provisioning finished successfully |
| failed | Provisioning encountered an error (check logs for details) |
| cancelled | Job was manually cancelled before completion |
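The badges above imply a simple job state machine. The sketch below models one plausible set of transitions consistent with the badge descriptions; the names and rules are assumptions, not PodWarden internals:

```python
from enum import Enum

class JobStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

# Terminal states (completed/failed/cancelled) allow no further transitions.
TRANSITIONS = {
    JobStatus.QUEUED: {JobStatus.RUNNING, JobStatus.CANCELLED},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
    JobStatus.CANCELLED: set(),
}

def can_transition(src: JobStatus, dst: JobStatus) -> bool:
    """True if a job in state `src` may move to state `dst`."""
    return dst in TRANSITIONS[src]
```

Note that Cancel applies to queued and running jobs only; finished jobs keep their final badge.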

What provisioning does

When you provision a host, PodWarden runs an Ansible playbook that:

  1. Gathers hardware facts -- CPU, RAM, disk, GPU, network interfaces
  2. Installs base packages -- system dependencies and Docker
  3. Installs GPU drivers -- NVIDIA drivers and container toolkit (if GPU detected)
  4. Configures networking -- selects the right connection path and flannel interface
  5. Installs K3s agent -- joins the host to the target cluster
  6. Sets up NAT proxy -- if needed for mesh nodes connecting to LAN-based control planes
  7. Configures GPU runtime -- NVIDIA containerd runtime for K3s (if GPU detected)
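Conceptually, the steps above run in order and the job fails fast on the first error. A hedged sketch of such a runner -- the step names and the `log` callback are illustrative only, not PodWarden's actual orchestration code:

```python
def run_provisioning(steps, log=print):
    """Run named steps in order; stop and mark the job failed on the first error."""
    for name, step in steps:
        log(f"TASK [{name}]")
        try:
            step()
        except Exception as exc:
            log(f"TASK [{name}] failed: {exc}")
            return "failed"
    return "completed"
```

A real run would pass callables that execute the corresponding Ansible tasks, with GPU-related steps skipped when no GPU is detected.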

Networking decisions

PodWarden automatically determines how the new node connects to the cluster:

  • LAN nodes connect via the control plane LAN IP
  • Mesh-only nodes (Tailscale only, no LAN) connect via the control plane Tailscale IP
  • Dual-network nodes (LAN + mesh) prefer the LAN path

The flannel overlay interface is selected to match the connection path -- tailscale0 for mesh connections, the LAN interface (e.g. eth0) for LAN connections.

For mesh nodes joining a cluster whose control plane advertises a LAN address, PodWarden sets up a NAT proxy so that kubectl logs and kubectl exec work correctly. See the Networking guide for details.

Log messages

Key log messages during provisioning:

| Log message | Meaning |
| --- | --- |
| Pre-warming Tailscale tunnel | Establishing mesh connection before joining |
| SSH pre-flight check failed | Host is unreachable via SSH; the job fails immediately instead of waiting for the Ansible timeout |
| Flannel interface: tailscale0 | Pod overlay uses the mesh tunnel |
| Flannel interface: eth0 | Pod overlay uses the LAN interface |
| NAT proxy setup | Redirecting agent tunnel through mesh (mixed network) |
| K3s agent is active | Node successfully joined the cluster |

Wipe

Wiping a host reverses provisioning -- it uninstalls K3s, removes the NAT proxy service if present, and resets the host to its discovered state. The host remains in PodWarden inventory but is no longer part of any cluster.

Log viewer

Clicking View logs opens a panel that streams Ansible output in real time. Logs include:

  • Task names and statuses (ok, changed, failed, skipped)
  • Command output from the remote host
  • Error messages and stack traces on failure
  • Play recap summary at the end

Logs are retained after the job completes for later review.
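The play recap at the end of the logs is easy to post-process if you want to summarize results programmatically. A sketch, assuming Ansible's standard `ok=N changed=N ...` recap format:

```python
import re

# Counters Ansible prints in a play-recap line.
RECAP = re.compile(r"(ok|changed|unreachable|failed|skipped|rescued|ignored)=(\d+)")

def parse_play_recap(line: str) -> dict[str, int]:
    """Extract task counters from an Ansible play-recap line."""
    return {key: int(value) for key, value in RECAP.findall(line)}
```

A recap with `failed=0` and `unreachable=0` corresponds to the job's completed badge.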

Troubleshooting

SSH pre-flight check

Before running Ansible, PodWarden tests SSH connectivity to the target host with a 10-second timeout. If the host is unreachable, the job fails immediately with a clear "Host unreachable via SSH" message instead of waiting for Ansible's longer timeout.
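The pre-flight check amounts to a bounded TCP connect to the SSH port. A minimal standalone equivalent (a sketch, not PodWarden's actual implementation):

```python
import socket

def ssh_preflight(host: str, port: int = 22, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to the SSH port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A refused or timed-out connection fails the check, and the job is marked failed before Ansible ever starts.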

This check runs before all provisioning operations: provision, wipe, and control plane bootstrap. If you see this error, verify:

  • The host is powered on and connected to the network
  • Tailscale is running on the host (for mesh connections)
  • The SSH key is present in root's authorized_keys on the target host

Provisioning fails with connection refused

The target host may not be reachable. For mesh nodes, the Tailscale tunnel may not be established. PodWarden attempts to pre-warm the tunnel, but if Tailscale is down on the host, provisioning will fail. Verify the host is online and Tailscale is running.

Node joins but kubectl logs returns 502

This means the K3s agent tunnel cannot reach the control plane advertised address. If the node is mesh-only and the control plane advertises a LAN IP, the NAT proxy should handle this automatically. If it was not set up (e.g. provisioned with an older version), wipe and re-provision the host.

K3s agent not starting

Check provisioning logs for errors during the K3s install step. Common causes include failed token fetch, TCP connectivity test failure, or conflicting state from a previous installation. Try wiping first, then re-provisioning.

Pods on mesh nodes can't resolve DNS

This means the flannel VXLAN overlay between the mesh node and LAN nodes isn't working. Pod-to-pod traffic (including DNS to CoreDNS) requires working VXLAN tunnels. If the node was provisioned before the VXLAN fix was available, wipe and re-provision the host. Check provisioning logs for "VXLAN mesh fix" messages. See the Networking guide for details.

Force reinstall

If a host has a partial K3s installation from a failed attempt, PodWarden can force a reinstall. It handles missing uninstall scripts gracefully, falling back to manual cleanup before installing fresh.
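The fallback logic is roughly: run K3s's own uninstall script if it exists, otherwise clean up by hand. A sketch; the script path and manual-cleanup targets below are typical K3s locations, assumed here rather than taken from PodWarden:

```python
import os

def cleanup_commands(root: str = "/") -> list[str]:
    """Pick cleanup commands before a forced reinstall, preferring the K3s uninstall script."""
    script = os.path.join(root, "usr/local/bin/k3s-agent-uninstall.sh")
    if os.path.exists(script):
        return [script]
    # Manual fallback when the uninstall script is missing (typical K3s paths).
    return [
        "systemctl stop k3s-agent",
        "rm -rf /etc/rancher/k3s /var/lib/rancher/k3s",
    ]
```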

Related docs

  • Hosts -- Servers that provisioning jobs target
  • Clusters -- Clusters that hosts join after provisioning
  • Networking -- Network types and the NAT proxy for mixed networks
  • Dashboard -- Provisioning status overview
  • Architecture -- How PodWarden uses Ansible for provisioning