PodWarden Cloud
PodWarden — Fleet operations as a product

DCGM Exporter

NVIDIA


NVIDIA DCGM Exporter exposes GPU telemetry data as Prometheus metrics, providing per-GPU visibility into utilization, memory usage, temperature, power draw, and encoder/decoder engine activity. Deploy as a DaemonSet to monitor every GPU node in your cluster.

Monitoring · AI / Machine Learning · Free · Approved · Audited · 12.5M · 4.7K · 1y ago · 1.2K deploys
#gpu-monitoring #nvidia #metrics #telemetry #gpu

About

DCGM Exporter is a Prometheus-compatible metrics exporter for NVIDIA GPUs, built on top of NVIDIA's Data Center GPU Manager (DCGM). It exposes over 40 GPU telemetry metrics — utilization, memory, temperature, power draw, ECC errors, and more — at an HTTP endpoint that Prometheus…
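To make the endpoint's output concrete, here is a minimal Python sketch that parses Prometheus text-format output and groups the values by GPU. The metric names `DCGM_FI_DEV_GPU_UTIL` and `DCGM_FI_DEV_GPU_TEMP` are DCGM field names; the sample text, the values, the `node-a` hostname, and the helper `parse_metrics` are illustrative assumptions, not output captured from a real deployment.

```python
import re
from collections import defaultdict

# Hypothetical sample of exporter output (values and hostname are made up).
SAMPLE = '''\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 12
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# TYPE DCGM_FI_DEV_GPU_TEMP gauge
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="node-a"} 71
DCGM_FI_DEV_GPU_TEMP{gpu="1",Hostname="node-a"} 44
'''

# Simplified patterns: they assume label values never contain "}".
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {gpu_label: {metric_name: value}} for samples with a gpu label."""
    per_gpu = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith('#'):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        gpu = labels.get('gpu')
        if gpu is None:
            continue
        per_gpu[gpu][match.group('name')] = float(match.group('value'))
    return dict(per_gpu)


if __name__ == '__main__':
    for gpu, metrics in sorted(parse_metrics(SAMPLE).items()):
        print(gpu, metrics)
```

Against a running exporter, the same function could be fed the body of an HTTP GET to the node on port 9400 (the port listed under Requirements), for example via `urllib.request.urlopen`. The `/metrics` path is assumed here and should be checked against the actual deployment.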

Deployment Options

1 stack

You might also like

Grafana (Monitoring)
Vector (Monitoring)
Prometheus (Monitoring)
Tegrastats Exporter (Monitoring)
Datadog Agent (Monitoring)
Grafana Loki (Monitoring)

Requirements

CPU: 100m
Memory: 128Mi
GPU: 1x
Port: 9400

Stacks

DCGM Exporter (Service)

Author

NVIDIA
