Monitoring

Host and service visibility with a small footprint.

Last reviewed: April 2026
Primary source: Prometheus
Public surface: status.shellr.net

Why this matters

Monitoring is not there to look impressive. It exists so someone reviewing or operating the platform can answer three questions quickly: Is the host healthy? Are the services reachable? Where should I look first?

Prometheus

Collects host and service metrics with intentionally short retention and size limits.
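
As a rough illustration of how short retention and size limits are usually expressed, the sketch below builds a Prometheus argument list. The two `--storage.tsdb.retention.*` flags are standard Prometheus options; the values `7d` and `2GB`, the config path, and the helper name `prometheus_args` are assumptions for illustration, not the platform's actual settings.

```python
def prometheus_args(retention_time: str = "7d", retention_size: str = "2GB") -> list[str]:
    """Build a Prometheus argument list with time and size retention caps.

    The default values and the config path are hypothetical placeholders.
    """
    return [
        "--config.file=/etc/prometheus/prometheus.yml",  # assumed path
        f"--storage.tsdb.retention.time={retention_time}",
        f"--storage.tsdb.retention.size={retention_size}",
    ]


# Print one flag per line, ready to paste into a service or container definition.
print("\n".join(prometheus_args()))
```

When both limits are set, Prometheus applies whichever one is reached first, which keeps disk use bounded on a small host.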

Grafana

Shared UI for metrics and logs, protected behind Nginx rather than exposed directly.

Node Exporter

Provides host-level metrics for CPU, memory, disk, and filesystem behavior.

Uptime Kuma

Runs availability checks and serves a dedicated status page on its own hostname.

cAdvisor

Supplies container-level CPU and memory metrics so Grafana can show workload behavior instead of host metrics alone.

Alertmanager

Receives Prometheus alerts and handles routing. Routing stays simple because notification receivers are kept intentionally minimal.

Responder Path

What gets checked first.

  • Open status.shellr.net to confirm whether the issue is public and service-wide.
  • Open Grafana for host saturation, container CPU and memory, and public response-time trends.
  • Use Prometheus targets to verify whether exporters and health endpoints are still up.
  • Use Loki for short-term logs when a metric spike needs immediate context.
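
The Prometheus targets step in the list above can also be scripted. This is a minimal sketch, assuming Prometheus is reachable at `http://localhost:9090` (the real address is not stated here). It uses the standard `/api/v1/targets` endpoint; the helper names `unhealthy_targets` and `fetch_targets` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: local Prometheus address; replace with the real one.
PROMETHEUS_URL = "http://localhost:9090"


def unhealthy_targets(payload: dict) -> list[dict]:
    """Return job, instance, health and last error for every active target not reported as 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        {
            "job": t.get("labels", {}).get("job", "?"),
            "instance": t.get("labels", {}).get("instance", "?"),
            "health": t.get("health", "unknown"),
            "last_error": t.get("lastError", ""),
        }
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Fetch the targets listing from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `unhealthy_targets(fetch_targets())` returns an empty list when every exporter and health endpoint is being scraped successfully. Any rows that do come back show which job to look at first.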

Alert Coverage

What the platform watches automatically.

  • Host availability through Node Exporter reachability.
  • High CPU usage, low available memory, and low free disk space on the VM.
  • Public-service availability from the Uptime Kuma metrics stream.
  • Recent container restart activity through cAdvisor metrics.
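
The checks listed above could be spot-checked by hand with instant queries like the ones below. This is a sketch, not the platform's actual alert rules. The thresholds, the `job="node"` label, the root mountpoint, and the 15-minute window are all assumptions. The metric names are the ones Node Exporter, cAdvisor, and Uptime Kuma commonly expose, and `monitor_status == 0` assumes Uptime Kuma's convention that 0 means down.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical expressions and thresholds; adjust to the real rule file.
CHECKS = {
    "host_down": 'up{job="node"} == 0',
    "high_cpu": '100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 85',
    "low_memory": "node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10",
    "low_disk": 'node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 15',
    "service_down": "monitor_status == 0",
    "container_restarted": 'changes(container_start_time_seconds{name!=""}[15m]) > 0',
}


def firing(payload: dict) -> list[tuple[dict, float]]:
    """Return (labels, value) for each series in an instant-vector query response."""
    data = payload.get("data", {})
    if data.get("resultType") != "vector":
        return []
    return [(s.get("metric", {}), float(s["value"][1])) for s in data.get("result", [])]


def run_check(expr: str, base_url: str = "http://localhost:9090", timeout: float = 5.0):
    """Run one PromQL expression through /api/v1/query (the base URL is an assumption)."""
    url = f"{base_url}/api/v1/query?{urlencode({'query': expr})}"
    with urlopen(url, timeout=timeout) as resp:
        return firing(json.load(resp))
```

Because each expression ends in a comparison, a query only returns series that currently breach the threshold. An empty result from `run_check(CHECKS["high_cpu"])` therefore means that condition is not met right now.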