Observability

Quidnug exposes rich operational telemetry at /metrics and health at /api/health. First-party dashboards and alert rules ship in-repo.

Hit GET /metrics for the full exposition. Key metric families:

| Metric | Type | Meaning |
|---|---|---|
| quidnug_tx_accepted_total{type} | counter | Accepted transactions by type. |
| quidnug_tx_rejected_total{type,code} | counter | Rejections by type and error code. |
| quidnug_block_tier_total{tier} | counter | Blocks classified per PoT tier. |
| quidnug_trust_query_depth | histogram | Depth of successful trust queries. |
| quidnug_trust_query_duration_seconds | histogram | End-to-end query latency. |
| quidnug_gossip_outbound_total{kind} | counter | Messages pushed per gossip kind. |
| quidnug_gossip_inbound_total{kind,src} | counter | Messages received by kind and source. |
| quidnug_guardian_recovery_inflight | gauge | Recoveries currently inside a time-lock window. |
| quidnug_epoch_rotations_total{quid} | counter | Rotations observed per subject. |
| quidnug_nonce_ledger_size | gauge | Size of the nonce ledger (per-signer highest). |
| quidnug_http_request_duration_seconds | histogram | API latency per handler. |
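
For ad-hoc inspection outside Prometheus, the exposition can be read from a script. The following is a minimal sketch, assuming /metrics serves the standard Prometheus text format; the helper names (parse_samples, rejections_by_code, fetch_metrics) and the label values in SAMPLE are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict
from urllib.request import urlopen

# Matches sample lines such as: name{label="value",...} 12.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line of a text exposition."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def rejections_by_code(text):
    """Sum quidnug_tx_rejected_total across transaction types, keyed by code."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "quidnug_tx_rejected_total":
            totals[labels.get("code", "")] += value
    return dict(totals)


def fetch_metrics(base_url):
    """Fetch the raw exposition from a node (base_url is an assumption)."""
    with urlopen(base_url.rstrip("/") + "/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")


# Hypothetical exposition excerpt; the type and code label values are made up.
SAMPLE = """\
# TYPE quidnug_tx_rejected_total counter
quidnug_tx_rejected_total{type="trust",code="NONCE_REPLAY"} 3
quidnug_tx_rejected_total{type="identity",code="NONCE_REPLAY"} 2
quidnug_tx_rejected_total{type="trust",code="BAD_SIGNATURE"} 1
quidnug_tx_accepted_total{type="trust"} 40
"""

if __name__ == "__main__":
    print(rejections_by_code(SAMPLE))
```

Against a live node, rejections_by_code(fetch_metrics(base_url)) would give the same breakdown, where base_url is whatever address your node listens on.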

Import deploy/observability/grafana-dashboard.json as-is. It covers the full quidnug_* family with rows for:

  • Traffic: accepted vs. rejected, per type.
  • Trust queries: depth, latency, cache hit rate.
  • Consensus: tier distribution, orphaned blocks.
  • Gossip: outbound push, inbound receive, peer health.
  • Guardians: active quorums, recoveries in flight, vetoes.
  • Resources: Go runtime, HTTP p50/p95/p99 by handler.

deploy/observability/prometheus-alerts.yml ships production-ready rules:

  • TxRejectRate: rejection rate > X% over 5 min.
  • TrustQueryP99Slow: p99 query latency > N seconds.
  • PeerUnhealthy: fewer than K healthy peers.
  • GuardianRecoveryStuck: recovery held in time-lock for longer than window × 1.5.
  • NodeStarved: rate limiter rejecting a non-trivial fraction of legitimate traffic.
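
To make the arithmetic behind the first two rules concrete, here is a rough sketch of how a rejection ratio and a p99 can be derived from raw counter and histogram samples. It is not the shipped rule logic: the function names and sample numbers are hypothetical, and the quantile estimate uses linear interpolation inside a bucket, which approximates what PromQL's histogram_quantile does.

```python
import math


def reject_ratio(prev, curr):
    """Fraction of transactions rejected between two scrapes.

    prev and curr are dicts holding summed 'accepted' and 'rejected'
    counter totals. Returns None if a counter went backwards (for
    example after a restart), since that window cannot be trusted.
    """
    d_accepted = curr["accepted"] - prev["accepted"]
    d_rejected = curr["rejected"] - prev["rejected"]
    if d_accepted < 0 or d_rejected < 0:
        return None
    total = d_accepted + d_rejected
    return d_rejected / total if total else 0.0


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) sorted by
    upper bound, ending with (math.inf, total). The lower edge of the
    first bucket is assumed to be 0.
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if bound == math.inf or in_bucket == 0:
                # Cannot interpolate; fall back to the last finite bound.
                return prev_bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical samples taken five minutes apart.
    print(reject_ratio({"accepted": 100, "rejected": 5},
                       {"accepted": 190, "rejected": 15}))
    # Hypothetical latency buckets in seconds.
    print(histogram_quantile(
        0.99, [(0.1, 90), (0.5, 98), (1.0, 100), (math.inf, 100)]))
```

Comparing these values against your chosen X and N is a quick way to sanity-check a threshold before editing the rules file.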

Tune thresholds to your deployment; they ship with reasonable defaults for a three-node consortium.

  • GET /api/health: liveness and readiness. Returns details about the trust engine, nonce ledger, and peer connectivity.
  • GET /api/info: version, feature flags, supported domains.
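
A simple external probe for the health endpoint might look like the sketch below. The response schema is not specified here, so the code only assumes that a healthy node answers HTTP 200 with a JSON body; that assumption, the helper names (http_get, check_health), and the base URL are all hypothetical.

```python
import json
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_get(url, timeout=5.0):
    """Return (status_code, body_text); status 0 means the node was unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except HTTPError as err:
        # Non-2xx responses still carry a body that may hold details.
        return err.code, err.read().decode("utf-8", "replace")
    except (URLError, OSError):
        return 0, ""


def check_health(base_url, get=http_get):
    """Probe /api/health and report whether the node looks healthy.

    ASSUMPTION: HTTP 200 means healthy. The body is parsed as JSON when
    possible and passed through untouched as 'details'.
    """
    status, body = get(base_url.rstrip("/") + "/api/health")
    try:
        details = json.loads(body) if body else None
    except json.JSONDecodeError:
        details = None
    return {"ok": status == 200, "status": status, "details": details}


if __name__ == "__main__":
    # Hypothetical address; substitute your node's listen address.
    print(check_health("http://localhost:8080"))
```

The get parameter exists so the probe can be exercised with a stub instead of a live node.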

Logs are JSON when LOG_LEVEL is not pretty. Ship them to any log aggregator; the event and quidnug_tx_* fields are stable across versions.
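
Because each JSON log line carries an event field, a quick tally per event is easy to script before the logs reach an aggregator. This is a minimal sketch assuming one JSON object per line; the event values and the other fields in SAMPLE_LOG are made up for illustration.

```python
import json
from collections import Counter


def count_events(lines):
    """Count log records per value of the 'event' field.

    Lines that are blank, not valid JSON, or missing 'event' are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output
        if isinstance(record, dict) and "event" in record:
            counts[str(record["event"])] += 1
    return counts


# Hypothetical log excerpt; only the 'event' key is taken from the docs.
SAMPLE_LOG = [
    '{"event": "tx_rejected", "level": "warn"}',
    '{"event": "tx_accepted", "level": "info"}',
    '{"event": "tx_rejected", "level": "warn"}',
    "plain text line that is not JSON",
    "",
]

if __name__ == "__main__":
    for event, n in count_events(SAMPLE_LOG).most_common():
        print(event, n)
```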

Start at the metrics dashboard. The rejection counters broken down by code are almost always where you’ll find the story; see error codes and FAQ.