Troubleshooting

Monitor reports down but the site is up for me

  1. Check which regions failed. Open the monitor: if only one region failed, it's most likely a network blip rather than a problem with your service. Configure "alert when N regions agree" if you don't want single-region pages.
  2. Look at the response. The check detail shows the status code, headers, and a body excerpt. Often it's a Cloudflare/CDN error page or a connection reset from a stuck pod.
  3. Run an ad-hoc check. happy monitors check <name> runs immediately from all 6 regions. Compare against curl from your laptop to spot geo-fencing.
  4. Try happy speed-test <url> for a one-off comparison without creating a monitor.
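
To illustrate the "alert when N regions agree" idea from step 1, here is a minimal sketch of a quorum check. The function name, the region names, and the shape of the results are assumptions made for illustration, not the product's actual implementation.

```python
from typing import Mapping


def should_alert(region_results: Mapping[str, bool], min_failed_regions: int) -> bool:
    """Return True when at least `min_failed_regions` regions report a failure.

    `region_results` maps a region name to True (check passed) or False (check failed).
    """
    failed = [region for region, ok in region_results.items() if not ok]
    return len(failed) >= min_failed_regions


# Hypothetical results from six regions; only one region failed.
results = {
    "us-east": True,
    "us-west": True,
    "eu-west": False,
    "eu-central": True,
    "ap-south": True,
    "ap-northeast": True,
}

print(should_alert(results, min_failed_regions=1))  # single-region paging
print(should_alert(results, min_failed_regions=2))  # requires two regions to agree
```

With a quorum of 2, the single failing region above would not page anyone, which is the behavior you want when one region has a transient network problem.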

Alerts aren't being delivered

  1. Channel paused? Alerts → Channels — channels with 50+ consecutive failures auto-pause. Resume after fixing.
  2. Quiet hours? Check the rule — quiet hours suppress non-critical alerts during the configured window.
  3. Rule scope? Rules can be scoped to specific monitors. A new monitor without a rule won't alert. The "Default" rule covers all monitors unless you've narrowed it.
  4. Webhook 4xx/5xx? Alerts → Log → click delivery → response body — most webhook failures are auth or signature mismatches.
  5. Slack missing_scope? Reinstall — usually means a new scope was added since you first installed the app. See Slack integration.
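
The first three checks above can be thought of as a chain of suppression conditions. The sketch below models that chain under stated assumptions: the `Rule` fields, the function names, and the treatment of quiet hours that wrap past midnight are all hypothetical and only mirror the behavior described in the list.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Set


@dataclass
class Rule:
    # None means the rule covers every monitor (like an un-narrowed "Default" rule).
    monitor_ids: Optional[Set[str]] = None
    quiet_start: Optional[time] = None
    quiet_end: Optional[time] = None


def in_quiet_hours(rule: Rule, now: time) -> bool:
    """True when `now` falls inside the rule's quiet window."""
    if rule.quiet_start is None or rule.quiet_end is None:
        return False
    if rule.quiet_start <= rule.quiet_end:
        return rule.quiet_start <= now < rule.quiet_end
    # Window wraps past midnight, e.g. 22:00 to 07:00.
    return now >= rule.quiet_start or now < rule.quiet_end


def suppression_reason(rule: Rule, channel_paused: bool, monitor_id: str,
                       critical: bool, now: time) -> Optional[str]:
    """Return why an alert would not be delivered, or None if it would be."""
    if channel_paused:
        return "channel paused"
    if not critical and in_quiet_hours(rule, now):
        return "quiet hours"
    if rule.monitor_ids is not None and monitor_id not in rule.monitor_ids:
        return "monitor outside rule scope"
    return None


rule = Rule(monitor_ids={"api"}, quiet_start=time(22, 0), quiet_end=time(7, 0))
print(suppression_reason(rule, False, "api", False, time(23, 30)))
print(suppression_reason(rule, False, "web", True, time(12, 0)))
```

Walking the same order by hand in the dashboard (channel state, then quiet hours, then rule scope) usually finds the cause before you need to open the delivery log.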

Slack channel picker is empty

The bot needs to be a member of channels to list them (private channels are always invisible). Either:

  • Re-run OAuth and pick the channel during install (auto-invites the bot), or
  • /invite @happyuptime in the channel from Slack.

On-call is paging the wrong person

Most likely an override or a fixed shift is in effect. The resolver order is:

```text
Override > Fixed Shift > Rotation (gated by Restriction)
```

Check On-Call → [schedule] → Timeline for the relevant time window — it shows the resolved person and what rule won.
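
As a rough illustration of the resolver order, the sketch below picks the first layer that yields a person. It assumes "gated by Restriction" means the rotation only applies while the restriction allows it; the function and parameter names are hypothetical.

```python
from typing import Optional, Tuple


def resolve_on_call(override: Optional[str],
                    fixed_shift: Optional[str],
                    rotation: Optional[str],
                    restriction_allows: bool) -> Tuple[Optional[str], str]:
    """Return (person, winning_rule) following Override > Fixed Shift > Rotation."""
    if override is not None:
        return override, "override"
    if fixed_shift is not None:
        return fixed_shift, "fixed_shift"
    # Assumption: the rotation only counts while the restriction window permits it.
    if rotation is not None and restriction_allows:
        return rotation, "rotation"
    return None, "none"


# An active override wins even though the rotation says someone else.
print(resolve_on_call("dana", None, "lee", restriction_allows=True))
# No override or fixed shift: the rotation wins if the restriction allows it.
print(resolve_on_call(None, None, "lee", restriction_allows=True))
```

If the person paged matches the first call above rather than the second, look for a forgotten override on the schedule.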

If it's a timezone problem: layers store TZ explicitly. A schedule in America/New_York with handoff at 9 AM hands off at 9 AM ET regardless of the responder's local time.
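
To see what a 9 AM America/New_York handoff looks like for responders elsewhere, you can convert it with Python's standard `zoneinfo` module. The helper name, the date, and the responder timezones below are arbitrary examples.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def handoff_in_local_time(year: int, month: int, day: int, responder_tz: str) -> datetime:
    """Convert a 9 AM America/New_York handoff to the responder's local time."""
    handoff = datetime(year, month, day, 9, 0, tzinfo=ZoneInfo("America/New_York"))
    return handoff.astimezone(ZoneInfo(responder_tz))


# The handoff instant is fixed by the schedule's timezone, not the responder's.
print(handoff_in_local_time(2024, 1, 15, "Europe/Berlin"))
print(handoff_in_local_time(2024, 1, 15, "Asia/Tokyo"))
```

On that date the handoff falls at 15:00 in Berlin and 23:00 in Tokyo, so a responder in another timezone who expects a 9 AM local handoff will see the page go to someone else.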

Config push: "no diff"

The local YAML matches remote state, so there is nothing to push. To force a re-apply (rarely needed), use --force. To see what would be pushed, run happy config push --dry-run.

Config push: "would delete N resources"

happy config push deletes resources in remote state that are missing from the local file. This is the point of config-as-code, but it's destructive. Always run with --dry-run first. To prevent deletion of a specific resource, add a # managed: external comment above it in remote state and it will be skipped.
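
The deletion behavior is easier to reason about as a set difference. The sketch below computes a dry-run style plan from two dictionaries keyed by resource name; the data shapes, the `plan_push` name, and the `external` set (standing in for resources marked managed: external) are assumptions, not the CLI's internals.

```python
from typing import Dict, List, Set


def plan_push(local: Dict[str, dict], remote: Dict[str, dict],
              external: Set[str] = frozenset()) -> Dict[str, List[str]]:
    """Compute which resources a push would create, update, or delete."""
    create = sorted(name for name in local if name not in remote)
    update = sorted(name for name in local
                    if name in remote and local[name] != remote[name])
    # Anything remote-only is deleted unless it is marked as externally managed.
    delete = sorted(name for name in remote
                    if name not in local and name not in external)
    return {"create": create, "update": update, "delete": delete}


local = {
    "api": {"url": "https://example.com/health"},
    "web": {"url": "https://example.com"},
}
remote = {
    "api": {"url": "https://example.com/health"},
    "web": {"url": "https://example.com/old"},
    "legacy": {"url": "https://legacy.example.com"},
    "partner": {"url": "https://partner.example.com"},
}

print(plan_push(local, remote, external={"partner"}))
```

Here "legacy" would be deleted because it exists only remotely, while "partner" survives because it is treated as externally managed.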

Visual regression alerts on every check

Your baseline is most likely stale. Use Monitor → Screenshots → Replace baseline to update it. You can also adjust the threshold (default 0.05 = 5% pixel diff) under the monitor's screenshot settings.
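
To make the threshold concrete, here is a minimal sketch of a pixel-diff ratio check. It assumes pixels are compared for exact equality and that an alert fires when the ratio is strictly greater than the threshold; the real comparison may differ on both points.

```python
from typing import List, Sequence


def pixel_diff_ratio(baseline: Sequence[Sequence[int]],
                     current: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels that differ between two equally sized images."""
    total = 0
    changed = 0
    for row_a, row_b in zip(baseline, current):
        for a, b in zip(row_a, row_b):
            total += 1
            if a != b:
                changed += 1
    return changed / total if total else 0.0


def is_regression(baseline, current, threshold: float = 0.05) -> bool:
    """True when the changed fraction exceeds the threshold (assumed strict)."""
    return pixel_diff_ratio(baseline, current) > threshold


# A 10x10 "image" where 4 of 100 pixels changed: 4% is under the 5% default.
baseline: List[List[int]] = [[0] * 10 for _ in range(10)]
current = [row[:] for row in baseline]
for x in range(4):
    current[0][x] = 255

print(pixel_diff_ratio(baseline, current))
print(is_regression(baseline, current))
```

A page with a rotating banner or timestamp can change more than 5% of pixels on every check, which is why replacing the baseline alone sometimes isn't enough.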

Status page shows "no data"

Components need to be linked to monitors. Go to Status pages → [page] → Components → click component → Linked monitor. The component status is computed from the monitor, so if the monitor has never been checked, the component shows "unknown."

My CLI command says "401 Unauthorized"

```bash
happy whoami
```

Confirms the token in use. Common causes:

  • Token from HAPPYUPTIME_API_KEY env var overriding ~/.happyuptime/config.json
  • Token revoked from Dashboard → Settings → API Keys
  • Token's project no longer exists (re-run happy login)
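
The first cause is a precedence issue: the environment variable wins over the config file. The sketch below shows that lookup order so you can check which source is supplying the token. The `token` key inside config.json and the function name are assumptions; inspect your own file for the actual field.

```python
import json
import os
from pathlib import Path

DEFAULT_CONFIG = Path.home() / ".happyuptime" / "config.json"


def resolve_token(env=None, config_path=DEFAULT_CONFIG):
    """Return (token, source), preferring the env var over the config file."""
    env = os.environ if env is None else env
    env_token = env.get("HAPPYUPTIME_API_KEY")
    if env_token:
        return env_token, "env"
    try:
        config = json.loads(Path(config_path).read_text())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config file.
        return None, "none"
    # Assumption: the token is stored under a top-level "token" key.
    token = config.get("token") if isinstance(config, dict) else None
    return (token, "config") if token else (None, "none")


# Report only the source, never the token itself.
_, source = resolve_token()
print(source)
```

If the source is "env" and you expected the config file, unset HAPPYUPTIME_API_KEY in your shell and retry the command.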

Still stuck?

Email support with the affected monitor/incident/schedule ID and a rough time window. We can pull check history and alert logs from our side.
