Troubleshooting
Diagnose monitor false positives, alert silence, Slack issues, and config errors.
Monitor reports down but the site is up for me
- Check which regions failed. Open the monitor — if only one region failed, it's a network blip, not your service. Configure "alert when N regions agree" if you don't want single-region pages.
- Look at the response. The check detail shows status code, headers, and body excerpt. Often it's a Cloudflare/CDN error page or a `Connection reset` from a stuck pod.
- Run an ad-hoc check. `happy monitors check <name>` runs immediately from all 6 regions. Compare against `curl` from your laptop to spot geo-fencing.
- Try `happy speed-test <url>` for a one-off comparison without creating a monitor.
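If the body excerpt looks like an error page, a quick grep tells you whether a CDN or proxy answered instead of your origin. A minimal sketch using a simulated body; the signature list is illustrative, not exhaustive:

```shell
# Simulate a saved body excerpt (in practice, copy it from the check detail view).
printf '<html><title>502 Bad Gateway</title>cloudflare</html>' > body.txt

# Heuristic: CDN/proxy error pages name the proxy, not your app.
if grep -qiE 'cloudflare|cloudfront|nginx|varnish' body.txt; then
  echo "answered by CDN/proxy, not your origin"
fi
```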
Alerts aren't being delivered
- Channel paused? Alerts → Channels — channels with 50+ consecutive failures auto-pause. Resume after fixing.
- Quiet hours? Check the rule — quiet hours suppress non-critical alerts during the configured window.
- Rule scope? Rules can be scoped to specific monitors. A new monitor without a rule won't alert. The "Default" rule covers all monitors unless you've narrowed it.
- Webhook 4xx/5xx? Alerts → Log → click delivery → response body — most webhook failures are auth or signature mismatches.
- Slack `missing_scope`? Reinstall — usually means a new scope was added since you first installed the app. See Slack integration.
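For signature mismatches, recomputing the signature locally and comparing it against the delivery log often pinpoints the problem. A hedged sketch: the HMAC-SHA256-over-raw-body scheme, the sample payload, and the `whsec_` secret format are assumptions here, so match them to how your endpoint actually verifies:

```shell
# Hypothetical webhook payload and signing secret.
body='{"alert":"down","monitor":"prod-api"}'
secret='whsec_example'

# HMAC-SHA256 over the raw body; compare this hex digest to the
# signature header shown in the delivery log entry.
sig=$(printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" -r | cut -d' ' -f1)
echo "$sig"
```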
Slack channel picker is empty
The bot needs to be a member of channels to list them (private channels are always invisible). Either:
- Re-run OAuth and pick the channel during install (auto-invites the bot), or
- `/invite @happyuptime` in the channel from Slack.
On-call is paging the wrong person
Most likely an override or a fixed shift is in effect. The resolver order is:
```text
Override > Fixed Shift > Rotation (gated by Restriction)
```
Check On-Call → [schedule] → Timeline for the relevant time window — it shows the resolved person and what rule won.
If it's a timezone problem: layers store TZ explicitly. A schedule in America/New_York with handoff at 9 AM hands off at 9 AM ET regardless of the responder's local time.
Config push: "no diff"
The local YAML matches remote state. To force a re-apply (rarely needed), use `--force`. To see what would push, run `happy config push --dry-run`.
Config push: "would delete N resources"
`happy config push` deletes resources in remote state that are missing from the local file. This is the point of config-as-code, but it's destructive. Always `--dry-run` first. To prevent deletion of a specific resource, add a `# managed: external` comment above it in remote state and it'll be skipped.
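Conceptually, the delete set is whatever exists in remote state but not in your local file. A toy illustration with `comm` (not the real diff implementation; the monitor names are made up):

```shell
# Pretend remote state has three monitors and the local file only two.
printf 'monitor: api\nmonitor: docs\nmonitor: web\n' > remote.yaml
printf 'monitor: api\nmonitor: docs\n' > local.yaml

# Lines present remotely but missing locally: the "would delete" set.
comm -23 <(sort remote.yaml) <(sort local.yaml)   # monitor: web
```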
Visual regression alerts on every check
Your baseline is stale. Monitor → Screenshots → Replace baseline to update it. Tweak the threshold (default 0.05 = 5% pixel diff) under the monitor's screenshot settings.
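To reason about the threshold, here is the comparison spelled out, assuming the ratio is changed pixels over total pixels (the exact metric may differ; the pixel counts are hypothetical):

```shell
# 12,000 changed pixels on a 1920x1080 (2,073,600 px) screenshot: ratio ~0.0058,
# well under the 0.05 default, so no alert fires.
awk -v changed=12000 -v total=2073600 'BEGIN {
  r = changed / total
  if (r > 0.05) print "alert"; else print "within threshold"
}'
```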
Status page shows "no data"
Components need to be linked to monitors. Status pages → [page] → Components → click component → Linked monitor. The component status is computed from the monitor — if the monitor's never been checked, the component shows "unknown."
My CLI command says "401 Unauthorized"
```bash
happy whoami
```
Confirms the token in use. Common causes:
- Token from `HAPPYUPTIME_API_KEY` env var overriding `~/.happyuptime/config.json`
- Token revoked from Dashboard → Settings → API Keys
- Token's project no longer exists (re-run `happy login`)
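A quick way to rule out the env-var override before digging further (the precedence shown matches the first bullet above):

```shell
# If the env var is set, it wins over the config file.
if [ -n "${HAPPYUPTIME_API_KEY:-}" ]; then
  echo "HAPPYUPTIME_API_KEY is set; it overrides ~/.happyuptime/config.json"
else
  echo "no env var; the token comes from ~/.happyuptime/config.json"
fi
```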
Still stuck?
Email support with the affected monitor/incident/schedule ID and a rough time window. We can pull check history and alert logs from our side.