Real-Time Monitoring

Availability monitoring without blind spots — checks every 30 seconds from 14 global locations.

How It Works

30-Second Checks That Catch Problems Before Your Users Do

UptimeGuard probes your endpoints every 30 seconds from 14 monitoring stations across North America, South America, Europe, Asia, and Oceania. Each probe validates HTTP status codes, TLS certificate expiry, DNS resolution, and response times, then aggregates the results into a single live dashboard.

HTTP & HTTPS Probes

Full request/response cycle validation. We check status codes (expecting 200–299), verify TLS handshake completion, inspect certificate chains, and measure server response time to the first byte. If your API at api.example.com/v2/health returns a 502, UptimeGuard flags it within 30 seconds and fires your configured alert.
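As a rough illustration of the pass/fail rule described here (a sketch, not UptimeGuard's actual implementation), a probe can be judged on two of the checks named above: the TLS handshake completed, and the status code falls in 200–299. The `ProbeResult` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical container for the outcome of one HTTP(S) probe."""
    status_code: int
    tls_handshake_ok: bool
    ttfb_ms: float  # time to first byte, in milliseconds


def probe_passed(result: ProbeResult) -> bool:
    """Pass only if TLS completed and the status code is in 200-299."""
    return result.tls_handshake_ok and 200 <= result.status_code <= 299


healthy = ProbeResult(status_code=200, tls_handshake_ok=True, ttfb_ms=112.0)
bad_gateway = ProbeResult(status_code=502, tls_handshake_ok=True, ttfb_ms=90.0)
```

With these sample values, `probe_passed(healthy)` is true and `probe_passed(bad_gateway)` is false, which is the case that would fire an alert.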

DNS Resolution Monitoring

Every probe resolves your domain through public DNS resolvers (Cloudflare 1.1.1.1, Google 8.8.8.8, and Level3 4.2.2.1). Mismatches between resolvers trigger a DNS inconsistency alert — catching split-horizon issues before they cascade into outages.
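One way to picture the resolver comparison is to group resolvers by the set of records each returned and flag the domain when more than one group appears. This is a sketch under assumptions: the function names are hypothetical, the answers are hard-coded rather than fetched over the network, and the IP addresses come from the reserved documentation ranges.

```python
from typing import Dict, FrozenSet, List


def group_resolvers_by_answer(
    answers: Dict[str, FrozenSet[str]]
) -> Dict[FrozenSet[str], List[str]]:
    """Map each distinct record set to the resolvers that returned it."""
    groups: Dict[FrozenSet[str], List[str]] = {}
    for resolver, records in answers.items():
        groups.setdefault(records, []).append(resolver)
    return groups


def is_inconsistent(answers: Dict[str, FrozenSet[str]]) -> bool:
    """True when the resolvers do not all agree on the same record set."""
    return len(group_resolvers_by_answer(answers)) > 1


# Illustrative answers only; a real probe would query each resolver.
answers = {
    "1.1.1.1": frozenset({"203.0.113.10"}),
    "8.8.8.8": frozenset({"203.0.113.10"}),
    "4.2.2.1": frozenset({"198.51.100.7"}),
}
groups = group_resolvers_by_answer(answers)
```

Grouping rather than returning a bare yes/no keeps track of which resolver disagreed, which is the detail an on-call engineer would want in the alert.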

Response-Time Thresholds

Set custom thresholds per endpoint. If cdn.example.com normally responds in 45ms but spikes to 2,100ms for three consecutive checks, UptimeGuard raises a performance degradation warning even though the HTTP status is still 200.
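The "three consecutive checks" rule can be sketched with a fixed-size window of recent results. This is an illustration, not the product's code: the `DegradationDetector` class is hypothetical, and the 500 ms threshold is an assumed value chosen for the example.

```python
from collections import deque


class DegradationDetector:
    """Warn when the last `consecutive` checks all exceeded the threshold."""

    def __init__(self, threshold_ms: float, consecutive: int = 3) -> None:
        self.threshold_ms = threshold_ms
        # Keeps only the most recent `consecutive` breach flags.
        self.recent = deque(maxlen=consecutive)

    def observe(self, response_ms: float) -> bool:
        """Record one check and return True if a warning should be raised."""
        self.recent.append(response_ms > self.threshold_ms)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


detector = DegradationDetector(threshold_ms=500)  # assumed threshold
warnings = [detector.observe(ms) for ms in [45, 2100, 2100, 2100, 45]]
```

Requiring consecutive breaches means a single slow response does not raise a warning, and the window resets as soon as one check comes back under the threshold.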

Multi-Location Redundancy

With stations in New York, Frankfurt, Tokyo, Sydney, São Paulo, Mumbai, London, Singapore, Seattle, Dublin, Seoul, Toronto, Melbourne, and Dallas, UptimeGuard distinguishes between a global outage and a regional routing issue. You see exactly which locations are affected and which remain healthy.

Each check generates a structured event: timestamp, source IP, resolver used, DNS resolution time, TCP handshake duration, TLS negotiation time, server processing time, total round-trip time, HTTP status, and response payload size. These events feed the live dashboard and the incident timeline — every data point indexed and searchable for the full retention period (up to 24 months on Enterprise plans).
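To make the event shape concrete, the fields listed above could be modeled roughly as follows. The field names, types, and sample values are all illustrative assumptions; the actual event schema is not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CheckEvent:
    """Hypothetical shape of one check event, one field per listed item."""
    timestamp: str        # ISO 8601, UTC
    source_ip: str
    resolver: str
    dns_ms: float         # DNS resolution time
    tcp_ms: float         # TCP handshake duration
    tls_ms: float         # TLS negotiation time
    server_ms: float      # server processing time
    total_ms: float       # total round-trip time
    http_status: int
    payload_bytes: int    # response payload size


# Sample values are made up for illustration.
event = CheckEvent(
    timestamp="2025-01-01T00:00:00Z",
    source_ip="192.0.2.10",
    resolver="1.1.1.1",
    dns_ms=8.0,
    tcp_ms=21.0,
    tls_ms=34.0,
    server_ms=49.0,
    total_ms=112.0,
    http_status=200,
    payload_bytes=512,
)

# One JSON object per event is a common format for indexing and search.
line = json.dumps(asdict(event), sort_keys=True)
```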

Live Dashboard & Incident Report

What You See When Something Goes Wrong

Here is a real incident from the UptimeGuard system on March 12, 2025, affecting one of our monitored customers — a SaaS platform called Meridian Analytics.

Incident Timeline

14:07:30 UTC — Check from Tokyo (station JP-TKY-01) returns HTTP 503 for api.meridiananalytics.io/v1/query. Response time: 8,420ms. Previous check at 14:07:00 was normal (200 OK, 112ms).

14:08:00 UTC — Checks from Seoul (KR-SEL-01) and Singapore (SG-SIN-02) also return 503. UptimeGuard classifies this as a regional outage (Asia-Pacific).

14:08:30 UTC — Checks from Frankfurt (DE-FRA-03) and New York (US-NYC-01) remain healthy (200 OK, 95ms and 138ms respectively). UptimeGuard confirms the issue is isolated to APAC edge nodes.

14:09:00 UTC — PagerDuty alert dispatched to Meridian's on-call engineer, Arjun Patel. Slack notification sent to #incidents channel with full diagnostic payload.

14:15:00 UTC — Meridian's engineering team identifies a misconfigured CDN cache purge that flooded their APAC origin servers. They roll back the purge job.

14:18:00 UTC — All APAC stations return 200 OK. UptimeGuard marks the incident as resolved. Total downtime: 10 minutes 30 seconds. Three monitoring locations affected.
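The regional classification in this timeline can be approximated by looking at which regions contain failing stations. The sketch below is an assumption about how such a rule might work, not UptimeGuard's algorithm; the function name and the region labels assigned to each station are hypothetical, while the station IDs come from the timeline above.

```python
from typing import Dict, Tuple


def classify_outage(results: Dict[str, Tuple[str, bool]]) -> str:
    """Classify station results given as station -> (region, healthy)."""
    failing_regions = {region for region, ok in results.values() if not ok}
    any_healthy = any(ok for _, ok in results.values())
    if not failing_regions:
        return "healthy"
    if not any_healthy:
        return "global outage"
    if len(failing_regions) == 1:
        return "regional outage (" + next(iter(failing_regions)) + ")"
    return "multi-region degradation"


# State of the five stations named in the timeline at 14:08:30 UTC.
snapshot = {
    "JP-TKY-01": ("APAC", False),
    "KR-SEL-01": ("APAC", False),
    "SG-SIN-02": ("APAC", False),
    "DE-FRA-03": ("EU", True),
    "US-NYC-01": ("NA", True),
}
verdict = classify_outage(snapshot)
```

For this snapshot the verdict is a regional outage in APAC, matching the classification reported at 14:08:00 and confirmed at 14:08:30.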

Dashboard Snapshot

Status Overview (at 14:09 UTC):

Global Uptime: 98.7% (degraded from 99.97%)

Healthy Locations: 11 / 14

Affected Locations: Tokyo, Seoul, Singapore

Avg Response Time (healthy): 127ms

Avg Response Time (affected): 8,240ms or timeout

Alert Routing:

• Primary: PagerDuty → Arjun Patel (on-call rotation B)

• Secondary: Slack #incidents (after 60s escalation)

• Tertiary: SMS to +1-415-555-0192 (after 5 min, not triggered)
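The routing above behaves like a time-based escalation chain. A minimal sketch, assuming each delay is measured from the first alert and that escalation continues for as long as the alert stays unacknowledged (the chain structure and function name are hypothetical):

```python
from typing import List, Tuple

# (delay in seconds after the first alert, channel), per the routing above.
ESCALATION_CHAIN: List[Tuple[int, str]] = [
    (0, "pagerduty"),
    (60, "slack"),
    (300, "sms"),
]


def channels_notified(unacknowledged_s: int) -> List[str]:
    """Channels that have fired after the alert sat unacknowledged this long."""
    return [ch for delay, ch in ESCALATION_CHAIN if unacknowledged_s >= delay]
```

Under these assumptions, an alert acknowledged 90 seconds in would have reached PagerDuty and Slack but never the SMS tier, consistent with the "not triggered" note on the tertiary channel.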

Post-Incident Report: Automatically generated and delivered to ops@meridiananalytics.io at 14:25 UTC. Includes full event log, affected endpoints, recovery timeline, and recommendations.

This is what real-time monitoring delivers: detection within 30 seconds, intelligent regional classification, automated multi-channel alerting, and a complete audit trail — all without manual intervention. Meridian's team resolved the incident in under 11 minutes because UptimeGuard told them exactly what was broken, where, and how long it had been happening.

Frequently Asked Questions

Real-Time Monitoring — FAQ

Why 30 seconds and not more frequently?

Thirty seconds balances detection speed against operational cost. Checking every 5–10 seconds would generate up to 17,280 checks per endpoint per day per location, overwhelming both your servers and our infrastructure. At 30-second intervals, you get 2,880 checks per day per location, which is sufficient to detect outages within a minute while keeping resource usage reasonable. Enterprise customers can request 15-second intervals for critical endpoints at an additional cost.
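The per-location check counts follow directly from the number of seconds in a day. A short sketch of the arithmetic (the function name is ours, for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def checks_per_day(interval_s: int) -> int:
    """Checks one location runs against one endpoint in 24 hours."""
    return SECONDS_PER_DAY // interval_s


# Check counts for the intervals discussed in this answer.
counts = {interval: checks_per_day(interval) for interval in (5, 10, 15, 30)}
```

This gives 17,280 checks per day at 5 seconds, 8,640 at 10 seconds, 5,760 at 15 seconds, and 2,880 at 30 seconds.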

What happens if a monitoring station itself goes down?

Each of our 14 stations is a redundant cluster of machines behind an anycast IP. If one node fails, traffic shifts to the next available node within the same location. If an entire location becomes unreachable (e.g., a datacenter power failure), UptimeGuard's self-monitoring system detects the gap and adjusts uptime calculations accordingly — that station's data is excluded from availability percentages during the station outage period, so your uptime score is never unfairly penalized.
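The exclusion rule can be pictured as computing availability only over checks from stations that were themselves reachable. This is a simplified sketch with hypothetical names; it excludes a station's checks entirely, whereas the behavior described above applies only to the period of the station outage.

```python
from typing import FrozenSet, Iterable, Optional, Tuple


def availability_pct(
    checks: Iterable[Tuple[str, bool]],
    excluded_stations: FrozenSet[str] = frozenset(),
) -> Optional[float]:
    """Percent of passing checks, ignoring stations flagged as down.

    `checks` holds (station_id, passed) pairs. Returns None when no
    checks remain after exclusion.
    """
    counted = [ok for station, ok in checks if station not in excluded_stations]
    if not counted:
        return None
    return 100.0 * sum(counted) / len(counted)


# Station "B" lost power, so its failed checks say nothing about the endpoint.
checks = [("A", True), ("A", True), ("B", False), ("B", False)]
raw = availability_pct(checks)
adjusted = availability_pct(checks, excluded_stations=frozenset({"B"}))
```

Here the raw figure is 50.0, while the adjusted figure is 100.0 once the unreachable station is left out.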

Can I monitor internal or private endpoints?

UptimeGuard's monitoring stations can reach only publicly accessible endpoints. If your service sits behind a firewall or VPN, you can deploy a lightweight UptimeGuard Bridge agent on a bastion host. The agent forwards health-check results to our platform, giving you the same dashboard experience and alerting for internal services. Bridge agents are available on Linux, macOS, and Windows.

How far back can I see historical data?

Free and Starter plans retain 30 days of check history. Professional plans retain 12 months. Enterprise plans retain up to 24 months. All plans include real-time access to the last 72 hours of raw check events (full diagnostic payloads). Historical data is available through the dashboard UI and the REST API for custom reporting and SIEM integration.

Will UptimeGuard's checks affect my server performance?

Each check is a single HTTP GET request, equivalent to one user hitting your endpoint. At 30-second intervals from 14 locations, that's 14 requests every 30 seconds, or roughly 0.47 requests per second per endpoint. For any production server handling more than a few dozen concurrent requests, this load is negligible. If you're concerned, point your UptimeGuard check to a dedicated /health or /ping endpoint that returns a minimal response without querying your database or running business logic.
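The aggregate load on one endpoint is the number of locations divided by the check interval. A quick sketch of that calculation, assuming every location sends exactly one request per interval (function names are ours):

```python
def requests_per_second(locations: int, interval_s: int) -> float:
    """Average monitoring request rate seen by one endpoint."""
    return locations / interval_s


def requests_per_day(locations: int, interval_s: int) -> int:
    """Total monitoring requests one endpoint receives in 24 hours."""
    return locations * (86_400 // interval_s)


rate = requests_per_second(locations=14, interval_s=30)
daily = requests_per_day(locations=14, interval_s=30)
```

For 14 locations at 30-second intervals, the rate rounds to 0.47 requests per second, or 40,320 requests per day.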

What types of alerts can I configure?

UptimeGuard supports alerts via email, SMS, Slack, PagerDuty, Opsgenie, VictorOps, webhook POST, and Discord. You can set different alert rules per endpoint: for example, critical HTTP errors (5xx) trigger immediate PagerDuty escalation, while response-time degradations above 2 seconds trigger a Slack warning. Alert policies support escalation chains, maintenance windows, and quiet hours to prevent alert fatigue.
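The example rule in this answer (5xx goes to PagerDuty, slow responses go to Slack) can be expressed as a small routing function. This is a sketch of the logic only; the function name and channel strings are hypothetical and do not represent UptimeGuard's configuration format.

```python
from typing import Optional


def route_alert(status_code: int, response_ms: float) -> Optional[str]:
    """Pick an alert channel for one check result, or None if no alert."""
    if 500 <= status_code <= 599:
        return "pagerduty"   # critical: server errors escalate immediately
    if response_ms > 2000:
        return "slack"       # warning: slower than 2 seconds
    return None
```

Checking the status code first means a slow 503 is treated as a critical error rather than as a performance warning.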
