Server Check: Quick Guide to Verifying Uptime and Health
What it is
A concise process for confirming that a server is reachable, responsive, and performing within expected parameters. It covers uptime, basic service availability, and key health indicators.
Why it matters
Regular uptime and health checks catch outages, performance degradation, and configuration issues before they affect users or dependent services.
Quick checklist (steps)
- Ping / ICMP: Verify basic reachability.
- Port check: Confirm critical ports (e.g., 22, 80, 443, DB ports) are listening.
- HTTP(S) request: Ensure web services return correct status codes and expected content.
- Service status: Use the service manager (e.g., systemd) to confirm that key services such as nginx and postgres are running.
- Resource usage: Inspect CPU, memory, disk I/O, and disk space (as a rule of thumb, investigate anything above 80% usage).
- Logs: Scan recent logs for errors or repeated warnings.
- Latency & error rates: Measure response times and recent error rates from application or reverse-proxy metrics.
- Dependency checks: Verify dependent services (DB, cache, external APIs) are reachable.
- Security basics: Confirm TLS certificate validity and that no critical ports are exposed unintentionally.
- Alerting check: Ensure monitoring/alerting systems are operational and notifications can be sent.
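The resource-usage item is the one checklist step with a concrete number attached, which makes it easy to automate. The following is a minimal shell sketch, not a definitive implementation: `over_threshold` is a hypothetical helper name, it assumes `df -P` (POSIX output format) as input, and it does not handle mount points that contain spaces.

```shell
# over_threshold THRESHOLD
# Reads `df -P` style output on stdin and prints "<mount point> <percent>"
# for every filesystem whose usage is strictly above THRESHOLD.
# Assumption: column 5 is the capacity (e.g. "90%") and column 6 the mount point.
over_threshold() {
  local threshold="$1"
  awk -v t="$threshold" '
    NR > 1 {                 # skip the header line
      pct = $5
      sub(/%/, "", pct)      # "90%" -> "90"
      if (pct + 0 > t) print $6, pct
    }'
}

# Usage on a live host (80 mirrors the checklist rule of thumb):
#   df -P | over_threshold 80
```

An empty result means no filesystem is above the threshold; any printed line is a candidate for cleanup or escalation.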
Tools to use (examples)
- CLI: ping, curl, nc, ss/netstat, systemctl, top/htop, df, journalctl
- Monitoring: Prometheus + Alertmanager, Grafana, Nagios, Zabbix, Datadog
- Uptime services: UptimeRobot, StatusCake, Pingdom
Quick commands (examples)
- Ping: ping -c 4 server.example.com
- HTTP: curl -I https://server.example.com
- Port: nc -zv server.example.com 22
- Disk: df -h /
- Service: systemctl status nginx
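The quick commands above can be combined into one pass that prints a line per check. This is a sketch under stated assumptions: `run_check` and `quick_checks` are hypothetical helper names, `curl -f` is added so HTTP error statuses count as failures, `nc -w 5` assumes a netcat variant that supports a timeout flag, and `systemctl is-active --quiet` replaces `systemctl status` because it gives a clean exit code. Note that ping, curl, and nc probe the remote host, while df and systemctl inspect the machine the script runs on.

```shell
# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND with its output discarded, prints "OK   LABEL" or "FAIL LABEL",
# and returns 0 on success, 1 on failure.
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK   %s\n' "$label"
  else
    printf 'FAIL %s\n' "$label"
    return 1
  fi
}

# quick_checks HOST
# Runs the five quick commands and prints a one-line summary.
# Returns 0 only if every check passed.
quick_checks() {
  local host="$1" failed=0
  run_check "ping"  ping -c 4 "$host"                 || failed=$((failed + 1))
  run_check "http"  curl -fsS -I "https://$host"      || failed=$((failed + 1))
  run_check "port"  nc -z -w 5 "$host" 22             || failed=$((failed + 1))
  # Only confirms that df can read the root filesystem; it applies no usage threshold.
  run_check "disk"  df -h /                           || failed=$((failed + 1))
  run_check "nginx" systemctl is-active --quiet nginx || failed=$((failed + 1))
  printf '%d check(s) failed\n' "$failed"
  [ "$failed" -eq 0 ]
}

# Usage:
#   quick_checks server.example.com
```

Keeping `run_check` separate from the list of commands means new checks can be added as single lines without touching the reporting logic.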
When to escalate
- Repeated failed checks or service restarts.
- High CPU/memory with no clear cause.
- Disk nearing full or I/O saturation.
- Significant error spikes or degraded response times affecting users.
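The "repeated failed checks" criterion can be made concrete by counting the current streak of failures. The sketch below is illustrative only: `consecutive_failures` and `should_escalate` are hypothetical names, results are assumed to be recorded as the words `ok` and `fail` in chronological order, and the limit of 3 in the usage comment is an arbitrary example rather than a recommendation from this guide.

```shell
# consecutive_failures RESULT...
# Takes check results in chronological order ("ok" or "fail") and prints the
# length of the failure streak at the end of the list.
consecutive_failures() {
  local count=0 r
  for r in "$@"; do
    if [ "$r" = "fail" ]; then
      count=$((count + 1))
    else
      count=0              # any success resets the streak
    fi
  done
  printf '%d\n' "$count"
}

# should_escalate LIMIT RESULT...
# Succeeds (exit 0) when the trailing failure streak has reached LIMIT.
should_escalate() {
  local limit="$1"
  shift
  [ "$(consecutive_failures "$@")" -ge "$limit" ]
}

# Usage:
#   should_escalate 3 ok fail fail fail && echo "escalate"
```

Counting only the trailing streak avoids paging someone for isolated failures that have since recovered.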
Minimal routine
- Manual quick check: daily for critical systems.
- Automated: continuous monitors with alerts for outages and thresholds.
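Where a full monitoring stack is not yet in place, the automated side of the routine can be approximated with a simple polling loop. This is a rough sketch, not a substitute for the monitoring tools listed earlier: `watch_check` is a hypothetical helper name, and the interval and run count in the usage comment are example values.

```shell
# watch_check INTERVAL MAX_RUNS COMMAND [ARGS...]
# Runs COMMAND every INTERVAL seconds, MAX_RUNS times in total, and prints one
# line per run: a UTC timestamp followed by "up" or "down".
watch_check() {
  local interval="$1" max_runs="$2" i=1 status
  shift 2
  while [ "$i" -le "$max_runs" ]; do
    if "$@" >/dev/null 2>&1; then
      status="up"
    else
      status="down"
    fi
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status"
    # Sleep between runs, but not after the final one.
    if [ "$i" -lt "$max_runs" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Usage: probe the web endpoint once a minute for an hour.
#   watch_check 60 60 curl -fsS -I https://server.example.com
```

The timestamped output can be appended to a log file and reviewed during the daily manual check.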