What Network Monitoring Covers
Network monitoring is broader than most teams realize. It covers three distinct layers:
- Device availability monitoring. Is the router/switch/firewall reachable? ICMP ping checks and SNMP polling confirm device health.
- Performance monitoring. How much bandwidth is in use? What is the latency and packet loss on each link? Are there interface errors?
- Flow monitoring. Which applications and hosts are consuming bandwidth? NetFlow/sFlow data reveals traffic patterns and top talkers.
The 7 Core Network Metrics to Monitor
1. Bandwidth Utilization
Bandwidth utilization measures what percentage of your link's capacity is in use. Most links start degrading in quality well before they hit 100%. As utilization climbs, router and switch queues fill up, which adds latency and eventually causes packet drops. TCP congestion control then treats those drops as a signal for senders to slow down.
| Utilization | Status | Action |
|---|---|---|
| < 60% | Healthy | No action needed |
| 60-80% | Warning | Plan capacity increase |
| 80-90% | High | Immediate capacity review |
| > 90% | Critical | Page on-call — degradation active |
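The bands in this table translate directly into code. The sketch below is a minimal illustration. The function name `classify_utilization` is hypothetical, and so is the boundary handling: it assumes exactly 80% counts as Warning and exactly 90% counts as High, because the table leaves those edges ambiguous.

```python
def classify_utilization(used_bps: float, capacity_bps: float) -> tuple:
    """Map link utilization to (status, action) using the bands above.

    Boundary handling is an assumption: 80% -> Warning, 90% -> High.
    """
    if capacity_bps <= 0:
        raise ValueError("capacity_bps must be positive")
    pct = 100.0 * used_bps / capacity_bps
    if pct < 60:
        return ("Healthy", "No action needed")
    if pct <= 80:
        return ("Warning", "Plan capacity increase")
    if pct <= 90:
        return ("High", "Immediate capacity review")
    return ("Critical", "Page on-call")


# Example: 850 Mbit/s on a 1 Gbit/s link
print(classify_utilization(850e6, 1e9))
```

In practice, feed it an averaged rate over several polling intervals rather than a single sample, since short bursts are normal.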
2. Packet Loss
Packet loss is one of the most damaging network conditions for application performance. Even 1% packet loss can sharply cut TCP throughput, because TCP interprets loss as a congestion signal and aggressively reduces its sending rate.
- 0% loss: Expected state for most links
- 0.1-0.5% loss: Acceptable for best-effort internet links, investigate if on MPLS or fiber
- > 1% loss: Alert threshold — TCP performance degrades significantly
- > 5% loss: Page immediately — major application impact
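One way to see why loss hurts so much is the widely cited Mathis et al. approximation for steady-state TCP throughput: roughly (MSS / RTT) × (C / √p), where p is the loss rate and C ≈ 1.22. This model is brought in for illustration and is not the article's own method. It ignores timeouts, receive-window limits, and the physical link rate, so treat its output as a rough per-flow ceiling, not a prediction. The 1460-byte MSS and 50 ms RTT below are assumed example values.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate single-flow TCP throughput ceiling in bits/s (Mathis et al. model).

    Only meaningful for loss_rate > 0; ignores timeouts and window limits.
    """
    if loss_rate <= 0 or rtt_s <= 0:
        raise ValueError("loss_rate and rtt_s must be positive")
    c = math.sqrt(3 / 2)  # model constant, about 1.22
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Assumed example path: 1460-byte MSS, 50 ms round-trip time
for loss in (0.0001, 0.001, 0.01, 0.05):
    mbps = mathis_throughput_bps(1460, 0.05, loss) / 1e6
    print(f"loss {loss:.2%}: ~{mbps:.1f} Mbit/s per-flow ceiling")
```

Because throughput scales with 1/√p, going from 0.01% to 1% loss cuts the ceiling by a factor of ten. Because it also scales with 1/RTT, the same loss rate hurts more on a long-haul path.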
3. Latency (Round-Trip Time)
Latency is the time for a packet to travel from source to destination and back. High latency multiplies the impact of packet loss (because TCP retransmit timers scale with RTT) and directly degrades user experience for interactive applications.
Typical round-trip times vary by link type: under 1ms (LAN), under 5ms (metro fiber), under 50ms (domestic WAN), under 150ms (transcontinental), and under 300ms (intercontinental). Rather than alerting on a fixed number, alert when latency doubles from its baseline.
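The "doubles from baseline" rule can be sketched as follows. The function name `latency_alert` is hypothetical. Using the median of recent history as the baseline is also an assumption; the median is less sensitive to occasional spikes than the mean.

```python
from statistics import median


def latency_alert(baseline_rtts_ms: list, current_rtt_ms: float, factor: float = 2.0) -> bool:
    """Return True when the current RTT is at least `factor` times the baseline RTT.

    The baseline is the median of historical samples (an assumed choice).
    """
    if not baseline_rtts_ms:
        raise ValueError("need at least one baseline sample")
    baseline = median(baseline_rtts_ms)
    return current_rtt_ms >= factor * baseline


history = [38.0, 40.0, 41.0, 39.0, 42.0]  # a domestic WAN link, in ms
print(latency_alert(history, 85.0))
print(latency_alert(history, 60.0))
```

A relative threshold like this works across LAN and intercontinental links without maintaining a separate absolute limit for each.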
4. Jitter
Jitter is the variation in latency: how much the delay changes from one packet to the next. Jitter is especially destructive for real-time applications. VoIP calls become choppy, video conferences stutter, and online games become unplayable.
Target: <10ms jitter for voice/video. Alert at >20ms. Above 50ms, VoIP quality becomes unacceptable.
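A common way to measure jitter is the interarrival jitter estimator from RTP (RFC 3550). For each pair of consecutive packets, it takes the change in transit time and folds it into a running average with gain 1/16. The sketch below applies that formula to send and receive timestamps in milliseconds. RFC 3550 itself works in RTP timestamp units, and the function name here is hypothetical. A constant one-way delay, however large, produces zero jitter.

```python
def interarrival_jitter(send_times_ms: list, recv_times_ms: list) -> float:
    """RFC 3550-style interarrival jitter estimate, in milliseconds.

    For consecutive packets i-1 and i:
      D = (R_i - R_{i-1}) - (S_i - S_{i-1})   # change in transit time
      J = J + (|D| - J) / 16                  # smoothed mean deviation
    """
    if len(send_times_ms) != len(recv_times_ms):
        raise ValueError("send and receive lists must be the same length")
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (send_times_ms[i] - send_times_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


sent = [0, 20, 40, 60]  # a packet every 20 ms, as in typical VoIP
print(interarrival_jitter(sent, [50, 70, 90, 110]))  # constant 50 ms delay
print(interarrival_jitter(sent, [50, 75, 90, 115]))  # delay varies by +/-5 ms
```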
5. Interface Error Rate
Interface errors — CRC errors, alignment errors, late collisions — indicate physical layer problems. A small number of errors on a busy interface may be noise, but a rising error rate signals cable issues, duplex mismatches, or failing hardware.
```bash
# Check interface errors via SNMP
snmpget -v2c -c public 192.168.1.1 \
  ifInErrors.1 ifOutErrors.1 \
  ifInDiscards.1 ifOutDiscards.1
```

Or via the Cisco CLI:

```
show interface GigabitEthernet0/0 | include errors|dropped
```

6. Device CPU and Memory
High CPU on a router or switch doesn't just mean the device is working hard — it means packet forwarding may be impacted. When CPU spikes above 70-80%, routers may start dropping packets destined for the CPU (control plane traffic) or may slow down software-based forwarding.
- Alert at: 70% CPU sustained > 5 minutes
- Page at: 90% CPU on any network device
- Memory: alert at 80% utilization (running out of memory on a router can cause crashes and reboots)
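The CPU rules above mix a sustained condition (70% for more than 5 minutes) with an instant one (90%). The sketch below is one way to evaluate both over a time-ordered list of polled samples. The `CpuSample` type and `evaluate_cpu` function are hypothetical. It assumes "sustained" means every sample in the trailing window is at or above the threshold and that the window spans at least 300 seconds.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    ts: float   # Unix timestamp in seconds
    pct: float  # CPU utilization, 0-100


def evaluate_cpu(samples: list, alert_pct: float = 70.0,
                 page_pct: float = 90.0, sustain_s: float = 300.0) -> str:
    """Return 'page', 'alert', or 'ok' for samples ordered oldest to newest."""
    if not samples:
        return "ok"
    latest = samples[-1]
    if latest.pct >= page_pct:
        return "page"  # instant threshold: any reading at or above 90%
    # Walk backwards while samples stay at or above alert_pct; alert once the
    # unbroken run covers at least sustain_s seconds.
    for s in reversed(samples):
        if s.pct < alert_pct:
            break
        if latest.ts - s.ts >= sustain_s:
            return "alert"
    return "ok"


# Polled once a minute: 75% for five minutes straight
steady = [CpuSample(t, 75.0) for t in range(0, 301, 60)]
print(evaluate_cpu(steady))
```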
7. Uptime / Availability
Device availability is the most basic network metric — is the device reachable? ICMP ping tests and SNMP polling both verify reachability. Track uptime percentage over time to identify flapping devices (devices that repeatedly go up and down).
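Uptime percentage and flap detection can both be derived from the same series of reachability results. This minimal sketch assumes the checks are evenly spaced, so each result represents an equal slice of time. It also counts a "flap" as an up-to-down transition; both of these definitions are assumptions, and the function name is hypothetical.

```python
def availability(results: list) -> tuple:
    """Return (uptime_pct, flap_count) for time-ordered reachability results.

    True means the device answered the check. A flap is an up -> down transition.
    """
    if not results:
        raise ValueError("need at least one check result")
    uptime_pct = 100.0 * sum(results) / len(results)
    flaps = sum(1 for prev, cur in zip(results, results[1:]) if prev and not cur)
    return uptime_pct, flaps


checks = [True, True, False, True, False, True, True, True]
print(availability(checks))
```

A device with decent uptime but a high flap count over a day is often the more urgent problem, since each drop resets sessions and routing adjacencies.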
SNMP: The Foundation of Network Monitoring
SNMP (Simple Network Management Protocol) is how most network monitoring tools collect metrics from devices. Nearly every enterprise router, switch, and firewall supports SNMP.
How SNMP Works
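SNMP exposes device data as objects defined in MIBs (Management Information Bases) and arranged in a global tree. Each metric is identified by an OID (Object Identifier), which is a dotted path through that tree. Your monitoring tool polls devices at regular intervals and reads OID values with SNMP GET requests. To walk a whole table, such as the interface table, it uses GETNEXT or GETBULK instead.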
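Table objects such as the IF-MIB interface table use the column OID followed by a row index. For example, `1.3.6.1.2.1.2.2.1.14.1` is ifInErrors for the interface whose ifIndex is 1, and this is why the `snmpget` example earlier appends `.1`. The sketch below splits a walked OID back into a column name and an ifIndex. It covers only a handful of ifTable columns, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# ifEntry in IF-MIB: 1.3.6.1.2.1.2.2.1.<column>.<ifIndex>
IF_ENTRY = (1, 3, 6, 1, 2, 1, 2, 2, 1)
IF_COLUMNS = {
    5: "ifSpeed",
    10: "ifInOctets",
    13: "ifInDiscards",
    14: "ifInErrors",
    16: "ifOutOctets",
    19: "ifOutDiscards",
    20: "ifOutErrors",
}


def parse_if_oid(oid: str) -> Optional[Tuple[str, int]]:
    """Return (column_name, ifIndex) for a known ifTable OID, else None."""
    parts = tuple(int(p) for p in oid.strip(".").split("."))
    if len(parts) != len(IF_ENTRY) + 2 or parts[: len(IF_ENTRY)] != IF_ENTRY:
        return None
    column, if_index = parts[-2], parts[-1]
    name = IF_COLUMNS.get(column)
    return (name, if_index) if name is not None else None


print(parse_if_oid("1.3.6.1.2.1.2.2.1.10.3"))
print(parse_if_oid("1.3.6.1.2.1.1.3.0"))  # sysUpTime is not an ifTable OID
```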
| SNMP Version | Security | Recommendation |
|---|---|---|
| SNMPv1 | Community string (plaintext) | Avoid — no encryption |
| SNMPv2c | Community string (plaintext) | Acceptable for internal networks |
| SNMPv3 | Auth + encryption | Required for internet-facing devices |
Key SNMP OIDs for Network Monitoring
| Metric | OID (interface rows: append `.<ifIndex>`) |
|---|---|
| Interface input octets | 1.3.6.1.2.1.2.2.1.10 |
| Interface output octets | 1.3.6.1.2.1.2.2.1.16 |
| Interface in errors | 1.3.6.1.2.1.2.2.1.14 |
| Interface in discards | 1.3.6.1.2.1.2.2.1.13 |
| System uptime | 1.3.6.1.2.1.1.3.0 |
| CPU utilization, 5-min average (Cisco, legacy `avgBusy5`) | 1.3.6.1.4.1.9.2.1.58.0 |
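The octet OIDs are counters rather than rates. To get bandwidth utilization, poll twice, take the difference, convert bytes to bits, and divide by the interval times the interface speed. The classic `ifInOctets`/`ifOutOctets` objects are 32-bit counters that wrap back to zero at 2^32. On a busy gigabit link this can happen in well under a minute, so the sketch below uses modular arithmetic and assumes at most one wrap between polls. It cannot tell a wrap from a counter reset after a reboot; checking sysUpTime catches that case. On fast links, the 64-bit `ifHCInOctets` (1.3.6.1.2.1.31.1.1.1.6) is the usual alternative, and you can pass `modulus=2**64` for it. The function names here are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two counter readings, assuming at most one wrap."""
    return (curr - prev) % modulus


def utilization_pct(prev_octets: int, curr_octets: int, interval_s: float,
                    if_speed_bps: float, modulus: int = COUNTER32_MODULUS) -> float:
    """Average link utilization (0-100) between two octet-counter polls."""
    bits = counter_delta(prev_octets, curr_octets, modulus) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# 75 MB received in 60 s on a 100 Mbit/s interface
print(utilization_pct(0, 75_000_000, 60, 100_000_000))
# A counter that wrapped between polls still yields a small positive delta
print(counter_delta(2 ** 32 - 100, 50))
```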
Network Monitoring Tool Comparison
| Tool | Best For | Pricing | SNMP |
|---|---|---|---|
| Better Stack | Uptime + API monitoring with alerting | Free tier / $20+/mo | Via integrations |
| PRTG | SMB/mid-market on-prem networks | $2,149/yr (500 sensors) | Native SNMP v1/2/3 |
| LibreNMS | Open-source SNMP monitoring | Free (self-hosted) | Full SNMP support |
| Zabbix | Enterprise open-source monitoring | Free (self-hosted) | Full SNMP support |
| SolarWinds NPM | Large enterprise networks | $3,000+/yr | Full SNMP + MIB browser |
| Datadog NPM | Cloud-native / DevOps teams | $5/host/mo add-on | Via agent |
Setting Up a Network Monitoring Stack
Option 1: Open Source (Prometheus + SNMP Exporter + Grafana)
The most flexible option for infrastructure teams comfortable with self-hosted tooling:
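```bash
# Install SNMP exporter
docker run -d -p 9116:9116 \
  -v /path/to/snmp.yml:/etc/snmp_exporter/snmp.yml \
  prom/snmp-exporter
```

```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets: ['192.168.1.1', '192.168.1.2']
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      # Pass each device address to the exporter as ?target=<device>
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the device address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the SNMP exporter
      - target_label: __address__
        replacement: localhost:9116
```

Option 2: SaaS (Better Stack)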
For teams without the bandwidth to self-host monitoring infrastructure, Better Stack provides synthetic network checks (ping, HTTP, TCP port checks) from multiple global locations with on-call alerting built in. It is easier to set up and maintain than a self-hosted Prometheus stack.
Network Monitoring Alert Strategy
Network monitoring generates a lot of data. Without a good alert strategy, you'll either miss critical incidents or drown in noise. Key principles:
- Use sustained thresholds, not momentary spikes. Alert when bandwidth exceeds 80% for 5+ minutes, not on a single sample. Networks are bursty by nature.
- Baseline before alerting. Capture normal utilization patterns over 2-4 weeks before setting thresholds. A link that normally runs at 70% should have a different alert threshold than one at 30%.
- Correlate alerts. A single device going down is a device incident. Five devices in the same building going down simultaneously more likely points to a power or upstream provider problem, so route it to a different escalation path.
- Alert on rate of change. A link that jumps from 20% to 80% in 60 seconds is more alarming than one that reaches 80% gradually over an hour.
- Separate notification channels by severity. Flapping device on non-critical segment → ticket. Core link packet loss > 1% → page.
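Two of these principles, correlation and rate of change, are easy to prototype. The sketch below collapses several simultaneous failures at one site into a single site-level alert, and it flags utilization that climbs faster than a set number of percentage points per minute. The site mapping, the threshold of three devices, the 30-points-per-minute limit, and both function names are hypothetical choices for illustration.

```python
from collections import defaultdict


def correlate_outages(down_devices: dict, site_threshold: int = 3) -> list:
    """down_devices maps device name -> site. Returns alert messages.

    Sites with at least site_threshold failed devices become one site alert.
    """
    by_site = defaultdict(list)
    for device, site in down_devices.items():
        by_site[site].append(device)
    alerts = []
    for site, devices in sorted(by_site.items()):
        if len(devices) >= site_threshold:
            alerts.append(f"SITE {site}: {len(devices)} devices down (check power/upstream)")
        else:
            alerts.extend(f"DEVICE {d} down" for d in sorted(devices))
    return alerts


def rapid_rise(prev_pct: float, curr_pct: float, interval_s: float,
               max_pct_per_min: float = 30.0) -> bool:
    """True when utilization rises faster than max_pct_per_min points per minute."""
    return (curr_pct - prev_pct) / (interval_s / 60.0) > max_pct_per_min


down = {"sw1": "bldg-a", "sw2": "bldg-a", "sw3": "bldg-a", "fw9": "bldg-b"}
print(correlate_outages(down))
print(rapid_rise(20, 80, 60))    # 20% -> 80% in one minute
print(rapid_rise(20, 80, 3600))  # the same climb over an hour
```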
Network Monitoring for Cloud-Native Teams
Cloud infrastructure shifts the network monitoring challenge. You don't own routers and switches anymore — you work with VPCs, security groups, load balancers, and CDNs. The metrics change:
| Traditional | Cloud-Native Equivalent |
|---|---|
| Interface bandwidth utilization | VPC flow logs, EC2 network I/O |
| Router availability | NAT Gateway / Load Balancer health |
| SNMP device polling | CloudWatch / Azure Monitor metrics |
| NetFlow analysis | VPC flow logs → S3/CloudWatch Logs |
| WAN link packet loss | Synthetic monitoring (HTTP/TCP checks from multiple regions) |
For cloud-native teams, synthetic monitoring is the most practical form of network monitoring: run HTTP and TCP checks from multiple geographic locations, and you'll catch regional network degradation before your users do.
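At its core, a synthetic TCP check is a timed connection attempt. The sketch below shows the single-probe version using only the standard library. Running it on a schedule from several regions and alerting on the results is where a monitoring service or your own scheduler comes in, and that part is not shown. `tcp_check` and `CheckResult` are hypothetical names. If you pass a hostname, the measured time also includes DNS resolution.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    ok: bool
    connect_ms: Optional[float]
    error: Optional[str] = None


def tcp_check(host: str, port: int, timeout_s: float = 5.0) -> CheckResult:
    """Attempt a TCP connection and report success plus connect time in ms."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed_ms = (time.perf_counter() - start) * 1000
        return CheckResult(ok=True, connect_ms=elapsed_ms)
    except OSError as exc:  # covers refused, unreachable, DNS failure, timeout
        return CheckResult(ok=False, connect_ms=None, error=str(exc))


if __name__ == "__main__":
    print(tcp_check("example.com", 443))
```

Comparing connect times for the same target across regions is what surfaces regional degradation. A check that succeeds everywhere but is three times slower from one region points to a network path problem, not an application problem.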