What is network monitoring?

Network monitoring is the continuous process of collecting, analyzing, and alerting on metrics from your network infrastructure — including routers, switches, firewalls, and links. It tracks availability (is the device reachable?), performance (how much bandwidth is in use?), and health (are errors or drops occurring?) to detect and resolve problems before they cause application downtime.

What are the most important network metrics to monitor?

The most critical network metrics are: (1) Bandwidth utilization — percentage of link capacity in use; (2) Packet loss — percentage of packets dropped in transit; (3) Latency (RTT) — round-trip time between nodes; (4) Jitter — variance in latency, critical for VoIP/video; (5) Error rate — interface errors and discards per second; (6) Device availability (uptime) — is the device reachable via ping/SNMP?; (7) CPU/memory on network devices — high CPU on a router degrades forwarding performance.

What is SNMP monitoring?

SNMP (Simple Network Management Protocol) is the industry-standard protocol for collecting metrics from network devices (routers, switches, firewalls). SNMP monitoring tools poll devices at regular intervals to retrieve OID (Object Identifier) values like interface traffic, CPU load, and error counts. Most enterprise network monitoring tools — PRTG, Nagios, Zabbix, LibreNMS — use SNMP as their primary collection method.

What is a good alert threshold for network bandwidth utilization?

Alert at 70-80% utilization sustained over 5 minutes; this gives you time to respond before users notice degradation. Page immediately at 90%+ utilization: at this point traffic bursts overflow device queues, so TCP flows see significant packet loss and retransmits. For critical WAN links, consider alerting at 60% if you do not have burstable capacity.

What is the best network monitoring tool in 2026?

The best network monitoring tool depends on your scale and stack. For SMBs and mid-market, PRTG and LibreNMS are the most cost-effective full-featured options. For enterprise, SolarWinds NPM and Cisco DNA Center provide deep vendor integration. For cloud-native infrastructure, Datadog Network Performance Monitoring or Better Stack with synthetic checks gives the best observability. Open-source: Zabbix + Grafana is powerful but requires operational investment.

Network Monitoring Guide: Tools, Metrics & Best Practices (2026)

What Network Monitoring Covers

Network monitoring is broader than most teams realize. It covers three distinct layers:

Device availability monitoring. Is the router/switch/firewall reachable? ICMP ping checks and SNMP polling confirm device health.
Performance monitoring. How much bandwidth is in use? What is the latency and packet loss on each link? Are there interface errors?
Flow monitoring. Which applications and hosts are consuming bandwidth? NetFlow/sFlow data reveals traffic patterns and top talkers.

The 7 Core Network Metrics to Monitor

1. Bandwidth Utilization

Bandwidth utilization measures what percentage of your link's capacity is in use. Most links start degrading in quality well before they hit 100%: as utilization climbs, traffic bursts fill device queues, latency rises, and routers begin dropping packets. TCP congestion control treats those drops as a signal to slow down.

Utilization	Status	Action
< 60%	Healthy	No action needed
60-80%	Warning	Plan capacity increase
80-90%	High	Immediate capacity review
> 90%	Critical	Page on-call — degradation active
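
Utilization is usually derived rather than read directly: a poller samples an interface octet counter twice and converts the difference into a percentage of link speed. The sketch below shows that arithmetic under stated assumptions. The function names, the 32-bit default, and the single-wrap handling are illustrative choices, not part of any particular tool.

```python
def utilization_percent(octets_prev, octets_now, interval_s, link_bps, counter_bits=32):
    """Percent of link capacity used between two octet-counter samples.

    Assumes the counter wrapped at most once between samples. Pass
    counter_bits=64 when polling 64-bit counters such as ifHCInOctets.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction yields the right delta even after one counter wrap.
    delta_octets = (octets_now - octets_prev) % modulus
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


def status_for(utilization):
    """Map a utilization percentage onto the status bands in the table above."""
    if utilization < 60:
        return "Healthy"
    if utilization < 80:
        return "Warning"
    if utilization <= 90:
        return "High"
    return "Critical"
```

For example, 450,000,000 octets transferred in 60 seconds on a 100 Mbit/s link works out to 60% utilization, which falls in the Warning band.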

2. Packet Loss

Packet loss is one of the most damaging network conditions for application performance. Even 1% packet loss can reduce TCP throughput by 20-50%, and by more on high-latency paths, because loss-based TCP congestion control interprets loss as a congestion signal and aggressively reduces its sending rate.

0% loss: Expected state for most links
0.1-0.5% loss: Acceptable for best-effort internet links, investigate if on MPLS or fiber
> 1% loss: Alert threshold — TCP performance degrades significantly
> 5% loss: Page immediately — major application impact
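
To get a feel for why these thresholds matter, the widely used Mathis approximation estimates an upper bound on steady-state TCP throughput as (MSS / RTT) × (C / √p), where p is the loss rate and C is about 1.22. This is a rough sketch that assumes a classic loss-based (Reno-style) congestion control; modern stacks such as CUBIC or BBR behave differently, so treat the output as an order-of-magnitude guide only. The function name and defaults are illustrative.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Approximate upper bound on TCP throughput in bits per second.

    Implements throughput <= (MSS / RTT) * (C / sqrt(p)).
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Compare several loss rates on a 50 ms path with a 1460-byte MSS.
    for loss in (0.0001, 0.001, 0.01, 0.05):
        mbps = mathis_throughput_bps(1460, 0.050, loss) / 1e6
        print(f"loss {loss:.2%}: about {mbps:.1f} Mbit/s")
```

Because throughput scales with 1/√p, a hundredfold increase in loss cuts the estimated ceiling by a factor of ten, and doubling the RTT halves it.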

3. Latency (Round-Trip Time)

Latency is the time for a packet to travel from source to destination and back. High latency multiplies the impact of packet loss (because TCP retransmit timers scale with RTT) and directly degrades user experience for interactive applications.

Typical round-trip times vary by link type: <1ms (LAN), <5ms (metro fiber), <50ms (domestic WAN), <150ms (transcontinental), <300ms (intercontinental). Rather than alerting on a fixed number, alert when latency doubles from the link's own baseline.
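
The "doubles from baseline" rule can be sketched in a few lines. This example assumes the baseline is the median of recent RTT samples, which keeps a single outlier from skewing it; the function name and the choice of median are illustrative, not prescribed by any tool.

```python
from statistics import median


def latency_alert(baseline_samples_ms, current_ms, factor=2.0):
    """Return True when current RTT is at least `factor` times the baseline median."""
    if not baseline_samples_ms:
        raise ValueError("need at least one baseline sample")
    baseline = median(baseline_samples_ms)
    return current_ms >= factor * baseline
```

With baseline samples of 10, 12, 11, 9, and 50 ms, the median is 11 ms, so a current RTT of 25 ms triggers the alert and 20 ms does not.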

4. Jitter

Jitter is variation in latency: how much the delay changes from one packet to the next. Jitter is especially destructive for real-time applications: VoIP calls become choppy, video conferences stutter, and gaming becomes unplayable.

Target: <10ms jitter for voice/video. Alert at >20ms. Above 50ms, VoIP quality becomes unacceptable.
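
One common way to turn packet timestamps into a single jitter number is the smoothed interarrival jitter estimator described in RFC 3550 (RTP). The sketch below follows that formula, J = J + (|D| - J) / 16, where D is the change in transit time between consecutive packets. It assumes you already have matching send and receive timestamps in the same unit; the function name is illustrative.

```python
def interarrival_jitter(send_times, recv_times):
    """Smoothed jitter estimate in the style of RFC 3550 section 6.4.1.

    send_times and recv_times are equal-length sequences of timestamps
    (same unit, for example milliseconds) for consecutive packets.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("send_times and recv_times must have the same length")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D: how much the transit time changed compared with the previous packet.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Move the estimate 1/16 of the way toward the latest deviation.
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

The 1/16 gain smooths out single outliers, so the value reflects sustained variation rather than one late packet.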

5. Interface Error Rate

Interface errors — CRC errors, alignment errors, late collisions — indicate physical layer problems. A small number of errors on a busy interface may be noise, but a rising error rate signals cable issues, duplex mismatches, or failing hardware.

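# Check interface error and discard counters via SNMP (net-snmp).
# Assumes SNMPv2c, community "public", and interface index 1.
# The IF-MIB:: prefix makes the names resolve regardless of local MIB settings.
snmpget -v2c -c public 192.168.1.1 \
  IF-MIB::ifInErrors.1 IF-MIB::ifOutErrors.1 \
  IF-MIB::ifInDiscards.1 IF-MIB::ifOutDiscards.1

# Or via Cisco IOS CLI (drop counters appear as "drops" in the output)
show interface GigabitEthernet0/0 | include errors|drops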

6. Device CPU and Memory

High CPU on a router or switch doesn't just mean the device is working hard — it means packet forwarding may be impacted. When CPU spikes above 70-80%, routers may start dropping packets destined for the CPU (control plane traffic) or may slow down software-based forwarding.

Alert at: 70% CPU sustained > 5 minutes
Page at: 90% CPU on any network device
Memory: alert at 80% utilization (OOM on a router causes reboots)

7. Uptime / Availability

Device availability is the most basic network metric — is the device reachable? ICMP ping tests and SNMP polling both verify reachability. Track uptime percentage over time to identify flapping devices (devices that repeatedly go up and down).
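
Both numbers mentioned above, uptime percentage and flapping, can be computed from the same series of poll results. The sketch below assumes one boolean per poll (True when the device answered) and counts a flap as an up-to-down transition; both helper names are illustrative.

```python
def uptime_percent(samples):
    """Percentage of polls in which the device was reachable."""
    if not samples:
        raise ValueError("need at least one sample")
    return 100.0 * sum(samples) / len(samples)


def count_flaps(samples):
    """Number of up-to-down transitions in a series of poll results."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev and not cur)
```

A device can show a respectable uptime percentage and still be unhealthy: many short outages produce a high flap count that a plain percentage hides.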

SNMP: The Foundation of Network Monitoring

SNMP (Simple Network Management Protocol) is how most network monitoring tools collect metrics from devices. Nearly every enterprise router, switch, and firewall supports SNMP.

How SNMP Works

SNMP organizes device data into a hierarchy of objects described by MIBs (Management Information Bases). Each metric is identified by an OID (Object Identifier). Your monitoring tool polls devices at regular intervals, reading OID values via SNMP GET requests.

SNMP Version	Security	Recommendation
SNMPv1	Community string (plaintext)	Avoid — no encryption
SNMPv2c	Community string (plaintext)	Acceptable for internal networks
SNMPv3	Auth + encryption	Required for internet-facing devices

Key SNMP OIDs for Network Monitoring

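Metric	OID
Interface input octets (ifInOctets)	1.3.6.1.2.1.2.2.1.10
Interface output octets (ifOutOctets)	1.3.6.1.2.1.2.2.1.16
Interface in errors (ifInErrors)	1.3.6.1.2.1.2.2.1.14
Interface in discards (ifInDiscards)	1.3.6.1.2.1.2.2.1.13
System uptime (sysUpTime)	1.3.6.1.2.1.1.3.0
CPU utilization (Cisco, legacy 5-minute average)	1.3.6.1.4.1.9.2.1.58.0

The interface OIDs are table columns: append the interface index (for example .1) to read a single interface, or walk the column to read all of them.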

Network Monitoring Tool Comparison

Tool	Best For	Pricing	SNMP
Better Stack	Uptime + API monitoring with alerting	Free tier / $20+/mo	Via integrations
PRTG	SMB/mid-market on-prem networks	$2,149/yr (500 sensors)	Native SNMP v1/2/3
LibreNMS	Open-source SNMP monitoring	Free (self-hosted)	Full SNMP support
Zabbix	Enterprise open-source monitoring	Free (self-hosted)	Full SNMP support
SolarWinds NPM	Large enterprise networks	$3,000+/yr	Full SNMP + MIB browser
Datadog NPM	Cloud-native / DevOps teams	$5/host/mo add-on	Via agent

Setting Up a Network Monitoring Stack

Option 1: Open Source (Prometheus + SNMP Exporter + Grafana)

The most flexible option for infrastructure teams comfortable with self-hosted tooling:

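# Run the SNMP exporter (shell)
docker run -d -p 9116:9116 \
  -v /path/to/snmp.yml:/etc/snmp_exporter/snmp.yml \
  prom/snmp-exporter

# Prometheus scrape config (prometheus.yml)
scrape_configs:
  - job_name: 'snmp'
    static_configs:
      - targets: ['192.168.1.1', '192.168.1.2']  # devices to poll, not the exporter
    metrics_path: /snmp
    params:
      module: [if_mib]
    relabel_configs:
      # Pass each device address to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the device address as the instance label on the resulting series
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter itself
      - target_label: __address__
        replacement: localhost:9116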

Option 2: SaaS (Better Stack)

For teams without the bandwidth to self-host monitoring infrastructure, Better Stack provides synthetic network checks (ping, HTTP, TCP port checks) from multiple global locations with on-call alerting built in. It is easier to set up and maintain than a self-hosted Prometheus stack, but it checks reachability from the outside rather than polling device metrics over SNMP.

Network Monitoring Alert Strategy

Network monitoring generates a lot of data. Without a good alert strategy, you'll either miss critical incidents or drown in noise. Key principles:

Use sustained thresholds, not momentary spikes. Alert when bandwidth exceeds 80% for 5+ minutes, not on a single sample. Networks are bursty by nature.
Baseline before alerting. Capture normal utilization patterns over 2-4 weeks before setting thresholds. A link that normally runs at 70% should have a different alert threshold than one at 30%.
Correlate alerts. A single device going down is an incident. Five devices in the same building going down simultaneously is a power or upstream provider incident — route to a different escalation path.
Alert on rate of change. A link that jumps from 20% to 80% in 60 seconds is more alarming than one that reaches 80% gradually over an hour.
Separate notification channels by severity. Flapping device on non-critical segment → ticket. Core link packet loss > 1% → page.
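
The first principle, sustained thresholds instead of momentary spikes, can be sketched as a small rolling-window check. This example assumes one sample per polling interval, so a window of 5 with 1-minute polling approximates "80% for 5+ minutes". The class name and interface are illustrative.

```python
from collections import deque


class SustainedThreshold:
    """Fire only when every sample in the most recent window exceeds the threshold."""

    def __init__(self, threshold, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # oldest samples fall off automatically

    def observe(self, value):
        """Record one sample and return True if the alert condition now holds."""
        self.recent.append(value)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(v > self.threshold for v in self.recent)


if __name__ == "__main__":
    alert = SustainedThreshold(threshold=80, window=5)
    for utilization in [95, 40, 85, 86, 90, 88, 92, 70]:
        print(utilization, alert.observe(utilization))
```

In the sample run, the initial spike to 95% does not fire because the window is not yet full and the next sample drops to 40%. The alert fires only once five consecutive samples have stayed above 80%.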

Network Monitoring for Cloud-Native Teams

Cloud infrastructure shifts the network monitoring challenge. You don't own routers and switches anymore — you work with VPCs, security groups, load balancers, and CDNs. The metrics change:

Traditional	Cloud-Native Equivalent
Interface bandwidth utilization	VPC flow logs, EC2 network I/O
Router availability	NAT Gateway / Load Balancer health
SNMP device polling	CloudWatch / Azure Monitor metrics
NetFlow analysis	VPC flow logs → S3/CloudWatch Logs
WAN link packet loss	Synthetic monitoring (HTTP/TCP checks from multiple regions)

For cloud-native teams, synthetic monitoring is the most practical form of network monitoring: run HTTP and TCP checks from multiple geographic locations, and you'll catch regional network degradation before your users do.
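
A synthetic TCP check is simple at its core: open a connection, time it, and record success or failure. The sketch below uses only the Python standard library and is a minimal illustration of a single probe from a single location. A real setup would run it on a schedule from several regions and feed the results into alerting; the function name and default timeout are illustrative.

```python
import socket
import time


def tcp_check(host, port, timeout_s=3.0):
    """Attempt a TCP connection and return (ok, elapsed_ms).

    ok is False on refusal, timeout, or a DNS failure, all of which
    Python reports as subclasses of OSError.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            ok = True
    except OSError:
        ok = False
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return ok, elapsed_ms
```

Calling `tcp_check` against your own load balancer hostname on port 443 gives both an availability signal and a rough connect-time latency, which you can compare across regions to spot regional degradation.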

⚡ The Network Monitoring Blind Spot