Network Baselining & Documentation
1. What Is a Network Baseline?
A network baseline is a snapshot of how your network performs under normal, everyday operating conditions. It records metrics such as bandwidth utilisation, CPU and memory loads on devices, latency between key nodes, error rates, and traffic patterns — all measured during a representative period of typical activity.
Once you know what "normal" looks like, any deviation from that norm immediately stands out as a potential anomaly: a sudden spike in CPU usage, an unexpected rise in broadcast traffic, or latency that doubles overnight. Without a baseline, you are comparing your network against nothing — making it nearly impossible to distinguish a degraded network from one that simply always ran that way.
| Baseline Metric | Why It Matters |
|---|---|
| Bandwidth utilisation (%) | Reveals whether links are being driven near capacity, helping justify upgrades before users feel the impact |
| CPU & memory on routers/switches | High CPU can indicate routing loops, DoS attacks, or misconfigured processes; memory exhaustion can cause crashes |
| Round-trip latency (RTT) | Establishes expected delay between sites; a sudden increase points to congestion, routing changes, or hardware faults |
| Packet loss rate | Even 1–2 % loss severely degrades TCP throughput and VoIP quality; a baseline exposes hidden intermittent issues |
| Error counters (CRC, input/output) | Physical-layer faults like a bad cable or duplex mismatch appear as growing error counters on interfaces |
| Top talkers / top protocols | Shows which hosts and applications dominate traffic — critical for QoS planning and detecting rogue activity |
| Broadcast & multicast rates | Excessively high broadcast rates can saturate a VLAN and degrade all hosts in that broadcast domain |
2. Tools Used for Network Baselining
Several complementary tools are used together to build a complete picture of normal network behaviour. Each tool captures a different layer or dimension of network activity.
2.1 SNMP (Simple Network Management Protocol)
SNMP polls managed devices (routers, switches, servers) at regular intervals and retrieves counters stored in the device's MIB (Management Information Base). These counters include interface byte counts, error counters, CPU utilisation, memory free, and dozens of other variables. A Network Management System (NMS) such as PRTG, LibreNMS, or Cacti graphs these counters over time, making trends immediately visible.
| SNMP Version | Security | Baseline Use |
|---|---|---|
| SNMPv1 | Community string (plaintext) | Legacy only — avoid in new deployments |
| SNMPv2c | Community string (plaintext) | Common in labs and small environments |
| SNMPv3 | Authentication + encryption | Recommended for production baselining. See: SNMP Versions |
Key SNMP OIDs useful for baselining include ifInOctets / ifOutOctets (interface traffic), ifInErrors / ifOutErrors (interface errors), and sysUpTime (device uptime; resets reveal unplanned reboots).
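As a minimal sketch of how two polls of an octet counter become a utilisation figure, the Python below computes the average link utilisation over one polling interval. The function name, its parameters, and the sample values are illustrative assumptions, not part of any particular NMS. It also assumes the counter wraps at most once between polls and that the device did not reboot in between (a reboot would show up as a sysUpTime reset).

```python
def utilisation_percent(octets_start, octets_end, interval_s, if_speed_bps, counter_bits=32):
    """Average link utilisation (%) between two polls of an interface octet counter."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval_s and if_speed_bps must be positive")
    delta = octets_end - octets_start
    if delta < 0:
        # The counter rolled over its maximum between polls (assumes a single wrap).
        delta += 2 ** counter_bits
    bits_per_second = delta * 8 / interval_s
    return 100.0 * bits_per_second / if_speed_bps


# Two hypothetical ifInOctets samples taken 300 s apart on a 100 Mbit/s link.
print(utilisation_percent(1_000_000, 376_000_000, 300, 100_000_000))  # 10.0
```

On fast links the 64-bit ifHCInOctets / ifHCOutOctets counters are usually polled instead, in which case counter_bits would be 64 and wraps become very unlikely.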
2.2 NetFlow (and IPFIX / sFlow)
While SNMP tells you how much traffic crossed an interface, NetFlow tells you who sent it, where it was going, and which protocol it was. A Cisco router or switch enabled for NetFlow exports flow records (source IP, destination IP, ports, protocol, byte count) to a NetFlow collector such as ntopng, SolarWinds NTA, or Elastic Stack.
During baselining, NetFlow data reveals the top-talker hosts, dominant applications (HTTP, voice, backup jobs), and time-of-day traffic patterns. This information is indispensable for QoS policy design and for detecting data exfiltration or internal port scans.
| Flow Technology | Vendor / Standard | Key Feature |
|---|---|---|
| NetFlow v5 | Cisco proprietary | Fixed format, widely supported by collectors |
| NetFlow v9 | Cisco (template-based) | Flexible templates; supports IPv6 and MPLS fields |
| IPFIX | IETF standard (RFC 7011) | Vendor-neutral; based on NetFlow v9 design |
| sFlow | InMon / sFlow.org (RFC 3176, informational) | Packet sampling; lower overhead on high-speed links |
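To illustrate how a collector might derive top talkers from exported flow records, here is a small Python sketch that sums bytes per source address. The record layout (dictionaries with src, dst, proto, dport, and bytes keys) and the sample addresses are assumptions made for this example; real collectors store flows in their own formats.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Return the n source addresses that sent the most bytes, as (address, bytes) pairs."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical flow records, as a collector might hold them after decoding.
flows = [
    {"src": "10.1.1.10", "dst": "10.2.2.5", "proto": "tcp", "dport": 443, "bytes": 120_000},
    {"src": "10.1.1.20", "dst": "10.2.2.9", "proto": "udp", "dport": 5060, "bytes": 8_000},
    {"src": "10.1.1.10", "dst": "10.2.2.7", "proto": "tcp", "dport": 80, "bytes": 30_000},
    {"src": "10.1.1.30", "dst": "10.2.2.5", "proto": "tcp", "dport": 22, "bytes": 45_000},
]
print(top_talkers(flows, n=2))  # [('10.1.1.10', 150000), ('10.1.1.30', 45000)]
```

Grouping by destination port or protocol instead of source address gives the "top protocols" view in the same way.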
2.3 Ping
Ping uses ICMP Echo Request / Echo Reply to measure round-trip time (RTT) and packet loss between two endpoints. During baselining, scheduled ping tests from a management host to key nodes — default gateways, DNS servers, WAN endpoints, and servers — establish expected RTT and loss figures.
Example: if baseline pings from a branch router to HQ show an average RTT of 12 ms with 0 % loss, a future reading of 80 ms with 5 % loss is an unambiguous anomaly worthy of investigation.
! Cisco extended ping — useful for baseline testing
Router# ping 10.1.1.1 repeat 100 size 1400 timeout 2
Type escape sequence to abort.
Sending 100, 1400-byte ICMP Echos to 10.1.1.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 10/12/18 ms
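Scheduled ping results are easier to trend when the summary line is turned into numbers. The Python sketch below parses the IOS summary line shown above. The function and field names are illustrative, and the pattern assumes the summary format in this example, where the round-trip figures are absent if every probe fails.

```python
import re

PING_SUMMARY = re.compile(
    r"Success rate is \d+ percent \((?P<ok>\d+)/(?P<sent>\d+)\)"
    r"(?:, round-trip min/avg/max = (?P<min>\d+)/(?P<avg>\d+)/(?P<max>\d+) ms)?"
)


def parse_ping_summary(text):
    """Extract probe counts, loss percentage, and RTT figures from IOS ping output."""
    match = PING_SUMMARY.search(text)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group("sent")), int(match.group("ok"))
    result = {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent if sent else None,
    }
    if match.group("avg") is not None:
        # RTT figures are only present when at least one reply was received.
        result["rtt_min_ms"] = int(match.group("min"))
        result["rtt_avg_ms"] = int(match.group("avg"))
        result["rtt_max_ms"] = int(match.group("max"))
    return result


line = "Success rate is 100 percent (100/100), round-trip min/avg/max = 10/12/18 ms"
print(parse_ping_summary(line))
```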
2.4 Traceroute
Traceroute (or tracert on Windows) maps the hop-by-hop path packets take from source to destination, reporting the RTT at each hop. A baseline traceroute documents the expected path and per-hop latency figures. If a future traceroute shows a new hop, a changed path, or dramatically increased latency at a specific hop, this immediately localises the problem to a segment of the network.
! Cisco traceroute
Router# traceroute 8.8.8.8
Type escape sequence to abort.
Tracing the route to 8.8.8.8
1 192.168.1.1 2 msec 1 msec 2 msec
2 10.0.0.1 8 msec 9 msec 8 msec
3 203.0.113.1 12 msec 11 msec 12 msec
4 8.8.8.8 14 msec 13 msec 14 msec
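Comparing a later traceroute against the documented one can be automated. The Python sketch below extracts hop addresses from output in the simple format shown above and reports every hop that differs. The function names and the "later run" addresses are hypothetical, and the pattern does not handle hostnames followed by bracketed addresses.

```python
import re
from itertools import zip_longest

# Matches lines such as " 2 10.0.0.1 8 msec 9 msec 8 msec" (hop number, then address).
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)", re.MULTILINE)


def extract_hops(traceroute_text):
    """Return the hop addresses, in order, from simple IOS-style traceroute output."""
    return [address for _hop, address in HOP_LINE.findall(traceroute_text)]


def path_changes(baseline_hops, current_hops):
    """List (hop_number, baseline_address, current_address) for every hop that differs."""
    return [
        (number, old, new)
        for number, (old, new) in enumerate(zip_longest(baseline_hops, current_hops), start=1)
        if old != new
    ]


baseline = extract_hops("""
  1 192.168.1.1 2 msec 1 msec 2 msec
  2 10.0.0.1 8 msec 9 msec 8 msec
  3 203.0.113.1 12 msec 11 msec 12 msec
  4 8.8.8.8 14 msec 13 msec 14 msec
""")
current = ["192.168.1.1", "10.0.9.1", "203.0.113.1", "8.8.8.8"]  # hypothetical later run
print(path_changes(baseline, current))  # [(2, '10.0.0.1', '10.0.9.1')]
```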
2.5 Syslog
Syslog collects log messages from all network devices onto a central syslog server. During baselining, reviewing logs helps document the normal rate of informational and debug messages. Later, a sudden flood of severity-3 (Error) or severity-2 (Critical) messages deviating from baseline log rates flags an event requiring immediate attention.
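A baseline of the severity mix can be built by counting messages per severity level. The Python sketch below reads the severity digit from the %FACILITY-SEVERITY-MNEMONIC tag that Cisco IOS places in each message. The function names and sample lines are illustrative, and the pattern assumes upper-case facility and mnemonic names, as in the examples.

```python
import re
from collections import Counter

# Cisco IOS messages carry the severity as the digit in %FACILITY-SEVERITY-MNEMONIC.
IOS_TAG = re.compile(r"%[A-Z0-9_]+-([0-7])-[A-Z0-9_]+:")


def severity_counts(lines):
    """Count messages per syslog severity (0 = Emergency ... 7 = Debugging)."""
    counts = Counter()
    for line in lines:
        match = IOS_TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def severe_share(counts, worst=3):
    """Fraction of messages at severity `worst` or more severe (lower number)."""
    total = sum(counts.values())
    severe = sum(n for severity, n in counts.items() if severity <= worst)
    return severe / total if total else 0.0


sample = [
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down",
    "%SYS-5-CONFIG_I: Configured from console by admin on vty0 (10.0.0.5)",
    "%SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 10.0.0.9 port 514 started - CLI initiated",
]
counts = severity_counts(sample)
print(dict(counts), severe_share(counts))
```

Recording the per-hour counts and the severe share during the baseline period gives a reference against which a later flood of severity-3 or severity-2 messages stands out.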
2.6 show Commands on Cisco IOS
Manual snapshots of Cisco IOS show commands capture point-in-time state and counters that feed into documentation. Key commands for baselining:
| Command | What It Baselines |
|---|---|
| show interfaces | Input/output rates, error counters, duplex, speed, resets |
| show ip interface brief | Interface status and IP assignment overview |
| show ip route | Routing table state; documents expected routes |
| show processes cpu | CPU utilisation per process; reveals high-CPU processes at baseline |
| show processes memory | Memory usage; documents normal free memory levels |
| show version | IOS version, uptime, hardware model; essential inventory data |
| show running-config | Full configuration snapshot for change comparison |
| show logging | Recent log messages and logging configuration |
3. What a Network Baseline Should Capture
A thorough baseline covers four main categories of information.
3.1 Inventory & Topology Documentation
Before measuring performance, you must document what exists. Inventory records include device hostnames, models, IOS/firmware versions, serial numbers, physical locations, management IP addresses, and interface-to-neighbour mappings. Commands such as show cdp neighbors detail and show lldp neighbors detail automate discovery of directly connected devices.
A logical topology diagram showing IP addressing, VLANs, and routing domains should accompany the inventory. This diagram becomes the reference when a fault requires rapid identification of affected segments.
3.2 Performance Metrics (the Statistical Baseline)
Collect performance data over a representative period — typically one to two full business weeks — to capture both peak and off-peak behaviour. Measuring only during quiet periods will make normal business-hours traffic look like an anomaly. Metrics to collect:
| Category | Metric | Collection Tool |
|---|---|---|
| Utilisation | Interface bandwidth %, CPU %, memory % | SNMP / NMS graphs |
| Latency | RTT (ms) between key node pairs | Ping, IP SLA |
| Loss | Packet loss % per link | Ping, IP SLA |
| Errors | CRC, input errors, output drops | SNMP, show interfaces |
| Traffic composition | Top protocols, top talkers, applications | NetFlow / IPFIX |
| Events | Syslog message rate, severity distribution | Syslog server |
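Once samples have been collected, each metric is typically reduced to a few summary figures. The Python sketch below computes the average, the peak, and a nearest-rank 95th percentile for a list of samples. The function name and the sample utilisation values are assumptions for illustration, and the choice of the 95th percentile is a common convention rather than something this page prescribes.

```python
import statistics


def summarise(samples):
    """Return average, peak, and nearest-rank 95th percentile of a list of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank percentile: smallest value with at least 95 % of samples at or below it.
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return {
        "avg": statistics.fmean(ordered),
        "peak": ordered[-1],
        "p95": ordered[rank - 1],
    }


# Hypothetical hourly utilisation samples (%) spanning quiet and busy hours.
utilisation = [12, 15, 14, 60, 72, 68, 75, 71, 30, 13]
print(summarise(utilisation))
```

Summarising only the quiet-hour samples in this list would put the peak at 15 %, which is why the collection period needs to include business hours.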
3.3 Configuration Snapshots
A baseline is not only about performance numbers. Archiving the running configuration of every device — routers, switches, firewalls, and wireless controllers — at a known-good state means any future unauthorised or accidental change can be detected by a simple diff against the baseline archive. Tools like RANCID, Oxidized, and Cisco DNA Center automate configuration archiving and change detection.
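The "simple diff" can be sketched in a few lines of Python using the standard difflib module. The function name and the two configuration fragments are hypothetical. Archiving tools such as those named above perform this comparison for you, so this only illustrates the idea.

```python
import difflib


def config_diff(baseline, current):
    """Return unified-diff lines between a baseline config and the current config."""
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            current.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
            n=0,  # no context lines: show only what changed
        )
    )


baseline_cfg = "hostname R1\ninterface GigabitEthernet0/0\n ip address 10.1.1.2 255.255.255.0\n"
current_cfg = "hostname R1\ninterface GigabitEthernet0/0\n ip address 10.1.1.3 255.255.255.0\n"

for line in config_diff(baseline_cfg, current_cfg):
    print(line)
```

An empty result means the device still matches its archived baseline; any "-" or "+" line is a change to reconcile against the change log.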
3.4 Availability Records
Track uptime and downtime for every critical device and link. SNMP traps and syslog messages triggered by interface state changes (line protocol up/down) feed into availability calculations. Against a baseline availability of 99.9 % on a core link, a month at 97 % is immediately reportable as a deviation requiring root cause analysis (RCA).
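To put those percentages in concrete terms, the Python sketch below converts between downtime and availability. The function names are illustrative, and a 30-day month is assumed.

```python
def availability_percent(downtime_minutes, period_days=30):
    """Availability (%) over a period, given total downtime in minutes."""
    total_minutes = period_days * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(target_percent, period_days=30):
    """Downtime budget (minutes) that still meets the target availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


print(round(allowed_downtime_minutes(99.9), 1))  # 43.2 minutes per 30-day month
print(round(allowed_downtime_minutes(97.0), 1))  # 1296.0 minutes, i.e. 21.6 hours
print(availability_percent(1296))                # 97.0
```

So, over a 30-day month, the drop from 99.9 % to 97 % corresponds to roughly 43 minutes of downtime growing to more than 21 hours.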
4. Why Baselines Are Essential for Anomaly Detection
Anomaly detection is fundamentally a comparison exercise: current behaviour vs expected behaviour. The baseline defines "expected." Without it, the following scenarios are difficult or impossible to detect reliably:
| Anomaly Type | Baseline Metric That Reveals It | Possible Root Cause |
|---|---|---|
| Sudden bandwidth spike | Interface utilisation exceeds baseline peak | Backup job misconfigured, malware, new application |
| Increased latency | RTT to a site doubles vs baseline average | Congestion, routing change, failing WAN circuit |
| Packet loss on a link | Loss % rises from 0 % baseline to 2 %+ | Duplex mismatch, bad cable, failing transceiver |
| CPU spike on a router | CPU % far above baseline idle/average | Routing loop, DoS attack, excessive debug left on |
| Unknown top talker | NetFlow shows new host consuming large share | Rogue device, compromised host, data exfiltration |
| Route flap | Syslog message rate spikes; new prefixes in routing table | Unstable BGP/OSPF neighbour, physical link issues |
| High broadcast rate | Broadcast counter exceeds baseline in a VLAN | Broadcast storm, spanning-tree loop, ARP flood |
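The comparison of current behaviour against expected behaviour can be sketched as a simple statistical test. The Python below flags a reading that lies more than k standard deviations from the baseline mean. The function name, the value k = 3, and the sample RTT values are assumptions for illustration; NMS products apply their own, often more elaborate, thresholding.

```python
import statistics


def is_anomalous(baseline_samples, reading, k=3.0):
    """True if `reading` is more than k standard deviations from the baseline mean."""
    mean = statistics.fmean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)  # needs at least two samples
    if stdev == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return reading != mean
    return abs(reading - mean) > k * stdev


# Hypothetical baseline RTT samples (ms) from a branch router to HQ.
baseline_rtt = [11, 12, 12, 13, 12, 11, 13, 12]
print(is_anomalous(baseline_rtt, 80))  # True
print(is_anomalous(baseline_rtt, 13))  # False
```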
5. Cisco IP SLA — Automated Baselining of Latency and Loss
Cisco IP SLA (Service Level Agreement) is a built-in IOS feature that continuously generates synthetic test traffic (ICMP, UDP, TCP, HTTP) between network devices and records RTT, jitter, and loss statistics. Unlike manual ping tests, IP SLA runs 24/7 in the background, logging results to the device's history table and optionally triggering SNMP traps or syslog messages when thresholds are exceeded.
! Configure IP SLA ICMP echo — baseline RTT to 10.1.1.1
Router(config)# ip sla 1
Router(config-ip-sla)# icmp-echo 10.1.1.1 source-interface GigabitEthernet0/0
Router(config-ip-sla-echo)# frequency 60
Router(config-ip-sla-echo)# exit
Router(config)# ip sla schedule 1 life forever start-time now
! Verify
Router# show ip sla statistics 1
IPSLAs Latest Operation Statistics
IPSLA operation id: 1
Latest RTT: 12 milliseconds
Latest operation start time: *03:15:22.345 UTC
Latest operation return code: OK
Number of successes: 1440
Number of failures: 0
Operation time to live: Forever
IP SLA data can be polled by SNMP using the CISCO-RTTMON-MIB, feeding directly into NMS dashboards for long-term baseline graphing.
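Where SNMP polling is not available, the CLI output can be scraped instead. The Python sketch below pulls the latest RTT and the success and failure counts out of text laid out like the show ip sla statistics example above. The function and field names are hypothetical, and the exact output layout varies between IOS releases, so the labels matched here are an assumption based on that example.

```python
import re


def parse_ip_sla_statistics(text):
    """Extract latest RTT and success/failure counts from IP SLA statistics output."""

    def number(label):
        match = re.search(rf"{re.escape(label)}:\s*(\d+)", text)
        return int(match.group(1)) if match else None

    successes = number("Number of successes") or 0
    failures = number("Number of failures") or 0
    attempts = successes + failures
    return {
        "latest_rtt_ms": number("Latest RTT"),  # None if no numeric RTT is shown
        "successes": successes,
        "failures": failures,
        "success_pct": 100.0 * successes / attempts if attempts else None,
    }


output = """IPSLA operation id: 1
Latest RTT: 12 milliseconds
Latest operation return code: OK
Number of successes: 1440
Number of failures: 0
"""
print(parse_ip_sla_statistics(output))
```

With frequency 60, one probe runs per minute, so 1440 successes corresponds to 24 hours of uninterrupted results.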
6. How to Build a Network Baseline — Step-by-Step Process
Follow this structured process to create a baseline that is actually useful for future anomaly detection and troubleshooting.
| Step | Action | Tools / Output |
|---|---|---|
| 1 | Define scope — which devices, links, and services are in scope | Network diagram, device inventory list |
| 2 | Collect inventory — hostname, model, IOS version, IP addresses | show version, CDP/LLDP, spreadsheet |
| 3 | Archive configurations at known-good state | RANCID, Oxidized, TFTP backup |
| 4 | Enable SNMP on all devices; configure NMS polling | SNMPv3, PRTG / LibreNMS / Cacti — see SNMP v2c/v3 Configuration Lab |
| 5 | Enable NetFlow export on key routers/switches | NetFlow v9 / IPFIX → collector — see NetFlow Monitoring |
| 6 | Configure IP SLA probes for critical paths | Cisco IP SLA, syslog threshold alerts |
| 7 | Centralise syslog from all devices | Syslog server (rsyslog, Graylog, Splunk) |
| 8 | Run collection for 1–2 full business weeks | NMS dashboards, flow reports |
| 9 | Analyse and document normal ranges (avg, peak, off-peak) | Spreadsheet, baseline report document |
| 10 | Set thresholds / alerts in NMS for deviation from baseline | SNMP thresholds, IP SLA reactions, syslog filters |
| 11 | Schedule periodic baseline reviews (quarterly / after changes) | Change management process, updated baseline report |
7. Network Documentation Best Practices
A baseline is only as useful as the documentation that captures and preserves it. Good documentation habits ensure that findings remain accessible and actionable — especially during a late-night outage when memory cannot be relied upon.
| Documentation Element | Best Practice |
|---|---|
| Network diagrams | Maintain both physical (rack layout, cabling) and logical (IP, VLAN, routing) diagrams; keep them version-controlled |
| IP address management (IPAM) | Track every assigned IP, subnet, VLAN, and gateway in a tool such as phpIPAM or Infoblox — never rely on memory or sticky notes |
| Change log | Record every configuration change with date, author, reason, and rollback procedure — a baseline after an undocumented change is meaningless |
| Baseline report | A dated document containing normal metric ranges per device and link, stored alongside the configuration archives |
| Escalation procedures | Document what to do when a specific threshold is breached — who to call, which runbook to follow |
8. Baseline Tools — Quick Reference Summary
| Tool | Layer / Function | Primary Baseline Use | NetsTuts Page |
|---|---|---|---|
| SNMP | Application — device polling | Interface counters, CPU, memory, uptime | SNMP Overview |
| NetFlow / IPFIX | Application — flow export | Traffic composition, top talkers, protocols | NetFlow Overview |
| Ping / ICMP | Network — reachability & RTT | Latency and loss between key endpoints | Ping |
| Traceroute | Network — path discovery | Hop-by-hop path and per-hop latency | Traceroute |
| Syslog | Application — event logging | Event rate, severity distribution, interface flaps | Syslog |
| Cisco IP SLA | Network — synthetic probes | Continuous RTT, jitter, loss with threshold alerts | IP SLA Lab |
| Wireshark / tcpdump | Data Link/Network — packet capture | Deep inspection of anomalous traffic patterns | Wireshark | tcpdump |
| show commands (IOS) | Device — point-in-time snapshots | Interface stats, routing table, CPU/memory at baseline | show interfaces |