Network Baselining & Documentation

1. What Is a Network Baseline?

A network baseline is a snapshot of how your network performs under normal, everyday operating conditions. It records metrics such as bandwidth utilisation, CPU and memory loads on devices, latency between key nodes, error rates, and traffic patterns — all measured during a representative period of typical activity.

Once you know what "normal" looks like, any deviation from that norm immediately stands out as a potential anomaly: a sudden spike in CPU usage, an unexpected rise in broadcast traffic, or latency that doubles overnight. Without a baseline, you are comparing your network against nothing — making it nearly impossible to distinguish a degraded network from one that simply always ran that way.

Baseline Metric | Why It Matters
Bandwidth utilisation (%) | Reveals whether links are being driven near capacity, helping justify upgrades before users feel the impact
CPU & memory on routers/switches | High CPU can indicate routing loops, DoS attacks, or misconfigured processes; memory exhaustion can cause crashes
Round-trip latency (RTT) | Establishes expected delay between sites; a sudden increase points to congestion, routing changes, or hardware faults
Packet loss rate | Even 1–2 % loss severely degrades TCP throughput and VoIP quality; baseline exposes hidden intermittent issues
Error counters (CRC, input/output) | Physical-layer faults like a bad cable or duplex mismatch appear as growing error counters on interfaces
Top talkers / top protocols | Shows which hosts and applications dominate traffic; critical for QoS planning and detecting rogue activity
Broadcast & multicast rates | Excessively high broadcast rates can saturate a VLAN and degrade all hosts in that broadcast domain

2. Tools Used for Network Baselining

Several complementary tools are used together to build a complete picture of normal network behaviour. Each tool captures a different layer or dimension of network activity.

2.1 SNMP (Simple Network Management Protocol)

An SNMP manager polls managed devices (routers, switches, servers) at regular intervals and retrieves counters stored in each device's MIB (Management Information Base). These counters include interface byte counts, error counters, CPU utilisation, free memory, and dozens of other variables. A Network Management System (NMS) such as PRTG, LibreNMS, or Cacti graphs these counters over time, making trends immediately visible.

SNMP Version | Security | Baseline Use
SNMPv1 | Community string (plaintext) | Legacy only; avoid in new deployments
SNMPv2c | Community string (plaintext) | Common in labs and small environments
SNMPv3 | Authentication + encryption | Recommended for production baselining. See: SNMP Versions

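Key SNMP objects useful for baselining include ifInOctets / ifOutOctets (interface traffic), ifInErrors / ifOutErrors (interface errors), and sysUpTime (device uptime; an unexpected reset reveals an unplanned reboot). On fast links the 64-bit ifHCInOctets / ifHCOutOctets counters are preferable, because a 32-bit octet counter wraps in about 34 seconds at 1 Gbps.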
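An NMS turns two successive octet-counter readings into a utilisation figure. The sketch below shows that arithmetic under simple assumptions: the function name, its parameters, and the sample values are invented for illustration, and a smaller second reading is treated as exactly one counter wrap.

```python
def utilisation_percent(prev_octets, curr_octets, interval_s, if_speed_bps, counter_bits=32):
    """Return average link utilisation (%) between two polls of an octet counter."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction: a smaller current value is assumed to be one wrap.
    delta_octets = (curr_octets - prev_octets) % modulus
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / if_speed_bps


# Two polls 300 s apart on a 100 Mbps link (invented sample values).
print(utilisation_percent(1_000_000, 376_000_000, 300, 100_000_000))  # 10.0
```

If more than one wrap can occur between polls, the result is wrong, which is one reason to shorten the polling interval or use the 64-bit counters on fast links.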

2.2 NetFlow (and IPFIX / sFlow)

While SNMP tells you how much traffic crossed an interface, NetFlow tells you who sent it, where it was going, and which protocol it was. A Cisco router or switch enabled for NetFlow exports flow records (source IP, destination IP, ports, protocol, byte count) to a NetFlow collector such as ntopng, SolarWinds NTA, or Elastic Stack.

During baselining, NetFlow data reveals the top-talker hosts, dominant applications (HTTP, voice, backup jobs), and time-of-day traffic patterns. This information is indispensable for QoS policy design and for detecting data exfiltration or internal port scans.

Flow Technology | Vendor / Standard | Key Feature
NetFlow v5 | Cisco proprietary | Fixed format, widely supported by collectors
NetFlow v9 | Cisco (template-based) | Flexible templates; supports IPv6 and MPLS fields
IPFIX | IETF standard (RFC 7011) | Vendor-neutral; based on NetFlow v9 design
sFlow | RFC 3176 | Packet sampling; lower overhead on high-speed links
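The top-talker analysis described in this section amounts to summing bytes per source host and comparing the ranking with the baseline. A minimal sketch, assuming flow records have already been exported and decoded into dictionaries; the field names "src", "dst", and "bytes" and all sample values are hypothetical, not a collector's actual schema.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Sum bytes per source IP and return the n largest as (ip, bytes) pairs."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)


def new_talkers(baseline_top, current_top):
    """Hosts in the current ranking that were absent from the baseline ranking."""
    baseline_hosts = {ip for ip, _ in baseline_top}
    return [ip for ip, _ in current_top if ip not in baseline_hosts]


# Invented sample data for illustration only.
flows = [
    {"src": "10.0.0.5", "dst": "10.0.1.10", "bytes": 500_000},
    {"src": "10.0.0.7", "dst": "10.0.1.10", "bytes": 120_000},
    {"src": "10.0.0.5", "dst": "10.0.1.20", "bytes": 250_000},
    {"src": "10.0.0.9", "dst": "203.0.113.50", "bytes": 900_000},
]
baseline_top = [("10.0.0.5", 700_000), ("10.0.0.7", 150_000)]

current_top = top_talkers(flows, n=2)
print(current_top)
print(new_talkers(baseline_top, current_top))
```

A host that appears in the current ranking but not in the baseline is a candidate for investigation, not proof of a problem.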

2.3 Ping

Ping uses ICMP Echo Request / Echo Reply to measure round-trip time (RTT) and packet loss between two endpoints. During baselining, scheduled ping tests from a management host to key nodes — default gateways, DNS servers, WAN endpoints, and servers — establish expected RTT and loss figures.

Example: if baseline pings from a branch router to HQ show an average RTT of 12 ms with 0 % loss, a future reading of 80 ms with 5 % loss is an unambiguous anomaly worthy of investigation.

! Cisco extended ping — useful for baseline testing
Router# ping 10.1.1.1 repeat 100 size 1400 timeout 2
Type escape sequence to abort.
Sending 100, 1400-byte ICMP Echos to 10.1.1.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 10/12/18 ms
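Scheduled ping results are easier to compare with a baseline once the summary line is turned into numbers. The sketch below parses a summary line of the form shown above; the function names and the "twice the baseline RTT" rule are assumptions chosen for illustration, not a standard threshold.

```python
import re

# Matches the summary line of a Cisco ping that received at least one reply.
PING_SUMMARY = re.compile(
    r"Success rate is \d+ percent \((\d+)/(\d+)\), "
    r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms"
)


def parse_ping_summary(line):
    """Return loss and RTT figures from a ping summary line, or None if absent."""
    match = PING_SUMMARY.search(line)
    if match is None:
        return None  # e.g. a total-loss summary, which carries no RTT figures
    received, sent, rtt_min, rtt_avg, rtt_max = map(int, match.groups())
    if sent == 0:
        return None
    return {
        "loss_pct": 100.0 * (sent - received) / sent,
        "rtt_min_ms": rtt_min,
        "rtt_avg_ms": rtt_avg,
        "rtt_max_ms": rtt_max,
    }


def deviates(result, baseline_avg_ms, baseline_loss_pct, rtt_factor=2.0):
    """Hypothetical rule: flag RTT above rtt_factor x baseline, or any extra loss."""
    return (result["rtt_avg_ms"] > rtt_factor * baseline_avg_ms
            or result["loss_pct"] > baseline_loss_pct)


baseline = parse_ping_summary(
    "Success rate is 100 percent (100/100), round-trip min/avg/max = 10/12/18 ms"
)
print(baseline)
```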

2.4 Traceroute

Traceroute (or tracert on Windows) maps the hop-by-hop path packets take from source to destination, reporting the RTT at each hop. A baseline traceroute documents the expected path and per-hop latency figures. If a future traceroute shows a new hop, a changed path, or dramatically increased latency at a specific hop, this immediately localises the problem to a segment of the network.

! Cisco traceroute
Router# traceroute 8.8.8.8
Type escape sequence to abort.
Tracing the route to 8.8.8.8
  1  192.168.1.1    2 msec  1 msec  2 msec
  2  10.0.0.1       8 msec  9 msec  8 msec
  3  203.0.113.1   12 msec 11 msec 12 msec
  4  8.8.8.8       14 msec 13 msec 14 msec
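Comparing a stored traceroute with a fresh one can be automated. The following sketch assumes hop lines shaped exactly like the output above (hop number, address, three "msec" values); real output may also contain timeouts or hostnames, which this parser simply skips. The changed hop address 10.0.9.1 is invented for the example.

```python
import re

# Assumes hop lines shaped like "  2  10.0.0.1   8 msec  9 msec  8 msec".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)((?:\s+\d+ msec)+)\s*$")


def parse_hops(output):
    """Return [(address, average_rtt_ms), ...] in hop order."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match:
            rtts = [int(value) for value in re.findall(r"(\d+) msec", match.group(3))]
            hops.append((match.group(2), sum(rtts) / len(rtts)))
    return hops


def path_changes(baseline, current):
    """List (hop_number, baseline_address, current_address) where the path differs."""
    changes = []
    for index in range(max(len(baseline), len(current))):
        old = baseline[index][0] if index < len(baseline) else None
        new = current[index][0] if index < len(current) else None
        if old != new:
            changes.append((index + 1, old, new))
    return changes


BASELINE_OUTPUT = """Tracing the route to 8.8.8.8
  1  192.168.1.1    2 msec  1 msec  2 msec
  2  10.0.0.1       8 msec  9 msec  8 msec
  3  203.0.113.1   12 msec 11 msec 12 msec
  4  8.8.8.8       14 msec 13 msec 14 msec"""

# Same trace with hop 2 replaced by a hypothetical alternate router.
CURRENT_OUTPUT = BASELINE_OUTPUT.replace("10.0.0.1 ", "10.0.9.1 ")

print(path_changes(parse_hops(BASELINE_OUTPUT), parse_hops(CURRENT_OUTPUT)))
```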

2.5 Syslog

Syslog collects log messages from all network devices onto a central syslog server. During baselining, reviewing the logs documents the normal message rate and the usual mix of severities, most of which are routine informational or debug entries. Later, a sudden flood of severity-3 (Error) or severity-2 (Critical) messages, well above the baseline rate, flags an event requiring immediate attention.
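A severity distribution can be tallied directly from collected log lines. This sketch assumes Cisco IOS style messages, which carry a %FACILITY-SEVERITY-MNEMONIC tag; the function names are hypothetical, and the pattern deliberately ignores lines without such a tag.

```python
import re
from collections import Counter

# Cisco IOS tags each message as %FACILITY-SEVERITY-MNEMONIC, e.g. %LINK-3-UPDOWN.
TAG = re.compile(r"%[A-Z0-9_]+-([0-7])-[A-Z0-9_]+")


def severity_counts(lines):
    """Count messages per syslog severity (0 = most severe, 7 = debug)."""
    counts = Counter()
    for line in lines:
        match = TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def severe_messages(counts, worst_allowed=3):
    """Total messages at severity 0..worst_allowed (lower number = more severe)."""
    return sum(n for severity, n in counts.items() if severity <= worst_allowed)


sample_log = [
    "*Mar  1 00:01:02.123: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "*Mar  1 00:01:03.123: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down",
    "*Mar  1 00:05:00.000: %SYS-5-CONFIG_I: Configured from console by admin on vty0",
]
counts = severity_counts(sample_log)
print(dict(counts), severe_messages(counts))
```

Running the same tally per hour during the baseline period gives the normal rate against which a later burst of severity 0 to 3 messages can be judged.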

2.6 show Commands on Cisco IOS

Manual snapshots of Cisco IOS show commands capture point-in-time state and counters that feed into documentation. Key commands for baselining:

Command | What It Baselines
show interfaces | Input/output rates, error counters, duplex, speed, resets
show ip interface brief | Interface status and IP assignment overview
show ip route | Routing table state; documents expected routes
show processes cpu | CPU utilisation per process; reveals high-CPU processes at baseline
show processes memory | Memory usage; documents normal free memory levels
show version | IOS version, uptime, hardware model; essential inventory data
show running-config | Full configuration snapshot for change comparison
show logging | Recent log messages and logging configuration

3. What a Network Baseline Should Capture

A thorough baseline covers four main categories of information.

3.1 Inventory & Topology Documentation

Before measuring performance, you must document what exists. Inventory records include device hostnames, models, IOS/firmware versions, serial numbers, physical locations, management IP addresses, and interface-to-neighbour mappings. Commands such as show cdp neighbors detail and show lldp neighbors detail list directly connected devices, which speeds up discovery.

A logical topology diagram showing IP addressing, VLANs, and routing domains should accompany the inventory. This diagram becomes the reference when a fault requires rapid identification of affected segments.

3.2 Performance Metrics (the Statistical Baseline)

Collect performance data over a representative period — typically one to two full business weeks — to capture both peak and off-peak behaviour. Measuring only during quiet periods will make normal business-hours traffic look like an anomaly. Metrics to collect:

Category | Metric | Collection Tool
Utilisation | Interface bandwidth %, CPU %, memory % | SNMP / NMS graphs
Latency | RTT (ms) between key node pairs | Ping, IP SLA
Loss | Packet loss % per link | Ping, IP SLA
Errors | CRC, input errors, output drops | SNMP, show interfaces
Traffic composition | Top protocols, top talkers, applications | NetFlow / IPFIX
Events | Syslog message rate, severity distribution | Syslog server
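Once samples have been collected, the statistical baseline is a set of summary figures per time period. The sketch below separates business-hours samples from off-peak ones and reports average, peak, and 95th percentile; the 08:00 to 17:59 window, the nearest-rank percentile method, and the sample values are all assumptions for illustration.

```python
import math
from statistics import mean

BUSINESS_HOURS = range(8, 18)  # hypothetical working day, 08:00 to 17:59


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise(samples):
    """samples: iterable of (hour_of_day, value); returns stats per period."""
    groups = {"business": [], "off_peak": []}
    for hour, value in samples:
        key = "business" if hour in BUSINESS_HOURS else "off_peak"
        groups[key].append(value)
    return {
        period: {"avg": mean(vals), "peak": max(vals), "p95": percentile(vals, 95)}
        for period, vals in groups.items()
        if vals
    }


# Invented link utilisation samples: (hour of day, percent).
samples = [(2, 5.0), (3, 7.0), (10, 40.0), (11, 60.0), (14, 50.0), (22, 9.0)]
print(summarise(samples))
```

Keeping the two periods separate is what prevents normal weekday peaks from being judged against quiet-hour figures.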

3.3 Configuration Snapshots

A baseline is not only about performance numbers. Archiving the running configuration of every device — routers, switches, firewalls, and wireless controllers — at a known-good state means any future unauthorised or accidental change can be detected by a simple diff against the baseline archive. Tools like RANCID, Oxidized, and Cisco DNA Center automate configuration archiving and change detection.
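The diff against the archive can be as simple as a line-by-line comparison. The sketch below uses Python's difflib and is not how RANCID or Oxidized work internally; skipping lines that start with "!" is an assumption made so that comment and timestamp lines do not register as changes, and the sample configurations are invented.

```python
import difflib


def config_diff(baseline_text, current_text):
    """Return unified-diff lines between two configs, ignoring '!' comment lines."""
    def meaningful(text):
        return [line for line in text.splitlines() if not line.startswith("!")]

    return list(difflib.unified_diff(
        meaningful(baseline_text),
        meaningful(current_text),
        fromfile="baseline",
        tofile="current",
        lineterm="",
    ))


# Invented configurations for illustration only.
baseline_cfg = (
    "hostname R1\n!\ninterface GigabitEthernet0/0\n"
    " ip address 10.0.0.1 255.255.255.0\n"
)
current_cfg = (
    "hostname R1\n! Last configuration change at 10:00:00 UTC\n"
    "interface GigabitEthernet0/0\n"
    " ip address 10.0.0.1 255.255.255.0\n shutdown\n"
)

for line in config_diff(baseline_cfg, current_cfg):
    print(line)
```

An empty result means the device still matches its archived state; any "+" or "-" line is a change to reconcile with the change log.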

3.4 Availability Records

Track uptime and downtime for every critical device and link. SNMP traps and syslog messages triggered by interface state changes (line protocol up/down) feed into availability calculations. A baseline availability record of 99.9 % on a core link makes a month with 97 % availability immediately reportable as a deviation requiring RCA (Root Cause Analysis).
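To put those percentages in perspective, availability converts directly into minutes of downtime. The sketch below does the arithmetic for a 30-day month (43,200 minutes); the function names are illustrative only.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Availability as a percentage of the measurement period."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(total_minutes, target_percent):
    """Downtime budget implied by an availability target."""
    return total_minutes * (100.0 - target_percent) / 100.0


MONTH_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day month

# 99.9 % allows roughly 43 minutes of downtime in the month.
print(round(allowed_downtime_minutes(MONTH_MINUTES, 99.9), 1))
# 97 % corresponds to 1,296 minutes, about 21.6 hours.
print(allowed_downtime_minutes(MONTH_MINUTES, 97.0) / 60)
```

The gap between roughly 43 minutes and more than 21 hours is why a drop from 99.9 % to 97 % warrants root cause analysis.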

4. Why Baselines Are Essential for Anomaly Detection

Anomaly detection is fundamentally a comparison exercise: current behaviour vs expected behaviour. The baseline defines "expected." Without it, the following scenarios are difficult or impossible to detect reliably:

Anomaly Type | Baseline Metric That Reveals It | Possible Root Cause
Sudden bandwidth spike | Interface utilisation exceeds baseline peak | Backup job misconfigured, malware, new application
Increased latency | RTT to a site doubles vs baseline average | Congestion, routing change, failing WAN circuit
Packet loss on a link | Loss % rises from 0 % baseline to 2 %+ | Duplex mismatch, bad cable, failing transceiver
CPU spike on a router | CPU % far above baseline idle/average | Routing loop, DoS attack, excessive debug left on
Unknown top talker | NetFlow shows new host consuming large share | Rogue device, compromised host, data exfiltration
Route flap | Syslog message rate spikes; new prefixes in routing table | Unstable BGP/OSPF neighbour, physical link issues
High broadcast rate | Broadcast counter exceeds baseline in a VLAN | Broadcast storm, spanning-tree loop, ARP flood
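The comparison of current against expected behaviour can be expressed as a simple statistical test. The sketch below flags a reading that lies more than k standard deviations from the baseline mean; this is one common rule of thumb rather than the only approach, and the value k = 3 and the sample RTTs are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(baseline_samples, current_value, k=3.0):
    """True if current_value is more than k standard deviations from the baseline mean."""
    if len(baseline_samples) < 2:
        raise ValueError("need at least two baseline samples")
    centre = mean(baseline_samples)
    spread = stdev(baseline_samples)
    return abs(current_value - centre) > k * spread


# Invented baseline RTT samples in milliseconds, averaging 12 ms.
baseline_rtt_ms = [11, 12, 12, 13, 12, 11, 13, 12]

print(is_anomalous(baseline_rtt_ms, 13))  # within normal variation
print(is_anomalous(baseline_rtt_ms, 80))  # far outside the baseline
```

Metrics with strong time-of-day patterns generally need a separate baseline per period, otherwise the spread becomes so wide that real anomalies are masked.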

5. Cisco IP SLA — Automated Baselining of Latency and Loss

Cisco IP SLA (Service Level Agreement) is a built-in IOS feature that continuously generates synthetic test traffic (ICMP, UDP, TCP, HTTP) between network devices and records RTT, jitter, and loss statistics. Unlike manual ping tests, IP SLA runs 24/7 in the background, logging results to the device's history table and optionally triggering SNMP traps or syslog messages when thresholds are exceeded.

! Configure IP SLA ICMP echo — baseline RTT to 10.1.1.1
Router(config)# ip sla 1
Router(config-ip-sla)# icmp-echo 10.1.1.1 source-interface GigabitEthernet0/0
Router(config-ip-sla-echo)# frequency 60
Router(config-ip-sla-echo)# exit
Router(config)# ip sla schedule 1 life forever start-time now

! Verify
Router# show ip sla statistics 1
IPSLAs Latest Operation Statistics
IPSLA operation id: 1
        Latest RTT: 12 milliseconds
Latest operation start time: *03:15:22.345 UTC
Latest operation return code: OK
Number of successes: 1440
Number of failures: 0
Operation time to live: Forever

IP SLA data can be polled by SNMP using the CISCO-RTTMON-MIB, feeding directly into NMS dashboards for long-term baseline graphing.

6. How to Build a Network Baseline — Step-by-Step Process

Follow this structured process to create a baseline that is actually useful for future anomaly detection and troubleshooting.

Step | Action | Tools / Output
1 | Define scope: which devices, links, and services are in scope | Network diagram, device inventory list
2 | Collect inventory: hostname, model, IOS version, IP addresses | show version, CDP/LLDP, spreadsheet
3 | Archive configurations at known-good state | RANCID, Oxidized, TFTP backup
4 | Enable SNMP on all devices; configure NMS polling | SNMPv3, PRTG / LibreNMS / Cacti (see SNMP v2c/v3 Configuration Lab)
5 | Enable NetFlow export on key routers/switches | NetFlow v9 / IPFIX → collector (see NetFlow Monitoring)
6 | Configure IP SLA probes for critical paths | Cisco IP SLA, syslog threshold alerts
7 | Centralise syslog from all devices | Syslog server (rsyslog, Graylog, Splunk)
8 | Run collection for 1–2 full business weeks | NMS dashboards, flow reports
9 | Analyse and document normal ranges (avg, peak, off-peak) | Spreadsheet, baseline report document
10 | Set thresholds / alerts in NMS for deviation from baseline | SNMP thresholds, IP SLA reactions, syslog filters
11 | Schedule periodic baseline reviews (quarterly / after changes) | Change management process, updated baseline report

7. Network Documentation Best Practices

A baseline is only as useful as the documentation that captures and preserves it. Good documentation habits ensure that findings remain accessible and actionable — especially during a late-night outage when memory cannot be relied upon.

Documentation Element | Best Practice
Network diagrams | Maintain both physical (rack layout, cabling) and logical (IP, VLAN, routing) diagrams; keep them version-controlled
IP address management (IPAM) | Track every assigned IP, subnet, VLAN, and gateway in a tool such as phpIPAM or Infoblox; never rely on memory or sticky notes
Change log | Record every configuration change with date, author, reason, and rollback procedure; a baseline after an undocumented change is meaningless
Baseline report | A dated document containing normal metric ranges per device and link, stored alongside the configuration archives
Escalation procedures | Document what to do when a specific threshold is breached: who to call, which runbook to follow

8. Baseline Tools — Quick Reference Summary

Tool | Layer / Function | Primary Baseline Use | NetsTuts Page
SNMP | Application: device polling | Interface counters, CPU, memory, uptime | SNMP Overview
NetFlow / IPFIX | Application: flow export | Traffic composition, top talkers, protocols | NetFlow Overview
Ping / ICMP | Network: reachability & RTT | Latency and loss between key endpoints | Ping
Traceroute | Network: path discovery | Hop-by-hop path and per-hop latency | Traceroute
Syslog | Application: event logging | Event rate, severity distribution, interface flaps | Syslog
Cisco IP SLA | Network: synthetic probes | Continuous RTT, jitter, loss with threshold alerts | IP SLA Lab
Wireshark / tcpdump | Data Link/Network: packet capture | Deep inspection of anomalous traffic patterns | Wireshark, tcpdump
show commands (IOS) | Device: point-in-time snapshots | Interface stats, routing table, CPU/memory at baseline | show interfaces

Practice Quiz – Network Baselining & Documentation

1. What is the primary purpose of establishing a network baseline?

Correct answer is B. A network baseline documents what normal looks like — performance metrics, traffic patterns, latency, and device resource utilisation during typical operations. Without this reference, it is impossible to objectively determine whether current behaviour is degraded or simply how the network always operated. The baseline is the foundation of all anomaly detection.

2. Which tool provides the most detailed visibility into which applications and hosts are generating traffic on a link?

Correct answer is C. NetFlow (and its standards-based equivalent IPFIX) exports flow records containing source IP, destination IP, transport-layer ports, and byte counts. This reveals exactly who is talking to whom, using which protocol, and how much data was transferred. SNMP can only report total byte counts on an interface — it cannot break down traffic by application or host. Ping and traceroute measure reachability and path, not traffic composition.

3. A network engineer notices that a router's CPU is at 90 % but cannot determine whether this is a problem. What fundamental network management practice would have made this determination straightforward?

Correct answer is A. Without a baseline, 90 % CPU could be normal (e.g., during a scheduled backup window) or catastrophic (e.g., a routing loop). A baseline that documents the router typically runs at 15 % average and peaks at 40 % during business hours immediately tells the engineer that 90 % is a serious anomaly requiring investigation. This is the core value of baselining.

4. How does traceroute differ from ping in its contribution to network baselining?

Correct answer is D. Ping provides an aggregate end-to-end RTT and loss figure — useful for detecting that something is wrong between source and destination. Traceroute adds granularity by showing every intermediate hop, its identity (IP address / hostname), and its RTT contribution. A baseline traceroute documents the expected path. Future traceroutes that show a new intermediate hop, a missing hop, or excessive latency at a specific hop immediately localise the problem without guesswork.

5. Which SNMP version is recommended for production network baselining and why?

Correct answer is B. SNMPv3 is the only SNMP version that supports both message authentication (MD5 or SHA, verifying the source) and encryption (DES or AES, protecting the payload). SNMPv1 and SNMPv2c transmit community strings and MIB data in plaintext, making them vulnerable to interception. In a production environment, an attacker capturing SNMP traffic could obtain community strings and use them to read device data or, with a read-write community, modify device configurations. SNMPv3 removes this risk when both authentication and privacy are enabled (the authPriv security level).

6. A baseline established on a Sunday morning shows very low bandwidth utilisation. Why would this be a poor baseline for anomaly detection on weekday afternoons?

Correct answer is C. A baseline must be representative of all normal operating conditions — including peak business hours. Measuring only during quiet periods produces artificially low normal ranges. When weekday afternoon traffic hits its normal peak, the monitoring system would generate false-positive alerts because the current utilisation exceeds the (unrealistically low) baseline threshold. Best practice is to collect data over one to two full business weeks to capture both peak and off-peak patterns.

7. What is the advantage of Cisco IP SLA over manual ping testing for baselining latency?

Correct answer is A. IP SLA automates the continuous measurement of network performance metrics (RTT, jitter, packet loss) and logs results to the device's history table. Unlike a manual ping — which is a one-time test that captures only the current moment — IP SLA runs indefinitely at a configurable frequency (e.g., every 60 seconds) and builds a statistical history. It can also send SNMP traps or syslog messages when configured thresholds are breached, enabling proactive alerting rather than reactive discovery.

8. NetFlow data suddenly shows a previously unknown internal host sending large volumes of traffic to an external IP address at 2 AM. Why is a baseline essential to classifying this as an anomaly?

Correct answer is D. NetFlow shows you what is happening right now — but without context, it cannot tell you whether what is happening is normal. A baseline that captures normal top-talker hosts, expected traffic volumes during off-hours, and known application patterns allows the engineer to immediately recognise that this host was not a top talker before, and that 2 AM large-volume external transfers are not part of any documented backup job. This context transforms raw flow data into an actionable security alert.
