Network Baselining & Documentation

Baseline Metric	Why It Matters
Bandwidth utilisation (%)	Reveals whether links are being driven near capacity, helping justify upgrades before users feel the impact
CPU & memory on routers/switches	High CPU can indicate routing loops, DoS attacks, or misconfigured processes; memory exhaustion can cause crashes
Round-trip latency (RTT)	Establishes expected delay between sites; a sudden increase points to congestion, routing changes, or hardware faults
Packet loss rate	Even 1–2 % loss can severely degrade TCP throughput and VoIP quality; a baseline exposes hidden intermittent issues
Error counters (CRC, input/output)	Physical-layer faults like a bad cable or duplex mismatch appear as growing error counters on interfaces
Top talkers / top protocols	Shows which hosts and applications dominate traffic — critical for QoS planning and detecting rogue activity
Broadcast & multicast rates	Excessively high broadcast rates can saturate a VLAN and degrade all hosts in that broadcast domain
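
The bandwidth utilisation figure in the table above is normally derived from two polls of an interface octet counter rather than read directly. The following is a minimal sketch of that arithmetic, assuming the counter values come from SNMP (for example a 64-bit counter such as ifHCInOctets); the function name and its parameters are illustrative, not part of any library.

```python
def utilisation_percent(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Percentage utilisation of a link between two octet-counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    delta = octets_end - octets_start
    if delta < 0:
        # The counter wrapped past its maximum between the two polls.
        delta += 2 ** counter_bits
    # Octets to bits, then express as a percentage of what the link could carry.
    return delta * 8 * 100 / (interval_s * link_bps)


# 3,750,000,000 octets transferred in 300 s on a 1 Gbit/s link.
print(utilisation_percent(1_000_000_000, 4_750_000_000, 300, 1_000_000_000))  # 10.0
```

The wrap handling only covers a single wrap between polls, which is why short polling intervals or 64-bit counters are preferable on fast links.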

SNMP Version	Security	Baseline Use
SNMPv1	Community string (plaintext)	Legacy only — avoid in new deployments
SNMPv2c	Community string (plaintext)	Common in labs and small environments
SNMPv3	Authentication + encryption	Recommended for production baselining. See: SNMP Versions

Flow Technology	Vendor / Standard	Key Feature
NetFlow v5	Cisco proprietary	Fixed record format (IPv4 only); widely supported by collectors
NetFlow v9	Cisco (template-based)	Flexible templates; supports IPv6 and MPLS fields
IPFIX	IETF standard (RFC 7011)	Vendor-neutral; based on NetFlow v9 design
sFlow	InMon (RFC 3176, informational)	Packet sampling: lower overhead on high-speed links
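
"Fixed format" for NetFlow v5 means every export datagram starts with the same 24-byte header followed by 48-byte flow records, so a collector can decode it without templates. The sketch below decodes only the header with the standard `struct` module; the names `V5Header` and `parse_v5_header` are illustrative, and the sample datagram is synthetic rather than captured from a device.

```python
import struct
from collections import namedtuple

# NetFlow v5 header: 24 bytes, network byte order.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5Header = namedtuple(
    "V5Header",
    ["version", "count", "sys_uptime_ms", "unix_secs", "unix_nsecs",
     "flow_sequence", "engine_type", "engine_id", "sampling_interval"],
)


def parse_v5_header(datagram: bytes) -> V5Header:
    """Decode the header of a NetFlow v5 export datagram."""
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v5 header")
    header = V5Header(*V5_HEADER.unpack_from(datagram))
    if header.version != 5:
        raise ValueError(f"not NetFlow v5 (version field = {header.version})")
    return header


# Synthetic datagram: one header announcing two flow records of 48 zero bytes each.
sample = V5_HEADER.pack(5, 2, 123456, 1_700_000_000, 0, 42, 0, 0, 0) + bytes(2 * 48)
hdr = parse_v5_header(sample)
print(hdr.version, hdr.count, len(sample))  # 5 2 120
```

NetFlow v9 and IPFIX cannot be decoded this way because the record layout is announced in template records, which is what gives them their flexibility.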

Command	What It Baselines
`show interfaces`	Input/output rates, error counters, duplex, speed, resets
`show ip interface brief`	Interface status and IP assignment overview
`show ip route`	Routing table state — documents expected routes
`show processes cpu`	CPU utilisation per process — reveals high-CPU processes at baseline
`show processes memory`	Memory usage — documents normal free memory levels
`show version`	IOS version, uptime, hardware model — essential inventory data
`show running-config`	Full configuration snapshot for change comparison
`show logging`	Recent log messages and logging configuration
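
Point-in-time `show` output becomes useful for a baseline once the numbers are extracted and stored. A rough sketch of pulling a few counters out of `show interfaces` text with regular expressions follows; the sample output is hand-written for illustration and abbreviated, real output varies by platform and IOS version, and the `parse_show_interface` helper is hypothetical.

```python
import re

# Each pattern captures one integer from typical "show interfaces" wording.
PATTERNS = {
    "input_bps": r"input rate (\d+) bits/sec",
    "output_bps": r"output rate (\d+) bits/sec",
    "input_errors": r"(\d+) input errors",
    "crc": r"(\d+) CRC",
    "output_errors": r"(\d+) output errors",
    "resets": r"(\d+) interface resets",
}


def parse_show_interface(text):
    """Return a dict of counters; a value is None when its line is absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = int(match.group(1)) if match else None
    duplex = re.search(r"(\w+)-duplex", text)
    result["duplex"] = duplex.group(1).lower() if duplex else None
    return result


# Illustrative, abbreviated output (not captured from a real device).
SAMPLE = """GigabitEthernet0/1 is up, line protocol is up
  Full-duplex, 1000Mb/s, media type is RJ45
  5 minute input rate 2457000 bits/sec, 412 packets/sec
  5 minute output rate 1180000 bits/sec, 301 packets/sec
     12 input errors, 9 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 3 interface resets
"""

stats = parse_show_interface(SAMPLE)
print(stats["crc"], stats["duplex"], stats["input_bps"])
```

Storing the parsed values with a timestamp lets later snapshots be compared against the baseline, for example to see whether the CRC counter is still growing.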

Category	Metric	Collection Tool
Utilisation	Interface bandwidth %, CPU %, memory %	SNMP / NMS graphs
Latency	RTT (ms) between key node pairs	Ping, IP SLA
Loss	Packet loss % per link	Ping, IP SLA
Errors	CRC, input errors, output drops	SNMP, show interfaces
Traffic composition	Top protocols, top talkers, applications	NetFlow / IPFIX
Events	Syslog message rate, severity distribution	Syslog server

Anomaly Type	Baseline Metric That Reveals It	Possible Root Cause
Sudden bandwidth spike	Interface utilisation exceeds baseline peak	Backup job misconfigured, malware, new application
Increased latency	RTT to a site doubles vs baseline average	Congestion, routing change, failing WAN circuit
Packet loss on a link	Loss % rises from 0 % baseline to 2 %+	Duplex mismatch, bad cable, failing transceiver
CPU spike on a router	CPU % far above baseline idle/average	Routing loop, DoS attack, excessive debug left on
Unknown top talker	NetFlow shows new host consuming large share	Rogue device, compromised host, data exfiltration
Route flap	Syslog message rate spikes; new prefixes in routing table	Unstable BGP/OSPF neighbour, physical link issues
High broadcast rate	Broadcast counter exceeds baseline in a VLAN	Broadcast storm, spanning-tree loop, ARP flood
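
Each row in the table above comes down to comparing a current reading with a stored normal range. One possible way to express that check is sketched below. The threshold rule (the larger of the baseline peak and the mean plus `k` standard deviations) is an assumption chosen for illustration, not a rule taken from this page, and the sample values are invented.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarise historical samples of one metric (needs at least two samples)."""
    return {"mean": mean(samples), "stdev": stdev(samples), "peak": max(samples)}


def is_anomalous(value, baseline, k=3.0):
    """Flag a reading above both the observed peak and mean + k * stdev."""
    threshold = max(baseline["peak"], baseline["mean"] + k * baseline["stdev"])
    return value > threshold


# Invented interface utilisation samples (%) from a baseline period.
history = [22, 25, 31, 28, 35, 40, 27, 24]
baseline = build_baseline(history)

print(is_anomalous(45, baseline))  # False: slightly above peak but within 3 stdev
print(is_anomalous(90, baseline))  # True
```

Using the larger of the two limits keeps a noisy but normal metric from alerting on every new peak; a stricter policy could alert on anything above the recorded peak instead.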

Step	Action	Tools / Output
1	Define scope — which devices, links, and services are in scope	Network diagram, device inventory list
2	Collect inventory — hostname, model, IOS version, IP addresses	`show version`, CDP/LLDP, spreadsheet
3	Archive configurations at known-good state	RANCID, Oxidized, TFTP backup
4	Enable SNMP on all devices; configure NMS polling	SNMPv3, PRTG / LibreNMS / Cacti — see SNMP v2c/v3 Configuration Lab
5	Enable NetFlow export on key routers/switches	NetFlow v9 / IPFIX → collector — see NetFlow Monitoring
6	Configure IP SLA probes for critical paths	Cisco IP SLA, syslog threshold alerts
7	Centralise syslog from all devices	Syslog server (rsyslog, Graylog, Splunk)
8	Run collection for 1–2 full business weeks	NMS dashboards, flow reports
9	Analyse and document normal ranges (avg, peak, off-peak)	Spreadsheet, baseline report document
10	Set thresholds / alerts in NMS for deviation from baseline	SNMP thresholds, IP SLA reactions, syslog filters
11	Schedule periodic baseline reviews (quarterly / after changes)	Change management process, updated baseline report
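
Step 9 asks for normal ranges split into average, peak, and off-peak. A small sketch of that summary over timestamped samples is shown below; the definition of business hours (Monday to Friday, 08:00 to 18:00) is an assumption that should be adjusted to the organisation, and the sample data is invented.

```python
from datetime import datetime
from statistics import mean


def is_business_hours(ts, start_hour=8, end_hour=18):
    """Assumed business hours: Monday to Friday, start_hour up to end_hour."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def summarise(samples):
    """samples: iterable of (datetime, value) pairs for one metric."""
    samples = list(samples)
    busy = [v for ts, v in samples if is_business_hours(ts)]
    quiet = [v for ts, v in samples if not is_business_hours(ts)]
    return {
        "business_avg": mean(busy) if busy else None,
        "business_peak": max(busy) if busy else None,
        "off_peak_avg": mean(quiet) if quiet else None,
    }


# Invented utilisation samples (%): 4 March 2024 is a Monday, 3 March a Sunday.
samples = [
    (datetime(2024, 3, 4, 10, 0), 40),
    (datetime(2024, 3, 4, 14, 0), 60),
    (datetime(2024, 3, 4, 23, 0), 10),
    (datetime(2024, 3, 3, 9, 0), 6),
]
report = summarise(samples)
print(report)
```

The resulting three numbers per metric map directly onto a row of the baseline report and give step 10 something concrete to derive thresholds from.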

Documentation Element	Best Practice
Network diagrams	Maintain both physical (rack layout, cabling) and logical (IP, VLAN, routing) diagrams; keep them version-controlled
IP address management (IPAM)	Track every assigned IP, subnet, VLAN, and gateway in a tool such as phpIPAM or Infoblox — never rely on memory or sticky notes
Change log	Record every configuration change with date, author, reason, and rollback procedure — a baseline after an undocumented change is meaningless
Baseline report	A dated document containing normal metric ranges per device and link, stored alongside the configuration archives
Escalation procedures	Document what to do when a specific threshold is breached — who to call, which runbook to follow
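
Keeping configuration archives next to the baseline report pays off when a later snapshot can be compared against the known-good one. Tools such as RANCID and Oxidized do this automatically; the sketch below only illustrates the idea with Python's standard `difflib`, using two invented configuration fragments and a hypothetical `config_diff` helper.

```python
import difflib


def config_diff(baseline_cfg, current_cfg):
    """Unified diff between a known-good configuration and the current one."""
    return list(difflib.unified_diff(
        baseline_cfg.splitlines(),
        current_cfg.splitlines(),
        fromfile="baseline",
        tofile="current",
        lineterm="",
    ))


# Invented configuration fragments for illustration.
BASELINE = """hostname R1
interface GigabitEthernet0/1
 duplex full
 speed 1000"""

CURRENT = """hostname R1
interface GigabitEthernet0/1
 duplex auto
 speed 1000"""

diff = config_diff(BASELINE, CURRENT)
# Keep only added or removed lines, skipping the "---" / "+++" file headers.
changed = [line for line in diff
           if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
print("\n".join(diff))
```

Any line that shows up in the diff without a matching change-log entry is an undocumented change and a reason to re-validate the baseline.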

Tool	Layer / Function	Primary Baseline Use	NetsTuts Page
SNMP	Application — device polling	Interface counters, CPU, memory, uptime	SNMP Overview
NetFlow / IPFIX	Application — flow export	Traffic composition, top talkers, protocols	NetFlow Overview
Ping / ICMP	Network — reachability & RTT	Latency and loss between key endpoints	Ping
Traceroute	Network — path discovery	Hop-by-hop path and per-hop latency	Traceroute
Syslog	Application — event logging	Event rate, severity distribution, interface flaps	Syslog
Cisco IP SLA	Network — synthetic probes	Continuous RTT, jitter, loss with threshold alerts	IP SLA Lab
Wireshark / tcpdump	Data Link/Network — packet capture	Deep inspection of anomalous traffic patterns	Wireshark / tcpdump
show commands (IOS)	Device — point-in-time snapshots	Interface stats, routing table, CPU/memory at baseline	show interfaces

What is the primary purpose of establishing a network baseline?

A. To configure SNMP community strings on all devices
B. To document normal network behaviour so that future deviations can be identified as anomalies
C. To replace the need for network monitoring tools
D. To archive running configurations only

Correct answer is B. A network baseline documents what normal looks like: performance metrics, traffic patterns, latency, and device resource utilisation during typical operations. Without this reference, it is impossible to objectively determine whether current behaviour is degraded or simply how the network has always operated. The baseline is the foundation of all anomaly detection.

Which tool provides the most detailed visibility into which applications and hosts are generating traffic on a link?

A. Ping
B. SNMP
C. NetFlow
D. Traceroute

Correct answer is C. NetFlow (and its standards-based equivalent IPFIX) exports flow records containing source IP, destination IP, transport-layer ports, and byte counts. This reveals exactly who is talking to whom, using which protocol, and how much data was transferred. SNMP can only report total byte counts on an interface — it cannot break down traffic by application or host. Ping and traceroute measure reachability and path, not traffic composition.
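
Because each flow record carries addresses, ports, and byte counts, a top-talker report is essentially a group-by over the records. A minimal sketch follows, assuming the collector has already decoded flows into dictionaries; the field names (`src`, `dst`, `dport`, `bytes`) and the records themselves are invented for illustration.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Return the n source hosts that sent the most bytes, largest first."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)


# Invented flow records; external addresses use documentation ranges.
flows = [
    {"src": "10.0.0.5", "dst": "198.51.100.7", "dport": 443, "bytes": 1200},
    {"src": "10.0.0.9", "dst": "198.51.100.7", "dport": 443, "bytes": 90000},
    {"src": "10.0.0.5", "dst": "192.0.2.10", "dport": 53, "bytes": 300},
    {"src": "10.0.0.9", "dst": "203.0.113.4", "dport": 22, "bytes": 10000},
]

print(top_talkers(flows, 2))  # [('10.0.0.9', 100000), ('10.0.0.5', 1500)]
```

Grouping by `dport` or by destination instead of `src` gives the top-protocol and top-destination views from the same data, which SNMP interface counters cannot provide.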

A network engineer notices that a router's CPU is at 90 % but cannot determine whether this is a problem. What fundamental network management practice would have made this determination straightforward?

A. Having an established baseline showing normal CPU utilisation ranges for that router
B. Rebooting the router to reset CPU counters
C. Upgrading the router's IOS version
D. Disabling all debug commands

Correct answer is A. Without a baseline, 90 % CPU could be normal (e.g., during a scheduled backup window) or catastrophic (e.g., a routing loop). A baseline that documents the router typically runs at 15 % average and peaks at 40 % during business hours immediately tells the engineer that 90 % is a serious anomaly requiring investigation. This is the core value of baselining.

How does traceroute differ from ping in its contribution to network baselining?

A. Traceroute measures bandwidth; ping measures latency
B. They are functionally identical for baselining purposes
C. Ping maps the path; traceroute tests end-to-end loss only
D. Ping baselines end-to-end RTT and loss; traceroute baselines the hop-by-hop path and per-hop latency, allowing future path deviations to be detected

Correct answer is D. Ping provides an aggregate end-to-end RTT and loss figure — useful for detecting that something is wrong between source and destination. Traceroute adds granularity by showing every intermediate hop, its identity (IP address / hostname), and its RTT contribution. A baseline traceroute documents the expected path. Future traceroutes that show a new intermediate hop, a missing hop, or excessive latency at a specific hop immediately localise the problem without guesswork.
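
The comparison of a current traceroute against the baseline path can be automated once both are reduced to a list of hops. The sketch below assumes each trace has already been parsed into `(hop_ip, rtt_ms)` pairs; the `compare_traceroutes` helper, the `rtt_factor` threshold, and the hop data are all illustrative assumptions.

```python
def compare_traceroutes(baseline, current, rtt_factor=2.0):
    """Compare two traces given as lists of (hop_ip, rtt_ms) pairs."""
    base_rtt = dict(baseline)
    current_ips = [ip for ip, _ in current]
    return {
        # Hops seen now that were not on the baseline path.
        "new_hops": [ip for ip in current_ips if ip not in base_rtt],
        # Baseline hops that no longer appear.
        "missing_hops": [ip for ip, _ in baseline if ip not in current_ips],
        # Known hops whose RTT exceeds rtt_factor times the baseline RTT.
        "slow_hops": [ip for ip, rtt in current
                      if ip in base_rtt and rtt > rtt_factor * base_rtt[ip]],
    }


# Invented traces for illustration.
baseline_trace = [("10.0.0.1", 1.0), ("10.0.1.1", 5.0), ("10.0.2.1", 12.0)]
current_trace = [("10.0.0.1", 1.1), ("10.0.9.1", 30.0), ("10.0.2.1", 40.0)]

result = compare_traceroutes(baseline_trace, current_trace)
print(result)
```

Traceroute RTTs are noisy because routers deprioritise replies to probe packets, so a per-hop latency flag is best treated as a hint to investigate rather than proof of a fault at that hop.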

Which SNMP version is recommended for production network baselining and why?

A. SNMPv1: it is the most widely supported across legacy devices
B. SNMPv3: it provides authentication and encryption, protecting sensitive device data from eavesdropping and tampering
C. SNMPv2c: the community string is sufficient security
D. SNMP version does not matter for baselining

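Correct answer is B. SNMPv3 is the only SNMP version that supports both message authentication (MD5 or SHA, verifying the source and integrity of each message) and encryption (DES or AES, protecting the payload). SNMPv1 and SNMPv2c transmit community strings and MIB data in plaintext, making them vulnerable to interception. In a production environment, an attacker capturing SNMP traffic could obtain community strings and use them to read device data or, with a read-write community, modify device configurations. SNMPv3 closes this gap when it is configured at the authPriv security level, which enables both authentication and encryption; prefer SHA and AES over the weaker MD5 and DES where the platform supports them.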

A baseline established on a Sunday morning shows very low bandwidth utilisation. Why would this be a poor baseline for anomaly detection on weekday afternoons?

A. Sunday measurements are technically inaccurate
B. SNMP polling does not work on weekends
C. The baseline would not capture peak business-hours traffic, so normal weekday afternoon utilisation would falsely appear as an anomaly
D. Baselines should only be collected on Fridays

Correct answer is C. A baseline must be representative of all normal operating conditions — including peak business hours. Measuring only during quiet periods produces artificially low normal ranges. When weekday afternoon traffic hits its normal peak, the monitoring system would generate false-positive alerts because the current utilisation exceeds the (unrealistically low) baseline threshold. Best practice is to collect data over one to two full business weeks to capture both peak and off-peak patterns.
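
One way to keep a quiet Sunday from defining "normal" for a busy weekday is to store the baseline per time slot and compare each reading only with its own slot. The sketch below buckets samples by weekday and hour; the 20 % margin, the function names, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def bucket_baseline(samples):
    """Map (weekday, hour) to the peak value seen in that slot."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {slot: max(values) for slot, values in buckets.items()}


def exceeds_baseline(ts, value, baseline, margin=1.2):
    """True or False against the slot's peak; None if the slot has no baseline."""
    peak = baseline.get((ts.weekday(), ts.hour))
    if peak is None:
        return None
    return value > peak * margin


# Invented utilisation samples (%): 3 and 10 March 2024 are Sundays,
# 6 and 13 March 2024 are Wednesdays.
samples = [
    (datetime(2024, 3, 3, 9), 5), (datetime(2024, 3, 10, 9), 6),
    (datetime(2024, 3, 6, 14), 60), (datetime(2024, 3, 13, 14), 65),
]
baseline = bucket_baseline(samples)

wednesday_afternoon = datetime(2024, 3, 20, 14)
print(exceeds_baseline(wednesday_afternoon, 62, baseline))  # False: normal for that slot
print(exceeds_baseline(datetime(2024, 3, 21, 14), 62, baseline))  # None: slot never sampled
```

The `None` result makes gaps in coverage visible, which is the practical argument for collecting over one to two full business weeks before trusting any threshold.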

What is the advantage of Cisco IP SLA over manual ping testing for baselining latency?

A. IP SLA runs continuously in the background 24/7, building a historical record of RTT and loss, and can trigger automatic alerts when thresholds are exceeded; manual pings only provide a point-in-time snapshot
B. IP SLA uses a different protocol than ICMP, making it more accurate
C. Manual pings are always more accurate than IP SLA
D. IP SLA only works on Cisco Catalyst switches

Correct answer is A. IP SLA automates the continuous measurement of network performance metrics (RTT, jitter, packet loss) and logs results to the device's history table. Unlike a manual ping — which is a one-time test that captures only the current moment — IP SLA runs indefinitely at a configurable frequency (e.g., every 60 seconds) and builds a statistical history. It can also send SNMP traps or syslog messages when configured thresholds are breached, enabling proactive alerting rather than reactive discovery.
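
The statistical history that repeated probing builds up can be illustrated with a few lines of arithmetic. The sketch below summarises a series of probe results into average RTT, loss percentage, and a simple jitter figure. It is not how IP SLA itself computes jitter: here jitter is assumed to be the mean absolute difference between consecutive received RTTs, and the probe values are invented.

```python
from statistics import mean


def summarise_probes(rtts_ms):
    """rtts_ms: list of RTTs in milliseconds, with None for each lost probe."""
    if not rtts_ms:
        raise ValueError("no probe results to summarise")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    # Simplified jitter: mean absolute change between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "avg_rtt_ms": mean(received) if received else None,
        "loss_pct": loss_pct,
        "jitter_ms": mean(deltas) if deltas else 0.0,
    }


# Five invented probes, one of them lost.
summary = summarise_probes([20, 22, None, 21, 25])
print(summary)
```

Running the same summary over each polling window and storing the results is what turns individual probes into a baseline that threshold alerts can be set against.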

NetFlow data suddenly shows a previously unknown internal host sending large volumes of traffic to an external IP address at 2 AM. Why is a baseline essential to classifying this as an anomaly?

A. Without a baseline, the NMS cannot display NetFlow data
B. NetFlow only records anomalous traffic automatically
C. Baselines configure ACLs to block unusual traffic
D. The baseline documents normal top-talker behaviour and expected traffic patterns; without it, the engineer cannot distinguish a new legitimate backup job from data exfiltration

Correct answer is D. NetFlow shows you what is happening right now — but without context, it cannot tell you whether what is happening is normal. A baseline that captures normal top-talker hosts, expected traffic volumes during off-hours, and known application patterns allows the engineer to immediately recognise that this host was not a top talker before, and that 2 AM large-volume external transfers are not part of any documented backup job. This context transforms raw flow data into an actionable security alert.

1. What Is a Network Baseline?

2. Tools Used for Network Baselining

2.1 SNMP (Simple Network Management Protocol)

2.2 NetFlow (and IPFIX / sFlow)

2.3 Ping

2.4 Traceroute

2.5 Syslog

2.6 show Commands on Cisco IOS

3. What a Network Baseline Should Capture

3.1 Inventory & Topology Documentation

3.2 Performance Metrics (the Statistical Baseline)

3.3 Configuration Snapshots

3.4 Availability Records

4. Why Baselines Are Essential for Anomaly Detection

5. Cisco IP SLA — Automated Baselining of Latency and Loss

6. How to Build a Network Baseline — Step-by-Step Process

7. Network Documentation Best Practices

8. Baseline Tools — Quick Reference Summary

Practice Quiz – Network Baselining & Documentation