Network Troubleshooting Methodology
1. Why Methodology Matters
When a network problem occurs, the instinct is to act immediately — to try something, anything, that might fix it. This reactive, unstructured approach — sometimes called "random troubleshooting" — leads to wasted time, additional problems, and the same issue recurring because the root cause was never identified.
A systematic troubleshooting methodology provides a structured process: gather symptoms, isolate the problem to a specific layer or device, test a hypothesis, implement a fix, and verify the resolution. Every experienced network engineer develops a methodology — the approaches described in this page are the formal frameworks that underpin those instincts. They are tested in the CCNA exam and applied daily in production networks.
| Random Troubleshooting | Systematic Troubleshooting |
|---|---|
| Try random fixes until something works | Gather information, form a hypothesis, test it |
| May introduce new problems (rebooting a working device) | Changes are deliberate and targeted — minimal risk |
| Root cause often never identified | Root cause always identified — prevents recurrence |
| Undocumented — knowledge lost after the incident | Documented — builds institutional knowledge |
| Escalation is chaotic — no clear starting point | Clear handoff — documented state of investigation |
Related pages: Troubleshooting Connectivity | OSI Model | TCP/IP Model | ping Command | traceroute Command | Debug Commands | show interfaces | show ip route | Wireshark | End-to-End Troubleshooting Scenario Lab
2. The Structured Troubleshooting Process
Regardless of which specific approach (top-down, bottom-up, etc.) is used, all systematic troubleshooting follows the same underlying process. This process is based on the scientific method applied to network engineering.
Structured troubleshooting process — universal framework:
Step 1: DEFINE THE PROBLEM
─────────────────────────────────────────────────────────────────────
- What exactly is not working? (specific symptom, not vague report)
- Who is affected? (one user, one site, everyone?)
- When did it start? (time of first occurrence)
- Has anything changed recently? (new config, maintenance, hardware swap)
- Is it intermittent or constant?
- Gather output: ping results, error messages, show command output
Step 2: GATHER INFORMATION
─────────────────────────────────────────────────────────────────────
- Review relevant show commands on affected devices
- Check syslog for error messages around the time of the problem
- Check SNMP alerts / NMS dashboard
- Review recent change log — what was changed and when?
- Reproduce the problem if possible (proves it is real and consistent)
Step 3: ANALYSE THE INFORMATION
─────────────────────────────────────────────────────────────────────
- What is the baseline? (what should the output look like?)
- What is abnormal in the collected output?
- Which OSI layer is the problem most likely at?
- What devices are in the path between source and destination?
Step 4: FORM A HYPOTHESIS
─────────────────────────────────────────────────────────────────────
- State a specific, testable theory: "The problem is a missing route
on R2 because show ip route on R2 does not show 10.20.0.0/24"
- Rank hypotheses by likelihood — test the most probable first
Step 5: TEST THE HYPOTHESIS
─────────────────────────────────────────────────────────────────────
- Design a test that either confirms or eliminates the hypothesis
- Test one variable at a time — changing multiple things simultaneously
makes it impossible to know which change fixed the problem
Step 6: IMPLEMENT THE SOLUTION
─────────────────────────────────────────────────────────────────────
- Apply the fix (add the missing route, correct the misconfiguration)
- Have a rollback plan ready before making changes
Step 7: VERIFY AND DOCUMENT
─────────────────────────────────────────────────────────────────────
- Confirm the problem is fully resolved (end-to-end verification)
- Check that no new problems were introduced
- Document: root cause, fix applied, verification steps, prevention measures
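The hypothesis phase of this process (Steps 4–6) can be sketched in code. The following is a minimal, hypothetical harness, not a real tool: it walks a list of hypotheses ranked by likelihood, runs one test per hypothesis (one variable at a time), and stops at the first confirmed cause. The hypothesis names and stubbed test results are illustrative.

```python
# Sketch of Steps 4-6: test ranked hypotheses one at a time.
# Each hypothesis pairs a description with a test that returns
# True (confirmed) or False (eliminated). Names are hypothetical.

def find_root_cause(hypotheses):
    """hypotheses: list of (description, test_fn), most likely first."""
    for description, test in hypotheses:
        if test():                  # one variable per test
            return description      # confirmed -> implement the fix
    return None                     # all eliminated -> gather more info

# Illustrative usage with stubbed test results:
ranked = [
    ("Missing route to 10.20.0.0/24 on R2", lambda: False),  # eliminated
    ("ACL blocking TCP/80 on R1",           lambda: True),   # confirmed
]
print(find_root_cause(ranked))   # -> ACL blocking TCP/80 on R1
```

Returning after the first confirmed hypothesis mirrors the "test one variable at a time" rule: a later fix can never be mistaken for the one that worked.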
3. The OSI Model as a Troubleshooting Framework
The OSI model provides the foundation for all systematic network troubleshooting approaches. By associating symptoms with specific OSI layers, an engineer can narrow the investigation to a specific protocol, technology, or device type without exhaustively checking everything.
| OSI Layer | Name | Technologies / Protocols | Common Problem Symptoms | Key Diagnostic Tools |
|---|---|---|---|---|
| 7 | Application | HTTP, HTTPS, DNS, DHCP, FTP, SMTP, SSH | Application works for some users but not others; browser error messages; authentication failures; specific application cannot connect | curl, web browser, application logs, nslookup, dig |
| 6 | Presentation | SSL/TLS, encryption, data encoding | SSL certificate errors; encryption negotiation failures; data corruption / garbled output | openssl, certificate inspection tools |
| 5 | Session | NetBIOS, RPC, SIP, H.323 | Sessions drop unexpectedly; cannot establish or maintain session; VoIP call setup failures | Application logs, Wireshark session analysis |
| 4 | Transport | TCP, UDP | TCP connection resets; sessions time out; high retransmission rate; ACL blocking specific ports | netstat, Wireshark, show ip access-lists |
| 3 | Network | IP, ICMP, OSPF, EIGRP, BGP, ACLs, NAT | Ping fails but Layer 2 works; routing loop; wrong route in table; NAT misconfiguration; ACL blocking traffic | ping, traceroute, show ip route, show ip protocols |
| 2 | Data Link | Ethernet, 802.11, VLANs, STP, ARP, MAC address table | Ping to default gateway fails; interface up but no Layer 3 connectivity; STP loop; duplex mismatch; VLAN misconfiguration | show interfaces, show mac address-table, show spanning-tree, show vlan |
| 1 | Physical | Cables, connectors, transceivers, NICs, ports | Interface "down/down"; no link light; high error counters; intermittent connectivity; CRC errors | show interfaces (CRC, input errors), cable tester, optical power meter, visual inspection |
See: OSI Model | OSI Layer Functions | TCP/IP Model
4. Bottom-Up Troubleshooting
Bottom-up troubleshooting starts at Layer 1 (Physical) and works upward through the OSI stack layer by layer until the problem is found. The core principle: each layer depends on the layers below it. If Layer 1 has a fault, fixing it might resolve what appeared to be a Layer 3 problem — no point investigating routing if the cable is faulty.
Bottom-Up Workflow
Bottom-up troubleshooting — start at Layer 1, work upward:

Layer 1 — Physical: Is the cable connected? Is the link up?
───────────────────────────────────────────────────────────
  Check: show interfaces Gi0/0
  Look for: "GigabitEthernet0/0 is down, line protocol is down"
  Check: interface counters — input errors, CRC, giants, runts
  Check: cable, connector, SFP, media converter
→ If Layer 1 is OK, proceed to Layer 2.

Layer 2 — Data Link: Is the correct VLAN assigned? Is STP OK?
─────────────────────────────────────────────────────────────
  Check: show vlan brief — is the port in the correct VLAN?
  Check: show spanning-tree — is the port in forwarding state?
  Check: show mac address-table — does the switch know the MAC?
  Check: show interfaces — duplex/speed mismatch?
→ If Layer 2 is OK, proceed to Layer 3.

Layer 3 — Network: Is there a valid IP route?
─────────────────────────────────────────────────────────────
  Check: show ip route — does a route to the destination exist?
  Check: ping <default-gateway> — can the device reach its gateway?
  Check: show ip interface brief — is the IP address correct?
  Check: show ip protocols — is the routing protocol running?
→ If Layer 3 is OK, proceed to Layer 4.

Layer 4 — Transport: Is an ACL blocking the port?
─────────────────────────────────────────────────────────────
  Check: show ip access-lists — is traffic being denied?
  Check: Is the specific TCP/UDP port open? (telnet <ip> <port>)
→ If Layer 4 is OK, proceed to Layer 7 (application).

Layer 7 — Application: Is the service running?
─────────────────────────────────────────────────────────────
  Check: Is the server listening on the expected port?
  Check: Is DNS resolving correctly?
  Check: Application logs for errors
When to Use Bottom-Up
| Use Bottom-Up When... | Reason |
|---|---|
| Physical or Data Link layer problems are suspected (new cable run, hardware change, port flapping) | Layer 1/2 problems are common and inexpensive to check — eliminate them first |
| The problem is a complete connectivity failure (no ping, interface down/down) | Total failure usually starts at the physical layer — working upward is logical |
| The symptom is unfamiliar and the layer is unknown | Systematic layer-by-layer approach ensures nothing is missed when the problem is ambiguous |
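Bottom-up begins with the interface status line, and that first decision is mechanical enough to script. The sketch below, a hypothetical helper rather than any standard tool, parses the first line of `show interfaces` output and maps the physical/line-protocol pair to the layer worth checking first.

```python
import re

# Bottom-up entry point: read the `show interfaces` status line and
# decide which layer to investigate first. Parsing is approximate.

STATUS_RE = re.compile(
    r"is (administratively down|up|down), line protocol is (up|down)")

def first_layer_to_check(status_line):
    m = STATUS_RE.search(status_line)
    if not m:
        raise ValueError("unrecognised status line")
    physical, protocol = m.groups()
    if physical == "administratively down":
        return "config: interface is shut down"
    if physical == "down":
        return "Layer 1: cable, SFP, port"
    if protocol == "down":
        return "Layer 2: encapsulation, keepalives"
    return "Layer 1/2 OK: continue upward to Layer 3"

print(first_layer_to_check(
    "GigabitEthernet0/0 is down, line protocol is down"))
# -> Layer 1: cable, SFP, port
```

The four return values correspond one-to-one to the four status combinations listed in the Layer 1 reference later on this page.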
5. Top-Down Troubleshooting
Top-down troubleshooting starts at Layer 7 (Application) and works downward through the OSI stack. The rationale: the user experiences an application problem — if the application works but an underlying layer is marginal, the problem still manifests at the top. Starting at the application layer tests the entire stack end-to-end immediately.
Top-Down Workflow
Top-down troubleshooting — start at Layer 7, work downward:
Layer 7 — Application: Does the application work?
─────────────────────────────────────────────────────────────────
Test: Open a browser and navigate to the web server.
Result: "ERR_CONNECTION_REFUSED" or timeout.
→ Application layer is failing. Is it a server issue or a network issue?
Test: Can the user ping the server IP? (bypasses DNS and app)
Layer 3 — Network: Can IP reach the server?
─────────────────────────────────────────────────────────────────
Test: ping 192.168.10.5 from the client.
Result: Ping succeeds (100% success, correct RTT).
→ Layer 3 is fine. The problem is at Layer 4 or above.
Layer 4 — Transport: Is the service port reachable?
─────────────────────────────────────────────────────────────────
Test: telnet 192.168.10.5 80 (test TCP port 80 reachability)
Result: Connection refused / timeout.
→ Either the server is not listening on port 80, or an ACL
is blocking port 80 between client and server.
Check ACL: show ip access-lists on routers in the path.
Found: access-list 110 deny tcp any any eq 80
Root cause: ACL is blocking HTTP traffic.
Fix: Remove or modify the ACL entry.
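The telnet port test in the workflow above can be scripted from any host with Python. This is a minimal sketch: it attempts a TCP connection and treats a completed handshake as "port open"; the host and port in the usage comment are illustrative.

```python
import socket

# Scripted version of `telnet <ip> <port>`: attempt a TCP connection
# and distinguish an open port from refused/filtered/unreachable.

def tcp_port_open(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True     # three-way handshake completed
    except OSError:
        return False        # refused, timed out, or unreachable

# Illustrative usage: is the web server answering on TCP/80?
# tcp_port_open("192.168.10.5", 80)
```

A quick refusal usually means the host is reachable but nothing is listening; a timeout more often points to an ACL or firewall silently dropping the SYN.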
When to Use Top-Down
| Use Top-Down When... | Reason |
|---|---|
| The problem is application-specific (web works but FTP fails; email broken but browsing OK) | Application-specific problems often have application or transport layer causes — no need to check cables |
| The user has clearly described an application failure ("I cannot open Outlook") rather than total connectivity loss | Partial failures usually indicate upper-layer issues — lower layers are likely working |
| Network Layer 1/2/3 is known to be operational (users can ping, access some services) | No need to re-verify lower layers that are visibly working — start where the problem is |
6. Divide-and-Conquer Troubleshooting
Divide-and-conquer (also called half-splitting) starts in the middle of the OSI stack — typically at Layer 3 with a ping test — and uses the result to eliminate half the stack immediately. If Layer 3 works, the problem is at Layer 4 or above. If Layer 3 fails, the problem is at Layer 3 or below. Each test cuts the remaining search space in half.
Divide-and-Conquer Workflow
Divide-and-conquer — binary search through the OSI stack:

Start at Layer 3 (middle ground):
  Test: ping <destination IP>

┌─────────────────────────────────┬──────────────────────────────┐
│ PING SUCCEEDS                   │ PING FAILS                   │
│ Layer 3 and below are OK        │ Problem is Layer 3 or below  │
│ → Focus on Layer 4 and above    │ → Focus on Layer 1, 2, or 3  │
└─────────────────────────────────┴──────────────────────────────┘

If PING FAILS → test Layer 3 specifically:
  → Check show ip route — does a route exist?
  → Check show ip interface brief — correct IP, interface up?

If routes exist but ping still fails → drill down to Layer 2:
  → Check ARP — is the MAC learned?
  → Check VLAN — is the port in the correct VLAN?
  → Check STP — is the port in forwarding state?

If Layer 2 is OK but ping still fails → check Layer 1:
  → show interfaces — CRC errors, physical down?

If PING SUCCEEDS → test Layer 4:
  → telnet <ip> <port> — is the specific port reachable?
  → show ip access-lists — any hits on deny statements?

If Layer 4 OK → test Layer 7:
  → Application log, service status on server

Each test halves the remaining search space — efficient for unknown problems.
When to Use Divide-and-Conquer
| Use Divide-and-Conquer When... | Reason |
|---|---|
| The problematic layer is unknown and could be anywhere in the stack | Most efficient approach when you have no hypothesis — each test eliminates 50% of possible causes |
| Time is critical and you need to narrow down the problem quickly | Divide-and-conquer typically reaches the root cause in 2–3 tests vs 6–7 tests for bottom-up on a 7-layer stack |
| The problem spans multiple possible OSI layers | Starting in the middle avoids committing to a direction before any evidence points one way |
7. Follow-the-Path Troubleshooting
Follow-the-path (also called path isolation or trace-the-packet) follows the actual route a packet takes from source to destination, examining each device along the path. Rather than working vertically through OSI layers on one device, this approach works horizontally across the network topology — device by device, hop by hop.
Follow-the-Path Workflow
Scenario: PC (192.168.1.10) cannot reach Server (10.20.0.5)

Network path:
[PC] ─── [Switch SW1] ─── [Router R1] ─── [Router R2] ─── [Switch SW2] ─── [Server]

Step 1: Determine the path using traceroute:
  PC# traceroute 10.20.0.5
    1  192.168.1.1 (R1) — 2 ms     ← reaches R1
    2  * * *                       ← timeout at R2 or beyond
  → Packet reaches R1 but something fails at or after R2.

Step 2: Examine R1:
  R1# show ip route 10.20.0.5
  → Route exists: via 172.16.0.2 (R2). R1 is OK.
  R1# ping 10.20.0.5 source Lo0
  → Fails. Problem is downstream from R1.

Step 3: Examine R2:
  R2# show ip route 10.20.0.5
  → 10.20.0.0/24 is directly connected, Gi0/1. R2 routing OK.
  R2# ping 10.20.0.5
  → Fails. Problem is at R2 or between R2 and the server.
  R2# show ip arp 10.20.0.5
  → ARP entry missing. R2 cannot resolve server MAC.
  R2# show interfaces Gi0/1
  → "GigabitEthernet0/1 is up, line protocol is down"
  ← FOUND: R2's interface toward SW2 is down (Layer 1 or 2 issue)

Step 4: Examine SW2:
  SW2# show interfaces Gi0/24
  → Port is down — cable unplugged on SW2's uplink to R2.

ROOT CAUSE: Physical disconnection between R2 Gi0/1 and SW2 Gi0/24.
Key Follow-the-Path Tools
| Tool | Purpose in Path Troubleshooting |
|---|---|
| traceroute | Identifies the last responding hop — shows exactly where the path breaks. Timeouts (* * *) indicate where packets stop. |
| ping with source | ping <dst> source <interface> simulates traffic from a specific interface — confirms which hop the problem is on vs which hop is reporting it. |
| show ip route | Confirms the route exists on each hop and points to the correct next-hop toward the destination. |
| show ip arp | Confirms ARP resolution is working at each hop — a missing ARP entry on the last-hop router is a common issue. |
| show interfaces | Physical and data link status at each hop; error counters reveal transmission problems on a specific link. |
| show cdp neighbors | Confirms physical adjacency — which devices are connected to which port, and their platform/model. |
When to Use Follow-the-Path
| Use Follow-the-Path When... | Reason |
|---|---|
| The path between source and destination traverses multiple routers and switches | The problem could be on any device in the path — following the path pinpoints the exact failing device |
| traceroute shows where the path breaks (first timeout hop) | traceroute has already identified the approximate location — follow-the-path investigates that device in detail |
| Intermittent connectivity — some paths work, some do not | A specific device in a specific path is misbehaving — following the affected path isolates it |
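The core of follow-the-path is "probe each hop in order, stop at the first one that does not answer", which is the same logic as reading a traceroute. The sketch below is a hypothetical helper: the probe function is injected so the example runs without a network, and the hop addresses are the ones from the scenario above.

```python
# Follow-the-path sketch: probe each hop in order and report the first
# one that does not respond. `reachable` is injected (stubbed here) so
# the example runs without a live network.

def first_failing_hop(hops, reachable):
    """hops: ordered list of hop IPs; reachable: fn(ip) -> bool.
    Returns the first unreachable hop, or None if the path is clean."""
    for hop in hops:
        if not reachable(hop):
            return hop      # investigate this device and its uplink
    return None

# Path from the scenario above; simulate R2 (172.16.0.2) not answering:
path = ["192.168.1.1", "172.16.0.2", "10.20.0.5"]
down = {"172.16.0.2"}
print(first_failing_hop(path, lambda ip: ip not in down))
# -> 172.16.0.2
```

In practice the probe would be a ping per hop; the returned hop is where the per-device show commands (route, ARP, interface) begin.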
8. Comparing the Four Approaches
| Approach | Starting Point | Direction | Best For | Main Limitation |
|---|---|---|---|---|
| Bottom-Up | Layer 1 (Physical) | Upward through OSI | Unknown layer; suspected physical; complete failure | Slow when problem is at upper layers |
| Top-Down | Layer 7 (Application) | Downward through OSI | Application-specific failure; lower layers known OK | Requires application access; slow if lower layers broken |
| Divide-and-Conquer | Layer 3 (Network — ping) | Binary split up or down | Unknown layer; time-critical; most efficient method | Requires experience to interpret middle-layer tests |
| Follow-the-Path | Source device | Horizontal — hop by hop | Multi-hop path failure; after traceroute points to a device | Requires access to each device in the path |
Hybrid Approach — Real-World Practice
In practice, experienced engineers use a hybrid:
1. Start with divide-and-conquer (ping test) to determine which half of
the OSI stack contains the problem.
2. If physical layer is suspect (interface down/down, errors) →
switch to bottom-up for detailed Layer 1/2 investigation.
3. If application-specific (ping works, app fails) →
switch to top-down for Layer 4/7 investigation.
4. If the problem involves a multi-hop path →
use follow-the-path (traceroute + per-device show commands)
to identify the exact failing device, then apply bottom-up or
divide-and-conquer on that specific device.
The approaches are tools — use the right tool for the current phase
of the investigation. Switching methods is not inconsistency;
it is efficiency.
9. Layer-by-Layer Diagnostic Commands
Knowing which commands to run at each OSI layer is as important as knowing which methodology to use. The following is a practical reference for the most important diagnostic commands at each layer.
Layer 1 — Physical
Key commands: show interfaces — check for up/down status, CRC errors, input errors, giants, runts. See also Cable Testing Tools.
Router# show interfaces GigabitEthernet 0/0
GigabitEthernet0/0 is up, line protocol is up ← L1 and L2 status
GigabitEthernet0/0 is down, line protocol is down ← L1 failure (cable, SFP)
GigabitEthernet0/0 is up, line protocol is down ← L1 OK, L2 failure (keepalive)
GigabitEthernet0/0 is administratively down ← shutdown command applied
Hardware CRC/error counters (signs of Layer 1 problems):
  5 minute input rate 1234000 bits/sec, 150 packets/sec
  1234567 packets input, 987654321 bytes, 0 no buffer
  0 runts, 0 giants, 0 throttles                          ← frame size issues
  14 input errors, 14 CRC, 0 frame, 0 overrun, 0 ignored  ← faulty cable, bad connector, electrical noise
Key threshold: any non-zero and increasing CRC errors = physical problem.
Switch# show interfaces status    ! on switches
Port Name Status Vlan Duplex Speed Type
Gi0/1 connected 1 a-full a-1000 10/100/1000BaseTX
Gi0/2 notconnect 1 -- auto 10/100/1000BaseTX ← no link
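The "non-zero and increasing" rule for CRC errors can be automated as a polling check. This is a sketch under stated assumptions: the interface names and counter values are illustrative, and the snapshots would come from parsing `show interfaces` output or SNMP counters in practice.

```python
# Sketch of the "non-zero and increasing" CRC rule: compare two
# snapshots of per-interface CRC counters taken some minutes apart.

def crc_suspects(before, after):
    """before/after: dict of interface -> CRC error count.
    Returns interfaces whose CRC counter grew between polls."""
    return sorted(
        intf for intf, count in after.items()
        if count > before.get(intf, 0)
    )

snap_t0 = {"Gi0/0": 14, "Gi0/1": 0}
snap_t1 = {"Gi0/0": 27, "Gi0/1": 0}     # Gi0/0 still accumulating errors
print(crc_suspects(snap_t0, snap_t1))   # -> ['Gi0/0']
```

A static non-zero count may be history from an old fault; only a counter that keeps climbing proves a live physical problem.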
Layer 2 — Data Link
Key commands: show vlan brief, show mac address-table, show spanning-tree (STP), show arp. See also VLANs.
! VLAN and port assignment:
Switch# show vlan brief
Switch# show interfaces GigabitEthernet 0/1 switchport

! MAC address table — is the destination MAC known?
Switch# show mac address-table dynamic
Switch# show mac address-table address aabb.ccdd.eeff

! Spanning Tree — is the port in forwarding state?
Switch# show spanning-tree vlan 10
! Port roles: root, designated, alternate, backup
! Port states: forwarding, blocking, listening, learning

! Duplex and speed (duplex mismatch causes high errors):
Switch# show interfaces GigabitEthernet 0/1 | include duplex|speed

! ARP table — Layer 2 to Layer 3 mapping:
Router# show arp
Router# show ip arp 192.168.1.5    ! check specific entry
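Answering "does the switch know this MAC, and on which port?" is a lookup that is easy to script once you have the `show mac address-table` text. The sketch below parses a sample whose format is approximated from IOS output; the MAC addresses and ports are illustrative.

```python
# Sketch: find which port a MAC was learned on by parsing
# `show mac address-table` text (format approximated from IOS).

SAMPLE = """\
Vlan    Mac Address       Type        Ports
----    -----------       ----        -----
  10    aabb.ccdd.eeff    DYNAMIC     Gi0/1
  10    1111.2222.3333    DYNAMIC     Gi0/2
"""

def port_for_mac(table_text, mac):
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[1].lower() == mac.lower():
            return fields[3]
    return None     # MAC not learned -> suspect L1 or host silent

print(port_for_mac(SAMPLE, "aabb.ccdd.eeff"))   # -> Gi0/1
```

A None result is itself diagnostic: if the switch never learned the MAC, the frame never arrived, which pushes the investigation down to Layer 1 or to the host.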
Layer 3 — Network
Key commands: show ip route, show ip interface brief, show ip protocols, ping. Routing protocols: OSPF, EIGRP, BGP. See also ACLs and NAT.
! Routing table — does a route exist to the destination?
Router# show ip route
Router# show ip route 10.20.0.0
Router# show ip route 10.20.0.5          ! specific host lookup

! Interface IP configuration:
Router# show ip interface brief          ! all interfaces, status, IP
Router# show ip interface Gi0/0          ! detailed per-interface IP info

! Routing protocol status:
Router# show ip protocols                ! which protocols are running
Router# show ip ospf neighbor            ! OSPF adjacencies
Router# show ip eigrp neighbors          ! EIGRP adjacencies
Router# show ip bgp summary              ! BGP peer status

! Ping — end-to-end Layer 3 connectivity:
Router# ping 10.20.0.5
Router# ping 10.20.0.5 source GigabitEthernet 0/0    ! from specific source
Router# ping 10.20.0.5 repeat 100        ! extended: 100 pings
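A common first pass over `show ip interface brief` is simply "which interfaces are not up/up?". The sketch below does that over a sample whose layout is approximated from IOS; interface names and addresses are illustrative.

```python
# Sketch: scan `show ip interface brief` text for interfaces that are
# not up/up (layout approximated from IOS output).

SAMPLE = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     192.168.1.1     YES manual up                    up
GigabitEthernet0/1     172.16.0.1      YES manual down                  down
Loopback0              10.0.0.1        YES manual up                    up
"""

def not_up_up(brief_text):
    bad = []
    for line in brief_text.splitlines()[1:]:    # skip the header row
        fields = line.split()
        if len(fields) >= 6 and (fields[-2], fields[-1]) != ("up", "up"):
            bad.append(fields[0])               # interface name
    return bad

print(not_up_up(SAMPLE))    # -> ['GigabitEthernet0/1']
```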
Layer 4 — Transport
Key commands: show ip access-lists (ACLs), telnet <ip> <port> to test TCP port reachability. See also Firewalls and Common Port Numbers.
! ACL hit counters — are packets being denied?
Router# show ip access-lists
! Look for non-zero match counters on deny statements

! Test TCP port reachability:
Router# telnet 10.20.0.5 80              ! test if port 80 is open
Trying 10.20.0.5, 80 ...
% Connection refused by remote host      ← port closed or ACL blocking

! On the host (Windows):
C:\> telnet 10.20.0.5 443                ! test HTTPS port
PS C:\> Test-NetConnection 10.20.0.5 -Port 443    ! PowerShell
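"Look for non-zero match counters on deny statements" is a pattern scan, and can be sketched as such. The sample ACL text below approximates IOS `show ip access-lists` output; the sequence numbers, rules, and hit counts are illustrative.

```python
import re

# Sketch: flag deny entries with non-zero match counters in
# `show ip access-lists` text (format approximated from IOS).

SAMPLE = """\
Extended IP access list 110
    10 permit tcp any any eq 443 (1520 matches)
    20 deny tcp any any eq 80 (37 matches)
    30 deny udp any any eq 161
"""

DENY_RE = re.compile(r"(\d+) (deny .+?) \((\d+) matches\)")

def active_denies(acl_text):
    """Returns (sequence, rule, hits) for deny lines that matched traffic."""
    return [
        (int(m.group(1)), m.group(2), int(m.group(3)))
        for m in DENY_RE.finditer(acl_text)
        if int(m.group(3)) > 0
    ]

print(active_denies(SAMPLE))    # -> [(20, 'deny tcp any any eq 80', 37)]
```

Note that entry 30 is skipped: a deny with no match counter has never dropped anything, so it cannot be the cause of the current symptom.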
Layer 7 — Application
Key commands: nslookup, dig for DNS resolution; show ip dhcp binding for DHCP; ssh for remote access. See also HTTP/HTTPS.
! DNS resolution:
C:\> nslookup www.example.com
C:\> nslookup www.example.com 8.8.8.8    ! query specific DNS server
Router# show hosts                       ! locally cached DNS entries

! DHCP troubleshooting:
Router# show ip dhcp binding             ! current DHCP leases
Router# show ip dhcp conflict            ! addresses with ARP conflicts
Router# debug ip dhcp server events      ! trace DHCP request/offer process

! SSH connection test (confirm Layer 7 SSH service is running):
$ ssh 192.168.1.1
ssh: connect to host 192.168.1.1 port 22: Connection refused
← SSH service not running or port blocked
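An nslookup-style check can also be scripted using the operating system's resolver via Python's standard library. A minimal sketch: it returns the resolved addresses, or an empty list when resolution fails for any reason (NXDOMAIN, unreachable server, and so on).

```python
import socket

# Sketch: DNS check through the OS resolver. Distinguishes "resolves"
# from "does not resolve" without caring which record type answered.

def resolve(name):
    try:
        infos = socket.getaddrinfo(name, None)
        return sorted({info[4][0] for info in infos})  # unique addresses
    except socket.gaierror:
        return []       # NXDOMAIN, no resolver reachable, etc.

# "localhost" resolves via the local hosts file, so this runs
# without any network; real hostnames exercise the full DNS path.
print(resolve("localhost"))
```

If a name fails here but `ping <DNS-server-IP>` succeeds, the transport to the server is fine and the problem is the DNS service itself, which matches the pattern table below.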
10. Common Problem Patterns and Their Layer
Experience builds a mental library of problem patterns — certain symptoms that almost always point to a specific layer and cause. The following table is a reference for the most common network problems and where they live in the OSI stack.
| Symptom | Most Likely Layer | Most Common Cause | First Check |
|---|---|---|---|
| Interface "down/down" | Layer 1 | Cable unplugged, broken cable, faulty SFP, speed/auto-negotiation failure | show interfaces — physical status and error counters |
| Interface "up/down" | Layer 2 | Keepalive failure, encapsulation mismatch (serial), no HDLC/PPP peer | show interfaces — encapsulation, keepalives |
| High CRC / input errors | Layer 1 | Damaged cable, faulty connector, interference, duplex mismatch | show interfaces — CRC counter; inspect cable and connectors |
| Ping to default gateway fails | Layer 2 or Layer 3 | Wrong VLAN, STP blocking, wrong IP/subnet on PC or gateway interface | show vlan, show arp, show ip int brief |
| Ping succeeds but application fails | Layer 4 or Layer 7 | ACL blocking the specific port, firewall rule, server not listening, DNS failure | show ip access-lists, telnet <ip> <port>, nslookup |
| Routing loop / high CPU on router | Layer 3 | Redistributed default route causing a loop, summary route pointing back, static route loop | show ip route — look for routes with very low admin distance or recursive loops |
| Intermittent connectivity (flapping) | Layer 1 or Layer 2 | Marginal cable (passes some traffic, fails under load), STP topology change, routing protocol instability | show interfaces — input/output errors over time; show spanning-tree detail — TCN events |
| Users in one VLAN cannot reach users in another | Layer 2 or Layer 3 | Missing inter-VLAN routing, wrong default gateway, ACL blocking inter-VLAN traffic | show ip route on the L3 switch/router; verify default gateways on clients |
| DNS resolution failing | Layer 7 | Wrong DNS server configured, DNS server unreachable, UDP 53 blocked by ACL | nslookup <name>; ping <DNS-server-IP> |
| DHCP not providing addresses | Layer 3 or Layer 7 | Missing DHCP relay (ip helper-address) when server is on different subnet, DHCP pool exhausted, server misconfiguration | show ip dhcp binding; show ip dhcp pool; verify ip helper-address on gateway |
11. Documenting the Troubleshooting Process
Documentation is an integral part of systematic troubleshooting — not an afterthought. Real-time notes during an incident serve as a working memory aid, enable smooth handoff to colleagues, and build the institutional knowledge base that prevents the same problem from taking as long to resolve next time.
Minimum information to document during a troubleshooting incident:

┌─────────────────────────────────────────────────────────────────────┐
│ INCIDENT RECORD                                                     │
│                                                                     │
│ Date/Time reported: 2025-03-15 14:32 UTC                            │
│ Reported by: NOC Engineer / User Help Desk ticket #12345            │
│                                                                     │
│ SYMPTOM: PC users in VLAN 10 (Finance) cannot reach server          │
│   192.168.20.5 (Payroll). Issue started ~14:15 UTC.                 │
│   Users in VLAN 20 (HR) CAN reach the server.                       │
│                                                                     │
│ RECENT CHANGES: Access switch SW3 was replaced at 13:45 UTC.        │
│                                                                     │
│ INVESTIGATION:                                                      │
│   14:35 — Ping from PC (10.10.1.5) to server fails                  │
│   14:36 — Ping from PC to default gateway (10.10.1.1) fails         │
│   14:38 — show vlan brief on SW3: Fa0/1 is in VLAN 1 (should be 10) │
│                                                                     │
│ ROOT CAUSE: Replacement switch SW3 not configured with VLAN 10.     │
│   Port Fa0/1 defaulted to VLAN 1.                                   │
│                                                                     │
│ FIX: Configured SW3 Fa0/1: switchport access vlan 10                │
│                                                                     │
│ VERIFICATION: Ping from Finance PCs to server succeeds.             │
│   All Finance users confirm access restored at 14:47.               │
│                                                                     │
│ PREVENTION: Add SW3 to configuration management — apply standard    │
│   VLAN config via Ansible playbook on next change window.           │
└─────────────────────────────────────────────────────────────────────┘
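Teams that script their tooling sometimes capture these fields as a structured record so every write-up contains the same sections. A minimal sketch, using the incident above as sample data; the class and field names are hypothetical, not any standard schema.

```python
from dataclasses import dataclass, field

# Sketch: the minimum incident fields as a structured record, so no
# write-up can silently omit root cause, fix, or prevention.

@dataclass
class IncidentRecord:
    reported: str
    symptom: str
    recent_changes: str
    investigation: list = field(default_factory=list)   # timestamped notes
    root_cause: str = ""
    fix: str = ""
    verification: str = ""
    prevention: str = ""

    def note(self, timestamp, text):
        """Append a timestamped investigation step in real time."""
        self.investigation.append(f"{timestamp} - {text}")

rec = IncidentRecord(
    reported="2025-03-15 14:32 UTC",
    symptom="VLAN 10 users cannot reach 192.168.20.5",
    recent_changes="Access switch SW3 replaced at 13:45 UTC",
)
rec.note("14:38", "show vlan brief on SW3: Fa0/1 in VLAN 1 (should be 10)")
rec.root_cause = "SW3 not configured with VLAN 10"
print(rec.root_cause)
```

Filling the record as the investigation runs, rather than reconstructing it afterwards, is what makes the handoff and prevention sections trustworthy.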
12. Troubleshooting Methodology Summary — Key Facts
| Topic | Key Fact |
|---|---|
| Structured process steps | Define problem → gather info → analyse → form hypothesis → test hypothesis → implement fix → verify and document |
| Most powerful first question | "What changed recently?" — most problems follow a recent change |
| Bottom-up | Start Layer 1 → work upward; best for suspected physical problems or total connectivity failure |
| Top-down | Start Layer 7 → work downward; best for application-specific failures when lower layers are known to work |
| Divide-and-conquer | Start Layer 3 (ping); result divides stack in half; most efficient when problematic layer is unknown |
| Follow-the-path | Horizontal hop-by-hop investigation; best for multi-hop paths; used after traceroute points to a suspect device |
| Ping interpretation | Success = L1, L2, L3 all working end-to-end; Failure = problem at L1, L2, or L3 — see ping Command |
| Interface "down/down" | Layer 1 physical failure — cable, SFP, port — check show interfaces |
| Interface "up/down" | Layer 2 failure — encapsulation mismatch, keepalive, peer — check show interfaces |
| Ping works, app fails | Layer 4 (port blocked by ACL or firewall) or Layer 7 (server not listening, DNS failure) |
| Change one thing at a time | Making multiple changes simultaneously makes root cause identification impossible |
| Documentation | Document in real time — symptom, timeline, investigation steps, root cause, fix, prevention |