Session 05

Building a Small Town ISP Network

A real-world scenario: Irene, Dennis, Msabi, and Daudi build a network infrastructure from the ground up. Learn how everything connects in practice.

David Emiru Egwell
CTO · SprintUG Internet Limited
4 Team Members · 3 Network Sites · 500+ Customers · 10km Fiber Reach

The Team & Their Small Town ISP

Meet the team that's building SprintMbale — Tanzania's newest regional ISP. They're connecting three towns with fiber-optic cables, serving 500+ customers. Each team member owns a piece of the puzzle.

👩‍💻
Irene
Network Engineer
IP: 192.168.1.3

Manages the core routing and firewall. Designs IP subnets.

👨‍💼
Dennis
NOC Technician
IP: 192.168.1.4

Monitors customer connections, troubleshoots outages.

👨‍🔧
Msabi
Hardware Tech
IP: 192.168.1.5

Field installations, fiber terminations, cable management.

👨‍💻
Daudi
Systems Admin
IP: 192.168.1.7

Manages customer billing, access portals, DNS.

The Three Towns

Town   | Location           | Network        | Customers | Key Device
Mbeya  | HQ Site (Mountain) | 192.168.1.0/24 | 150       | MikroTik CCR (Core Router)
Iringa | 15 km East         | 192.168.2.0/24 | 200       | MikroTik hEX
Moshi  | 10 km South        | 192.168.3.0/24 | 150       | MikroTik hEX

The Infrastructure

  • Layer 1 (Physical): Fiber optic trunk between cities (SFP-10G-LR, 10km single-mode). Copper to customer buildings (RJ45). Radio wireless for remote areas.
  • Layer 2 (Switches): MikroTik CSS106 switches at each site (24 ports). Customer edge equipment varies (home routers, outdoor APs).
  • Layer 3 (Routing): OSPF between the three router sites. BGP to upstream ISP (TZCOM). Static routes for customer networks.
  • Layer 4 (Firewall/Security): MikroTik firewall NAT rules. Customer isolation via VLANs. Rate limiting per customer.

The Network Architecture

Here's how SprintMbale's network is physically and logically structured:

Physical Network Diagram

              🌐 TZCOM (Tanzanian ISP Backbone)
                        ↓
           203.0.198.1 (Upstream Gateway)
                        ↓
           ┌────────────────────────┐
           │ MBEYA (HQ)             │
           │ CCR (Core Router)      │
           │ 197.248.25.100         │ ← Public IP
           │ 192.168.1.1 (gateway)  │
           └──────────┬─────────────┘
                      │
                      │ SFP optical fiber
                      │ (2x10G, redundant)
                      │
        ┌─────────────┴─────────────┐
        │ (15km)                    │ (10km)
        ↓                           ↓
  ┌────────────┐             ┌──────────────┐
  │ IRINGA     │             │ MOSHI        │
  │ hEX Router │             │ hEX Router   │
  │ 192.168.2.1│             │ 192.168.3.1  │
  └────────────┘             └──────────────┘

  Each town connects to ~50-200 customer buildings
  Via: RJ45 copper, WiFi, or wireless backbone

IP Addressing Hierarchy

SprintMbale IP Space:
  197.248.25.0/24 (Public)
  192.168.0.0/16  (Private)

┌─────────────────────────────┐
│ PUBLIC INTERNET FACING      │
├─────────────────────────────┤
│ 197.248.25.100/32  HQ Router WAN Interface
│ 197.248.25.50/32   Customer-facing website
│ 197.248.25.60/32   NOC portal (Dennis's monitoring)
│ (Others reserved for future services)
└─────────────────────────────┘

┌─────────────────────────────┐
│ MBEYA (HQ) INTERNAL         │
├─────────────────────────────┤
│ 192.168.1.0/24
│
│ 192.168.1.1       → Gateway (CCR)
│ 192.168.1.2       → Management VLAN (Irene, Dennis, Msabi, Daudi log in here)
│ 192.168.1.3       → Irene's workstation
│ 192.168.1.4       → Dennis's workstation
│ 192.168.1.5       → Msabi's workstation
│ 192.168.1.7       → Daudi's workstation
│ 192.168.1.100-150 → Staff WiFi (pool)
│ 192.168.1.200     → NTP Server (time sync)
│ 192.168.1.201     → DNS cache (recursive)
│ 192.168.1.202     → RADIUS Auth (for WiFi)
└─────────────────────────────┘

┌─────────────────────────────┐
│ IRINGA REMOTE SITE          │
├─────────────────────────────┤
│ 192.168.2.0/24
│
│ 192.168.2.1      → Gateway (hEX router)
│ 192.168.2.2      → Management
│ 192.168.2.100/25 → Customer pool (within 192.168.2.0/25)
│                    (192.168.2.100–192.168.2.127, which is 28 addresses, not 100)
│ 192.168.2.150/25 → Backup segment (within 192.168.2.128/25)
└─────────────────────────────┘

┌─────────────────────────────┐
│ MOSHI REMOTE SITE           │
├─────────────────────────────┤
│ 192.168.3.0/24
│
│ 192.168.3.1      → Gateway (hEX router)
│ 192.168.3.100/25 → Customer pool (within 192.168.3.0/25)
│ 192.168.3.150/25 → Backup segment (within 192.168.3.128/25)
└─────────────────────────────┘

Routing Table (Mbeya Core Router):
Destination       Next Hop       Protocol    Cost
192.168.1.0/24    Direct         Connected   0
192.168.2.0/24    192.168.2.1    OSPF        10
192.168.3.0/24    192.168.3.1    OSPF        10
0.0.0.0/0         203.0.198.1    BGP         100
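The /25 entries in the addressing plan are easy to misread, so here is a minimal sketch that checks them with Python's standard `ipaddress` module. The helper names `containing_network` and `pool_size` are made up for this illustration. The addresses come from the plan above.

```python
import ipaddress

def containing_network(interface: str) -> ipaddress.IPv4Network:
    """Return the subnet that a host/prefix pair such as 192.168.2.100/25 sits in."""
    return ipaddress.ip_interface(interface).network

def pool_size(first: str, last: str) -> int:
    """Count the addresses in an inclusive range."""
    return int(ipaddress.ip_address(last)) - int(ipaddress.ip_address(first)) + 1

print(containing_network("192.168.2.100/25"))        # 192.168.2.0/25
print(containing_network("192.168.2.150/25"))        # 192.168.2.128/25
print(pool_size("192.168.2.100", "192.168.2.127"))   # 28
```

So 192.168.2.100/25 is a host address inside 192.168.2.0/25, and the range .100 to .127 holds 28 addresses. The two halves of the /24 are separate /25 networks.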

VLAN Segmentation

VLAN 1  (MGMT):      192.168.1.2/24     → Staff management, secure access
VLAN 2  (CUSTOMERS): 192.168.1.100-150  → Customer-facing (though CPEs get own IPs)
VLAN 3  (GUEST):     192.168.1.251-254  → Guest WiFi, limited bandwidth
VLAN 10 (BACKUP):    192.168.1.200/24   → Failover routing, redundancy

Firewall Rules Between VLANs:
✓ MGMT can access CUSTOMERS (monitoring)
✓ CUSTOMERS cannot access MGMT
✓ GUEST has 512 Kbps rate limit
✓ All VLANs blocked to/from internet except via NAT
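The inter-VLAN rules above amount to a small policy table. The sketch below is one way to express it, assuming a default-deny stance for any VLAN pair the rules do not mention (the source lists only the MGMT/CUSTOMERS pair). `ALLOWED_PAIRS` and `may_forward` are hypothetical names, and the return traffic that a stateful firewall would permit is not modeled.

```python
# Hypothetical encoding of the rules listed above; labels mirror the VLAN names.
ALLOWED_PAIRS = {("MGMT", "CUSTOMERS")}   # MGMT may reach CUSTOMERS for monitoring
RATE_LIMIT_KBPS = {"GUEST": 512}          # per-VLAN bandwidth cap

def may_forward(src_vlan: str, dst_vlan: str) -> bool:
    """Assumed default-deny: only same-VLAN traffic and listed pairs pass."""
    if src_vlan == dst_vlan:
        return True
    return (src_vlan, dst_vlan) in ALLOWED_PAIRS

print(may_forward("MGMT", "CUSTOMERS"))   # True
print(may_forward("CUSTOMERS", "MGMT"))   # False
print(RATE_LIMIT_KBPS.get("GUEST"))       # 512
```

The asymmetry is the point: the pair is ordered, so listing MGMT to CUSTOMERS does not imply the reverse.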

Communication Scenarios

Scenario 1: Customer Connects to the Internet

Setup: A customer in Iringa (town #2) has subscribed to SprintMbale. They have a home router at 192.168.2.130 (assigned from pool). They visit youtube.com.

Step-by-step flow:

1. CUSTOMER DEVICE (Home)
   Browser: "GET https://www.youtube.com/"
   Device IP: 192.168.2.130
   Sends to: Gateway 192.168.2.1 (Iringa router)

2. IRINGA ROUTER (hEX at 192.168.2.1)
   Receives packet: [SRC 192.168.2.130] [DST 8.8.8.8:443]
   (8.8.8.8 stands in for the YouTube server's address in this walkthrough.
    In reality it is Google's public DNS resolver, and a DNS lookup for
    youtube.com would return a different IP.)
   Checks routing table: "8.8.8.8? Not local. Not in 192.168.0.0/16"
   Default route: 0.0.0.0/0 → 192.168.1.1 (Mbeya)
   Action: Forward to Mbeya with the source address unchanged
   (The hEX has no public IP. Translation to the public address happens at the Mbeya CCR.)

3. FIBER TRUNK (Layer 1/2)
   Packet travels 15km of fiber from Iringa to Mbeya
   Signal converted: Electrical → Optical (by Iringa SFP)
   Transmitted at 10Gbps
   Received at Mbeya SFP: Optical → Electrical
   Latency: ~1ms

4. MBEYA CCR (Core Router at 192.168.1.1)
   Receives packet on internal interface
   Checks routing: "Destination 8.8.8.8? Not mine. Not in 192.168.0.0/16"
   Default route: 0.0.0.0/0 → 203.0.198.1 (TZCOM uplink)
   Firewall check: "Outbound HTTPS from 192.168.2.130"
   Rule: ALLOW (connection tracking records the new flow)
   Before forwarding to the ISP, apply NAT-OUT rule (masquerade):
   [SRC 192.168.2.130] → [SRC 197.248.25.100]
   (Customer's private IP hidden behind public ISP IP)
   Packet now: [SRC 197.248.25.100] [DST 8.8.8.8:443]
   Send to: 203.0.198.1 (TZCOM gateway)

5. INTERNET BACKBONE
   Packet travels through TZCOM's network
   TZCOM → Google's network, ~10-20 hops
   Google data center receives packet
   Responds: [SRC 8.8.8.8:443] [DST 197.248.25.100]

6. RETURN PATH (reverse)
   Google's response comes back through TZCOM
   Arrives via 203.0.198.1
   Mbeya CCR receives: [DST 197.248.25.100], its own WAN address
   Firewall state table: "This is a known outbound connection"
   NAT reverse mapping: 197.248.25.100 → 192.168.2.130
   Translate: [DST 197.248.25.100] → [DST 192.168.2.130]
   Send out the fiber interface toward Iringa

7. FIBER RETURN (1ms)
   Packet travels 15km of fiber back to Iringa

8. IRINGA ROUTER
   Receives: [SRC 8.8.8.8:443] [DST 192.168.2.130]
   Checks ARP: "What MAC has 192.168.2.130?"
   Gets customer's home router MAC
   Forwards on layer 2 to customer's MAC

9. CUSTOMER DEVICE
   Receives YouTube response
   Browser renders page
   Total latency: ~150ms

KEY INSIGHTS:
 • Customer never knows they're being NATted
 • NAT happens once, at the Mbeya CCR, the only router holding the public IP
 • Fiber latency is negligible (~1ms for 15km)
 • Return path uses connection tracking, not MAC learning
 • YouTube video streams through the same path, as multiple TCP flows
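The masquerade and reverse mapping in this flow can be sketched as a small translation table. This is a simplified model, not how RouterOS implements connection tracking. It keys each flow by an allocated public port only, whereas real NAT also tracks protocol and destination and usually tries to keep the original source port. The `Masquerade` class and the starting port 40000 are assumptions for illustration.

```python
from dataclasses import dataclass, field

PUBLIC_IP = "197.248.25.100"  # HQ router WAN address from the addressing plan

@dataclass
class Masquerade:
    """Toy source-NAT table: public port -> (private ip, private port)."""
    public_ip: str
    next_port: int = 40000                      # assumed first port handed out
    table: dict = field(default_factory=dict)

    def outbound(self, src_ip: str, src_port: int) -> tuple:
        """Rewrite the source of an outgoing packet and remember the mapping."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (src_ip, src_port)
        return self.public_ip, public_port

    def inbound(self, dst_port: int):
        """Find the private endpoint for a reply; None means no known flow."""
        return self.table.get(dst_port)

nat = Masquerade(PUBLIC_IP)
print(nat.outbound("192.168.2.130", 51000))  # ('197.248.25.100', 40000)
print(nat.inbound(40000))                    # ('192.168.2.130', 51000)
print(nat.inbound(12345))                    # None
```

The last call shows why unsolicited inbound packets go nowhere: without a table entry there is no private address to translate to.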

Scenario 2: Irene Manages Daudi's Router Remotely

Setup: Daudi (192.168.1.7) is in Mbeya. Irene (192.168.1.3) needs to SSH into daemon-router.local (192.168.2.1 in Iringa) to troubleshoot a BGP issue.

Irene on her workstation:
$ ssh admin@192.168.2.1

What happens:

1. IRENE'S PC (192.168.1.3)
   Checks routing: "192.168.2.1 is in 192.168.2.0/24, not my local 192.168.1.0/24"
   "I don't have a direct connection to that. Default gateway: 192.168.1.1"
   ARPs for the gateway's MAC if it is not already cached
   Sends SSH (TCP port 22) via 192.168.1.1

2. MBEYA CCR Firewall (192.168.1.1)
   Sees: [SRC 192.168.1.3] [DST 192.168.2.1:22] (SSH)
   Is this allowed? Check firewall rules:
   Iringa is a REMOTE SITE, not a customer VLAN, so the relevant rule is
   "MGMT can access remote routers"? Yes. Irene is in the MGMT VLAN.
   Create state entry: "SSH from 192.168.1.3:54321 → 192.168.2.1:22"

3. CCR Routes to Iringa
   "192.168.2.1? That's at Iringa. Use OSPF route."
   The OSPF route for 192.168.2.0/24 points at the Iringa router,
   which is the CCR's OSPF neighbor on the fiber link
   Direct connection on fiber. Intra-network routing.
   Forward packet over fiber to Iringa using MAC of Iringa router

4. IRINGA hEX (192.168.2.1)
   Receives SSH packet [DST 192.168.2.1:22]
   Checks: "Port 22 is listening on me (SSH server running)"
   Accepts connection
   Responds: SYN-ACK back to 192.168.1.3

5. TCP HANDSHAKE (3-way)
   192.168.1.3 → 192.168.2.1 [SYN]
               ← [SYN-ACK]
               → [ACK]
   Connection established

6. SSH PROTOCOL
   Public-key authentication (no password)
   Irene's SSH key verified
   Remote shell opened
   Irene types: "show bgp summary"
   Remote output sent back across fiber to her terminal

STATISTICS:
 • Latency: ~2-4ms round-trip (fiber + hardware processing)
 • Bandwidth: Negligible (SSH is mostly text)
 • Security: SSH encrypted (port 22 used)
 • Firewall: State tracking allowed the return traffic automatically
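Steps 1 and 3 are two separate forwarding decisions. The host asks "is the destination on my own subnet?", and the router picks the most specific matching route. A minimal sketch of both follows, using the Mbeya routing table from earlier in this session. `host_next_hop` and `router_lookup` are illustrative names, and real routers also weigh administrative distance and metric, which this ignores.

```python
import ipaddress

def host_next_hop(host_interface: str, gateway: str, destination: str) -> str:
    """Deliver directly if the destination is on-link, else hand to the gateway."""
    local_net = ipaddress.ip_interface(host_interface).network
    if ipaddress.ip_address(destination) in local_net:
        return destination
    return gateway

# Mbeya core router table (prefix, next hop) as listed in the addressing section.
ROUTES = [
    ("192.168.1.0/24", "direct"),
    ("192.168.2.0/24", "192.168.2.1"),
    ("192.168.3.0/24", "192.168.3.1"),
    ("0.0.0.0/0", "203.0.198.1"),
]

def router_lookup(destination: str) -> str:
    """Longest-prefix match: the most specific route containing the address wins."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(host_next_hop("192.168.1.3/24", "192.168.1.1", "192.168.2.1"))  # 192.168.1.1
print(host_next_hop("192.168.1.3/24", "192.168.1.1", "192.168.1.7"))  # 192.168.1.7
print(router_lookup("192.168.2.1"))                                   # 192.168.2.1
print(router_lookup("8.8.8.8"))                                       # 203.0.198.1
```

Every address matches 0.0.0.0/0, which is why the default route only wins when nothing more specific exists.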

Scenario 3: Dennis Monitors Network Health

Setup: Dennis (192.168.1.4) is in the NOC (Network Operations Center) at Mbeya. He checks:

  • Customer bandwidth usage in Iringa
  • Fiber link health (latency, packet loss)
  • Active BGP sessions
Dennis's Monitoring Dashboard (Web, port 8080):

1. SNMP POLLING (from Mbeya to Iringa)
   Dennis's monitoring tool: "Check Iringa hEX CPU usage"
   Tool → SNMPv3 query to 192.168.2.1
   (authenticated with an SNMPv3 user; community strings belong to SNMPv1/v2c)
   Iringa responds: CPU 45%, Memory 78%, up 156 days
   Dashboard shows GREEN (all healthy)

2. PING TEST (Latency check)
   Mbeya CCR → ICMP echo to 192.168.2.1
   Response: 1.2 ms (excellent, means fiber is healthy)
   Mbeya CCR → Ping to Moshi 192.168.3.1
   Response: 0.8 ms (even better, slightly shorter fiber)

3. BGP STATUS
   Mbeya CCR → show bgp summary
   Session to 203.0.198.1 (TZCOM): UP, 256 routes learned
   Advertised: 197.248.25.0/24 (the public block; the private 192.168.1-3.x
   networks are not routable on the internet and stay internal)
   Uptime: 45 days since last session drop (excellent stability)

4. CUSTOMER BANDWIDTH
   Flow tracking at Iringa border:
   25 active customers in Iringa
   Aggregate traffic: 150 Mbps downstream, 30 Mbps upstream
   (Within expected capacity for a 1 Gbps trunk)

5. ALERT: One Customer Exceeds Rate Limit
   Customer at 192.168.2.131 downloading a torrent
   Rate: 50 Mbps (exceeds their 10 Mbps SLA)
   Firewall action: "Apply QoS, queue to 10 Mbps"
   Dashboard shows YELLOW warning
   Dennis checks: "Valid? Billing says: 'Home Package' 10 Mbps"
   Actions: Flag for billing team to contact customer

All of this works because:
 • SNMP uses UDP port 161 (allowed in firewall)
 • ICMP ping is allowed on the routers (some networks block it; SprintMbale does not)
 • BGP runs on TCP port 179 (authenticated, only to TZCOM)
 • QoS/rate limiting configured on all customer ports
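To put the ping figures in context, the sketch below estimates how much of a round trip is pure propagation through the glass. It assumes light travels at roughly 200,000 km/s in single-mode fiber (about two thirds of its speed in vacuum). That is a common rule of thumb, not a measured value for this link.

```python
FIBER_KM_PER_S = 200_000  # assumed propagation speed in fiber (rule of thumb)

def round_trip_ms(length_km: float) -> float:
    """Propagation-only round-trip time over a fiber span, in milliseconds."""
    return 2 * length_km / FIBER_KM_PER_S * 1000

print(f"{round_trip_ms(15):.2f} ms")  # 0.15 ms  (Mbeya to Iringa and back)
print(f"{round_trip_ms(10):.2f} ms")  # 0.10 ms  (Mbeya to Moshi and back)
```

Under that assumption, propagation accounts for only a fraction of the 1.2 ms and 0.8 ms readings. The rest would be router and interface processing, so a sudden jump in ping time is more likely to point to a busy or failing device than to the fiber itself.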

Troubleshooting & Real Issues

Issue 1: "Internet Is Down in Iringa"

Problem: Wednesday morning, 10:15 AM. Dennis gets calls: "Internet is down in Iringa!" But Mbeya customers are fine, and Moshi is fine too.

DIAGNOSIS (Irene checks):

1. Check fiber physical layer (Layer 1)
   Mbeya CCR optics:
   [Iringa port] RX power: -3.5 dBm (just under the usual -3 to 0 dBm, still a usable signal)
   [Moshi port]  RX power: -2.8 dBm (normal)
   Fiber to Iringa appears OK (signal level acceptable)

2. Check routing (Layer 3)
   Mbeya: "show route"
   192.168.2.0/24 via 192.168.2.1 (OSPF, distance 110)
   192.168.3.0/24 via 192.168.3.1 (OSPF, distance 110)
   Both routes EXIST in routing table

3. Test Layer 3 connectivity
   $ ping 192.168.2.1
   Response: OK, 1.2ms
   $ ping 192.168.2.100 (Iringa customer)
   Response: TIMEOUT (bad!)
   $ ping 192.168.3.1
   Response: OK, 0.8ms
   $ traceroute 192.168.2.100
   1. 192.168.1.1 (Mbeya, OK)
   2. 192.168.2.1 (Iringa, OK)
   3. 192.168.2.100 (TIMEOUT - packet lost here!)

4. Problem is BEYOND the Iringa router!
   Irene thinks: "Switch at Iringa? Or customer CPE?"

5. Check Iringa switch (Layer 2)
   SSH to 192.168.2.1
   $ show interfaces
   * Ports 1-12 and 14-24: UP
   * Port 13 (customer segment): DOWN!
   Msabi is in the field. Irene calls him: "Port 13 on the switch - no link."
   Msabi goes to the building and checks the cable:
   "RJ45 plugged into the patch panel, but..."
   Msabi notices: the cable is unplugged on the SWITCH side!
   Fiber damage? No. A simple cable came loose.

6. SOLUTION
   Msabi re-plugs the cable into port 13 on the Iringa switch
   Waits 2 seconds... LED lights up: LINK UP
   From Mbeya:
   $ ping 192.168.2.100
   Response: OK, 1.2ms!
   All customers' PCs get internet again.

TOTAL OUTAGE: 15 minutes
ROOT CAUSE: Physical Layer (cable unplugged, possibly from power surge or vibration)
PREVENTION: Cable clips, surge protectors, scheduled inspections

THE LESSON: Always start at Layer 1. Before debugging routing, check the cable!
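The traceroute reasoning in step 3 can be written down as a tiny fault locator: walk the path in order and report the last hop that answered and the first that did not. The fault lies between those two. `locate_fault` is a hypothetical helper, and the reachability flags below are copied from the traceroute output rather than probed live.

```python
def locate_fault(path):
    """path: ordered (hop, reachable) pairs from source to destination.

    Returns (last_reachable_hop, first_unreachable_hop), or None if every
    hop answered. last_reachable_hop is None when the very first hop fails.
    """
    last_ok = None
    for hop, reachable in path:
        if not reachable:
            return last_ok, hop
        last_ok = hop
    return None

# Results taken from the traceroute in step 3.
observed = [
    ("192.168.1.1", True),     # Mbeya
    ("192.168.2.1", True),     # Iringa router
    ("192.168.2.100", False),  # Iringa customer
]
print(locate_fault(observed))  # ('192.168.2.1', '192.168.2.100')
```

The result narrows the search to whatever sits between the Iringa router and the customer, which is exactly where the unplugged switch port turned up.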

Issue 2: "NAT Not Working — Customer Gets Blocked"

Problem: Customer at 192.168.1.105 in Mbeya complains: "I can't access my work VPN. It keeps timing out."

ANALYSIS (Irene + Dennis collaborate):

1. Dennis tests from his workstation:
   $ curl http://192.168.1.105:8080 (VPN app)
   Response: Connection refused
   Nothing is listening on the customer's PC, which is expected:
   OpenVPN here is an outbound client, not a local service.
   Customer's PC specs: "Windows 10, OpenVPN client"
   Customer says: "OpenVPN opens, tries to connect, then times out after 60 sec"

2. Irene checks firewall state & NAT:
   Customer's OpenVPN uses port 1194 (UDP)
   Customer connects to: 198.51.100.10 (workplace VPN server)
   Expected flow:
   [SRC 192.168.1.105:59234] → [DST 198.51.100.10:1194]
           ↓ NAT-OUT ↓
   [SRC 197.248.25.100:59234] → [DST 198.51.100.10:1194]
   Firewall rule check: "Outbound UDP"?
   Firewall rule: ALLOW (default allow for established sessions)
   But WAIT: UDP is connectionless, so the NAT rule has to catch the very first packet.
   Firewall NAT rule: "Masquerade outbound UDP"? Does the rule exist?
   $ show firewall nat
   /ip firewall nat
   chain=srcnat protocol=tcp action=masquerade
   chain=srcnat protocol=icmp action=masquerade
   Aha! TCP and ICMP are masqueraded, but there is NO UDP rule!

3. ROOT CAUSE
   OpenVPN (UDP port 1194) isn't covered by the NAT rules!
   Packets leave 192.168.1.105 with the original private source IP
   Even if one reaches the workplace VPN server, it appears to come from 192.168.1.105
   The server tries to respond to 192.168.1.105
   Response can't route back (192.168.1.105 doesn't exist on the internet)
   Customer never sees a reply
   Timeout!

4. SOLUTION
   Irene adds a NAT rule:
   /ip firewall nat add chain=srcnat protocol=udp action=masquerade
   OR more specific:
   /ip firewall nat add chain=srcnat dst-address=198.51.100.10/32 protocol=udp action=masquerade
   Customer re-opens OpenVPN
   Now packets are NATted: [SRC 192.168.1.105] → [SRC 197.248.25.100]
   Workplace VPN responds to 197.248.25.100
   Firewall reverses the NAT on the return: [DST 197.248.25.100] → [DST 192.168.1.105]
   Customer receives response
   VPN connects successfully!

THE LESSON: A NAT rule only applies to the traffic it matches. A masquerade rule with no protocol matcher covers everything, but rules restricted to protocol=tcp or protocol=icmp silently skip UDP. VPN, DNS, and gaming traffic (UDP) must be covered by some rule.
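The root cause comes down to rule matching. A minimal sketch follows, modelling each srcnat rule as a dictionary in which a missing `protocol` key means "match any protocol". This is a simplification that ignores addresses, ports and interfaces, and `is_masqueraded` is a hypothetical helper, not a RouterOS function.

```python
# The two rules found in step 2, then the same list with Irene's fix appended.
RULES_BEFORE = [
    {"chain": "srcnat", "protocol": "tcp", "action": "masquerade"},
    {"chain": "srcnat", "protocol": "icmp", "action": "masquerade"},
]
RULES_AFTER = RULES_BEFORE + [
    {"chain": "srcnat", "protocol": "udp", "action": "masquerade"},
]

def is_masqueraded(protocol: str, rules) -> bool:
    """True if any srcnat masquerade rule matches the packet's protocol.

    A rule without a "protocol" key is treated as matching every protocol.
    """
    return any(
        rule["chain"] == "srcnat"
        and rule["action"] == "masquerade"
        and rule.get("protocol") in (None, protocol)
        for rule in rules
    )

print(is_masqueraded("tcp", RULES_BEFORE))   # True
print(is_masqueraded("udp", RULES_BEFORE))   # False  (OpenVPN leaks its private IP)
print(is_masqueraded("udp", RULES_AFTER))    # True
print(is_masqueraded("udp", [{"chain": "srcnat", "action": "masquerade"}]))  # True
```

The last line shows the alternative fix: one masquerade rule with no protocol matcher would have covered UDP from the start.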

Issue 3: "BGP Route Flapping"

Problem: Every 30 seconds, traffic to internet cuts out for 5 seconds. Customers report: "Websites load, then hang, then load again."

INVESTIGATION:

1. Irene checks BGP status:
   $ show bgp summary
   BGP session to 203.0.198.1: FLAPPING (alternates UP/DOWN)
   BGP log:
   [10:00:01] Session established
   [10:00:30] Session lost (TCP connection reset)
   [10:00:32] Reconnect attempt
   [10:00:33] Session established
   [10:00:59] Session lost again
   Pattern: Lost every ~30 seconds

2. Check firewall rules:
   Is port 179 (BGP) being rate-limited?
   $ show firewall filter stats
   "Found it!"
   One rule: "Limit BGP to 10 packets/second"
   BGP keepalives + updates exceed 10 pps
   Firewall drops the excess packets
   TCP connection times out
   Session drops
   Session re-establishes a few seconds later, and routes are relearned
   Repeat every ~30 seconds!

3. SOLUTION
   Irene removes the overly strict BGP rate limit:
   /ip firewall filter remove [find comment="Limit BGP"]
   OR adjusts it:
   /ip firewall filter set [find comment="Limit BGP"] limit=1000/s
   BGP session stabilizes
   Traffic is now consistent
   Websites load smoothly

THE LESSON: Firewalls can help but also hurt if misconfigured. BGP from a trusted peer must not be rate-limited this tightly. Be equally careful with other critical protocols: DNS (53), NTP (123), BGP (179), SSH (22). Measure their normal traffic before putting any limit on them.
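To see why a 10 packets/second cap can break a session that is idle most of the time, here is a sketch of a generic token-bucket limiter. It is a stand-in for the idea, not a model of RouterOS's `limit` matcher, and the 50-packet update burst is a made-up figure for illustration.

```python
def dropped_packets(arrivals, rate: float, burst: int) -> int:
    """Count packets a token bucket would drop.

    arrivals: sorted packet timestamps in seconds
    rate:     tokens added per second
    burst:    bucket capacity (the bucket starts full)
    """
    tokens = float(burst)
    last = arrivals[0] if arrivals else 0.0
    dropped = 0
    for t in arrivals:
        tokens = min(burst, tokens + (t - last) * rate)  # refill since last packet
        last = t
        if tokens >= 1:
            tokens -= 1          # packet passes and spends one token
        else:
            dropped += 1         # bucket empty: packet is discarded
    return dropped

keepalives = [float(s) for s in range(30)]   # one packet per second
update_burst = [10.0] * 50                   # hypothetical 50 packets in one instant

print(dropped_packets(keepalives, rate=10, burst=10))       # 0
print(dropped_packets(update_burst, rate=10, burst=10))     # 40
print(dropped_packets(update_burst, rate=1000, burst=1000)) # 0
```

Steady keepalives never come near the limit. It is the bursts of updates that get cut, and losing those is enough to stall the TCP session.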

Issue 4: "DNS Broken — Websites Won't Load"

Problem: Customers report: "Internet is working, but Google, Facebook, YouTube — nothing loads by name. But if I type in an IP address directly (8.8.8.8), it works!"

DIAGNOSIS:

Customer's symptoms:
 • ping google.com → times out (can't resolve domain)
 • ping 8.8.8.8 → works perfectly (IP works)
 • Browser shows "DNS_PROBE_FINISHED_NXDOMAIN" or "Failed to resolve"
This is a CLASSIC DNS failure signature!

STEP 1: Test DNS from customer location
Dennis reproduces the problem from a customer connection:
$ nslookup google.com
Server: 192.168.2.2 (ISP nameserver)
*** Can't find google.com: Server failed
That's VERY bad. The ISP's nameserver is not answering.

STEP 2: Irene tests from Mbeya HQ
Irene runs:
$ nslookup google.com 192.168.2.2
Server: 192.168.2.2 (Iringa DNS server)
*** Can't find google.com: Server failed
Same problem! But wait:
$ nslookup google.com 8.8.8.8
Name: google.com
Address: 142.251.41.14
✓ Works!
This tells Irene:
 • 8.8.8.8 (Google's DNS) works
 • 192.168.2.2 (SprintMbale's DNS) is broken
 • Customers' routers point to the broken nameserver

STEP 3: Check SprintMbale's DNS server
Irene SSHes to dns-server.local (192.168.2.2):
$ systemctl status named
● named.service - BIND DNS Server
  Loaded: loaded
  Active: inactive (dead)
BOOM! DNS service crashed!
Check the log:
$ tail -50 /var/log/named.log
[Jun 05 14:23:22] Error: Zone file corrupt
[Jun 05 14:23:23] Shutting down...
[Jun 05 14:23:24] fatal: unable to load zone "example.local"

STEP 4: Restart DNS
The log blames a corrupt zone file, so the restart only holds once that file
has been repaired or restored (BIND's named-checkzone can validate it first).
Irene restarts:
$ systemctl restart named
$ systemctl status named
● named.service - BIND DNS Server
  Loaded: loaded
  Active: active (running)
Now test:
$ nslookup google.com 192.168.2.2
Name: google.com
Address: 142.251.41.14
✓ Works!
Dennis immediately reports:
 • Browser works
 • google.com loads
 • YouTube loads
 • No more DNS_PROBE errors

STEP 5: Prevent future failures
Issue: What if the DNS server crashes again?
Solution: Set up redundant DNS

Option A - Customer uses Google DNS directly:
  $ dhcp-client config
  dns: 8.8.8.8, 8.8.4.4
  Problem: Depends on internet reachability to Google
  Pro: Keeps resolving even if SprintMbale's DNS is down
  Con: Less control, privacy concerns

Option B - Add fallback nameserver:
  $ cat /etc/resolv.conf
  nameserver 192.168.2.2 (Primary)
  nameserver 8.8.8.8 (Secondary)
  If 192.168.2.2 fails → tries 8.8.8.8
  Timeout on primary: ~3 seconds
  Then tries backup: ~1 second
  Total delay: ~4 seconds

Option C - Set up secondary DNS server
  192.168.2.3 (DNS-2 mirror)
  Replicates zones from 192.168.2.2 (Primary)
  Zone transfer (AXFR) every hour
  $ dig @192.168.2.2 example.local AXFR
  Customers use both:
  nameserver 192.168.2.2
  nameserver 192.168.2.3

Irene implements C + uses 8.8.8.8 as safety net

COMMON DNS FAILURES & HOW THEY MANIFEST:

1. DNS SERVER DOWN (this issue)
   ↓ Symptom: "Can't resolve ANY domain"
   ↓ Fix: Restart service or fail over to backup

2. DNS CACHE POISONED
   ↓ Some domains resolve to the WRONG IP
   ↓ Symptom: "google.com resolves to 1.1.1.1 instead of 142.251.41.14"
   ↓ Fix: FLUSH DNS CACHE
     $ rndc flush
   ↓ Cause: Malicious DNS response cached (DNSSEC validation should prevent this)

3. DNSSEC VALIDATION FAILURE
   ↓ Domain has DNSSEC enabled, but signature invalid
   ↓ Symptom: "google.com not found" (even though it exists)
   ↓ Check: $ dig +dnssec google.com
   ↓ Fix: Check time sync (DNSSEC uses timestamps!)
     $ timedatectl
     Verify NTP is running
     $ ntpq -p

4. FIREWALL BLOCKING PORT 53
   ↓ DNS uses UDP 53 and TCP 53 (zone transfers)
   ↓ Symptom: "Queries time out, never get a response"
   ↓ Check firewall rules:
     $ show firewall filter
     (UDP 53 blocked?)
   ↓ Fix: Allow port 53
     /ip firewall filter add protocol=udp dst-port=53 action=accept

5. RECURSIVE QUERY LOOP
   ↓ DNS server points to itself
   ↓ Symptom: "No response, query hangs 30 seconds"
   ↓ Config error: forwarders = 192.168.2.2 (itself!)
   ↓ Fix: Point forwarders at an upstream resolver's IP address
     (BIND forwarders take IP addresses, not hostnames)

6. WRONG DNS CONFIG VIA DHCP
   ↓ Router sends a bad DNS server in DHCP offers
   ↓ Symptom: "Works for me, but not for other customers"
   ↓ Check DHCP server config:
     $ show dhcp server
     dns-servers = 192.168.2.99 (doesn't exist!)
   ↓ Fix: Update DHCP pool
     $ dhcp set dns-servers 192.168.2.2

THE LESSON: DNS is critical but fragile. It's often overlooked until it breaks.

Key DNS facts:
 • UDP 53 carries queries; TCP 53 carries zone transfers and is often blocked at firewalls by mistake
 • DNS timeouts hide root causes (server down? Network? Firewall?)
 • Always have DNS monitoring (check /var/log/named.log regularly)
 • DNSSEC adds complexity but prevents poisoning attacks
 • Redundant DNS (primary + secondary) is non-negotiable for ISPs
 • NTP (timekeeping) is critical for DNSSEC validation
 • Customers need both ISP DNS + fallback (8.8.8.8) for resilience
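The delay arithmetic in Option B can be checked with a small simulation of a stub resolver trying its nameservers in order. Nothing here touches the network. Each server is described by a dictionary, and the 3-second timeout and 1-second answer time are the figures quoted in Option B, not measured values. `resolve_with_fallback` is a hypothetical helper.

```python
def resolve_with_fallback(servers, timeout_s: float):
    """Try nameservers in order; return (answering server, total seconds waited).

    servers: list of {"ip": str, "up": bool, "latency_s": float}
    A server that is down costs the full timeout before the next one is tried.
    Returns (None, waited) if no server answers.
    """
    waited = 0.0
    for server in servers:
        if server["up"]:
            return server["ip"], waited + server["latency_s"]
        waited += timeout_s
    return None, waited

resolv_conf = [
    {"ip": "192.168.2.2", "up": False, "latency_s": 0.0},  # crashed primary
    {"ip": "8.8.8.8", "up": True, "latency_s": 1.0},       # fallback
]
print(resolve_with_fallback(resolv_conf, timeout_s=3.0))   # ('8.8.8.8', 4.0)
```

A fallback keeps names resolving, but every lookup pays the primary's timeout first. That is one reason to add a monitored secondary server (Option C) as well as the public resolver.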

The Complete Picture

How All 5 Sessions Come Together:

Issue 1: Physical cable unplugged
   ↓
   Layer 1: Electrical signal lost on switch port 13
   Layer 2: No MAC learning (switch port down)
   Layer 3: Routes and OSPF stay up (the Iringa router still answers pings), so routing looks healthy
   All packets to customers behind that port are dropped

Issue 2: NAT rule missing for UDP
   ↓
   Layer 4: OpenVPN uses UDP port 1194
   Firewall sees outbound UDP, no NAT rule
   Packet leaves with original private IP (192.168.1.105)
   Remote VPN server can't respond to a private IP
   Connection times out

Issue 3: BGP rate-limited
   ↓
   Layer 3: BGP uses TCP port 179
   Firewall drops packets exceeding 10 pps
   BGP keepalives lost
   Session drops every 30 seconds
   Traffic reconverges but oscillates

Issue 4: DNS server crashed
   ↓
   Layer 4: DNS uses UDP 53 (and TCP 53 for zone transfers)
   DNS service crashed, not responding to queries
   Firewall rule might also block port 53 (misconfiguration)
   Customers can't resolve domains by name
   IP addresses still work (routing intact)
   Users see "Can't reach server" or "DNS lookup failed"

Knowledge from every Session was needed to solve these:
 • Session 01: IP addresses, binary, understanding 192.168.1.0/24
 • Session 02: Sockets & ports, TCP 179 for BGP, UDP 1194 for VPN, UDP 53 for DNS, DNS resolution process
 • Session 03: Port understanding, opening 1194, not blocking SSH or DNS
 • Session 04: Full stack: layer 1 cable, layer 2 switch port, layer 3 routing, layer 4 firewall/NAT/DNS
 • Session 05 (this one): Real scenarios piecing it all together, troubleshooting methodology