I want to visualize how network operators use large blocks of IP addresses, and gain some insight into network “areas” with latency or possible network health issues. This 2D ping latency heat map relies on a way to quickly ping a large range of IPs. In another post, I provided a fast, highly concurrent ICMP ping Python application for quickly pinging large IP blocks. I use an AWS us-west-2 EC2 server to ping the IP blocks below and create the 2D ping latency heat maps.
The Python visualization code is here, on GitHub.
If one thinks of consecutive IP addresses as being “close” to each other, I’d like to map a block of sequential IPs so that consecutive IPs are grouped closely together in a new x-y Cartesian space. In 2006 the XKCD webcomic first mapped the IPv4 internet to a 2D space, using Hilbert curve fractal mapping.
The following IP heat map is a ping latency visualization for the network. All 4096 IPs in this network are either unpingable and shown as white space, or colored according to their ping latency. Faster ping times are dark blue, and ping times greater than 350 milliseconds are brighter red. The tick marks are in increments of 16, and each 16×16 area is a /24 subnet.
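To make the mapping idea concrete, here is a minimal sketch of the textbook Hilbert curve index-to-coordinate conversion. It is not taken from the GitHub project; the function names `hilbert_d2xy` and `ip_to_xy` and the example network `10.0.0.0/20` are assumptions for illustration only.

```python
import ipaddress
import math


def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so the sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_xy(cidr, ip):
    """Place an IP on the grid by its offset inside the network (hypothetical helper)."""
    net = ipaddress.ip_network(cidr)
    side = math.isqrt(net.num_addresses)
    if side * side != net.num_addresses:
        raise ValueError("network size must be a power of four, e.g. a /20 or /24")
    offset = int(ipaddress.ip_address(ip)) - int(net.network_address)
    return hilbert_d2xy(side, offset)


if __name__ == "__main__":
    # Neighbouring addresses land in neighbouring cells.
    for host in ("10.0.0.0", "10.0.0.1", "10.0.0.2", "10.0.0.3"):
        print(host, ip_to_xy("10.0.0.0/20", host))
```

With a side of 64, the 4096 addresses of a /20 fill a 64×64 grid, and each aligned run of 256 consecutive addresses (a /24) lands in its own 16×16 square.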
A single .png file, 2D ping latency heat map
This view suggests that not much of the /20 subnet is pingable or in use. This snapshot of ping latency times, taken at 05-26-2020 05:04:01 UTC, was likely during a period of high network usage.
An animated GIF view, of multiple 2D ping latency heat map snapshots
The following GIF is an animation of 82 hourly ping latency PNG snapshots. There appear to be one to three hourly periods during the day where groups of ping latency times jump from ~170 msec to ~400 msec. Ping times can be a proxy for network capacity. This view could suggest that the network links and routers supporting groups of servers are heavily loaded during peak usage times. Ping times often reflect how busy each router’s control plane is, or whether the router control plane gives resources to ICMP functionality. This post references a BFD protocol parser that may provide better insight into network link capacity via BFD timing.
The following is a gallery of interesting CDN server subnets:
Because my CenturyLink DSL internet service is intermittent, I want to log DSL line condition stats. For the most part, the modem stats are only available via the Technicolor C1100T DSL modem’s web pages. The modem’s API does not provide access to all the stats, so scraping is the only way to obtain the data. The project uses a Raspberry Pi (RPi) and its Chromium browser to automatically navigate the modem pages and scrape key data. The C1100T project shares a Python Selenium automation application that “scrapes” C1100T web pages for DSL and bandwidth statistics. Because I want to analyze and graph modem statistic trends, the scraped data is logged to a CSV (comma-separated values) file.
The scraped JSON to CSV data project is here and it works for all the modem scrapers.
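As a rough illustration of the JSON to CSV step, here is a minimal sketch using only the standard library. It assumes the log holds one JSON object per line, which may not match the project’s actual file layout; the function name `json_lines_to_csv` and the key `snr_down` are hypothetical (only `ts` is named in this post).

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, out):
    """Write one CSV row per JSON object; columns follow first-seen key order."""
    rows = [json.loads(line) for line in json_lines if line.strip()]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in a blank when a scrape is missing a variable.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


if __name__ == "__main__":
    sample = [
        '{"ts": 1590469441000, "snr_down": 9.1}',
        '{"ts": 1590471241000, "snr_down": 8.7}',
    ]
    buffer = io.StringIO()
    json_lines_to_csv(sample, buffer)
    print(buffer.getvalue())
```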
The following time series graphic provides roughly a month of modem stats:
Anecdotally, the signal-to-noise ratio (SNR) for the DSL downstream (SNRD) drops during rainy days. The plant seems to dry out and the SNR recovers. The upstream signal power (PowerU) also seems to be negatively impacted by a “wet” DSL plant. There were several outages during the period covered by this graph, and they coincided with the downstream power dropping to roughly 16.5 dBm.
Resetting the accumulated MODEM stats
The FEC (forward error correction) stats are reset every night, when the Raspberry Pi cron job also reboots the modem. The FEC counts accumulate at a troubling rate: roughly 80K counts in a 24 hour period, and on some days the modem sees more than 200K FEC counts. CenturyLink techs sometimes recommend rebooting the modem periodically. Since I started rebooting my modems nightly, my internet access may have become more reliable. If you need code to reboot your modem nightly without manual intervention, this Python code scrapes data and also controls the modem.
Scraping “dirty” MODEM stats
If we trust the modem’s upstream and downstream packet counts, the spurious peaks seem problematic. It is likely that the packet spikes are abnormal packet retries.
The data scraped from the C1100T modem likely includes some dirty data. The downstream bandwidth and downstream packet data burst at nearly impossible rates. In other words, Netflix and Amazon Prime are not capable of pushing these almost daily bandwidth bursts down the DSL plant to the modem. I would like to throw out these seemingly nonsensical downstream data bursts.
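One possible way to throw out such bursts is to convert the cumulative counters into rates and drop any sample faster than the line could physically deliver. The sketch below is only an assumption about how that could be done: it treats the usage counter as cumulative bytes, and both the function name `plausible_rates` and the 40 Mbit/s ceiling are hypothetical values chosen for illustration.

```python
def plausible_rates(samples, max_bps):
    """Turn (epoch_ms, cumulative_bytes) samples into (epoch_ms, bits_per_second).

    Samples whose rate exceeds max_bps are dropped, as are negative deltas,
    which show up when the counter resets (for example after a nightly reboot).
    """
    kept = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        seconds = (t1 - t0) / 1000.0
        delta = b1 - b0
        if seconds <= 0 or delta < 0:
            continue
        bps = delta * 8 / seconds
        if bps <= max_bps:
            kept.append((t1, bps))
    return kept


if __name__ == "__main__":
    half_hour_ms = 30 * 60 * 1000
    samples = [
        (0, 0),
        (1 * half_hour_ms, 1_000_000_000),   # ordinary usage
        (2 * half_hour_ms, 21_000_000_000),  # implausible burst
        (3 * half_hour_ms, 100),             # counter reset
    ]
    for ts, bps in plausible_rates(samples, max_bps=40_000_000):
        print(ts, round(bps))
```

Dropping samples keeps the graph honest but hides the fact that a burst was reported, so it may be worth logging the rejected samples separately.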
The scraping application and environment
The application was tested on, and runs 24×7 on, an RPi. It will run on most Linux platforms as a headless application. The RPi has HDMI and analog video outputs, but the application does not use them, nor does it use the RPi’s keyboard and mouse. The Chromium browser uses a ‘dummy’ video output device when it logs in to the DSL modem and scrapes values. The Selenium Python module is most often used for web site test automation, where watching the browser is often impractical, especially when it’s running on a server in the cloud.
Other MODEM stats …
In addition to DSL line statistics, your bandwidth usage data is also available. If you’re trying to stay under your ISP’s bandwidth cap, this data can warn you ahead of time.
The Python application “scrapes” DSL statistics from my Technicolor C1100T DSL modem-router-wifi, sitting on CenturyLink’s physical DSL network. The application browses to my modem’s site, logs in to the C1100T as the ‘admin’ user, navigates to the DSL 1 Status page, scrapes the variable values, and writes the variables to a JSON log file. The following modem variables are scraped twice an hour.
Modem variables to scrape and log to CSV
example data
SNR downstream
SNR upstream
Power downstream
Power Upstream
Packets downstream
Packets Upstream
Total Usage Downstream
Total Usage Upstream
0 Days,0H:14M:53S
RS_FEC – Near End
error fixes
CRC_Errors – Near End
These are a few of the variables displayed in the C1100T modem pages.
The first variable, “ts”, is the timestamp in “epoch” ms (milliseconds). This timestamp reference comes from the RPi server. This is a nice site for explaining epoch time, or converting epoch to your local time: https://www.epochconverter.com/
Screenshots of pages scraped for the above variables
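For reference, an epoch millisecond value can also be converted with the Python standard library. This is a small sketch; the function name `epoch_ms_to_utc` is just an illustrative choice, and the sample value corresponds to the heat map snapshot time mentioned earlier.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ts_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    stamp = epoch_ms_to_utc(1590469441000)
    print(stamp.isoformat())
    # astimezone() with no argument converts to the machine's local time zone.
    print(stamp.astimezone().isoformat())
```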
Other CenturyLink DSL modems and yet more available data
There is also a Python scraper project for the Actiontec C1000A, and for the Zyxel C3000Z. I found the C1100T’s 2.4 GHz Wi-Fi worked poorly, so I bought a used Zyxel C3000Z and wrote the scraper code for it. I had an old Actiontec C1000A that worked intermittently, and wrote the scraping code for it too. The web pages for the three modems look similar, but their HTML is different, and that requires different Python code. The modems offer quite a few more stats than the DSL and internet stats scraped in the code. Adding more stats, or even adding new modems, is fairly straightforward.
I have an intermittent DSL internet connectivity issue and wish to document the brief outages. I wrote a Python 3 network connectivity test script for a Raspberry Pi (RPi model B Plus Rev 1.3), which is connected directly to my home router via Ethernet. The script tests DSL connectivity by pinging another customer’s host IP. While at it, the script tests the ability of the ISP’s DNS to reverse look up the IP’s FQDN (fully qualified domain name). My ISP has DNS issues quite often, and the DNS check has been a better functional test of the “internet”. During the DNS issues I have IP network connectivity, but the lack of DNS breaks all applications.
The script pings an IP, and also asks the DNS to reverse look up that IP for its FQDN. Since an RPi cron job is only granular to 1 minute, the script repeats the pings and DNS lookups every second for a minute. The GitHub document notes how to cron the job. For RPi specific cron details: https://www.raspberrypi.org/documentation/linux/usage/cron.md
I wish to scan a large /16 (65536 IPs) network space for pingable hosts in a short period of time. I’ve provided a fast Python concurrent ping application, as a tool for finding large network IP blocks with:
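The loop described above might look roughly like the following sketch. It is not the script from GitHub: the function names are hypothetical, the `-c` and `-W` flags assume the Linux `ping` command, the reverse lookup goes through whatever resolver the system is configured to use, and `192.0.2.1` is a documentation placeholder address.

```python
import socket
import subprocess
import time


def ping_ok(ip, timeout_s=1):
    """Return True if one ICMP echo gets a reply (assumes Linux ping: -c count, -W timeout)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reverse_dns(ip):
    """Return the FQDN for ip, or None if the reverse lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def run_checks(ip, repeats=60, interval_s=1.0,
               ping=ping_ok, lookup=reverse_dns, sleep=time.sleep):
    """Repeat the ping and DNS check, returning (epoch_seconds, ping_ok, fqdn) tuples."""
    results = []
    for _ in range(repeats):
        results.append((time.time(), ping(ip), lookup(ip)))
        sleep(interval_s)
    return results


if __name__ == "__main__":
    for when, reachable, fqdn in run_checks("192.0.2.1", repeats=3):
        print(int(when), reachable, fqdn)
```

Passing the ping, lookup and sleep functions as parameters keeps the loop testable without touching the network.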
servers having patterns of slow ping latency times and possibly peak usage
network outages
networks handing-out IPs with DHCP, and disconnecting clients
networks and ISPs, well-utilizing their IP allocations
The ping wiki is a nice explanation of how ping works, but serially pinging every IP in a /16 (65536 IPs) is prohibitively slow. The Python ping code on GitHub avoids serially pinging each device; instead it sends a burst of ICMP echo requests to a block of 128 IPs. The application then recovers the ICMP echo replies from its network packet capture. This concurrent ping mechanism can ping an entire /20 IP block (4096 IPs) in 41 seconds. This timing came from an AWS-EC2 t2-tiny AMI linux client.
The “ping an entire /16 (65536 IP) network quickly” use case is a great concurrency problem for Go. On the other hand, I need the Python practice.
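Splitting a network into bursts of 128 addresses can be sketched with the standard `ipaddress` module. This is an illustration rather than the project’s code; the function name `ip_blocks` and the network `10.0.0.0/20` are assumptions.

```python
import ipaddress


def ip_blocks(cidr, block_size=128):
    """Yield consecutive lists of addresses covering the whole network."""
    net = ipaddress.ip_network(cidr)
    first = int(net.network_address)
    last = first + net.num_addresses
    for start in range(first, last, block_size):
        stop = min(start + block_size, last)
        yield [ipaddress.ip_address(value) for value in range(start, stop)]


if __name__ == "__main__":
    blocks = list(ip_blocks("10.0.0.0/20"))
    print(len(blocks), "blocks;", blocks[0][0], "to", blocks[-1][-1])
```

A /20 splits into 32 such blocks, so 41 seconds works out to roughly 1.3 seconds per block. A /16 splits into 512 blocks; if the per-block time stayed the same, which is only an extrapolation, a full /16 pass would take on the order of 11 minutes.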
Fast Python Concurrent Ping – Theory of Operations
The Python application starts by turning on a tcpdump capture, then sends a single-threaded burst of X sequential ICMP echo request packets to a small block of 128 IPs. After the last ICMP packet in the burst is sent, the app lets tcpdump run for 1 more second. The tcpdump pcap file is then parsed for ICMP responses, and the “ping time” is recovered and logged. The application repeats the burst-send and pcap-listen process until the entire network space has been pinged.
This blast mechanism of pinging a small block of 128 IPs, and recovering ICMP echo responses from a parallel network capture, frees the application from maintaining 128 ping threads. The end-point IPs in the block simply respond with ICMP echo responses concurrently.
Bidirectional Forwarding Detection (BFD) is a network protocol providing fast insight into faults between two forwarding routers. BFD timestamps may also provide insight into network performance and link capacity.
This Python code snippet allows one to obtain network performance analytics (variables) from captured BFD network packets. The code parses a pcap network capture file for BFD transactions, and recovers timestamps from the echoes. From the parsed BFD timing details, the Python application generates two variables: the round trip time (RTT) and the BFD send time deviations. From these two variables, one may infer how well a router data plane is able to send BFDs periodically. The analytics also provide insight into how well the adjacent router can respond with an echo timestamp. Link capacity and health may also be derived from these network performance analytics.
The Git site below hosts the Python code with performance variable plots. The plots suggest there may be additional variables, like link demand loading, affecting the network performance. Monitoring these BFD variables in real time may provide insight into transient anomalies, performance and capacity.
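The recovery step can be pictured as pairing each echo reply in the capture with the request that triggered it. The sketch below works on already-parsed records and does not show the pcap parsing itself; the record layout and the function name `ping_times` are assumptions, while ICMP type 8 (echo request) and type 0 (echo reply) come from the ICMP specification.

```python
ECHO_REQUEST = 8
ECHO_REPLY = 0


def ping_times(records):
    """Match echo replies to requests and return {responding_ip: rtt_ms}.

    Each record is (timestamp_s, icmp_type, src_ip, dst_ip, ident, seq),
    listed in capture order. Hosts that never reply are simply absent.
    """
    sent = {}
    rtts = {}
    for ts, icmp_type, src, dst, ident, seq in records:
        if icmp_type == ECHO_REQUEST:
            sent[(dst, ident, seq)] = ts
        elif icmp_type == ECHO_REPLY:
            key = (src, ident, seq)
            if key in sent and src not in rtts:
                rtts[src] = (ts - sent[key]) * 1000.0
    return rtts


if __name__ == "__main__":
    capture = [
        (10.000, ECHO_REQUEST, "10.0.0.9", "10.0.0.1", 1, 1),
        (10.001, ECHO_REQUEST, "10.0.0.9", "10.0.0.2", 1, 2),
        (10.171, ECHO_REPLY, "10.0.0.1", "10.0.0.9", 1, 1),
    ]
    for ip, rtt in ping_times(capture).items():
        print(ip, round(rtt, 1), "ms")
```

Because the timestamps come from the capture rather than from the sending code, the measured time does not depend on how quickly the application gets back to reading each reply.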
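As a sketch of how the two variables could be computed once the timestamps have been extracted, consider the following. It is not the hosted code: the function name `bfd_metrics` is hypothetical, and defining the send time deviation as each send interval minus the mean interval is an assumption about what the post means.

```python
from statistics import mean


def bfd_metrics(echoes):
    """Compute per-echo RTTs and send-interval deviations, both in seconds.

    echoes is a list of (send_ts, return_ts) pairs in send order.
    """
    rtts = [returned - sent for sent, returned in echoes]
    send_times = [sent for sent, _ in echoes]
    intervals = [b - a for a, b in zip(send_times, send_times[1:])]
    average = mean(intervals) if intervals else 0.0
    # Assumed definition: how far each send interval strays from the mean.
    deviations = [interval - average for interval in intervals]
    return rtts, deviations


if __name__ == "__main__":
    sample = [(0.000, 0.002), (0.050, 0.053), (0.110, 0.112)]
    rtts, deviations = bfd_metrics(sample)
    print([round(r * 1000, 2) for r in rtts], "ms RTT")
    print([round(d * 1000, 2) for d in deviations], "ms send deviation")
```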