Web scrape DSL modem stats with python Selenium and a raspberry pi

Because my CenturyLink DSL internet service is intermittent, I want to log DSL line condition stats. For the most part, the modem stats are only available via the Technicolor C1100T’s, DSL modem web pages. The modem’s API does not provide access to all the stats, so scraping is the only mechanism to obtain the data. The project uses a Raspberry Pi (RPi) and its Chromium browser, to automatically navigate modem pages and scrape key data. The C1100T project shares a python Selenium automation application that “scrapes” C1100T web pages for DSL and bandwidth statistics. Because I want to analyze and graph modem statistic trends, the scraped data is logged to a CSV (comma separated variable) file.

The scraped JSON to CSV data project is here and it works for all the modem scrapers.

The following time series graphic, provides roughly a month of modem stats:

Anecdotally, the signal to noise (SnR) ratio for the DSL downstream (SNRD) drops during rainy days. The plant seems to dry-out and the SnR recovers. The signal power upstream (PowerU), also seems to be negatively impacted by a “wet” DSL plant. There were several outages during this graph and they coincided with the downstream power dropping to roughly 16.5 dbm.

Resetting the accumulated MODEM stats

The FEC (forward error correction) stats are reset every night, when the raspberry pi cronjob also reboots the modem. The FECs seem to problematically, accumulate roughly 80K counts in a 24 hour period, and some days the modem can see > 200K FEC counts. Centurylink techs sometimes recommend, rebooting the modem periodically. Since rebooting my modems nightly, my internet access could be more reliable. If you need code to reboot your modem nightly and without manual intervention, this python code scrapes data, but also controls the modem.

Scrapping “dirty” MODEM stats

If we trust the modem’s up and downstream packet counts, the spurious peaks seem problematic. There is a likelihood, the packet spikes are abnormal packet retries.

The data scraped from the C1100T modem is likely providing some dirty data. The downstream bandwidth and downstream packet data, burst at nearly impossible rates. In other words; NETFLIX and AMAZON Prime are not capable of pushing these almost daily bandwidth bursts, down the DSL plant, and to the modem. I would like to throw-out, seemingly non-sense downstream data bursts.

The scraping application and environment

The application was tested on, and is used 24×7 on an RPi. It will run on most linux platforms as a headless application. The RPi has an HDMI and analog video output, but they are not used by the application. The RPi’s keyboard and mouse are not used either. Our Chromium browser uses a ‘dummy’ video output device when it logs into and scrapes DSL modem values. The Selenium python module is most often used as ‘web site test automation’. Watching the browser is not often practical, especially when it’s running on a server, in the cloud.

Other MODEM stats …

In addition to monitoring DSL line statistics, your bandwidth usage data is also available. If you’re trying to stay under your ISP’s bandwidth cap, this data can warn you ahead of time.

The python application “scrapes” DSL statistics from my Technicolor C1100T DSL modem-router-wifi, sitting on the CenturyLink’s DSL physical network. The application browses to my modem’s site, logs into the C1100T as the ‘admin’ user, navigates to the DSL 1 Status page, scrapes the variable values, and writes the variables to a JSON log file. The following modem variables are scraped twice an hour.

Modem variables to scrape and log to CSV

Variableexample dataUnits
SNR downstream8.8db
SNR upstream8.1db
Power downstream17.0dbm
Power Upstream7.9dbm
Packets downstream16048791packets
Packets Upstream7695389packets
Total Usage Downstream162716.409Mbits
Total Usage Upstream8966.587Mbits
LinkUptime0 Days,0H:14M:53S
RS_FEC – Near End331error fixes
CRC_Errors – Near End0errors
this is a few of the variables, displayed in the C1100T modem pages

The JSON data for the above, looks like this:

{"1586713519097": {"SNR_downstream": 10.3, "SNR_upstream": 8.5, "Power_downstream": 16.9, "Power_upstream": 8.6, "Packets_Downstream": 47770, "Packets_Upstream": 24521, "Total_Usage_Downstream": 475.615, "Total_Usage_Upstream": 29.162, "dslUpstreamElement": 0.895, "dslDownstreamElement": 23.103, "dslLineStatusElement": "GOOD", "LinkUptime": "0 Days,0H:14M:53S", "LinkTrainErrors": 1, "RS_FEC": 331, "CRC_Errors": 0}}

The following is a single record, CSV snippet:


The first variable, “ts” is the timestamp in “epoch” ms (milliseconds). This timestamp reference comes from the RPI server. This is a nice site for explaining epoch time, or converting epoch to your local time: https://www.epochconverter.com/

Screen Shots, of pages scrapped for the above variables

Other CenturyLink DSL modems and yet more available data

There is also a python scraper project for the Actiontec C1000A, and for the Zyxel C3000Z. I found the C1100T’s 2.4ghz WIFI worked poorly, so I bought a used Zyxel C3000Z and wrote the scraper code for it. I had an old C1000A ACTIONTEC, that worked intermittently, and wrote the scraping code for it too. The web pages for the three modems look similar, however their HTML is different and that requires different python code. There are quite a few more modem stats, than the DSL and INTERNET stats, scraped in the code. Adding more stats, or even adding new modems is fairly straightforward.

fast python concurrent ping scan

I wish to ping a large /16 (65536 IPs) network space for pingable hosts, in a short period of time. I’ve provided a fast python concurrent ping application, as a tool for finding large network IP blocks with:

  • servers having patterns of slow ping latency times and possibly peak usage
  • network outages
  • networks handing-out IPs with DHCP, and disconnecting clients
  • networks and ISPs, well-utilizing their IP allocations

The ping wiki is a nice explanation of how ping works, but pinging every IP in a /16 (65536) is prohibitively slow. The GIT python code “ping” code avoids serially pinging each device, but sends a burst of ping echo commands to a block of 128 IPs. The application then recovers the ping echo packets, from its network packet capture. This concurrent ping mechanism, can ping an entire /20 IP block (4096 IPs) in 41 seconds. This ping application timing came from an AWS-EC2 t2-tiny AMI linux client.

The “ping an entire /16 (65536 IP) network quickly” use-case, is a great concurrency GOLANG problem. On the other hand, I need the python practice.

Fast Python Concurrent Ping – Theory of Operations

The python application, starts by turning-on a tcpdump capture, then sends a single-threaded burst of X sequential ICMP echo request packets to a small block of 128 IPs. After the last ICMP in the burst is sent, the app lets tcpdump run for 1 more second. The tcpdump pcap file is then parsed for ICMP responses, and the “ping time” is recovered and logged. The application iterates and repeats the burst-send and pcap listen process, until the entire network space has been pinged.

This blast mechanism of pinging a small block of 128 IPs, and recovering ICMP echo responses in a parallel network capture, decouples the application from maintaining 128 ping threads. The end-point IPs in the block, simply respond with an ICMP echo response concurrently.

BFD Protocol Python Parsing and Network Performance Analytics

Bidirectional Forwarding Detection (BFD) is a network protocol providing fast insights, into faults between two forwarding routers. BFD timestamps may also provide insight into network performance and link capacity.

This python code snippet allows one to obtain network performance analytics (variables) from captured BFD network packets. The code parses a pcap network capture file for BFD transactions, and recovers time stamps from the echos. From the pcap parsed BFD timing details, the python application generates two variables; the round trip time (RTT) and the BFD send time deviations. From these two analytic variables, one may infer; how-well a router data plane is capable of periodically sending BFDs. The analytics also provide insight into; how-well the adjacent router can respond with an echo timestamp. Link capacity and health may also be derived from these network performance analytics. 

The GIT site below hosts the python code with performance variable plots. The plots suggest there may be additional variables, like link demand loading, affecting the network performance. Monitoring these BFD variables in real-time, may provide insights into transient anomalies, performance and capacity.

GIT BFD parsing code and analytics: