Web scrape DSL modem stats with Python Selenium and a Raspberry Pi

Because my CenturyLink DSL internet service is intermittent, I want to log DSL line condition stats. For the most part, the modem stats are only available via the web pages of the Technicolor C1100T DSL modem. The modem’s API does not provide access to all the stats, so scraping is the only way to obtain the data. The project uses a Raspberry Pi (RPi) and its Chromium browser to automatically navigate the modem pages and scrape key data. The C1100T project shares a Python Selenium automation application that “scrapes” C1100T web pages for DSL and bandwidth statistics. Because I want to analyze and graph trends in the modem statistics, the scraped data is logged to a CSV (comma-separated values) file.

The project that converts the scraped JSON data to CSV is here, and it works for all the modem scrapers.

The following time series graphic provides roughly a month of modem stats:

Anecdotally, the signal-to-noise ratio (SNR) for the DSL downstream (SNRD) drops during rainy days. The plant then seems to dry out and the SNR recovers. The upstream signal power (PowerU) also seems to be negatively impacted by a “wet” DSL plant. There were several outages during the period covered by this graph, and they coincided with the downstream power dropping to roughly 16.5 dBm.

Resetting the accumulated MODEM stats

The FEC (forward error correction) stats are reset every night, when the Raspberry Pi cron job also reboots the modem. The FEC counts accumulate at a problematic rate, roughly 80K in a 24-hour period, and on some days the modem sees more than 200K FEC counts. CenturyLink techs sometimes recommend rebooting the modem periodically. Since I started rebooting my modems nightly, my internet access may have become more reliable. If you need code to reboot your modem nightly without manual intervention, this Python code not only scrapes data but also controls the modem.

Scrapping “dirty” MODEM stats

If we trust the modem’s upstream and downstream packet counts, the spurious peaks look problematic. The packet spikes may be abnormal packet retries.

The data scraped from the C1100T modem likely includes some dirty data. The downstream bandwidth and downstream packet counts burst at nearly impossible rates. In other words, Netflix and Amazon Prime are not capable of pushing these almost daily bandwidth bursts down the DSL plant to the modem. I would like to throw out these seemingly nonsensical downstream data bursts.
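One way to flag the suspicious bursts is to compare each interval’s packet increase against the typical increase. The sketch below is my own assumption and is not part of the scraper project: it takes the cumulative Packets_Downstream counts from consecutive scrapes, treats any drop as a counter reset (such as the nightly reboot), and flags intervals whose increase exceeds a multiple of the median. The function names and the factor of 10 are hypothetical.

```python
from statistics import median


def interval_deltas(counts):
    """Turn cumulative counter readings into per-interval increases.

    A reading lower than the previous one is treated as a counter reset
    (for example after a modem reboot), so the new reading is the increase.
    """
    deltas = []
    for prev, cur in zip(counts, counts[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


def flag_bursts(deltas, factor=10):
    """Return indexes of intervals whose increase exceeds factor * median."""
    if not deltas:
        return []
    typical = median(deltas)
    return [i for i, d in enumerate(deltas) if typical > 0 and d > factor * typical]
```

The median is used instead of the mean so that the bursts themselves do not inflate the baseline they are compared against. Flagged intervals could be dropped or simply marked before graphing.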

The scraping application and environment

The application was tested on an RPi and runs there 24×7. It will run on most Linux platforms as a headless application. The RPi has HDMI and analog video outputs, but the application does not use them, nor does it use the RPi’s keyboard and mouse. The Chromium browser uses a ‘dummy’ video output device when it logs into the modem and scrapes DSL values. The Selenium Python module is most often used for website test automation, where watching the browser is often impractical, especially when it runs on a server in the cloud.
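A minimal sketch of starting Chromium without a display follows. It uses Chromium’s own headless flag rather than the ‘dummy’ video output device described above, so it is a substitute technique and not the project’s actual setup. The function names are hypothetical, and it assumes the selenium package and a matching chromedriver are installed.

```python
def chromium_arguments():
    """Command-line flags for running Chromium without a visible window."""
    return [
        "--headless",               # no window, so no display is needed
        "--window-size=1280,1024",  # give pages a predictable layout size
    ]


def make_driver():
    """Start a Selenium-controlled Chromium session (needs selenium and chromedriver)."""
    # Imported here so the flag helper above works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in chromium_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

With a driver in hand, driver.get("http://192.168.0.1") would load the modem’s login page (the address is an assumption; use your modem’s), and driver.quit() closes the browser when the scrape is done.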

Other MODEM stats …

In addition to DSL line statistics, the modem also reports your bandwidth usage. If you’re trying to stay under your ISP’s bandwidth cap, this data can warn you ahead of time.

The Python application “scrapes” DSL statistics from my Technicolor C1100T DSL modem-router-wifi, which sits on CenturyLink’s physical DSL network. The application browses to my modem’s site, logs into the C1100T as the ‘admin’ user, navigates to the DSL 1 Status page, scrapes the variable values, and writes them to a JSON log file. The following modem variables are scraped twice an hour.

Modem variables to scrape and log to CSV

Variable | Example data | Units
SNR downstream | 8.8 | dB
SNR upstream | 8.1 | dB
Power downstream | 17.0 | dBm
Power Upstream | 7.9 | dBm
Packets downstream | 16048791 | packets
Packets Upstream | 7695389 | packets
Total Usage Downstream | 162716.409 | Mbits
Total Usage Upstream | 8966.587 | Mbits
LinkUptime | 0 Days,0H:14M:53S |
RS_FEC – Near End | 331 | error fixes
CRC_Errors – Near End | 0 | errors

These are a few of the variables displayed in the C1100T modem pages.
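The pages present these values as text, so a scraper has to convert them before they can be graphed. As one example, the sketch below parses a LinkUptime string into seconds. The function name is hypothetical, and the format is assumed from the single example value shown for LinkUptime.

```python
import re

# Matches strings such as "0 Days,0H:14M:53S".
UPTIME_PATTERN = re.compile(r"(\d+)\s*Days?,\s*(\d+)H:(\d+)M:(\d+)S")


def uptime_to_seconds(text):
    """Convert a LinkUptime string to a number of seconds."""
    match = UPTIME_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime format: {text!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

For the example value, uptime_to_seconds("0 Days,0H:14M:53S") returns 893. Raising an error on an unexpected format makes a changed modem page show up in the log rather than silently producing bad numbers.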

The JSON data for the above looks like this:

{"1586713519097": {"SNR_downstream": 10.3, "SNR_upstream": 8.5, "Power_downstream": 16.9, "Power_upstream": 8.6, "Packets_Downstream": 47770, "Packets_Upstream": 24521, "Total_Usage_Downstream": 475.615, "Total_Usage_Upstream": 29.162, "dslUpstreamElement": 0.895, "dslDownstreamElement": 23.103, "dslLineStatusElement": "GOOD", "LinkUptime": "0 Days,0H:14M:53S", "LinkTrainErrors": 1, "RS_FEC": 331, "CRC_Errors": 0}}
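A record in this shape, keyed by the epoch timestamp in milliseconds, can be produced with the standard library. The sketch below assumes one JSON object per line in the log file. The function names and that file layout are my assumptions, not necessarily what the project does.

```python
import json
import time


def format_record(values, ts_ms=None):
    """Return one scrape as a single-line JSON object keyed by epoch ms.

    The timestamp comes from the machine running the scraper unless
    ts_ms is given explicitly.
    """
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    return json.dumps({str(ts_ms): values})


def append_record(path, values, ts_ms=None):
    """Append one formatted record to the log file at path."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(format_record(values, ts_ms) + "\n")
```

Writing one object per line keeps each scrape independent, so a partially written line after a power loss does not corrupt the earlier records.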

The following is a single-record CSV snippet:


The first variable, “ts”, is the timestamp in “epoch” ms (milliseconds). This timestamp comes from the RPi server. This site is good for explaining epoch time or converting epoch time to your local time: https://www.epochconverter.com/
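To make the timestamp and the JSON-to-CSV step concrete, here is a sketch that flattens one JSON record of the shape shown earlier into a CSV header and row with “ts” as the first column, and converts the timestamp to a readable UTC time. It is not the linked JSON-to-CSV project; the function names and the column order are assumptions.

```python
import csv
import io
import json
from datetime import datetime, timezone


def record_to_csv(json_text):
    """Flatten {"<ts>": {name: value, ...}} into a header line and a data line."""
    record = json.loads(json_text)
    (ts, values), = record.items()  # exactly one timestamp key is expected
    row = {"ts": int(ts), **values}
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row), lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


def epoch_ms_to_utc(ts_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
```

The timestamp in the JSON example, 1586713519097, converts to 2020-04-12 17:45:19 UTC. The csv module quotes the LinkUptime value automatically because it contains a comma.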

Screenshots of pages scraped for the above variables

Other CenturyLink DSL modems and yet more available data

There is also a Python scraper project for the Actiontec C1000A and for the Zyxel C3000Z. I found the C1100T’s 2.4 GHz Wi-Fi worked poorly, so I bought a used Zyxel C3000Z and wrote the scraper code for it. I also had an old Actiontec C1000A that worked intermittently, and wrote the scraping code for it too. The web pages for the three modems look similar, but their HTML is different, which requires different Python code. The modems expose quite a few more stats than the DSL and internet stats scraped in the code. Adding more stats, or even adding new modems, is fairly straightforward.