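I'm trying to scrape the upstream and downstream values for each channel from my ISP's cable modem.
I can't get the data to display properly. I'd like it written out in a normalized CSV format for logging.
Here is my code, but it is far from adequate.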
import requests
from bs4 import BeautifulSoup
# Collect and parse first page
page = requests.get('http://192.168.100.1/Docsis_system.asp')
soup = BeautifulSoup(page.text, 'html.parser')
# Pull all text from the proper section in the page
#signal_value_list = soup.find('tbody')
signal_value_list = soup.find('table', {'summary':'Downstream Channels'})
# Pull text from all instances of <td> tag within align div
signal_value_list_items = signal_value_list.find_all('td')
# Create for loop to print out all values
for signal_value in signal_value_list_items:
    sigval = signal_value.contents[0]
    print(sigval)
The page I'm trying to parse is available at this link as a TXT file:
screenshot of the modem page with tables
I'm open to other approaches for getting this data, but I was hoping this would be easier than it has turned out to be.
Does anyone have any ideas?
Answer 0 (score: 0)
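I'm not sure I fully understand which data you want to parse, but assuming we're talking about (for example) the downstream channels' power level and SNR, one possible solution is the following: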
import requests
from bs4 import BeautifulSoup
import csv
# Collect and parse first page
# page = requests.get('http://192.168.100.1/Docsis_system.asp')
with open('files/modem-page.html') as f:
    text = f.read()
soup = BeautifulSoup(text, 'html.parser')
# Pull all text from the proper section in the page
# signal_value_list = soup.find('tbody')
signal_value_list = soup.find('table', {'summary': 'Downstream Channels'})
# Pull all <td> cells from that table
signal_value_list_items = signal_value_list.find_all('td')
res = {}
# Group the cell values by channel, then by property
for signal_value in signal_value_list_items:
    try:
        channel_string = signal_value.attrs.get('headers')[0]  # get name of channel
        property_string = signal_value.attrs.get('headers')[1]  # get name of property
        value_string = signal_value.text  # get actual value
        if channel_string not in res:
            res[channel_string] = {}
        res[channel_string][property_string] = value_string
    except (TypeError, IndexError):  # cells with no (or only one) headers entry are irrelevant
        continue
# On Python 3 the csv module expects a text-mode file opened with newline=''
with open('res.csv', 'w', newline='') as csv_file:  # add column names to your liking before this
    writer = csv.writer(csv_file)
    for key, value in res.items():
        writer.writerow([key] + [value[prop] for prop in value])
print(res)
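Similar code can be written for the upstream data (I'll leave that to you). Possible improvements are sorting the channel names (I used a dictionary, so the order is not guaranteed on Python versions before 3.7) and adding column names. Hope this helps.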
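For the two improvements mentioned (sorted channels and column names), here is a minimal sketch that works on the same nested dictionary shape as `res`. The helper names `channel_sort_key` and `write_channels_csv` are made up for illustration, and the sample channel and property names (`channel_2`, `ch_pwr`, `ch_snr`) are assumptions, since the real values depend on the `headers` attributes in your modem's page. It also assumes channel names end in a number, so that `channel_2` sorts before `channel_10`.

```python
import csv
import io
import re


def channel_sort_key(name):
    """Sort key that orders names by their trailing number, if they have one."""
    match = re.search(r'(\d+)$', name)
    # Names without a trailing number sort after the numbered ones, alphabetically.
    return (0, int(match.group(1)), name) if match else (1, 0, name)


def write_channels_csv(res, out):
    """Write {channel: {property: value}} as CSV rows with a header row."""
    # Collect every property name that appears so the columns are stable.
    columns = sorted({prop for props in res.values() for prop in props})
    # restval='' leaves a cell empty when a channel lacks a property.
    writer = csv.DictWriter(out, fieldnames=['channel'] + columns, restval='')
    writer.writeheader()
    for channel in sorted(res, key=channel_sort_key):
        row = {'channel': channel}
        row.update(res[channel])
        writer.writerow(row)


# Made-up sample data in the same shape as `res`; names and values are illustrative only.
sample = {
    'channel_10': {'ch_pwr': '1.2 dBmV', 'ch_snr': '38.9 dB'},
    'channel_2': {'ch_pwr': '0.8 dBmV', 'ch_snr': '39.4 dB'},
}
buffer = io.StringIO()
write_channels_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open('res.csv', 'w', newline='')`. The same function should work for the upstream table if you build a second dictionary from it in the same way.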