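In Python, I am trying to pull a table out of an HTML file and store its column values in lists, so that I can compare them as the table data changes. I was able to use mechanize to automatically download the HTML page, which sits behind an ID/password login. The second part, putting the data into lists, produces the output shown below, with the tags still in place. So although I seem to have solved storing the data, I am not sure how to strip the tags before passing the data on.
Link to the HTML document I am trying to extract data from: https://www.dropbox.com/s/b684ecl7b2l3m10/guildwar.html?dl=0
Sample output (top part only); the code starts at the bs4 import: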
[None, None, None, <td class="t1"> 1 </td>, <td class="t1"> 2 </td>, <td class="t1"> 3 </td>]
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("guildwar.html"))
rank_0 = []
color_1 = []
name_2 = []
land_3 = []
fortress_4 = []
power_5 = []
for el in soup.findAll('tr'):
    rank = el.find('td', {'class':'t1'})
    rank_0.append(rank)
    color = el.find('td', {'class':'t2'})
    color_1.append(color)
    name = el.find('td', {'class':'t3'})
    name_2.append(name)
    land = el.find('td', {'class':'t4'})
    land_3.append(land)
    fortress = el.find('td', {'class':'t5'})
    fortress_4.append(fortress)
    power = el.find('td', {'class':'t6'})
    power_5.append(power)
print("Ranking")
print(rank_0)
print("\nMagic Color")
print(color_1)
print("\nMage Name")
print(name_2)
print("\nLand")
print(land_3)
print("\nFortress")
print(fortress_4)
print("\nPower")
print(power_5)
===============================
Answer 0 (score: 1)
You can use the text attribute on the element, like this:
In [1]: from bs4 import BeautifulSoup
In [2]: s = '<tr><td class="t1"> 1 </td>, <td class="t1"> 2 </td>, <td class="t1"> 3 </td></tr>'
In [4]: soup = BeautifulSoup(s, "lxml")
In [5]: for el in soup.findAll('tr'):
   ...:     rank = el.find('td', {'class': 't1'})
   ...:     print("Ranking > ", rank.text)  # use the text attribute
   ...:
Ranking > 1
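Applied to the lists in the question, the same idea might look like the sketch below. The sample markup is a made-up stand-in for guildwar.html (the real page may be structured differently), and column_text is a hypothetical helper, not part of bs4. It uses get_text(strip=True) to drop the surrounding whitespace and skips rows where find returns None, which is probably where the None entries in the question's output come from (header rows with no matching td).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for guildwar.html; the real page may differ.
html = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td class="t1"> 1 </td><td class="t3"> Alice </td></tr>
  <tr><td class="t1"> 2 </td><td class="t3"> Bob </td></tr>
</table>
"""


def column_text(soup, css_class):
    """Return the stripped text of the first matching <td> in each row."""
    values = []
    for row in soup.find_all("tr"):
        cell = row.find("td", {"class": css_class})
        if cell is None:
            # Header rows (or rows missing this column) have no matching <td>.
            continue
        values.append(cell.get_text(strip=True))
    return values


# "html.parser" ships with Python; "lxml" works too if it is installed.
soup = BeautifulSoup(html, "html.parser")
rank_0 = column_text(soup, "t1")
name_2 = column_text(soup, "t3")
print(rank_0)
print(name_2)
```

Note that skipping rows keeps the lists free of None, but the lists only stay aligned with each other if every data row contains every column.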
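As a side note, I would probably store the whole <table> and check whether it has changed over time. That saves you the time of comparing every individual column, and you only need to store the data when there is an update or change.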
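One way to do that whole-table comparison is to keep a hash of the table markup and compare it on each download. This is only a sketch under assumptions: table_fingerprint is a hypothetical helper, the sample strings are made up, and in practice the input would be something like str(soup.find('table')). Whitespace runs are collapsed first so that pure formatting differences do not count as changes.

```python
import hashlib


def table_fingerprint(table_html):
    """Hash the table markup after collapsing runs of whitespace."""
    normalized = " ".join(table_html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Made-up snapshots of the same table at different times.
previous = '<table><tr><td class="t1"> 1 </td></tr></table>'
reformatted = '<table><tr><td class="t1">   1 </td></tr></table>'
updated = '<table><tr><td class="t1"> 2 </td></tr></table>'

# Only whitespace differs, so the fingerprints match.
unchanged = table_fingerprint(previous) == table_fingerprint(reformatted)
# The cell value differs, so the fingerprints do not match.
changed = table_fingerprint(previous) != table_fingerprint(updated)
print(unchanged, changed)
```

Only when the fingerprint differs from the stored one would you parse the columns and save the new data.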