Python - BeautifulSoup - Extracting table data when the tags stay attached

Time: 2016-05-15 23:08:13

Tags: python-2.7 beautifulsoup

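In Python, I am trying to grab a table from an HTML file and store its column values in lists so that I can compare them as the table data changes. I am able to use mechanize to automatically download the HTML page, which sits behind an ID/password login. The second part, putting the data into lists, produces the output shown below, with the tags still in place. So although it looks like I have solved storing the data, I am not sure how to strip the tags before passing the data on.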

Link to the HTML document I am trying to extract data from: https://www.dropbox.com/s/b684ecl7b2l3m10/guildwar.html?dl=0

Sample output (top part), followed by the code, which starts at the bs4 import:

[None, None, None, <td class="t1"> 1 </td>, <td class="t1"> 2 </td>,       <td class="t1"> 3 </td>]




from bs4 import BeautifulSoup

soup = BeautifulSoup(open("guildwar.html"))

rank_0 = []
color_1 = []
name_2 = []
land_3 = []
fortress_4 = []
power_5 = []


for el in soup.findAll('tr'):
    rank = el.find('td', {'class':'t1'})
    rank_0.append(rank)
    color = el.find('td', {'class':'t2'})
    color_1.append(color)
    name = el.find('td', {'class':'t3'})
    name_2.append(name)
    land = el.find('td', {'class':'t4'})
    land_3.append(land)
    fortress = el.find('td', {'class':'t5'})
    fortress_4.append(fortress)
    power = el.find('td', {'class':'t6'})
    power_5.append(power)

print("Ranking")
print(rank_0)
print("\nMagic Color")
print(color_1)
print("\nMage Name")
print(name_2)
print("\nLand")
print(land_3)
print("\nFortress")
print(fortress_4)
print("\nPower")
print(power_5)

===============================

1 Answer:

Answer 0 (score: 1)

You can use the text attribute on the element, like this:

In [2]: s = '<tr><td class="t1"> 1 </td>, <td class="t1"> 2 </td>,       <td class="t1"> 3 </td></tr>'

In [4]: soup = BeautifulSoup(s, "lxml")

In [5]: for el in soup.findAll('tr'):
   ...:     rank = el.find('td', {'class': 't1'})
   ...:     print("Ranking > ", rank.text) # use text attribute
   ...:     
Ranking >   1 
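
Applied to the loop from the question, the same idea might look like the sketch below. It assumes that rows without a matching cell (header rows, for example) make find return None, which the None entries in the sample output suggest. The inline HTML is a hypothetical stand-in for guildwar.html, and the helper name cell_text is made up for illustration. get_text(strip=True) is used in place of .text only to drop the surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for guildwar.html; the real page layout may differ.
html = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td class="t1"> 1 </td><td class="t3"> Alice </td></tr>
  <tr><td class="t1"> 2 </td><td class="t3"> Bob </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def cell_text(row, css_class):
    """Return the stripped text of the first matching <td>, or None if absent."""
    cell = row.find("td", {"class": css_class})
    return cell.get_text(strip=True) if cell is not None else None


rank_0 = []
name_2 = []
for el in soup.find_all("tr"):
    rank = cell_text(el, "t1")
    if rank is None:
        continue  # skip rows that have no rank cell, such as the header row
    rank_0.append(rank)
    name_2.append(cell_text(el, "t3"))

print(rank_0)
print(name_2)
```

Skipping rows without a rank cell keeps the lists aligned, so the same index refers to the same table row in each list.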

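As a side note, I would probably store the entire <table> and check whether it has changed over time. That saves you from comparing every individual column, and you only need to store the data when there is an update or change.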
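
One way to sketch that whole-table comparison is to hash the serialized <table> and compare digests between downloads. This is only an illustration under assumptions: the function name table_fingerprint and the two inline pages are hypothetical, and hashing is one possible choice (comparing the stored strings directly would also work).

```python
import hashlib

from bs4 import BeautifulSoup


def table_fingerprint(html):
    """Return a SHA-256 digest of the first <table> in the page, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return None
    return hashlib.sha256(str(table).encode("utf-8")).hexdigest()


# Hypothetical snapshots of the same page taken at two different times.
old_page = '<html><body><table><tr><td class="t1"> 1 </td></tr></table></body></html>'
new_page = '<html><body><table><tr><td class="t1"> 2 </td></tr></table></body></html>'

changed = table_fingerprint(old_page) != table_fingerprint(new_page)
unchanged = table_fingerprint(old_page) == table_fingerprint(old_page)

if changed:
    print("Table changed; parse and store the new data")
```

Only when the digests differ would the per-column extraction need to run.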