How to extract an XML attribute using BeautifulSoup and Python

Asked: 2019-07-18 02:19:52

Tags: python xml beautifulsoup

I am trying to extract the "totalvotes" value from this XML:

<poll title="User Suggested Number of Players" totalvotes="0" name="suggested_numplayers">
<results numplayers="3+"> </results>
</poll>

I have tried many different variations of the following code, but none of them work.

soup.find_all('poll',{'title':'User Suggested Number of Players'})[0].find_all('totalvotes')

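In this case, I am just trying to retrieve the value 0. How can I do that?

Thanks.

2 Answers:

Answer 0: (score: 1)

There are several ways to get the element; one is to use CSS selectors: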

data = '''<poll title="User Suggested Number of Players" totalvotes="0" name="suggested_numplayers">
<results numplayers="3+"> </results>
</poll>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

# method 1 (select <poll> that has the attribute "totalvotes")
print(soup.select_one('poll[totalvotes]')['totalvotes'])

# method 2 (more specific: select <poll> whose title attribute is "User Suggested Number of Players")
print(soup.select_one('poll[title="User Suggested Number of Players"][totalvotes]')['totalvotes'])

# method 3 (select <poll> that has <results> inside)
print(soup.select_one('poll:has(results)[totalvotes]')['totalvotes'])

Prints:

0
0
0
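The same selection can also be written without CSS selectors. The following is a minimal sketch, assuming the same `data` string as above; the variable names `polls_with_votes` and `values` are only illustrative. Passing `True` as an attribute filter to `find_all` matches any tag that has that attribute, whatever its value.

```python
from bs4 import BeautifulSoup

data = '''<poll title="User Suggested Number of Players" totalvotes="0" name="suggested_numplayers">
<results numplayers="3+"> </results>
</poll>'''

soup = BeautifulSoup(data, 'html.parser')

# True as an attribute filter matches every <poll> that has a
# "totalvotes" attribute, regardless of its value.
polls_with_votes = soup.find_all('poll', attrs={'totalvotes': True})

# Attribute values are read with dict-style access on each tag.
values = [poll['totalvotes'] for poll in polls_with_votes]
print(values)

# .attrs exposes all attributes of a tag as a plain dict.
first_attrs = polls_with_votes[0].attrs
print(first_attrs)
```

Attribute values come back as strings, so the result here is the string '0', not the integer 0.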

Further reading:

CSS Selectors Reference

Answer 1: (score: 0)

To extract it from the first element:

soup.find('poll').get('totalvotes')

To extract it from all elements:

for poll in soup.find_all('poll'):
    print(poll.get('totalvotes'))
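For context on why the code in the question returns nothing: `find_all('totalvotes')` searches for child tags named `totalvotes`, but `totalvotes` is an attribute of `<poll>`, not a tag. The sketch below puts the pieces together as a self-contained script, assuming the XML from the question; the attribute name `nosuchattr` is hypothetical and only there to show the difference between `[]` and `.get()`.

```python
from bs4 import BeautifulSoup

data = '''<poll title="User Suggested Number of Players" totalvotes="0" name="suggested_numplayers">
<results numplayers="3+"> </results>
</poll>'''

soup = BeautifulSoup(data, 'html.parser')

# Locate the <poll> element by its title attribute, as in the question.
poll = soup.find('poll', {'title': 'User Suggested Number of Players'})

# "totalvotes" is an attribute, not a child tag, so a tag search is empty.
no_match = poll.find_all('totalvotes')

# Dict-style access raises KeyError if the attribute is missing;
# .get() returns None instead.
total_votes = poll['totalvotes']
missing = poll.get('nosuchattr')  # hypothetical attribute name

# Attribute values are strings; convert explicitly if a number is needed.
votes_as_int = int(total_votes)
print(votes_as_int)
```

Whether to use `poll['totalvotes']` or `poll.get('totalvotes')` depends on whether a missing attribute should be treated as an error or as an expected case.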