Get all attributes of all tags in a page [Beautiful Soup]

Time: 2017-01-11 12:18:12

Tags: python beautifulsoup

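I want to use Beautiful Soup to get all the attributes of every tag in an HTML page, collected into a single array.

For example, given the HTML page below, I want the attribute values of all tags in an array of strings: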

<div att0="content1">
<a href="link1">link data</a>
</div>

The result would be: [content1,link1]

2 answers:

Answer 0 (score: 2)

Find all elements and read their attribute values from the .attrs attribute:

attrs = []
for elm in soup():  # soup() is equivalent to soup.find_all()
    attrs += list(elm.attrs.values())

print(attrs)

Demo:

>>> from bs4 import BeautifulSoup
>>> 
>>> data = """
... <div att0="content1">
... <a href="link1">link data</a>
... </div>
... """
>>> 
>>> soup = BeautifulSoup(data, 'lxml')
>>> 
>>> attrs = []
>>> for elm in soup():
...     attrs += list(elm.attrs.values())
... 
>>> print(attrs)
['content1', 'link1']
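
One caveat with this approach: Beautiful Soup returns multi-valued attributes such as class as a list, so the result can end up with nested lists. A minimal sketch that flattens them is shown below. The function name collect_attribute_values and the class="box wide" attribute in the sample are illustrative assumptions, not part of the question, and html.parser is used only so the snippet does not depend on lxml.

```python
from bs4 import BeautifulSoup


def collect_attribute_values(markup):
    """Return every attribute value in the markup as a flat list of strings."""
    soup = BeautifulSoup(markup, "html.parser")
    values = []
    for element in soup.find_all(True):  # True matches every tag
        for value in element.attrs.values():
            if isinstance(value, list):
                # Multi-valued attributes (e.g. class) arrive as a list.
                values.extend(value)
            else:
                values.append(value)
    return values


# Hypothetical sample: the question's markup plus a class attribute.
sample = '<div att0="content1" class="box wide"><a href="link1">link data</a></div>'
print(collect_attribute_values(sample))
# ['content1', 'box', 'wide', 'link1']
```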

Answer 1 (score: 0)

import bs4

html = '''
<div att0="content1">
<a href="link1">link data</a>
</div>
<div att0="content1">
<a href="link1">link data</a>
</div>
<div att0="content1">
<a href="link1">link data</a>
</div>'''

soup = bs4.BeautifulSoup(html, 'lxml')

for div in soup.find_all('div', att0=True):
    out = [div['att0'], div.a['href']]
    print(out)

Output:

['content1', 'link1']
['content1', 'link1']
['content1', 'link1']
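
The loop above names att0 and href explicitly. If the attribute names are not known in advance, the same per-block grouping can be sketched without hard-coding them. The helper name values_per_block and its tag_name parameter are hypothetical, and html.parser is assumed so that the div elements sit at the top level of the parsed tree (lxml would wrap them in html and body).

```python
from bs4 import BeautifulSoup


def values_per_block(markup, tag_name="div"):
    """Return one list of attribute values per top-level tag_name element."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    # recursive=False restricts the search to direct children of the document,
    # so nested blocks are not counted a second time.
    for block in soup.find_all(tag_name, recursive=False):
        row = list(block.attrs.values())
        for descendant in block.find_all(True):
            row.extend(descendant.attrs.values())
        rows.append(row)
    return rows


html = '''
<div att0="content1">
<a href="link1">link data</a>
</div>
<div att0="content1">
<a href="link1">link data</a>
</div>'''

for row in values_per_block(html):
    print(row)
```

Multi-valued attributes such as class would still appear as nested lists here; this sketch does not flatten them.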