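I want to use Beautiful Soup to collect the attribute values of every tag in an HTML page into a single array.
For example, given the HTML below, I want the attribute values of all tags in one array of strings: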
<div att0="content1">
<a href="link1">link data</a>
</div>
The result would be: [content1,link1]
Answer 0 (score: 2)
Find all elements and collect the values from each element's .attrs attribute:
attrs = []
for elm in soup():  # soup() is equivalent to soup.find_all()
    attrs += list(elm.attrs.values())
print(attrs)
Demo:
>>> from bs4 import BeautifulSoup
>>>
>>> data = """
... <div att0="content1">
... <a href="link1">link data</a>
... </div>
... """
>>>
>>> soup = BeautifulSoup(data, 'lxml')
>>>
>>> attrs = []
>>> for elm in soup():
...     attrs += list(elm.attrs.values())
...
>>> print(attrs)
['content1', 'link1']
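One caveat with the loop above: Beautiful Soup returns multi-valued attributes such as class as a list, so the result can end up with nested lists. Here is a minimal sketch that flattens them. The function name all_attribute_values, the sample string, and the choice of the built-in 'html.parser' (instead of 'lxml') are my own assumptions for illustration, not part of the answer.

```python
from bs4 import BeautifulSoup


def all_attribute_values(html):
    """Return every attribute value in document order as a flat list of strings."""
    # 'html.parser' ships with Python; 'lxml' from the answer should work too.
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for elm in soup.find_all(True):  # True matches every tag
        for value in elm.attrs.values():
            if isinstance(value, list):
                # Multi-valued attributes (e.g. class) come back as a list.
                values.extend(value)
            else:
                values.append(value)
    return values


# Hypothetical sample: the question's markup plus a class attribute.
sample = '<div att0="content1" class="a b"><a href="link1">link data</a></div>'
print(all_attribute_values(sample))
```

For the markup in the question, which has no multi-valued attributes, this should give the same list as the loop above.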
Answer 1 (score: 0)
import bs4
html = '''
<div att0="content1">
<a href="link1">link data</a>
</div>
<div att0="content1">
<a href="link1">link data</a>
</div>
<div att0="content1">
<a href="link1">link data</a>
</div>'''
soup = bs4.BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', att0=True):
    out = [div['att0'], div.a['href']]
    print(out)
Output:
['content1', 'link1']
['content1', 'link1']
['content1', 'link1']
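Note that div.a is None when a div has no <a> child, so div.a['href'] would raise a TypeError on such input. A rough sketch of a more defensive variant follows. The function name attribute_values_per_div, the sample markup, and the use of 'html.parser' are assumptions of mine, not something from the answer.

```python
from bs4 import BeautifulSoup


def attribute_values_per_div(html):
    """Return one [att0, href] list per matching div, skipping href if absent."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.find_all("div", att0=True):
        row = [div["att0"]]
        # find() returns None when no <a> with an href exists inside the div.
        link = div.find("a", href=True)
        if link is not None:
            row.append(link["href"])
        rows.append(row)
    return rows


# Hypothetical input: the second div has no link at all.
html = '''
<div att0="content1"><a href="link1">link data</a></div>
<div att0="content2">no link here</div>
'''
for row in attribute_values_per_div(html):
    print(row)
```

This still hardcodes the att0 and href names, as the answer does; it only avoids the crash when the link is missing.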