Is there a way to find all tags that match a list of conditions in a single call?
For example, from this HTML I want to extract the <p> tags and the <div data-type="a"> tags.
HTML:
<div>
<h1>Chapter 1</h1>
<p>aaa</p>
<p>aaa</p>
<p>aaa</p>
<div>
<h1>Section 1</h1>
<p>bbb</p>
<p>bbb</p>
<p>bbb</p>
</div>
<div data-type="a">...</div>
<div data-type="a">...</div>
<div data-type="b">...</div>
...
</div>
Desired output:
<p>aaa</p>
<p>aaa</p>
<p>aaa</p>
<p>bbb</p>
<p>bbb</p>
<p>bbb</p>
<div data-type="a">...</div>
<div data-type="a">...</div>
Of course, I can run two separate searches:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
p_tags = soup.find_all('p')
div_tags = soup.find_all('div', {"data-type": "a"})
But I would like to do something like this:
p_and_div_tags = soup.find_all(['p', 'div_tag_with_attribute'])
Is there a way?
Thanks
Answer 0 (score: 2)
If you have BS4 4.7.1 or later, you can use a CSS selector. A comma in the selector means "match either of these".
Code:
from bs4 import BeautifulSoup
html='''<div>
<h1>Chapter 1</h1>
<p>aaa</p>
<p>aaa</p>
<p>aaa</p>
<div>
<h1>Section 1</h1>
<p>bbb</p>
<p>bbb</p>
<p>bbb</p>
</div>
<div data-type="a">...</div>
<div data-type="a">...</div>
<div data-type="b">...</div>
...
</div>'''
soup=BeautifulSoup(html,'html.parser')
items=soup.select('p,div[data-type="a"]')
print(items)
Output:
[<p>aaa</p>, <p>aaa</p>, <p>aaa</p>, <p>bbb</p>, <p>bbb</p>, <p>bbb</p>, <div data-type="a">...</div>, <div data-type="a">...</div>]
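Since the question asks about a list of conditions, one possible sketch is to keep the selectors in a Python list and join them with commas before calling select(). The `conditions` list and the shortened `html` sample are assumptions made for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (an assumption for illustration).
html = '''<div>
<p>aaa</p>
<div><p>bbb</p></div>
<div data-type="a">...</div>
<div data-type="b">...</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Each entry is an ordinary CSS selector; joining with commas
# builds a selector group that matches any one of them.
conditions = ['p', 'div[data-type="a"]']
items = soup.select(', '.join(conditions))

for item in items:
    print(item.name, item.attrs)
```

Adding another condition is then a matter of appending one more selector string to the list.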
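Answer 1 (score: 1)
You can try passing a function as one of the filters:
def func(tag):
    # Matches any <div> that has a data-type attribute, whatever its value
    return tag.name == 'div' and tag.has_attr('data-type')

soup.find_all(['p', func])
Output: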
[<p>aaa</p>,
<p>aaa</p>,
<p>aaa</p>,
<p>bbb</p>,
<p>bbb</p>,
<p>bbb</p>,
<div data-type="a">...</div>,
<div data-type="a">...</div>,
<div data-type="b">...</div>]
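Note that the output above also contains <div data-type="b">, because has_attr only checks that the attribute exists. A minimal sketch of a stricter variant, assuming only data-type="a" is wanted, folds both conditions into a single predicate passed to find_all. The function name `wanted` and the shortened `html` sample are hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (an assumption for illustration).
html = '''<div>
<p>aaa</p>
<div><p>bbb</p></div>
<div data-type="a">...</div>
<div data-type="b">...</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')


def wanted(tag):
    """Return True for <p> tags and for <div> tags whose data-type is exactly 'a'."""
    if tag.name == 'p':
        return True
    return tag.name == 'div' and tag.get('data-type') == 'a'


# find_all calls the predicate once per tag and keeps those that return True.
matches = soup.find_all(wanted)
print(matches)
```

Using tag.get('data-type') rather than has_attr compares the attribute value, so the data-type="b" div is left out.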