I want to decompose the h4 tags whose text/content is photos or prototypes.
Here is the HTML code:
<div class='wrap'>
<div class='col'>
<h4 class='h4'>photos</h4>
<h4 class='h4'>videos</h4>
<h4 class='h4'>prototypes</h4>
<h4 class='h4'>weight</h4>
</div>
<div class='col'>
<h4 class='h4'>color</h4>
<h4 class='h4'>selfie</h4>
<h4 class='h4'>front</h4>
<h4 class='h4'>back</h4>
</div>
</div>
And this is the output I want:
<div class='wrap'>
<div class='col'>
<h4 class='h4'>videos</h4>
<h4 class='h4'>weight</h4>
</div>
<div class='col'>
<h4 class='h4'>color</h4>
<h4 class='h4'>selfie</h4>
<h4 class='h4'>front</h4>
<h4 class='h4'>back</h4>
</div>
</div>
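Answer 0 (score: 2)
You can pass a regular expression as the text argument of find_all (the argument is named string in BeautifulSoup 4.4.0 and later), then decompose() every tag that matches.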
html_doc="""
<div class='wrap'>
<div class='col'>
<h4 class='h4'>photos</h4>
<h4 class='h4'>videos</h4>
<h4 class='h4'>prototypes</h4>
<h4 class='h4'>weight</h4>
</div>
<div class='col'>
<h4 class='h4'>color</h4>
<h4 class='h4'>selfie</h4>
<h4 class='h4'>front</h4>
<h4 class='h4'>back</h4>
</div>
</div>
"""
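from bs4 import BeautifulSoup
import re

soup = BeautifulSoup(html_doc, 'html.parser')
# string= is the current name of the older text= argument (BeautifulSoup 4.4.0+);
# the regex matches any h4 whose text contains "photos" or "prototypes"
for tag in soup.find_all('h4', string=re.compile('photos|prototypes')):
    tag.decompose()  # remove the tag and its contents from the tree
print(soup)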
Output:
<div class="wrap">
<div class="col">
<h4 class="h4">videos</h4>
<h4 class="h4">weight</h4>
</div>
<div class="col">
<h4 class="h4">color</h4>
<h4 class="h4">selfie</h4>
<h4 class="h4">front</h4>
<h4 class="h4">back</h4>
</div>
</div>
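Note that a compiled regex is applied with search semantics, so 'photos|prototypes' also matches longer labels that merely contain those words. A minimal sketch, assuming a hypothetical extra label "photoshop" that should be kept: passing a list as string requires the tag's text to equal one of the entries exactly. The helper remaining_h4 is hypothetical and only collects what is left.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical input: "photoshop" is added only to show the difference
# between substring (regex) matching and exact (list) matching.
html_doc = """
<div class='col'>
<h4 class='h4'>photos</h4>
<h4 class='h4'>photoshop</h4>
<h4 class='h4'>prototypes</h4>
<h4 class='h4'>weight</h4>
</div>
"""

def remaining_h4(soup):
    """Return the text of every h4 still in the tree (hypothetical helper)."""
    return [h.get_text() for h in soup.find_all('h4')]

# 1) Regex filter: searched, so 'photos' also matches inside 'photoshop'.
regex_soup = BeautifulSoup(html_doc, 'html.parser')
for tag in regex_soup.find_all('h4', string=re.compile('photos|prototypes')):
    tag.decompose()

# 2) List filter: a tag matches only if its string equals one entry exactly.
exact_soup = BeautifulSoup(html_doc, 'html.parser')
for tag in exact_soup.find_all('h4', string=['photos', 'prototypes']):
    tag.decompose()

print(remaining_h4(regex_soup))  # ['weight']
print(remaining_h4(exact_soup))  # ['photoshop', 'weight']
```

If exact labels are what you mean, the list form avoids accidental removals; anchoring the regex (for example '^(photos|prototypes)$') is another option.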
Answer 1 (score: 1)
Use a Python lambda function that checks each tag's name and text, then call decompose() on the matches.
from bs4 import BeautifulSoup
data='''<div class='wrap'>
<div class='col'>
<h4 class='h4'>photos</h4>
<h4 class='h4'>videos</h4>
<h4 class='h4'>prototypes</h4>
<h4 class='h4'>weight</h4>
</div>
<div class='col'>
<h4 class='h4'>color</h4>
<h4 class='h4'>selfie</h4>
<h4 class='h4'>front</h4>
<h4 class='h4'>back</h4>
</div>
</div>'''
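soup = BeautifulSoup(data, 'html.parser')
# The lambda is called with each Tag; it selects h4 tags whose text contains either word
for item in soup.find_all(lambda tag: tag.name == 'h4' and ('photos' in tag.text or 'prototypes' in tag.text)):
    item.decompose()  # remove the matched tag from the tree
print(soup)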
Output:
<div class="wrap">
<div class="col">
<h4 class="h4">videos</h4>
<h4 class="h4">weight</h4>
</div>
<div class="col">
<h4 class="h4">color</h4>
<h4 class="h4">selfie</h4>
<h4 class="h4">front</h4>
<h4 class="h4">back</h4>
</div>
</div>
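The `in` checks in the lambda are substring tests, and tag.text keeps any surrounding whitespace. A minimal sketch of a named filter function, assuming hypothetical input where labels may be padded with whitespace or have suffixes such as "prototypes-v2" that should be kept. UNWANTED and is_unwanted_h4 are illustrative names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

UNWANTED = {'photos', 'prototypes'}  # hypothetical set of labels to drop

def is_unwanted_h4(tag):
    # get_text(strip=True) ignores surrounding whitespace, and set membership
    # requires an exact match, so "prototypes-v2" is not removed.
    return tag.name == 'h4' and tag.get_text(strip=True) in UNWANTED

# Hypothetical input with padded and suffixed labels.
html_doc = """
<div class='col'>
<h4 class='h4'> photos </h4>
<h4 class='h4'>videos</h4>
<h4 class='h4'>prototypes-v2</h4>
<h4 class='h4'>
  prototypes
</h4>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# find_all calls the function once per Tag and keeps those returning True.
for tag in soup.find_all(is_unwanted_h4):
    tag.decompose()

remaining = [h.get_text(strip=True) for h in soup.find_all('h4')]
print(remaining)  # ['videos', 'prototypes-v2']
```

A named function is easier to test on its own than an inline lambda, and the set can be extended without touching the filter logic.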