Example:
import bs4
html = '''
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to
best design and develop Android apps with security in mind. The book explores
techniques that developers can use to build additional layers of security into
their apps beyond the security controls provided by Android itself.
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p></div>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')
How can I get the following (as a BeautifulSoup object) from soup?
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to
best design and develop Android apps with security in mind. The book explores
techniques that developers can use to build additional layers of security into
their apps beyond the security controls provided by Android itself.
</div>
Answer 0 (score: 4)
Just search for it:
soup.find('p', class_='scroll-down')
I used the class to narrow the search, but since there are no other p elements here, that is somewhat redundant.
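To illustrate why the class filter is redundant here, the following minimal sketch compares both lookups. The shortened HTML string is a stand-in invented for this example, not the markup from the question, and 'html.parser' is assumed as the parser.

```python
import bs4

# Shortened stand-in for the question's markup (hypothetical content).
html = '<div><em>Title</em> text <p class="scroll-down">Full Description</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# Lookup restricted by tag name and class.
by_class = soup.find('p', class_='scroll-down')

# Lookup by tag name only; with a single <p> in the document
# this finds the very same element.
by_name = soup.find('p')

same_tag = by_class is by_name
print(same_tag)
```

If the document later gains more p elements, only the class-restricted lookup keeps pointing at the intended tag.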
If you need to remove the tag, first find it as shown above, then call .extract() on it to remove it from the document:
>>> soup.find('p', class_='scroll-down').extract()
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p>
>>> print(soup)
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to
best design and develop Android apps with security in mind. The book explores
techniques that developers can use to build additional layers of security into
their apps beyond the security controls provided by Android itself.
</div>
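Two things to note: the .extract() method returns the removed tag, so you can save it for later use. The tag is removed from the document completely; if you still need it in the document, you have to re-add it manually later.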
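As a rough sketch of saving the extracted tag and putting it back later, the snippet below uses a shortened, made-up HTML string and re-attaches the tag with append(). Appending to the end of the div is an assumption made for this example; the right insertion point depends on your document.

```python
import bs4

# Shortened stand-in for the question's markup (hypothetical content).
html = '<div class="d">Intro text. <p class="scroll-down">Full Description</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# extract() detaches the tag from the tree and returns it.
removed = soup.find('p', class_='scroll-down').extract()
after_extract = str(soup)

# The detached tag is still a usable object; append() re-attaches it
# as the last child of the div.
soup.div.append(removed)
after_append = str(soup)

print(after_extract)
print(after_append)
```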
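Alternatively, you can use the .decompose() method, which removes the tag from the document entirely without returning a reference to it. The tag is then gone for good.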
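For comparison, here is a minimal sketch of .decompose() on the same kind of shortened, made-up markup. It shows that the call returns nothing and that the tag can no longer be found afterwards.

```python
import bs4

# Shortened stand-in for the question's markup (hypothetical content).
html = '<div class="d">Intro text. <p class="scroll-down">Full Description</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

tag = soup.find('p', class_='scroll-down')

# decompose() destroys the tag in place; unlike extract(), it returns None.
result = tag.decompose()

remaining = str(soup)
gone = soup.find('p', class_='scroll-down')  # no match left in the tree

print(result)
print(remaining)
```

Prefer .extract() when there is any chance you will need the tag again, and .decompose() when you only want it out of the tree.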