Using Python and BeautifulSoup, I want to remove an outer div tag but keep its contents.
Starting from this:
<div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>
I would end up with this:
<p>Paragraph 1</p>
<p>Paragraph 2</p>
I feel this should be simple, but I can't find a way to do it...
Answer 0 (score: 0)
You can use the unwrap() method, which replaces a tag with its contents, like this:
from bs4 import BeautifulSoup
html = """
<div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>"""
soup = BeautifulSoup(html, "html.parser")
soup.div.unwrap()
print(soup)
This will display:
<p>Paragraph 1</p>
<p>Paragraph 2</p>
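If the markup can contain nested divs, note that soup.div (like soup.find("div")) only returns the first div in document order, so only the outermost one is unwrapped. Below is a minimal sketch of that behaviour; the helper name strip_outer_div is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def strip_outer_div(html):
    """Hypothetical helper: unwrap the first <div> and return the remaining markup."""
    soup = BeautifulSoup(html, "html.parser")
    outer = soup.find("div")  # first <div> in document order
    if outer is not None:
        outer.unwrap()  # replace the tag with its own children
    return str(soup).strip()


result = strip_outer_div(
    "<div><p>Paragraph 1</p><div><p>Paragraph 2</p></div></div>"
)
print(result)
```

The inner div is left in place; only the wrapper around the whole fragment is removed.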
If you use the lxml parser instead, the code is:
from bs4 import BeautifulSoup
html = """
<div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>"""
soup = BeautifulSoup(html, "lxml")
soup.div.unwrap()
print(soup)
which gives you the following, because lxml wraps the fragment in <html> and <body> tags:
<html><body>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</body></html>
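If you want only the paragraphs back when parsing with lxml, one option is to serialize just the contents of the body tag with decode_contents(). The sketch below assumes lxml may or may not be installed, so it falls back to html.parser; the names root and fragment are only illustrative.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = """
<div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>"""

try:
    soup = BeautifulSoup(html, "lxml")
except FeatureNotFound:
    # lxml is not installed; fall back to the built-in parser
    soup = BeautifulSoup(html, "html.parser")

soup.div.unwrap()

# lxml adds <html><body>; html.parser does not, so fall back to the whole soup
root = soup.body if soup.body is not None else soup
fragment = root.decode_contents().strip()
print(fragment)
```

With either parser this prints just the two paragraphs, without the html and body wrapper.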