Remove a div tag but keep its contents using BeautifulSoup

Asked: 2018-03-08 15:34:37

Tags: python python-3.x beautifulsoup

Using Python / BeautifulSoup, I want to remove an outer div tag but keep its contents.

Starting from this:

<div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>

I would like to end up with this:

<p>Paragraph 1</p>
<p>Paragraph 2</p>

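I feel this should be simple, but I cannot find a way to do it...

1 Answer:

Answer 0: (score: 0)

You can use the unwrap() method, which replaces a tag with its own children: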

from bs4 import BeautifulSoup

html = """
<div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
soup.div.unwrap()
print(soup)

This will display:

<p>Paragraph 1</p>
<p>Paragraph 2</p>
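
If the markup contains more than one div wrapper (for example nested ones), the same call can be applied in a loop. The following is only a minimal sketch, assuming every div should be stripped; strip_divs is a hypothetical helper name, not something BeautifulSoup provides.

```python
from bs4 import BeautifulSoup


def strip_divs(html):
    """Return html with every <div> removed but its children kept.

    strip_divs is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all() collects the matches up front, so unwrapping
    # inside the loop does not disturb the iteration.
    for div in soup.find_all("div"):
        div.unwrap()
    return str(soup)


nested = "<div><div><p>Paragraph 1</p></div><p>Paragraph 2</p></div>"
print(strip_divs(nested))  # <p>Paragraph 1</p><p>Paragraph 2</p>
```

To remove only particular wrappers, narrow the find_all() call, for example by passing a class or id filter.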

If you use lxml as the parser instead:

from bs4 import BeautifulSoup

html = """
<div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</div>"""

soup = BeautifulSoup(html, "lxml")
soup.div.unwrap()
print(soup)

This gives you the following (lxml adds the html and body wrappers around the fragment):

<html><body>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
</body></html>
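
If you want the lxml result without the surrounding html and body tags, one option is to serialise only the children of body with decode_contents(). This is a rough sketch under that assumption; unwrap_outer_div and its parser argument are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup


def unwrap_outer_div(html, parser="lxml"):
    """Unwrap the first <div> and return the markup as a string.

    unwrap_outer_div is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, parser)
    if soup.div is not None:
        soup.div.unwrap()
    # lxml wraps fragments in <html><body>; html.parser does not,
    # so fall back to the whole soup when there is no <body>.
    root = soup.body if soup.body is not None else soup
    # decode_contents() renders the children without the tag itself.
    return root.decode_contents()
```

Calling unwrap_outer_div(html) on the string above should return just the two paragraphs, with whichever parser is installed.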