Question

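Going by the documentation on Beautiful Soup 4's unwrap method, I expected the code below to print Lorem ipsum dolor sit amet. Instead, it prints <p></p>. Isn't unwrap() supposed to "replace a tag with whatever's inside that tag" (quoting the docs)?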

#!/usr/bin/env python3
import bs4
markup = '<p>Lorem ipsum dolor sit amet</p>'
soup = bs4.BeautifulSoup(markup, "lxml")
p_tag = soup.p
p_tag.unwrap()
print(p_tag)

I think I have misunderstood the example in the documentation. I am using Python 3.7.3 and Beautiful Soup 4.7.1.

Answer 1

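The documentation says:

Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever's inside that tag.

So it replaces the tag in the soup with the contents of that tag.

Consider this example: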

import bs4
markup = '<other_tag><p>Lorem ipsum dolor sit amet</p></other_tag>'
soup = bs4.BeautifulSoup(markup, "lxml")
p_tag = soup.p
print(p_tag.parent)    # <other_tag><p>Lorem ipsum dolor sit amet</p></other_tag>
p_tag.unwrap()
print(p_tag)           # <p></p>
print(p_tag.parent)    # None
print(soup.other_tag)  # <other_tag>Lorem ipsum dolor sit amet</other_tag>

Using .unwrap(), we effectively removed the <p> tag from the soup and replaced it with the contents of that tag. The unwrapped tag's parent is now None, and the tag itself is empty: its contents have been moved elsewhere (into its former parent).
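
To get the output the question expected, print the soup (or its text) rather than the unwrapped tag. A minimal sketch of that, assuming the built-in "html.parser" instead of the "lxml" parser used in the question (so no html/body wrapper is added) and a hypothetical <div> wrapper around the paragraph:

```python
import bs4

markup = '<div><p>Lorem ipsum dolor sit amet</p></div>'
# Assumption: "html.parser" (standard library) stands in for "lxml" here.
soup = bs4.BeautifulSoup(markup, "html.parser")

# unwrap() returns the tag it removed from the tree.
removed = soup.p.unwrap()

print(removed)          # <p></p>  (detached and emptied)
print(soup)             # <div>Lorem ipsum dolor sit amet</div>
print(soup.get_text())  # Lorem ipsum dolor sit amet
```

The return value of unwrap() is the detached tag, which is why printing it shows an empty element; the text stays in the tree where the tag used to be.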
