Removing tags nested inside another tag with BeautifulSoup

Date: 2018-05-09 08:12:05

Tags: python beautifulsoup

How can I extract only the text of the outermost tag from markup like this:

<div><blockquote type="cite" class=""><p>Find me<\p>
<blockquote cite="mid:609415CB-0979-47C1-9A75-CE1BE65939A0@wiwacom.fr" type="cite" class=""><p>Not me<\p>
      <blockquote type="cite" class=""><p>Not me too<\p>
      </blockquote>
</blockquote>

I want to get:

Find me

using Python and BeautifulSoup.

1 answer:

Answer 0 (score: 2)

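You can chain .find calls to get the text you want. .find returns the first matching tag in document order, so the first <p> inside the <div> is the one that contains "Find me".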

Demo:

from bs4 import BeautifulSoup
s = """<div><blockquote type="cite" class=""><p>Find me</p>
<blockquote cite="mid:609415CB-0979-47C1-9A75-CE1BE65939A0@wiwacom.fr" type="cite" class=""><p>Not me</p>
      <blockquote type="cite" class=""><p>Not me too</p>
      </blockquote>
</blockquote></div>"""
soup = BeautifulSoup(s, "html.parser")
print(soup.find("div").find("p").text)

Output:

Find me

Note: your HTML contains invalid closing p tags: <\p> should be </p>.
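
If the goal is literally what the title asks (dropping the nested tags and keeping only what is left in the outer one), a possible alternative is to remove the inner blockquote elements with decompose() and then read the remaining text. This is only a sketch under some assumptions: the sample markup below uses corrected </p> tags and a closed outer blockquote, and the names html, outer, nested and text are illustrative, not from the question.

```python
from bs4 import BeautifulSoup

# Sample markup with valid closing tags (an assumption; the question's HTML used <\p>)
html = """<div><blockquote type="cite" class=""><p>Find me</p>
<blockquote type="cite" class=""><p>Not me</p>
      <blockquote type="cite" class=""><p>Not me too</p>
      </blockquote>
</blockquote>
</blockquote></div>"""

soup = BeautifulSoup(html, "html.parser")

# The first blockquote in document order is the outermost one
outer = soup.find("blockquote")

# Repeatedly remove the first remaining nested blockquote.
# Removing a tag also removes everything inside it, so we search
# again after each removal instead of iterating over a stale list.
nested = outer.find("blockquote")
while nested is not None:
    nested.decompose()
    nested = outer.find("blockquote")

# Only the outer blockquote's own content is left
text = outer.get_text(strip=True)
print(text)  # Find me
```

Unlike the chained .find approach, this modifies the parsed tree, so it is mainly useful when the outer tag can hold several paragraphs or other elements that should all be kept.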