Scraping only the text directly inside an H2 - BeautifulSoup

Posted: 2015-01-05 19:00:48

Tags: python python-3.x beautifulsoup

I have this markup:

<h2>
  Virtual Office packages
  <span>From</span><span class="cost">$74.97</span>
</h2>

and sometimes this:

<h2>Virtual Office packages</h2>

and this code:

service_header = service.select("h2")[0].string

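I only want to get "Virtual Office packages", not the price information.

This works for the second form, but when the HTML looks like the first one I get None, because the <h2> also contains the price <span> elements.

In short, how can I get "Virtual Office packages" no matter which of the two forms the HTML takes?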
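To see why .string fails on the first form, here is a minimal sketch, assuming bs4 with the built-in "html.parser" (the helper describe_h2 and the constants WITH_PRICE and PLAIN are hypothetical names). .string returns None as soon as the tag has more than one child, and get_text() is no substitute because it also collects the text of the nested <span> elements.

```python
from bs4 import BeautifulSoup

# The two forms of markup from the question.
WITH_PRICE = """
<h2>
  Virtual Office packages
  <span>From</span><span class="cost">$74.97</span>
</h2>"""
PLAIN = "<h2>Virtual Office packages</h2>"


def describe_h2(markup):
    """Return (.string, stripped .get_text()) for the first <h2> in markup."""
    h2 = BeautifulSoup(markup, "html.parser").find("h2")
    # .string is only set when the tag has exactly one child;
    # get_text() concatenates every string in the subtree, spans included.
    return h2.string, h2.get_text().strip()


print(describe_h2(PLAIN))       # both values are the heading text
print(describe_h2(WITH_PRICE))  # .string is None, get_text() includes the price
```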

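1 Answer:

Answer 0 (score: 1)

Both next_element and contents[0] return the first node inside the <h2>, which in both forms is the heading text itself; .strip() removes the surrounding whitespace in the multi-line form.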

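from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<h2>
  Virtual Office packages
  <span>From</span><span class="cost">$74.97</span>
</h2>""", "html.parser")

# next_element is the first node after the <h2> start tag: the leading text node
print(soup.find("h2").next_element.strip())
# Output: Virtual Office packages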


soup = BeautifulSoup("""
<h2>Virtual Office packages</h2>
""", "html.parser")

print(soup.find("h2").next_element)
# Output: Virtual Office packages
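next_element follows document order, so it returns the heading text only while that text is the very first thing inside the <h2>; if the heading opened with a tag, it would return that tag instead. A minimal sketch that restricts the search to the heading's direct children with find(string=True, recursive=False), assuming a bs4 version that accepts the string argument (the helper name h2_direct_text and the "<span>New</span>" markup are hypothetical):

```python
from bs4 import BeautifulSoup


def h2_direct_text(markup):
    """Return the first text node directly inside the first <h2>, stripped.

    recursive=False limits the search to the <h2>'s immediate children,
    so strings nested inside child tags such as <span> are never returned.
    """
    h2 = BeautifulSoup(markup, "html.parser").find("h2")
    if h2 is None:
        return None
    text = h2.find(string=True, recursive=False)
    return text.strip() if text is not None else None


# A hypothetical heading that starts with a tag: next_element and
# contents[0] would both return the <span>, this returns the text.
print(h2_direct_text("<h2><span>New</span> Virtual Office packages</h2>"))
```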

soup = BeautifulSoup("""
<h2>
  Virtual Office packages
  <span>From</span><span class="cost">$74.97</span>
</h2>""", "html.parser")

# contents[0] is the first child of the <h2>, again the leading text node
print(soup.find("h2").contents[0].strip())
# Output: Virtual Office packages

soup = BeautifulSoup("""
<h2>Virtual Office packages</h2>
""", "html.parser")

print(soup.find("h2").contents[0])
# Output: Virtual Office packages
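contents[0] has the same limitation as next_element, and both return only the first text node. Since the question's code runs service.select("h2") and may see several headings, here is a sketch that joins every non-blank text node directly inside each <h2>, skipping everything nested in child tags. The wrapper <div class="service"> and the helper own_text are hypothetical, and select() relies on the soupsieve package that recent bs4 versions install as a dependency.

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Join the non-blank text nodes that are direct children of tag.

    Text nested inside child tags (e.g. the price <span>s) is ignored.
    """
    parts = (s.strip() for s in tag.find_all(string=True, recursive=False))
    return " ".join(p for p in parts if p)


# Hypothetical page holding both forms of the heading from the question.
PAGE = """
<div class="service">
  <h2>
    Virtual Office packages
    <span>From</span><span class="cost">$74.97</span>
  </h2>
  <h2>Virtual Office packages</h2>
</div>"""

soup = BeautifulSoup(PAGE, "html.parser")
headers = [own_text(h2) for h2 in soup.select("h2")]
print(headers)  # ['Virtual Office packages', 'Virtual Office packages']
```

Joining with a single space keeps the result readable if the heading text is split around an inline tag, at the cost of also picking up any stray text placed after the price spans.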