Extracting a nested <span> in Python with BeautifulSoup

Time: 2019-07-14 03:34:22

Tags: python html beautifulsoup extract data-science

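I'm trying to extract the value inside a <span>, but that span has another <span> nested inside it. How can I get the value of just one span instead of both?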

from bs4 import BeautifulSoup

# page_soup is the BeautifulSoup object for the product page, parsed earlier
some_price = page_soup.find("div", {"class":"price_FHDfG large_3aP7Z"})
some_price.span

# that code returns this:

'''
<span>$289<span class="rightEndPrice_6y_hS">99</span></span>
'''

# BUT I only want the $289 part, not the 99 associated with it

After changing it to:

some_price.span.text

the interpreter returns:

$28999
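To see why .text gives $28999, here is a minimal, self-contained sketch that parses only the snippet shown above (standing in for the real page). .text joins every string inside the tag, including the strings of nested tags, with no separator. The .strings generator and get_text() with a separator expose the two pieces separately.

```python
from bs4 import BeautifulSoup

html = '<span>$289<span class="rightEndPrice_6y_hS">99</span></span>'
outer = BeautifulSoup(html, "html.parser").span  # the first (outer) span

# .text (same as get_text()) concatenates every string inside the tag,
# including the text of the nested <span>, with no separator.
print(outer.text)            # $28999

# .strings yields those same descendant strings one at a time.
print(list(outer.strings))   # ['$289', '99']

# A separator makes the boundary between the two strings visible.
print(outer.get_text("|"))   # $289|99
```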

Is there a way to remove the trailing "99"? Or to extract only the first part of the span?
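One way to "remove the trailing 99" is to delete the nested span before reading the text. This sketch, again parsing just the snippet from the question, uses Tag.decompose(). Note that it modifies the parsed tree in place, so anything read from that span afterwards no longer includes the cents.

```python
from bs4 import BeautifulSoup

html = '<span>$289<span class="rightEndPrice_6y_hS">99</span></span>'
outer = BeautifulSoup(html, "html.parser").span

# find() searches the outer span's descendants for the nested cents span.
inner = outer.find("span", class_="rightEndPrice_6y_hS")
if inner is not None:
    inner.decompose()  # removes the nested span (and its "99") from the tree

# Only the outer span's own text is left.
print(outer.text)  # $289
```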

Any help or suggestions would be greatly appreciated!

1 answer:

Answer 0 (score: 0)

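You can get the value you want from the span's .contents attribute. .contents is the list of the tag's direct children, so index 0 is the text node "$289" that comes before the nested span: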

from bs4 import BeautifulSoup as soup
html = '''
 <span>$289<span class="rightEndPrice_6y_hS">99</span></span>
'''
result = soup(html, 'html.parser').find('span').contents[0]

Output:

'$289'
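If the page's markup is spread over several lines, contents[0] can carry leading or trailing whitespace, and it is only the price as long as that text is the first child of the span. A minimal alternative sketch, assuming hypothetical pretty-printed markup: find(string=True, recursive=False) returns the first text node that is a direct child of the span, skipping any text inside the nested span.

```python
from bs4 import BeautifulSoup

# Hypothetical pretty-printed version of the snippet from the question
html = '''
<span>
  $289<span class="rightEndPrice_6y_hS">99</span>
</span>
'''
outer = BeautifulSoup(html, "html.parser").span

# .contents[0] still holds the price here, but with the surrounding whitespace.
print(repr(outer.contents[0]))

# First text node that is a *direct* child of the span
# (recursive=False means text inside the nested <span> is never considered).
first_text = outer.find(string=True, recursive=False)
print(first_text.strip())  # $289
```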

So, applied to your original div lookup:

result = page_soup.find("div", {"class":"price_FHDfG large_3aP7Z"}).span.contents[0]
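Here is a runnable sketch that combines this with the original div lookup. The page_html string is a hypothetical stand-in for the real page (only the class names come from the question), and treating the nested "99" as cents to build 289.99 is an assumption about how the site displays prices.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real product page
page_html = '''
<div class="price_FHDfG large_3aP7Z">
  <span>$289<span class="rightEndPrice_6y_hS">99</span></span>
</div>
'''
page_soup = BeautifulSoup(page_html, "html.parser")

# Same lookup as in the question: an exact match on the full class string
price_span = page_soup.find("div", {"class": "price_FHDfG large_3aP7Z"}).span

# Direct text child of the outer span: the dollar part
dollars = price_span.contents[0].strip()

# Text of the nested span: assumed to be the cents
cents = price_span.find("span", class_="rightEndPrice_6y_hS").get_text(strip=True)

# Assumption: the displayed price is dollars.cents
price = float(dollars.lstrip("$").replace(",", "") + "." + cents)
print(dollars, cents, price)  # $289 99 289.99
```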