Question

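I'm trying to extract the value in a span, but the span has another span embedded inside it. I'd like to know how to get the value of only one span instead of both.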

from bs4 import BeautifulSoup


some_price = page_soup.find("div", {"class":"price_FHDfG large_3aP7Z"})
some_price.span

# that code returns this:

'''
<span>$289<span class="rightEndPrice_6y_hS">99</span></span>
'''

# BUT I only want the $289 part, not the 99 associated with it

After making this adjustment:

some_price.span.text

the interpreter returns

$28999

Is there some way to drop the trailing "99"? Or to extract only the first part of the span?

Any help/suggestions would be greatly appreciated!

Answer 1

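You can get the value you want from the soup.contents attribute. It holds a tag's direct children as a list, so for the outer span the first item is the bare text that comes before the nested span: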

from bs4 import BeautifulSoup as soup
html = '''
 <span>$289<span class="rightEndPrice_6y_hS">99</span></span>
'''
result = soup(html, 'html.parser').find('span').contents[0]

Output:

'$289'
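To see why .text gave $28999 while contents[0] gives $289, it may help to look at both side by side. The snippet below is only a small sketch built on the same sample markup; the variable names (outer, full_text, dollars, cents) are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '<span>$289<span class="rightEndPrice_6y_hS">99</span></span>'
outer = BeautifulSoup(html, "html.parser").find("span")

# .text joins the text of every descendant, so the nested span is included.
full_text = outer.text

# .contents lists direct children only: a text node, then the inner span tag.
children = outer.contents
dollars = str(children[0])   # the text node before the inner span
cents = children[1].text     # the text inside the inner span

print(full_text, dollars, cents)
```

Wrapping children[0] in str() turns the NavigableString into a plain Python string, which is usually easier to pass around.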

So, in the context of the original div lookup:

result = page_soup.find("div", {"class":"price_FHDfG large_3aP7Z"}).span.contents[0]
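One caveat with the one-liner above: it raises an error if the div or span is missing, and contents[0] assumes the text comes first. A slightly more defensive variant might look like the sketch below. The helper name leading_text and the sample page string are assumptions made for this example, not part of the original page; find(string=True, recursive=False) asks only for text nodes that are direct children of the tag.

```python
from bs4 import BeautifulSoup

def leading_text(tag):
    """Return the first direct text node of tag, stripped, or None."""
    if tag is None:
        return None
    # recursive=False restricts the search to direct children,
    # so text inside the nested span is never picked up.
    node = tag.find(string=True, recursive=False)
    return node.strip() if node is not None else None

# Hypothetical stand-in for the real page.
page = '''
<div class="price_FHDfG large_3aP7Z">
  <span>$289<span class="rightEndPrice_6y_hS">99</span></span>
</div>
'''
page_soup = BeautifulSoup(page, "html.parser")

div = page_soup.find("div", {"class": "price_FHDfG large_3aP7Z"})
price = leading_text(div.span if div is not None else None)

# A lookup that matches nothing yields None instead of raising.
missing = leading_text(page_soup.find("div", {"class": "no_such_class"}))

print(price, missing)
```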

Extracting an embedded <span> in Python with BeautifulSoup
