I'm new to web scraping and want to extract the number inside the <strong>
element.
I'm using Python 3.8 and BeautifulSoup. The HTML looks like this:
<li class="price-current">
<span class="price-current-label">
</span>$<strong>409</strong><sup>.99</sup> <a class="price-current-num" href="https://www.newegg.com/gigabyte-radeon-rx-5700-xt-gv-r57xtgaming-oc-8gd/p/N82E16814932208?Item=N82E16814932208&buyingoptions=New">(5 Offers)</a>
<span class="price-current-range">
<abbr title="to">–</abbr>
</span>
</li>
Answer (score: 0)
To get the number between <strong>...</strong>,
you can use this example:
from bs4 import BeautifulSoup
txt = '''<li class="price-current">
<span class="price-current-label">
</span>$<strong>409</strong><sup>.99</sup> <a class="price-current-num" href="https://www.newegg.com/gigabyte-radeon-rx-5700-xt-gv-r57xtgaming-oc-8gd/p/N82E16814932208?Item=N82E16814932208&buyingoptions=New">(5 Offers)</a>
<span class="price-current-range">
<abbr title="to">–</abbr>
</span>
</li>'''
soup = BeautifulSoup(txt, 'html.parser')
# select_one() returns the first <strong> inside the .price-current element
print(soup.select_one('.price-current strong').text)
Prints:
409
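If you'd rather avoid regular expressions, you can also join the dollars in <strong> with the cents in <sup> directly. This is a minimal sketch that assumes each price-current element has a <strong> child and, optionally, a <sup> child. The HTML is trimmed, and the helper name `full_price` is hypothetical:

```python
from bs4 import BeautifulSoup

# Trimmed version of the HTML from the question (href shortened for brevity)
html = '''<li class="price-current">
<span class="price-current-label">
</span>$<strong>409</strong><sup>.99</sup> <a class="price-current-num" href="#">(5 Offers)</a>
</li>'''

def full_price(li):
    """Hypothetical helper: join the dollars in <strong> with the cents in <sup>."""
    dollars = li.strong.get_text(strip=True)               # e.g. '409'
    cents = li.sup.get_text(strip=True) if li.sup else ''  # e.g. '.99', or '' if absent
    return dollars + cents

soup = BeautifulSoup(html, 'html.parser')
li = soup.select_one('.price-current')
print(full_price(li))  # 409.99
```

This version depends on the tag structure rather than on the text layout, so it keeps working if the "$" sign moves or extra whitespace appears. It will break if Newegg renames or restructures those tags.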
To get the full price, including the cents after the decimal point,
you can use the re
module:
import re

# Search the text of the whole <li> for a dollar amount; "\." matches a literal dot
price = re.search(r'\$\d+(?:\.\d+)?', soup.select_one('.price-current').text)
if price:
    print(price.group())
Prints:
$409.99
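A real listing page usually has many price-current elements, and you often want the price as a number instead of a string. The sketch below loops over every match and converts each price to a `Decimal`. It assumes two things: the simplified HTML shown here, and that larger prices may contain a thousands separator such as "1,299". Check both against the actual page. The helper name `parse_prices` is hypothetical:

```python
import re
from decimal import Decimal

from bs4 import BeautifulSoup

# Two simplified listings (assumed structure, not copied from Newegg)
html = '''
<ul>
<li class="price-current">$<strong>409</strong><sup>.99</sup></li>
<li class="price-current">$<strong>1,299</strong><sup>.00</sup></li>
</ul>'''

# "$" followed by digits (commas allowed), then an optional ".cents" part
PRICE_RE = re.compile(r'\$(\d[\d,]*(?:\.\d+)?)')

def parse_prices(markup):
    """Hypothetical helper: return every .price-current price as a Decimal."""
    soup = BeautifulSoup(markup, 'html.parser')
    prices = []
    for li in soup.select('.price-current'):
        match = PRICE_RE.search(li.get_text())
        if match:
            # Drop thousands separators before converting
            prices.append(Decimal(match.group(1).replace(',', '')))
    return prices

print(parse_prices(html))  # [Decimal('409.99'), Decimal('1299.00')]
```

`Decimal` is used instead of `float` so that amounts such as 409.99 are stored exactly, which matters if you later add or compare prices.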