How do I extract the number between two elements? (web scraping)

Date: 2020-06-12 23:07:20

Tags: javascript python html beautifulsoup

I'm just getting started with web scraping, and I want to extract the number inside the strong element.

I'm using Python 3.8 and BeautifulSoup.

<li class="price-current">
    <span class="price-current-label">
    </span>$<strong>409</strong><sup>.99</sup> <a class="price-current-num" href="https://www.newegg.com/gigabyte-radeon-rx-5700-xt-gv-r57xtgaming-oc-8gd/p/N82E16814932208?Item=N82E16814932208&amp;buyingoptions=New">(5 Offers)</a>
    <span class="price-current-range">
        <abbr title="to">–</abbr>
    </span>
</li>

1 answer:

Answer 0 (score: 0):

To get the number between <strong>...</strong>, you can use this example:

from bs4 import BeautifulSoup

txt = '''<li class="price-current">
    <span class="price-current-label">
    </span>$<strong>409</strong><sup>.99</sup> <a class="price-current-num" href="https://www.newegg.com/gigabyte-radeon-rx-5700-xt-gv-r57xtgaming-oc-8gd/p/N82E16814932208?Item=N82E16814932208&amp;buyingoptions=New">(5 Offers)</a>
    <span class="price-current-range">
        <abbr title="to">–</abbr>
    </span>
</li>'''

soup = BeautifulSoup(txt, 'html.parser')

# select_one() returns the first element matching the CSS selector
print(soup.select_one('.price-current strong').text)

Prints:

409
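
On a real page the element may be missing, in which case select_one() returns None and .text raises an AttributeError. Below is a minimal sketch of a more defensive version. The helper name strong_number and the comma handling are assumptions added for illustration, not part of the original answer:

```python
from bs4 import BeautifulSoup


def strong_number(html):
    """Return the integer inside `.price-current strong`, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one('.price-current strong')
    if tag is None:
        # No matching element (e.g. the item has no listed price)
        return None
    # Drop surrounding whitespace and thousands separators such as "1,299"
    digits = tag.get_text(strip=True).replace(',', '')
    return int(digits) if digits.isdigit() else None


print(strong_number('<li class="price-current">$<strong>409</strong><sup>.99</sup></li>'))
print(strong_number('<li class="price-current">Out of stock</li>'))
```

The first call prints 409 and the second prints None instead of raising.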

To get the whole price (including the part after the decimal point), you can use the re module:

import re

# \. matches a literal decimal point; an unescaped . would match any character
price = re.search(r'\$\d+\.?\d*', soup.select_one('.price-current').text)
if price:
    print(price.group())

Prints:

$409.99
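
The regular expression returns the price as a string. If you need it as a number, for example to compare offers, one option is to capture the digits and convert them to a Decimal. This is only a sketch: the names PRICE_RE and parse_price are hypothetical, and it assumes prices are written as a dollar sign followed by digits, with optional thousands commas and an optional decimal part:

```python
import re
from decimal import Decimal

# "$", then digits (commas allowed), then an optional ".digits" part.
# The dot is escaped so it matches only a literal decimal point.
PRICE_RE = re.compile(r'\$(\d[\d,]*(?:\.\d+)?)')


def parse_price(text):
    """Return the first dollar amount in text as a Decimal, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Remove thousands separators before converting
    return Decimal(match.group(1).replace(',', ''))


print(parse_price('$409.99 (5 Offers)'))
print(parse_price('no price here'))
```

Decimal is used here rather than float so that amounts such as 409.99 are stored exactly. To use it with the answer's soup object, pass in the text of the .price-current element.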