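Trouble scraping price information from Newegg in Python

Asked: 2020-04-21 00:35:13

Tags: html python-3.x web-scraping beautifulsoup price

I am trying to use BeautifulSoup to get price information from Newegg, but with no luck. I tried the code below, hoping it would return the laptop's price, 1268.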

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.newegg.com/p/1XV-000E-00331?Description=MUHN2LL%2fA&cm_re=MUHN2LL%2fA-_-1XV-000E-00331-_-Product')
soup = BeautifulSoup(data.content, 'lxml')
price = soup.select_one('[itemprop=price]')['content']
print(price)

Can anyone help me get it to return 1268?

1 answer:

Answer 0 (score: 0)

The value you are after is loaded by JavaScript, so the bs4 and requests modules cannot render it.

But here is a solution.

Every product page contains a stable string, which is:

Compare offers from more sellers as low as $1,268.90 plus shipping

So we can apply a regex to that string, and you can apply the same approach to any other product page.

import requests
import re

params = {
    "Description": "MUHN2LL/A",
    "cm_re": "MUHN2LL/A-_-1XV-000E-00331-_-Product"
}


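def main(url):
    r = requests.get(url, params=params)
    # Capture the amount after "low as ... $"; the page may not contain it.
    match = re.search(r'low as.*?\$([\d,]+(?:\.\d+)?)', r.text)
    print(match.group(1) if match else "price not found")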


main("https://www.newegg.com/p/1XV-000E-00331")

Output:

1,268.90
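Since the question asks for a number rather than a string, here is a minimal offline sketch of the same regex idea applied to the sample string quoted above, with the result converted to a `Decimal`. The helper name `extract_low_price` is hypothetical, and the pattern assumes the page keeps the "low as $..." wording; a live page may differ.

```python
import re
from decimal import Decimal

# Sample text copied from the answer; a live page may word this differently.
SAMPLE = "Compare offers from more sellers as low as $1,268.90 plus shipping"


# Hypothetical helper name, used here for illustration only.
def extract_low_price(text):
    """Return the 'as low as' price as a Decimal, or None if it is absent."""
    found = re.search(r"low as.*?\$([\d,]+(?:\.\d+)?)", text)
    if found is None:
        return None
    # Drop the thousands separator before converting.
    return Decimal(found.group(1).replace(",", ""))


print(extract_low_price(SAMPLE))           # 1268.90
print(extract_low_price("no price here"))  # None
```

Returning `None` instead of calling `.group(1)` directly avoids an `AttributeError` when the string is missing from the page.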

There is also another, uglier idea: you can parse the encoded JSONP response.

Something like the following:

import requests
from bs4 import BeautifulSoup
import re

params1 = {
    "Description": "MUHN2LL/A",
    "cm_re": "MUHN2LL/A-_-1XV-000E-00331-_-Product"
}

params2 = {
    "FirstCall": "true",
    "PageNum": "1",
    "TabType": "0",
    "FilterBy": "",
    "SortBy": "0",
    "action": "Biz.Product.MoreBuyingOptions.JsonpCallBack"
}


def main(url):
    with requests.Session() as req:
        r = req.get(url, params=params1)
        soup = BeautifulSoup(r.content, 'html.parser')
        # The Ajax endpoint needs two input values read from the product page.
        params2['ParentItem'] = soup.find(
            "input", id="mboParentItemNumber").get("value")
        params2['MappingId'] = soup.find(
            "input", id="mboMappingId").get("value")
        r = req.get(
            "https://www.newegg.com/Common/Ajax/LoadMoreBuyingOption.aspx", params=params2)
        # Collect every (dollars, cents) pair in the response and keep the last one.
        match = [item.group(1, 2) for item in re.finditer(
            r'price-current-label.+?\>(\d.+?)<.+?p>(.+?)<', r.text)][-1]
        print(match)


main("https://www.newegg.com/p/1XV-000E-00331")

Output:

('1,268', '.90')
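The tuple above still has to be joined to get the 1268 the question asks for. A small sketch, assuming the two parts always arrive as a dollars string with thousands separators and a cents string with a leading dot; `join_price_parts` is a hypothetical name, not part of any library.

```python
from decimal import Decimal


# Hypothetical helper name, used here for illustration only.
def join_price_parts(parts):
    """Join parts such as ('1,268', '.90') into a single Decimal."""
    whole, cents = parts
    return Decimal(whole.replace(",", "") + cents)


price = join_price_parts(("1,268", ".90"))
print(price)       # 1268.90
print(int(price))  # 1268
```

`int()` truncates the cents, which matches the 1268 requested in the question.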