I'm trying to use BeautifulSoup to scrape price information from Newegg, but with no luck. I tried the code below, and I want it to return the laptop's price, 1268.
import requests
from bs4 import BeautifulSoup
data = requests.get('https://www.newegg.com/p/1XV-000E-00331?Description=MUHN2LL%2fA&cm_re=MUHN2LL%2fA-_-1XV-000E-00331-_-Product')
soup = BeautifulSoup(data.content, 'lxml')
price = soup.select_one('[itemprop=price]')['content']
print(price)
Can anyone help me get it to return 1268?
Answer 0 (score: 0)
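The element you are targeting is loaded by JavaScript, so the bs4 and requests modules cannot see it: they only fetch and parse the static HTML and do not execute JS.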
Here is a workaround, though.
Every product page contains a stable string of the form:
Compare offers from more sellers as low as $1,268.90 plus shipping
So we can apply a regex to that string, and you can apply the same approach to any other product page.
import requests
import re

# Query-string parameters taken from the original product URL
params = {
    "Description": "MUHN2LL/A",
    "cm_re": "MUHN2LL/A-_-1XV-000E-00331-_-Product"
}


def main(url):
    r = requests.get(url, params=params)
    # Capture the amount that follows "low as ... $" in the page text
    match = re.search(r'low as.+\$(.+\d)', r.text).group(1)
    print(match)


main("https://www.newegg.com/p/1XV-000E-00331")
Output:
1,268.90
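If you want a number rather than a string, and do not want a crash when the phrase is missing, something like the following might work. This is only a sketch: `extract_low_price` and `LOW_AS_PATTERN` are names I made up, and the pattern assumes the amount directly follows "low as $", as in the string quoted above. It runs on that sample string, not on a live page.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical helper, not part of the original answer.
# Assumption: the text contains "low as $<amount>", e.g. "low as $1,268.90".
LOW_AS_PATTERN = re.compile(r"low as\s*\$\s*(\d[\d,]*(?:\.\d+)?)")


def extract_low_price(text: str) -> Optional[Decimal]:
    """Return the 'as low as' price as a Decimal, or None if it is absent."""
    match = LOW_AS_PATTERN.search(text)
    if match is None:
        return None
    # Drop the thousands separators before converting to a number.
    return Decimal(match.group(1).replace(",", ""))


sample = "Compare offers from more sellers as low as $1,268.90 plus shipping"
print(extract_low_price(sample))  # prints 1268.90
```

The tighter character class stops at the end of the amount, so it does not depend on what else happens to be on the same line of HTML.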
There is another, uglier idea: you can parse the encoded JSONP response that the page requests. Something like the following:
import requests
from bs4 import BeautifulSoup
import re

# Query-string parameters for the product page
params1 = {
    "Description": "MUHN2LL/A",
    "cm_re": "MUHN2LL/A-_-1XV-000E-00331-_-Product"
}

# Parameters for the "more buying options" JSONP endpoint
params2 = {
    "FirstCall": "true",
    "PageNum": "1",
    "TabType": "0",
    "FilterBy": "",
    "SortBy": "0",
    "action": "Biz.Product.MoreBuyingOptions.JsonpCallBack"
}


def main(url):
    with requests.Session() as req:
        r = req.get(url, params=params1)
        soup = BeautifulSoup(r.content, 'html.parser')
        # The product page carries two hidden inputs that the endpoint needs
        params2['ParentItem'] = soup.find(
            "input", id="mboParentItemNumber").get("value")
        params2['MappingId'] = soup.find(
            "input", id="mboMappingId").get("value")
        r = req.get(
            "https://www.newegg.com/Common/Ajax/LoadMoreBuyingOption.aspx", params=params2)
        # Collect (whole, fraction) pairs and keep the last one
        match = [item.group(1, 2) for item in re.finditer(
            r'price-current-label.+?\>(\d.+?)<.+?p>(.+?)<', r.text)][-1]
        print(match)


main("https://www.newegg.com/p/1XV-000E-00331")
Output:
('1,268', '.90')
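Since the question asks for 1268, the tuple still needs to be joined and converted. A minimal sketch, assuming the tuple always has the shape shown above (a whole part with thousands separators and a fraction with a leading dot); `join_price_parts` is a hypothetical helper name, not something from Newegg or the libraries used here:

```python
from decimal import Decimal
from typing import Tuple


def join_price_parts(parts: Tuple[str, str]) -> Decimal:
    """Join a (whole, fraction) pair such as ('1,268', '.90') into a Decimal."""
    whole, fraction = parts
    # Remove thousands separators, then append the fraction (it keeps its dot).
    return Decimal(whole.replace(",", "") + fraction)


price = join_price_parts(("1,268", ".90"))
print(price)       # prints 1268.90
print(int(price))  # prints 1268
```

Decimal is used instead of float so the cents survive exactly; int() simply truncates the fractional part.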