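On Amazon, when a product has several sellers offering different prices, I can scrape the price shown on the product page, but not the other sellers' prices. Below the "Buy Now" and "Add to List" buttons there is a button that says "New (x) from"; clicking it shows all the other sellers. I want to scrape their prices, but when I use the XPath of their price, I get an error: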
from requests_html import HTMLSession

url = 'https://www.amazon.co.uk/Panini-Sticker-Collection-x50Packs/dp/B08V8CF748?ref_=Oct_DLandingS_D_7a870443_60&smid=A3P5ROKL5A1OLE'

def GetPrice(URL):
    s = HTMLSession()
    r = s.get(URL)  # fetch the URL passed in, not the global url
    product = {
        'price': r.html.xpath('//*[@id="aod-price-1"]/span/span[2]')
    }
    print(product)
    return product

GetPrice('https://www.amazon.co.uk/Colgate-Fresh-Cooling-Crystals-Toothpaste/dp/B073V1MB17/ref=sr_1_5_mod_primary_new?dchild=1&keywords=Toothpaste&qid=1625698678&rdc=1&sbo=RZvfv%2F%2FHxDF%2BO5021pAnSA%3D%3D&sr=8-5')
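Answer (score: 0)
To solve this, open your browser's developer tools (Network tab) and check which request is made when the event is triggered (here, clicking "New (x) from"), then replicate that request in your code. The other sellers' offers are not part of the product page's initial HTML; they are fetched afterwards from the /gp/aod/ajax/ endpoint, which is why the aod-price XPath does not match anything in the page that requests_html downloads.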
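The code below hardcodes the ASIN (B073V1MB17) and the qid/sr values taken from the Colgate URL. To send the same request for another product, such as the Panini URL in the question, those values can be pulled out of the product URL. This is a minimal sketch using only the standard library, assuming the endpoint path and parameter names from the answer still work and that the ASIN is the 10-character segment after /dp/; extract_asin, build_aod_url and AOD_ENDPOINT are hypothetical names, not part of any Amazon API.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint taken from the answer's code; it is undocumented and may change.
AOD_ENDPOINT = "https://www.amazon.co.uk/gp/aod/ajax/ref=dp_aod_NEW_mbc"


def extract_asin(product_url):
    """Return the 10-character ASIN after /dp/ in a product URL, or None."""
    match = re.search(r"/dp/([A-Z0-9]{10})", urlparse(product_url).path)
    return match.group(1) if match else None


def build_aod_url(product_url):
    """Build the offers URL for a product page, reusing the answer's parameters."""
    asin = extract_asin(product_url)
    if asin is None:
        raise ValueError("no ASIN found in " + product_url)
    # qid and sr come from the product URL's query string when present.
    query = parse_qs(urlparse(product_url).query)
    params = {
        "asin": asin,
        "m": "",
        "qid": query.get("qid", [""])[0],
        "smid": "",
        "sourcecustomerorglistid": "",
        "sourcecustomerorglistitemid": "",
        "sr": query.get("sr", [""])[0],
        "pc": "dp",
    }
    return AOD_ENDPOINT + "?" + urlencode(params)
```

With the question's Panini URL, build_aod_url(url) requests the offers for ASIN B08V8CF748, leaving qid and sr empty because that URL has none. The resulting URL can be fetched with s.get(..., headers=headers) instead of passing the fixed params tuple.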
Code
import requests
from lxml import html
# Headers copied from the browser's AJAX request (DevTools > Network)
headers = {
'authority': 'www.amazon.co.uk',
'pragma': 'no-cache',
'cache-control': 'no-cache',
'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
'rtt': '100',
'sec-ch-ua-mobile': '?0',
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0',
'accept': 'text/html,*/*',
'x-requested-with': 'XMLHttpRequest',
'downlink': '8.4',
'ect': '4g',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.amazon.co.uk/Colgate-Fresh-Cooling-Crystals-Toothpaste/dp/B073V1MB17/ref=sr_1_5_mod_primary_new?dchild=1&keywords=Toothpaste&qid=1625698678&rdc=1&sbo=RZvfv%2F%2FHxDF%2BO5021pAnSA%3D%3D&sr=8-5',
'accept-language': 'en-US,en;q=0.9',
    # Cookies from a browser session; they are session-specific, so use your own
    'cookie': 'session-id=260-4106472-8409244; i18n-prefs=GBP; ubid-acbuk=260-0481762-4830301; session-token="jc9/khgoELjvvnVyfyUE0zuV+IqwaxQgEelGbV4ihI0VtbHOyZRfQgTpdo7j85y9QuCH+19fCvnLDgNhdjtSrMCWh4U1Pct/A53U0ylVSUCMLNa4HHZqV6q/VBo8EIf0KSIkY47ClNUgwWLkZxzHkm5GWvqvqYBBl7wXIR9zKxY9x0WhN1KrWagXd8Ud062lFMG+ThXyKi0JTHk2K14qmEbPRjE2tmDCZbANgBgXvq4GAXYK/qamSGtiwHIL88aOcKL+4xjmV0o="; csm-hit=adb:adblk_yes&t:1625843678136&tb:s-HPFR45XZN4E5FMK5EH5M|1625843675264; session-id-time=2082758401l',
}
# Query parameters of the same AJAX request; asin is the product ID from the /dp/ URL
params = (
('asin', 'B073V1MB17'),
('m', ''),
('qid', '1625698678'),
('smid', ''),
('sourcecustomerorglistid', ''),
('sourcecustomerorglistitemid', ''),
('sr', '8-5'),
('pc', 'dp'),
)
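s = requests.Session()
# Request the offers fragment that the page loads via AJAX when
# "New (x) from" is clicked
response = s.get('https://www.amazon.co.uk/gp/aod/ajax/ref=dp_aod_NEW_mbc', headers=headers, params=params)
tree = html.fromstring(response.content)
# Prices appear as text in spans with the a-offscreen class
prices = tree.xpath('//span[contains(@class,"a-offscreen")]')
for price in prices[1:]:  # skip the first match
    print(price.text)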
Output
£1.87
£2.04
£1.44
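The printed values are strings, so comparing offers (for example, finding the cheapest) needs one more parsing step. The sketch below uses only the standard library instead of lxml, assuming, as the answer's XPath does, that each price is the plain text of a span with the a-offscreen class and is formatted like £1.87. OffscreenPriceParser and to_decimal are hypothetical names, and the HTML sample is made up for illustration, not real Amazon markup.

```python
from decimal import Decimal
from html.parser import HTMLParser


class OffscreenPriceParser(HTMLParser):
    """Collect the text of every <span> whose class list contains 'a-offscreen'.

    Rough stdlib counterpart of the XPath //span[contains(@class,"a-offscreen")];
    unlike contains(), it matches the class token exactly. Assumes the price
    span has no nested spans.
    """

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "a-offscreen" in classes:
            self._in_price = True
            self.prices.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        # Append text only while inside an a-offscreen span.
        if self._in_price:
            self.prices[-1] += data


def to_decimal(price_text):
    """Turn '£1.87' or '£1,234.50' into a Decimal (assumes GBP formatting)."""
    return Decimal(price_text.strip().lstrip("£").replace(",", ""))


# Hypothetical sample HTML, for illustration only.
sample = ('<span class="a-price"><span class="a-offscreen">£1.87</span></span>'
          '<span class="a-offscreen">£2.04</span>')
parser = OffscreenPriceParser()
parser.feed(sample)
parser.close()
print(min(to_decimal(p) for p in parser.prices))  # prints 1.87
```

Decimal avoids the rounding surprises of float for money. With a live response, feed response.text rather than response.content, because HTMLParser.feed expects a str, not bytes.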