Question

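I am trying to learn Scrapy. I want to replicate the following POST request in Scrapy, but I have not succeeded. I also tried scrapy.Request(method='POST'), but that did not work either.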

import requests, json

headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'content-length': '132',
    'content-type': 'application/x-www-form-urlencoded',
    'origin': 'https://www.autozone.com',
    'referer': 'https://www.autozone.com/miscellaneous-non-automotive/jump-starter',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest'
}

url = 'https://www.autozone.com/rest/bean/autozone/diy/commerce/pricing/PricingServices/retrievePriceAndAvailability?atg-rest-depth=2'

data = {
    'arg1': '9801',
    'arg2': '',
    'arg3': '824997',
    'arg4': ''
}

response = requests.post(url, headers=headers, data=data, timeout=5)

info = json.loads(response.text)
print(info['atgResponse'][0]['retailPrice']) # prints 129.99

Scrapy shell:

> r = scrapy.FormRequest(url, formdata=data, headers=headers)
> fetch(r) # Doesn't work

Can anyone point out where I am going wrong?

Edit 1:

Here is the stack trace from Scrapy. I hope it helps.

>>> fetch(r)
2020-02-15 15:00:08 [scrapy.core.engine] INFO: Spider opened
2020-02-15 15:03:08 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST https://www.autozone.com/rest/bean/autozone/diy/commerce/pricing/PricingServices/retrievePriceAndAvailability?atg-rest-depth=2> (failed 1 times): User timeout caused connection failure: Getting https://www.autozone.com/rest/bean/autozone/diy/commerce/pricing/PricingServices/retrievePriceAndAvailability?atg-rest-depth=2 took longer than 180.0 seconds..

It retries a few times and then fails.

Thanks.

Answer 1

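I tried to access your link, but it returned this error: Access to the requested resource is not allowed: /autozone/diy/commerce/pricing/PricingServices. I therefore suspect that the request needs an Authorization header or a session cookie, which you have neither provided nor marked with a placeholder. The absence of these could be what causes the timeout.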
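
To make that suggestion concrete, here is a minimal sketch of how the headers could be prepared before building the Scrapy request. It uses only the standard library, so it runs without Scrapy installed. Two things in it are assumptions and are not confirmed anywhere in this thread: the cookie value is a hypothetical placeholder (the real session cookie or Authorization value is unknown), and dropping the hard-coded content-length is only a hypothesis worth testing. That header is removed because the declared length (132) does not match the form body actually being sent, and a server that is told to expect more bytes than it receives may simply keep waiting. The helper name clean_headers is made up for illustration.

```python
from urllib.parse import urlencode

# Form fields copied from the question.
data = {"arg1": "9801", "arg2": "", "arg3": "824997", "arg4": ""}

# A subset of the question's headers, including the hard-coded length.
headers = {
    "content-type": "application/x-www-form-urlencoded",
    "content-length": "132",
    "x-requested-with": "XMLHttpRequest",
}


def clean_headers(headers, extra=None):
    """Return a copy without content-length, merged with optional extras.

    Hypothetical helper, not part of Scrapy or requests.
    """
    cleaned = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    cleaned.update(extra or {})
    return cleaned


# Encode the form the same way an application/x-www-form-urlencoded
# body is normally encoded, then compare declared and actual sizes.
body = urlencode(data)
actual_length = len(body.encode("utf-8"))
declared_length = int(headers["content-length"])

# Placeholder only: substitute whatever cookie or Authorization value
# the site actually issues (unknown here).
session_headers = {"cookie": "<session cookie copied from the browser>"}
request_headers = clean_headers(headers, session_headers)

print(body)
print(f"declared={declared_length} actual={actual_length}")
print(sorted(request_headers))
```

The resulting request_headers and the same data dict could then be passed to scrapy.FormRequest(url, formdata=data, headers=request_headers), as in the shell session from the question. While experimenting, setting meta={'download_timeout': 5} on the request should make a hanging request fail after a few seconds instead of the default 180.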

Requests module works, but FormRequest does not
