I'm scraping a site that requires authenticating through a GraphQL API. The code has stopped working and now produces this error:
2019-07-02 14:40:06 [matchScrapDetail] ERROR: <twisted.python.failure.Failure scrapy.spidermiddlewares.httperror.HttpError: Ignoring non-200 response>
2019-07-02 14:40:06 [matchScrapDetail] ERROR: HttpError on https://www.my_site.com/graphql
2019-07-02 14:40:06 [scrapy.core.engine] INFO: Closing spider (finished)
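The log above only says that the response was non-200, not which status or why. A minimal sketch of a helper the errback could call to surface the status code and the start of the response body follows. It is duck-typed so it runs without Scrapy installed. It assumes only that the failure carries `failure.value.response`, which is the attribute Scrapy's `HttpError` exposes, and that the response is text. The name `describe_http_error` and the stand-in values (status 400, the error body) are hypothetical and do not come from the real site.

```python
from types import SimpleNamespace


def describe_http_error(failure):
    """Return a one-line summary of the response behind a failed request.

    Hypothetical helper: it assumes `failure.value.response` exists
    (as on Scrapy's HttpError) and that the response has a `.text` body.
    """
    response = getattr(failure.value, "response", None)
    if response is None:
        # Timeouts, DNS errors and similar failures carry no response.
        return "no response attached: %r" % (failure.value,)
    # Keep only the first 500 characters; GraphQL servers often put
    # the reason for a rejection in the JSON body.
    return "HTTP %s on %s: %s" % (response.status, response.url, response.text[:500])


# Stand-in objects with made-up values, so the helper can be tried offline.
fake_response = SimpleNamespace(
    status=400,
    url="https://www.my_site.com/graphql",
    text='{"errors": [{"message": "example"}]}',
)
fake_failure = SimpleNamespace(value=SimpleNamespace(response=fake_response))
print(describe_http_error(fake_failure))
```

Inside the spider, the errback could log `describe_http_error(failure)` instead of only the URL, which would show whether the server answers with 400, 401, 403, or something else.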
When I inspect the login in the browser, I don't see any changes.
The request headers look the same:
:authority: www.my_site.com
:method: POST
:path: /graphql
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
content-length: 416
content-type: application/json
The payload looks the same too:
REQUEST PAYLOAD
{operationName: "Counts", variables: {},…}
operationName: "Counts"
query: "query Counts {↵ counts {↵ userConnectionCounts {↵ connectionType↵ totalCount↵ newCount↵ __typename↵ }↵ userInteractionCounts {↵ interactionType↵ newCount↵ __typename↵ }↵ subscriptionBenefitsCounts {↵ newFeatureCount↵ newBenefitGrantCount↵ __typename↵ }↵ __typename↵ }↵}↵"
variables: {}
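Long query strings are hard to compare by eye, so it may be worth diffing the browser query against the one the spider sends (quoted further down in this post). The sketch below is deliberately naive: it treats every identifier after the first `{` as a field name, which is enough for a query like this one with no arguments, aliases, or fragments. The helper name `selection_names` is hypothetical. Whether any difference it reports explains the non-200 response is not established here.

```python
import re


def selection_names(query):
    """Collect every identifier that appears inside a GraphQL query body."""
    # Skip the leading "query Counts" so the operation keyword and
    # operation name are not counted as fields.
    body = query[query.index("{"):]
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", body))


# Query as shown in the browser (the ↵ marks written as \n).
browser_query = (
    "query Counts {\n counts {\n userConnectionCounts {\n connectionType\n"
    " totalCount\n newCount\n __typename\n }\n userInteractionCounts {\n"
    " interactionType\n newCount\n __typename\n }\n"
    " subscriptionBenefitsCounts {\n newFeatureCount\n"
    " newBenefitGrantCount\n __typename\n }\n __typename\n }\n}\n"
)

# Query as sent by the spider.
sent_query = (
    "query Counts {\n counts {\n userConnectionCounts {\n connectionType\n"
    " totalCount\n newCount\n __typename\n }\n userInteractionCounts {\n"
    " interactionType\n newCount\n __typename\n }\n __typename\n }\n}\n"
)

only_in_browser = sorted(selection_names(browser_query) - selection_names(sent_query))
only_in_spider = sorted(selection_names(sent_query) - selection_names(browser_query))
print("only in browser:", only_in_browser)
print("only in spider:", only_in_spider)
```

For the two queries in this post, the browser version contains `subscriptionBenefitsCounts`, `newFeatureCount`, and `newBenefitGrantCount`, which the spider's query does not request.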
This is the payload I send:
{"operationName": "Counts",
"variables": {},
"query": "query Counts {\n counts {\n userConnectionCounts {\n connectionType\n totalCount\n newCount\n __typename\n }\n userInteractionCounts {\n interactionType\n newCount\n __typename\n }\n __typename\n }\n}\n"}
And the request:
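# Store the token so it is sent as a cookie with the GraphQL request.
self.tokenCookieData['authtoken'] = authtoken
return Request(url="https://www.my_site.com/graphql",
               method="POST",
               body=json.dumps(my_data),
               cookies=self.tokenCookieData,
               headers=headersData,
               errback=self.errback_httpbin,
               # Only 302 is passed to the callback; any other non-2xx
               # status reaches the errback as an HttpError.
               meta={'dont_redirect': True,
                     'handle_httpstatus_list': [302]},
               callback=self.afterToken,
               dont_filter=True
               )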
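To check whether the problem lies in the request itself rather than in Scrapy, the same POST could be reproduced with the standard library. This is only a sketch under assumptions: `build_graphql_request` is a hypothetical helper, it sends `authtoken` as the only cookie (the real `tokenCookieData` may hold more), and the token value below is a placeholder. The block builds the request without sending it.

```python
import json
import urllib.request


def build_graphql_request(url, payload, authtoken, extra_headers=None):
    """Build (but do not send) a JSON POST that mirrors the spider's request."""
    headers = {"Content-Type": "application/json", "Accept": "*/*"}
    headers.update(extra_headers or {})
    # Assumption: the auth token travels as a single `authtoken` cookie.
    headers["Cookie"] = "authtoken=%s" % authtoken
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = {
    "operationName": "Counts",
    "variables": {},
    "query": "query Counts {\n counts {\n __typename\n }\n}\n",  # shortened for the example
}
req = build_graphql_request("https://www.my_site.com/graphql", payload, "PLACEHOLDER_TOKEN")
print(req.get_method(), req.full_url)
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would send it. On a 4xx or 5xx answer that call raises `urllib.error.HTTPError`, whose `code` attribute and `read()` method give the status and the body.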