HttpError when scraping a GraphQL endpoint with Scrapy

Posted: 2019-07-02 14:59:20

Tags: api web-scraping scrapy graphql

I am scraping a site that requires authentication through its GraphQL API. The code used to work, but it has stopped and now fails with this error:

2019-07-02 14:40:06 [matchScrapDetail] ERROR: <twisted.python.failure.Failure scrapy.spidermiddlewares.httperror.HttpError: Ignoring non-200 response>
2019-07-02 14:40:06 [matchScrapDetail] ERROR: HttpError on https://www.my_site.com/graphql
2019-07-02 14:40:06 [scrapy.core.engine] INFO: Closing spider (finished)
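For context on the error above: Scrapy's `HttpErrorMiddleware` silently drops any response with a status outside the 200 range before it reaches the callback, which is what produces the generic "Ignoring non-200 response" failure. One way to see what the server actually returned is to allowlist the suspect statuses in the request meta. A minimal sketch, assuming the server is now rejecting authentication with 401/403 (the exact codes are an assumption to verify):

```python
# meta dict for a scrapy.Request; 302 was already allowlisted in the
# question's code, and 401/403 are added here as assumed auth-failure
# statuses so the callback can inspect the rejected response body.
meta = {
    "dont_redirect": True,
    "handle_httpstatus_list": [302, 401, 403],
}
```

With those statuses allowlisted, the response reaches the callback instead of the errback, so its status and body can be logged directly.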

When I inspect the login in the browser, I don't see anything that has changed:

The request headers look the same:

:authority: www.my_site.com
:method: POST
:path: /graphql
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
content-length: 416
content-type: application/json

And the payload looks the same:

REQUEST PAYLOAD

operationName: "Counts"
query: "query Counts {↵  counts {↵    userConnectionCounts {↵      connectionType↵      totalCount↵      newCount↵      __typename↵    }↵    userInteractionCounts {↵      interactionType↵      newCount↵      __typename↵    }↵    subscriptionBenefitsCounts {↵      newFeatureCount↵      newBenefitGrantCount↵      __typename↵    }↵    __typename↵  }↵}↵"
variables: {}

This is the payload I send:

{
    "operationName": "Counts",
    "variables": {},
    "query": "query Counts {\n  counts {\n    userConnectionCounts {\n      connectionType\n      totalCount\n      newCount\n      __typename\n    }\n    userInteractionCounts {\n      interactionType\n      newCount\n      __typename\n    }\n    __typename\n  }\n}\n"
}
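Incidentally, the two payloads are not quite identical: the browser's query also selects `subscriptionBenefitsCounts`, which the spider's query omits (whether that matters for authentication is something to verify). A minimal sketch of building the spider's payload as a plain Python dict and serializing it, so the embedded query text is quoted and escaped correctly (`my_data` is the name used in the request code below):

```python
import json

# Reconstruction of the payload from the question as a Python dict;
# json.dumps handles quoting/escaping of the embedded GraphQL query.
my_data = {
    "operationName": "Counts",
    "variables": {},
    "query": (
        "query Counts {\n  counts {\n"
        "    userConnectionCounts {\n      connectionType\n"
        "      totalCount\n      newCount\n      __typename\n    }\n"
        "    userInteractionCounts {\n      interactionType\n"
        "      newCount\n      __typename\n    }\n"
        "    __typename\n  }\n}\n"
    ),
}

body = json.dumps(my_data)
```

Sending `json.dumps(my_data)` as the body, together with the `content-type: application/json` header shown above, matches what the browser transmits.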

And the request:

self.tokenCookieData['authtoken'] = authtoken
return Request(url="https://www.my_site.com/graphql",
               method="POST",
               body=json.dumps(my_data),
               cookies=self.tokenCookieData,
               headers=headersData,
               errback=self.errback_httpbin,
               meta={'dont_redirect': True,
                     'handle_httpstatus_list': [302]},
               callback=self.afterToken,
               dont_filter=True)
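Since the errback above only reports the generic `HttpError` message, it helps to pull the real status code out of the failure. In Scrapy, the Twisted `Failure` that wraps `scrapy.spidermiddlewares.httperror.HttpError` carries the offending response as `failure.value.response`. A minimal, framework-free sketch of the errback logic (the helper name and message format are assumptions, not part of the question's code):

```python
def log_http_error(failure):
    # The Failure's wrapped exception carries the non-200 Response
    # (scrapy sets HttpError.response); fall back gracefully for
    # other failure types such as DNS or connection errors.
    response = getattr(failure.value, "response", None)
    if response is None:
        return "non-HTTP failure: %r" % failure.value
    return "HttpError %s on %s" % (response.status, response.url)
```

Inside a spider's errback this would be logged with `self.logger.error(...)`; seeing whether the status is 401, 403, or something else narrows down whether the auth token or something else broke.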

0 Answers:

No answers yet.