XHR request fails in Scrapy but works with python-requests

Time: 2016-06-24 13:17:54

Tags: python xmlhttprequest scrapy httprequest python-requests

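I am trying to fetch data from a site that loads it via Ajax. I reproduce the XHR request using only its headers and body, and I get a 400 response telling me the request is not allowed. Here is my code: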

from scrapy import Spider
from scrapy import Request, FormRequest
import json

class jsonSpider(Spider):
    name = 'json'

    start_urls = [
        'http://m.ctrip.com/restapi/soa2/10932/hotel/Product/domestichotelget']

    def start_requests(self):
        headers = {
            "Host": "m.ctrip.com",
            "User-Agent": "Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16",
            "Accept": "application/json",
            "Accept-Language": "en-US,en;q=0.5",
            "Accept-Encoding": "gzip, deflate",
            "Content-Type": "application/json",
            "cookieOrigin": "http://wap.ctrip.com",
            "Cache-Control": "no-cache",
            "Referer": "http://wap.ctrip.com/webapp/hotel/hoteldetail/426638.html?days=1&atime=20160623&contrl=2&num=1&biz=1",
            "Content-Length": "455",
            "Origin": "http://wap.ctrip.com",
            "Connection": "keep-alive"}
        data = '{"biz":1,"contrl":3,"facility":0,"faclist":[],"key":"","keytp":0,"pay":0,"querys":[],"couponlist":[],"setInfo":{"cityId":2,"dstId":0,"inDay":"2016-06-24","outDay":"2016-06-25"},"sort":{"dir":1,"idx":70,"ordby":0,"size":100},"qbitmap":0,"alliance":{"ishybrid":0},"head":{"ctok":"","cver":"1.0","lang":"01","sid":"8888","syscode":"09","auth":null,"extension":[{"name":"pageid","value":"212093"},{"name":"webp","value":0},{"name":"protocal","value":"http"}]},"contentType":"json"}'
        for url in self.start_urls:
            yield Request(
                url,
                self.parse,
                method='POST',
                headers=headers,
                body=data
            )

    def parse(self, response):
        page = response.body
        print(page)

But when I simulate the same XHR with python-requests, it works fine and I get the JSON response. Here is my code using requests:

import requests

url = 'http://m.ctrip.com/restapi/soa2/10932/hotel/Product/domestichotelget'
headers = {
    "Host": "m.ctrip.com",
    "User-Agent": "Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16",
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Content-Type": "application/json",
    "cookieOrigin": "http://wap.ctrip.com",
    "Cache-Control": "no-cache",
    "Referer": "http://wap.ctrip.com/webapp/hotel/hoteldetail/426638.html?days=1&atime=20160623&contrl=2&num=1&biz=1",
    "Content-Length": "455",
    "Origin": "http://wap.ctrip.com",
    "Connection": "keep-alive"}
body = '{"biz":1,"contrl":3,"facility":0,"faclist":[],"key":"","keytp":0,"pay":0,"querys":[],"couponlist":[],"setInfo":{"cityId":2,"dstId":0,"inDay":"2016-06-24","outDay":"2016-06-25"},"sort":{"dir":1,"idx":70,"ordby":0,"size":100},"qbitmap":0,"alliance":{"ishybrid":0},"head":{"ctok":"","cver":"1.0","lang":"01","sid":"8888","syscode":"09","auth":null,"extension":[{"name":"pageid","value":"212093"},{"name":"webp","value":0},{"name":"protocal","value":"http"}]},"contentType":"json"}'


response = requests.post(url, headers=headers, data=body).content
print(response)

What is wrong with my Scrapy code?

2 answers:

Answer 0: (score: 2)

This should work for you; the following code returns a 200 response:

import json

from scrapy import Request, Spider


class jsonSpider(Spider):
    name = 'json_spider'

    start_urls = [
        'http://m.ctrip.com/restapi/soa2/10932/hotel/Product/domestichotelget']

    def start_requests(self):
        # Reduced header set: no hard-coded Content-Length or Host.
        headers = {
            "Accept": "application/json",
            "Accept-Language": "en-US,en;q=0.5",
            "Accept-Encoding": "gzip, deflate",
            "Connection": "keep-alive"}
        # Payload as a Python dict (note None instead of JSON null).
        data = {"biz":1,"contrl":3,"facility":0,"faclist":[],"key":"","keytp":0,"pay":0,"querys":[],"couponlist":[],"setInfo":{"cityId":2,"dstId":0,"inDay":"2016-06-24","outDay":"2016-06-25"},"sort":{"dir":1,"idx":70,"ordby":0,"size":100},"qbitmap":0,"alliance":{"ishybrid":0},"head":{"ctok":"","cver":"1.0","lang":"01","sid":"8888","syscode":"09","auth":None,"extension":[{"name":"pageid","value":"212093"},{"name":"webp","value":0},{"name":"protocal","value":"http"}]},"contentType":"json"}
        for url in self.start_urls:
            yield Request(
                url,
                self.parse,
                method='POST',
                headers=headers,
                # Serialize the dict to a JSON string for the request body.
                body=json.dumps(data)
            )

    def parse(self, response):
        page = response.body
        print(page)
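
The spider above sends far fewer headers than the question's version. When headers are copied out of a browser's network panel, one way to get to such a reduced set is to drop the entries that an HTTP client normally derives from the request itself. The following is only a sketch: `strip_client_managed` and `CLIENT_MANAGED` are hypothetical names, not part of Scrapy, and which headers to drop is an assumption.

```python
# Hypothetical helper (not a Scrapy API): drop headers that the HTTP client
# usually fills in on its own, so stale copied values cannot conflict.
CLIENT_MANAGED = {"content-length", "host"}


def strip_client_managed(headers):
    """Return a copy of `headers` without client-managed entries.

    Header names are compared case-insensitively.
    """
    return {k: v for k, v in headers.items() if k.lower() not in CLIENT_MANAGED}


# A few of the headers from the question, as copied from the browser.
copied = {
    "Host": "m.ctrip.com",
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Content-Length": "455",
}

cleaned = strip_client_managed(copied)
print(cleaned)
```

The resulting dict could then be passed as `headers=cleaned` to `Request`.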

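Answer 1: (score: 2)

Remove "Content-Length": "455" from the headers and let Scrapy compute it. Your data is 477 bytes long, so my guess is that the server reads only the first 455 bytes of the incoming data, fails to parse them as JSON because the document is incomplete, and returns 400, which means Bad Request.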
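
The length mismatch can be checked locally without touching the network. The sketch below measures the body string from the question and then parses only its first 455 bytes. It assumes, as this answer guesses, that the server stops reading after the declared Content-Length; the real server behavior is not known.

```python
import json

# Request body exactly as it appears in the question.
body = '{"biz":1,"contrl":3,"facility":0,"faclist":[],"key":"","keytp":0,"pay":0,"querys":[],"couponlist":[],"setInfo":{"cityId":2,"dstId":0,"inDay":"2016-06-24","outDay":"2016-06-25"},"sort":{"dir":1,"idx":70,"ordby":0,"size":100},"qbitmap":0,"alliance":{"ishybrid":0},"head":{"ctok":"","cver":"1.0","lang":"01","sid":"8888","syscode":"09","auth":null,"extension":[{"name":"pageid","value":"212093"},{"name":"webp","value":0},{"name":"protocal","value":"http"}]},"contentType":"json"}'

declared_length = 455  # value hard-coded in the question's headers
raw = body.encode("utf-8")
actual_length = len(raw)
print(declared_length, actual_length)

# Assumption: the server reads only `declared_length` bytes of the body.
truncated = raw[:declared_length].decode("utf-8")
try:
    json.loads(truncated)
    truncated_is_valid = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    truncated_is_valid = False
print(truncated_is_valid)
```

If the truncated text fails to parse while the full body parses, that is consistent with the explanation above, though it does not prove what the server actually does.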