How do I identify the key information that needs to be sent with a request?

Time: 2015-07-23 13:41:38

Tags: web-scraping scrapy session-cookies scrapy-spider

I want to scrape some fares from this website, which uses an autocomplete request.

Here is my code:

import scrapy
from scrapy.http import Request, FormRequest
import urllib

class CabforceSpider(scrapy.Spider):
    name = 'cabforce'
    start_urls = ['https://www.cabforce.com']
    complete_url = 'https://www.cabforce.com/v1/geo/autocomplete'

    def parse(self, response):
        payload = {
            'chnl': 'cforce',
            'complete': 'Barcelona Airport',
            'destination': 'Barcelona'
        }
        return Request(
            self.complete_url,
            self.print_json,
            method='POST',
            body=urllib.urlencode(payload),
            headers={'X-Requested-With': 'XMLHttpRequest'})

    def print_json(self, response):
        print response.body

Unfortunately, the response I get is:

{"status":"ArgumentError","reason":"Cannot validate input","description":null,"reasonType":2000,"details":[]}

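How can I find out which information is missing and needs to be sent with the request? I thought of JSESSIONID and the version, but I can't figure out how to do this. Thanks for any hints, and have a nice day!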

2 answers:

Answer 0: (score: 1)

You don't even need to send cookies with your request. The problem is in this line:

body=urllib.urlencode(payload),

This URL-encodes the body, but if you look at the body of the request your browser sends, you will see that it is JSON.

So the fix is to import json and change the line mentioned above to this one:

body=json.dumps(payload),

With that change, my spider gets the following result:

{"status":"Ok","result":{"autocomplete":{"elements":[{"type":16,"description":"(BCN) - Barcelona Airport, Barcelona, Spain","location":{"lat":41.289545,"lng":2.072639},"raw":{"name":"(BCN) - Barcelona Airport","city":"Barcelona","country":"Spain"}},{"location":{"lat":41.3181887517739,"lng":2.07441323388724},"description":"Barcelona Airport Hotel, Plaza Volatería, 3, El Prat de Llobregat, Spain","raw":{"name":"Barcelona Airport Hotel","city":"El Prat de Llobregat","country":"Spain"},"type":4},{"location":{"lat":41.3176275,"lng":2.0249774},"description":"Airport Barcelona Apartments, Rafael Casanova, 37, Viladecans, Spain","raw":{"name":"Airport Barcelona Apartments","city":"Viladecans","country":"Spain"},"type":4}]}}}
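To make the difference between the two encodings concrete, here is a minimal sketch that builds both bodies from the question's payload without sending anything. It assumes Python 3, where Python 2's urllib.urlencode lives at urllib.parse.urlencode; the variable names form_body and json_body are only illustrative.

```python
import json
from urllib.parse import urlencode  # Python 3 location of Python 2's urllib.urlencode

# Payload copied from the question.
payload = {
    'chnl': 'cforce',
    'complete': 'Barcelona Airport',
    'destination': 'Barcelona',
}

# Form encoding: key=value pairs joined by '&', with spaces turned into '+'.
# This is the body the original spider sent.
form_body = urlencode(payload)

# JSON encoding: a single JSON object, which is what the answer
# observed in the browser's request body.
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

Comparing these two strings with the request body shown in the browser's developer tools (network tab) is a quick way to see which encoding an endpoint expects.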

Answer 1: (score: 0)

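There may be hidden inputs in the form that contain data you are not submitting. Use a FormRequest object instead of a plain Request. When built with FormRequest.from_response, the request is pre-populated with the form's fields, and you only override the ones you want to change.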

See the documentation.
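As a rough illustration of what "pre-populating the form fields" means, the sketch below collects hidden inputs from a page and merges in your own overrides. It uses only the standard library and is not Scrapy's implementation; the HTML fragment, the field names token and query, and the helper build_form_data are all hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        input_type = (attrs.get('type') or '').lower()
        if tag == 'input' and input_type == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_form_data(html, overrides):
    """Start from the page's hidden fields, then apply the caller's overrides."""
    collector = HiddenInputCollector()
    collector.feed(html)
    data = dict(collector.fields)
    data.update(overrides)
    return data


# Hypothetical page fragment, not taken from the site in the question.
sample_html = '''
<form action="/search" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="query" value="">
</form>
'''

form_data = build_form_data(sample_html, {'query': 'Barcelona Airport'})
print(form_data)
```

In a spider you would normally let FormRequest.from_response do this work and pass only the fields you want to change through its formdata argument.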