Question

I want to scrape some fares from this website, which uses autocomplete requests.

This is my code:

import scrapy
from scrapy.http import Request, FormRequest
import urllib

class CabforceSpider(scrapy.Spider):
    name = 'cabforce'
    start_urls = ['https://www.cabforce.com']
    complete_url = 'https://www.cabforce.com/v1/geo/autocomplete'

    def parse(self, response):
        payload = {
            'chnl': 'cforce',
            'complete': 'Barcelona Airport',
            'destination': 'Barcelona'
        }
        return Request(
            self.complete_url,
            self.print_json,
            method='POST',
            body=urllib.urlencode(payload),
            headers={'X-Requested-With': 'XMLHttpRequest'})

    def print_json(self, response):
        print response.body

Unfortunately, the response I get looks like this:

{"status":"ArgumentError","reason":"Cannot validate input","description":null,"reasonType":2000,"details":[]}

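How can I find out which information is missing but needs to be sent along with the request? I thought of the JSESSIONID and the version, but I couldn't figure out how to send them. Thanks for any hints, and have a nice day!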

Answer 1

You don't even need to send cookies with your request. The problem is this line:

body=urllib.urlencode(payload),

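This URL-encodes the body (key=value pairs joined with &), but if you look at the body of the request your browser sends, you will see that it is JSON.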
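To make the difference concrete, here is a minimal Python 3 sketch that produces both encodings of the payload from the question and guesses the format of a captured body. The helper name detect_body_format is hypothetical and only meant for inspecting bodies copied from the browser's network tab; it is not part of Scrapy.

```python
import json
from urllib.parse import parse_qs, urlencode

# Payload taken from the spider in the question.
payload = {
    "chnl": "cforce",
    "complete": "Barcelona Airport",
    "destination": "Barcelona",
}


def detect_body_format(raw_body):
    """Guess whether a captured request body is JSON or form-encoded."""
    try:
        json.loads(raw_body)
        return "json"
    except ValueError:
        pass
    # parse_qs returns an empty dict when it finds no key=value pairs.
    if parse_qs(raw_body):
        return "form"
    return "unknown"


form_body = urlencode(payload)   # URL-encoded, as in the question's spider
json_body = json.dumps(payload)  # JSON, as in the browser's request

print(form_body)
print(json_body)
print(detect_body_format(form_body), detect_body_format(json_body))
```

Comparing the two printed bodies against what the browser actually sends is usually enough to spot this kind of mismatch.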

So the fix is to import json and change the line mentioned above to this one:

body=json.dumps(payload),
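As a sketch of how the corrected request could be assembled, the snippet below builds the keyword arguments for the POST in plain Python 3, so it runs without Scrapy installed. The helper name autocomplete_request_kwargs and its parameters are hypothetical; the URL, payload keys, and header come from the question.

```python
import json

# Endpoint taken from the question.
COMPLETE_URL = "https://www.cabforce.com/v1/geo/autocomplete"


def autocomplete_request_kwargs(complete, destination, channel="cforce"):
    """Build keyword arguments for a POST whose body is JSON (hypothetical helper)."""
    payload = {
        "chnl": channel,
        "complete": complete,
        "destination": destination,
    }
    return {
        "url": COMPLETE_URL,
        "method": "POST",
        "body": json.dumps(payload),  # JSON body instead of URL-encoded
        "headers": {"X-Requested-With": "XMLHttpRequest"},
    }


kwargs = autocomplete_request_kwargs("Barcelona Airport", "Barcelona")
print(kwargs["body"])
```

Inside the spider these arguments could be passed on as Request(callback=self.print_json, **kwargs). Recent Scrapy versions also ship scrapy.http.JsonRequest, which serializes a data dict to JSON for you; whether it is available depends on the installed version.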

With this change, my spider gets the following result:

{"status":"Ok","result":{"autocomplete":{"elements":[{"type":16,"description":"(BCN) - Barcelona Airport, Barcelona, Spain","location":{"lat":41.289545,"lng":2.072639},"raw":{"name":"(BCN) - Barcelona Airport","city":"Barcelona","country":"Spain"}},{"location":{"lat":41.3181887517739,"lng":2.07441323388724},"description":"Barcelona Airport Hotel, Plaza Volatería, 3, El Prat de Llobregat, Spain","raw":{"name":"Barcelona Airport Hotel","city":"El Prat de Llobregat","country":"Spain"},"type":4},{"location":{"lat":41.3176275,"lng":2.0249774},"description":"Airport Barcelona Apartments, Rafael Casanova, 37, Viladecans, Spain","raw":{"name":"Airport Barcelona Apartments","city":"Viladecans","country":"Spain"},"type":4}]}}}
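Once the request succeeds, the callback will probably want the individual suggestions rather than the raw body. The following Python 3 sketch parses trimmed copies of the two responses shown in this thread; the function name extract_descriptions is hypothetical, and the assumption that every failure carries a status other than "Ok" is based only on these two samples.

```python
import json

# Trimmed copies of the error and success responses shown in this thread.
error_body = (
    '{"status":"ArgumentError","reason":"Cannot validate input",'
    '"description":null,"reasonType":2000,"details":[]}'
)
ok_body = (
    '{"status":"Ok","result":{"autocomplete":{"elements":['
    '{"type":16,"description":"(BCN) - Barcelona Airport, Barcelona, Spain",'
    '"location":{"lat":41.289545,"lng":2.072639}}]}}}'
)


def extract_descriptions(body):
    """Return the description of each autocomplete element, or raise on an error status."""
    data = json.loads(body)
    if data.get("status") != "Ok":
        raise ValueError("{}: {}".format(data.get("status"), data.get("reason")))
    elements = data["result"]["autocomplete"]["elements"]
    return [element["description"] for element in elements]


print(extract_descriptions(ok_body))
```

Raising on a non-"Ok" status makes a rejected payload show up immediately instead of surfacing later as a missing key.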

Answer 2

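The form may contain hidden inputs holding data that you are not submitting. Use a FormRequest (created with FormRequest.from_response) instead of a plain Request. Such a request is pre-populated with all of the form's fields, so you only need to override the ones you want to change.

See the documentation.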
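To check whether a page's form really carries hidden inputs, a stdlib-only Python 3 sketch like the one below can list them. The class HiddenInputCollector, the function hidden_inputs, and the sample HTML are all made up for illustration and are not taken from cabforce.com; in a spider, FormRequest.from_response collects such fields for you.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.hidden[attributes["name"]] = attributes.get("value") or ""


# Made-up form for illustration only.
SAMPLE_HTML = """
<form action="/search" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="hidden" name="version" value="2">
  <input type="text" name="query">
</form>
"""


def hidden_inputs(html):
    """Return a dict of the hidden input fields found in the given HTML."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.hidden


print(hidden_inputs(SAMPLE_HTML))
```

If this returns an empty dict for the real page, hidden form fields are unlikely to be the cause, and the request body format is the next thing to compare.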

How do I identify the key information that needs to be sent with a request?
