I want to fetch some taxi fares from this website, which uses an autocomplete request.
Here is my code:
import scrapy
from scrapy.http import Request, FormRequest
import urllib

class CabforceSpider(scrapy.Spider):
    name = 'cabforce'
    start_urls = ['https://www.cabforce.com']
    complete_url = 'https://www.cabforce.com/v1/geo/autocomplete'

    def parse(self, response):
        payload = {
            'chnl': 'cforce',
            'complete': 'Barcelona Airport',
            'destination': 'Barcelona'
        }
        return Request(
            self.complete_url,
            self.print_json,
            method='POST',
            body=urllib.urlencode(payload),
            headers={'X-Requested-With': 'XMLHttpRequest'})

    def print_json(self, response):
        print response.body
Unfortunately, the response I get is this:
{"status":"ArgumentError","reason":"Cannot validate input","description":null,"reasonType":2000,"details":[]}
How can I find out which information is missing but needs to be sent along with the request? I thought of the JSESSIONID and a version field, but I cannot figure out how to do this. Thanks for any hints, and have a nice day!
Answer 0 (score: 1)
You don't even need to send cookies with your request. The problem is in

body=urllib.urlencode(payload),

This encodes the body as URL-encoded form data, but if you look at the body of the request the browser makes, you will see that the body is JSON. So the fix is to import json

and change the line above to this one:

body=json.dumps(payload),
With that change, my spider gets the following result:
{"status":"Ok","result":{"autocomplete":{"elements":[{"type":16,"description":"(BCN) - Barcelona Airport, Barcelona, Spain","location":{"lat":41.289545,"lng":2.072639},"raw":{"name":"(BCN) - Barcelona Airport","city":"Barcelona","country":"Spain"}},{"location":{"lat":41.3181887517739,"lng":2.07441323388724},"description":"Barcelona Airport Hotel, Plaza Volatería, 3, El Prat de Llobregat, Spain","raw":{"name":"Barcelona Airport Hotel","city":"El Prat de Llobregat","country":"Spain"},"type":4},{"location":{"lat":41.3176275,"lng":2.0249774},"description":"Airport Barcelona Apartments, Rafael Casanova, 37, Viladecans, Spain","raw":{"name":"Airport Barcelona Apartments","city":"Viladecans","country":"Spain"},"type":4}]}}}
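A minimal sketch contrasting the two body encodings (using Python 3's urllib.parse; the question's code is Python 2, where the same function lives at urllib.urlencode):

```python
import json
from urllib.parse import urlencode

payload = {
    'chnl': 'cforce',
    'complete': 'Barcelona Airport',
    'destination': 'Barcelona'
}

# URL-encoded form body -- what the original spider sent
form_body = urlencode(payload)
print(form_body)  # chnl=cforce&complete=Barcelona+Airport&destination=Barcelona

# JSON body -- what the endpoint actually expects
json_body = json.dumps(payload)
print(json_body)  # {"chnl": "cforce", "complete": "Barcelona Airport", "destination": "Barcelona"}
```

A server that parses its body with a JSON decoder will reject the first form outright, which matches the "Cannot validate input" error above.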
Answer 1 (score: 0)
There may be hidden inputs in the form containing data you are not submitting. Use a FormRequest built with FormRequest.from_response() instead of a plain Request: it pre-populates all of the form's fields automatically, and you override only the fields you want to change.
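A stdlib sketch of the merging behavior that FormRequest.from_response performs: collect every input in the form, then layer your own formdata on top. The form HTML and field names here are hypothetical, not the site's actual markup:

```python
from html.parser import HTMLParser

class FormInputCollector(HTMLParser):
    """Collect name/value pairs from <input> tags, as FormRequest.from_response does."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            a = dict(attrs)
            if 'name' in a:
                self.fields[a['name']] = a.get('value', '')

# Hypothetical form with a hidden token the user never types in
html = '''
<form action="/search">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="destination" value="">
</form>
'''

parser = FormInputCollector()
parser.feed(html)

# Override only the field we care about; the hidden field rides along
formdata = {**parser.fields, 'destination': 'Barcelona'}
print(formdata)  # {'csrf_token': 'abc123', 'destination': 'Barcelona'}
```

If the endpoint expects such a hidden value and your hand-built Request omits it, you get exactly the kind of validation error shown in the question.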