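I am trying to fetch a phone number from https://lajumate.ro/ajax/phone-number (example product page).
The page only responds to a POST request that carries specific cookies and form data. An example curl request looks like this: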
curl 'https://lajumate.ro/ajax/phone-number' -H 'Cookie: XSRF-TOKEN=eyJpdiI6IkVqUEwrZjU2UDJOaFB3NDl6b0xhTFE9PSIsInZhbHVlIjoiemJnTCt3S0UxTjUwVjVTQk0xUzlRSVpNZGVPM0dIVHBcL1JlYTVabmcxelFPNG5ZZ1d4NGYxbmpnRTAxeVRSaWRcLzZTUVhRVzlNcmtyOHJvcWFOdlE3UT09IiwibWFjIjoiYmUzOGNkNDlkMjMyNzY3YTQxNzE0ZWEwNmJhMDExZWUzODdmZmU5MmZmMTEwODk1ZTE3ZjYxNTkxZjYyNzFkOCJ9; ljs= eyJpdiI6ImdYR28xcnZvSXFiNHpSekVyeHJOQVE9PSIsInZhbHVlIjoiSnJtTlBRMmRJY1ZqNUtxWXdPREdlYnptc3pKWGRmZ1ppdjdCc0lcL040NzlDbytTcWNZb1Bwa0kyejlKM3NmNGZ0dDMwcFNhaXZ6WHlWSExFaHlNYnFnPT0iLCJtYWMiOiI4ZWY2MzRiNTY5Mjc3M2FmYjllNDJiODEyYWRmNzUxNjViYWM0OTIyZjQ3OTRjODhiMjM3N2NlNTJjYWJiNTRiIn0;' --data '_token=lT8dwMv5vqGrnh0drb6pW7sreYjguJn5qaCXZIck&ad_id=3834372' --compressed
This works (note that the cookies and the token expire). So I wrote a spider to reproduce this request. The code looks like this:
req = FormRequest(
    'https://lajumate.ro/ajax/phone-number',
    callback=self.parse_phone,
    formdata={
        # The ad id is the number right before ".html" in the product URL.
        'ad_id': re.sub(r".+?(\d+)\.html", r"\1", response.url),
        '_token': response.xpath('//input[@name="_token"]/@value').extract()[0],
    },
    headers={'Cookie': 'ljs=' + ljs + ';XSRF-TOKEN=' + XSRF},
    dont_filter=True
)
ljs and XSRF are extracted from the response cookies.
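As a minimal sketch of that extraction step (not necessarily how the spider does it), the two values could be read from the `Set-Cookie` headers with the standard library. The helper name `extract_cookies` is hypothetical, and the sketch assumes the raw headers come from something like `response.headers.getlist('Set-Cookie')`, which in Scrapy yields byte strings.

```python
from http.cookies import SimpleCookie


def extract_cookies(set_cookie_headers, names=("ljs", "XSRF-TOKEN")):
    """Return {name: value} for the wanted cookies found in Set-Cookie headers.

    Hypothetical helper for illustration; values are returned exactly as
    sent by the server (still URL-encoded, e.g. a trailing %3D).
    """
    found = {}
    for raw in set_cookie_headers:
        # Header values may arrive as bytes; accept str as well.
        text = raw.decode("latin-1") if isinstance(raw, bytes) else raw
        jar = SimpleCookie()
        jar.load(text)
        for name, morsel in jar.items():
            if name in names:
                found[name] = morsel.value
    return found


if __name__ == "__main__":
    sample = [
        b"XSRF-TOKEN=abc%3D; path=/",
        b"ljs=def%3D; path=/; HttpOnly",
    ]
    print(extract_cookies(sample))
```

Inside a callback this might be used as `cookies = extract_cookies(response.headers.getlist('Set-Cookie'))`, followed by `ljs = cookies['ljs']` and `XSRF = cookies['XSRF-TOKEN']`.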
I use two debug log statements to inspect the request:
self.logger.debug('Request headers: %s', dict(req.headers))
self.logger.debug('Request body: %s', req.body)
This results in:
2017-01-04 11:44:41 [lajumate-sellers] DEBUG: Request headers: {'Cookie': ['ljs=eyJpdiI6IlBTU05tWlV0NW1DZGJaZk5nemEzTUE9PSIsInZhbHVlIjoiY3JDNFR2clpkMGVaNHVqODZFT2NvTmFRb1BKRmZCS0pCRndwd0xNNXVzV2M1WUNCUm5MWXFnbEU5RGZkQnVRNHFNMFp5S0E4TllkZXVtNk5cL3JSU1FBPT0iLCJtYWMiOiJhOWRhYmJmODg1NzcwOTRhNzQ5ZTlhNDg4OTEzZWNiNDc5NDhlNzZmMmQ3MDliYjM0ODlkZDAwOTYzN2NkNTkzIn0%3D;XSRF-TOKEN=eyJpdiI6ImNHNzZhbVViNWxTUm16bmg5amF0SFE9PSIsInZhbHVlIjoiWGRPMWFjVFBPTFNYWkxrNjI2THJIYU1KeStLcTg4Z3FFRkFqOWJjMDdHNUJKNXFuY2pKVXkxTVpuT1ExNXpSZWZHM1FPMzRjSTY0R3lSVndJME1GMFE9PSIsIm1hYyI6ImQwYThlMGQzYzA3NjA3YmE2ZTAwYjA0NjRiNzRjNTY4NGVlNjEwZjUxMzFiMWE0OGI3Nzk5YWVlNmVkODllNGEifQ%3D%3D'], 'Content-Type': ['application/x-www-form-urlencoded']}
2017-01-04 11:44:41 [lajumate-sellers] DEBUG: Request body: _token=VtCrPpqMwpcO1FRCZ12pnYmXj7Bv14B8o4aRcZyA&ad_id=3576651
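This all looks as it should. But when the spider tries to load the page, the request is redirected with a 302 status code.
However, when I copy and paste the debug data into a curl command or hurl.it, I am able to fetch the data.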
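The copy-and-paste step can be sketched as a small helper that turns the logged headers and body into a curl command. This is only an illustration: `to_curl` and `_text` are hypothetical names, and the sketch assumes `headers` has the shape shown by `dict(req.headers)` in the debug output (header name mapped to a list of values, possibly as bytes).

```python
import shlex


def _text(value):
    # Header names, header values and the body may be bytes; accept str too.
    return value.decode("latin-1") if isinstance(value, bytes) else value


def to_curl(url, headers, body):
    """Build a curl command line from logged request headers and body."""
    parts = ["curl", shlex.quote(url)]
    for name, values in headers.items():
        for value in values:
            header = "%s: %s" % (_text(name), _text(value))
            parts += ["-H", shlex.quote(header)]
    parts += ["--data", shlex.quote(_text(body)), "--compressed"]
    return " ".join(parts)


if __name__ == "__main__":
    print(to_curl(
        "https://lajumate.ro/ajax/phone-number",
        {b"Cookie": [b"ljs=a;XSRF-TOKEN=b"]},  # placeholder cookie values
        b"_token=t&ad_id=1",                   # placeholder form body
    ))
```

With a real request object this might be called as `to_curl(req.url, dict(req.headers), req.body)`, which makes it easier to compare what the spider sends with the curl command that works.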
Any suggestions on how to solve this?