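I am trying to learn web crawling with Scrapy, using a car classifieds website as a test subject to see what countermeasures it has in place. I know the X-AjaxPro-Method header exists, because Chrome Developer Tools shows it being passed and a correct response coming back. But when I do the same thing in the Scrapy shell, I get "This method is either not marked with an AjaxMethod or is not available."
Here are the shell commands I used: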
>>> from scrapy.http import FormRequest
>>> request = FormRequest(url='https://www.carwale.com/ajaxpro/CarwaleAjax.AjaxClassifiedBuyer,Carwale.ashx', headers={"X-AjaxPro-Method":"ProcessUsedCarPurchaseInquiry","Content-Type":"application/x-www-form-urlencoded; charset=UTF-8","X-Requested-With":"XMLHttpRequest"}, formdata={"profileId":"D1249107","buyerName":"","buyerEmail":"","buyerMobile":"9938223299","carModel":"","makeYear":"","pageUrl":"https://www.carwale.com/used/cars-in-karnal/chevrolet-enjoy-d1249107/?rk","isP":"False","transToken":"","ltsrc":"","buyerSourceId":"4","comments":"","cwc":"buJNfItyQKBP8a3OahoJsOOmg","utma":"\"52149691.1076750176.1492103717.1492447801.1492447801.8\"","utmz":"\"52149691.1492103720.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)\"","originId":"3","isFromCaptcha":"","isGSDClick":"","isRecommended":"","isCertificationDownload":""})
>>> fetch(request)
2017-04-18 08:45:32 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://www.carwale.com/ajaxpro/CarwaleAjax.AjaxClassifiedBuyerCarwale,Carwale.ashx> (referer: None)
>>> print(response.body)
{"error":{"Message":"This method is either not marked with an AjaxMethod or is not available.","Type":"System.NotSupportedException"}}
>>>
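The original page is at https://www.carwale.com/used/cars-in-karnal/chevrolet-enjoy-d1249107/?rk=69&isP=false, where you have to enter a mobile number to get the "seller details."
So I dug further and can share more information. Using the browser's developer tools, I exported the XHR as a curl command and then trimmed it down. As far as I can tell, the only header required is X-AjaxPro-Method, because the curl command works with just that header and the data.
It also works with the Python requests library.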
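One way to compare a hand-built request against the trimmed curl command is to construct it without sending it and print exactly what would go on the wire. The sketch below is not the requests call mentioned above (which was not posted); it uses only the Python standard library, and the form fields are a subset of those in the shell session, chosen purely for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://www.carwale.com/ajaxpro/CarwaleAjax.AjaxClassifiedBuyer,Carwale.ashx"

# Illustrative subset of the fields from the question; the real payload
# that the curl command sends is an unknown here.
form_fields = {
    "profileId": "D1249107",
    "buyerMobile": "9938223299",
    "buyerSourceId": "4",
    "originId": "3",
}

# Encode the body the same way a form-urlencoded POST would.
body = urlencode(form_fields).encode("utf-8")

# Build the request object only; nothing is sent over the network.
req = Request(
    URL,
    data=body,
    headers={"X-AjaxPro-Method": "ProcessUsedCarPurchaseInquiry"},
    method="POST",
)

# Print the method, URL, headers, and body for a side-by-side comparison.
# Note: urllib normalises header names to "X-ajaxpro-method"; HTTP header
# names are case-insensitive, so this does not change the request's meaning.
print(req.get_method(), req.full_url)
for name, value in req.header_items():
    print(f"{name}: {value}")
print(req.data.decode("utf-8"))
```

Diffing this output against the curl export (headers first, then the body encoding) can narrow down which part of the Scrapy request differs.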
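Answer 0: (score: 1)
Comparing the request data you posted with what I see in Firebug, I suspect your request is missing at least one of the following:
All in all, AJAX-powered sites like carwale.com have a lot of moving parts and are not a great subject to start learning Scrapy with.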
PS: A better way to use FormRequest is request = FormRequest.from_response(response_with_form_page, ...). This works for most forms, because Scrapy automatically extracts all hidden POST parameters from the form page. For details, see: https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.FormRequest.from_response