Question

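I am new to Python and Scrapy.

I want to scrape data from a website.

The site uses AJAX to load more results as you scroll.

The GET request URL looks like this: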

http://www.justdial.com/functions/ajxsearch.php?national_search=0&act=pagination&city=Mumbai&search=Chemical+Dealers&where=&catid=944&psearch=&prid=&page=2&SID=&mntypgrp=0&toknbkt=&bookDate=

Please help me figure out how to do this with Scrapy or any other Python library.

Thanks.

Answer 1

It seems the AJAX request needs a correct Referer header, which is just the URL of the current page. You can set the header when you create the request:

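from scrapy import Request

# Both methods belong inside your scrapy.Spider subclass.
def parse(self, response):
    # response.url is the page that triggers the AJAX call,
    # e.g. http://www.justdial.com/Mumbai/Dentists/ct-385543
    my_headers = {'Referer': response.url}
    # "ajax_request_url" is a placeholder for the real AJAX URL
    yield Request("ajax_request_url",
                  headers=my_headers,
                  callback=self.parse_ajax)

def parse_ajax(self, response):
    # results should be here
    self.logger.info("AJAX response: %d bytes", len(response.body))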
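
To request further pages, the URL from the question can be rebuilt with a different page value. Below is a minimal sketch using only the standard library, assuming the endpoint accepts any page number and that the other query parameters stay the same as in the question. The helper names build_ajax_url and build_ajax_request are made up for illustration, and the Referer value is whatever listing page you are scraping. Nothing is fetched here; the code only constructs the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the URL in the question.
AJAX_ENDPOINT = "http://www.justdial.com/functions/ajxsearch.php"


def build_ajax_url(page, city="Mumbai", search="Chemical Dealers", catid="944"):
    """Hypothetical helper: rebuild the question's AJAX URL for a given page."""
    # Parameter names and order are copied from the question's URL.
    params = [
        ("national_search", "0"),
        ("act", "pagination"),
        ("city", city),
        ("search", search),
        ("where", ""),
        ("catid", catid),
        ("psearch", ""),
        ("prid", ""),
        ("page", str(page)),
        ("SID", ""),
        ("mntypgrp", "0"),
        ("toknbkt", ""),
        ("bookDate", ""),
    ]
    # urlencode encodes spaces as '+', matching "Chemical+Dealers".
    return AJAX_ENDPOINT + "?" + urlencode(params)


def build_ajax_request(page, referer):
    """Hypothetical helper: attach the Referer header the answer describes."""
    return Request(build_ajax_url(page), headers={"Referer": referer})
```

In a Scrapy spider, the same URL and header dictionary could be passed to Request(url, headers=..., callback=self.parse_ajax), incrementing page until a response comes back without results. How the site signals the last page is not stated in the thread, so that stop condition is an assumption to verify.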
