Scrapy infinite scroll where the website response is an array

Asked: 2019-05-31 09:54:27

Tags: json python-3.x pagination scrapy

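The goal is to crawl a website that uses infinite scrolling. This is usually fairly easy, but in this case the response from this particular website is an array, and I am stuck on how to handle it in Python.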

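Using Chrome's Inspect tool, I identified the pagination URL format, which is the one assigned to pagination_url in my code further down.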


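Chrome's Inspect tool also shows the response to the pagination callback. For this particular website the response is an array rather than the JSON object I am used to, so the problem is how to check whether it reports that a next page exists.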

The response is: [{"selector": "nextPageContent", "replace": "outer", "content": ""}, {"prop": "hasNextPage", "val": true}]

The main interest is capturing {"prop": "hasNextPage", "val": true}
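
A top-level array is still valid JSON, so one possible way to read the flag is to decode the body into a Python list and look the entry up by its "prop" key rather than by position. The following is only a minimal sketch: it assumes the body is exactly the array shown above, and the names SAMPLE_BODY and has_next_page are hypothetical, not part of the original code.

```python
import json

# Sample body copied from the response shown in the question.
SAMPLE_BODY = (
    '[{"selector": "nextPageContent", "replace": "outer", "content": ""},'
    ' {"prop": "hasNextPage", "val": true}]'
)


def has_next_page(body):
    """Return True if the decoded array holds {"prop": "hasNextPage", "val": true}."""
    items = json.loads(body)  # a JSON array decodes to a Python list
    for item in items:
        # Match on the "prop" key so the result does not depend on element order.
        if isinstance(item, dict) and item.get("prop") == "hasNextPage":
            return bool(item.get("val"))
    return False  # no such entry: treat this as the last page


print(has_next_page(SAMPLE_BODY))
```

Matching on the key instead of taking index 1 is a design choice: it keeps working if the site adds or reorders entries in the array.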


My current code is as follows:

import json

import scrapy


class WebsiteSpider(scrapy.Spider):
    name = "web_spider"

    pagination_url = 'https://www....?page=%s'  # website callback for pagination
    start_urls = [pagination_url % 1]
    download_delay = 1.5

    def parse(self, response):
        links = response.xpath(".//a[@class='member-link']/@href").extract()  # works great
        links = list(dict.fromkeys(links))  # drop duplicates, keep order

        # JSON response trial #1
        data = json.loads(response.body)  # usually works with a JSON response, but doesn't work on this site
        print(data)

        # array response trial #1
        array_data = response[1]  # doesn't work: a Response object is not subscriptable
        print(array_data)
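
The pagination loop itself could be sketched independently of Scrapy as follows. This is an illustration under stated assumptions, not a tested solution for the site in question: every page is assumed to return the array format shown earlier, fetch is a hypothetical callable that downloads a URL and returns the raw body, and the example.invalid URL and fake_fetch are placeholders standing in for the real site.

```python
import json


def iter_page_urls(fetch, url_template, first_page=1, max_pages=1000):
    """Yield page URLs until the site stops reporting hasNextPage.

    `fetch` is a hypothetical callable: it takes a URL and returns the body.
    """
    page = first_page
    while page < first_page + max_pages:  # hard cap guards against an endless loop
        url = url_template % page
        yield url
        items = json.loads(fetch(url))
        more = any(
            isinstance(item, dict)
            and item.get("prop") == "hasNextPage"
            and item.get("val") is True
            for item in items
        )
        if not more:
            break
        page += 1


def fake_fetch(url):
    # Stand-in for a real download: pages 1 and 2 report a next page, page 3 does not.
    page = int(url.rsplit("=", 1)[1])
    return json.dumps([
        {"selector": "nextPageContent", "replace": "outer", "content": ""},
        {"prop": "hasNextPage", "val": page < 3},
    ])


urls = list(iter_page_urls(fake_fetch, "https://example.invalid/list?page=%s"))
print(urls)
```

In a Scrapy spider the same check would more likely sit inside parse, which would yield scrapy.Request(self.pagination_url % (page + 1), callback=self.parse) instead of looping. The page counter then has to travel with the request, for example through the cb_kwargs argument (Scrapy 1.7 and later) or through meta.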

Thanks.

0 answers:

No answers