Question

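The goal is to scrape a website that uses infinite scrolling. This is usually fairly easy, but in this case the response from this particular site is an array, and I'm stuck on how to access it in Python.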

Using Chrome's inspector, I identified the pagination URL format shown in the code below.

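Chrome's inspector also shows the response of the pagination callback. For this particular site, the response is an array rather than the JSON object I'm used to, so the problem is how to check whether there is a next page.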

The response is: [{"selector": "nextPageContent", "replace": "outer", "content": ""}, {"prop": "hasNextPage", "val": true}]

What I mainly want to capture is {"prop": "hasNextPage", "val": true}
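For illustration, here is a minimal sketch of reading that flag with the standard library alone. It assumes the body is valid JSON exactly as shown above (a top-level array is still valid JSON and parses to a Python list); the name `has_next_page` and the `SAMPLE_BODY` constant are hypothetical and not part of the original spider.

```python
import json

# Sample body copied from the response shown above (straight quotes restored).
SAMPLE_BODY = (
    '[{"selector": "nextPageContent", "replace": "outer", "content": ""}, '
    '{"prop": "hasNextPage", "val": true}]'
)


def has_next_page(body):
    """Return True if the array holds an entry {"prop": "hasNextPage", "val": true}."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")  # assumption: the site serves UTF-8
    items = json.loads(body)  # a top-level JSON array becomes a Python list
    for item in items:
        # Skip anything that is not a dict, then look for the flag by name
        if isinstance(item, dict) and item.get("prop") == "hasNextPage":
            return bool(item.get("val"))
    return False  # no such entry: treat this as the last page


print(has_next_page(SAMPLE_BODY))
```

Inside a Scrapy callback this would presumably be called as `has_next_page(response.text)`. Searching by the `prop` key rather than by position avoids relying on the flag always being the second element.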

My current code is as follows:

import json

import scrapy


class WebsiteSpider(scrapy.Spider):
    name = "web_spider"

    pagination_url = 'https://www....?page=%s'  # website callback for pagination 
    start_urls = [pagination_url % 1]
    download_delay = 1.5

    def parse(self, response):
        links = response.xpath(".//a[@class='member-link']/@href").extract()  # works great
        links = list(dict.fromkeys(links))

        # JSON attempt: json.loads usually works for JSON responses, but fails for me on this site
        data = json.loads(response.body)
        print(data)

        # Array attempt: indexing the response object directly does not work
        array_data = response[1]
        print(array_data)
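As a rough sketch of the pagination decision itself, the helper below returns the next page URL while `hasNextPage` is true and `None` otherwise. It assumes the body is the JSON array shown earlier; `next_page_url` and the `example.com` URL template are hypothetical placeholders, not the real site's endpoint.

```python
import json

# Hypothetical placeholder; the real pagination URL is elided in the question.
PAGINATION_URL = "https://www.example.com/members?page=%s"


def next_page_url(body, current_page, url_template=PAGINATION_URL):
    """Return the URL of the next page, or None when hasNextPage is false or absent."""
    # Collect every {"prop": ..., "val": ...} entry of the array into one dict
    flags = {
        item["prop"]: item.get("val")
        for item in json.loads(body)
        if isinstance(item, dict) and "prop" in item
    }
    if flags.get("hasNextPage"):
        return url_template % (current_page + 1)
    return None


body = '[{"selector": "nextPageContent", "content": ""}, {"prop": "hasNextPage", "val": true}]'
print(next_page_url(body, 1))
```

In the spider, the returned URL could then be passed to `scrapy.Request(url, callback=self.parse)` and yielded from `parse`, with the current page number tracked by the spider itself; that wiring is an assumption and is not shown here.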

Thanks.

Scrapy infinite scrolling where the website response is an array

0 answers: