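The goal is to scrape a website that uses infinite scrolling. That is usually fairly easy, but in this case the response from this particular site is an array, and I am stuck on how to read it in Python.
Using Chrome's Inspect tool, I identified the pagination URL format (it appears as `pagination_url` in the code below).
Chrome's Inspect tool also shows the response to the pagination callback. For this particular site the response is an array rather than the JSON I am used to, so the problem is how to check whether it "has a next page".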
The response is: [{"selector":"nextPageContent","replace":"outer","content":""},{"prop":"hasNextPage","val":true}]
What I mainly want to capture is {"prop":"hasNextPage","val":true}
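To make the target concrete, here is a minimal sketch of the lookup I am after. It assumes the response body is exactly the text shown above, which as pasted would parse as a JSON array. `SAMPLE_BODY` and `has_next_page` are hypothetical names used only for illustration, and I have not confirmed that the live site returns the body in exactly this form.

```python
import json

# Assumption: this string mirrors the response body pasted above.
SAMPLE_BODY = (
    '[{"selector":"nextPageContent","replace":"outer","content":""},'
    '{"prop":"hasNextPage","val":true}]'
)


def has_next_page(body_text):
    """Return True if the array contains {"prop": "hasNextPage", "val": true}."""
    try:
        items = json.loads(body_text)
    except json.JSONDecodeError:
        # The body was not parseable as JSON at all.
        return False
    if not isinstance(items, list):
        return False
    # Scan every element instead of relying on a fixed index.
    for item in items:
        if isinstance(item, dict) and item.get("prop") == "hasNextPage":
            return bool(item.get("val"))
    return False


print(has_next_page(SAMPLE_BODY))
```

Scanning for the `prop` key rather than reading index 1 is a deliberate choice here, since I do not know whether the order of the elements is stable.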
My current code is as follows:
import json

import scrapy


class WebsiteSpider(scrapy.Spider):
    name = "web_spider"
    pagination_url = 'https://www....?page=%s'  # website callback for pagination
    start_urls = [pagination_url % 1]
    download_delay = 1.5

    def parse(self, response):
        links = response.xpath(".//a[@class='member-link']/@href").extract()  # works great
        links = list(dict.fromkeys(links))  # drop duplicates, keep order

        # json response trial #1
        data = json.loads(response.body)  # usually works with a JSON response, but doesn't work on this site
        print(data)

        # array response trial #1
        array_data = response[1]  # doesn't work
        print(array_data)
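Once the `hasNextPage` flag can be read, the remaining step would be building the URL of the following page. Below is a rough sketch of that step only, assuming the page number lives in a `page` query parameter as in `pagination_url`. The `example.com` URL and the `next_page_url` helper are hypothetical placeholders, not the real site or an existing Scrapy function.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(current_url, has_next):
    """Return the URL of the next page, or None when there is no next page."""
    if not has_next:
        return None
    parts = urlsplit(current_url)
    query = parse_qs(parts.query)
    # Assumption: a missing "page" parameter means we are on page 1.
    page = int(query.get("page", ["1"])[0])
    query["page"] = [str(page + 1)]
    # Rebuild the URL with only the query string changed.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical placeholder URL, used only to show the call.
print(next_page_url("https://example.com/members?page=3", True))
```

Inside `parse`, the returned URL could then be passed to `scrapy.Request(url, callback=self.parse)` and yielded, with the crawl stopping when the helper returns `None`.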
Thanks.