I'm trying to scrape all BBC news articles about Edward Snowden. Everything works except the "Show more results" button. This is the code I'm using:

import scrapy


class bbcSpider(scrapy.Spider):
    name = 'bbc'
    start_urls = ['https://www.bbc.co.uk/search?q=edward+snowden&sa_f=search-product&filter=news&suggid=#page=1']

    def parse(self, response):
        # Container that holds the list of search results.
        SET_SELECTOR = 'ol.search-results.results'
        for article in response.css(SET_SELECTOR):
            # Selectors are relative to the results container.
            title = "li article.has_image.media-text div h1 a::text"
            link = "li article.has_image.media-text div h1 a::attr(href)"
            date = "li article.has_image.media-text aside.flags.top dl dd time.display-date::text"
            yield {
                'title': article.css(title).getall(),
                'link': article.css(link).getall(),
                'date': article.css(date).getall(),
            }

        # Follow the "Show more results" link if the response contains one.
        NEXT_PAGE_SELECTOR = 'nav.pagination a.more::attr(href)'
        next_page = response.css(NEXT_PAGE_SELECTOR).extract_first()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse
            )

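I really don't know what to do next. I don't know whether this button is driven by JavaScript: scrapy fetch runs fine on the page, yet the button does not show up in what it returns.
Thanks!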
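One thing I noticed while looking into this: the page number in my start URL sits after a `#`, and a URL fragment is not sent to the server as part of an HTTP request. Below is a rough sketch, using only the standard library, that splits the URL to show which part is actually requested. The helper name `split_fragment` is my own and purely illustrative. This would be consistent with the paging being handled in the browser, though it does not prove it.

```python
from urllib.parse import urldefrag

# The start URL from the spider above, split across two lines for readability.
START_URL = (
    "https://www.bbc.co.uk/search?q=edward+snowden"
    "&sa_f=search-product&filter=news&suggid=#page=1"
)


def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given URL.

    Hypothetical helper for illustration; it only wraps urldefrag.
    """
    base, fragment = urldefrag(url)
    return base, fragment


base_url, fragment = split_fragment(START_URL)
print("sent to the server: ", base_url)
print("kept in the browser:", fragment)
```

If that reading is right, changing `#page=1` to `#page=2` in `start_urls` would still request the same resource from the server.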