How to use Scrapy on a "load more results" page

Time: 2019-03-07 18:33:25

Tags: javascript python html web-scraping scrapy

I am trying to scrape all BBC News articles about Edward Snowden. Everything works fine except the "Show more results" button. I am using the following code:

import scrapy


class bbcSpider(scrapy.Spider):
    name = 'bbc'
    start_urls = ['https://www.bbc.co.uk/search?q=edward+snowden&sa_f=search-product&filter=news&suggid=#page=1']

    def parse(self, response):
        # Each matched results list yields one item holding lists of
        # all titles, links and dates found inside it.
        SET_SELECTOR = 'ol.search-results.results'
        for article in response.css(SET_SELECTOR):
            title = "li article.has_image.media-text div h1 a::text"
            link = "li article.has_image.media-text div h1 a::attr(href)"
            date = "li article.has_image.media-text aside.flags.top dl dd time.display-date::text"
            yield {
                'title': article.css(title).getall(),
                'link': article.css(link).getall(),
                'date': article.css(date).getall(),
            }

        # Follow the "Show more results" link if it is present in the
        # downloaded HTML; this block sits at the same level as the for loop.
        NEXT_PAGE_SELECTOR = 'nav.pagination a.more::attr(href)'
        next_page = response.css(NEXT_PAGE_SELECTOR).get()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse,
            )
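The spider's start URL ends in #page=1, and response.urljoin() turns the scraped href into an absolute URL. A minimal standard-library sketch illustrates two points, assuming hypothetical helper names (request_target, resolve_next_page) and hypothetical sample hrefs: the fragment after '#' is not part of the HTTP request, so changing it to #page=2 would not make the server return a different page (it is presumably read by the page's own JavaScript), and a relative href is resolved against the page URL.

```python
from urllib.parse import urldefrag, urljoin

# The start URL from the spider; everything after '#' is a URL fragment.
START_URL = ('https://www.bbc.co.uk/search?q=edward+snowden'
             '&sa_f=search-product&filter=news&suggid=#page=1')


def request_target(url):
    """Return the URL without its fragment.

    Hypothetical helper: the fragment (e.g. '#page=1') is never sent to the
    server, so this is what the server actually sees.
    """
    return urldefrag(url).url


def resolve_next_page(page_url, href):
    """Resolve a possibly relative href against the page URL.

    Hypothetical helper mirroring what response.urljoin() does for a page
    that has no <base> tag.
    """
    return urljoin(page_url, href)
```

In the spider itself, response.urljoin(next_page) plays the role of resolve_next_page, so no extra helper is needed there.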

I really don't know what to do. I'm not sure whether this button is generated by JavaScript, but it doesn't seem to be, since scrapy fetch works on it.
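One way to settle the JavaScript question is to look for the link in the raw HTML that Scrapy downloads, which is the same body scrapy fetch prints; Scrapy does not execute JavaScript, so anything missing from that body was added client-side. The sketch below is a hypothetical standard-library checker (the class MoreLinkFinder and function find_more_links are illustrative names, not part of Scrapy) that approximates the spider's selector 'nav.pagination a.more::attr(href)'.

```python
from html.parser import HTMLParser


class MoreLinkFinder(HTMLParser):
    """Hypothetical helper: collect href values of <a class="... more ...">
    elements nested inside <nav class="... pagination ...">, approximating
    the CSS selector 'nav.pagination a.more::attr(href)'."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0  # > 0 while inside a nav.pagination element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'nav':
            # Count nav.pagination, plus any nav nested inside it.
            if self.nav_depth or 'pagination' in classes:
                self.nav_depth += 1
        elif tag == 'a' and self.nav_depth and 'more' in classes:
            if attrs.get('href'):
                self.hrefs.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'nav' and self.nav_depth:
            self.nav_depth -= 1


def find_more_links(html):
    """Return every 'more' link href found inside nav.pagination."""
    finder = MoreLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs
```

For example, save the body with scrapy fetch --nolog followed by the start URL, redirected to page.html, then pass the file's contents to find_more_links. An empty list means the link is not in the HTML Scrapy receives (or the class names differ). In that case the button is most likely rendered by JavaScript, and the spider would have to request whatever URL the page's script calls, which the browser's network tab shows.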

Thanks!

0 answers