I am trying to scrape the news site: https://www.larazon.es/etiquetas/noticias/meta/politica#.p:3; I first tested the response with the following script and saw that it worked:
from scrapy import Spider


class StackSpider(Spider):
    name = 'crawler_larazon'
    allowed_domains = ['larazon.es']
    start_urls = ['https://www.larazon.es/etiquetas/noticias/meta/politica#.p:3']

    def parse(self, response):
        # Open an interactive shell to inspect the downloaded response
        from scrapy.shell import inspect_response
        inspect_response(response, self)
However, once I add my selectors and rules, I get no response. I am a beginner, but I have 2 hypotheses about what could be happening:
rules = [
    Rule(LinkExtractor(allow=r'etiquetas/noticias/meta/politica#.p:[2-3];'),
         callback='parse_item', follow=True)
]
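from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# StackItem comes from the project's items module (its import is not shown here)


class StackCrawlerSpider(CrawlSpider):
    name = 'crawler_larazon'
    allowed_domains = ['larazon.es']
    start_urls = ['https://www.larazon.es/etiquetas/noticias/meta/politica']

    # Follow only the paginated listing pages and parse each one
    rules = [
        Rule(LinkExtractor(allow=r'etiquetas/noticias/meta/politica#.p:[2-3];'),
             callback='parse_item', follow=True)
    ]

    def parse_item(self, response):
        # Each headline is expected inside an <h2> with this exact class attribute
        questions = response.xpath('//h2[@class="news__new__title news__new__title"]')
        for question in questions:
            item = StackItem()
            item['url'] = question.xpath(
                'a/@href').extract()[0]
            item['source'] = self.allowed_domains[0]
            yield item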
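
One thing I can check without running the spider is whether the `allow` pattern matches the paginated URLs at all. Below is a minimal sketch using only the standard library. The sample URLs are assumptions built from the address above, and `matches_allow` is a hypothetical helper. It does not reproduce how `LinkExtractor` processes links; it only tests the regular expression and shows which part of the URL is the fragment.

```python
import re
from urllib.parse import urldefrag

# The same pattern used in the Rule above
ALLOW = r'etiquetas/noticias/meta/politica#.p:[2-3];'

# Assumed sample URLs: two without a trailing ';' and one with it
candidates = [
    'https://www.larazon.es/etiquetas/noticias/meta/politica#.p:2',
    'https://www.larazon.es/etiquetas/noticias/meta/politica#.p:3',
    'https://www.larazon.es/etiquetas/noticias/meta/politica#.p:3;',
]


def matches_allow(url, pattern=ALLOW):
    """Hypothetical helper: True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


# Map each candidate URL to whether the pattern matches it
results = {url: matches_allow(url) for url in candidates}

# Split one URL into the part sent in the HTTP request and the fragment
base, fragment = urldefrag(candidates[1])

for url, ok in results.items():
    print(ok, url)
print('base:', base)
print('fragment:', fragment)
```

With this pattern only the URL ending in `;` matches, so the trailing semicolon in the rule matters. Everything after `#` is a URL fragment, which is handled on the client side and is not part of the HTTP request, so the pagination may well depend on JavaScript rather than on separate pages, but I have not confirmed that for this site.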
Any ideas about what I am missing? Thanks a lot!