Question

I am trying to adapt the "Following links" example from http://doc.scrapy.org/en/latest/intro/tutorial.html to my own spider:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from funda.items import FundaItem

class PropertyLinksSpider(CrawlSpider):

    name = "property_links"
    allowed_domains = ["funda.nl"]

    def __init__(self, place='amsterdam', page='1'):
        self.start_urls = ["http://www.funda.nl/koop/%s/p%s/" % (place, page)]
        self.base_url = "http://www.funda.nl/koop/%s/" % place
        self.le1 = LinkExtractor(allow=r'%s+huis|appartement-\d{8}' % self.base_url)

    def parse(self, response):
        links = self.le1.extract_links(response)
        for link in links:
            if link.url.count('/') == 6 and link.url.endswith('/'):
                item = FundaItem()
                item['url'] = link.url
                yield scrapy.Request(link.url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        item['title'] = response.xpath('//title').extract()
        yield item

However, if I try to run it with the command

scrapy crawl property_links -a place=amsterdam -a page=1 -o property_links_test.json

I get an empty .json file.

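In a previous version of this spider, where I only used the parse method and did yield item there, the spider produced a .json file with the expected links. I also used the Scrapy shell to check that the pages have a title. So I don't understand why I'm not getting any output.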

Answer 1

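You are not passing the item to the second function. parse creates item, but the Request it yields does not carry it, so inside parse_dir_contents the name item is undefined and nothing gets yielded. With the item passed along, it works fine for me.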

