Question

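I have just started using Scrapy and am still learning how it works. Could someone explain why my code raises an error, and what that error is? Is it caused by an invalid URL that I supplied, by an invalid XPath, or by both?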

Here is my code:

from scrapy.spider import Spider
from scrapy.selector import Selector

class CatswikiSpider(Spider):
    name = "catswiki"
    allowed_domains = ["http://en.wikipedia.org/wiki/Cat‎"]
    start_urls = [
        "http://en.wikipedia.org/wiki/Cat‎"

    ]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//body/div')
        for site in sites:
            title = ('//h1/span/text()').extract()
            subtitle = ('//h2/span/text()').extract()
            boldtext = ('//p/b').extract()
            links = ('//a/@href').extract()
            imagelinks = ('//img/@src').re(r'.*cat.*').extract()
            print title, subtitle, boldtext, links, imagelinks


        #filename = response.url.split("/")[-2]
        #open(filename, 'wb').write(response.body)

Below are some attachments showing the errors in the command prompt:

[Image: first section on the command prompt]
[Image: final section on the command prompt]

Answer 1

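You need a function call before every .extract(). As written, each of those lines calls .extract() on a plain string in parentheses, and a string has no such method. I am not familiar with Scrapy, but it is probably something like: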

title = site.xpath('//h1/span/text()').extract()
