Question

I created this class with a parse() method:

class PitchforkSpider(scrapy.Spider):
    name = "pitchfork_reissues"
    allowed_domains = ["pitchfork.com"]
    #creates objects for each URL listed here
    start_urls = [
                    "http://pitchfork.com/reviews/best/reissues/?page=1",
                    "http://pitchfork.com/reviews/best/reissues/?page=2",
                    "http://pitchfork.com/reviews/best/reissues/?page=3",
    ]

    def parse(self, response):

        for sel in response.xpath('//div[@class="album-artist"]'):
            item = PitchforkItem()
            item['artist'] = sel.xpath('//ul[@class="artist-list"]/li/text()').extract()
            item['reissue'] = sel.xpath('//h2[@class="title"]/text()').extract()

        return item

Then I import the class from the module it lives in:

from blogs.spiders.pitchfork_reissues_feed import *

and try to call parse() in another context:

def reissues(self):

    pitchfork_reissues = PitchforkSpider()
    reissues = pitchfork_reissues.parse('response')
    print (reissues)

But I get the following error:

pitchfork_reissues.parse('response')
  File "/Users/vitorpatalano/Documents/Code/Soup/Apps/myapp/blogs/blogs/spiders/pitchfork_reissues_feed.py", line 21, in parse
    for sel in response.xpath('//div[@class="album-artist"]'):
AttributeError: 'str' object has no attribute 'xpath'

What am I missing?

Answer 1

You are calling parse with a string literal:

reissues = pitchfork_reissues.parse('response')

I guess that was supposed to be a variable name? Like this:

reissues = pitchfork_reissues.parse(response)

Edit

A Spider's parse method expects an instance of scrapy.http.Response as its first argument, not a string literal containing the word 'response'.
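
To illustrate, here is a minimal sketch of why the string fails. It needs no Scrapy at all: a plain str simply has no xpath method, so the first line of parse raises before anything is scraped. The variable names are only for the demonstration.

```python
# A string literal is just text; it is not a Scrapy response object.
response = 'response'
message = None

try:
    # The same kind of call that parse() makes on its argument.
    response.xpath('//div[@class="album-artist"]')
except AttributeError as exc:
    # Keep the error text so it can be compared with the traceback.
    message = str(exc)
    print(message)
```

The printed text should match the last line of the traceback in the question.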

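I haven't used Scrapy myself, so I only know what I have read in the documentation, but apparently such response instances are normally created by the "Downloader".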

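It looks like you are trying to call the Spider's parse method outside of Scrapy's usual workflow. In that case, I think you are responsible for creating such a response yourself and passing it to the Spider when you call its parse method.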

Python - importing an instance of a class from a module