Question

I created this spider class with a parse() method:

class PitchforkSpider(scrapy.Spider):
    name = "pitchfork_reissues"
    allowed_domains = ["pitchfork.com"]
    # Scrapy creates a Request for each URL listed here
    start_urls = [
        "http://pitchfork.com/reviews/best/reissues/?page=1",
        "http://pitchfork.com/reviews/best/reissues/?page=2",
        "http://pitchfork.com/reviews/best/reissues/?page=3",
    ]

    def parse(self, response):

        items = []

        for sel in response.xpath('//div[@class="album-artist"]'):
            item = PitchforkItem()
            item['artist'] = sel.xpath('//ul[@class="artist-list"]/li/text()').extract()
            item['reissue'] = sel.xpath('//h2[@class="title"]/text()').extract()
            items.append(item)

        return items

From another script, I import the class from the module above:

from blogs.spiders.pitchfork_reissues_feed import *

Then I instantiate the class and try to call its parse() method:

def reissues():

    pitchfork_reissues = PitchforkSpider()
    albums = pitchfork_reissues.parse(response)
    print (albums)

But I get the following error:

    reissues = pitchfork_reissues.parse(response)
NameError: global name 'response' is not defined

Clearly, parse() needs an instance of scrapy.http.Response. How can I create such an instance in the second script, inside reissues()?

Answer 1

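from scrapy.http import HtmlResponse

# url is a required argument; HtmlResponse (unlike the base Response) supports .xpath()
response = HtmlResponse(url='http://pitchfork.com/reviews/best/reissues/?page=1', body=u'html here', encoding='utf-8')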

That said, I don't think you should scrape this way, since it is not how Scrapy is meant to work, but you can still create a Response object yourself.

Scrapy - calling a spider's method from another script
