Question

I read the docs and found that the command line should look like this: scrapy runspider getspecificimg.py -a ip='lizhe'

My spider code looks like this:

class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    start_urls = ['https://www.pexels.com/']

    # Get the input argument
    # NameNeedSearch = InputPara
    NameNeedSearch = ip

But the result I get says that ip isn't defined. Why?

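Update: I want to pass in a variable, use it to build the full URL, and use that as the start_url. My code looks like this, and I get the error "self is not defined". Why does that happen?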

class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    # Get the input argument
    NameNeedSearch = self.ip
    # startUrl = 'https://www.pexels.com/' + 
    start_urls = ['https://www.pexels.com/']

Answer 1

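You need to write the code that uses self inside methods of the GetImage class, for example in __init__ or in start_requests, which is called when the crawl starts.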

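When the framework calls these methods, they receive the class instance itself as their first argument, which is conventionally named self in the method signature (the name is only a convention):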

import scrapy


class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    start_urls = ['https://www.pexels.com/']

    def start_requests(self):
        # self points to the spider instance
        # that was initialized by the scrapy framework when starting a crawl
        #
        # spider instances are "augmented" with crawl arguments
        # available as instance attributes,
        # self.ip has the (string) value passed on the command line
        # with `-a ip=somevalue`
        for url in self.start_urls:
            yield scrapy.Request(url+self.ip, dont_filter=True)
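If you would rather build start_urls once in __init__, something like the following should work. This is only a minimal sketch that runs without Scrapy installed: FakeSpider is a hypothetical stand-in I made up for scrapy.Spider, and it assumes (as far as I know, this matches Scrapy's behaviour) that the base class copies keyword arguments onto the instance. The base_url attribute and the default ip='' are also my own choices for illustration.

```python
class FakeSpider:
    """Hypothetical stand-in for scrapy.Spider, for illustration only.

    It copies keyword arguments onto the instance, which is assumed to be
    how `-a name=value` arguments end up as instance attributes.
    """

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class GetImage(FakeSpider):
    name = 'ImageSpider'
    base_url = 'https://www.pexels.com/'  # assumed name, not from the question

    # Writing `NameNeedSearch = self.ip` here would raise NameError:
    # the class body runs once, when the class is defined, before any
    # instance exists, so there is no `self` (and no `ip`) in scope yet.

    def __init__(self, ip='', **kwargs):
        # Pass any other crawl arguments on to the base class.
        super().__init__(**kwargs)
        # `ip` arrives as a string; build the start URL from it.
        self.ip = ip
        self.start_urls = [self.base_url + ip]


# Roughly what happens for `-a ip=lizhe` on the command line.
spider = GetImage(ip='lizhe')
print(spider.start_urls)  # ['https://www.pexels.com/lizhe']
```

With a real spider you would inherit from scrapy.Spider instead of FakeSpider; the rest of the class should look the same. Giving ip a default value means the spider still starts if -a ip=... is left off.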

How to pass arguments to scrapy using the -a option?
