How do I pass arguments to scrapy using the -a option?

Asked: 2016-12-14 12:13:38

Tags: python scrapy

I read the docs and found that the command line should look like this: scrapy runspider getspecificimg.py -a ip='lizhe'

My spider code looks like this:

class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    start_urls = ['https://www.pexels.com/']

    # Get the input argument
    # NameNeedSearch = InputPara
    NameNeedSearch = ip

But the result I get says that ip isn't defined. Why?

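- Update - I want to pass in a variable, then use it to build the full URL and use that as the start_url. My code looks like this, and I get the error "self is not defined". Why does this happen?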

class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    # Get the input argument
    NameNeedSearch = self.ip
    # startUrl = 'https://www.pexels.com/' + 
    start_urls = ['https://www.pexels.com/']

1 Answer:

Answer 0 (score: 1)

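You need to write the code that uses self inside methods of the GetImage class, for example in __init__ or in start_requests, which is called when the crawl starts.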
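A small plain-Python sketch may help show why the class-body version fails (no Scrapy involved; the class names Broken and Working are made up for illustration). A class body runs once, when the class is defined, and no instance exists at that point, so self is simply an undefined name there. Inside a method, self is bound to the instance at call time.

```python
# The class body is executed at definition time, before any instance exists,
# so referring to `self` there raises NameError.
try:
    class Broken:
        value = self.ip  # no `self` in scope here
except NameError as exc:
    error_message = str(exc)


class Working:
    def __init__(self, ip):
        # Inside a method, `self` is the instance the method was called on.
        self.ip = ip

    def describe(self):
        return 'searching for ' + self.ip


print(error_message)
print(Working('lizhe').describe())
```

The same reasoning applies to the bare name ip in the first snippet of the question: nothing in the class body defines it.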

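When the framework calls these methods, they receive the class instance itself as their first argument. It is available under the name self used in the method signature (the name is only a convention):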

import scrapy


class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    start_urls = ['https://www.pexels.com/']

    def start_requests(self):
        # self points to the spider instance
        # that was initialized by the scrapy framework when starting a crawl
        #
        # spider instances are "augmented" with crawl arguments
        # available as instance attributes,
        # self.ip has the (string) value passed on the command line
        # with `-a ip=somevalue`
        for url in self.start_urls:
            yield scrapy.Request(url+self.ip, dont_filter=True)
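
For the updated question (building the start URL from the argument), one option is to set start_urls in __init__. The sketch below is an illustration under stated assumptions, not code from the answer: SpiderStandIn is a hypothetical stand-in that only mimics how scrapy.Spider copies -a arguments onto the instance, so the example runs without Scrapy installed; in a real project you would subclass scrapy.Spider instead. The base_url attribute is also an assumed name. Appending the term directly to the base URL follows the answer's url+self.ip; whether that path is meaningful on the site is not checked here.

```python
from urllib.parse import quote


class SpiderStandIn:
    """Hypothetical stand-in for scrapy.Spider (an assumption, not Scrapy code).

    It only copies keyword arguments onto the instance, which is what
    Scrapy's default __init__ does with `-a name=value` arguments.
    """

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


class GetImage(SpiderStandIn):  # with Scrapy: class GetImage(scrapy.Spider)
    name = 'ImageSpider'
    base_url = 'https://www.pexels.com/'  # assumed attribute name

    def __init__(self, ip='', **kwargs):
        super().__init__(**kwargs)
        # Keep the argument as an attribute and build the start URL from it.
        self.ip = ip
        # quote() escapes spaces and other unsafe characters in the term.
        self.start_urls = [self.base_url + quote(ip)]


spider = GetImage(ip='lizhe')
print(spider.start_urls)
```

With Scrapy itself, the command from the question (scrapy runspider getspecificimg.py -a ip='lizhe') passes ip to __init__ as a keyword argument, always as a string.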