I read the docs and found that the command line should look like this:
scrapy runspider getspecificimg.py -a ip='lizhe'
My spider code looks like this:
class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    start_urls = ['https://www.pexels.com/']
    # Get the input argument
    # NameNeedSearch = InputPara
    NameNeedSearch = ip
But the result I get says that ip isn't defined. Why?
- Update -
I want to pass in a variable, then use it to build the full URL and use that as the start_url.
My code looks like this, and I get the error "self is not defined". Why does this happen?
class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    # Get the input argument
    NameNeedSearch = self.ip
    # startUrl = 'https://www.pexels.com/' +
    start_urls = ['https://www.pexels.com/']
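Answer 0 (score: 1)
You need to use self inside a method of the GetImage class, for example in __init__ or in start_requests, which is called when the crawl starts. Statements in the class body run once, when the class is defined, before any spider instance exists, so neither ip nor self is defined there.
When the framework calls these methods, they receive the class instance itself as the first argument, available as the traditional self variable used in the method signature (the name is only a convention):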
import scrapy

class GetImage(scrapy.Spider):
    name = 'ImageSpider'
    start_urls = ['https://www.pexels.com/']

    def start_requests(self):
        # self points to the spider instance
        # that was initialized by the scrapy framework when starting a crawl
        #
        # spider instances are "augmented" with crawl arguments
        # available as instance attributes,
        # self.ip has the (string) value passed on the command line
        # with `-a ip=somevalue`
        for url in self.start_urls:
            yield scrapy.Request(url + self.ip, dont_filter=True)
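
To see why the class body fails while a method works, here is a minimal plain-Python sketch that runs without Scrapy installed. MiniSpider, base_url, build_urls and class_body_sees_self are hypothetical names made up for this illustration; MiniSpider only imitates, under that assumption, the way crawl arguments end up as instance attributes, and is not the real scrapy.Spider.

```python
class MiniSpider:
    """Hypothetical stand-in for scrapy.Spider, NOT the real class."""

    start_urls = []

    def __init__(self, **kwargs):
        # Assumption for this sketch: every `-a key=value` argument is
        # copied onto the instance as an attribute of the same name.
        self.__dict__.update(kwargs)


class GetImage(MiniSpider):
    name = 'ImageSpider'
    base_url = 'https://www.pexels.com/'  # hypothetical attribute name

    # `NameNeedSearch = self.ip` written here would raise NameError:
    # the class body runs at definition time, before any instance exists.

    def build_urls(self):
        # Inside a method, self is the instance, so self.ip is available.
        return [self.base_url + self.ip]


def class_body_sees_self():
    """Return True if a class body can reference `self` (it cannot)."""
    try:
        class Broken:
            value = self.ip  # deliberately undefined name
    except NameError:
        return False
    return True


spider = GetImage(ip='lizhe')  # mirrors `-a ip='lizhe'`
urls = spider.build_urls()
print(urls)                    # ['https://www.pexels.com/lizhe']
print(class_body_sees_self())  # False
```

The same reasoning applies to the real spider: anything that depends on self.ip has to live in a method such as start_requests or __init__, not at class level.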