I'm learning Scrapy, and I'm trying to search for different keywords by passing an argument from the command-line tool. Basically, I want to define a keyword and have the crawler search for URLs containing that keyword. This is what my command line looks like:
scrapy crawl myfirst -a nombre="Vermont"
And this is my spider:
class myfirstSpider(CrawlSpider):
    name = 'myfirst'
    allowed_domains = ["leroymerlin.es"]
    start_urls = ["https://www.leroymerlin.es/decoracion-navidena/arboles-navidad?index=%s" % page_number for page_number in range(2)]

    def __init__(self, nombre=None, *args, **kwargs):
        super(myfirstSpider, self).__init__(*args, **kwargs)
        rules = (
            Rule(LinkExtractor(allow=r'/fp/\*nombre*'), callback='parse_item'),)

    def parse_item(self, response):
        items = myfirstItem()
        product_name = response.css('.titleTechniqueSheet::text').extract()
        items['product_name'] = product_name
        yield items
Unfortunately, it doesn't work... Any help is welcome, thanks!
I found the way! This works for me:
class myfirstSpider(CrawlSpider):
    name = 'myfirst'
    allowed_domains = ["leroymerlin.es"]
    start_urls = ["https://www.leroymerlin.es/decoracion-navidena/arboles-navidad?index=%s" % page_number for page_number in range(2)]

    def __init__(self, nombre=None, *args, **kwargs):
        self.rules = (
            Rule(LinkExtractor(allow=nombre), callback='parse_item'),)
        super(myfirstSpider, self).__init__(*args, **kwargs)

    def parse_item(self, response):
        items = myfirstItem()
        product_name = response.css('.titleTechniqueSheet::text').extract()
        items['product_name'] = product_name
        yield items
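The key change is assigning `self.rules` *before* calling `super().__init__()`: CrawlSpider's initializer compiles the rules it finds on the instance, so rules defined afterwards (or left as a local variable, as in the question) are never picked up. A minimal pure-Python sketch of that ordering pattern (`Base`, `Broken`, and `Fixed` are stand-ins for illustration, not Scrapy's actual classes):

```python
class Base:
    """Stand-in for CrawlSpider: its __init__ snapshots self.rules."""
    rules = ()

    def __init__(self):
        # analogous to CrawlSpider.__init__ calling _compile_rules()
        self._compiled = list(self.rules)


class Broken(Base):
    def __init__(self, keyword):
        super().__init__()       # compiles rules -> still the empty default
        self.rules = (keyword,)  # too late: never compiled


class Fixed(Base):
    def __init__(self, keyword):
        self.rules = (keyword,)  # set the rules first
        super().__init__()       # now the compile step sees them


Broken("vermont")._compiled  # -> []
Fixed("vermont")._compiled   # -> ['vermont']
```

The same reasoning explains why the accepted code puts the `self.rules = (...)` line above the `super()` call.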
The command:
scrapy crawl myfirst -a nombre="vermont"
Thanks, everyone!