Scrapy: how to use arguments for multiple search terms

Date: 2019-12-17 22:47:56

Tags: python web-scraping scrapy web-crawler

I'm learning Scrapy, and now I'm trying to search for different keywords by passing an argument from the command line. Basically, I want to define a keyword and have the crawler follow only the URLs that contain it. This is what my command line looks like:

scrapy crawl myfirst -a nombre="Vermont"

And this is my spider:

class myfirstSpider(CrawlSpider):
    name = 'myfirst'
    allowed_domains = ["leroymerlin.es"]
    start_urls = ["https://www.leroymerlin.es/decoracion-navidena/arboles-navidad?index=%s" % page_number
                  for page_number in range(2)]

    def __init__(self, nombre=None, *args, **kwargs):
        super(myfirstSpider, self).__init__(*args, **kwargs)
        rules = (
            Rule(LinkExtractor(allow=r'/fp/\*nombre*'), callback='parse_item'),
        )

    def parse_item(self, response):
        items = myfirstItem()
        items['product_name'] = response.css('.titleTechniqueSheet::text').extract()
        yield items

Unfortunately, it doesn't work... Any help is welcome, thanks!
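(For reference, part of why the rule above never matches: r'/fp/\*nombre*' is a literal regex string, so the nombre argument is never interpolated into it. A minimal sketch, independent of Scrapy, with a hypothetical interpolated pattern of my own:)

```python
import re

nombre = "vermont"

# The pattern from the question: 'nombre' here is literal text, not the variable
broken = r'/fp/\*nombre*'
# Hypothetical corrected pattern that actually uses the argument
interpolated = r'/fp/.*%s' % re.escape(nombre)

print(re.search(broken, "/fp/arbol-vermont"))        # no match (None)
print(re.search(interpolated, "/fp/arbol-vermont"))  # matches
```

There is also a second problem: rules is assigned to a local variable after super().__init__() has already run, so the spider never sees it.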

I found the way! This works for me:

class myfirstSpider(CrawlSpider):
    name = 'myfirst'
    allowed_domains = ["leroymerlin.es"]
    start_urls = ["https://www.leroymerlin.es/decoracion-navidena/arboles-navidad?index=%s" % page_number
                  for page_number in range(2)]

    def __init__(self, nombre=None, *args, **kwargs):
        # self.rules must be set *before* calling super().__init__(),
        # because CrawlSpider compiles the rules during initialization
        self.rules = (
            Rule(LinkExtractor(allow=nombre), callback='parse_item'),
        )
        super(myfirstSpider, self).__init__(*args, **kwargs)

    def parse_item(self, response):
        items = myfirstItem()
        items['product_name'] = response.css('.titleTechniqueSheet::text').extract()
        yield items

Command:

scrapy crawl myfirst -a nombre="vermont"
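Since allow= takes a regular expression, the same spider can also cover several search terms in one run by joining them into an alternation. A hypothetical helper sketch (the comma-separated input format is my own convention, not part of the question):

```python
import re

def build_allow(nombre):
    # Turn a comma-separated keyword list into one alternation regex,
    # e.g. "vermont,abeto" -> "vermont|abeto"
    return "|".join(re.escape(k.strip()) for k in nombre.split(","))

# In the spider's __init__ this could replace the single-keyword rule:
#   Rule(LinkExtractor(allow=build_allow(nombre)), callback='parse_item')
# Command: scrapy crawl myfirst -a nombre="vermont,abeto"
```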

Thanks, everyone!
