Passing a value from parse to the pipeline's __init__

Time: 2019-06-15 23:43:43

Tags: python scrapy

class AmazonSpider(scrapy.Spider):
    name = 'Amazon'
    start_urls = ['https://www.amazon.com/s?me=A39K8Q77DNOTN8&marketplaceID=ATVPDKIKX0DER']
    def parse(self, response):
        file_name = response.xpath('//title/text()').extract_first().replace(' @ Amazon.com: ','')
        #code

    def parse_more(self,response):
         #code
         yield item

pipeline.py

    def __init__(self):
        now = datetime.datetime.now()
        self.current_date = now.strftime("%d%b")
        self.file_name = "test" #file_name
        self.infile = open("{}_{}.csv".format(self.current_date,self.file_name),"w")
        self.dict_writer = csv.DictWriter(self.infile)
        self.dict_writer.writeheader()


    def process_item(self, item, spider):
        self.dict_writer.writerow(item)
        #return item

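How can I pass the file name taken from the response (in parse) to the pipeline's __init__? That is, file_name comes from parse, and I want it to be used as the file name in the pipeline.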

1 answer:

Answer 0 (score: 0)

In the pipeline, you have to make __init__ accept an argument, for example:

Keyword argument (optional):

class AmazonfullPipeline(object):
    def __init__(self,file_name=None):
        self.file_name = file_name
        ...

Positional argument (required):

class AmazonfullPipeline(object):
    def __init__(self,file_name):
        self.file_name = file_name
        ...
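
The difference between the two signatures can be checked without Scrapy. The sketch below uses two stand-in classes, OptionalArgPipeline and RequiredArgPipeline, and a helper try_construct; all three names are hypothetical and only mirror the constructors shown above.

```python
class OptionalArgPipeline(object):
    """Stand-in for the keyword-argument variant (hypothetical name)."""

    def __init__(self, file_name=None):
        self.file_name = file_name


class RequiredArgPipeline(object):
    """Stand-in for the required-argument variant (hypothetical name)."""

    def __init__(self, file_name):
        self.file_name = file_name


def try_construct(cls, *args):
    """Return the instance's file_name, or the name of the error raised."""
    try:
        return cls(*args).file_name
    except TypeError as exc:
        # Raised when a required positional argument is missing.
        return type(exc).__name__


# Constructing without an argument only works for the optional variant.
without_arg_optional = try_construct(OptionalArgPipeline)
without_arg_required = try_construct(RequiredArgPipeline)
with_arg_required = try_construct(RequiredArgPipeline, "shop")
```

This matters because Scrapy normally builds pipeline objects itself: it uses a from_crawler class method when one is defined and otherwise calls the class with no arguments, so the required-argument variant would likely fail at startup unless from_crawler supplies the value.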

For example, import it in the other file:

from filename import AmazonfullPipeline

class AmazonSpider(scrapy.Spider):
    name = 'Amazon'
    start_urls = ['https://www.amazon.com/s?me=A39K8Q77DNOTN8&marketplaceID=ATVPDKIKX0DER']
    def parse(self, response):
        file_name = response.xpath('//title/text()').extract_first().replace(' @ Amazon.com: ','')
        do_something =  AmazonfullPipeline(file_name)
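
One caveat: Scrapy creates the pipeline objects listed in the ITEM_PIPELINES setting on its own, so an instance built by hand inside parse is a separate object that never receives items. A possible alternative, sketched here under assumptions, is to keep __init__ argument-free and open the file lazily on the first item. The class name LazyCsvPipeline is hypothetical, the sketch assumes Python 3, assumes that parse stores the value on the spider (for example self.file_name = file_name), and assumes items are dicts or dict-like.

```python
import csv
import datetime


class LazyCsvPipeline(object):
    """Hypothetical pipeline that opens its CSV file on the first item."""

    def __init__(self):
        # Nothing is opened yet: the file name is not known at this point.
        self.infile = None
        self.dict_writer = None

    def process_item(self, item, spider):
        if self.dict_writer is None:
            # Assumption: parse() has already set spider.file_name.
            # "output" is an arbitrary fallback chosen for this sketch.
            file_name = getattr(spider, "file_name", "output")
            current_date = datetime.datetime.now().strftime("%d%b")
            self.infile = open(
                "{}_{}.csv".format(current_date, file_name), "w", newline=""
            )
            # csv.DictWriter requires fieldnames; take them from the first item.
            self.dict_writer = csv.DictWriter(
                self.infile, fieldnames=list(item.keys())
            )
            self.dict_writer.writeheader()
        self.dict_writer.writerow(item)
        # Return the item so that later pipelines still receive it.
        return item

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes; flush and close the file.
        if self.infile is not None:
            self.infile.close()
```

Because process_item receives the spider on every call, the value computed in parse is available by the time the first item arrives, which is the earliest moment the file name is actually needed.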