class AmazonSpider(scrapy.Spider):
    name = 'Amazon'
    start_urls = ['https://www.amazon.com/s?me=A39K8Q77DNOTN8&marketplaceID=ATVPDKIKX0DER']

    def parse(self, response):
        file_name = response.xpath('//title/text()').extract_first().replace(' @ Amazon.com: ','')
        #code

    def parse_more(self,response):
        #code
        yield item
pipelines.py
import csv
import datetime

class AmazonfullPipeline(object):
    def __init__(self):
        now = datetime.datetime.now()
        self.current_date = now.strftime("%d%b")
        self.file_name = "test" #file_name
        self.infile = open("{}_{}.csv".format(self.current_date,self.file_name),"w")
        # NOTE: csv.DictWriter also requires a fieldnames argument, which is not shown here
        self.dict_writer = csv.DictWriter(self.infile)
        self.dict_writer.writeheader()

    def process_item(self, item, spider):
        self.dict_writer.writerow(item)
        #return item
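How can I pass the file name taken from the response (in parse) to the pipeline's __init__?
(That is, file_name comes from parse, and I want it to be used as the file name in the pipeline.)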
Answer 0 (score: 0)
In the pipeline, you have to make __init__ accept an argument, for example:
Keyword argument (not mandatory)
class AmazonfullPipeline(object):
    def __init__(self,file_name=None):
        self.file_name = file_name
        ...
Positional argument (required)
class AmazonfullPipeline(object):
    def __init__(self,file_name):
        self.file_name = file_name
        ...
For example, import it in the other file:
import scrapy
# "filename" stands for the module that defines the pipeline
from filename import AmazonfullPipeline

class AmazonSpider(scrapy.Spider):
    name = 'Amazon'
    start_urls = ['https://www.amazon.com/s?me=A39K8Q77DNOTN8&marketplaceID=ATVPDKIKX0DER']

    def parse(self, response):
        file_name = response.xpath('//title/text()').extract_first().replace(' @ Amazon.com: ','')
        do_something = AmazonfullPipeline(file_name)
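One caveat worth checking: Scrapy builds the pipeline objects itself from the ITEM_PIPELINES setting, so an instance created inside parse is a separate object and is not the one that receives the yielded items. A minimal sketch of one alternative follows, assuming parse stores the name on the spider (for instance self.file_name = file_name) before any item is yielded. The class name LazyCsvPipeline, the spider attribute file_name, and the choice to take the CSV columns from the first item are all assumptions made for illustration; they do not come from the original post.

```python
import csv
import datetime


class LazyCsvPipeline:
    """Hypothetical pipeline that opens its CSV file when the first item arrives."""

    def open_spider(self, spider):
        # The file name is not known yet, so nothing is opened here.
        self.infile = None
        self.dict_writer = None

    def process_item(self, item, spider):
        if self.dict_writer is None:
            # Assumption: parse() has already set spider.file_name.
            file_name = getattr(spider, "file_name", "test")
            current_date = datetime.datetime.now().strftime("%d%b")
            self.infile = open(
                "{}_{}.csv".format(current_date, file_name), "w", newline=""
            )
            # DictWriter needs column names; here they are taken from the first item.
            self.dict_writer = csv.DictWriter(
                self.infile, fieldnames=list(item.keys())
            )
            self.dict_writer.writeheader()
        self.dict_writer.writerow(dict(item))
        # Returning the item lets any later pipeline see it as well.
        return item

    def close_spider(self, spider):
        # Close the file only if an item ever caused it to be opened.
        if self.infile is not None:
            self.infile.close()
```

With this shape the pipeline keeps a no-argument constructor, which is how Scrapy instantiates it, and the file name is read from the spider that is passed to process_item. The class would still need to be listed under ITEM_PIPELINES in settings.py.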