When I start my Scrapy spider, how can I create a set of files:
year1.csv
year2.csv
year3.csv
If a file already exists and has content in it, it should also be cleared.
Then, during parsing, export to each file depending on the Scrapy result:
def parse(self, response):
    if response.css('#Contact1'):
        yield {
            'Name': response.css('#ContactName1 a::text').extract_first()
        }
    if response.css('#Contact1').extract_first() == "1":
        export to year1.csv
    if response.css('#Contact1').extract_first() == "2":
        export to year2.csv
    if response.css('#Contact1').extract_first() == "3":
        export to year3.csv
Answer 0 (score: 0)
You can use an item pipeline to do this. Here is the official documentation: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
Here is how I would go about it. I would create a different item for each of the different files.
items.py
import scrapy

class Year1Item(scrapy.Item):
    name = scrapy.Field()

class Year2Item(scrapy.Item):
    name = scrapy.Field()

class Year3Item(scrapy.Item):
    name = scrapy.Field()
Then in your spider file you can do this:
from ..items import Year1Item, Year2Item, Year3Item  # import path depends on your project layout

def parse(self, response):
    if response.css('#Contact1'):
        if response.css('#Contact1').extract_first() == "1":
            item = Year1Item()
        if response.css('#Contact1').extract_first() == "2":
            item = Year2Item()
        if response.css('#Contact1').extract_first() == "3":
            item = Year3Item()
        item['name'] = response.css('#ContactName1 a::text').extract_first()
        return item
Then in your pipelines.py file:
def process_item(self, item, spider):
    if isinstance(item, Year1Item):
        export to year1.csv
    if isinstance(item, Year2Item):
        export to year2.csv
    if isinstance(item, Year3Item):
        export to year3.csv
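If you use Scrapy's built-in CsvItemExporter for those "export to ..." steps, process_item could look roughly like this. This is only a sketch: it assumes a self.exporters dict, keyed by item class, that is created in open_spider (sketched further down).

def process_item(self, item, spider):
    # route the item to the CSV exporter registered for its class
    for item_class, exporter in self.exporters.items():
        if isinstance(item, item_class):
            exporter.export_item(item)
    return item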
In your pipelines file you can also have a function that runs when the spider opens:
def open_spider(self, spider):
    # maybe here you could use Python to check if the files already exist and delete them if they do
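Fleshing that out, here is a rough sketch using CsvItemExporter. Opening each file in 'wb' mode creates it if it is missing and truncates it if it already has content, which covers the "clear on start" part of the question. The self.exporters dict is the one used in the process_item sketch above; the myproject import path is a placeholder for your actual project module.

from scrapy.exporters import CsvItemExporter
from myproject.items import Year1Item, Year2Item, Year3Item  # "myproject" is a placeholder

def open_spider(self, spider):
    # 'wb' creates each file if it does not exist and empties it if it does
    self.files = {
        Year1Item: open('year1.csv', 'wb'),
        Year2Item: open('year2.csv', 'wb'),
        Year3Item: open('year3.csv', 'wb'),
    }
    # one CSV exporter per item class, each writing into its own file
    self.exporters = {cls: CsvItemExporter(f) for cls, f in self.files.items()}
    for exporter in self.exporters.values():
        exporter.start_exporting()

def close_spider(self, spider):
    # flush the CSV output and close the file handles when the spider stops
    for exporter in self.exporters.values():
        exporter.finish_exporting()
    for f in self.files.values():
        f.close()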