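I have a .csv file that maps sample names to categories. I want to use it to merge (as with cat) any files in a folder whose names match the Sample_Name column of the .csv, grouping them by category and naming each final file after its category.
The files to be merged are not .csv files; they are .fasta files.
The .csv looks like this (there will be more columns, which should be ignored):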
Sample_Name Category
1 a
2 a
3 a
4 b
5 b
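After merging, the output should be two files: a (samples 1, 2, and 3 merged) and b (samples 4 and 5).
The idea is for this to work with a large number of files and categories.
Thanks for your help!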
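Answer 0 (score: 1)
Assuming the files are listed in order in the input CSV file (that is, grouped by category), this is straightforward: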
    from operator import itemgetter
    fields = itemgetter(0, 1)  # zero-based field numbers of the fields of interest
    with open('sample_categories.csv') as csvfile:
        next(csvfile)  # skip over header line
        for line in csvfile:
            filename, category = fields(line.split())
            with open(filename) as infile, open(category, 'a') as outfile:
                outfile.write(infile.read())
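One drawback of this is that the output file is reopened for every input file. If there are many files per category, that could be a problem. If it does turn out to be a real problem, you could try the following, which keeps the output file open for as long as there are input files in that category.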
    from operator import itemgetter
    fields = itemgetter(0, 1)  # zero-based field numbers of the fields of interest
    with open('sample_categories.csv') as csvfile:
        next(csvfile)  # skip over header line
        current_category = None
        outfile = None
        for line in csvfile:
            filename, category = fields(line.split())
            if category != current_category:
                # Category changed: close the previous output and start a new one.
                if outfile is not None:
                    outfile.close()
                outfile = open(category, 'w')
                current_category = category
            with open(filename) as infile:
                outfile.write(infile.read())
        # Close the output file of the last category.
        if outfile is not None:
            outfile.close()
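If the rows cannot be assumed to be grouped by category, one possible variation is to keep one open output file per category in a dictionary. The following is only a sketch under the same assumptions as the code above (whitespace-separated columns, a header line, files in the current directory); the function name `merge_unordered` is made up for illustration, and `contextlib.ExitStack` is used so that every output file is closed at the end.

```python
from contextlib import ExitStack


def merge_unordered(table_path):
    """Merge sample files into one output file per category, in any row order."""
    outputs = {}  # category -> open output file
    with ExitStack() as stack, open(table_path) as table:
        next(table)  # skip over header line
        for line in table:
            if not line.strip():
                continue  # ignore blank lines
            filename, category = line.split()[:2]
            if category not in outputs:
                # Opened once per category; ExitStack closes them all on exit.
                outputs[category] = stack.enter_context(open(category, "w"))
            with open(filename) as infile:
                outputs[category].write(infile.read())
    return sorted(outputs)
```

Note that this holds one file handle open per category, so with a very large number of categories it could run into the operating system's limit on open files.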
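Answer 1 (score: 0)
I would build a dictionary whose keys are the categories and whose values are lists of the corresponding sample names.
    d = {'a':['1','2','3'], 'b':['4','5']}
You can do this directly by reading the csv file and building the dictionary line by line, i.e.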
    d = {}
    with open('myfile.csv') as myfile:
        next(myfile)  # skip the header line
        for line in myfile:
            samp, cat = line.split()[:2]  # ignore any extra columns
            try:
                d[cat].append(samp)
            except KeyError:  # if there is no entry for cat, we will get a KeyError
                d[cat] = [samp]
For a more sophisticated approach, have a look at the collections module.
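As a rough illustration of what that could look like, here is a sketch using `collections.defaultdict`, which removes the need for the try/except. It also reads the file with `csv.DictReader`, which assumes the file really is comma-separated and has `Sample_Name` and `Category` header columns as in the question; the function name `samples_by_category` is hypothetical.

```python
import csv
from collections import defaultdict


def samples_by_category(csv_path):
    """Return {category: [sample names]} from a comma-separated file with a header."""
    groups = defaultdict(list)  # missing categories start as an empty list
    with open(csv_path, newline="") as csvfile:
        for row in csv.DictReader(csvfile):
            # Only these two columns are read; any others are ignored.
            groups[row["Category"]].append(row["Sample_Name"])
    return dict(groups)
```

Because the columns are looked up by name, extra columns in the file do not matter, and neither does the order of the rows.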
Once this dictionary is ready, you can create the new files, one category at a time:
    for cat in d:
        with open(cat, 'w') as outfile:
            for sample in d[cat]:
                # copy sample file content to outfile
                with open(sample) as infile:
                    outfile.write(infile.read())
Copying the contents of one file into another can be done in several ways; see this thread.
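One of those ways, sketched here as an example rather than as the recommended method, is `shutil.copyfileobj` from the standard library, which copies in chunks instead of reading each sample file into memory at once. The function name `write_categories` is made up, and it assumes a dictionary shaped like `d` above, with the sample files in the current directory.

```python
import shutil


def write_categories(groups):
    """Write one merged file per category from a {category: [sample names]} dict."""
    for category, samples in groups.items():
        with open(category, "wb") as outfile:
            for sample in samples:
                with open(sample, "rb") as infile:
                    # Copies in chunks, so large files are not loaded into memory.
                    shutil.copyfileobj(infile, outfile)
```

Calling `write_categories(d)` would then produce the files a and b. The files are concatenated byte for byte, so if a sample file does not end with a newline, its last sequence line will run into the header of the next file.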