Question

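I have a .csv file that maps sample names to categories. I want to use it to concatenate (as with cat) the files in a folder whose names match the Sample_Name column of the .csv, grouped by category, so that each final file is named after its category.

The files to be merged are not .csv files; they are a kind of .fasta file.

The .csv looks like this (there will be more columns, which should be ignored):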

 Sample_Name     Category
 1               a
 2               a
 3               a
 4               b
 5               b

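After merging, the output should be two files: a (samples 1, 2 and 3 concatenated) and b (samples 4 and 5).

The idea is for this to work with a large number of files and categories.

Thanks for your help!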

Answer 1

Assuming the files are listed in order in the input CSV file, this is straightforward:

from operator import itemgetter

fields = itemgetter(0, 1)    # zero-based field numbers of the fields of interest
with open('sample_categories.csv') as csvfile:
    next(csvfile)     # skip over header line
    for line in csvfile:
        filename, category = fields(line.split())
        with open(filename) as infile, open(category, 'a') as outfile:
            outfile.write(infile.read())

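One drawback of this is that the output file is reopened for every input file. With a large number of files per category, that could be a problem. If it really does turn out to be a practical problem, you could try the following, which keeps the output file open for as long as there are input files in that category. Note that it opens each output file in 'w' mode, so it relies on all rows of a category being adjacent in the CSV.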

from operator import itemgetter

fields = itemgetter(0, 1)    # zero-based field numbers of the fields of interest
with open('sample_categories.csv') as csvfile:
    next(csvfile)     # skip over header line
    current_category = None
    outfile = None
    for line in csvfile:
        filename, category = fields(line.split())
        if category != current_category:
            if outfile is not None:
                outfile.close()
            outfile = open(category, 'w')
            current_category = category
        with open(filename) as infile:
            outfile.write(infile.read())
    if outfile is not None:
        outfile.close()    # close the last category's output file
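
If the rows are not guaranteed to be grouped by category, one option is to sort them first and then group them. The following is only a sketch under a few assumptions: the table is whitespace-separated as shown in the question, the first two columns are the sample name and the category, and `concat_by_category`, `in_dir` and `out_dir` are names made up for this illustration.

```python
from itertools import groupby
from operator import itemgetter
from pathlib import Path


def concat_by_category(table_path, in_dir=".", out_dir="."):
    """Concatenate sample files into one output file per category."""
    with open(table_path) as table:
        next(table)  # skip the header line
        # Keep only the first two columns and ignore blank lines.
        rows = [line.split()[:2] for line in table if line.strip()]

    # groupby only merges adjacent rows, so sort by category first.
    # The sort is stable, so samples keep their order within a category.
    rows.sort(key=itemgetter(1))
    for category, group in groupby(rows, key=itemgetter(1)):
        # Each output file is opened exactly once, in 'w' mode.
        with open(Path(out_dir) / category, "w") as outfile:
            for sample, _ in group:
                with open(Path(in_dir) / sample) as infile:
                    outfile.write(infile.read())
```

This reads the whole table into memory, which should be fine for a table of file names, and it keeps only one output file open at a time.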

Answer 2

I would build a dictionary whose keys are the categories and whose values are lists of the corresponding sample names.

d = {'a':['1','2','3'], 'b':['4','5']}

You can do this directly by reading the csv file and building the dictionary line by line, i.e.

d = {}
with open('myfile.csv') as csvfile:
    next(csvfile)                      # skip the header line
    for line in csvfile:
        samp, cat = line.split()[:2]   # first two columns; ignore the rest
        try:
            d[cat].append(samp)
        except KeyError:           # if there is no entry for cat, we will get a KeyError
            d[cat] = [samp]

For a more sophisticated approach, take a look at the collections module.
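
As an illustration of what collections offers here, `collections.defaultdict` removes the need for the try/except. This is a minimal sketch, assuming the same whitespace-separated layout as in the question; `read_categories` is a hypothetical helper name, and it accepts any iterable of lines (an open file works too).

```python
from collections import defaultdict


def read_categories(lines):
    """Map each category to the list of its sample names."""
    d = defaultdict(list)  # a missing key starts out as an empty list
    lines = iter(lines)
    next(lines)  # skip the header line
    for line in lines:
        if line.strip():  # ignore blank lines
            samp, cat = line.split()[:2]  # extra columns are ignored
            d[cat].append(samp)
    return d


table = ["Sample_Name Category", "1 a", "2 a", "3 a", "4 b", "5 b"]
d = read_categories(table)
print(dict(d))  # {'a': ['1', '2', '3'], 'b': ['4', '5']}
```

With a real file this would be called as `with open('myfile.csv') as f: d = read_categories(f)`.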

Once this dictionary is ready, you can create the new files one category at a time:

for cat in d:
    with open(cat, 'w') as outfile:
        for sample in d[cat]:
            # copy sample file content to outfile
            with open(sample) as infile:
                outfile.write(infile.read())

There are several ways to copy the contents of one file into another; see this thread.
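
One such way, shown here only as an example and not necessarily the one the linked thread recommends, is `shutil.copyfileobj` from the standard library, which copies in chunks instead of reading each sample file into memory at once. `append_file` is a hypothetical helper name, and the sketch assumes the output file was opened in binary mode.

```python
import shutil


def append_file(sample_path, outfile):
    """Append the contents of sample_path to the open binary file outfile."""
    with open(sample_path, "rb") as infile:
        shutil.copyfileobj(infile, outfile)  # chunked copy, low memory use
```

Inside the loop above this would be used as `append_file(sample, outfile)`, with the output opened via `open(cat, 'wb')`.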
