Concatenate files according to the corresponding column in an external table

Time: 2016-03-01 09:30:37

Tags: python csv merge

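I have a .csv file that maps sample names to categories. I want to use it to concatenate (as with cat) the files in a folder whose names match the Sample_Name column of the .csv, grouping them by category and naming each final file after its category.

The files to be merged are not .csv files; they are a kind of .fasta file.

The .csv looks like this (it will have more columns, which should be ignored):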

 Sample_Name     Category
 1               a
 2               a
 3               a
 4               b
 5               b

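After merging, the output should be two files: a (samples 1, 2 and 3 concatenated) and b (samples 4 and 5).

The idea is for this to work with a large number of files and categories.

Thanks for your help!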

2 Answers:

Answer 0: (score: 1)

Assuming the files are listed in order in the input CSV file, this is simple:

from operator import itemgetter

fields = itemgetter(0, 1)    # zero-based field numbers of the fields of interest
with open('sample_categories.csv') as csvfile:
    next(csvfile)     # skip over header line
    for line in csvfile:
        filename, category = fields(line.split())
        with open(filename) as infile, open(category, 'a') as outfile:
            outfile.write(infile.read())

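One drawback is that the output file is reopened for every input file, which could be a problem if each category has a large number of files. If that turns out to be a real problem, you can try the following instead. It keeps the output file open for as long as there are input files in the current category, so it relies on the rows being grouped by category.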

from operator import itemgetter

fields = itemgetter(0, 1)    # zero-based field numbers of the fields of interest
with open('sample_categories.csv') as csvfile:
    next(csvfile)     # skip over header line
    current_category = None
    outfile = None
    for line in csvfile:
        filename, category = fields(line.split())
        if category != current_category:
            if outfile is not None:
                outfile.close()
            outfile = open(category, 'w')
            current_category = category
        with open(filename) as infile:
            outfile.write(infile.read())
    if outfile is not None:
        outfile.close()    # close the last category's output file
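
If the rows are not grouped by category, one possible variation is to keep one output file open per category and close them all at the end. The sketch below is only an illustration under stated assumptions, not part of the answer above: the function name `merge_by_category` and its parameters are hypothetical, the table is assumed to be whitespace-separated with a header line, and `contextlib.ExitStack` from the standard library is used to close every output file.

```python
from contextlib import ExitStack
import os


def merge_by_category(table_path, in_dir=".", out_dir="."):
    """Append each sample file to the output file named after its category.

    Hypothetical helper: assumes a whitespace-separated table with a header
    line, the sample name in column 0 and the category in column 1.
    """
    with ExitStack() as stack, open(table_path) as table:
        next(table, None)                # skip the header line
        outputs = {}                     # category -> open output file
        for line in table:
            parts = line.split()
            if len(parts) < 2:           # skip blank or malformed lines
                continue
            sample, category = parts[0], parts[1]
            if category not in outputs:
                # Opened once per category; ExitStack closes it on exit.
                out_path = os.path.join(out_dir, category)
                outputs[category] = stack.enter_context(open(out_path, "w"))
            with open(os.path.join(in_dir, sample)) as infile:
                outputs[category].write(infile.read())
```

Keeping every category's file open trades file reopening for open handles, so with a very large number of categories this could run into the operating system's limit on open files.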

Answer 1: (score: 0)

I would build a dictionary whose keys are the categories and whose values are lists of the corresponding sample names.

d = {'a':['1','2','3'], 'b':['4','5']}

You can build it directly by reading the csv file line by line, i.e.

d = {}
with open('myfile.csv') as csvfile:
    next(csvfile)                  # skip the header line
    for line in csvfile:
        samp, cat = line.split()[:2]   # ignore any extra columns
        try:
            d[cat].append(samp)
        except KeyError:           # if there is no entry for cat, we will get a KeyError
            d[cat] = [samp]

For a more sophisticated approach, take a look at collections
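
The collections module includes `defaultdict`, which removes the need for the try/except. A minimal sketch, assuming a whitespace-separated table with a header line; the function name `group_samples` is hypothetical and used only for illustration:

```python
from collections import defaultdict


def group_samples(lines):
    """Map each category to the list of its sample names.

    Hypothetical helper: `lines` is any iterable of whitespace-separated
    rows, header first, sample name in column 0 and category in column 1.
    """
    groups = defaultdict(list)           # missing keys start as an empty list
    rows = iter(lines)
    next(rows, None)                     # skip the header line
    for line in rows:
        parts = line.split()
        if len(parts) >= 2:              # ignore blank or malformed lines
            groups[parts[1]].append(parts[0])
    return dict(groups)


table = ["Sample_Name Category", "1 a", "2 a", "3 a", "4 b", "5 b"]
print(group_samples(table))  # {'a': ['1', '2', '3'], 'b': ['4', '5']}
```

Because an open file is also an iterable of lines, the same function should work when passed the file object directly.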

Once this dictionary is ready, you can create the new files one category at a time:

for cat in d:
    with open(cat, 'w') as outfile:
        for sample in d[cat]:
            # copy sample file content to outfile
            with open(sample) as infile:
                outfile.write(infile.read())

Copying the contents of one file into another can be done in several ways; see this thread
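
One of those ways is `shutil.copyfileobj` from the standard library, which copies in chunks rather than reading each input file into memory at once. This is a rough sketch rather than the method from the linked thread; the function name `concatenate` and its parameters are hypothetical:

```python
import shutil


def concatenate(sample_paths, out_path):
    """Write the contents of each file in sample_paths to out_path, in order."""
    # Binary mode copies the bytes unchanged, whatever the text encoding is.
    with open(out_path, "wb") as outfile:
        for path in sample_paths:
            with open(path, "rb") as infile:
                # Copies in chunks, so large files are not loaded whole.
                shutil.copyfileobj(infile, outfile)
```

With the dictionary d from above, this could be called as `concatenate(d[cat], cat)` for each category.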