Merging files in subdirectories using an external table

Date: 2016-03-03 10:33:29

Tags: python csv merge

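I have a folder (split_libs) whose subfolders are named after the sample_name values in SraRunTable3.txt. Columns 9 and 32 of that table hold sample_name and sra_study, so each subfolder is associated with an sra_study. Each subfolder contains a seqs.fna file; unfortunately I cannot change that name, because it is the output of a QIIME command.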

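I want to merge the seqs.fna files in the subfolders by sra_study, using each subfolder name (= sample_name) to look up its study. For example, all seqs.fna files that come from the same SRA study should be merged into one file.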

Example overview of the directory:

split_libs
    sample1
      seqs.fna
    sample2
      seqs.fna
    sample3
      seqs.fna

Example overview of SraRunTable:

(...)Sample_Name(...)SRA_Study(...)
     sample_1        study_1
     sample_2        study_1 
     sample_3        study_2
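The grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the table is taken to be tab-delimited with a single header row, the header names are taken to be exactly Sample_Name and SRA_Study, and the function name samples_by_study is hypothetical. Looking columns up by header name avoids depending on whether positions 9 and 32 are counted from 0 or from 1.

```python
import csv
from collections import defaultdict


def samples_by_study(table_path, sample_col="Sample_Name", study_col="SRA_Study"):
    """Group sample names by study, looking columns up by header name.

    Hypothetical helper: the column names and the tab delimiter are
    assumptions about the layout of the run table.
    """
    groups = defaultdict(list)
    with open(table_path, newline="") as handle:
        # DictReader uses the first row as the header and yields one dict per row.
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            groups[row[study_col]].append(row[sample_col])
    return dict(groups)
```

Each key of the returned dictionary is a study, and its value is the list of samples whose seqs.fna files belong in that study's merged file.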

This is what I have tried so far:

import os
from operator import itemgetter

fields = itemgetter(9, 32)

with open('/home/andre/Desktop/PRJEB0000/SraRunTable3.txt') as csvfile:
    next(csvfile)
    for line in csvfile:
        sample_name, sra_study = fields(line.split())
    for folder in os.listdir('./split_libs'):
        if folder == sample_name:
            with open('seqs.fna') as infile, open('/home/andre/Desktop/PRJEB0000/cat_fna/' + sra_study + ".fna", 'a') as outfile:
                outfile.write(infile.read())

This question is a spin-off of "Joining files by corresponding columns in outside table".

Any contribution would be greatly appreciated!

1 answer:

Answer 0 (score: 0)

import os
from operator import itemgetter

fields = itemgetter(9, 32)

with open('/home/andre/Desktop/PRJEB0000/SraRunTable3.txt') as csvfile:
    next(csvfile)
    for line in csvfile:
        sample_name, sra_study = fields(line.split())
        #open the folder corresponding to sample_name and add the seqs to the appropriate study file
        with open('split_libs/' + sample_name + '/seqs.fna') as infile, open('/home/andre/Desktop/PRJEB0000/cat_fna/' + sra_study + ".fna", 'a') as outfile:
            outfile.write(infile.read())

All credits to Amanda Clare (not registered on Stackoverflow)!
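
A slightly more defensive variant of the same idea, as a sketch rather than a definitive implementation. It assumes the sample-to-study pairs have already been read into a dictionary, and the function name merge_by_study and its parameters are hypothetical. It skips samples that have no seqs.fna on disk instead of raising, and it copies in chunks so that large files are not read into memory in one go.

```python
import os
import shutil


def merge_by_study(sample_to_study, split_dir, out_dir, fasta_name="seqs.fna"):
    """Append each sample's FASTA file to the output file of its study.

    Hypothetical helper. Returns the list of sample names whose input
    file was not found under split_dir.
    """
    os.makedirs(out_dir, exist_ok=True)
    missing = []
    for sample, study in sample_to_study.items():
        src = os.path.join(split_dir, sample, fasta_name)
        if not os.path.isfile(src):
            # The table lists a sample that has no folder or no seqs.fna.
            missing.append(sample)
            continue
        dst = os.path.join(out_dir, study + ".fna")
        # Append in binary mode; copyfileobj streams the data in chunks.
        with open(src, "rb") as infile, open(dst, "ab") as outfile:
            shutil.copyfileobj(infile, outfile)
    return missing
```

Because the output files are opened in append mode, as in the answer above, running the merge twice duplicates the sequences, so the output directory should be emptied before a rerun.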