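I have a folder (split_libs) whose subfolders are named after the sample_name values listed in SraRunTable3.txt. The sample_name and sra_study fields are taken from columns 9 and 32 of that table, so each subfolder is associated with one sra_study. Each subfolder contains a file called seqs.fna; unfortunately I cannot change that name, because it is the output of a QIIME command.
I want to merge the seqs.fna files from the subfolders by sra_study, using the subfolder name (= sample_name) to look up the study. In other words, all seqs.fna files from the same SRA study should be concatenated into one file.
Example overview of the directory: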
split_libs
    sample1
        seqs.fna
    sample2
        seqs.fna
    sample3
        seqs.fna
Example overview of the SraRunTable:
(...)Sample_Name(...)SRA_Study(...)
sample_1 study_1
sample_2 study_1
sample_3 study_2
This is what I have tried so far:
import os
from operator import itemgetter
fields = itemgetter(9, 32)
with open('/home/andre/Desktop/PRJEB0000/SraRunTable3.txt') as csvfile:
    next(csvfile)
    for line in csvfile:
        sample_name, sra_study = fields(line.split())
        for folder in os.listdir('./split_libs'):
            if folder == sample_name:
                with open('seqs.fna') as infile, open('/home/andre/Desktop/PRJEB0000/cat_fna/' + sra_study + ".fna", 'a') as outfile:
                    outfile.write(infile.read())
This question was split off from "Joining files by corresponding columns in outside table".
Any contribution would be greatly appreciated!
Answer 0 (score: 0)
import os
from operator import itemgetter
fields = itemgetter(9, 32)
with open('/home/andre/Desktop/PRJEB0000/SraRunTable3.txt') as csvfile:
    next(csvfile)
    for line in csvfile:
        sample_name, sra_study = fields(line.split())
        # open the folder corresponding to sample_name and add the seqs to the appropriate study file
        with open('split_libs/' + sample_name + '/seqs.fna') as infile, open('/home/andre/Desktop/PRJEB0000/cat_fna/' + sra_study + ".fna", 'a') as outfile:
            outfile.write(infile.read())
All credits to Amanda Clare (not registered on Stackoverflow)!
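For what it's worth, here is a slightly more defensive variant of the same idea, as a rough sketch rather than a tested replacement. The function name merge_by_study and its parameters are made up for illustration. It keeps the same line.split() parsing and the same 9/32 indices as the code above, so it assumes no field in the table contains whitespace or is empty. It skips samples whose seqs.fna does not exist instead of crashing, and returns their names.

```python
import shutil
from pathlib import Path


def merge_by_study(table_path, split_libs_dir, out_dir,
                   sample_col=9, study_col=32):
    """Append each sample's seqs.fna to <out_dir>/<sra_study>.fna.

    Returns the sample names for which no seqs.fna was found.
    (Hypothetical helper; column indices follow the code above.)
    """
    split_libs_dir = Path(split_libs_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    missing = []

    with open(table_path) as table:
        next(table)  # skip the header row
        for line in table:
            cols = line.split()
            if len(cols) <= max(sample_col, study_col):
                continue  # blank or too-short line
            sample_name, sra_study = cols[sample_col], cols[study_col]
            src = split_libs_dir / sample_name / "seqs.fna"
            if not src.is_file():
                missing.append(sample_name)
                continue
            # Append mode: every sample of a study ends up in the same file.
            with open(src) as infile, \
                    open(out_dir / f"{sra_study}.fna", "a") as outfile:
                shutil.copyfileobj(infile, outfile)
    return missing
```

With the paths from the question this would be called as merge_by_study('/home/andre/Desktop/PRJEB0000/SraRunTable3.txt', 'split_libs', '/home/andre/Desktop/PRJEB0000/cat_fna'). Because the output files are opened in append mode, running it twice duplicates the sequences, so clear cat_fna before re-running.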