Question

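There are about 125 files in a directory on my Linux machine. I have a script called annotate.py that takes a single file and adds features to its columns. Right now I can put the name of one of the 125 files into the script and run annotate.py, but that is not efficient programming.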

All 125 files share the same format in terms of column names and number of columns. Can someone tell me how to run annotate.py on all 125 files?

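annotate.py merges two files on the chromosome and position columns. However, I would like input_file1 to be each of the 125 files, read in one at a time, and merged with input_file2. The output should be separate files, each named after the original input file 1.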

#!/usr/bin/python
#python snp_search.py  input_file1 input_file2
import numpy as np
import pandas as pd

snp_f=pd.read_table('input_file1.txt', sep="\t", header=None)#input_file1
snp_f.columns=['chr','pos']
lsnp_f=pd.read_table('input2_snpsearch.txt', sep="\t", header=0)#input_file2; header=0 means the first row holds the column names
lsnp_f.columns=['snpid','chr','pos']
final_snp=pd.merge(snp_f,lsnp_f, on=['chr','pos'])
final_snp.to_csv('input_file1_annotated.txt', index=False,sep='\t')

Please help! Thanks!

Answer 1

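The os module is your friend: http://docs.python.org/2/library/os.html. The basic idea is to import os and use os.listdir() to get a list of the files in the directory you are interested in. Something like the following should work.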

import numpy as np
import pandas as pd
import os


input_file2 = 'input2_snpsearch.txt' #same file name as in the question
input_dir = './' #or any other path
files = os.listdir(input_dir) #listdir will give the file names

#you probably don't want to merge your input_file2 with itself and
#in this case it's in the same directory as the other files so
#filter it out.
files_of_interest = (f for f in files if f != input_file2)

for f in files_of_interest:
    full_name = os.path.join(input_dir, f) #necessary if input_dir is not './'
    snp_f=pd.read_table(full_name, sep="\t", header=None)#input_file1
    snp_f.columns=['chr','pos']
    lsnp_f=pd.read_table(input_file2, sep="\t", header=0)#input_file2; header=0 means the first row holds the column names
    lsnp_f.columns=['snpid','chr','pos']
    final_snp=pd.merge(snp_f,lsnp_f, on=['chr','pos'])
    new_fname = f.split('.')[0] + '_annotated.txt'
    final_snp.to_csv(os.path.join(input_dir, new_fname), index=False,sep='\t')
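
As a variation on the loop above, here is a minimal sketch that wraps the same merge in a function. The function name annotate_directory and its pattern and suffix parameters are made up for illustration and are not part of the original script. It assumes the input files end in .txt, reads the reference table only once, and skips files whose names already end in _annotated so that re-running it in the same directory does not annotate its own output.

```python
import glob
import os

import pandas as pd


def annotate_directory(input_dir, reference_path, pattern="*.txt", suffix="_annotated"):
    """Merge every matching file in input_dir with the reference SNP table.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the list of output paths that were written.
    """
    # Read the reference table once instead of once per input file.
    reference = pd.read_csv(reference_path, sep="\t", header=0)
    reference.columns = ["snpid", "chr", "pos"]

    written = []
    for path in sorted(glob.glob(os.path.join(input_dir, pattern))):
        stem, ext = os.path.splitext(os.path.basename(path))
        # Skip the reference file itself and outputs from earlier runs.
        is_reference = os.path.abspath(path) == os.path.abspath(reference_path)
        if is_reference or stem.endswith(suffix):
            continue
        # Input files have no header row, so supply the column names directly.
        snps = pd.read_csv(path, sep="\t", header=None, names=["chr", "pos"])
        merged = pd.merge(snps, reference, on=["chr", "pos"])
        out_path = os.path.join(input_dir, stem + suffix + ext)
        merged.to_csv(out_path, index=False, sep="\t")
        written.append(out_path)
    return written
```

It could be called as annotate_directory('./', 'input2_snpsearch.txt'). Note that the merge only works if the chr and pos columns have the same data types in both files.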

Automatically loading the files in a directory into a Python script
