Python: run a script in all subfolders

Date: 2017-03-24 16:21:36

Tags: python

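I have a Python script that works fine if I run it in the folder that contains the files it needs. I want to change the script so that it goes into every subfolder, reads that subfolder's files, and writes the output file into that same subfolder. I have read about os.walk and similar functions, but I don't understand how to change my script so that os.walk works with it. Please help. The script is as follows: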

import csv
from collections import defaultdict
from itertools import chain

# Column 2 -> integer count from column 1
d1 = {}
with open('genes.gff.genespercontig.csv', 'r') as f:
    for line in f:
        tok = line.split()
        d1[tok[1]] = int(float(tok[0]))

d2 = {}
with open('hmmer.analyze.txt.result.txt', 'r') as f2:
    for line in f2:
        tak = line.split()
        d2[tak[1]] = int(float(tak[0]))

# Collect the values from both files under each key
d3 = defaultdict(list)
for k, v in chain(d1.items(), d2.items()):
    d3[k].append(v)

with open('output_contigsvsgenes.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for k, v in d3.items():
        writer.writerow([k] + v)
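To make clear what the merge step writes, here is the same chain/defaultdict logic run on small hypothetical dictionaries (the contig names and counts are invented, not taken from the real files). A key found in both files gets two values; a key found in only one file gets a single value.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample data standing in for the two parsed input files:
# contig name -> count
d1 = {"contig1": 3, "contig2": 5}   # as if read from genes.gff.genespercontig.csv
d2 = {"contig1": 1, "contig3": 2}   # as if read from hmmer.analyze.txt.result.txt

d3 = defaultdict(list)
for k, v in chain(d1.items(), d2.items()):
    d3[k].append(v)  # values from d1 come first, then values from d2

# The rows the script would hand to csv.writer
rows = [[k] + v for k, v in d3.items()]
print(rows)
```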

1 Answer

Answer (score: 3)

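If you know that every file you are trying to read is present in every directory of the tree (including the top-level directory you start from), you can simply wrap your current script in an os.walk loop: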

import csv
import os
from collections import defaultdict
from itertools import chain

for root, dirs, files in os.walk('.'):
    # root is the directory currently being visited; build every path from it
    d1 = {}
    with open(os.path.join(root, 'genes.gff.genespercontig.csv'), 'r') as f:
        for line in f:
            tok = line.split()
            d1[tok[1]] = int(float(tok[0]))

    d2 = {}
    with open(os.path.join(root, 'hmmer.analyze.txt.result.txt'), 'r') as f2:
        for line in f2:
            tak = line.split()
            d2[tak[1]] = int(float(tak[0]))

    d3 = defaultdict(list)
    for k, v in chain(d1.items(), d2.items()):
        d3[k].append(v)

    with open(os.path.join(root, 'output_contigsvsgenes.csv'), 'w', newline='') as f:
        writer = csv.writer(f)
        for k, v in d3.items():
            writer.writerow([k] + v)
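It may help to see exactly what os.walk hands you: it yields a (root, dirs, files) tuple for the starting directory itself and then for every subdirectory beneath it. The throwaway tree below uses hypothetical folder names. Because every folder is visited, the loop above raises FileNotFoundError in any folder, including `.`, that lacks the input files.

```python
import os
import tempfile

# Build a small temporary tree (hypothetical folder names) and record
# what os.walk yields for each directory it visits.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sample_a"))
    os.makedirs(os.path.join(top, "sample_b", "nested"))
    # Only sample_a gets an input file
    open(os.path.join(top, "sample_a", "genes.gff.genespercontig.csv"), "w").close()

    visited = []
    for root, dirs, files in os.walk(top):
        rel = os.path.relpath(root, top)  # "." for the starting directory itself
        visited.append((rel, sorted(files)))
        print(rel, sorted(files))
```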

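Otherwise, you need to guard against directories where the files you are looking for may not exist. Since you seem to need values from both files to create the output, it is probably fine to wrap the whole thing in a try block: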

import csv
import os
import traceback
from collections import defaultdict
from itertools import chain

for root, dirs, files in os.walk('.'):
    try:
        d1 = {}
        with open(os.path.join(root, 'genes.gff.genespercontig.csv'), 'r') as f:
            for line in f:
                tok = line.split()
                d1[tok[1]] = int(float(tok[0]))

        d2 = {}
        with open(os.path.join(root, 'hmmer.analyze.txt.result.txt'), 'r') as f2:
            for line in f2:
                tak = line.split()
                d2[tak[1]] = int(float(tak[0]))

        d3 = defaultdict(list)
        for k, v in chain(d1.items(), d2.items()):
            d3[k].append(v)

        with open(os.path.join(root, 'output_contigsvsgenes.csv'), 'w', newline='') as f:
            writer = csv.writer(f)
            for k, v in d3.items():
                writer.writerow([k] + v)
    except Exception:
        # Report the problem (e.g. a missing input file) and move on to the next folder
        print(traceback.format_exc())
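Instead of catching the exception, you could also check the `files` list that os.walk already provides and skip any folder that is missing either input. This is a minimal sketch assuming both files are required; the helper names `read_counts` and `process_tree` are hypothetical, not part of the original script.

```python
import csv
import os
from collections import defaultdict
from itertools import chain

GENES = "genes.gff.genespercontig.csv"
HMMER = "hmmer.analyze.txt.result.txt"
OUTPUT = "output_contigsvsgenes.csv"


def read_counts(path):
    """Map column 2 -> int(float(column 1)) for each whitespace-separated line."""
    counts = {}
    with open(path) as f:
        for line in f:
            tok = line.split()
            counts[tok[1]] = int(float(tok[0]))
    return counts


def process_tree(top="."):
    """Write OUTPUT into every folder under top that has both inputs; return those folders."""
    processed = []
    for root, dirs, files in os.walk(top):
        if GENES not in files or HMMER not in files:
            continue  # inputs missing here, so skip this folder
        d3 = defaultdict(list)
        for k, v in chain(read_counts(os.path.join(root, GENES)).items(),
                          read_counts(os.path.join(root, HMMER)).items()):
            d3[k].append(v)
        with open(os.path.join(root, OUTPUT), "w", newline="") as f:
            writer = csv.writer(f)
            for k, v in d3.items():
                writer.writerow([k] + v)
        processed.append(root)
    return processed
```

Calling `process_tree('.')` from the top folder would process each qualifying subfolder. Checking `files` up front keeps genuine errors, such as a malformed line, from being silently mixed up with "file not here".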

If you want to handle each file separately, you can modify the code above to cover the case where one file exists but the other does not.
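One way to do that, sketched under the assumption that a missing file should simply contribute no values: read each file only if it exists, treat a missing one as an empty dictionary, and merge as before. The helpers `read_counts_if_present` and `merge_folder` are hypothetical names for illustration.

```python
import os
from collections import defaultdict
from itertools import chain


def read_counts_if_present(path):
    """Return {name: count} read from path, or {} if the file does not exist."""
    if not os.path.isfile(path):
        return {}
    counts = {}
    with open(path) as f:
        for line in f:
            tok = line.split()
            counts[tok[1]] = int(float(tok[0]))
    return counts


def merge_folder(root):
    """Merge whichever of the two input files exist in root; None if there is nothing to merge."""
    d1 = read_counts_if_present(os.path.join(root, "genes.gff.genespercontig.csv"))
    d2 = read_counts_if_present(os.path.join(root, "hmmer.analyze.txt.result.txt"))
    if not d1 and not d2:
        return None  # neither file present (or both empty): the caller can skip this folder
    d3 = defaultdict(list)
    for k, v in chain(d1.items(), d2.items()):
        d3[k].append(v)
    return dict(d3)
```

Note that with this merge a row holding a single value does not record which file it came from. If that matters, writing fixed columns such as `[k, d1.get(k, ''), d2.get(k, '')]` would be a straightforward variation.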