Writing multiple files based on the elements of an input file

Date: 2016-07-25 12:16:28

Tags: python file

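I need to read a file in BED format, which contains coordinates for every chromosome (chr) in the genome, and write its lines into a separate file for each chr. I tried the approach below, but it doesn't work: it doesn't create any files. Any ideas why this happens, or an alternative way to solve this?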

import sys

def make_out_file(dir_path, chr_name, extension):

    file_name = dir_path + "/" + chr_name + extension
    out_file = open(file_name, "w")
    out_file.close()
    return file_name

def append_output_file(line, out_file):

    with open(out_file, "a") as f:
        f.write(line)
    f.close()

in_name = sys.argv[1]
dir_path = sys.argv[2]

with open(in_name, "r") as in_file:

    file_content = in_file.readlines()
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in file_content[:0]:
        line_count += 1
        elems = line.split("\t")
        chr_name = elems[0]
        chr_dict[chr_name] += 1
        if chr_dict.get(chr_name) = 1:
            out_file = make_out_file(dir_path, chr_name, ".bed")
            out_file_dict[chr_name] = out_file
            append_output_file(line, out_file)
        elif chr_dict.get(chr_name) > 1:
            out_file = out_file_dict.get(chr_name)
            append_output_file(line, out_file)
        else:
            print "There's been an Error"


in_file.close()

1 answer:

Answer 0 (score: 1)

This line:

for line in file_content[:0]:

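says to iterate over an empty list. The empty list comes from the slice [:0], which means "from the start of the list up to, but not including, the first element". Here is a demo: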

>>> l = ['line 1\n', 'line 2\n', 'line 3\n']
>>> l[:0]
[]
>>> l[:1]
['line 1\n']

Because the list is empty, no iteration happens, so the code in the body of the for loop is never executed.

To iterate over every line of the file, you don't need a slice at all:

for line in file_content:
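A small sketch of the difference: the helper count_iterations below is hypothetical (it is not part of the question) and simply counts how many times a for loop runs its body, first over the empty slice and then over the whole list.

```python
def count_iterations(items):
    """Return how many times a for loop over `items` executes its body."""
    count = 0
    for _ in items:
        count += 1
    return count

lines = ['line 1\n', 'line 2\n', 'line 3\n']
print(count_iterations(lines[:0]))  # the slice is empty, so the body never runs
print(count_iterations(lines))      # one iteration per line
```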

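However, it is better to iterate over the file object directly instead, because that doesn't require reading the whole file into memory first: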

with open(in_name, "r") as in_file:    
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in in_file:
        ...
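As a sketch of what streaming over the file object looks like, assuming (as the question does) that the first tab-separated column of each BED line is the chromosome name, the hypothetical helper count_lines_per_chrom below reads one line per iteration and tallies lines per chromosome. io.StringIO stands in for a real open file here. collections.Counter treats a missing key as 0, so incrementing it never raises the KeyError that a plain dict would.

```python
import collections
import io

def count_lines_per_chrom(in_file):
    """Stream over a BED file object and count lines per chromosome."""
    counts = collections.Counter()
    for line in in_file:  # the file object yields one line per iteration
        if not line.strip():
            continue  # skip blank lines
        chrom = line.split("\t", 1)[0]  # assumed: first column is the chromosome
        counts[chrom] += 1  # a missing key starts at 0, so no KeyError
    return counts

sample = io.StringIO("chr1\t10\t20\nchr2\t5\t15\nchr1\t30\t40\n")
counts = count_lines_per_chrom(sample)
print(counts)
```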

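After that, the code inside the for loop still has several problems, including syntax errors, that you can start debugging. For example, if chr_dict.get(chr_name) = 1: uses the assignment operator = where the comparison == is needed (a SyntaxError), and chr_dict[chr_name] += 1 raises a KeyError the first time a chromosome name is seen, because that key is not in the dictionary yet.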
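Putting these fixes together, a minimal Python 3 sketch of the whole task might look like the following. The function name split_bed_by_chrom is hypothetical. It assumes tab-separated input whose first column is the chromosome name, skips blank lines and lines starting with "#", "track" or "browser" (assumed header lines), and keeps one open output handle per chromosome instead of reopening the output file in append mode for every line.

```python
import os

def split_bed_by_chrom(in_name, dir_path, extension=".bed"):
    """Write each line of a BED file into dir_path/<chrom><extension>.

    Returns the sorted list of chromosome names that were written.
    """
    out_files = {}  # chromosome name -> open output file handle
    try:
        with open(in_name, "r") as in_file:
            for line in in_file:  # stream the input one line at a time
                # Assumed header/comment lines: skip them along with blanks.
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                chrom = line.split("\t", 1)[0]
                out = out_files.get(chrom)
                if out is None:  # first line for this chromosome
                    path = os.path.join(dir_path, chrom + extension)
                    out = open(path, "w")
                    out_files[chrom] = out
                out.write(line)
    finally:
        # Close every output handle, even if reading failed partway through.
        for out in out_files.values():
            out.close()
    return sorted(out_files)
```

From the script you would call split_bed_by_chrom(sys.argv[1], sys.argv[2]); the output directory must already exist. Keeping one handle per chromosome is fine for a few dozen chromosomes, but an assembly with thousands of scaffolds could hit the operating system's open-file limit, in which case reopening each file in append mode (as the question does) is the safer trade-off.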