Question

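I need to read a BED-format file that contains coordinates for all the chromosomes in a genome, and write its lines out to a separate file per chromosome. I tried the approach below, but it does not work: it does not create any files. Any idea why this happens, or an alternative way to solve the problem?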

import sys

def make_out_file(dir_path, chr_name, extension):

    file_name = dir_path + "/" + chr_name + extension
    out_file = open(file_name, "w")
    out_file.close()
    return file_name

def append_output_file(line, out_file):

    with open(out_file, "a") as f:
        f.write(line)
    f.close()

in_name = sys.argv[1]
dir_path = sys.argv[2]

with open(in_name, "r") as in_file:

    file_content = in_file.readlines()
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in file_content[:0]:
        line_count += 1
        elems = line.split("\t")
        chr_name = elems[0]
        chr_dict[chr_name] += 1
        if chr_dict.get(chr_name) = 1:
            out_file = make_out_file(dir_path, chr_name, ".bed")
            out_file_dict[chr_name] = out_file
            append_output_file(line, out_file)
        elif chr_dict.get(chr_name) > 1:
            out_file = out_file_dict.get(chr_name)
            append_output_file(line, out_file)
        else:
            print "There's been an Error"


in_file.close()

Answer 1

This line:

for line in file_content[:0]:

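tells Python to iterate over an empty list. The empty list comes from the slice [:0], which means "slice from the start of the list up to, but not including, the first element". Here is a demonstration: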

>>> l = ['line 1\n', 'line 2\n', 'line 3\n']
>>> l[:0]
[]
>>> l[:1]
['line 1\n']

Because the list is empty, no iteration takes place, so the code in the body of the for loop is never executed.
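To make the effect concrete, the following small sketch counts how many times a loop body runs for a few different slices. The sample lines and the helper name `count_iterations` are made up for this illustration and are not part of the original script.

```python
def count_iterations(items):
    """Return how many times a for loop over `items` executes its body."""
    count = 0
    for _ in items:
        count += 1
    return count


# Hypothetical file contents, standing in for in_file.readlines()
file_content = ["chr1\t0\t100\n", "chr1\t200\t300\n", "chr2\t0\t50\n"]

print(count_iterations(file_content[:0]))  # empty slice: the body never runs
print(count_iterations(file_content[:1]))  # only the first line
print(count_iterations(file_content))      # every line
```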

To iterate over every line of the file, you do not need the slice:

for line in file_content:

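However, it is better still to iterate over the file object itself, because that does not require reading the whole file into memory first: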

with open(in_name, "r") as in_file:    
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in in_file:
        ...

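After that, the code inside the for loop has a number of other problems, including syntax errors, which you can then start debugging. For example, `if chr_dict.get(chr_name) = 1:` needs `==` rather than `=`, and `chr_dict[chr_name] += 1` raises a KeyError the first time a chromosome name is seen, because the dictionary starts out empty.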
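For reference, one possible corrected version of the whole script is sketched below. It is only a sketch under some assumptions: it targets Python 3 (the `print` statement in the question is Python 2 syntax), it assumes tab-separated input with the chromosome name in the first column and no `track` or `browser` header lines, and it keeps one output file open per chromosome instead of reopening a file for every line. The function name `split_bed_by_chromosome` and its return value are choices made for this example.

```python
import os
from contextlib import ExitStack


def split_bed_by_chromosome(in_name, dir_path, extension=".bed"):
    """Write each line of a BED file to <dir_path>/<chromosome><extension>.

    Returns a dict mapping chromosome name to the number of lines written.
    """
    counts = {}
    out_files = {}
    # ExitStack closes every output file that was opened, even on error.
    with ExitStack() as stack, open(in_name, "r") as in_file:
        for line in in_file:  # iterate lazily instead of calling readlines()
            if not line.strip():
                continue  # skip blank lines
            chr_name = line.split("\t")[0].strip()
            if chr_name not in out_files:
                # First time this chromosome is seen: create its output file.
                path = os.path.join(dir_path, chr_name + extension)
                out_files[chr_name] = stack.enter_context(open(path, "w"))
            out_files[chr_name].write(line)
            # dict.get with a default avoids a KeyError on the first line.
            counts[chr_name] = counts.get(chr_name, 0) + 1
    return counts
```

Calling `split_bed_by_chromosome(sys.argv[1], sys.argv[2])` after `import sys` would take the same two command-line arguments as the original script. Keeping the handles open avoids repeated open and close calls, but an input with a very large number of distinct sequence names could run into the operating system's limit on open files; in that case, appending to each file per line, as the original code tried to do, may be the safer trade-off.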

Writing multiple files based on elements of the original file
