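I need to read a BED-format file containing coordinates for all chromosomes in a genome and write its lines into separate files, one per chromosome. I tried the approach below, but it doesn't work: it doesn't create any files. Any idea why this happens, or is there an alternative way to solve this?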
import sys

def make_out_file(dir_path, chr_name, extension):
    file_name = dir_path + "/" + chr_name + extension
    out_file = open(file_name, "w")
    out_file.close()
    return file_name

def append_output_file(line, out_file):
    with open(out_file, "a") as f:
        f.write(line)
        f.close()

in_name = sys.argv[1]
dir_path = sys.argv[2]
with open(in_name, "r") as in_file:
    file_content = in_file.readlines()
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in file_content[:0]:
        line_count += 1
        elems = line.split("\t")
        chr_name = elems[0]
        chr_dict[chr_name] += 1
        if chr_dict.get(chr_name) = 1:
            out_file = make_out_file(dir_path, chr_name, ".bed")
            out_file_dict[chr_name] = out_file
            append_output_file(line, out_file)
        elif chr_dict.get(chr_name) > 1:
            out_file = out_file_dict.get(chr_name)
            append_output_file(line, out_file)
        else:
            print "There's been an Error"
    in_file.close()
Answer 0 (score: 1)
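This line:
for line in file_content[:0]:
says to iterate over an empty list. The empty list comes from the slice [:0], which means "slice from the start of the list up to, but not including, the first element". Here is a demonstration: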
>>> l = ['line 1\n', 'line 2\n', 'line 3\n']
>>> l[:0]
[]
>>> l[:1]
['line 1\n']
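Because the list is empty, no iteration takes place, so the code in the body of the for loop is never executed.
To iterate over every line of the file you do not need the slice: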
for line in file_content:
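However, it is better still to iterate over the file object itself, because that does not require reading the whole file into memory first: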
with open(in_name, "r") as in_file:
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in in_file:
        ...
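Beyond that, the code inside the for loop has a number of other problems that you can now start debugging. These include a syntax error (the assignment = is used in the if condition where the comparison == is needed) and chr_dict[chr_name] += 1, which raises a KeyError the first time a chromosome name is seen.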
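For reference, here is one way the whole loop could look once those problems are fixed. This is only a minimal sketch, not the only reasonable design. It assumes Python 3, tab-separated BED lines with the chromosome name in the first column, and no "track" or "browser" header lines. The function name split_bed_by_chrom is made up for illustration. It also keeps one output file open per chromosome rather than reopening a file for every line, which is a change from the original approach.

```python
import os


def split_bed_by_chrom(in_name, dir_path, extension=".bed"):
    """Copy each line of a BED file into dir_path/<chromosome><extension>.

    Returns a dict mapping chromosome name to the number of lines written.
    Assumes tab-separated lines and no header lines (hypothetical helper).
    """
    counts = {}     # chromosome name -> number of lines written
    out_files = {}  # chromosome name -> open output file handle
    try:
        with open(in_name, "r") as in_file:
            # Iterate over the file object directly: no slice, no readlines().
            for line in in_file:
                if not line.strip():
                    continue  # ignore blank lines
                chr_name = line.split("\t", 1)[0].strip()
                if chr_name not in out_files:
                    # First time this chromosome is seen: create its file.
                    path = os.path.join(dir_path, chr_name + extension)
                    out_files[chr_name] = open(path, "w")
                out_files[chr_name].write(line)
                # get() with a default avoids the KeyError raised by `+= 1`.
                counts[chr_name] = counts.get(chr_name, 0) + 1
    finally:
        # Close every output file, even if an error occurred part-way.
        for handle in out_files.values():
            handle.close()
    return counts
```

From a script you could call it as split_bed_by_chrom(sys.argv[1], sys.argv[2]) after importing sys. The output directory must already exist. If the input has a very large number of distinct sequence names (for example many unplaced contigs), keeping every file open may hit the operating system's limit on open files, in which case appending per line, as in the question, is the safer trade-off.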