I have a large file like the following example:
1 10161 10166 3
1 10166 10172 2
1 10172 10182 1
1 10183 10192 1
1 10193 10199 1
1 10212 10248 1
1 10260 10296 1
1 11169 11205 1
1 11336 11372 1
2 11564 11586 2
2 11586 11587 3
2 11587 11600 4
3 11600 11622 2
I want to add "chr" at the beginning of each line, for example:
chr1 10161 10166 3
chr1 10166 10172 2
chr1 10172 10182 1
chr1 10183 10192 1
chr1 10193 10199 1
chr1 10212 10248 1
chr1 10260 10296 1
chr1 11169 11205 1
chr1 11336 11372 1
chr2 11564 11586 2
chr2 11586 11587 3
chr2 11587 11600 4
chr3 11600 11622 2
I tried the following code in Python:
file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
out = open("outfile.bg", "w")
for new in newline:
    out.write("n"+new)
But it did not give me what I wanted. Do you know how to fix the code for this purpose?
Answer 0: (score: 1)
The problem with your code is that you iterate over the input file without doing anything with the data you read:
file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
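The last line assigns each line of myfile.bg, prefixed with 'chr', to the newline variable (a string), so every line overwrites the previous result.
Then you iterate over the string in newline (which will be the last line of the input file, with 'chr' prepended):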
out = open("outfile.bg", "w")
for new in newline:  # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)  # this only writes 'n' before each character in newline
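To see the overwrite-and-iterate behavior in isolation, here is a small sketch that uses an in-memory list of lines instead of myfile.bg. The sample lines come from the question; the names `lines` and `written` are only illustrative.

```python
# Stand-in for the lines of myfile.bg (sample data from the question).
lines = ["1 10161 10166 3\n", "2 11564 11586 2\n", "3 11600 11622 2\n"]

# Each pass overwrites newline, so only the last line survives the loop.
for line in lines:
    newline = "chr" + line

# Iterating over a string yields single characters, not lines.
written = ["n" + new for new in newline]

print(repr(newline))  # 'chr3 11600 11622 2\n'
print(written[:4])    # ['nc', 'nh', 'nr', 'n3']
```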
If you are only doing this once, e.g. in an interactive shell, you can use a one-liner:
open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])
More correct (especially in a program, where you care about open file handles etc.) would be:
with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])
If the file is really large (close to the size of available memory), you need to process it incrementally:
with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)
(although this can be slower than the first two versions...)
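The incremental approach can also be wrapped in a small reusable function. This is only a sketch: `add_prefix` and its parameters are hypothetical names, not something from the question, and the self-check at the end writes two sample lines to a temporary directory rather than touching the real myfile.bg.

```python
import os
import tempfile


def add_prefix(src_path, dst_path, prefix="chr"):
    """Copy src_path to dst_path, prepending prefix to every line."""
    # Only one line is held in memory at a time; both files are
    # closed automatically when the with block ends.
    with open(src_path) as infp, open(dst_path, "w") as outfp:
        for line in infp:
            outfp.write(prefix + line)


# Self-check on a temporary file holding two sample lines.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myfile.bg")
    dst = os.path.join(tmp, "outfile.bg")
    with open(src, "w") as fp:
        fp.write("1 10161 10166 3\n2 11564 11586 2\n")
    add_prefix(src, dst)
    with open(dst) as fp:
        result = fp.read()

print(result, end="")
```

For the files in the question, the call would be add_prefix("myfile.bg", "outfile.bg").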
Answer 1: (score: 1)
Fully agree with @rychaza; here is my version using your code:
file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()
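Answer 2: (score: 0)
The problem is that you are iterating over the input and resetting the same variable (newline) for each line, then opening a file for output and iterating over newline, which is a string, so new will be each character in that string.
I think something like this should be what you are looking for:
with open('myfile.bg','rb') as file:
    with open('outfile.bg','wb') as out:
        for line in file:
            # binary mode yields bytes, so the prefix is a bytes literal
            out.write(b'chr' + line)
When iterating over a file, line should already contain the trailing newline.
The with statement will automatically clean up the file handles when the block ends.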
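As a quick check of the trailing-newline point, the sketch below uses io.BytesIO objects as in-memory stand-ins for files opened in binary mode (an assumption made so the example runs without real files). In Python 3, lines read in binary mode are bytes, so the prefix is written as a bytes literal.

```python
import io

# In-memory stand-ins for files opened with 'rb' and 'wb'.
infile = io.BytesIO(b"1 10161 10166 3\n2 11564 11586 2\n")
out = io.BytesIO()

for line in infile:
    # line already ends with b"\n", so no extra newline is added.
    out.write(b"chr" + line)

print(out.getvalue())  # b'chr1 10161 10166 3\nchr2 11564 11586 2\n'
```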