Question

I have a large file like the following example:

1   10161   10166   3
1   10166   10172   2
1   10172   10182   1
1   10183   10192   1
1   10193   10199   1
1   10212   10248   1
1   10260   10296   1
1   11169   11205   1
1   11336   11372   1
2   11564   11586   2
2   11586   11587   3
2   11587   11600   4
3   11600   11622   2

I want to add "chr" at the beginning of each line, for example:

chr1    10161   10166   3
chr1    10166   10172   2
chr1    10172   10182   1
chr1    10183   10192   1
chr1    10193   10199   1
chr1    10212   10248   1
chr1    10260   10296   1
chr1    11169   11205   1
chr1    11336   11372   1
chr2    11564   11586   2
chr2    11586   11587   3
chr2    11587   11600   4
chr3    11600   11622   2

I tried the following code in Python:

   file = open("myfile.bg", "r")
   for line in file: 
      newline = "chr" + line
   out = open("outfile.bg", "w")
   for new in newline:
      out.write("n"+new)

But it did not give me what I wanted. Do you know how to fix the code for this purpose?

Answer 1

The problem with your code is that you iterate over the input file without keeping any of the data you read:

file = open("myfile.bg", "r")
for line in file: 
    newline = "chr" + line

The last line assigns each line of myfile.bg, prefixed with 'chr', to the variable newline (a string), and every iteration overwrites the previous result.

Then you iterate over the string in newline (which by now holds only the last line of the input file, prefixed with 'chr'):

out = open("outfile.bg", "w")
for new in newline:       # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)    # this only writes 'n' before each character in newline
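To make the effect of that second loop concrete, here is a small sketch. The sample string stands in for the last line of the input file, and `written` is a hypothetical name used only for illustration; it collects what each `out.write` call would have received.

```python
# Stand-in for the last line of myfile.bg, already prefixed with "chr".
newline = "chr" + "3   11600   11622   2\n"

# Iterating over a string yields one character at a time, not one line.
written = ["n" + new for new in newline]

print(written[:4])                   # ['nc', 'nh', 'nr', 'n3']
print(len(written) == len(newline))  # True: one write per character
```

So the output file ends up with one short fragment per character of the last line, and none of the earlier lines.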

If you only need to do this once, e.g. in an interactive shell, you can use a one-liner:

open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])

More correct (especially in a program, where you care about closing open file handles and so on) would be:

with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])

If the file is really big (approaching the size of available memory), you need to process it incrementally:

with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)

(although this may be slower than the first two versions...)
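If this is needed in more than one place, the incremental version can be wrapped in a small function. This is only a sketch: the name `prefix_lines` and its parameters are made up for illustration and do not come from any library.

```python
def prefix_lines(src_path, dst_path, prefix="chr"):
    """Copy src_path to dst_path, putting `prefix` in front of every line."""
    # Both files are opened in one `with` statement and are closed
    # automatically when the block ends, even if an error occurs.
    with open(src_path) as infp, open(dst_path, "w") as outfp:
        for line in infp:               # reads one line at a time
            outfp.write(prefix + line)  # `line` keeps its own trailing newline
```

For the files in the question, the call would be `prefix_lines("myfile.bg", "outfile.bg")`.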

Answer 2

I completely agree with @rychaza. Here is my version, based on your code:

file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()

Answer 3

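The problem is that you iterate over the input and reassign the same variable (newline) for every line, then open a file for output and iterate over newline, which is a string, so new will be each character of that string.

I think something like this is what you are looking for:

with open('myfile.bg', 'rb') as file:
    with open('outfile.bg', 'wb') as out:
        for line in file:
            # the files are opened in binary mode, so the prefix must be bytes
            out.write(b'chr' + line)

When iterating over a file, line should already contain the trailing newline.

The with statement automatically cleans up the file handles when the block ends.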
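The point about the trailing newline can be checked without touching the disk. The following sketch uses `io.StringIO` objects as in-memory stand-ins for the two files; the names `infile`, `outfile` and `result` are illustrative only.

```python
import io

# In-memory stand-in for myfile.bg (two lines from the question's data).
infile = io.StringIO("1   10161   10166   3\n2   11564   11586   2\n")
outfile = io.StringIO()

for line in infile:
    # Each `line` already ends with "\n", so no extra newline is written.
    outfile.write("chr" + line)

result = outfile.getvalue()
print(result, end="")
```

Each output line stays on its own line, which is why none of the answers add a newline character when writing.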

Modifying each line in a text file in Python
