Question

I have a file like the following.

0       0       0 
0.00254 0.00047 0.00089
0.54230 0.87300 0.74500 
0       0       0

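I want to modify this file. If a value is less than 0.05, it should become 1. Otherwise, it should become 0.

After running the Python script, the file should look like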

1       1        1
1       1        1
0       0        0
1       1        1

Can you help me?

Answer 1

OK, since you're new to StackOverflow (welcome!), I'll walk you through this. I'm assuming your file is called test.txt.

with open("test.txt") as infile, open("new.txt", "w") as outfile:

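This opens the files we need: the input file and a new output file. The with statement ensures that the files are closed when the block is exited.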

    for line in infile:

This loops through the file line by line.

        values = [float(value) for value in line.split()]

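Now this is more complicated. Each line contains whitespace-separated values. They can be split into a list of strings using line.split(). But they are still strings, so they must be converted to floats first. All of this is done with a list comprehension. The result is that, for example, after the second line has been processed this way, values is now the following list: [0.00254, 0.00047, 0.00089].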

        results = ["1" if value < 0.05 else "0" for value in values]

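Now we are creating a new list called results. Each element corresponds to one element of values: it is "1" if value < 0.05 and "0" otherwise.

        outfile.write("       ".join(results))

This converts the list of "integer strings" back into a single string, with the items separated by 7 spaces each.

        outfile.write("\n")

This adds a newline. Done.

If you don't mind the extra complexity, you can combine the two list comprehensions into one:

        results = ["1" if float(value) < 0.05 else "0" for value in line.split()]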
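For reference, the steps of this walkthrough can be assembled into one small function. This is only a sketch: the function name threshold_file and its parameters (in_path, out_path, threshold, sep) are made up for illustration and are not part of the answer, which works directly on test.txt and new.txt.

```python
def threshold_file(in_path, out_path, threshold=0.05, sep="       "):
    """Write "1" for every value below threshold and "0" otherwise.

    All names here are hypothetical; sep defaults to 7 spaces,
    matching the separator described in the walkthrough.
    """
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            # Split on whitespace, convert to float, compare, keep as strings.
            results = ["1" if float(value) < threshold else "0"
                       for value in line.split()]
            outfile.write(sep.join(results))
            outfile.write("\n")
```

Calling threshold_file("test.txt", "new.txt") would then do the same work as the lines above.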

Answer 2

If you can use libraries, I would suggest numpy:

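import numpy as np
my_array = np.genfromtxt("my_path_to_text_file.txt")
my_shape = my_array.shape  # shape is an attribute, not a method
out_array = np.where(my_array < 0.05, 1, 0)
# savetxt needs a target file; "my_output_file.txt" is only a placeholder name
np.savetxt("my_output_file.txt", out_array, fmt="%d")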

You can pass the formatting as an argument to the savetxt function. The function's docstring is pretty self-explanatory.

If you insist on pure Python:

with open("my_path_to_text_file") as my_file:
    list_of_lines = my_file.readlines()
    list_of_lines = [[int( float(x) < 0.05) for x in line.split()] for line in list_of_lines]

Then write that list to a file however you like.
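One possible way to do that last step is sketched below. It assumes the nested list of 0/1 integers built above; the helper name write_rows, the out_path parameter, and the tab separator are illustrative choices, not something the answer prescribes.

```python
def write_rows(rows, out_path, sep="\t"):
    """Write a nested list such as [[1, 1, 1], [0, 0, 0]] as one text line per row.

    write_rows, out_path and sep are hypothetical names for this sketch.
    """
    with open(out_path, "w") as out_file:
        for row in rows:
            # Each flag is an int, so it is converted to str before joining.
            out_file.write(sep.join(str(flag) for flag in row) + "\n")
```

With the list from the snippet above, this would be called as write_rows(list_of_lines, "some_output_file.txt"), where the file name is again a placeholder.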

Answer 3

You can use this code

f_in=open("file_in.txt", "r")       #opens a file in the reading mode
in_lines=f_in.readlines()           #reads it line by line
out=[]
for line in in_lines:
    list_values=line.split()        #separate elements by the spaces, returning a list with the numbers as strings
    for i in range(len(list_values)):
        list_values[i]=float(list_values[i])    #converts them to floats (float() is safer than eval() on file contents)
#       print list_values[i],
        if list_values[i]<0.05:     #your condition
#           print ">>", 1
            list_values[i]=1
        else:
#           print ">>", 0
            list_values[i]=0
    out.append(list_values)         #stores the numbers in a list, where each list corresponds to a lines' content
f_in.close()                        #closes the file

f_out=open("file_out.txt", "w")     #opens a new file in the writing mode
for cur_list in out:
    for i in cur_list:
        f_out.write(str(i)+"\t")    #writes each number, plus a tab
    f_out.write("\n")               #writes a newline
f_out.close()                       #closes the file

Answer 4

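The following code performs the replacement in place: to do so, the file is opened in 'rb+' mode. It is absolutely necessary to open it in binary mode b. The + in 'rb+' means that the file can be both written and read. Note that the mode can also be written 'r+b'.

But using 'rb+' is awkward: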

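If you read with for line in f, the file is read in chunks and several lines are held in a buffer, from which they are actually read one after another until another chunk of data is read and loaded into the buffer. That makes it harder to perform the transformation, because you have to track the position of the file pointer with tell() and move it with seek(), and in fact I have not fully understood how that must be done.

Happily, there is a solution with replace(), because (I don't know why, but I trust the facts) when readline() reads a line, the file pointer does not go any further on disk than the end of the line, that is, it stops at the newline. It is then easy to move the file pointer and to know its position.

To write after reading, a seek() must be executed, even if it is only seek(0,1), which means moving 0 characters from the current position. That must change the state of the file pointer, or something like that.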

Well, for your problem, the code is as follows:

import re
from os import fsync
from os.path import getsize

reg = re.compile(r'[\d.]+')

def ripl(m):
    g = m.group()
    # '1' below the question's threshold of 0.05, padded to the original width
    return ('1' if float(g)<0.05 else '0').ljust(len(g))

path = ...........'

print 'length of file before : %d' % getsize(path)

with open(path,'rb+') as f:
    line = 'go'
    while line:
        line = f.readline()
        lg = len(line)
        f.seek(-lg,1)
        f.write(reg.sub(ripl,line))
        f.flush()
        fsync(f.fileno())

print 'length of file after : %d' % getsize(path)

flush() and fsync() must be executed to ensure that the instruction f.write(reg.sub(ripl,line)) actually writes to the file at the moment it is ordered to.
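The code above uses Python 2 print statements. For readers on Python 3, the same in-place idea might be sketched as follows. This is an adaptation under stated assumptions, not the answer's own code: the names NUMBER, flag and rewrite_in_place are made up, the threshold 0.05 is taken from the question, and the pattern is a bytes pattern because the file is opened in binary mode. The seek(0, 1) call is kept only to mirror the remark above about seeking between a write and the next read; it may not be strictly needed on Python 3.

```python
import re

# Bytes pattern, because the file is opened in binary mode (hypothetical name).
NUMBER = re.compile(rb"[\d.]+")
THRESHOLD = 0.05  # threshold taken from the question


def flag(match):
    """Return b'1' or b'0', padded with spaces to the width of the original number."""
    token = match.group()
    digit = b"1" if float(token.decode("ascii")) < THRESHOLD else b"0"
    return digit.ljust(len(token))


def rewrite_in_place(path):
    """Overwrite every line with a replacement of exactly the same length."""
    with open(path, "rb+") as f:
        while True:
            start = f.tell()          # remember where this line begins
            line = f.readline()
            if not line:              # empty bytes means end of file
                break
            f.seek(start)             # jump back to the start of the line
            f.write(NUMBER.sub(flag, line))
            f.seek(0, 1)              # zero-length seek before reading again
```

Because every number is replaced by a string of the same width, the file size does not change, which is what the two getsize() prints in the original code are checking.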

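Note that I have never managed to handle files encoded in unicode this way. It is certainly even more difficult, since each unicode character is encoded on several bytes (and in the case of UTF-8, on a variable number of bytes depending on the character).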
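The point about variable byte counts can be seen with a tiny Python 3 check. The sample characters and the name byte_lengths are arbitrary choices for illustration.

```python
# Sample characters, written as escapes: "0", e-acute, euro sign, an emoji.
samples = ["0", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]
```

Since a character no longer maps to a fixed number of bytes, padding a replacement to "the same length" has to be done in bytes rather than characters for the file size to stay unchanged.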

Python: read a file and replace it line by line under a certain condition

4 answers: