Question

I have a simple text file containing numbers as ASCII text, separated by spaces, as in this example.

150604849   
319865.301865 5810822.964432 -96.425797 -1610
319734.172256 5810916.074753 -52.490280 -122
319730.912949 5810918.098465 -61.864395 -171
319688.240891 5810889.851608 -0.339890 -1790
*<continues like this for millions of lines>*

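Basically, I want to copy the first line as is. For every subsequent line, I want to offset the first value (x), offset the second value (y), leave the third value unchanged, and offset and then halve the last number.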
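To make the arithmetic concrete, here is a minimal sketch of that per-line transformation applied to a single data line. The name `transform_line` is only illustrative, the offsets are the ones used in the script in this question, and `//` is an assumption on my part so that the halving stays an integer division on both Python 2 and Python 3.

```python
# Offsets taken from the script in this question.
OFFSET_X = -306000
OFFSET_Y = -5806000


def transform_line(line):
    """Return one data line with x and y shifted and the last value rescaled."""
    x, y, z, last = line.split()
    return " ".join([
        str(float(x) + OFFSET_X),      # shift x
        str(float(y) + OFFSET_Y),      # shift y
        z,                             # third value is copied unchanged
        str((int(last) + 2050) // 2),  # add 2050, then halve (integer division)
    ])


print(transform_line("319865.301865 5810822.964432 -96.425797 -1610"))
```

For the sample line, the last column becomes (-1610 + 2050) // 2 = 220, and x and y come out near 13865.301865 and 4822.964432, subject to the usual floating-point rounding.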

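I cobbled together the following code as a Python learning exercise (apologies if it is crude and offensive; truly, no offence is meant), and it works fine. However, the input files I'm working with are several GB in size, and I'm wondering whether there is a way to speed up execution. Currently, a 740 MB file takes 2 minutes 21 seconds.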

import glob

#offset values
offsetx = -306000
offsety = -5806000

files = glob.glob('*.pts')
for file in files:
    currentFile = open(file, "r")
    out = open(file[:-4]+"_RGB_moved.pts", "w")
    firstline = str(currentFile.readline())
    out.write(str(firstline.split()[0]))

    while 1:
        lines = currentFile.readlines(100000)
        if not lines:
            break
        for line in lines:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), str(float(words[2])), str((int(words[3])+2050)/2)]              
            out.write(" ".join(newwords))

Many thanks.

Answer 1

Don't use .readlines(). Use the file directly as an iterator:

for file in files:
    with open(file, "r") as currentfile, open(file[:-4]+"_RGB_moved.pts", "w") as out:
        firstline = next(currentfile)
        out.write(firstline.split(None, 1)[0])

        for line in currentfile:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), words[2], str((int(words[3]) + 2050) / 2)]              
            out.write(" ".join(newwords))

I also added a few Python best practices, and you don't need to convert words[2] to a float and then back to a string.

You could also consider using the csv module, which handles splitting and rejoining the lines in C code:

import csv

for file in files:
    # Binary mode is what the csv module expects on Python 2;
    # on Python 3, open both files in text mode with newline='' instead.
    with open(file, "rb") as currentfile, open(file[:-4]+"_RGB_moved.pts", "wb") as out:
        reader = csv.reader(currentfile, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer = csv.writer(out, delimiter=' ', quoting=csv.QUOTE_NONE)

        # writerow() takes a sequence of fields, so pass a one-element list
        writer.writerow(next(reader)[:1])

        for row in reader:
            newrow = [str(float(row[0])+offsetx), str(float(row[1])+offsety), row[2], str((int(row[3]) + 2050) / 2)]
            writer.writerow(newrow)

Answer 2

Use the CSV package. It is probably better optimized than your script, and it will simplify your code.
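As a rough illustration of what that could look like, here is a minimal sketch written for Python 3. The function name `shift_points` and its parameters are hypothetical, the offsets are copied from the question, and the sketch assumes fields are separated by exactly one space. It works on any pair of text file objects, so it can be tried on an in-memory sample before pointing it at a multi-GB file.

```python
import csv
import io

# Offsets copied from the question.
OFFSET_X = -306000
OFFSET_Y = -5806000


def shift_points(src, dst, offset_x=OFFSET_X, offset_y=OFFSET_Y):
    """Copy the header value, then shift x and y and rescale the last column."""
    reader = csv.reader(src, delimiter=' ', quoting=csv.QUOTE_NONE)
    # lineterminator='\n' keeps plain newlines instead of the csv default '\r\n'.
    writer = csv.writer(dst, delimiter=' ', quoting=csv.QUOTE_NONE,
                        lineterminator='\n')

    # The header may carry trailing spaces, so keep only its first field.
    writer.writerow(next(reader)[:1])

    for row in reader:
        if not row:  # skip blank lines
            continue
        writer.writerow([
            float(row[0]) + offset_x,
            float(row[1]) + offset_y,
            row[2],                     # copied through as text
            (int(row[3]) + 2050) // 2,  # integer halving (an assumption)
        ])


# Small in-memory run using two lines from the question's sample.
sample = io.StringIO(
    "150604849   \n"
    "319865.301865 5810822.964432 -96.425797 -1610\n"
)
result = io.StringIO()
shift_points(sample, result)
print(result.getvalue())
```

For real files, both would be opened in text mode with `newline=''`, which is what the csv module documentation recommends on Python 3. Whether this actually beats the plain `split()` loop is worth timing on a representative file rather than assuming.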

How to speed up this really basic Python script for offsetting lines of numbers
