Python - Opening and changing large text files

Time: 2015-06-22 03:46:19

Tags: python replace out-of-memory large-files

I have a ~600MB Roblox-type .mesh file, which reads like a text file in any text editor. I have the following code:

mesh = open("file.mesh", "r").read()
mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{")
mesh = "{"+mesh+"}"
f = open("p2t.txt", "w")
f.write(mesh)

It returns:

Traceback (most recent call last):
  File "C:\TheDirectoryToMyFile\p2t2.py", line 2, in <module>
    mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{")
MemoryError

Here is a sample of my file:

[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709][0.00000, 0.00000, 0][-0.00598, 0.001472, 0.00599][0.09943, 0.79220, 0.60211][0.00000, 0.00000, 0]

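What should I do?

Edit:

I'm not sure what the head, follow, and tail commands are in that other thread that was marked as a duplicate. I tried to use them but couldn't get them to work. The file is also one giant line; it is not split into multiple lines.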

5 answers:

Answer 0 (score: 4)

You need to read one small piece per iteration, analyze it, and then write it to another file or to sys.stdout. Try this code:

mesh = open("file.mesh", "r")
mesh_out = open("file-1.mesh", "w")

c = mesh.read(1)

if c:
    mesh_out.write("{")
else:
    exit(0)
while True:
    c = mesh.read(1)
    if c == "":
        break

    if c == "[":
        mesh_out.write(",{")
    elif c == "]":
        mesh_out.write("}")
    else:
        mesh_out.write(c)

UPD:

It works really slowly (thanks to jamylak). So I changed it:

import sys
import re


def process_char(c, stream, is_first=False):
    if c == '':
        return False
    if c == '[':
        stream.write('{' if is_first else ',{')
        return True
    if c == ']':
        stream.write('}')
        return True


def process_file(fname):
    with open(fname, "r") as mesh:
        c = mesh.read(1)
        if c == '':
            return
        sys.stdout.write('{')

        while True:
            c = mesh.read(8192)
            if c == '':
                return

            c = re.sub(r'\[', ',{', c)
            c = re.sub(r'\]', '}', c)
            sys.stdout.write(c)


if __name__ == '__main__':
    process_file(sys.argv[1])

So now it processes a 1.4 GB file in about 15 seconds. To run it:

$ python mesh.py file.mesh > file-1.mesh
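
As a possible alternative to the two `re.sub` calls above: both patterns are single literal characters, so `str.translate` can apply both substitutions in one pass per block. This is only a sketch and not part of the original answer; it assumes Python 3, and the names `TABLE` and `convert_block` are made up for illustration.

```python
# Map each bracket to its replacement text.
# str.maketrans accepts a dict whose values may be longer strings.
TABLE = str.maketrans({"[": ",{", "]": "}"})


def convert_block(block):
    """Return block with '[' -> ',{' and ']' -> '}' applied in one pass."""
    return block.translate(TABLE)


if __name__ == "__main__":
    # Small demonstration on a string shaped like the sample data.
    print(convert_block("[0.1, 0.2][0.3, 0.4]"))
```

In the loop above, `c = convert_block(c)` would stand in for the two `re.sub` lines. Because each substitution looks at one character only, a block boundary cannot split a match.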

Answer 1 (score: 1)

You can do it line by line:

with open("file.mesh", "r") as mesh, open("p2t.txt", "w") as f:
    for line in mesh:
        line = line.replace("[", "{").replace("]", "}").replace("}{", "},{")
        line = "{" + line + "}"
        f.write(line)

Answer 2 (score: 1)

BLOCK_SIZE = 1 << 15
with open(input_file, 'rb') as fin, open(output_file, 'wb') as fout:
    for block in iter(lambda: fin.read(BLOCK_SIZE), b''):
        # do your replace
        fout.write(block)
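
One possible way to fill in the `# do your replace` step, sketched under the assumption that the goal is the output the question's code was aiming for: every `[` becomes `{`, every `]` becomes `}`, adjacent groups are separated by a comma, and the whole result is wrapped in one outer pair of braces. The function name `convert_stream` and its parameters are hypothetical. A `][` pair can straddle two blocks, so the sketch remembers the last byte of the previous block instead of relying only on a per-block `}{` replacement.

```python
import io

BLOCK_SIZE = 1 << 15


def convert_stream(fin, fout, block_size=BLOCK_SIZE):
    """Copy fin to fout block by block, rewriting [..][..] as {{..},{..}}."""
    fout.write(b"{")
    prev_last = b""  # last byte of the previous converted block
    for block in iter(lambda: fin.read(block_size), b""):
        block = block.replace(b"[", b"{").replace(b"]", b"}")
        block = block.replace(b"}{", b"},{")
        # A "}{" pair split across two blocks is missed above; patch it here.
        if prev_last == b"}" and block.startswith(b"{"):
            fout.write(b",")
        fout.write(block)
        prev_last = block[-1:]
    fout.write(b"}")


if __name__ == "__main__":
    # In-memory demonstration with a tiny block size to force boundary splits.
    src = io.BytesIO(b"[1, 2][3, 4]")
    dst = io.BytesIO()
    convert_stream(src, dst, block_size=6)
    print(dst.getvalue())
```

With real files, `fin` and `fout` would be the objects opened in `'rb'` and `'wb'` mode as in the snippet above. Memory use stays at roughly one block regardless of file size.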

Answer 3 (score: 0)

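import os

# Binary mode keeps the end-relative seek below valid on Python 3 as well.
with open("file.mesh", "rb") as mesh, open("p2f.txt", "wb") as f:
    while True:
        c = mesh.read(1)
        if not c:
            # Drop the trailing comma written after the last "]".
            f.seek(-1, os.SEEK_END)
            f.truncate()
            break
        elif c == b"[":
            f.write(b"{")
        elif c == b"]":
            f.write(b"},")
        else:
            f.write(c)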

p2f.txt

{-0.00599, 0.001466, 0.006},{0.16903, 0.84515, 0.50709},{0.00000, 0.00000, 0},{-0.00598, 0.001472, 0.00599},{0.09943, 0.79220, 0.60211},{0.00000, 0.00000, 0}

Answer 4 (score: -2)

 $this->widget('zii.widgets.CMenu', array(
                            'items'=>array(
                                array('label'=>'Home',   'url'=>array('site/index')),
                                array('label'=>'Products', 'url'=>array('product/index'), 'items'=>array(
                                    array('label'=>'New Arrivals', 'url'=>array('product/new')),
                                    array('label'=>'Most Popular', 'url'=>array('product/index')),
                                    array('label'=>'Another', 'url'=>array('product/index'), 'items'=>array(
                                        array('label'=>'Level 3 One', 'url'=>array('product/new')),
                                        array('label'=>'Level 3 Two', 'url'=>array('product/index')),
                                        array('label'=>'Level 3 Three', 'url'=>array('product/index'), 'items'=>array(
                                            array('label'=>'Level 4 One', 'url'=>array('product/new')),
                                            array('label'=>'Level 4 Two', 'url'=>array('product/index')),
                                        )),
                                    )),
                                )),
                                array('label'=>'Login', 'url'=>array('site/login')),
                            ),
                        ));