Question

This is my code, which counts word frequencies:

import collections
import codecs
import io
from collections import Counter
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    words =infh.read().split()
    with open('Counts2.txt', 'wb') as f:
        for word, count in Counter(words).most_common(100000000):
            f.write(u'{} {}\n'.format(word, count).encode('utf-8'))

When I try to read a large file (4 GB), I get this error:

Traceback (most recent call last):
  File "counter.py", line 7, in <module>
    words =infh.read().split()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

I am using Ubuntu 12.04 with 8 GB of RAM and an Intel Core i7. How can I resolve this error?

Answer 1

This is the Pythonic way to process a file line by line:

with open(...) as fh:
    for line in fh:
        pass

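The with statement takes care of opening and closing the file, even if an exception is raised in the inner block. The file object fh is treated as an iterable, which reads through buffered I/O and keeps only one line in memory at a time, so you don't have to worry about large files.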
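Applied to the word count from the question, the idea could look roughly like the sketch below. The function names count_words and write_counts and the limit parameter are made up for illustration; they are not from the original code. It assumes the input contains line breaks (a 4 GB file with no newlines would still arrive as one huge line), and the Counter itself still grows with the number of distinct words.

```python
import io
from collections import Counter


def count_words(path, encoding='utf-8'):
    """Count whitespace-separated words without reading the whole file at once.

    Hypothetical helper: only the current line is held as a string,
    although the Counter still keeps one entry per distinct word.
    """
    counts = Counter()
    with io.open(path, 'r', encoding=encoding) as infh:
        for line in infh:               # the file object yields one line at a time
            counts.update(line.split())
    return counts


def write_counts(counts, out_path, limit=None):
    """Write 'word count' lines, most frequent first (limit=None writes all)."""
    with io.open(out_path, 'w', encoding='utf-8') as f:
        for word, count in counts.most_common(limit):
            f.write(u'{} {}\n'.format(word, count))
```

Usage would be something like write_counts(count_words('Combine.txt'), 'Counts2.txt'), with the file names taken from the question.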

Answer 2

How about using readline() instead of read()?

http://docs.python.org/2/tutorial/inputoutput.html
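For what it's worth, a readline() version of the count might look like the following sketch. The function name count_words_readline is hypothetical, and the same caveat applies: it only helps if the file actually contains line breaks.

```python
import io
from collections import Counter


def count_words_readline(path, encoding='utf-8'):
    """Count words by calling readline() until the end of the file.

    Hypothetical helper for illustration; readline() returns an empty
    string only at end of file (a blank line comes back as a newline).
    """
    counts = Counter()
    with io.open(path, 'r', encoding=encoding) as infh:
        while True:
            line = infh.readline()
            if not line:                # empty string: nothing left to read
                break
            counts.update(line.split())
    return counts
```

This does the same work as iterating over the file object directly; the for loop is just the shorter way to write it.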

How to manage a memory error in Python?
