Here is my code, which counts word frequencies:
import collections
import codecs
import io
from collections import Counter
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    words =infh.read().split()
with open('Counts2.txt', 'wb') as f:
    for word, count in Counter(words).most_common(100000000):
        f.write(u'{} {}\n'.format(word, count).encode('utf-8'))
When I try to read a large file (4 GB), I get this error:
Traceback (most recent call last):
  File "counter.py", line 7, in <module>
    words =infh.read().split()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError
I am using Ubuntu 12.04 with 8 GB of RAM and an Intel Core i7. How can I fix this error?
Answer 0 (score: 1)
This is the Pythonic way to process a file line by line:
with open(...) as fh:
    for line in fh:
        pass
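The with statement takes care of opening and closing the file, even if an exception is raised inside the block. Iterating over the file object fh reads it through buffered I/O one line at a time, so the whole file is never held in memory and large files are not a problem.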
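Applied to the word count in the question, the same idea might look like the sketch below. The function names count_words and write_counts are made up for illustration and are not part of the original code. It assumes the file contains newlines (a 4 GB file that is one single line would still be read in one piece) and that the set of distinct words fits in memory, since the Counter still stores every unique word.

```python
import io
from collections import Counter


def count_words(path, encoding="utf8"):
    """Count whitespace-separated words without reading the whole file at once."""
    counts = Counter()
    with io.open(path, "r", encoding=encoding) as infh:
        for line in infh:
            # Only the current line is decoded and split, not the full 4 GB.
            counts.update(line.split())
    return counts


def write_counts(counts, out_path, encoding="utf8"):
    """Write one 'word count' pair per line, most frequent word first."""
    with io.open(out_path, "w", encoding=encoding) as f:
        for word, count in counts.most_common():
            f.write(u"{} {}\n".format(word, count))
```

Calling write_counts(count_words('Combine.txt'), 'Counts2.txt') should then produce the same kind of output file as the original script. io.open is used for both reading and writing so the sketch behaves the same on Python 2.7 and Python 3.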
Answer 1 (score: -1)
How about using readline instead of read()?