Question

Here is my code, which gives me a memory error:

with open('E:\\Book\\1900.txt', 'r', encoding='utf-8') as readFile:
    for line in readFile:
        sepFile = readFile.read().lower()
        words_1900 = re.findall('\w+', sepFile)

Output:

Traceback (most recent call last):
File "C:\Python34\50CommonWords.py", line 13, in <module>
sepFile = readFile.read().lower()
MemoryError

Answer 1

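I would say that instead of reading the whole file into memory, you should read it line by line. In your code, readFile.read() inside the loop pulls the rest of the file into memory in a single call, which is what fails. Use collections.Counter() to keep track of the words and their counts incrementally across the whole file, and at the end call Counter.most_common() to get the 50 most common elements. Example -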

import collections
import re

# Counter accumulates word counts across all lines.
cnt = collections.Counter()
with open('E:\\Book\\1900.txt', 'r', encoding='utf-8') as readFile:
    for line in readFile:
        # Only the current line is processed at a time, not the whole file.
        cnt.update(re.findall(r'\w+', line.lower()))
print("50 most common are")
# most_common() returns (word, count) pairs; keep only the words.
print([x for x, countx in cnt.most_common(50)])

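If the file contains a very large number of distinct words, this approach can also end in a MemoryError, because the Counter keeps one entry per distinct word.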
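
To illustrate why the number of distinct words matters more than the file size, here is a minimal sketch. The helper name count_words and the sample lines are made up for illustration and are not part of the original code; the in-memory list just stands in for a file being iterated line by line.

```python
import collections
import re


def count_words(lines):
    """Count words from any iterable of text lines (hypothetical helper)."""
    counts = collections.Counter()
    for line in lines:
        counts.update(re.findall(r'\w+', line.lower()))
    return counts


# Illustrative data: 3000 lines, but only six distinct words.
sample_lines = ["the cat sat", "The cat ran", "a dog sat"] * 1000
counts = count_words(sample_lines)

distinct_words = len(counts)         # entries stored in the Counter
total_words = sum(counts.values())   # words seen overall
print(distinct_words, total_words)
```

The Counter holds one entry per distinct word no matter how many lines are read, so memory use follows the vocabulary size rather than the length of the file.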

Also, Counter.most_common() returns a list of tuples, where the first element of each tuple is the actual word and the second element is that word's count.
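
As a small sketch of that return shape, using a made-up sentence instead of the real file:

```python
import collections

# Illustrative input only; any iterable of words works.
words = "to be or not to be that is the question to".split()
cnt = collections.Counter(words)

# Each item is a (word, count) tuple, ordered from most to least common.
top = cnt.most_common(2)
print(top)

# Unpack the tuples to keep only the words, as in the example above.
top_words = [word for word, count in top]
print(top_words)
```

Here "to" appears three times and "be" twice, so those two pairs come first.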
