I have about 10,000 files containing a large amount of data.
I'm trying to build a Python dict over all the files and some of the data in each file.
What I'm doing is this:
results = {}
for bfile in os.listdir(files_dir):
    fname, ext = os.path.splitext(bfile)
    fhandle = open(os.path.join(files_dir, bfile), 'r')
    if not results.has_key(fname):
        results[fname] = {}
    for line in fhandle:
        line = line.split("\t")
        if not results[fname].has_key(line[0]):
            results[fname][line[0]] = {}
        if not results[fname][line[0]].has_key(line[1]):
            results[fname][line[0]][line[1]] = {}
This should be a trivial task, but I'm getting this error:
File "script.py", line 409, in <module>
file_handle()
File "script.py", line 247, in file_handle
results[fname][line[0]][line[1]] = {}
MemoryError
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python2.7/dist-packages/apport/__init__.py", line 1, in <module>
from apport.report import Report
File "/usr/lib/python2.7/dist-packages/apport/report.py", line 18, in <module>
import problem_report
File "/usr/lib/python2.7/dist-packages/problem_report.py", line 14, in <module>
import zlib, base64, time, sys, gzip, struct, os
File "/usr/lib/python2.7/gzip.py", line 10, in <module>
import io
File "/usr/lib/python2.7/io.py", line 60, in <module>
import _io
MemoryError
Original exception was:
Traceback (most recent call last):
File "script.py", line 409, in <module>
file_handle()
File "script.py", line 247, in file_handle
results[fname][line[0]][line[1]] = {}
MemoryError
Segmentation fault (core dumped)
Answer 0 (score: 0):
It looks like you never close the files once you're done with them. That could be the problem; try the following:
with open(os.path.join(files_dir, bfile), 'r') as fhandle:
    if not results.has_key(fname):
        results[fname] = {}
    for line in fhandle:
        line = line.split("\t")
        if not results[fname].has_key(line[0]):
            results[fname][line[0]] = {}
        if not results[fname][line[0]].has_key(line[1]):
            results[fname][line[0]][line[1]] = {}
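As an aside, the repeated has_key checks can be collapsed with collections.defaultdict, which creates the missing inner dicts on first access. Here is a minimal sketch of the same loop under that approach, assuming the same files_dir and tab-separated layout as above:

import os
from collections import defaultdict

# Missing keys are materialized automatically:
# results[fname] is a defaultdict(dict), whose missing keys become plain dicts.
results = defaultdict(lambda: defaultdict(dict))

for bfile in os.listdir(files_dir):
    fname, ext = os.path.splitext(bfile)
    with open(os.path.join(files_dir, bfile), 'r') as fhandle:
        for line in fhandle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:  # skip malformed lines
                results[fname][fields[0]].setdefault(fields[1], {})

Note that this only tidies the construction; it does not shrink the final structure. If the combined dict genuinely exceeds available RAM, moving the mapping to disk (for example with the standard-library shelve module, or a proper database) is the usual way out.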