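I have a file containing a very large list of matrices (i.e., nested lists of integers) that I want to load into a Python shell. The file content has the form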
```python
L = [ [[1,2],[3,4]], [[5,6],[7,8]], ... ]
```
So I tried to load it with `execfile(filename)`. Unfortunately, I run out of memory this way. What am I doing wrong?
For comparison: the file is about 2 GB, and the machine has 100 GB of RAM. Each matrix is 1000x1000.
Answer
My attempt, using `ast.literal_eval`. If it doesn't work, I'll delete my answer, but I think it's worth a shot:
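```python
import ast

with open("bigfile.txt") as f:
    # scan forward one character at a time, up to and including the '='
    while True:
        c = f.read(1)
        if not c or c == "=":
            break
    # equals found: skip spaces if any, keeping the first non-space character
    # (seeking back with f.tell() - 1 is not supported on text files in Python 3)
    c = f.read(1)
    while c == " ":
        c = f.read(1)
    # literal_eval only accepts expressions, so pass just the right-hand side
    L = ast.literal_eval(c + f.read())
```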
Basically: open the file, read it character by character to skip past the assignment (`literal_eval` doesn't evaluate assignments, only literal structures, a bit like JSON), then feed the rest of the huge file to the literal evaluator.
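To illustrate why the assignment has to be skipped, here is a small sketch on an in-memory string rather than the real file (the helper name `parse_assignment` is hypothetical): `ast.literal_eval` raises `SyntaxError` on the whole `L = ...` line but accepts the right-hand side on its own.

```python
import ast


def parse_assignment(text):
    """Hypothetical helper: evaluate the literal in 'name = <literal>'."""
    # Everything after the first '=' is the literal. Strip the surrounding
    # whitespace, since older Python versions reject leading spaces here.
    _, _, rhs = text.partition("=")
    return ast.literal_eval(rhs.strip())


sample = "L = [ [[1,2],[3,4]], [[5,6],[7,8]] ]"

# The full assignment is a statement, not an expression, so it is refused.
try:
    ast.literal_eval(sample)
except SyntaxError:
    print("assignment rejected")

print(parse_assignment(sample))  # [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```

For the 2 GB file, reading char by char as above avoids holding the `L = ` prefix plus a sliced copy of the whole text, but the parse itself is the same.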
Since it's a different way of doing it, it may work, and as a bonus it's much safer than using `exec` or `eval`.
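A quick sketch of the safety difference: `ast.literal_eval` only builds literals (lists, numbers, strings, and so on) and raises `ValueError` on anything else, such as a function call that `eval` would happily execute.

```python
import ast

# A plain nested-list literal is evaluated normally.
print(ast.literal_eval("[[1, 2], [3, 4]]"))  # [[1, 2], [3, 4]]

# A function call is code, not a literal, so literal_eval refuses to run it,
# whereas eval() would execute it.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```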
EDIT: since your comment says it still takes a lot of memory, I suggest writing the data one row per line, so that `ast.literal_eval` can evaluate each line as a vector that you then append to your matrix.
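A minimal sketch of that line-by-line layout, assuming each line holds one matrix row written as a Python list literal and every matrix has exactly `n` rows (1000 in the question). The helpers `write_matrices` and `read_matrices` are hypothetical names, not part of any library.

```python
import ast
import io


def write_matrices(matrices, f):
    """Hypothetical writer: one matrix row per line, e.g. '[1, 2]'."""
    for matrix in matrices:
        for row in matrix:
            f.write(repr(row) + "\n")


def read_matrices(f, n):
    """Hypothetical reader: rebuild matrices of n rows, one line at a time."""
    matrices = []
    current = []
    for line in f:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        # Each line is a small literal, so only one row is parsed at a time.
        current.append(ast.literal_eval(line))
        if len(current) == n:
            matrices.append(current)
            current = []
    if current:
        raise ValueError("file ended in the middle of a matrix")
    return matrices


buf = io.StringIO()
write_matrices([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], buf)
buf.seek(0)
print(read_matrices(buf, n=2))  # [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```

With a real file you would pass `open("bigfile.txt")` instead of the `StringIO` buffer. The parser then only ever holds the syntax tree of a single row, instead of one for the whole 2 GB file.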