Loading a large list of matrices into the Python shell

Time: 2016-12-27 11:17:38

Tags: python out-of-memory

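I have a file containing a very large list of matrices (each one a list of lists of integers) that I want to load into the Python shell. The file's contents have the format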

L = [ [[1,2],[3,4]], [[5,6],[7,8]], ... ]

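So I tried loading it with execfile(filename). Unfortunately, I run out of memory this way. What am I doing wrong?

For comparison: the file is about 2 GB, while the machine has 100 GB of RAM. The matrices have dimension 1000x1000.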

1 answer:

Answer 0: (score: 1)

Here is my attempt using ast.literal_eval. If it doesn't work I'll delete my answer, but I think it's worth a shot:

import ast

with open("bigfile.txt") as f:
    while True:
        c = f.read(1)
        if not c:
            break
        if c=='=':
            # equals found, skip spaces if any
            while f.read(1)==" ":
                pass
            break

    # rewind to sync with non-whitespace char that we have consumed
    f.seek(f.tell()-1)

    L = ast.literal_eval(f.read())

Basically: open the file, read it character by character to skip past the assignment (literal_eval doesn't evaluate assignments, only literal structures, a bit like JSON), and feed the rest of the huge file to the literal evaluator.

Since it's another means of doing it, it may work, and as a bonus it's much safer than using exec or eval.
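To illustrate the point about assignments on a tiny input, here is a minimal sketch. The string `source` is a made-up two-matrix stand-in for the real file, and splitting on the first `=` is just a shortcut for the character-by-character skip above; it assumes the literal itself contains no `=`.

```python
import ast

# Hypothetical miniature of the file's contents (not the real 2 GB data).
source = "L = [ [[1,2],[3,4]], [[5,6],[7,8]] ]"

# An assignment is a statement, not a literal, so literal_eval refuses it.
try:
    ast.literal_eval(source)
    rejected = False
except (SyntaxError, ValueError):
    rejected = True

# Keeping only what follows the first '=' leaves a plain nested-list literal.
literal_text = source.split("=", 1)[1].strip()
matrices = ast.literal_eval(literal_text)
```

Note that this only shows why the prefix has to be removed; it does not change how much memory parsing the full literal needs.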

EDIT: since your comment stated that it still took a lot of memory, I suggest that you write the data line by line, so that ast.literal_eval can evaluate each line as a vector that you can then append to your matrix.
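A rough sketch of what that line-by-line approach could look like. The file layout is an assumption of mine, not something from the question: one matrix row per line (e.g. `[1,2]`) with a blank line between matrices. The helper name `iter_matrices` and the temporary demo file are hypothetical as well.

```python
import ast
import os
import tempfile

def iter_matrices(path):
    """Yield one matrix (a list of rows) at a time from a row-per-line file."""
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                # Each non-empty line holds a single row literal such as [1,2].
                rows.append(ast.literal_eval(line))
            elif rows:
                # A blank line closes the current matrix (assumed convention).
                yield rows
                rows = []
    if rows:
        # The last matrix may not be followed by a blank line.
        yield rows

# Tiny demo file standing in for the real data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("[1,2]\n[3,4]\n\n[5,6]\n[7,8]\n")
    demo_path = tmp.name

L = list(iter_matrices(demo_path))
os.remove(demo_path)
```

With this layout the parser only ever sees one row at a time, so the large intermediate syntax tree for the whole file is never built. Collecting everything with `list(...)` still keeps all matrices in memory; iterating over the generator instead lets you process them one by one.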