Question

I have an object (a defaultdict) with this structure: {string: [(string, (float, float)), (string, (float, float)), ...]}

Its size is about 12.5 MB.

I pickle it with this code:

with open(Path_to_file, 'wb') as file:
    pickle.dump(data_dict, file)

The pickle file is about 300 MB. I unpickle it in the process with this code:

with open(Path_to_file, 'rb') as file:
    data_dict_new = pickle.load(file)

While unpickling, the system uses a lot of RAM (about 3.5 GB or more). Once unpickling is done, however, Python uses about 1 GB of RAM.

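So I have two questions:

Why is there such a gap between the size of my structure and the RAM used?
How can I free that memory?

gc.collect() does not help.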

Answer 1

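I was able to reproduce this. Unpickling a large (about 300 MB) file really does use a lot of extra memory. In my case, the process used 1.6 GB just to hold the originally generated data_dict, and 2.9 GB when I loaded it from the file.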
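To check the gap between the reported object size and the memory actually held, here is a minimal sketch using the standard tracemalloc module. The helper names build_data and measure_unpickle and the small data sizes are assumptions made for illustration. Note that tracemalloc counts only allocations made through Python's allocator, so its numbers will not match the process RSS figures quoted here, and sys.getsizeof reports only the shallow size of the dict itself (the question does not say how its 12.5 MB figure was obtained).

```python
import pickle
import sys
import tracemalloc
from collections import defaultdict


def build_data(n_keys=200, items_per_key=50):
    """Build a small defaultdict shaped like {str: [(str, (float, float)), ...]}."""
    data = defaultdict(list)
    for i in range(n_keys):
        for j in range(items_per_key):
            data[f"key{i}"].append((f"item{j}", (float(i), float(j))))
    return data


def measure_unpickle(blob):
    """Unpickle blob and return (object, retained_bytes, peak_bytes)."""
    tracemalloc.start()
    obj = pickle.loads(blob)
    # get_traced_memory() returns (current, peak) since tracing started
    retained, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return obj, retained, peak


data = build_data()
blob = pickle.dumps(data)
loaded, retained, peak = measure_unpickle(blob)

# Shallow size: the dict's own table only, not the lists, tuples, strings, floats
shallow = sys.getsizeof(data)
print(f"shallow size:        {shallow} bytes")
print(f"pickle size:         {len(blob)} bytes")
print(f"retained after load: {retained} bytes")
print(f"peak during load:    {peak} bytes")
```

Comparing the retained and peak values with the shallow size gives a rough idea of how much memory the nested lists, tuples, strings and floats take beyond the dict's own table.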

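However, if you run the unpickling in a child process, the system performs a full memory cleanup once the process has been join()ed (as described in this answer: https://stackoverflow.com/a/1316799/1102535). So here is an example of unpickling without the extra memory: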

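import pickle
from multiprocessing import Process, Manager

def load_pickle(filename, data):
    # Runs in the child process, so the unpickling overhead stays there
    with open(filename, 'rb') as file:
        data_pkl = pickle.load(file)
    # Copy the result into the manager-backed dict shared with the parent
    for key, val in data_pkl.items():
        data[key] = val

if __name__ == '__main__':
    manager = Manager()
    data_dict = manager.dict()
    p = Process(target=load_pickle, args=("test.pkl", data_dict))
    p.start()
    p.join()  # wait for the child to finish and exit
    print(len(data_dict))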

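This code has its drawbacks (such as copying between dicts), but at least you get the idea. For me, it uses almost the same amount of memory after unpickling as the original data did before pickling.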
