Question

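I am happily using pandas to store and manipulate experimental data. Usually I save things in the HDF format (which I have not mastered) via pd.HDFStore.

My DataFrames keep getting bigger, and I need to be economical with memory.

I read some of the guides linked in related questions, yet I cannot achieve sustainable memory consumption, for example in my typical task below: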

. load some `df` in memory (scale size is 10GB)
. do business with some other preloaded `df`
. unload
. repeat

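Evidently I keep failing at the unload stage.

So I would like you to consider the following experiment.

(from a freshly started kernel, in an IPython notebook, if that matters)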

import pandas as pd

for idx in range(6):
    print(idx)
    store = pd.HDFStore('detection_DB_N.h5')
    detection_DB = store['detection_DB']
    store.close()

    del detection_DB

Stats (from top):

. memory used by first iteration ~8GB
. memory used at the end of execution ~10GB (6 cycles)
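
The per-iteration figures could also be logged from inside the kernel instead of being read off top. The following is only a minimal sketch under stated assumptions: `rss_mb` and `run_cycles` are hypothetical helpers that are not part of pandas, the current-memory reading relies on Linux's /proc/self/status, and a bytearray stands in for the loaded DataFrame so the snippet runs without the HDF file.

```python
import gc
import resource
import sys


def rss_mb():
    """Resident set size of this process in MB (hypothetical helper)."""
    try:
        # Linux: VmRSS is the *current* resident size, reported in kB.
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024.0
    except OSError:
        pass
    # Fallback: this is the *peak* RSS, not the current one.
    # ru_maxrss is in bytes on macOS and in kB on Linux.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == 'darwin':
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


def run_cycles(n_cycles, n_bytes):
    """Load/unload a stand-in payload and record memory around each del."""
    readings = []
    for idx in range(n_cycles):
        payload = bytearray(n_bytes)  # stand-in for store['detection_DB']
        loaded = rss_mb()
        del payload
        gc.collect()
        readings.append((idx, loaded, rss_mb()))
    return readings


if __name__ == '__main__':
    for idx, loaded, after in run_cycles(3, 50 * 1024 * 1024):
        print('cycle %d: loaded %.1f MB, after del %.1f MB' % (idx, loaded, after))
```

Replacing the bytearray line with the actual HDFStore read would give one reading per cycle, which makes it easier to see whether memory grows with every load or only after the first one.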

Then, in the same kernel, I run

for idx in range(6):
    print(idx)
    store = pd.HDFStore('detection_DB_N.h5')
    detection_DB = store['detection_DB']
    store.close()

    #del detection_DB  #SAME AS BEFORE, BUT I DON'T del

Stats:

. memory used at the end of execution ~15GB

Calling del detection_DB has no effect whatsoever on memory (CPU usage goes high for about 5 seconds).

Likewise, calling

 import gc 
 gc.collect()

makes no relevant difference.
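
As background on what these two calls can and cannot do: del only removes a name, and CPython frees the object when its last reference goes away; gc.collect() additionally cleans up reference cycles. The sketch below illustrates just that part, with a hypothetical `Payload` class standing in for the DataFrame. It does not reproduce the HDFStore behaviour, and even a freed object does not guarantee that the allocator hands the memory back to the operating system, so top may still show it as used.

```python
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for a large DataFrame."""

    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)


def is_freed_after_del(keep_extra_reference):
    """Return True if the object is gone after del + gc.collect()."""
    obj = Payload(1024)
    probe = weakref.ref(obj)  # lets us observe the object without owning it
    # A second name bound to the same object keeps it alive.
    extra = obj if keep_extra_reference else None
    del obj
    gc.collect()
    return probe() is None


if __name__ == '__main__':
    print(is_freed_after_del(False))
    print(is_freed_after_del(True))
```

If something else in the session still refers to the DataFrame, del on one name would behave like the second case here.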

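I should add that, by repeating the previous calls, I ended up with ~20GB occupied (and no objects left loaded).

Can anyone shed some light on this?

How can I get back to ~0GB (or thereabouts) after the del?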

pandas, HDFStore and memory usage through load/unload cycles

0 answers: