Why does this simple pandas operation leak memory?

Asked: 2018-02-04 06:51:49

Tags: python pandas memory-leaks garbage-collection

Consider the following program:

import pandas as pd
import datetime
import time
import psutil
import os
import gc

# Construct a trivial pandas time series
data = []
indexes = []
for i in range(5):
  data.append(i)
  indexes.append(datetime.datetime.now())
  time.sleep(1)
s = pd.Series(data, index=indexes)

for _ in range(100000):
  # Remove the next line to prevent memory leak
  foo = datetime.datetime.now() - s.index[-1]

  # These lines are okay
  foo_dt = datetime.datetime.now()
  foo_idx = s.index[-1]
  #gc.collect()  # This mitigates but does not eliminate the problem

  # Get memory per https://stackoverflow.com/a/21632554/939259
  process = psutil.Process(os.getpid())
  print(process.memory_info().rss)

This gives the following output (with gc.collect() included):

$ python ./test_leak.py | uniq
60502016
60547072
60755968
<snip>

Without gc.collect() it is similar:

$ python ./test_leak.py | uniq
60518400
60588032
60776448
<snip>
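
One way to narrow this down (a hedged sketch, not part of the original report: the use of pd.Timestamp.now() and the loop counts are my own assumptions) is to repeat the loop with the subtraction kept entirely in pandas types and watch whether RSS still climbs:

import os
import pandas as pd
import psutil

process = psutil.Process(os.getpid())
# Same shape of data as above, but built without sleeping
s = pd.Series(range(5), index=pd.date_range("2018-02-04", periods=5, freq="s"))

for i in range(100000):
  # Timestamp - Timestamp stays inside pandas and yields a Timedelta,
  # whereas datetime.datetime.now() - s.index[-1] mixes a stdlib datetime
  # with a pandas Timestamp
  foo = pd.Timestamp.now() - s.index[-1]
  if i % 10000 == 0:
    print(process.memory_info().rss)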

What is going on here? Why does memory usage keep growing when all I am doing is assigning to a temporary?
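
For anyone reproducing this, here is a sketch of how one might see what is actually piling up (assumes Python 3, since tracemalloc is standard library there; the iteration counts are arbitrary):

import datetime
import tracemalloc

import pandas as pd

s = pd.Series(range(5), index=pd.date_range("2018-02-04", periods=5, freq="s"))

tracemalloc.start()

# Warm up, then snapshot before and after a block of iterations
for _ in range(1000):
  foo = datetime.datetime.now() - s.index[-1]
before = tracemalloc.take_snapshot()

for _ in range(100000):
  foo = datetime.datetime.now() - s.index[-1]
after = tracemalloc.take_snapshot()

# Show the allocation sites that grew the most between the two snapshots
for stat in after.compare_to(before, "lineno")[:10]:
  print(stat)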

0 Answers:

No answers yet.