Question

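I have some code that iterates over DB entities and runs in a task (see below).

On App Engine I am getting an Exceeded soft private memory limit error, and checking memory_usage().current() does confirm the problem. Judging by the logging statement's output, memory goes up every time a batch of foos is fetched.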

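My question is: why is the memory not being garbage collected? I would expect that in each iteration of the loops (the while loop and the for loop, respectively), re-using the names foos and foo would cause the objects they previously pointed to to be 'de-referenced' (i.e. become unreachable) and therefore eligible for garbage collection, and then collected as memory gets tight. But evidently that is not happening.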

The task code:

import logging

from google.appengine.api.runtime import memory_usage

# models and some_module are the asker's own modules (their imports are not shown)

batch_size = 10
dict_of_results = {}
results = 0
cursor = None

while True:
  foos = models.Foo.all().filter('status =', 6)
  if cursor:
    foos.with_cursor(cursor)

  for foo in foos.run(batch_size=batch_size):
    logging.debug('on result #{} used memory of {}'.format(results, memory_usage().current()))
    results += 1

    bar = some_module.get_bar(foo)

    if bar:
      try:
        dict_of_results[bar.baz] += 1
      except KeyError:
        dict_of_results[bar.baz] = 1

    if results >= batch_size:
      cursor = foos.cursor()
      break

  else:
    break

And in some_module.py:

def get_bar(foo):
  # return the first related bar with status 10, or None if there is none
  for bar in foo.bars:
    if bar.status == 10:
      return bar

  return None

Answer 1

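It looks like your batching solution conflicts with db's own batching, leaving a lot of extra batches hanging around.

When you run query.run(batch_size=batch_size), db runs the query until the entire limit is completed. When you reach the end of a batch, db fetches the next one. However, right after db does this, you exit the loop and start again. This means that batches 1 -> n will all exist in memory twice: once from the previous query's fetch and once from your next query's fetch.

If you want to loop over all your entities, just let db handle the batching: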

foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size=batch_size):
  results += 1
  bar = some_module.get_bar(foo)
  if bar:
    try:
      dict_of_results[bar.baz] += 1
    except KeyError:
      dict_of_results[bar.baz] = 1
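
As a side note, the try/except KeyError tally used above can also be written with collections.Counter from the standard library. This is only a minimal sketch with stand-in objects: Bar and tally_baz are hypothetical names made up for illustration, not part of the asker's code or of the db API.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the asker's bar entities; only .baz matters here.
Bar = namedtuple('Bar', ['baz'])


def tally_baz(bars):
    """Count how often each bar.baz occurs, skipping None (no matching bar)."""
    counts = Counter()
    for bar in bars:
        if bar is not None:
            counts[bar.baz] += 1  # missing keys start at 0, so no KeyError
    return counts


counts = tally_baz([Bar('a'), None, Bar('b'), Bar('a')])
print(dict(counts))
```

The Counter only stores one integer per distinct baz value, so like the original dict it should not be the part that grows with the number of entities.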

Or, if you want to handle the batching yourself, make sure db doesn't do any batching:

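cursor = None
while True:
  foo_query = models.Foo.all().filter('status =', 6)
  if cursor:
    foo_query.with_cursor(cursor)
  # fetch() returns a plain list of at most batch_size entities
  foos = foo_query.fetch(limit=batch_size)
  if not foos:
    break

  # ... process foos here ...

  # the cursor comes from the query object, not from the fetched list
  cursor = foo_query.cursor()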
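
The shape of that manual batching loop can be tried out without App Engine. The sketch below is a simulation under stated assumptions: fetch_page is a hypothetical stand-in for a cursor-based fetch over an in-memory list (the "cursor" is just a list index), and it says nothing about how db behaves internally.

```python
def fetch_page(items, cursor, limit):
    """Hypothetical stand-in for a cursor-based fetch.

    Returns (page, next_cursor). The cursor is a plain list index;
    None means "start from the beginning".
    """
    start = cursor or 0
    page = items[start:start + limit]
    return page, start + len(page)


def process_in_batches(items, batch_size):
    """Walk all items one page at a time, holding only the current page."""
    processed = 0
    pages = 0
    cursor = None
    while True:
        page, cursor = fetch_page(items, cursor, batch_size)
        if not page:
            break  # an empty page means everything has been consumed
        pages += 1
        processed += len(page)  # real per-entity work would go here
    return processed, pages


print(process_in_batches(list(range(25)), 10))  # (25, 3)
```

Each pass rebinds page to the newly fetched list, so the previous page has no remaining references from this loop and can be reclaimed.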

Answer 2

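You might be looking in the wrong direction.

Take a look at this Q&A for ways to check on garbage collection and for possible alternative explanations: Google App Engine DB Query Memory Usage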
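
One general way to check whether rebinding a name really frees the old object (not necessarily the approach described in the linked Q&A) is to watch it through a weak reference. This is a minimal sketch assuming CPython; Entity and is_freed_after_rebinding are hypothetical names used only for illustration.

```python
import gc
import weakref


class Entity(object):
    """Hypothetical stand-in for a datastore entity."""

    def __init__(self, key):
        self.key = key


def is_freed_after_rebinding(keep_in_cache):
    """Return True if the first Entity is gone after its name is rebound."""
    cache = []
    foo = Entity(1)
    probe = weakref.ref(foo)  # a weak reference does not keep foo alive
    if keep_in_cache:
        cache.append(foo)     # a second, lingering strong reference
    foo = Entity(2)           # rebind the name, as the next loop pass would
    gc.collect()
    return probe() is None


print(is_freed_after_rebinding(False))  # True: no references remain
print(is_freed_after_rebinding(True))   # False: the cache still holds it
```

If a probe like this shows that objects stay alive after their names are rebound, something else (a cache, a list, a pending batch) is still referencing them, which points to an explanation other than the garbage collector itself.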

How is memory garbage collected in App Engine (Python) when iterating over db results
