Why does my Python loop tend to consume all the memory?

Time: 2019-04-28 00:43:26

Tags: python

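I want to generate and keep a set of tuples for a fixed amount of time. However, I found that, given enough time, the program seems to consume all the memory.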

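I have tried two approaches. One is deleting the newly generated variables with del, the other is calling gc.collect(). Neither of them works. If I only generate the tuples without keeping them, the program consumes a bounded amount of memory.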

Generate and keep: gk.py

import gc
import time
from memory_profiler import profile
from random import sample
from sys import getsizeof


@profile
def loop(limit):
    t = time.time()
    i = 0
    A = set()
    while True:
        i += 1
        duration = time.time() - t
        a = tuple(sorted(sample(range(200), 100)))
        A.add(a)
        if not i % int(1e4):
            print('step {:.2e}...'.format(i))
        if duration > limit:
            print('done')
            break
        # method 1: delete the variables
#        del duration, a
        # method 2: use gc
#        gc.collect()
    memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
             getsizeof(a) + getsizeof(limit) + getsizeof(A)
    print('memory consumed: {:.2e}MB'.format(memory/2**20))
    pass


def main():
    limit = 300
    loop(limit)
    pass


if __name__ == '__main__':
    print('running...')
    main()

Generate without keeping: gnk.py

import time
from memory_profiler import profile
from random import sample
from sys import getsizeof


@profile
def loop(limit):
    t = time.time()
    i = 0
    while True:
        i += 1
        duration = time.time() - t
        a = tuple(sorted(sample(range(200), 100)))
        if not i % int(1e4):
            print('step {:.2e}...'.format(i))
        if duration > limit:
            print('done')
            break
    memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
             getsizeof(a) + getsizeof(limit)
    print('memory consumed: {:.2e}MB'.format(memory/2**20))
    pass


def main():
    limit = 300
    loop(limit)
    pass


if __name__ == '__main__':
    print('running...')
    main()

Use "mprof" (requires the memory_profiler module) in cmd / shell to check memory usage:

mprof run my_file.py
mprof plot

Result of gk.py

memory consumed: 4.00e+00MB
Filename: gk.py

Line #    Mem usage    Increment   Line Contents
================================================
    12     32.9 MiB     32.9 MiB   @profile
    13                             def loop(limit):
    14     32.9 MiB      0.0 MiB       t = time.time()
    15     32.9 MiB      0.0 MiB       i = 0
    16     32.9 MiB      0.0 MiB       A = set()
    17     32.9 MiB      0.0 MiB       while True:
    18    115.8 MiB      0.0 MiB           i += 1
    19    115.8 MiB      0.0 MiB           duration = time.time() - t
    20    115.8 MiB      0.3 MiB           a = tuple(sorted(sample(range(200), 100)))
    21    115.8 MiB      2.0 MiB           A.add(a)
    22    115.8 MiB      0.0 MiB           if not i % int(1e4):
    23    111.8 MiB      0.0 MiB               print('step {:.2e}...'.format(i))
    24    115.8 MiB      0.0 MiB           if duration > limit:
    25    115.8 MiB      0.0 MiB               print('done')
    26    115.8 MiB      0.0 MiB               break
    27                                     # method 1: delete the variables
    28                             #        del duration, a
    29                                     # method 2: use gc
    30                             #        gc.collect()
    31                                 memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
    32    115.8 MiB      0.0 MiB                getsizeof(a) + getsizeof(limit) + getsizeof(A)
    33    115.8 MiB      0.0 MiB       print('memory consumed: {:.2e}MB'.format(memory/2**20))
    34    115.8 MiB      0.0 MiB       pass

I have two questions:

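  1. Both programs consume more memory than their variables occupy. "gk.py" consumed 115.8 MB, while its variables occupied 4.00 MB. "gnk.py" consumed 33.0 MB, while its variables occupied 9.08e-04 MB. Why do the programs consume more memory than their variables occupy?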

  2. The memory consumed by "gk.py" increases linearly with time. The memory consumed by "gnk.py" stays constant over time. Why does this happen?

Any help would be appreciated.

1 answer:

Answer 0: (score: 2)

Because the set keeps growing for as long as the loop runs, it will eventually consume all available memory.
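A minimal sketch of why neither del nor gc.collect() helps, and why getsizeof on the set reports so little. It assumes CPython; build_set and deep_size are hypothetical helper names that do not appear in the question. The set holds a reference to every tuple added to it, so deleting the local name or running the collector frees nothing, and sys.getsizeof on the set counts only the set's own hash table, not the tuples it refers to.

```python
import gc
from random import sample
from sys import getsizeof


def build_set(n):
    """Collect n sorted 100-element tuples in a set, as the gk.py loop does."""
    kept = set()
    for _ in range(n):
        a = tuple(sorted(sample(range(200), 100)))
        kept.add(a)
        del a  # removes only the local name; the set still references the tuple
    return kept


def deep_size(kept):
    """Size of the set's own table plus the size of every tuple it holds.

    The int objects are not counted: in CPython the values 0..199 are
    shared small integers, so they add no per-tuple cost.
    """
    return getsizeof(kept) + sum(getsizeof(t) for t in kept)


kept = build_set(1000)
gc.collect()  # frees nothing here: every tuple is still reachable via the set
shallow = getsizeof(kept)
deep = deep_size(kept)
print('tuples kept: {}'.format(len(kept)))
print('getsizeof(set) only: {} bytes'.format(shallow))
print('set plus its tuples: {} bytes'.format(deep))
```

The gap between the two printed sizes is the memory that the "memory consumed" line in the question leaves out. The remaining difference from the mprof figure is largely the interpreter and imported modules, which is why the script that keeps nothing already sits at roughly 33 MB.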

An estimate (measured on my machine):

10 seconds of code running ~ 5e4 tuples saved to the set
300 seconds of code running ~ 1.5e6 tuples saved to the set

1 tuple = 100 integers ~ 400bytes

total:

1.5e6 * 400bytes = 6e8bytes = 600MB filled in 300s
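The arithmetic above can be reproduced as a short sketch. The rate of 5e4 tuples per 10 seconds is the figure reported in this answer and depends on the machine; estimate_bytes is a hypothetical helper name. The 400-byte figure assumes 4 bytes per integer. On a 64-bit CPython build each tuple slot is an 8-byte pointer, so the measured per-tuple size is likely to be roughly twice that, and the total correspondingly larger.

```python
from random import sample
from sys import getsizeof

TUPLES_PER_SECOND = 5e4 / 10  # rate reported in the answer; machine dependent
RUN_SECONDS = 300             # the `limit` used in the question


def estimate_bytes(bytes_per_tuple, seconds=RUN_SECONDS, rate=TUPLES_PER_SECOND):
    """Rough memory held by the tuples alone after `seconds` of running."""
    return rate * seconds * bytes_per_tuple


# Measure one tuple of the same shape as those generated in the loop.
measured = getsizeof(tuple(sorted(sample(range(200), 100))))

print('answer figure : {:.0f} MB'.format(estimate_bytes(400) / 1e6))
print('measured tuple: {} bytes'.format(measured))
print('measured total: {:.0f} MB'.format(estimate_bytes(measured) / 1e6))
```

Either way the estimate grows linearly with the run time, which matches the linear growth seen for gk.py.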