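I want to generate and keep a set of tuples within a given time limit. However, I found that the program seems to consume all available memory if it is given enough time.
I have tried two methods: deleting the newly generated variables, and calling gc.collect(). Neither of them worked. If I only generate the tuples without keeping them, the program uses a bounded amount of memory.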
Generate and keep: gk.py

import gc
import time
from memory_profiler import profile
from random import sample
from sys import getsizeof

@profile
def loop(limit):
    t = time.time()
    i = 0
    A = set()
    while True:
        i += 1
        duration = time.time() - t
        a = tuple(sorted(sample(range(200), 100)))
        A.add(a)
        if not i % int(1e4):
            print('step {:.2e}...'.format(i))
        if duration > limit:
            print('done')
            break
        # method 1: delete the variables
        # del duration, a
        # method 2: use gc
        # gc.collect()
    memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
        getsizeof(a) + getsizeof(limit) + getsizeof(A)
    print('memory consumed: {:.2e}MB'.format(memory/2**20))
    pass

def main():
    limit = 300
    loop(limit)
    pass

if __name__ == '__main__':
    print('running...')
    main()

Generate without keeping: gnk.py

import time
from memory_profiler import profile
from random import sample
from sys import getsizeof

@profile
def loop(limit):
    t = time.time()
    i = 0
    while True:
        i += 1
        duration = time.time() - t
        a = tuple(sorted(sample(range(200), 100)))
        if not i % int(1e4):
            print('step {:.2e}...'.format(i))
        if duration > limit:
            print('done')
            break
    memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
        getsizeof(a) + getsizeof(limit)
    print('memory consumed: {:.2e}MB'.format(memory/2**20))
    pass

def main():
    limit = 300
    loop(limit)
    pass

if __name__ == '__main__':
    print('running...')
    main()

Use "mprof" (requires the memory_profiler module) in cmd / shell to check memory usage:

mprof run my_file.py
mprof plot

Result of gk.py:
memory consumed: 4.00e+00MB
Filename: gk.py

Line #    Mem usage    Increment   Line Contents
================================================
    12     32.9 MiB     32.9 MiB   @profile
    13                             def loop(limit):
    14     32.9 MiB      0.0 MiB       t = time.time()
    15     32.9 MiB      0.0 MiB       i = 0
    16     32.9 MiB      0.0 MiB       A = set()
    17     32.9 MiB      0.0 MiB       while True:
    18    115.8 MiB      0.0 MiB           i += 1
    19    115.8 MiB      0.0 MiB           duration = time.time() - t
    20    115.8 MiB      0.3 MiB           a = tuple(sorted(sample(range(200), 100)))
    21    115.8 MiB      2.0 MiB           A.add(a)
    22    115.8 MiB      0.0 MiB           if not i % int(1e4):
    23    111.8 MiB      0.0 MiB               print('step {:.2e}...'.format(i))
    24    115.8 MiB      0.0 MiB           if duration > limit:
    25    115.8 MiB      0.0 MiB               print('done')
    26    115.8 MiB      0.0 MiB               break
    27                                     # method 1: delete the variables
    28                                     # del duration, a
    29                                     # method 2: use gc
    30                                     # gc.collect()
    31                                 memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
    32    115.8 MiB      0.0 MiB           getsizeof(a) + getsizeof(limit) + getsizeof(A)
    33    115.8 MiB      0.0 MiB       print('memory consumed: {:.2e}MB'.format(memory/2**20))
    34    115.8 MiB      0.0 MiB       pass
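I have two questions:
Both programs consume more memory than their variables occupy. gk.py consumed 115.8 MB while its variables occupied 4.00 MB. gnk.py consumed 33.0 MB while its variables occupied 9.08e-04 MB. Why do the programs consume more memory than the corresponding variables occupy?
The memory consumed by gk.py increases linearly with time, while the memory consumed by gnk.py stays constant over time. Why does this happen?
Any help would be appreciated.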
Answer 0 (score: 2)
Since the set keeps growing, there will come a point where it consumes all available memory.
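To see why neither `del` nor `gc.collect()` helps while the set is kept, here is a minimal sketch using the standard-library `tracemalloc` module. The function names `traced_bytes` and `demo` are hypothetical, and the tuple count is an arbitrary small value chosen so that it runs quickly. The idea is that `del a` only removes the local name, and `gc.collect()` only reclaims objects that are no longer reachable, so tuples still referenced by the set stay allocated.

```python
import gc
import tracemalloc
from random import sample


def traced_bytes():
    """Return the bytes currently allocated by Python, as seen by tracemalloc."""
    current, _peak = tracemalloc.get_traced_memory()
    return current


def demo(n_tuples=5000):
    """Fill a set like gk.py does, then compare what del/gc and clear() release."""
    tracemalloc.start()
    baseline = traced_bytes()

    kept = set()
    for _ in range(n_tuples):
        a = tuple(sorted(sample(range(200), 100)))
        kept.add(a)

    # "Method 1" and "method 2" from the question: neither frees the tuples,
    # because the set still holds a reference to every one of them.
    del a
    gc.collect()
    after_gc = traced_bytes() - baseline

    # Dropping the references held by the set is what releases the memory.
    kept.clear()
    gc.collect()
    after_clear = traced_bytes() - baseline

    tracemalloc.stop()
    return after_gc, after_clear


if __name__ == "__main__":
    after_gc, after_clear = demo()
    print("allocated after del + gc.collect(): {:.2f} MiB".format(after_gc / 2**20))
    print("allocated after kept.clear():       {:.2f} MiB".format(after_clear / 2**20))
```

If the tuples really must be kept for the whole run, the growth itself is expected behaviour rather than a leak; the only ways to bound it are to keep fewer tuples or to store them more compactly.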
An estimate (from my machine):
10 seconds of code running ~ 5e4 tuples saved to the set
300 seconds of code running ~ 1.5e6 tuples saved to the set
1 tuple = 100 integers ~ 400 bytes
total:
1.5e6 * 400 bytes = 6e8 bytes = 600 MB filled in 300 s
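The per-tuple figure in this estimate can be checked on your own interpreter. The sketch below assumes CPython, and the helper names `deep_set_size` and `build` are hypothetical. It also touches on the first question: `sys.getsizeof` is shallow, so `getsizeof(A)` reports only the set's own hash table and not the tuples it references, which is one reason the summed variable sizes come out far smaller than the process memory.

```python
from random import sample
from sys import getsizeof


def deep_set_size(s):
    """Shallow size of the set plus the shallow size of each tuple in it.

    The integers inside the tuples are not counted: CPython caches small
    ints such as 0..199, so all tuples share the same int objects.
    """
    return getsizeof(s) + sum(getsizeof(t) for t in s)


def build(n_tuples):
    """Build a set of tuples the same way the loop in gk.py does."""
    return {tuple(sorted(sample(range(200), 100))) for _ in range(n_tuples)}


if __name__ == "__main__":
    A = build(10_000)
    one_tuple = getsizeof(next(iter(A)))
    print("one tuple:          {} bytes".format(one_tuple))
    print("getsizeof(A) alone: {:.2f} MiB".format(getsizeof(A) / 2**20))
    print("set plus tuples:    {:.2f} MiB".format(deep_set_size(A) / 2**20))
    # Extrapolate to the 1.5e6 tuples estimated above for a 300 s run.
    print("1.5e6 tuples:       {:.0f} MiB".format(1.5e6 * one_tuple / 2**20))
```

On a 64-bit CPython build each tuple slot holds an 8-byte pointer, so a 100-element tuple will likely report more than the 400 bytes assumed above. The measured figure is the one to rely on.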