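In my application, I reuse an existing MobileNet trained on ImageNet and retrain only the output layer on a flowers dataset with 5 classes. The retrained model is saved to disk. I then load the model and run evaluation over several iterations, which eventually exhausts memory and crashes the whole application. After some diagnosis, I realized that the leak comes from the Keras model.evaluate() method. The problem can be reproduced with this standalone sample code: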
import os
import resource

import keras
import numpy as np

if __name__ == '__main__':
    # ru_maxrss is the peak resident set size of the process (kilobytes on Linux).
    init_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for it in range(4):
        # Random stand-in data: 64 images of 224x224x3 and one-hot labels for 5 classes.
        x_valid = np.random.uniform(0, 1, (64, 224, 224, 3)).astype(np.float32)
        y_valid = keras.utils.to_categorical(np.random.uniform(0, 5, (64, )).astype(np.int32), 5)
        start_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # Reload the retrained model from disk on every iteration.
        model = keras.models.load_model(os.path.abspath(os.path.join('.', 'mobilenet_flowers.h5')),
                                        custom_objects={'relu6': keras.applications.mobilenet.relu6,
                                                        'DepthwiseConv2D': keras.applications.mobilenet.DepthwiseConv2D})
        loss, _ = model.evaluate(x_valid, y_valid, batch_size=64, verbose=False)
        # Attempt to release the backend session and the model before measuring again.
        keras.backend.clear_session()
        del model
        end_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print('Iteration %d:' % it)
        print(' Memory alloc before evaluate() is %7d kilobytes' % start_alloc)
        print(' Memory alloc after evaluate() is %7d kilobytes' % end_alloc)
        print(' Memory alloc loss for evaluate is %7d kilobytes\n' % (end_alloc - start_alloc))
    exit_alloc = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('Memory alloc before loop is %7d kilobytes' % init_alloc)
    print('Memory alloc after loop is %7d kilobytes' % exit_alloc)
    print('Memory alloc difference is %7d kilobytes' % (exit_alloc - init_alloc))
Running the script prints the following:
Iteration 0:
 Memory alloc before evaluate() is 251864 kilobytes
 Memory alloc after evaluate() is 901696 kilobytes
 Memory alloc loss for evaluate is 649832 kilobytes

Iteration 1:
 Memory alloc before evaluate() is 901696 kilobytes
 Memory alloc after evaluate() is 1036780 kilobytes
 Memory alloc loss for evaluate is 135084 kilobytes

Iteration 2:
 Memory alloc before evaluate() is 1036780 kilobytes
 Memory alloc after evaluate() is 1148692 kilobytes
 Memory alloc loss for evaluate is 111912 kilobytes

Iteration 3:
 Memory alloc before evaluate() is 1148692 kilobytes
 Memory alloc after evaluate() is 1190804 kilobytes
 Memory alloc loss for evaluate is 42112 kilobytes

Memory alloc before loop is 138792 kilobytes
Memory alloc after loop is 1190804 kilobytes
Memory alloc difference is 1052012 kilobytes
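A caveat when reading these numbers: ru_maxrss is the peak resident set size of the process, so it can only stay flat or grow, and it would not show memory being released by clear_session() even if that happened. The following is a minimal sketch, assuming Linux with /proc available, that reads the current resident set size alongside the peak. The helper names peak_rss_kb and current_rss_kb are hypothetical and not part of Keras or of the script above.

```python
import resource


def peak_rss_kb():
    # High-water mark of resident memory; on Linux the unit is kilobytes.
    # This value never decreases during the lifetime of the process.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def current_rss_kb():
    # Current resident set size read from /proc (Linux only).
    # Returns None when /proc/self/status is missing or has no VmRSS line.
    try:
        with open("/proc/self/status") as status_file:
            for line in status_file:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB
    except OSError:
        pass
    return None


if __name__ == "__main__":
    print("peak RSS:    %s kilobytes" % peak_rss_kb())
    print("current RSS: %s kilobytes" % current_rss_kb())
```

Calling current_rss_kb() in place of the ru_maxrss reads inside the loop would show whether resident memory actually keeps growing across iterations or merely reached a high peak once.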
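Any suggestions as to what might be going wrong? After searching through forums I tried adding K.clear_session(), but as you can see in the code, it did not help. The model is temporarily hosted at https://ufile.io/rgaxs.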
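Since clear_session() does not return the memory, one possible workaround (a sketch, not verified against this model) is to run each evaluation in a short-lived child process, so that the operating system reclaims everything when the child exits. The helper run_isolated below is hypothetical. It assumes a Unix system where the "fork" start method is available, that the result is picklable, and that keras/tensorflow are imported only inside the function passed to it rather than in the parent process.

```python
import multiprocessing as mp


def _child(conn, fn, args):
    # Runs in the child process: call fn and send the outcome to the parent.
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_isolated(fn, *args):
    # Run fn(*args) in a forked child and return its (picklable) result.
    # All memory the child allocated is released when it exits.
    ctx = mp.get_context("fork")  # assumption: Unix-only start method
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(send_conn, fn, args))
    proc.start()
    send_conn.close()  # parent keeps only the receiving end
    try:
        status, payload = recv_conn.recv()
    except EOFError:
        status, payload = "error", "child exited without sending a result"
    proc.join()
    if status != "ok":
        raise RuntimeError(payload)
    return payload


if __name__ == "__main__":
    # Stand-in workload; a real caller would pass a function that imports
    # keras, loads the model, calls model.evaluate() and returns the loss.
    print(run_isolated(pow, 2, 10))
```

In the loop above, the body of each iteration would move into a function (for example a hypothetical evaluate_once(model_path) returning the loss as a plain float) and be called as run_isolated(evaluate_once, path). The cost is that the model is reloaded in a fresh process every time, which the original loop already does.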
Some additional information about my environment:
== cat /etc/issue ===============================================
Linux 4.10.0-38-generic #42~16.04.1-Ubuntu SMP Tue Oct 10 16:32:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
VERSION="16.04.3 LTS (Xenial Xerus)"
VERSION_ID="16.04"
VERSION_CODENAME=xenial
== are we in docker =============================================
No
== compiler =====================================================
c++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== check pips ===================================================
numpy (1.12.1)
numpydoc (0.7.0)
protobuf (3.5.0)
tensorflow (1.4.0)
tensorflow-tensorboard (0.4.0rc3)
== check for virtualenv =========================================
False
== tensorflow import ============================================
tf.VERSION = 1.4.0
tf.GIT_VERSION = v1.4.0-rc1-11-g130a514
tf.COMPILER_VERSION = v1.4.0-rc1-11-g130a514
keras.VERSION = 2.0.9