I know this question has been asked a few different times on GitHub (1, 2). However, my question differs in several key respects, so please don't immediately dismiss it as a duplicate.
I am training many different models in a loop to compare their performance over different time periods. It looks roughly like this:
for period in periods:
    train_df, test_df = utils.split_train_test(
        df,
        period.min_train,
        period.max_train,
        period.min_test,
        period.max_test
    )
    train_X, train_y, test_X, test_y = extract_features(train_df, test_df)

    model_2_mlp = models.train_2_layer_mlp(train_X, train_y, verbose=0)
    local_results['2_layer_mlp'] = model_perf.eval_keras(
        model_2_mlp,
        train_X,
        test_X,
        train_y,
        test_y
    )

    model_5_mlp = models.train_5_layer_mlp_with_dropout(train_X, train_y,
                                                        verbose=0)
    local_results['5_layer_mlp_dropout'] = model_perf.eval_keras(
        model_5_mlp,
        train_X,
        test_X,
        train_y,
        test_y
    )

    ...

    # save local_results to a file
After a few iterations of the loop, TensorFlow raises an OOM error. However, none of the individual models comes close to exhausting the GPU on its own. I can even restart the code at the offending period and the models train correctly; I only get this error after the script has been running for a long time.

Is there any way to force GPU garbage collection?

The specific error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[28277,2000]
and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training_28/Adam/gradients/dense_93/MatMul_grad/MatMul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Answer (score: 1):
Yes, use keras.backend.clear_session(). It removes all models from memory, and you should call it at the end of each loop iteration.
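A minimal sketch of where that call would go in the loop from the question. The helper names (utils.split_train_test, models.train_2_layer_mlp, model_perf.eval_keras) and the model variables are taken from the post and assumed to exist; the del and gc.collect() lines are a common extra precaution, not something the Keras API requires:

import gc
import keras

for period in periods:
    # ... split the data, extract features, train and evaluate the
    #     models exactly as in the loop body shown in the question ...

    # save local_results to a file before tearing the models down

    # Drop the Python references to this iteration's models, then reset
    # Keras' global graph/session state so the GPU memory they held can
    # be reused on the next iteration.
    del model_2_mlp, model_5_mlp
    keras.backend.clear_session()
    gc.collect()  # optional: also nudge Python's garbage collector

If the error still appears, another commonly suggested variation is to call keras.backend.clear_session() immediately before building each new model rather than only at the end of the iteration.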