I've run into a problem: my model's training speed has slowed down significantly.
Here is what's happening:
Epoch 00001: val_loss did not improve from 0.03340
Run 27 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 2s 156us/step - loss: 0.0420 - binary_accuracy: 0.9459 - accuracy: 0.9848 - val_loss: 0.0362 - val_binary_accuracy: 0.9501 - val_accuracy: 0.9876
Epoch 00001: val_loss did not improve from 0.03340
Run 28 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 2s 150us/step - loss: 0.0422 - binary_accuracy: 0.9431 - accuracy: 0.9851 - val_loss: 0.0395 - val_binary_accuracy: 0.9418 - val_accuracy: 0.9863
Epoch 00001: val_loss did not improve from 0.03340
Run 29 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 6s 474us/step - loss: 0.0454 - binary_accuracy: 0.9479 - accuracy: 0.9833 - val_loss: 0.0395 - val_binary_accuracy: 0.9475 - val_accuracy: 0.9856
Epoch 00001: val_loss did not improve from 0.03340
Run 30 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 8s 701us/step - loss: 0.0462 - binary_accuracy: 0.9406 - accuracy: 0.9830 - val_loss: 0.0339 - val_binary_accuracy: 0.9502 - val_accuracy: 0.9882
Epoch 00001: val_loss did not improve from 0.03340
Run 31 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 8s 646us/step - loss: 0.0457 - binary_accuracy: 0.9462 - accuracy: 0.9836 - val_loss: 0.0375 - val_binary_accuracy: 0.9417 - val_accuracy: 0.9861
Epoch 00001: val_loss did not improve from 0.03340
Run 32 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 8s 640us/step - loss: 0.0471 - binary_accuracy: 0.9313 - accuracy: 0.9827 - val_loss: 0.0373 - val_binary_accuracy: 0.9446 - val_accuracy: 0.9868
Epoch 00001: val_loss did not improve from 0.03340
Run 33 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 8s 669us/step - loss: 0.0423 - binary_accuracy: 0.9458 - accuracy: 0.9852 - val_loss: 0.0356 - val_binary_accuracy: 0.9510 - val_accuracy: 0.9873
Epoch 00001: val_loss did not improve from 0.03340
Run 34 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 8s 648us/step - loss: 0.0441 - binary_accuracy: 0.9419 - accuracy: 0.9841 - val_loss: 0.0407 - val_binary_accuracy: 0.9357 - val_accuracy: 0.9849
Epoch 00001: val_loss did not improve from 0.03340
Run 35 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 9s 713us/step - loss: 0.0460 - binary_accuracy: 0.9473 - accuracy: 0.9829 - val_loss: 0.0423 - val_binary_accuracy: 0.9604 - val_accuracy: 0.9840
Epoch 00001: val_loss did not improve from 0.03340
Run 36 of 40 | Epoch 61 of 100
(15000, 4410) (15000, 12)
Train on 12000 samples, validate on 3000 samples
Epoch 1/1
12000/12000 [==============================] - 7s 557us/step - loss: 0.0508 - binary_accuracy: 0.9530 - accuracy: 0.9810 - val_loss: 0.0470 - val_binary_accuracy: 0.9323 - val_accuracy: 0.9820
My GPU utilization has not decreased (it has actually increased):
My CPU usage, CPU clocks, and GPU clocks (core and memory) all stay constant. My RAM usage also stays roughly the same.
The only strange part is that my overall power draw dropped (percentages in the image):
I read somewhere that this is caused by the Adam optimizer's beta_1 parameter, and that setting it to 0.99 should fix the problem, but the problem persists.
Is there any other reason this could be happening? It looks like a compute-side issue, since there is no sign of a hardware/driver problem.
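Since the slowdown shows up in the per-step times in the logs above, one framework-free way to confirm it is to record the wall-clock time of each run and compare. This is only a sketch: the lambda below is a dummy stand-in for the real per-run training call (e.g. model.fit).

```python
import time

durations = []  # seconds per run

def timed(train_fn):
    # Wrap one training run and record its wall-clock duration.
    start = time.perf_counter()
    result = train_fn()
    durations.append(time.perf_counter() - start)
    return result

for run in range(3):
    # Dummy workload; replace with the real per-run training call.
    timed(lambda: sum(i * i for i in range(100_000)))

# A steadily growing tail in durations confirms a compute-side slowdown.
print(durations)
```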
Answer 0 (score: 0):
In case anyone else runs into this, I'll list the things that might help:

1. K.clear_session()
   (make sure you do the import: from keras import backend as K)
2. config = tf.compat.v1.ConfigProto()
   config.gpu_options.allow_growth = True
   sess = tf.compat.v1.Session(config=config)
   (used in a with block)
3. del VARIABLE

These may help. Worst case, you will have to load smaller chunks of data or reduce the model size. If anyone has any other ideas about what might solve this kind of problem, feel free to comment and I'll edit this answer.
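To check whether state really is piling up between runs (the situation that del VARIABLE is meant to fix), the standard-library tracemalloc module can compare allocation snapshots. This is just a sketch: run_once is a hypothetical stand-in that deliberately leaks a reference, the way a loop that rebuilds a model every run can.

```python
import tracemalloc

leaked = []  # references accidentally kept alive across runs

def run_once(n):
    # Stand-in for one training run that forgets to release its data.
    data = list(range(n))
    leaked.append(data)  # without del / clear_session, this stays alive

tracemalloc.start()
baseline = tracemalloc.take_snapshot()
for _ in range(5):
    run_once(50_000)
after = tracemalloc.take_snapshot()

# Positive total growth between snapshots points to leaked per-run state.
growth = sum(stat.size_diff for stat in after.compare_to(baseline, "lineno"))
print(growth > 0)
```

If growth keeps climbing run over run, that supports clearing the session and deleting large objects between runs rather than looking at hardware.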