Keras + TensorFlow optimization stalls

Date: 2016-12-06 10:54:26

Tags: machine-learning tensorflow deep-learning keras

I installed Theano (TH), TensorFlow (TF), and Keras. Basic tests seem to indicate that they all work with the GPU (GTX 1070), CUDA 8.0, and cuDNN 5.1.

If I run the cifar10_cnn.py Keras example with TH as the backend, it seems to work fine, at ~18s/epoch. If I run it with TF, then almost every time (it occasionally works, and I can't reproduce when) the optimization stalls: acc stays at 0.1 after every epoch, as if the weights were never updated.
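For reference, a minimal sketch of how I switch backends, assuming Keras 1.x behaviour where the backend is read from the KERAS_BACKEND environment variable (or ~/.keras/keras.json) before the first import:

    # Sketch: force the backend before importing Keras (assumes Keras 1.x,
    # which reads KERAS_BACKEND or ~/.keras/keras.json at import time).
    import os
    os.environ["KERAS_BACKEND"] = "theano"   # or "tensorflow"

    import keras                             # prints "Using Theano backend." / "Using TensorFlow backend."
    print(keras.backend.backend())           # confirm which backend is actually active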

This is a shame, because the TF backend takes only about 10s/epoch (on the rare occasions it does work). I'm using Conda, and I'm new to Python. In case it helps, "conda list" seems to show two versions of some packages.
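To check which copies the script actually picks up (regardless of what "conda list" reports), I can run a quick sketch like the one below from the same environment:

    # Minimal check of which package versions the interpreter really loads,
    # independent of what `conda list` shows.
    import tensorflow as tf
    import keras

    print("TensorFlow:", tf.__version__, tf.__file__)
    print("Keras:     ", keras.__version__, keras.__file__)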

If you have any clues, please let me know. Thanks. The console output is below:

python cifar10_cnn.py

Using TensorFlow backend.

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally

X_train shape: (50000, 32, 32, 3)

50000 train samples

10000 test samples

Using real-time data augmentation.

Epoch 1/200

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 

name: GeForce GTX 1070

major: 6 minor: 1 memoryClockRate (GHz) 1.7845

pciBusID 0000:01:00.0

Total memory: 7.92GiB

Free memory: 7.60GiB

I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 

I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)

50000/50000 [==============================] - 11s - loss: 2.3029 - acc: 0.0999 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 2/200

50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 3/200

50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0992 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 4/200

50000/50000 [==============================] - 10s - loss: 2.3028 - acc: 0.0980 - val_loss: 2.3026 - val_acc: 0.1000

Epoch 5/200

13184/50000 [======>.......................] - ETA: 7s - loss: 2.3026 - acc: 0.1044^CTraceback (most recent call last):

1 Answer:

Answer 0 (score: 0):

It looks to me like the model is just guessing at random: with 10 classes, it will be right about 10% of the time. The only thing I can think of is that your learning rate is a bit too high. I've seen that with a high learning rate a model sometimes converges and sometimes doesn't. As for the backends, I believe Theano performs more optimizations, so that could shift things slightly. Try lowering the learning rate by a factor of 10 and see whether it converges.
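As a minimal sketch of that suggestion, assuming the example compiles with the stock SGD settings of the Keras 1.x cifar10_cnn.py (lr=0.01, momentum=0.9), you would recompile the model with the learning rate divided by 10:

    # Sketch: recompile with a 10x smaller learning rate
    # (assumes the example's stock SGD settings; `model` is the example's model).
    from keras.optimizers import SGD

    sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)  # was lr=0.01
    model.compile(loss='categorical_crossentropy',
                  optimizer=sgd,
                  metrics=['accuracy'])

If it converges with the smaller rate but trains too slowly, you can then raise it gradually until it becomes unstable again.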