我不会在GPU(Nvidia Quadro P5000)上训练LSTM网络,因此已安装tensorflow-gpu 1.13
。
网络:
model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=(SEQ_LEN, FEATURES))) # , return_sequences=True
model.add(Dropout(0.3))
model.add(LSTM(256))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
tensorboard = TensorBoard(log_dir='logs/{}'.format(NAME))
filepath = NAME + "_{epoch:02d}-{val_acc:.3f}"
checkpoint = ModelCheckpoint("models/{}.model".format(filepath, monitor='val_acc', verbose=1, save_best_only=True,
mode='max')) # saves only the best ones
history = model.fit(
train_x, train_y,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_data=(validation_x, validation_y),
callbacks=[tensorboard, checkpoint],
verbose=2)
当我第一次运行代码时,我得到以下输出:
WARNING:tensorflow:From C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\layers\core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 200, 256) 269312
_________________________________________________________________
dropout (Dropout) (None, 200, 256) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 256) 525312
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense (Dense) (None, 128) 32896
_________________________________________________________________
dense_1 (Dense) (None, 1) 129
=================================================================
Total params: 827,649
Trainable params: 827,649
Non-trainable params: 0
_________________________________________________________________
Train on 135 samples, validate on 15 samples
WARNING:tensorflow:From C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-06-22 13:32:38.312006: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-06-22 13:32:38.661805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Quadro P5000 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:18:00.0
totalMemory: 16.00GiB freeMemory: 13.40GiB
2019-06-22 13:32:38.662434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-22 13:32:39.314138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-22 13:32:39.314871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-22 13:32:39.315399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-22 13:32:39.316240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 12964 MB memory) -> physical GPU (device: 0, name: Quadro P5000, pci bus id: 0000:18:00.0, compute capability: 6.1)
Epoch 1/1000
2019-06-22 13:32:54.748055: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-06-22 13:32:55.055557: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.056085: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.056398: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.056787: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.057127: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.057525: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.057834: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.058159: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.065639: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.065970: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.066302: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.066660: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.067030: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.067353: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.067684: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.067977: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.072992: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.073554: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.075976: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.076362: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.076852: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.078560: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.078807: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.079266: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.080071: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.080366: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.080835: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.081167: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.081488: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.081843: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.082937: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.083244: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.083599: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.083848: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.085391: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.085731: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.086090: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.086451: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.087272: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.087620: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.087862: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.088335: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.088958: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.089280: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.089578: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.089914: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.090225: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.090537: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.090909: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.091175: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.098724: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-22 13:32:55.099470: W tensorflow/stream_executor/stream.cc:2130] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "<input>", line 108, in <module>
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\engine\training.py", line 880, in fit
validation_steps=validation_steps)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 256), b.shape=(256, 256), m=1, n=256, k=256
[[{{node lstm/while/MatMul_4}}]]
[[{{node loss/mul}}]]
我发现可以使用以下方法解决此问题:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
在此代码之后运行newtwok将产生以下结果:
Traceback (most recent call last):
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<input>", line 20, in <module>
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\engine\training.py", line 880, in fit
validation_steps=validation_steps)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 215, in model_iteration
mode=mode)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\callbacks.py", line 106, in configure_callbacks
callback_list.set_model(callback_model)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\callbacks.py", line 178, in set_model
callback.set_model(model)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\callbacks.py", line 1010, in set_model
self._init_writer()
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\callbacks.py", line 947, in _init_writer
self.writer = tf_summary.FileWriter(self.log_dir, K.get_session().graph)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\backend.py", line 482, in get_session
_initialize_variables(session)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\keras\backend.py", line 758, in _initialize_variables
[variables_module.is_variable_initialized(v) for v in candidate_vars])
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "C:\Users\eiLink\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
当我使用tf.device('/ cpu:0')在cpu上运行代码时,代码没有给出任何错误