My main program segfaults when run with Keras and TensorFlow
python fine-tune.py --train_dir /root/data/train_dump/ --val_dir /root/data/val_dump/ --nb_epoch 100 --batch_size 32
Using TensorFlow backend.
WARNING:root:Keras version 2.2.2 detected. Last version known to be fully compatible of Keras is 2.1.3 .
WARNING:root:TensorFlow version 1.10.0 detected. Last version known to be fully compatible is 1.5.0 .
Found 2976 images belonging to 4 classes.
Found 1100 images belonging to 4 classes.
2018-09-06 23:28:06.891710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-06 23:28:08.030896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-06 23:28:08.030988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-06 23:28:08.688955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 23:28:08.689049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-06 23:28:08.689071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-06 23:28:08.689614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10974 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
Epoch 1/100
2018-09-06 23:29:31.853354: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-06 23:29:31.853522: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 390.30.0
Segmentation fault (core dumped)
TensorFlow itself seems fine, judging by this tf.Session() test:
In [2]: import tensorflow as tf
   ...:
   ...: with tf.device('/gpu:0'):
   ...:     a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
   ...:     b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
   ...:     c = tf.matmul(a, b)
   ...: with tf.Session() as sess:
   ...:     print(sess.run(c))
   ...:
2018-09-06 23:37:31.603090: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-06 23:37:32.765804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-06 23:37:32.765866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-06 23:37:33.358184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 23:37:33.358276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-06 23:37:33.358295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-06 23:37:33.358793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10974 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
[[22. 28.]
 [49. 64.]]
Thinking the version warnings might be related, I also tried downgrading, but the run still crashes:
Installing collected packages: keras
Found existing installation: Keras 2.2.2
Uninstalling Keras-2.2.2:
Successfully uninstalled Keras-2.2.2
Successfully installed keras-2.1.3
(fastai) root@607b0f29-ad6b-482c-aead-aeae0a84fe2f:~# python fine-tune.py --train_dir /root/data/train_dump/ --val_dir /root/data/val_dump/ --nb_epoch 100 --batch_size 32
Using TensorFlow backend.
Found 2976 images belonging to 4 classes.
Found 1100 images belonging to 4 classes.
2018-09-07 00:30:10.722085: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-07 00:30:11.880320: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-07 00:30:11.880388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
Epoch 1/100
2018-09-07 00:30:51.858613: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-07 00:30:51.858965: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 Wed Jan 31 22:08:49 PST 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
"""
2018-09-07 00:30:51.859055: E tensorflow/stream_executor/cuda/cuda_dnn.cc:393] possibly insufficient driver version: 390.30.0
2018-09-07 00:30:51.859099: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)
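
Both runs end with "possibly insufficient driver version: 390.30.0", so comparing the reported driver against the minimum the installed CUDA toolkit needs seems like a reasonable next check. Below is a minimal sketch that pulls the version out of the NVRM line from the log above and compares it to a threshold. The helper names and `ASSUMED_MINIMUM` are mine and purely illustrative. The threshold is a placeholder, not something the logs state, and the real value would have to come from NVIDIA's release notes for the CUDA version in use.

```python
import re

# Copied from the "driver version file contents" log line above.
NVRM_TEXT = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 "
    "Wed Jan 31 22:08:49 PST 2018"
)


def parse_driver_version(text):
    """Return the kernel-module version as a tuple of ints, or None if absent."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_at_least(text, minimum):
    """True if the driver version found in `text` is >= `minimum` (tuple of ints)."""
    version = parse_driver_version(text)
    return version is not None and version >= minimum


# Placeholder threshold (assumption): replace with the minimum driver version
# listed in NVIDIA's release notes for the CUDA toolkit actually installed.
ASSUMED_MINIMUM = (384, 81)

print(parse_driver_version(NVRM_TEXT))  # (390, 30)
print(driver_at_least(NVRM_TEXT, ASSUMED_MINIMUM))
```

If the driver turns out to satisfy the real minimum, the "insufficient driver" hint is probably a red herring and the cuDNN handle failure would need another explanation.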