The project is https://github.com/zzh8829/yolov3-tf2. I have installed what I believe are all the correct versions.
Google tells me this is probably a low-VRAM issue, but I'm still looking around for other causes. Please help. I am using:
Windows 10 (don't say "there's your problem", I need it)
cuDNN 7.4.6
CUDA 10.0
tensorflow 2.0.0
python 3.6
I have a GTX 1660 Super with 6 GB of VRAM and a Ryzen 7 2700X with 16 GB of RAM. In a few days I'll be getting a GTX 1080 8 GB, which I'll add to the second PCIe slot.
The error is as follows:
2019-11-30 06:31:26.167368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-30 06:31:27.843742: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2019-11-30 06:31:27.853725: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Traceback (most recent call last):
File ".\convert.py", line 34, in <module>
app.run(main)
File "C:\Program Files\Python36\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Program Files\Python36\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File ".\convert.py", line 25, in main
output = yolo(img)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
convert_kwargs_to_constants=base_layer_utils.call_context().saving)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
convert_kwargs_to_constants=base_layer_utils.call_context().saving)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
outputs = self._convolution_op(inputs, self.kernel)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__
return self.conv_op(inp, filter)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__
return self.call(inp, filter)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__
name=self.name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
name=name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d
data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback
ctx=_ctx, name=name)
File "C:\Program Files\Python36\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
Answer 0 (Score: 1)
I ran into the same issue with the same repository.
The fix that worked for me and my team was upgrading cuDNN to 7.5 or later (instead of 7.4).
Instructions for updating are on NVIDIA's website:
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
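Before and after upgrading, it can help to confirm which cuDNN version is actually installed. A minimal sketch that reads the version defines out of cudnn.h, assuming the default CUDA 10.0 install path on Windows (adjust the path to your setup):

import re
from pathlib import Path

# Assumed default cuDNN header location for CUDA 10.0 on Windows;
# point this at wherever your cudnn.h actually lives.
header = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include\cudnn.h")
text = header.read_text(errors="ignore")

# cudnn.h defines CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL as plain integers.
parts = [re.search(r"#define CUDNN_%s\s+(\d+)" % name, text).group(1)
         for name in ("MAJOR", "MINOR", "PATCHLEVEL")]
print("cuDNN version:", ".".join(parts))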
Answer 1 (Score: 0)
This can happen for a few reasons.
(1) As you mentioned, this could be a memory issue. You can try to verify that by allocating less memory to the GPU and seeing whether the error still occurs. You can do that in TF 2.0 like this (https://github.com/tensorflow/tensorflow/issues/25138#issuecomment-484428798):
import tensorflow as tf

# These helpers come from the issue comment linked above; if they are not
# available in your TF build, see the tf.config.experimental sketch below.
tf.config.gpu.set_per_process_memory_fraction(0.75)
tf.config.gpu.set_per_process_memory_growth(True)

# your model creation, etc.
model = MyModel(...)
I see that the code you are running (https://github.com/zzh8829/yolov3-tf2/blob/master/train.py#L46-L47) sets dynamic memory growth if you have more than 1 GPU, but since you only have 1 GPU, it most likely just tries to allocate all of the memory (> 90%).
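If the helpers above are not present in your TF 2.0.0 install, a minimal sketch of the same idea with the tf.config.experimental API, applied to every visible GPU even if there is only one, would look like this:

import tensorflow as tf

# Ask TensorFlow to grow GPU memory on demand instead of grabbing
# (almost) all VRAM up front; this must run before any GPU op executes.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)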
(2) Some users seem to have run into this on Windows when the GPU is also being used by other TensorFlow or similar processes, whether yours or another user's: https://stackoverflow.com/a/53707323/10993413
(3) As always, make sure your PATH variable is correct. Sometimes, if you have attempted several installs and did not clean up properly, PATH may find the wrong version first and cause problems. If you add new paths to the beginning of PATH, they should be found first: https://www.tensorflow.org/install/gpu#windows_setup
(4) As @xenotecc mentioned, you could try upgrading to a newer version of cuDNN, though I'm not sure it will help, since your configuration is listed as supported in the TF docs: https://www.tensorflow.org/install/source#gpu. If it does fix the problem, it may have been a PATH issue after all, since you probably updated PATH when installing the newer version.
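A quick way to check this is to print the CUDA/cuDNN-related PATH entries in search order; this is just a small helper sketch, nothing TensorFlow-specific:

import os

# List PATH entries that look CUDA/cuDNN related, in search order.
# The first directory containing cudnn64_7.dll is the one that gets loaded.
for index, entry in enumerate(os.environ.get("PATH", "").split(os.pathsep)):
    if "cuda" in entry.lower() or "cudnn" in entry.lower():
        print(index, entry)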
Answer 2 (Score: 0)
Ran into the same error and solved it with the following:
import tensorflow as tf

# Cap the first GPU at a fixed memory budget so cuDNN has headroom to initialize.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5000)])
(Using a GTX 1660 with 6 GB of memory and TensorFlow 2.0.1.)
Answer 3 (Score: 0)
Simple fix: insert these lines below the imports in convert.py:
import os
# Hide the GPU from TensorFlow so the conversion runs entirely on the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
This makes TensorFlow ignore your GPU when loading the weights.