Host system
I am trying to run densepose with nvidia-docker on a new RTX 2080 Ti card. Installation was not a problem, but running inference crashes. Running
python2 tools/infer_simple.py \
--cfg configs/DensePose_ResNet101_FPN_s1x-e2e.yaml \
--output-dir DensePoseData/infer_out/ \
--image-ext jpg \
--wts https://s3.amazonaws.com/densepose/DensePose_ResNet101_FPN_s1x-e2e.pkl \
DensePoseData/demo_data/demo_im.jpg
produces
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at conv_op_cudnn.cc:572] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /var/lib/jenkins/workspace/caffe2/operators/conv_op_cudnn.cc:572: CUDNN_STATUS_EXECUTION_FAILED Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
*** Aborted at 1547126675 (unix time) try "date -d @1547126675" if you are using GNU date ***
PC: @ 0x7f6e510e5428 gsignal
*** SIGABRT (@0xd) received by PID 13 (TID 0x7f6db5a4c700) from PID 13; stack trace: ***
@ 0x7f6e5148b390 (unknown)
@ 0x7f6e510e5428 gsignal
@ 0x7f6e510e702a abort
@ 0x7f6e4afb284d __gnu_cxx::__verbose_terminate_handler()
@ 0x7f6e4afb06b6 (unknown)
@ 0x7f6e4afb0701 std::terminate()
@ 0x7f6e4afdbd38 (unknown)
@ 0x7f6e514816ba start_thread
@ 0x7f6e511b741d clone
@ 0x0 (unknown)
Aborted (core dumped)
I managed to get darknet and yolo running with nvidia-docker. However, I had to update the Makefile and add the correct Compute Capability, 7.5.
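For reference, that Makefile change amounts to adding a `-gencode` entry for the new architecture to the nvcc flags. A minimal sketch of how such a flag is derived from a compute capability; the helper name `gencode_flag` is made up for illustration and is not part of darknet or CUDA:

```python
def gencode_flag(compute_capability):
    """Build an nvcc -gencode flag for a compute capability such as "7.5".

    Hypothetical helper for illustration only; not part of darknet or CUDA.
    """
    major, minor = compute_capability.split(".")
    arch = "{}{}".format(int(major), int(minor))
    # compute_XX names the virtual (PTX) architecture, sm_XX the real GPU target.
    return "-gencode arch=compute_{0},code=sm_{0}".format(arch)


if __name__ == "__main__":
    print(gencode_flag("7.5"))  # -gencode arch=compute_75,code=sm_75
```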
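The error seems to be related to cuDNN, but shouldn't the docker image already contain the correct cuDNN? Or does docker use the cuDNN on the host? The first line of the Dockerfile is: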
FROM caffe2/caffe2:snapshot-py2-cuda9.0-cudnn7-ubuntu16.04
To me it looks like the image comes with cuDNN, so it should not matter which version I use?
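One way to sanity-check this, as a rough sketch only: the image tag itself names the CUDA and cuDNN versions it bundles, and that CUDA version can be compared with the first toolkit release that supports the card. The lookup table is an assumption filled in from NVIDIA's release notes as far as I can tell (Turing, compute capability 7.5, first supported in CUDA 10.0), and the function names are made up for illustration.

```python
import re

# Earliest CUDA toolkit release with native support for each compute
# capability (assumption: based on NVIDIA's release notes, not read from the image).
MIN_CUDA_FOR_CAPABILITY = {
    "6.1": (8, 0),   # Pascal
    "7.0": (9, 0),   # Volta
    "7.5": (10, 0),  # Turing, e.g. RTX 2080 Ti
}


def parse_image_tag(tag):
    """Extract ((cuda_major, cuda_minor), cudnn_major) from a tag such as
    'snapshot-py2-cuda9.0-cudnn7-ubuntu16.04'."""
    cuda = re.search(r"cuda(\d+)\.(\d+)", tag)
    cudnn = re.search(r"cudnn(\d+)", tag)
    if cuda is None or cudnn is None:
        raise ValueError("tag does not name CUDA/cuDNN versions: %r" % tag)
    return (int(cuda.group(1)), int(cuda.group(2))), int(cudnn.group(1))


def image_supports(tag, capability):
    """Return True if the CUDA version named in the tag is new enough
    for the given compute capability, according to the table above."""
    cuda_version, _ = parse_image_tag(tag)
    return cuda_version >= MIN_CUDA_FOR_CAPABILITY[capability]


if __name__ == "__main__":
    tag = "snapshot-py2-cuda9.0-cudnn7-ubuntu16.04"
    print(parse_image_tag(tag))        # ((9, 0), 7)
    print(image_supports(tag, "7.5"))  # False
```

For the tag above this reports CUDA 9.0 with cuDNN 7 and returns False for 7.5, which would point at the CUDA toolkit inside the image rather than at the host's cuDNN. I have not confirmed that this is what causes the crash.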
Has anyone experienced something similar with this card?