DensePose with an RTX 2080 Ti throws 'caffe2::EnforceNotMet'

Time: 2019-01-10 13:41:07

Tags: nvidia-docker

Host system

  • Ubuntu 18.04
  • RTX 2080 Ti
  • CUDA 10
  • cuDNN 7.4.4 (for CUDA 10.0)
  • NVIDIA driver 410.93 (released January 3, 2019)

I am trying to run DensePose in nvidia-docker on a new RTX 2080 Ti card. Installation was not a problem, but running inference crashes. Running

python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet101_FPN_s1x-e2e.yaml \
    --output-dir DensePoseData/infer_out/ \
    --image-ext jpg \
    --wts https://s3.amazonaws.com/densepose/DensePose_ResNet101_FPN_s1x-e2e.pkl \
    DensePoseData/demo_data/demo_im.jpg

produces

terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what():  [enforce fail at conv_op_cudnn.cc:572] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /var/lib/jenkins/workspace/caffe2/operators/conv_op_cudnn.cc:572: CUDNN_STATUS_EXECUTION_FAILED Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
*** Aborted at 1547126675 (unix time) try "date -d @1547126675" if you are using GNU date ***
PC: @ 0x7f6e510e5428 gsignal
*** SIGABRT (@0xd) received by PID 13 (TID 0x7f6db5a4c700) from PID 13; stack trace: ***
    @ 0x7f6e5148b390 (unknown)
    @ 0x7f6e510e5428 gsignal
    @ 0x7f6e510e702a abort
    @ 0x7f6e4afb284d __gnu_cxx::__verbose_terminate_handler()
    @ 0x7f6e4afb06b6 (unknown)
    @ 0x7f6e4afb0701 std::terminate()
    @ 0x7f6e4afdbd38 (unknown)
    @ 0x7f6e514816ba start_thread
    @ 0x7f6e511b741d clone
    @ 0x0 (unknown)
Aborted (core dumped)
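For readers puzzled by the "8 vs 0" in the message: the enforce check compares the returned cuDNN status with CUDNN_STATUS_SUCCESS. The sketch below decodes that pair, assuming the numeric values of the cudnnStatus_t enum from the cuDNN 7.x header. The helper name decode_enforce_failure is made up for illustration and is not part of Caffe2 or cuDNN.

```python
import re

# Numeric values of cudnnStatus_t, assumed from the cuDNN 7.x cudnn.h header.
CUDNN_STATUS_NAMES = {
    0: "CUDNN_STATUS_SUCCESS",
    1: "CUDNN_STATUS_NOT_INITIALIZED",
    2: "CUDNN_STATUS_ALLOC_FAILED",
    3: "CUDNN_STATUS_BAD_PARAM",
    4: "CUDNN_STATUS_INTERNAL_ERROR",
    5: "CUDNN_STATUS_INVALID_VALUE",
    6: "CUDNN_STATUS_ARCH_MISMATCH",
    7: "CUDNN_STATUS_MAPPING_ERROR",
    8: "CUDNN_STATUS_EXECUTION_FAILED",
    9: "CUDNN_STATUS_NOT_SUPPORTED",
}


def decode_enforce_failure(message):
    """Hypothetical helper: map the 'A vs B' pair to (actual, expected) names.

    Returns None when the message contains no such pair.
    """
    match = re.search(r"(\d+) vs (\d+)", message)
    if match is None:
        return None
    actual, expected = (int(group) for group in match.groups())
    return (
        CUDNN_STATUS_NAMES.get(actual, "unknown status %d" % actual),
        CUDNN_STATUS_NAMES.get(expected, "unknown status %d" % expected),
    )


# Excerpt copied from the error output above.
log_line = (
    "[enforce fail at conv_op_cudnn.cc:572] "
    "status == CUDNN_STATUS_SUCCESS. 8 vs 0. "
)
print(decode_enforce_failure(log_line))
```

The decoded name agrees with the CUDNN_STATUS_EXECUTION_FAILED string that the log itself prints, so the failure comes from the very first convolution (gpu_0/conv1) executing on the GPU, not from argument validation.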

I managed to get darknet and YOLO running with nvidia-docker. However, I had to update the Makefile and add the correct Compute Capability, 7.5.
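As a rough illustration of that Makefile change, the sketch below builds the nvcc -gencode flag that corresponds to a given compute capability. The helper name gencode_flag is hypothetical, and the post does not show which Makefile variable received the flag, so treat the placement as an assumption.

```python
def gencode_flag(compute_capability):
    """Hypothetical helper: turn a capability such as '7.5' into an nvcc flag."""
    major, minor = compute_capability.split(".")
    arch = major + minor  # '7.5' becomes '75'
    return "-gencode arch=compute_{0},code=sm_{0}".format(arch)


# Compute Capability 7.5 is the value mentioned above for the RTX 2080 Ti.
print(gencode_flag("7.5"))  # -gencode arch=compute_75,code=sm_75
```

The toolkit doing the compiling has to know the sm_75 target, which as far as I know means CUDA 10.0 or newer.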

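The error seems to be related to cuDNN, but shouldn't the Docker image already contain the correct cuDNN? Or does Docker use the cuDNN installed on the host? The first line of the Dockerfile is: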

FROM caffe2/caffe2:snapshot-py2-cuda9.0-cudnn7-ubuntu16.04

To me it looks as though the image ships its own cuDNN, so the version I have installed on the host should not matter?
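One way to make the question concrete is to read the versions out of the base image tag and compare them with the host setup listed at the top. This is only a sketch: it assumes the tag follows a cudaX.Y / cudnnN naming pattern and that the tag matches what is actually installed in the image. The helper names bundled_versions and major are made up for illustration.

```python
import re


def bundled_versions(image_ref):
    """Hypothetical helper: extract CUDA and cuDNN versions from an image tag."""
    tag = image_ref.split(":", 1)[1]
    cuda = re.search(r"cuda(\d+(?:\.\d+)*)", tag)
    cudnn = re.search(r"cudnn(\d+(?:\.\d+)*)", tag)
    return {
        "cuda": cuda.group(1) if cuda else None,
        "cudnn": cudnn.group(1) if cudnn else None,
    }


def major(version):
    """Return the major component of a dotted version string as an int."""
    return int(version.split(".")[0])


# Base image from the FROM line of the Dockerfile quoted above.
image = "caffe2/caffe2:snapshot-py2-cuda9.0-cudnn7-ubuntu16.04"
versions = bundled_versions(image)
print(versions)

# Host CUDA version as listed under "Host system".
host_cuda = "10"
print("same CUDA major version:", major(versions["cuda"]) == major(host_cuda))
```

Going by the tag, the image carries CUDA 9.0 with cuDNN 7, while the host has CUDA 10. With nvidia-docker the container uses the host's driver, but the CUDA toolkit and cuDNN libraries come from the image, so the host's cuDNN version would not be the one in play here.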

Has anyone experienced something similar with this card?

0 answers:

No answers yet