I have a Docker image that uses PyTorch to perform object detection. The container runs fine locally and on Google Colab, but when it runs on Kubernetes (via Airflow), it raises the following error:
[2021-06-23 07:13:17,592] {pod_launcher.py:148} INFO - Traceback (most recent call last):
[2021-06-23 07:13:17,592] {pod_launcher.py:148} INFO - File "/content/main.py", line 5, in <module>
[2021-06-23 07:13:17,594] {pod_launcher.py:148} INFO - app()
[2021-06-23 07:13:17,594] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/typer/main.py", line 214, in __call__
[2021-06-23 07:13:17,594] {pod_launcher.py:148} INFO - return get_command(self)(*args, **kwargs)
[2021-06-23 07:13:17,594] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
[2021-06-23 07:13:17,594] {pod_launcher.py:148} INFO - return self.main(*args, **kwargs)
[2021-06-23 07:13:17,594] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
[2021-06-23 07:13:17,595] {pod_launcher.py:148} INFO - rv = self.invoke(ctx)
[2021-06-23 07:13:17,595] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
[2021-06-23 07:13:17,595] {pod_launcher.py:148} INFO - return _process_result(sub_ctx.command.invoke(sub_ctx))
[2021-06-23 07:13:17,595] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
[2021-06-23 07:13:17,595] {pod_launcher.py:148} INFO - return ctx.invoke(self.callback, **ctx.params)
[2021-06-23 07:13:17,595] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
[2021-06-23 07:13:17,595] {pod_launcher.py:148} INFO - return callback(*args, **kwargs)
[2021-06-23 07:13:17,596] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/typer/main.py", line 497, in wrapper
[2021-06-23 07:13:17,596] {pod_launcher.py:148} INFO - return callback(**use_params) # type: ignore
[2021-06-23 07:13:17,596] {pod_launcher.py:148} INFO - File "/content/app/__init__.py", line 52, in detect_from_file
[2021-06-23 07:13:17,597] {pod_launcher.py:148} INFO - coco_path=coco_path,
[2021-06-23 07:13:17,597] {pod_launcher.py:148} INFO - File "/content/app/__init__.py", line 126, in _detect_from_file
[2021-06-23 07:13:17,597] {pod_launcher.py:148} INFO - tables = infer_page(page_filename, model)
[2021-06-23 07:13:17,597] {pod_launcher.py:148} INFO - File "/content/app/utils.py", line 9, in infer_page
[2021-06-23 07:13:17,597] {pod_launcher.py:148} INFO - result = inference_detector(model, str(img))
[2021-06-23 07:13:17,597] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/apis/inference.py", line 86, in inference_detector
[2021-06-23 07:13:17,598] {pod_launcher.py:148} INFO - result = model(return_loss=False, rescale=True, **data)
[2021-06-23 07:13:17,598] {pod_launcher.py:148} INFO - File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
[2021-06-23 07:13:17,598] {pod_launcher.py:148} INFO - result = self.forward(*input, **kwargs)
[2021-06-23 07:13:17,598] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
[2021-06-23 07:13:17,598] {pod_launcher.py:148} INFO - return old_func(*args, **kwargs)
[2021-06-23 07:13:17,599] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/models/detectors/base.py", line 149, in forward
[2021-06-23 07:13:17,599] {pod_launcher.py:148} INFO - return self.forward_test(img, img_metas, **kwargs)
[2021-06-23 07:13:17,599] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/models/detectors/base.py", line 130, in forward_test
[2021-06-23 07:13:17,599] {pod_launcher.py:148} INFO - return self.simple_test(imgs[0], img_metas[0], **kwargs)
[2021-06-23 07:13:17,599] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 324, in simple_test
[2021-06-23 07:13:17,600] {pod_launcher.py:148} INFO - self.test_cfg.rpn) if proposals is None else proposals
[2021-06-23 07:13:17,600] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/models/detectors/test_mixins.py", line 34, in simple_test_rpn
[2021-06-23 07:13:17,600] {pod_launcher.py:148} INFO - proposal_list = self.rpn_head.get_bboxes(*proposal_inputs)
[2021-06-23 07:13:17,600] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/core/fp16/decorators.py", line 127, in new_func
[2021-06-23 07:13:17,600] {pod_launcher.py:148} INFO - return old_func(*args, **kwargs)
[2021-06-23 07:13:17,600] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/models/anchor_heads/anchor_head.py", line 276, in get_bboxes
[2021-06-23 07:13:17,600] {pod_launcher.py:148} INFO - scale_factor, cfg, rescale)
[2021-06-23 07:13:17,601] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/models/anchor_heads/rpn_head.py", line 92, in get_bboxes_single
[2021-06-23 07:13:17,601] {pod_launcher.py:148} INFO - proposals, _ = nms(proposals, cfg.nms_thr)
[2021-06-23 07:13:17,601] {pod_launcher.py:148} INFO - File "/content/mmdetection/mmdet/ops/nms/nms_wrapper.py", line 54, in nms
[2021-06-23 07:13:17,601] {pod_launcher.py:148} INFO - inds = nms_cuda.nms(dets_th, iou_thr)
[2021-06-23 07:13:17,602] {pod_launcher.py:148} INFO - RuntimeError: CUDA error: no kernel image is available for execution on the device (launch_kernel at /pytorch/aten/src/ATen/native/cuda/Loops.cuh:103)
[2021-06-23 07:13:17,602] {pod_launcher.py:148} INFO - frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7faf44434193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
[2021-06-23 07:13:17,602] {pod_launcher.py:148} INFO - frame #1: void at::native::gpu_index_kernel<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>)), 1u>> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>)), 1u>> const&) + 0x7bb (0x7faefc45e87b in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,602] {pod_launcher.py:148} INFO - frame #2: <unknown function> + 0x580fc32 (0x7faefc458c32 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,602] {pod_launcher.py:148} INFO - frame #3: <unknown function> + 0x580ff88 (0x7faefc458f88 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,602] {pod_launcher.py:148} INFO - frame #4: <unknown function> + 0x1a7493b (0x7faef86bd93b in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,603] {pod_launcher.py:148} INFO - frame #5: at::native::index(at::Tensor const&, c10::ArrayRef<at::Tensor>) + 0x47e (0x7faef86b96fe in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,603] {pod_launcher.py:148} INFO - frame #6: <unknown function> + 0x1fe06aa (0x7faef8c296aa in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,603] {pod_launcher.py:148} INFO - frame #7: <unknown function> + 0x1fe5173 (0x7faef8c2e173 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,603] {pod_launcher.py:148} INFO - frame #8: <unknown function> + 0x3bffe6a (0x7faefa848e6a in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #9: <unknown function> + 0x1fe5173 (0x7faef8c2e173 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #10: at::Tensor c10::KernelFunction::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(at::Tensor const&, c10::ArrayRef<at::Tensor>) const + 0xa3 (0x7faef3b7ec73 in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #11: c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)#1}::operator()(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > const&) const + 0xc9 (0x7faef3b7c331 in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #12: std::result_of<c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)#1} (ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)>::type c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > >::read<c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)#1}>(c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, 
std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > const&)#1}&&) const + 0x128 (0x7faef3b7eed2 in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #13: at::Tensor c10::Dispatcher::doCallUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::DispatchTable const&, c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::KernelFunction> > > > const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const + 0x6a (0x7faef3b7c3ba in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #14: c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(c10::DispatchTable const&)#1}::operator()(c10::DispatchTable const&) const + 0x80 (0x7faef3b7936a in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #15: std::result_of<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(c10::DispatchTable const&)#1} (c10::DispatchTable const&)>::type c10::LeftRight<c10::DispatchTable>::read<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(c10::DispatchTable const&)#1}>(c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(c10::DispatchTable const&)#1}&&) const + 0x128 (0x7faef3b7f056 in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #16: c10::guts::infer_function_traits<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(c10::DispatchTable const&)#1}>::type::return_type c10::impl::OperatorEntry::readDispatchTable<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(c10::DispatchTable const&)#1}>(c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const::{lambda(c10::DispatchTable const&)#1}&&) const + 0x4a (0x7faef3b7c42c in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,604] {pod_launcher.py:148} INFO - frame #17: at::Tensor c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::ArrayRef<at::Tensor> >(c10::OperatorHandle const&, at::Tensor const&, c10::ArrayRef<at::Tensor>) const + 0x7c (0x7faef3b7941a in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,605] {pod_launcher.py:148} INFO - frame #18: at::Tensor::index(c10::ArrayRef<at::Tensor>) const + 0x16f (0x7faef3b74dad in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,605] {pod_launcher.py:148} INFO - frame #19: nms_cuda(at::Tensor, float) + 0x84f (0x7faef3b71a0c in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,605] {pod_launcher.py:148} INFO - frame #20: nms(at::Tensor const&, float) + 0xee (0x7faef3b6087e in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,605] {pod_launcher.py:148} INFO - frame #21: <unknown function> + 0x335ab (0x7faef3b6f5ab in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
[2021-06-23 07:13:17,605] {pod_launcher.py:148} INFO - frame #22: <unknown function> + 0x302b0 (0x7faef3b6c2b0 in /content/mmdetection/mmdet/ops/nms/nms_cuda.cpython-36m-x86_64-linux-gnu.so)
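Below is the output of nvidia-smi and of another mmdetection utility (collect_env.py), run on Kubernetes and on Google Colab (where the code runs fine). The Kubernetes output comes first, followed by the Google Colab output.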
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000001:00:00.0 Off | 0 |
| N/A 23C P0 26W / 250W | 0MiB / 16280MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
python3 /content/mmdetection/mmdet/utils/collect_env.py
sys.platform: linux
Python: 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0: Tesla P100-PCIE-16GB
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0+cu100
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.5.0+cu100
OpenCV: 4.5.2
MMCV: 0.4.3
MMDetection: 1.2.0+unknown
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 10.0
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 37C P0 25W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
python3 /content/mmdetection/mmdet/utils/collect_env.py
sys.platform: linux
Python: 3.7.10 (default, May 3 2021, 02:48:31) [GCC 7.5.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GPU 0: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0+cu100
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.5.0+cu100
OpenCV: 4.5.2
MMCV: 0.4.3
MMDetection: 1.2.0+0f33c08
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 11.0
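To make the differences between the two environments easier to see, here is a minimal sketch that diffs a few fields from the two outputs above. The values are copied by hand from the output (the NVCC entries are abbreviated to the release number), and the function name diff_env is hypothetical, not part of mmdetection.

```python
# Selected fields copied from the nvidia-smi / collect_env.py output above.
# NVCC values are abbreviated to the CUDA release number.
KUBERNETES = {
    "Driver Version": "450.51.06",
    "GPU 0": "Tesla P100-PCIE-16GB",
    "Python": "3.6.9",
    "NVCC": "10.0",
    "PyTorch": "1.4.0+cu100",
    "TorchVision": "0.5.0+cu100",
    "MMCV": "0.4.3",
    "MMDetection": "1.2.0+unknown",
    "MMDetection CUDA Compiler": "10.0",
}

COLAB = {
    "Driver Version": "460.32.03",
    "GPU 0": "Tesla V100-SXM2-16GB",
    "Python": "3.7.10",
    "NVCC": "11.0",
    "PyTorch": "1.4.0+cu100",
    "TorchVision": "0.5.0+cu100",
    "MMCV": "0.4.3",
    "MMDetection": "1.2.0+0f33c08",
    "MMDetection CUDA Compiler": "11.0",
}


def diff_env(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for field, (k8s, colab) in diff_env(KUBERNETES, COLAB).items():
        print(f"{field}: Kubernetes={k8s} | Colab={colab}")
```

PyTorch, TorchVision and MMCV are identical in both environments. Apart from the driver, GPU and Python version, the fields that differ are the NVCC release and the CUDA compiler used for the MMDetection ops (10.0 on Kubernetes, 11.0 on Colab), plus the MMDetection build tag.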
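Note: the Google Colab output posted here was produced on a Tesla V100 GPU, but sometimes I am allocated a Tesla P100 (the same GPU used on Kubernetes), and the code runs smoothly in both cases on Google Colab. The error is only raised when running on Kubernetes.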
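As a sanity check on the PyTorch build itself, here is a small sketch that parses the "NVCC architecture flags" line from the collect_env.py output and checks whether it contains a binary for each GPU. The helper names (compiled_sm_archs, has_kernel_image) are hypothetical, and the compute capabilities (6.0 for the P100, 7.0 for the V100) are assumed from NVIDIA's published specifications; they do not appear in the logs. The check only looks at sm_XY entries and ignores PTX entries (code=compute_XY). It also says nothing about which architectures the mmdetection nms_cuda extension was compiled for, which the output above does not show.

```python
import re

# "NVCC architecture flags" line copied from the collect_env.py output
# (identical on Kubernetes and on Google Colab).
NVCC_FLAGS = (
    "-gencode;arch=compute_37,code=sm_37;"
    "-gencode;arch=compute_50,code=sm_50;"
    "-gencode;arch=compute_60,code=sm_60;"
    "-gencode;arch=compute_61,code=sm_61;"
    "-gencode;arch=compute_70,code=sm_70;"
    "-gencode;arch=compute_75,code=sm_75;"
    "-gencode;arch=compute_37,code=compute_37"
)

# Assumption: compute capabilities taken from NVIDIA's specs, not from the logs.
GPU_CAPABILITY = {
    "Tesla P100": (6, 0),
    "Tesla V100": (7, 0),
}


def compiled_sm_archs(flags):
    """Return the (major, minor) pairs that have a binary sm_XY entry in flags.

    PTX entries such as code=compute_37 are deliberately not matched.
    """
    found = re.findall(r"code=sm_(\d+)", flags)
    # The last digit is the minor version, everything before it is the major.
    return {(int(arch[:-1]), int(arch[-1])) for arch in found}


def has_kernel_image(capability, flags):
    """True if flags contain a binary built for exactly this compute capability."""
    return capability in compiled_sm_archs(flags)


if __name__ == "__main__":
    for name, capability in GPU_CAPABILITY.items():
        print(name, capability, has_kernel_image(capability, NVCC_FLAGS))
```

Both GPUs pass this check, so the PyTorch wheel itself appears to include kernels for the P100 as well as the V100.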
Any help is appreciated.