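I have a Docker image based on the pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel image. It builds successfully on macOS Catalina but fails to build on macOS Big Sur. The build fails when a Python script runs import encoding (see the stack trace below).
The problem seems to be related to a Python deep learning library that wraps a C++ implementation.
I would rather not fix the C++ implementation by hand and would prefer a more general solution.
I tried running the project without Docker (on the Big Sur machine) and it worked. I also tried building the image on another machine running macOS Catalina, and it still built successfully.
I really need to build it in Docker on Big Sur. Can anyone help?
Details: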
pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
Dockerfile:
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    curl \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/* && apt-get clean
ENV PATH="/miniconda/bin:$PATH"
RUN curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh >> miniconda.sh \
    && bash ./miniconda.sh -b -p /miniconda; rm ./miniconda.sh
WORKDIR /opt/app
COPY conda.yaml ./
RUN apt-get -y update && apt-get install -y build-essential cmake \
    && conda env update --prefix /miniconda --file conda.yaml \
    && conda clean -tipsy \
    && rm -rf /var/lib/apt/lists/* && apt-get clean \
    && rm -rf ~/.cache/pip
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/* && apt-get clean
COPY app/preload_model.py ./
RUN python preload_model.py
preload_model.py:
import encoding  # fails here
encoding.models.get_model('DeepLab_ResNeSt200_ADE', pretrained=True)
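The trace further down shows the failure happening while `import encoding` JIT-compiles its C++ extension with `ninja`, with several `cc1plus` processes killed in the same step. If the cause is the Docker VM running out of memory (an assumption, not something the log proves), one low-effort experiment is to cap the number of parallel compile jobs before the import. The sketch below relies on the `MAX_JOBS` environment variable that `torch.utils.cpp_extension` reads when it invokes ninja; the helper name `limit_extension_build_jobs` is hypothetical.

```python
import os


def limit_extension_build_jobs(max_jobs: int = 1) -> str:
    """Cap parallel compile jobs for PyTorch JIT-built C++ extensions.

    Hypothetical helper: it only sets MAX_JOBS, which torch.utils.cpp_extension
    is assumed to forward to ninja as the job limit.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    os.environ["MAX_JOBS"] = str(max_jobs)
    return os.environ["MAX_JOBS"]


# Must run before `import encoding`, because the extension is compiled at import time.
limit_extension_build_jobs(1)

# Inside the image, the original two lines would follow unchanged:
# import encoding
# encoding.models.get_model('DeepLab_ResNeSt200_ADE', pretrained=True)
```

An equivalent experiment would be to put `ENV MAX_JOBS=1` in the Dockerfile before the `RUN python preload_model.py` step, or to raise the memory available to Docker Desktop's VM. Whether either is enough on this machine is untested here.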
conda.yaml:
channels:
  - defaults
  - pytorch
dependencies:
  - python=3.8
  - pytorch=1.6.0
  - cudatoolkit=10.1
  - scipy=1.5.2
  - Flask=1.1.2
  - gunicorn=20.0.4
  - torchvision=0.7.0
  - Pillow=7.2.0
  - requests=2.24.0
  - numpy=1.19.1
  - ca-certificates
  - certifi
  - pip=20.2.2
  - pip:
    - brotlipy==0.7.0
    - chardet==3.0.4
    - click==7.1.2
    - future==0.18.2
    - itsdangerous==1.1.0
    - Jinja2==2.11.2
    - nose==1.3.7
    - opencv-python==4.4.0.44
    - portalocker==2.0.0
    - six==1.15.0
    - torch-encoding==1.2.1
    - tqdm==4.50.0
    - Werkzeug==1.0.1
    - imagehash==4.2.0
    - Flask-Caching==1.9.0
Error:
> [10/17] RUN python preload_model.py:
#14 1.680 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#14 69.42 Traceback (most recent call last):
#14 69.42 File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _run_ninja_build
#14 69.42 subprocess.run(
#14 69.42 File "/miniconda/lib/python3.8/subprocess.py", line 516, in run
#14 69.42 raise CalledProcessError(retcode, process.args,
#14 69.42 subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
#14 69.42
#14 69.42 During handling of the above exception, another exception occurred:
#14 69.42
#14 69.42 Traceback (most recent call last):
#14 69.42 File "preload_model.py", line 1, in <module>
#14 69.42 import encoding
#14 69.42 File "/miniconda/lib/python3.8/site-packages/encoding/__init__.py", line 13, in <module>
#14 69.42 from . import nn, functions, parallel, utils, models, datasets, transforms
#14 69.42 File "/miniconda/lib/python3.8/site-packages/encoding/nn/__init__.py", line 12, in <module>
#14 69.42 from .encoding import *
#14 69.42 File "/miniconda/lib/python3.8/site-packages/encoding/nn/encoding.py", line 18, in <module>
#14 69.42 from ..functions import scaled_l2, aggregate, pairwise_cosine
#14 69.42 File "/miniconda/lib/python3.8/site-packages/encoding/functions/__init__.py", line 2, in <module>
#14 69.42 from .encoding import *
#14 69.42 File "/miniconda/lib/python3.8/site-packages/encoding/functions/encoding.py", line 14, in <module>
#14 69.42 from .. import lib
#14 69.42 File "/miniconda/lib/python3.8/site-packages/encoding/lib/__init__.py", line 9, in <module>
#14 69.42 cpu = load('enclib_cpu', [
#14 69.42 File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 963, in load
#14 69.42 return _jit_compile(
#14 69.42 File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1170, in _jit_compile
#14 69.42 _write_ninja_file_and_build_library(
#14 69.42 File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1276, in _write_ninja_file_and_build_library
#14 69.42 _run_ninja_build(
#14 69.42 File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1529, in _run_ninja_build
#14 69.42 raise RuntimeError(message)
#14 69.42 RuntimeError: Error building extension 'enclib_cpu': [1/7] c++ -MMD -MF encoding_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/encoding_cpu.cpp -o encoding_cpu.o
#14 69.42 [2/7] c++ -MMD -MF rectify_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/rectify_cpu.cpp -o rectify_cpu.o
#14 69.42 FAILED: rectify_cpu.o
#14 69.42 c++ -MMD -MF rectify_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/rectify_cpu.cpp -o rectify_cpu.o
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
#14 69.42 Please submit a full bug report,
#14 69.42 with preprocessed source if appropriate.
#14 69.42 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#14 69.42 [3/7] c++ -MMD -MF nms_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/nms_cpu.cpp -o nms_cpu.o
#14 69.42 FAILED: nms_cpu.o
#14 69.42 c++ -MMD -MF nms_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/nms_cpu.cpp -o nms_cpu.o
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
#14 69.42 Please submit a full bug report,
#14 69.42 with preprocessed source if appropriate.
#14 69.42 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#14 69.42 [4/7] c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o
#14 69.42 FAILED: syncbn_cpu.o
#14 69.42 c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
#14 69.42 Please submit a full bug report,
#14 69.42 with preprocessed source if appropriate.
#14 69.42 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#14 69.42 [5/7] c++ -MMD -MF roi_align_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp -o roi_align_cpu.o
#14 69.42 In file included from /miniconda/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9:0,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/
.....
roi_align_cpu.cpp:1:
#14 69.42 /miniconda/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:354:7: note: declared here
#14 69.42 T * data() const {
#14 69.42 ^~~~
#14 69.42 In file included from /miniconda/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9:0,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp:1:
#14 69.42 /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp:472:34: warning: ‘T* at::Tensor::data() const [with T = float]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
#14 69.42 bottom_rois.data<scalar_t>(),
#14 69.42 ^
#14 69.42 In file included from /miniconda/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3:0,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
#14 69.42 from /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp:1:
#14 69.42 /miniconda/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:354:7: note: declared here
#14 69.42 T * data() const {
#14 69.42 ^~~~
#14 69.42 [6/7] c++ -MMD -MF operator.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/operator.cpp -o operator.o
#14 69.42 ninja: build stopped: subcommand failed.
#14 69.42
------
executor failed running [/bin/sh -c python preload_model.py]: exit code: 1
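Reading the log above: three of the compile steps end with `c++: internal compiler error: Killed (program cc1plus)`, which means the compiler process was killed from outside rather than rejecting the source. On Linux this is commonly the kernel's out-of-memory killer, though the log alone does not confirm that. To make the pattern easier to spot in long build output, here is a minimal sketch that pulls the failed objects and killed programs out of a log. The function name `summarize_build_log` and the returned keys are made up for illustration.

```python
import re

# Patterns match the ninja/gcc lines seen in the build output above.
FAILED_RE = re.compile(r"FAILED: (\S+)")
KILLED_RE = re.compile(r"internal compiler error: Killed \(program (\w+)\)")


def summarize_build_log(log_text: str) -> dict:
    """Collect failed build targets and externally killed compiler programs."""
    failed = FAILED_RE.findall(log_text)
    killed = KILLED_RE.findall(log_text)
    return {
        "failed_objects": failed,
        "killed_programs": killed,
        # A killed compiler hints at resource limits, not at a source error.
        "compiler_was_killed": bool(killed),
    }


# Short excerpt in the same shape as the log above.
sample = """\
#14 69.42 FAILED: rectify_cpu.o
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
#14 69.42 FAILED: nms_cpu.o
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
"""
print(summarize_build_log(sample))
```

If `compiler_was_killed` is true, the memory limit of the Docker VM on the Big Sur machine is worth comparing with the Catalina machine before touching the C++ sources.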