CannotStartContainerError: API error (400): OCI runtime create failed: container_linux.go:348

Time: 2018-10-11 15:42:09

Tags: amazon-web-services docker aws-batch

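I am following the tutorial here to try to run a script through AWS Batch. Specifically, the entrypoint script is the same: it is a script that downloads, from an S3 bucket, the code to be executed in AWS Batch. However, no matter how I try to execute it on AWS, I always get: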

CannotStartContainerError: API error (400): OCI runtime create failed: 
  container_linux.go:348: starting container process caused "exec:
  \"/usr/local/bin/fetch_and_run.sh\": 
  stat /usr/local/bin/fetch_and_run.sh: no such file or directory": unknown

I am able to start the same container locally.

I launch the job from the awscli with the following command:

aws batch submit-job --job-name mss_dev --job-definition mapper \
  --job-queue bio-job-queue \
  --container-overrides '{"environment":
  [{"name": "BATCH_FILE_S3_URL", "value": "s3://test/myjob.sh"},
   {"name": "BATCH_FILE_TYPE", "value": "script"}],
   "command":["/usr/local/bin/fetch_and_run.sh"]}'

My Dockerfile is as follows:

FROM amazonlinux:latest

# General dependencies and user
## aws-cli installed twice (here for root, later for user)
RUN yum -y install which unzip tar wget aws-cli curl sudo
RUN yum -y groupinstall 'Development Tools'
RUN yum -y install gcc git curl make zlib-devel bzip2 bzip2-devel readline-devel sqlite sqlite-devel openssl openssl-devel
RUN yum -y install java-1.8.0-openjdk.x86_64
## User and work directory
RUN groupadd -r user && useradd -mr -g user -d /home/user -s /sbin/nologin -c "Docker image user" user
RUN echo "user ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
ENV HOME /home/user
## Change user to user
USER user
ENV USER user
RUN sh -c "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install.sh)" && echo 'export PATH="/home/linuxbrew/.linuxbrew/bin:$PATH"' >>~/.profile
## GNU parallel 10 seconds installation
#WORKDIR $HOME/tools/parallel
#RUN (wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
# RUN brew install gcc
ENV PATH "/home/linuxbrew/.linuxbrew/bin:$PATH"
RUN brew install parallel

# Pyenv
WORKDIR $HOME
RUN git clone git://github.com/yyuu/pyenv.git .pyenv

ENV PYENV_ROOT $HOME/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH

# Python3
RUN pyenv install 3.6.5
RUN pyenv global 3.6.5
RUN pyenv rehash

# Python3 modules
RUN pip install --upgrade pip
RUN pip install --upgrade awscli pandas scipy numpy kneed

# STAR
RUN mkdir -p $HOME/tools/STAR
WORKDIR $HOME/tools/STAR
RUN wget https://github.com/alexdobin/STAR/archive/2.6.1b.tar.gz && tar xvf 2.6.1b.tar.gz

# DropSeq
RUN mkdir -p $HOME/tools/DropSeq
WORKDIR $HOME/tools/DropSeq
RUN wget https://github.com/broadinstitute/Drop-seq/releases/download/v1.13/Drop-seq_tools-1.13.zip && unzip Drop-seq_tools-1.13.zip

# Reference and other files should be downloaded during execution
RUN mkdir -p $HOME/data
RUN mkdir -p $HOME/results
COPY --chown=user:user code /home/user/code

# Copy main files and set entrypoint
WORKDIR /tmp
ADD fetch_and_run.sh /usr/local/bin/fetch_and_run.sh
USER nobody
ENTRYPOINT ["/usr/local/bin/fetch_and_run.sh"]
# To debug
# ENTRYPOINT ["/bin/bash"]

1 answer:

Answer 0 (score: 0)

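The culprit was the job definition (in the AWS console; see "Create a job definition" here). For the ECR repository URI, I forgot to use the URI of my updated image (e.g. 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run) and used the default amazonlinux image instead. That stock image does not contain /usr/local/bin/fetch_and_run.sh, which is exactly what the "no such file or directory" part of the error reports.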
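
For anyone hitting the same error, one way to spot this without clicking through the console is to ask the CLI which image each active revision of the job definition points at. The following is only a rough sketch: the helper names `is_ecr_image` and `check_job_definition` are made up for illustration, `mapper` is the job definition name from the question, and the URI pattern is a heuristic for private ECR repositories, not an official validation rule.

```shell
#!/bin/sh
# Sketch: list the image used by each ACTIVE revision of a job definition
# and flag any that do not look like a private ECR URI.

# Succeeds if $1 looks like <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repo>.
# Heuristic only (hypothetical helper, not part of the AWS CLI).
is_ecr_image() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com(\.cn)?/.+'
}

# Prints one line per image found in the ACTIVE revisions of job definition $1.
check_job_definition() {
  aws batch describe-job-definitions \
      --job-definition-name "$1" --status ACTIVE \
      --query 'jobDefinitions[].containerProperties.image' --output text |
    tr '\t' '\n' |
    while read -r image; do
      [ -n "$image" ] || continue
      if is_ecr_image "$image"; then
        echo "ok: $image"
      else
        echo "suspicious (not an ECR URI): $image"
      fi
    done
}

# Usage (requires configured AWS credentials):
#   check_job_definition mapper
```

If a revision turns out to reference the wrong image, a corrected revision can be registered with `aws batch register-job-definition` or created in the console.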

The main hint was that I was able to run it locally.
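
That hint can be turned into a quick check: run the test against the image the job definition names, not against the locally built tag. The sketch below is an assumption-laden illustration; `missing_path` and `image_has_file` are hypothetical helper names, and the check assumes the image ships `/bin/sh` (true for amazonlinux).

```shell
#!/bin/sh
# Sketch: confirm that the image Batch will pull contains the entrypoint script.

# Reads a container error message on stdin and prints the path that the
# runtime failed to stat, if the message contains one.
missing_path() {
  sed -n 's/.*stat \([^:]*\): no such file or directory.*/\1/p'
}

# Succeeds if file $2 exists inside image $1. The entrypoint is overridden
# so the check still runs when the image's own entrypoint is missing.
image_has_file() {
  docker run --rm --entrypoint /bin/sh "$1" -c 'test -f "$0"' "$2"
}

# Usage (the image name is whatever the job definition lists):
#   image_has_file amazonlinux /usr/local/bin/fetch_and_run.sh \
#     || echo "script missing from image"
```

Run against the stock amazonlinux image, the check should fail, while the image built from the Dockerfile in the question should pass.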