Error building docker image for jupyter spark notebook

Date: 2021-06-01 21:51:15

Tags: docker apache-spark jupyter-notebook jupyter

I'm trying to build a Jupyter notebook in docker following the guide here: https://github.com/cordon-thiago/airflow-spark and the build fails with exit code 8. I ran:

$ docker build --rm --force-rm -t jupyter/pyspark-notebook:3.0.1 .

The build stops at this code:

RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\?as_json | \
    python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") && \
    echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - && \
    tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner && \
    rm "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"

The error message is as follows:


 => ERROR [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json |     python -c "import sys, json; content=json.load(sys.stdin);   2.3s
------
 > [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json |     python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") &&     echo "F4A10BAEC5B8FF1841F10651CAC2C4AA39C162D3029CA180A9749149E6060805B5B5DDF9287B4AA321434810172F8CC0534943AC005531BB48B6622FBE228DDC *spark-3.0.1-bin-hadoop2.7.tgz" | sha512sum -c - &&     tar xzf "spark-3.0.1-bin-hadoop2.7.tgz" -C /usr/local --owner root --group root --no-same-owner &&     rm "spark-3.0.1-bin-hadoop2.7.tgz":
------
executor failed running [/bin/bash -o pipefail -c wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\?as_json |     python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") &&     echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - &&     tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner &&     rm "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"]: exit code: 8

I'd really appreciate it if someone could shed some light on this. Thanks!

1 answer:

Answer 0: (score: 0)

Exit code 8 is likely from wget and indicates an error response from the server. For example, this path that the Dockerfile tries to wget from is no longer valid: https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
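Among wget's documented exit codes, 8 means "server issued an error response", e.g. an HTTP 404. A quick way to check whether a release is still being served is to probe the download server directly; --spider makes wget check the URL without downloading anything (the URL below assumes the same spark/hadoop values as the failing build):

# Exits with 8 if the server answers with an error such as 404,
# i.e. the 3.0.1 release is no longer served from this location.
wget -q --spider https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
echo $?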

Judging from the issues on that repo, it seems Spark version 3.0.1 is no longer available, so you should override the Spark version to 3.0.2 with --build-arg:

docker build --rm --force-rm \
  --build-arg spark_version=3.0.2 \
  -t jupyter/pyspark-notebook:3.0.2 .

Edit

For more information, see the comments below; the command that worked is:

docker build --rm --force-rm \
  --build-arg spark_version=3.1.1 \
  --build-arg hadoop_version=2.7 \
  -t jupyter/pyspark-notebook:3.1.1 .  

and the spark checksum was updated to the one for 3.1.1: https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz.sha512
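Since the failing RUN step verifies the download against ${spark_checksum}, the Dockerfile presumably exposes that as a build arg as well. Assuming the arg is named spark_checksum, the new checksum can be passed alongside the versions; paste the SHA-512 from the link above as a single hex string (strip any spaces/newlines if the published file uses the spaced gpg format):

# spark_checksum is assumed to be the build arg name, matching the
# ${spark_checksum} variable in the RUN step that failed.
docker build --rm --force-rm \
  --build-arg spark_version=3.1.1 \
  --build-arg hadoop_version=2.7 \
  --build-arg spark_checksum="<paste the SHA-512 here>" \
  -t jupyter/pyspark-notebook:3.1.1 .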

For this answer to stay relevant in the future, the version and checksum will likely need to be updated again for the latest spark/hadoop versions.