How to use a local Docker image when submitting Spark jobs (2.3) natively to Kubernetes?

Date: 2018-03-15 11:32:31

Tags: apache-spark docker kubernetes

I am trying to submit Spark jobs natively to Kubernetes using Apache Spark 2.3. When I use a Docker image from Docker Hub (for Spark 2.2), it works:

bin/spark-submit \
    --master k8s://http://localhost:8080 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
    local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar 

However, when I try to build a local Docker image,

sudo docker build -t spark:2.3 -f kubernetes/dockerfiles/spark/Dockerfile .
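
Note: the Spark 2.3 distribution also ships a helper script that wraps this docker build (and can optionally push the result); a minimal sketch, run from the top of the unpacked distribution:

./bin/docker-image-tool.sh -t 2.3 build

With no -r repository prefix this should produce an image named spark:2.3, the same as the manual docker build above.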

and submit the job as:

bin/spark-submit \
    --master k8s://http://localhost:8080 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=spark:2.3 \
    local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar

I get the following error, i.e. "repository docker.io/spark not found: does not exist or no pull access" (reason=ErrImagePull):

status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = repository docker.io/spark not found: does not exist or no pull access, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-03-15 11:09:54 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state: 
     pod name: spark-pi-3a1a6e8ce615395fa7df81eac06d58ed-driver
     namespace: default
     labels: spark-app-selector -> spark-8d9fdaba274a4eb69e28e2a242fe86ca, spark-role -> driver
     pod uid: 5271602b-2841-11e8-a78e-fa163ed09d5f
     creation time: 2018-03-15T11:09:25Z
     service account name: default
     volumes: default-token-v4vhk
     node name: mlaas-p4k3djw4nsca-minion-1
     start time: 2018-03-15T11:09:25Z
     container images: spark:2.3
     phase: Pending
     status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=Back-off pulling image "spark:2.3", reason=ImagePullBackOff, additionalProperties={}), additionalProperties={}), additionalProperties={})]
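
For what it's worth, the same pull failure also shows up in the pod events, which can be inspected directly (pod name taken from the log above):

kubectl describe pod spark-pi-3a1a6e8ce615395fa7df81eac06d58ed-driver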

Additionally, I tried running a local Docker registry, as described at https://docs.docker.com/registry/deploying/#run-a-local-registry :

docker run -d -p 5000:5000 --restart=always --name registry registry:2

sudo docker tag spark:2.3 localhost:5000/spark:2.3

sudo docker push localhost:5000/spark:2.3

I can then pull the image successfully: docker pull localhost:5000/spark:2.3
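
As an extra check that the image actually landed in the registry, the registry's HTTP API can be queried (registry:2 exposes a v2 catalog endpoint):

curl http://localhost:5000/v2/_catalog

which should return something like {"repositories":["spark"]}.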

However, when I submit the Spark job:

bin/spark-submit \
    --master k8s://http://localhost:8080 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=localhost:5000/spark:2.3 \
    local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar

I get ErrImagePull again:

status: [ContainerStatus(containerID=null, image=localhost:5000/spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = Error while pulling image: Get http://localhost:5000/v1/repositories/spark/images: dial tcp [::1]:5000: getsockopt: connection refused, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]

Is there any way in Spark 2.3 to use a local Docker image when submitting a job natively to Kubernetes?

Thanks in advance.

1 answer:

Answer 0 (score: 0)

I guess you are using something like minikube to set up your local Kubernetes cluster, which in most cases spins the cluster up inside a virtual machine. So when Kubernetes tries to pull an image from a localhost address, it connects to the VM's own localhost, not to your machine's address. Moreover, your local registry is reachable only on your machine's localhost and cannot be accessed from the VM.
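
A quick way to confirm this, assuming a minikube-style VM, is to open a shell inside the VM and try to reach the registry from there:

minikube ssh
curl http://localhost:5000/v2/_catalog

The curl should fail with "connection refused", because localhost inside the VM is the VM itself, not your machine.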

The idea of the fix is to make your local Docker registry reachable from Kubernetes and to allow pulling images from an insecure local registry.

So, first, bind the Docker registry on your PC to all interfaces:

docker run -d -p 0.0.0.0:5000:5000 --restart=always --name registry registry:2
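
(Note that -p 5000:5000 usually binds all interfaces already; the 0.0.0.0 prefix just makes this explicit.) On a Linux host you can verify the listener, for example:

ss -tln | grep 5000

which should show the registry listening on 0.0.0.0:5000 rather than 127.0.0.1:5000.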

Then check your local IP address; it will be something like 172.X.X.X or 10.X.X.X. How to check it depends on your operating system, so if you are not sure how to get it, look it up for your platform.
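
For example:

ip addr show             # Linux: look for an inet address like 172.X.X.X or 10.X.X.X
hostname -I              # Linux: prints all assigned addresses
ipconfig getifaddr en0   # macOS: address of the en0 interface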

After that, start minikube with an additional option:

minikube start --insecure-registry="<your-local-ip-address>:5000"

where <your-local-ip-address> is the local IP address you found in the previous step.
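
Note that in many minikube versions --insecure-registry only takes effect when the VM is first created, so if your cluster already exists you may need to recreate it:

minikube delete
minikube start --insecure-registry="<your-local-ip-address>:5000"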

Now you can try to run the Spark job with the new registry address, and K8s will be able to download your image:

spark.kubernetes.container.image=<your-local-ip-address>:5000/spark:2.3
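
Combined with the submission command from the question, the full invocation would look like:

bin/spark-submit \
    --master k8s://http://localhost:8080 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<your-local-ip-address>:5000/spark:2.3 \
    local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar

The image already pushed as localhost:5000/spark:2.3 lives in that same registry under the repository name spark, so no re-push should be needed; only the address used to pull it changes.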