I am trying to submit a Spark job natively to Kubernetes using Apache Spark 2.3. It works when I use a Docker image from Docker Hub (for Spark 2.2):
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
However, when I build a local Docker image,
sudo docker build -t spark:2.3 -f kubernetes/dockerfiles/spark/Dockerfile .
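As a quick sanity check (my own addition, not part of the original steps), you can confirm the image actually exists in the local Docker daemon before submitting:

```shell
# Verify the freshly built image is present in the local daemon.
sudo docker images spark:2.3

# Optionally inspect the entrypoint configured by the Spark Dockerfile.
sudo docker inspect --format '{{.Config.Entrypoint}}' spark:2.3
```

These commands require a running Docker daemon, so run them on the same host where you built the image.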
and submit the job as:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark:2.3 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
I get the following error, i.e. "repository docker.io/spark not found: does not exist or no pull access, reason=ErrImagePull, additionalProperties={}":
status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = repository docker.io/spark not found: does not exist or no pull access, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-03-15 11:09:54 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-3a1a6e8ce615395fa7df81eac06d58ed-driver
namespace: default
labels: spark-app-selector -> spark-8d9fdaba274a4eb69e28e2a242fe86ca, spark-role -> driver
pod uid: 5271602b-2841-11e8-a78e-fa163ed09d5f
creation time: 2018-03-15T11:09:25Z
service account name: default
volumes: default-token-v4vhk
node name: mlaas-p4k3djw4nsca-minion-1
start time: 2018-03-15T11:09:25Z
container images: spark:2.3
phase: Pending
status: [ContainerStatus(containerID=null, image=spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=Back-off pulling image "spark:2.3", reason=ImagePullBackOff, additionalProperties={}), additionalProperties={}), additionalProperties={})]
Additionally, I tried running a local Docker registry, as described at https://docs.docker.com/registry/deploying/#run-a-local-registry :
docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker tag spark:2.3 localhost:5000/spark:2.3
sudo docker push localhost:5000/spark:2.3
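To verify the push actually landed in the local registry (a check I am adding for illustration, assuming the standard registry v2 HTTP API on port 5000), you can query the catalog and tag list:

```shell
# List repositories the registry knows about; should include "spark".
curl http://localhost:5000/v2/_catalog

# List tags for the spark repository; should include "2.3".
curl http://localhost:5000/v2/spark/tags/list
```

Both endpoints are part of the Docker Registry HTTP API V2, so they only work while the `registry:2` container is running.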
I can successfully pull the image with: docker pull localhost:5000/spark:2.3
However, when I submit the Spark job:
bin/spark-submit \
--master k8s://http://localhost:8080 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=localhost:5000/spark:2.3 \
local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
I get ErrImagePull again:
status: [ContainerStatus(containerID=null, image=localhost:5000/spark:2.3, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=rpc error: code = 2 desc = Error while pulling image: Get http://localhost:5000/v1/repositories/spark/images: dial tcp [::1]:5000: getsockopt: connection refused, reason=ErrImagePull, additionalProperties={}), additionalProperties={}), additionalProperties={})]
Is there a way in Spark 2.3 to use local Docker images when submitting jobs natively to Kubernetes?
Thanks in advance.
Answer 0 (score: 0)
I guess you are using something like minikube to set up a local Kubernetes cluster, which in most cases uses a virtual machine to spawn the cluster.
So, when Kubernetes tries to pull the image from a localhost address, it connects to the virtual machine's local address, not to your computer's address. Moreover, your local registry is bound only to localhost and is not accessible from the virtual machine.
The idea of the fix is to make your local Docker registry accessible from your Kubernetes cluster and to allow pulling images from a local insecure registry.
So, first of all, bind the Docker registry on your PC to all interfaces:
docker run -d -p 0.0.0.0:5000:5000 --restart=always --name registry registry:2
Then, check your local IP address. It will be something like 172.X.X.X or 10.X.X.X. How to check it depends on your OS, so just google it if you are not sure.
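On Linux, for example, one way to find that address is (a sketch; the exact command varies by OS and distribution):

```shell
# Print the machine's non-loopback IPv4 addresses (Linux).
hostname -I

# Alternative: show global-scope IPv4 addresses per interface.
ip -4 addr show scope global
```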
After that, start minikube with an additional option:
minikube start --insecure-registry="<your-local-ip-address>:5000"
where <your-local-ip-address> is your local IP address.
Now you can try to run the Spark job with the new registry address, and K8s will be able to download your image:
spark.kubernetes.container.image=<your-local-ip-address>:5000/spark:2.3
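Putting it together, the full spark-submit command from the question would then look like this (using 172.16.0.10 purely as a placeholder for your actual local IP address):

```shell
bin/spark-submit \
  --master k8s://http://localhost:8080 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=172.16.0.10:5000/spark:2.3 \
  local:///home/fedora/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar
```

No retagging is needed: pushing localhost:5000/spark:2.3 stored the repository as spark:2.3 in the registry, and pulling <your-local-ip-address>:5000/spark:2.3 reaches the same registry over the network address.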