spark-on-k8s resource staging server with Python

Asked: 2018-02-10 02:25:21

Tags: python apache-spark pyspark kubernetes

I have been following the Running Spark on Kubernetes docs with spark-on-k8s v2.2.0-kubernetes-0.5.0, Kubernetes v1.9.0, and Minikube v0.25.0.

I can successfully run a Python job with this command:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
  local:///opt/spark/examples/src/main/python/pi.py 10

I am also able to successfully run a Java job with a local dependency (after setting up the resource staging server; a deployment sketch follows the command):

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
  ./examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar
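
For reference, the staging server itself was deployed from the manifest that ships with the spark-on-k8s distribution (a minimal sketch; the manifest path and the 31000 NodePort are the distribution's defaults and may differ in other setups):

kubectl create -f conf/kubernetes-resource-staging-server.yaml

# Confirm the NodePort service (31000 by default) is up:
kubectl get svc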

Is it possible to run a Python job with local dependencies? I tried this command, but it failed:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
  ./examples/src/main/python/pi.py 10

I get this error in the driver log:

Error: Could not find or load main class .opt.spark.jars.RoaringBitmap-0.5.11.jar

And these errors in the event log:

MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "spark-pi-1518224354203-init-config" not found
...
MountVolume.SetUp failed for volume "spark-init-secret" : secrets "spark-pi-1518224354203-init-secret" not found

1 Answer:

Answer 0 (score: 0):

The fix was to provide the jar as a dependency via --jars:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
  ./examples/src/main/python/pi.py 10

I'm not sure why this works (RoaringBitmap-0.5.11.jar should exist in /opt/spark/jars and is added to the classpath in any case), but for now it solves my problem.
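
Once the job is submitted this way, the driver pod runs to completion and the result can be read back from its logs (a sketch; substitute the driver pod name that spark-submit prints):

kubectl get pods
kubectl logs <driver-pod-name>   # pi.py prints a line like "Pi is roughly 3.14..."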