I have a Kubernetes cluster on OpenStack. I launched a Spark cluster with the Helm chart from https://github.com/eddieesquivel/kubernetes-spark/tree/master/chart, with a few modifications: for example, I exposed the web UI through a NodePort service instead of a LoadBalancer, because I don't know how to set up a LoadBalancer on OpenStack with Kubernetes. I used the Docker image 2.3.1-hadoop-3.0 from https://hub.docker.com/r/gettyimages/spark/tags.
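The Service change looks roughly like this (reconstructed from memory rather than copied from the chart, so the selector label and the exact split of ports across Services are assumptions on my part):

apiVersion: v1
kind: Service
metadata:
  name: myspark-master    # also the host I pass to --master below
spec:
  type: NodePort          # the chart originally used LoadBalancer here
  selector:
    app: myspark-master   # assumed label; must match the master pods
  ports:
    - name: web-ui        # Spark master web UI
      port: 8080
      targetPort: 8080
    - name: spark         # RPC port used by spark-submit and the workers
      port: 7077
      targetPort: 7077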
I ran kubectl port-forward <spark master pod name> 8080:8080 and saw all 3 workers at 127.0.0.1:8080.
I ran kubectl exec -it <spark master pod name> -- bash and then ./bin/spark-submit --master spark://myspark-master:7077 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi /usr/spark-2.3.1/examples/jars/spark-examples_2.11-2.3.1.jar, but the log file of one worker shows the following error:
Spark Executor Command: "/usr/jdk1.8.0_171/bin/java" "-cp" "/usr/spark-2.3.1/conf/:/usr/spark-2.3.1/jars/*:/usr/hadoop-3.0.0/etc/hadoop/:/usr/hadoop-3.0.0/etc/hadoop/*:/usr/hadoop-3.0.0/share/hadoop/common/lib/*:/usr/hadoop-3.0.0/share/hadoop/common/*:/usr/hadoop-3.0.0/share/hadoop/hdfs/*:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/*:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/*:/usr/hadoop-3.0.0/share/hadoop/yarn/*:/usr/hadoop-3.0.0/share/hadoop/mapreduce/lib/*:/usr/hadoop-3.0.0/share/hadoop/mapreduce/*:/usr/hadoop-3.0.0/share/hadoop/tools/lib/*" "-Xmx1024M" "-Dspark.driver.port=45708" "-Dspark.rpc.askTimeout=10s" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@myspark-worker-d95949b47-szscm:45708" "--executor-id" "48" "--hostname" "10.233.88.5" "--cores" "2" "--app-id" "app-20181231154056-0000" "--worker-url" "spark://Worker@10.233.88.5:42919"
========================================
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1980)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
... 4 more
Caused by: java.io.IOException: Failed to connect to myspark-worker-d95949b47-szscm:45708
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: myspark-worker-d95949b47-szscm
It seems the workers and the master cannot reach one another by hostname or pod name, yet the error above shows that Spark tries to connect to the worker/master by pod name or hostname rather than by IP. (Because I submitted with --deploy-mode cluster, the driver apparently runs on one of the worker pods, which is why a worker pod name, myspark-worker-d95949b47-szscm, shows up as the host the executor dials back to.) How can I fix the Helm chart files? Is there any other solution?

By the way, I tried manually adding all the hostnames and their IPs to /etc/hosts on every pod, but that is clearly not a desirable approach; a declarative sketch of the same hack is below.
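For reference, the Kubernetes-native version of that /etc/hosts hack would be hostAliases in the pod spec, roughly as follows; it shares the same weakness, because pod names and IPs change on every reschedule:

# Pod spec fragment; hostAliases entries are written into the pod's /etc/hosts.
# The IP/name pairing below is illustrative only (the real IP has to come from
# kubectl get pod -o wide) and goes stale as soon as the pod is rescheduled,
# which is exactly why I don't want this approach.
spec:
  hostAliases:
    - ip: "10.233.88.5"
      hostnames:
        - "myspark-worker-d95949b47-szscm"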
Thanks
UPDATE
In the Spark Executor Command below, myspark-worker-d95949b47-szscm is the host in --driver-url, and that name cannot be resolved:

Spark Executor Command: "/usr/jdk1.8.0_171/bin/java" "-Xmx1024M" "-Dspark.driver.port=45708" "-Dspark.rpc.askTimeout=10s" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@myspark-worker-d95949b47-szscm:45708" "--executor-id" "1" "--hostname" "10.233.86.5" "--cores" "2" "--app-id" "app-20181231154056-0000" "--worker-url" "spark://Worker@10.233.86.5:44798"

How can this be fixed?
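One direction I'm considering (just a sketch, untested): make every Spark process advertise its pod IP instead of its pod hostname. Spark uses SPARK_LOCAL_HOSTNAME, when it is set, as the hostname it advertises, so injecting the pod IP through the Downward API into the worker (and master) containers might do it:

# Container spec fragment for the worker/master Deployments in the chart.
# With SPARK_LOCAL_HOSTNAME set to the pod IP, --driver-url should carry a
# routable address instead of an unresolvable pod name.
env:
  - name: SPARK_LOCAL_HOSTNAME
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

The alternative seems to be making the pod hostnames actually resolvable, e.g. running the workers as a StatefulSet behind a headless Service, but that is a bigger change to the chart. Does either approach make sense?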