See the diagram at https://spark.apache.org/docs/latest/cluster-overview.html.
I have a Spark cluster running outside of Kubernetes, but I am going to run the driver inside Kubernetes. The question is how to let the Spark cluster reach the driver properly.
My Kubernetes YAML file:
kind: List
apiVersion: v1
items:
  - kind: Deployment
    apiVersion: extensions/v1beta1
    metadata:
      name: counter-uat
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: spark-driver
      template:
        metadata:
          labels:
            name: spark-driver
        spec:
          containers:
            - name: counter-uat
              image: counter:0.1.0
              command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "/usr/src/counter.jar"]
  - kind: Service
    apiVersion: v1
    metadata:
      name: spark-driver
      labels:
        name: spark-driver
    spec:
      type: NodePort
      ports:
        - name: port
          port: 4040
          targetPort: 4040
      selector:
        name: spark-driver
The error is:
Caused by: java.io.IOException: Failed to connect to /172.17.0.8:44117
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: /172.17.0.8:44117
The Spark cluster is trying to reach the driver at IP 172.17.0.8, which is probably an internal Kubernetes pod IP.
How can I solve this? How should I fix my YAML file? Thanks.
UPDATE
I added the following two parameters: "--conf", "spark.driver.bindAddress=192.168.42.8", "--conf", "spark.driver.host=0.0.0.0".
But from the logs, Spark is still trying to reach 172.17.0.8, the internal Kubernetes pod IP.
UPDATE
kind: List
apiVersion: v1
items:
  - kind: Deployment
    apiVersion: extensions/v1beta1
    metadata:
      name: counter-uat
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: counter-driver
      template:
        metadata:
          labels:
            name: counter-driver
        spec:
          containers:
            - name: counter-uat
              image: counter:0.1.0
              command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "--conf", "spark.driver.bindAddress=192.168.42.8", "/usr/src/counter.jar"]
  - kind: Service
    apiVersion: v1
    metadata:
      name: counter-driver
      labels:
        name: counter-driver
    spec:
      type: NodePort
      ports:
        - name: driverport
          port: 42761
          targetPort: 42761
          nodePort: 30002
      selector:
        name: counter-driver
Another error:
2017-06-23T20:00:07.487656154Z Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 31319)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
Answer:
Try setting spark.driver.host or spark.driver.bindAddress to "spark.uat", "spark-driver.uat", or the actual driver host. This is a common problem in distributed projects like this, where the master tells the clients where to connect. If you don't specify spark.driver.host, Spark tries to figure out the right host on its own and uses the IP it sees. But in this case, the IP it sees is an internal Kubernetes IP, which may not work for the clients.
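For example, a minimal sketch of what the Deployment's submit command could look like under that advice. The Service DNS name spark-driver.uat and the port 42761 are assumptions, not values confirmed to work; the port only has to match whatever the driver Service targets:

containers:
  - name: counter-uat
    image: counter:0.1.0
    # Advertise a stable, resolvable name instead of the pod IP,
    # and pin the driver RPC port so a Service can target it;
    # by default Spark picks a random port on each start.
    command: ["/opt/spark/bin/spark-submit", "--class", "Counter",
              "--master", "spark://spark.uat:7077",
              "--conf", "spark.driver.host=spark-driver.uat",
              "--conf", "spark.driver.port=42761",
              "/usr/src/counter.jar"]

Note that whatever name you advertise must be resolvable and reachable from the external Spark cluster, not just from inside Kubernetes.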
You can also try setting the SPARK_PUBLIC_DNS environment variable. It actually has a more promising description:

Hostname your Spark program will advertise to other machines.
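Along the same lines, a hedged sketch of setting SPARK_PUBLIC_DNS on the driver container; the value is a placeholder for whatever hostname the external cluster can actually resolve and reach:

containers:
  - name: counter-uat
    image: counter:0.1.0
    env:
      # Hostname advertised to other machines in the cluster; must be
      # reachable from outside Kubernetes (placeholder value).
      - name: SPARK_PUBLIC_DNS
        value: "spark-driver.uat"
    command: ["/opt/spark/bin/spark-submit", "--class", "Counter",
              "--master", "spark://spark.uat:7077", "/usr/src/counter.jar"]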