How to connect to an external Spark cluster from Kubernetes

Date: 2017-06-23 15:26:24

Tags: apache-spark kubernetes kubectl minikube

See the diagram at https://spark.apache.org/docs/latest/cluster-overview.html.


The Spark cluster runs outside Kubernetes, but I want to run the driver inside Kubernetes. The problem is how to make the Spark cluster aware of the driver so it can reach it.

My Kubernetes YAML file:

kind: List
apiVersion: v1
items:
- kind: Deployment
  apiVersion: extensions/v1beta1
  metadata:
    name: counter-uat
  spec:
    replicas: 1
    selector:
      matchLabels:
        name: spark-driver
    template:
      metadata:
        labels:
          name: spark-driver
      spec:
        containers:
          - name: counter-uat
            image: counter:0.1.0
            command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "/usr/src/counter.jar"]
- kind: Service
  apiVersion: v1
  metadata:
    name: spark-driver
    labels:
      name: spark-driver
  spec:
    type: NodePort
    ports:
    - name: port
      port: 4040
      targetPort: 4040
    selector:
      name: spark-driver

The error is:

Caused by: java.io.IOException: Failed to connect to /172.17.0.8:44117
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: /172.17.0.8:44117

The Spark cluster is trying to reach the driver at 172.17.0.8, which is presumably a Kubernetes-internal pod IP.
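For context, three Spark properties govern how an external cluster reaches a client-mode driver (a reference sketch, not from the original post). Note also that the Service above only exposes 4040, the Spark UI port, while the driver's RPC port (44117 in the stack trace) is chosen at random unless pinned:

spark.driver.bindAddress   # what the driver binds to inside the pod; must be a local interface, e.g. 0.0.0.0
spark.driver.host          # what the driver advertises to the master and executors; must be reachable from outside
spark.driver.port          # the driver's RPC port; random by default, so pin it to expose it through a Service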

How can I fix this? How should I fix my YAML file? Thanks.

Update

I added the following two parameters: "--conf", "spark.driver.bindAddress=192.168.42.8" and "--conf", "spark.driver.host=0.0.0.0".

But according to the logs, the cluster still tries to reach 172.17.0.8, which is the Kubernetes-internal pod IP.
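The two settings above appear to be swapped: spark.driver.bindAddress must be an address the pod actually owns (the pod IP or 0.0.0.0), while spark.driver.host is the address that gets advertised. A sketch of the presumably intended command, assuming 192.168.42.8 is the minikube node IP and 42761 is a port exposed by a Service (both assumptions):

# Bind locally inside the pod, advertise the externally reachable address,
# and pin the RPC port so a Service/NodePort can route to it.
/opt/spark/bin/spark-submit \
  --class Counter \
  --master spark://spark.uat:7077 \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --conf spark.driver.host=192.168.42.8 \
  --conf spark.driver.port=42761 \
  /usr/src/counter.jar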

Update

kind: List
apiVersion: v1
items:
- kind: Deployment
  apiVersion: extensions/v1beta1
  metadata:
    name: counter-uat
  spec:
    replicas: 1
    selector:
      matchLabels:
        name: counter-driver
    template:
      metadata:
        labels:
          name: counter-driver
      spec:
        containers:
          - name: counter-uat
            image: counter:0.1.0
            command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "--conf", "spark.driver.bindAddress=192.168.42.8","/usr/src/counter.jar"]

---
kind: Service
apiVersion: v1
metadata:
  name: counter-driver
  labels:
    name: counter-driver
spec:
  type: NodePort
  ports:
  - name: driverport
    port: 42761
    targetPort: 42761
    nodePort: 30002
  selector:
    name: counter-driver

Another error:

2017-06-23T20:00:07.487656154Z Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 31319)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
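This error is consistent with the swap described above: 192.168.42.8 is apparently the node's address rather than one of the pod's own interfaces, so the driver cannot bind a listening socket to it and gives up after 16 retries. Binding needs a local address; only the advertised address (spark.driver.host) should be the external one. Note also that the Service targets port 42761, so spark.driver.port would need to be pinned to 42761 for traffic to actually reach the driver.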

1 Answer:

Answer 0 (score: 0)

Try setting spark.driver.host and spark.driver.bindAddress to "spark.uat", "spark-driver.uat", or whatever the actual driver host is. This is a common problem in distributed projects like this, where the master tells the clients where to connect. If you do not specify spark.driver.host, Spark tries to figure out the right host on its own and uses the IP it sees. But in this case, the IP it sees is the internal Kubernetes IP, which may not work for the clients.
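A minimal sketch of a Deployment command along those lines, assuming spark-driver.uat is a name the Spark cluster can resolve to the driver's exposed address and that 42761 matches the Service's targetPort (both assumptions, not from the original post):

        command: ["/opt/spark/bin/spark-submit",
                  "--class", "Counter",
                  "--master", "spark://spark.uat:7077",
                  # Bind to all pod interfaces, advertise a resolvable name,
                  # and pin the RPC port so the Service can target it:
                  "--conf", "spark.driver.bindAddress=0.0.0.0",
                  "--conf", "spark.driver.host=spark-driver.uat",
                  "--conf", "spark.driver.port=42761",
                  "/usr/src/counter.jar"]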

You can also try setting the SPARK_PUBLIC_DNS environment variable. It actually has a more promising description:

  Hostname your Spark program will advertise to other machines.
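A sketch of setting it on the driver container, reusing the same assumed name:

    containers:
      - name: counter-uat
        image: counter:0.1.0
        env:
          # Hostname the Spark program will advertise to other machines;
          # assumed to resolve to the driver's externally reachable address.
          - name: SPARK_PUBLIC_DNS
            value: spark-driver.uat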