Spark 2.3 spark-submit to Kubernetes errors

Date: 2018-03-11 23:54:29

Tags: apache-spark kubernetes

When I try to run spark-submit against a Kubernetes (k8s) cluster, I get the following errors.

Error 1: This looks like a warning; it does not interrupt the application running inside the executor pods, but the warning keeps appearing.

2018-03-09 11:15:21 WARN  WatchConnectionManager:192 - Exec Failure
java.io.EOFException
       at okio.RealBufferedSource.require(RealBufferedSource.java:60)
       at okio.RealBufferedSource.readByte(RealBufferedSource.java:73)
       at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:113)
       at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:97)
       at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
       at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
       at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
       at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)

Error 2: This error is intermittent and prevents the executor pods from running.

org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
    at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
    at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
    at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
    at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
    ... 11 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at okhttp3.Dns$1.lookup(Dns.java:39)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
    ... 15 more
2018-03-09 15:00:39 INFO  AbstractConnector:318 - Stopped Spark@5f59185e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-03-09 15:00:39 INFO  SparkUI:54 - Stopped Spark web UI at http://myapp-ef79db3d9f4831bf85bda14145fdf113-driver-svc.default.svc:4040
2018-03-09 15:00:39 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-03-09 15:00:39 INFO  MemoryStore:54 - MemoryStore cleared
2018-03-09 15:00:39 INFO  BlockManager:54 - BlockManager stopped
2018-03-09 15:00:39 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-03-09 15:00:39 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
2018-03-09 15:00:39 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-03-09 15:00:39 INFO  SparkContext:54 - Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2747)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
    at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
    at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
    at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
    at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
    ... 11 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at okhttp3.Dns$1.lookup(Dns.java:39)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
    ... 15 more
2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-5bd85c96-d689-4c53-a0b3-1eadd32357cb

Note: The application can run successfully, but spark-submit fails with error 2 above very frequently.
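The root cause in error 2 is the UnknownHostException for kubernetes.default.svc, which can be checked independently; for example, a lookup from a throwaway pod (using busybox as the test image is just an illustration, not part of the original setup):

    # Illustrative DNS check from inside the cluster, not part of the original submission:
    kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup kubernetes.default.svc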

2 answers:

Answer 0 (score: 1)

You need to set the ${SPARK_LOCAL_IP} environment variable to the pod IP and pass it along with --conf spark.driver.host=${SPARK_LOCAL_IP}.

See the Spark documentation for more information about these variables.
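A minimal sketch of what that could look like; the master URL, image name, and jar path are placeholders, and populating SPARK_LOCAL_IP from the pod IP (for example via the Kubernetes downward API or hostname -i) is an assumption about your setup, not something from the original answer:

    # Sketch only: SPARK_LOCAL_IP is assumed to hold the pod IP, e.g. injected via the
    # downward API (fieldRef: status.podIP) or resolved locally with hostname -i.
    export SPARK_LOCAL_IP="$(hostname -i)"

    bin/spark-submit \
      --master k8s://https://<k8s-apiserver>:<port> \
      --deploy-mode cluster \
      --name myapp \
      --conf spark.driver.host=${SPARK_LOCAL_IP} \
      --conf spark.kubernetes.container.image=<your-spark-image> \
      local:///opt/spark/jars/<your-app>.jar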

Answer 1 (score: 0)

I also hit this problem on my Ubuntu 18.04 PC, and it is resolved now. The general steps are as follows (a shell sketch follows the list):

  1. Get the dashboard URL with minikube dashboard --url and open the dashboard. Check that the coredns service has started normally.
  2. Check whether the coredns service logs contain a Loop ... detected ... error.
  3. If they do, edit /etc/resolv.conf (e.g. with vim) and change nameserver 127.0.0.53 to nameserver 8.8.8.8.
  4. Restart minikube: minikube stop && minikube start
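A rough shell version of those steps (selecting the coredns pods with the k8s-app=kube-dns label and using sed instead of vim are assumptions; adjust them to your cluster):

    minikube dashboard --url                          # step 1: open the dashboard and check coredns
    kubectl -n kube-system logs -l k8s-app=kube-dns   # step 2: look for "Loop ... detected"
    sudo sed -i 's/^nameserver 127.0.0.53$/nameserver 8.8.8.8/' /etc/resolv.conf   # step 3
    minikube stop && minikube start                   # step 4: restart minikube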

If coredns still fails to start, you can look here: https://github.com/coredns/coredns/tree/master/plugin/loop