Not seeing Spark output or error messages on Kubernetes

Asked: 2019-05-21 11:47:49

Tags: apache-spark kubernetes

I am trying to run a simple Spark application with a Kubernetes master. I don't get the intended output or processing, and I don't see any error messages either. The final pod phase is "Failed" with error code 101. The pod logs show the usual log4j warnings and nothing else.

Running minikube v1.0.1 with the hyperv driver on Windows (amd64) on my office laptop. Already increased the number of CPUs and the memory on the minikube VM to 3 and 4 GB respectively, as suggested.
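For reference, a minimal sketch of how those resources can be set (assuming they are applied when the minikube VM is created):

>>> minikube start --vm-driver hyperv --cpus 3 --memory 4096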

Made sure that the applications run fine with Spark Standalone. The first application, "Hello", should simply print a "Hello" message. The second application, "Calculate Monthly Revenue", should read data from Teradata over JDBC, do an aggregation, and write the result back to a Teradata table over JDBC.

Also made sure that "hello minikube" works fine.

In all the snippets below, ... indicates portions omitted for brevity, and >>> indicates the command prompt.

>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --name Hello --class Hello --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local://hello_2.12-0.1.0-SNAPSHOT.jar
log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/05/20 16:59:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
         pod name: hello-1558351748442-driver
...
         phase: Pending
         status: []
...
19/05/20 16:59:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
         pod name: hello-1558351748442-driver
...
         phase: Failed
         status: [ContainerStatus(containerID=docker://464c9c0e23d543f20954d373218c9cefefc31107711cbd2ada4d93bb31ce4d80, image=rahulvkulkarni/default:spark-td-run, imageID=docker-pullable://rahulvkulkarni/default@sha256:1de9951c4ac9f0b5f26efa3949e1effa779b0605066f2043738402ce20e8179b, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://464c9c0e23d543f20954d373218c9cefefc31107711cbd2ada4d93bb31ce4d80, exitCode=101, finishedAt=2019-05-17T18:26:41Z, message=null, reason=Error, signal=null, startedAt=2019-05-17T18:26:40Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
19/05/20 16:59:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:


         Container name: spark-kubernetes-driver
         Container image: rahulvkulkarni/default:spark-td-run
         Container state: Terminated
         Exit code: 101
19/05/20 16:59:13 INFO Client: Application Hello finished.
...


>>> kubectl logs hello-1558351748442-driver
++ id -u
...
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.5 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class Hello spark-internal
19/05/17 18:26:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.spark.deploy.SparkSubmit$$anon$2).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

What does exit code 101 mean? How do I find the actual error?
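For completeness, other commands one might use to inspect the failed pod would be along these lines (a sketch, using the driver pod name from the run above):

>>> kubectl describe pod hello-1558351748442-driver
>>> kubectl get events --sort-by=.metadata.creationTimestamp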

I then tried to configure log4j for detailed logging as described in How to stop INFO messages displaying on spark console?. I renamed and used the log4j.properties template provided in the conf directory. But spark-submit could not find the log4j.properties file that I had included in the docker build.
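One quick way to rule out the file simply being missing from the image would be something like this (a sketch, assuming the image used in the submit commands and that ls is available in it):

>>> docker run --rm --entrypoint ls rahulvkulkarni/default:spark-td-run /opt/spark/conf/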

>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --files /opt/spark/conf/log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties" --name "Calculate Monthly Revenue" --class mthRev --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local://mthrev_2.10-0.1-SNAPSHOT.jar <username> <password> <server name>
19/05/20 20:02:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/20 20:02:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
         pod name: calculate-monthly-revenue-1558362771110-driver
...
         Container name: spark-kubernetes-driver
         Container image: rahulvkulkarni/default:spark-td-run
         Container state: Terminated
         Exit code: 1


>>> kubectl logs -c spark-kubernetes-driver calculate-monthly-revenue-1558362771110-driver
++ id -u
...
log4j:ERROR Could not read configuration file from URL [file:/opt/spark/conf/log4j.properties].
java.io.FileNotFoundException: /opt/spark/conf/log4j.properties (No such file or directory)
...
log4j:ERROR Ignoring configuration file [file:/opt/spark/conf/log4j.properties].
19/05/17 21:30:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: C:
        at org.apache.hadoop.fs.Path.initialize(Path.java:205)
        at org.apache.hadoop.fs.Path.<init>(Path.java:171)
        at org.apache.hadoop.fs.Path.<init>(Path.java:93)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
        at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$$resolveGlobPath(DependencyUtils.scala:192)
        at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:147)
        at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:145)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$4.apply(SparkSubmit.scala:355)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$4.apply(SparkSubmit.scala:355)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 2: C:
        at java.net.URI$Parser.fail(URI.java:2848)
        at java.net.URI$Parser.failExpecting(URI.java:2854)
        at java.net.URI$Parser.parse(URI.java:3057)
        at java.net.URI.<init>(URI.java:746)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 23 more
[INFO  tini (1)] Main child exited normally (with status '1')

I tried several variants of specifying the log4j.properties file: a local file on my Windows laptop (file:///C$/Users/<my-username>/spark-2.4.3-bin-hadoop2.7/conf/log4j.properties and file:///C:/Users/<my-username>/spark-2.4.3-bin-hadoop2.7/conf/log4j.properties) and a local file in the Linux container (file:///opt/spark/conf/log4j.properties). But I kept getting the message:

log4j:ERROR Could not read configuration file from URL [file:/C$/Users/<my-username>/spark-2.4.3-bin-hadoop2.7/conf/log4j.properties].

The IllegalArgumentException went away when I tried the path without the colon (C:), i.e. either the Linux path or the Windows path with C$.

But I still don't get the desired output of my program, and I still don't know what the error is!

2 Answers:

Answer 0 (score: 0):

There was a typo in the spark-submit command, in the specification of the application jar. I was using only two forward slashes: local://hello_2.12-0.1.0-SNAPSHOT.jar. So Spark could not find it and, I think, silently ignored it, and then had no work to do. Hence no messages. I would expect it to give at least a warning.

Changed it to three slashes and moved forward: local:///hello_2.12-0.1.0-SNAPSHOT.jar
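For completeness, the corrected submit command is the same as the one in the question, just with the three-slash jar URI:

>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --name Hello --class Hello --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local:///hello_2.12-0.1.0-SNAPSHOT.jar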

I now have another issue related to Kubernetes RBAC, which I will resolve separately. The log4j issue still remains, but it is not a concern for me right now.
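(In case the RBAC issue turns out to be the usual one with Spark on Kubernetes, the standard setup from the Spark documentation is roughly the following; treat it as a sketch, and the service account name is just an example:)

>>> kubectl create serviceaccount spark
>>> kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default

and then pass --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark to spark-submit.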

Answer 1 (score: 0):

I solved this problem by deploying the config file to a blob: https://$(container_name).blob.core.windows.net/jars/log4jconfig1

and passing this configuration to spark-submit:

--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=https://<container_name>.blob.core.windows.net/jars/log4jconfig1" \

--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=https://<container_name>.blob.core.windows.net/jars/log4jconfig1" \