I have a Spark Streaming application that consumes from Kafka cluster1 and writes the aggregated data to Kafka cluster2. The Spark application and Kafka cluster2 are deployed in two separate containers on OpenShift.
From the Spark application logs, it looks like the data from Kafka cluster1 is read, the aggregation runs, and the start action on the output DataFrame is invoked to write the data to Kafka cluster2. There appear to be no errors. Here is a sample of the logs:
[2019-04-08 11:47:39,691] INFO Reading stream from server: asdasd1.aa.ss.net:9092,asdasd2.aa.ss.net:9092,asdasd3.aa.ss.net:9092
[2019-04-08 11:47:42,532] INFO Acquired Kakfa stream
[2019-04-08 11:47:43,809] INFO Preparing to run
[2019-04-08 11:47:44,493] INFO Beginning aggregation:
[2019-04-08 11:47:44,898] INFO Writing query: (myapplication)
.
.
.
[2019-04-08 15:15:34,859] INFO Asked to remove non-existent executor 12223 (org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint)
[2019-04-08 15:15:34,859] INFO Trying to remove executor 12223 from BlockManagerMaster. (org.apache.spark.storage.BlockManagerMasterEndpoint)
[2019-04-08 15:15:34,860] INFO Executor updated: app-20190408114740-0000/12225 is now RUNNING (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
[2019-04-08 15:15:36,584] INFO Executor updated: app-20190408114740-0000/12224 is now EXITED (Command exited with code 1) (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
[2019-04-08 15:15:36,584] INFO Executor app-20190408114740-0000/12224 removed: Command exited with code 1 (org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend)
[2019-04-08 15:15:36,584] INFO Executor added: app-20190408114740-0000/12226 on worker-20190408093440-10.225.30.29-33371 (10.225.30.29:33371) with 8 core(s) (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
[2019-04-08 15:15:36,584] INFO Removal of executor 12224 requested (org.apache.spark.storage.BlockManagerMaster)
[2019-04-08 15:15:36,584] INFO Asked to remove non-existent executor 12224 (org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint)
[2019-04-08 15:15:36,584] INFO Trying to remove executor 12224 from BlockManagerMaster. (org.apache.spark.storage.BlockManagerMasterEndpoint)
[2019-04-08 15:15:36,584] INFO Granted executor ID app-20190408114740-0000/12226 on hostPort 10.225.30.29:33371 with 8 core(s), 3.0 GB RAM (org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend)
[2019-04-08 15:15:36,585] INFO Executor updated: app-20190408114740-0000/12226 is now RUNNING (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
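For context, the query behind these logs is set up roughly along the lines below. This is only a sketch: the input topic and the aggregation are simplified placeholders, not the actual application code.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

// Rough sketch of the streaming query; the subscribed topic and the
// aggregation are placeholders, not the real application code.
val spark = SparkSession.builder.appName("myapplication").getOrCreate()

// Read the stream from Kafka cluster1
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers",
    "asdasd1.aa.ss.net:9092,asdasd2.aa.ss.net:9092,asdasd3.aa.ss.net:9092")
  .option("subscribe", "input.topic")  // placeholder topic name
  .load()

// Aggregation (placeholder: message counts per 1-minute window)
val df = input
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window(col("timestamp"), "1 minute"))
  .count()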
However, I do not see the messages on Kafka cluster2. I ran the following on the Kafka pod in OpenShift:
./bin/kafka-console-consumer.sh --bootstrap-server kafka-service:9092 --topic process.mymetrics
where kafka-service is the following OpenShift service:
kafka-service ClusterIP xx.xxx.xxx.xxx <none> 9092/TCP
Since there are no application errors, it looks as if the Spark application is writing the data. Can you give me any pointers on how to debug this? Note that this setup was working fine until a few weeks ago.
Update 1: I tested writing the stream to the console, and that works:
df.writeStream
  .outputMode(OutputMode.Append())
  .format("console")
  .start()
but writing to Kafka cluster2 does not work.
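For completeness, the Kafka variant of the same write, which produces no visible messages on cluster2, looks roughly like this; the checkpoint path is a placeholder, while the topic and bootstrap server match the console-consumer check above.

import org.apache.spark.sql.streaming.OutputMode

// Same aggregated stream written to Kafka cluster2 instead of the console.
// The checkpoint path is a placeholder; topic and service match the consumer check.
df.selectExpr("to_json(struct(*)) AS value")  // Kafka sink needs a string/binary "value" column
  .writeStream
  .outputMode(OutputMode.Append())
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka-service:9092")
  .option("topic", "process.mymetrics")
  .option("checkpointLocation", "/tmp/checkpoints/mymetrics")
  .start()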