I have a Spark Streaming application that consumes from Kafka cluster1 and writes the aggregated data to Kafka cluster2. The Spark application and Kafka cluster2 are deployed in two separate containers on OpenShift.
From the Spark application logs, it looks like the data from Kafka cluster1 is read, the aggregation runs, and the start action on the output DataFrame is invoked to write the data to Kafka cluster2. There appear to be no errors. Here is a sample of the logs:
[2019-04-08 11:47:39,691] INFO Reading stream from server: asdasd1.aa.ss.net:9092,asdasd2.aa.ss.net:9092,asdasd3.aa.ss.net:9092
[2019-04-08 11:47:42,532] INFO Acquired Kakfa stream
[2019-04-08 11:47:43,809] INFO Preparing to run
[2019-04-08 11:47:44,493] INFO Beginning aggregation:
[2019-04-08 11:47:44,898] INFO Writing query: (myapplication)
.
.
.
[2019-04-08 15:15:34,859] INFO Asked to remove non-existent executor 12223 (org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint)
[2019-04-08 15:15:34,859] INFO Trying to remove executor 12223 from BlockManagerMaster. (org.apache.spark.storage.BlockManagerMasterEndpoint)
[2019-04-08 15:15:34,860] INFO Executor updated: app-20190408114740-0000/12225 is now RUNNING (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
[2019-04-08 15:15:36,584] INFO Executor updated: app-20190408114740-0000/12224 is now EXITED (Command exited with code 1) (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
[2019-04-08 15:15:36,584] INFO Executor app-20190408114740-0000/12224 removed: Command exited with code 1 (org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend)
[2019-04-08 15:15:36,584] INFO Executor added: app-20190408114740-0000/12226 on worker-20190408093440-10.225.30.29-33371 (10.225.30.29:33371) with 8 core(s) (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
[2019-04-08 15:15:36,584] INFO Removal of executor 12224 requested (org.apache.spark.storage.BlockManagerMaster)
[2019-04-08 15:15:36,584] INFO Asked to remove non-existent executor 12224 (org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint)
[2019-04-08 15:15:36,584] INFO Trying to remove executor 12224 from BlockManagerMaster. (org.apache.spark.storage.BlockManagerMasterEndpoint)
[2019-04-08 15:15:36,584] INFO Granted executor ID app-20190408114740-0000/12226 on hostPort 10.225.30.29:33371 with 8 core(s), 3.0 GB RAM (org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend)
[2019-04-08 15:15:36,585] INFO Executor updated: app-20190408114740-0000/12226 is now RUNNING (org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint)
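For context, the query behind these logs is set up roughly along the lines below. This is only a sketch: the input topic and the aggregation are simplified placeholders, not the actual application code.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

// Rough sketch of the streaming query; the subscribed topic and the
// aggregation are placeholders, not the real application code.
val spark = SparkSession.builder.appName("myapplication").getOrCreate()

// Read the stream from Kafka cluster1
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers",
    "asdasd1.aa.ss.net:9092,asdasd2.aa.ss.net:9092,asdasd3.aa.ss.net:9092")
  .option("subscribe", "input.topic")  // placeholder topic name
  .load()

// Aggregation (placeholder: message counts per 1-minute window)
val df = input
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window(col("timestamp"), "1 minute"))
  .count()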
However, I do not see the messages on Kafka cluster2. I ran the following on the Kafka pod in OpenShift:
./bin/kafka-console-consumer.sh --bootstrap-server kafka-service:9092 --topic process.mymetrics
where kafka-service is the following OpenShift service:
kafka-service ClusterIP xx.xxx.xxx.xxx <none> 9092/TCP
Since there are no application errors, it looks as if the Spark application is writing the data. Can you give me any pointers on how to debug this? Note that this setup was working fine until a few weeks ago.
Update 1: I tested writing the stream to the console, and that works:
df.writeStream
  .outputMode(OutputMode.Append())
  .format("console")
  .start()
but writing to Kafka cluster2 does not work.
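For completeness, the Kafka variant of the same write, which produces no visible messages on cluster2, looks roughly like this; the checkpoint path is a placeholder, while the topic and bootstrap server match the console-consumer check above.

import org.apache.spark.sql.streaming.OutputMode

// Same aggregated stream written to Kafka cluster2 instead of the console.
// The checkpoint path is a placeholder; topic and service match the consumer check.
df.selectExpr("to_json(struct(*)) AS value")  // Kafka sink needs a string/binary "value" column
  .writeStream
  .outputMode(OutputMode.Append())
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka-service:9092")
  .option("topic", "process.mymetrics")
  .option("checkpointLocation", "/tmp/checkpoints/mymetrics")
  .start()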