Error writing data to Kafka with Structured Streaming

Time: 2018-09-20 02:21:23

Tags: apache-spark spark-structured-streaming

I have a very simple piece of code that reads data from Kafka and writes the values from the DataFrame to another Kafka topic (I am using spark-2.3.0 and kafka-0.10.2.1):

import org.apache.spark.sql.SparkSession

object KafkaDSTest {

  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder.master("local[*]").appName(s"${this.getClass.getSimpleName}").getOrCreate()
    import spark.implicits._

    spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "rdpecore4:9092,rdpecore5:9092,rdpecore6:9092")
      .option("subscribe", "source")
      .option("startingOffsets", "latest")
      .load()
      // The Kafka sink requires a "value" column of string or binary type.
      .selectExpr("CAST(value AS STRING) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "rdpecore4:9092,rdpecore5:9092,rdpecore6:9092")
      .option("topic", "sink")
      // A checkpoint location is mandatory for the Kafka sink.
      .option("checkpointLocation", "hdfs://rdpecore1:9000/tmp/test/checkpoint")
      .outputMode("update")
      .start()
      .awaitTermination()
  }
}
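For reference, the Kafka source and sink used above ship in an artifact separate from spark-sql; a minimal sketch of the sbt dependency, assuming a Scala 2.11 build (the default for Spark 2.3.0):

// build.sbt: Structured Streaming Kafka integration for Spark 2.3.0
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0"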

When I run this program, I get the following error:

Query [id = 42f9fa3c-8033-4283-bef8-5cac5d352cc8, runId = a01fad4d-bb83-4b58-83ce-37cc81bbfae0] terminated with error
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:99)
at com.sun.proxy.$Proxy21.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2053)
......
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: null
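The trace shows the query dying inside the HDFS client (DFSClient.delete) while it manages the checkpoint directory, not in the Kafka sink itself. To narrow it down, here is a minimal sketch, assuming the same namenode and a hypothetical scratch path, that exercises the same delete call outside of Spark:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsDeleteCheck {
  def main(args: Array[String]): Unit = {
    // Use the same Hadoop client jars that Spark puts on the classpath.
    val fs = FileSystem.get(new URI("hdfs://rdpecore1:9000"), new Configuration())
    val probe = new Path("/tmp/test/checkpoint-probe") // hypothetical scratch path
    fs.mkdirs(probe)
    // FileSystem.delete goes through DFSClient.delete, the call that fails in the trace above.
    println(s"delete succeeded: ${fs.delete(probe, true)}")
    fs.close()
  }
}

If this small program fails with the same IllegalStateException, the problem lies in the Hadoop client or classpath rather than in the streaming code.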

This problem has troubled me for several days. Any help would be greatly appreciated.

0 Answers:

No answers yet.