Error: Spark Scala streaming - checkpointLocation must be specified

Time: 2018-04-10 10:01:03

Tags: java scala apache-spark spark-structured-streaming

I am running my Scala code from IntelliJ IDEA.

Goal: stream data from Kafka to HDFS.

I get the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: checkpointLocation must be specified either through option("checkpointLocation", ...) or SparkSession.conf.set("spark.sql.streaming.checkpointLocation", ...);
    at org.apache.spark.sql.streaming.StreamingQueryManager$$anonfun$3.apply(StreamingQueryManager.scala:210)
    at org.apache.spark.sql.streaming.StreamingQueryManager$$anonfun$3.apply(StreamingQueryManager.scala:205)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:204)
    at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:278)
    at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:282)
    at kafka_stream.kafka_stream$.read_from_kafka(kafka_stream.scala:67)
    at kafka_stream.kafka_stream$.main(kafka_stream.scala:24)
    at kafka_stream.kafka_stream.main(kafka_stream.scala)

Process finished with exit code 1
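
The exception message names two ways to supply the location: a per-query option("checkpointLocation", ...) or the session-wide spark.sql.streaming.checkpointLocation setting. A minimal sketch of the session-wide variant is below. The object name, the local master, and the /tmp/checkpoints directory are assumptions for illustration and do not come from the original code.

```scala
import org.apache.spark.sql.SparkSession

object SessionCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("session-checkpoint-sketch") // hypothetical application name
      .master("local[*]")                   // assumption: local run from the IDE
      .getOrCreate()

    // Default checkpoint root for streaming queries started from this session
    // that do not set their own "checkpointLocation" option.
    // "/tmp/checkpoints" is a placeholder path.
    spark.conf.set("spark.sql.streaming.checkpointLocation", "/tmp/checkpoints")

    // Read the value back to confirm the session picked it up.
    println(spark.conf.get("spark.sql.streaming.checkpointLocation"))

    spark.stop()
  }
}
```

As far as I know, with this setting each query gets its own subdirectory under the configured root, so the per-query option is only needed to override it.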

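What we tried

After writing to the console, we added the code below.

Whenever I try to write the data to HDFS, Spark looks for a checkpoint directory.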

val query = values //.orderBy("window")
      .repartition(1)
      .writeStream
      .outputMode("append")
      .format("parquet")
      // as written in the failing run; note the key is spelled "checkpointLaocation", not "checkpointLocation"
      .option("checkpointLaocation","checkpoints")
      .option("path", "hdfs://hostname:8020/tmp/")
      //.option("path", "data")
      .start()
      .awaitTermination()
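
One thing worth checking in the snippet above: the option key is spelled checkpointLaocation. Spark does not recognize that key, so from its point of view no checkpoint location was set. The parquet file sink requires one, while the console sink can fall back to a temporary directory, which would explain why the console run did not complain. Below is a minimal sketch with the key spelled correctly. The broker address localhost:9092, the topic name my_topic, the object name, and the local master are placeholders I made up; the output path and the "checkpoints" value are copied from the question. It also assumes the spark-sql-kafka-0-10 package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-sketch") // hypothetical application name
      .master("local[*]")              // assumption: local run from the IDE
      .getOrCreate()

    // Assumption: broker address and topic name are placeholders.
    val values = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my_topic")
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    // Keep the StreamingQuery handle so it can be inspected or stopped later.
    val query = values
      .writeStream
      .outputMode("append")
      .format("parquet")
      .option("checkpointLocation", "checkpoints") // correctly spelled key
      .option("path", "hdfs://hostname:8020/tmp/")
      .start()

    // Block the main thread until the query stops or fails.
    query.awaitTermination()
  }
}
```

Calling awaitTermination() on a separate line keeps query typed as a StreamingQuery; chaining it onto start(), as in the question, makes query a Unit. For anything beyond a local test, a checkpoint path on a fault-tolerant file system such as HDFS is the usual choice.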

Any help with this would be appreciated.

0 answers:

No answers yet