I get an error when I run my Scala code from IntelliJ IDEA.
Goal: stream data from Kafka to HDFS.
The error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: checkpointLocation must be specified either through option("checkpointLocation", ...) or SparkSession.conf.set("spark.sql.streaming.checkpointLocation", ...);
at org.apache.spark.sql.streaming.StreamingQueryManager$$anonfun$3.apply(StreamingQueryManager.scala:210)
at org.apache.spark.sql.streaming.StreamingQueryManager$$anonfun$3.apply(StreamingQueryManager.scala:205)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:204)
at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:278)
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:282)
at kafka_stream.kafka_stream$.read_from_kafka(kafka_stream.scala:67)
at kafka_stream.kafka_stream$.main(kafka_stream.scala:24)
at kafka_stream.kafka_stream.main(kafka_stream.scala)
Process finished with exit code 1
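What we tried:
Writing to the console worked, so we then added the code below.
Whenever I try to write the data to HDFS, Spark complains about a missing checkpoint directory.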
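val query = values //.orderBy("window")
  .repartition(1)                                // one partition, so one output file per micro-batch
  .writeStream
  .outputMode("append")
  .format("parquet")
  .option("checkpointLocation", "checkpoints")   // was misspelled "checkpointLaocation", so Spark never saw it
  .option("path", "hdfs://hostname:8020/tmp/")
  //.option("path", "data")
  .start()

// start() returns a StreamingQuery; block here until it stops
query.awaitTermination()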
Any help with this would be appreciated.
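The error message also names a session-wide alternative to the per-query option: spark.sql.streaming.checkpointLocation. This may also explain why the console run succeeded: in Spark 2.x the console sink is allowed to fall back to a temporary checkpoint directory, while the parquet file sink is not. Below is a minimal sketch of the session-level setting, assuming Spark 2.x with the spark-sql-kafka-0-10 package on the classpath. The object KafkaToHdfsSketch, the helpers configure and startSink, the broker localhost:9092, the topic my_topic, and the HDFS paths are all hypothetical placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.StreamingQuery

object KafkaToHdfsSketch {
  // Hypothetical checkpoint root; Spark adds a per-query subdirectory under it.
  val CheckpointRoot = "hdfs://hostname:8020/tmp/checkpoints"

  // Set the session-wide default checkpoint location. It is used only by
  // queries that do not pass their own "checkpointLocation" option.
  def configure(spark: SparkSession, checkpointRoot: String): Unit =
    spark.conf.set("spark.sql.streaming.checkpointLocation", checkpointRoot)

  // Parquet sink with no per-query checkpoint option, so the session default applies.
  def startSink(values: DataFrame): StreamingQuery =
    values.writeStream
      .outputMode("append")
      .format("parquet")
      .option("path", "hdfs://hostname:8020/tmp/output") // hypothetical output dir
      .queryName("kafka_to_hdfs") // checkpoint subdirectory: <root>/kafka_to_hdfs
      .start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-sketch")
      .master("local[*]") // assumption: launched locally from the IDE
      .getOrCreate()

    configure(spark, CheckpointRoot)

    // Read raw records from Kafka and keep only the message value as a string.
    val values = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "my_topic")                     // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    // Block until the streaming query stops or fails.
    startSink(values).awaitTermination()
  }
}
```

With the session default set, a query without its own checkpointLocation option checkpoints to <root>/<queryName>, or to a random UUID subdirectory when no query name is given. A per-query option spelled exactly checkpointLocation takes precedence over this default.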