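I am trying to run a Structured Streaming application that writes its output as Parquet files to Google Cloud Storage. I don't see any errors, but no files are written to the GCS location; the only thing I can see there is the spark-metadata folder. Any idea how to debug this?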
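// Assumes the usual Spark SQL imports (Dataset, Row, functions, DataTypes, StreamingQuery)
// and a static import of org.apache.spark.sql.functions.col.
String windowDuration = "60 minutes";
String slideDuration = "10 minutes";
Dataset<Row> data_2 = complete_data;
// Convert the epoch-millisecond event timestamp into a timestamp column.
data_2 = data_2.withColumn("creationDt", functions.to_timestamp(functions.from_unixtime(col(topics + "." + event_timestamp).divide(1000.0))));
// Count events per key and aggregate field over sliding windows, with a 1-minute watermark.
data_2 = data_2
    .withWatermark("creationDt", "1 minute")
    .groupBy(col(topics + "." + keyField), functions.window(col("creationDt"), windowDuration, slideDuration), col(topics + "." + aggregateByField))
    .count();
StreamingQuery query_2 = data_2
    .withColumn("startwindow", col("window.start"))
    .withColumn("endwindow", col("window.end"))
    .withColumn("endwindow_date", col("window.end").cast(DataTypes.DateType))
    .writeStream()
    .format("parquet")
    .partitionBy("endwindow_date")
    .option("path", dataFile_2)
    .option("truncate", "false") // console-sink option; it has no effect on the parquet sink
    .outputMode("append")
    .option("checkpointLocation", checkpointFile_2)
    .start();
query_2.awaitTermination();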
Answer 0 (score: 0)
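I think the problem is the .outputMode("append") line. GCS is not a file system and does not support append mode.
My guess is that this line blows up and the exception simply gets swallowed somewhere: https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystemBase.java#L1175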
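Another thing that may be worth ruling out while debugging is how long the query has to run before it can emit anything at all. In append mode, a windowed aggregation with a watermark only outputs a window once the watermark (the maximum event time seen so far minus the watermark delay) has reached that window's end. The sketch below uses plain Java, not Spark, to estimate that point for the question's 10-minute slide and 1-minute watermark. The class name, the method names, and the sample timestamp are hypothetical, and it assumes that windows are aligned to the epoch with no start offset and that the window duration is a multiple of the slide.

```java
import java.time.Duration;
import java.time.Instant;

public class AppendEmitEstimate {

    // End of the earliest-closing sliding window that contains eventTime.
    // Assumes epoch-aligned windows and a window duration that is a
    // multiple of the slide, so this end is the next slide boundary.
    static Instant earliestWindowEnd(Instant eventTime, Duration slide) {
        long slideMs = slide.toMillis();
        long lastStart = Math.floorDiv(eventTime.toEpochMilli(), slideMs) * slideMs;
        return Instant.ofEpochMilli(lastStart + slideMs);
    }

    // Event time the stream must have observed before the watermark
    // (max event time minus delay) reaches the given window end.
    static Instant requiredMaxEventTime(Instant windowEnd, Duration watermarkDelay) {
        return windowEnd.plus(watermarkDelay);
    }

    public static void main(String[] args) {
        Duration slide = Duration.ofMinutes(10);   // slideDuration in the question
        Duration delay = Duration.ofMinutes(1);    // withWatermark delay in the question
        Instant event = Instant.parse("2019-01-01T12:03:00Z"); // hypothetical event time

        Instant end = earliestWindowEnd(event, slide);
        System.out.println("earliest window end: " + end);
        System.out.println("needs an event at or after: " + requiredMaxEventTime(end, delay));
    }
}
```

If the input never delivers an event that late, or the job is stopped before a later micro-batch runs, an empty output directory next to a populated metadata folder would be expected rather than a sign of an error. This is only an estimate of the condition, not a statement about what the job in the question actually did.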