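In the following snippet, I am trying to get the rows that hold the latest value of the timestamp measuredAt from a Structured Streaming query in Spark 3.0.
This is how I do it: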
    final Dataset<Row> readValidDataset = spark.readStream()
            .format("delta")
            .load(validDeltaTable)
            .withWatermark("measuredAt", "2 hours");

    readValidDataset
            .groupBy("city")
            .agg(max("measuredAt").as("measuredAt"))
            .join(readValidDataset, "measuredAt")
            .writeStream()
            .queryName("read-latest")
            .format("console")
            .start();
When I run the code, I get:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;;
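To make the goal concrete, here is a minimal plain-Java sketch (no Spark involved) of the result I am after: for each city, the row with the greatest measuredAt. The Measurement class, its value field, and the sample rows are assumptions made up for illustration; only city and measuredAt appear in my actual query.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LatestPerCity {

    // Hypothetical row type: only city and measuredAt come from the query above;
    // the value field is invented for illustration.
    static final class Measurement {
        final String city;
        final Instant measuredAt;
        final double value;

        Measurement(String city, Instant measuredAt, double value) {
            this.city = city;
            this.measuredAt = measuredAt;
            this.value = value;
        }
    }

    // For each city, keep the row with the greatest measuredAt.
    static Map<String, Measurement> latestPerCity(List<Measurement> rows) {
        return rows.stream().collect(Collectors.toMap(
                m -> m.city,
                m -> m,
                (a, b) -> a.measuredAt.isAfter(b.measuredAt) ? a : b));
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Measurement> rows = Arrays.asList(
                new Measurement("Berlin", Instant.parse("2020-07-01T10:00:00Z"), 21.5),
                new Measurement("Berlin", Instant.parse("2020-07-01T12:00:00Z"), 24.0),
                new Measurement("Paris", Instant.parse("2020-07-01T11:00:00Z"), 26.3));

        latestPerCity(rows).forEach((city, m) ->
                System.out.println(city + " " + m.measuredAt + " " + m.value));
    }
}
```

With this sample data, the 12:00 row should be the one kept for Berlin. What I need is the streaming equivalent of this over the Delta source.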
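I have read the documentation and the related SO threads, but could not find a suitable suggestion.
Any ideas would be greatly appreciated!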