How to use withWatermark in Spark SQL instead of the Dataset API

Asked: 2018-11-20 09:28:03

Tags: apache-spark

In the documentation, Structured Streaming applies withWatermark through the Dataset API, like this:

Dataset<Row> windowedCounts = words
    .withWatermark("timestamp", "10 minutes")
    .groupBy(
        functions.window(words.col("timestamp"), "10 minutes", "5 minutes"),
        words.col("word"))
    .count();

However, I don't want to use the Dataset API; I want to drive the Structured Streaming query with SQL instead, something like this:

select window dt, sum(ord_amount), count(1) from topic2
group by window(update_time, '10 minutes', '5 minutes')
**withwatermark(update_time, '60 minutes')**

0 Answers:

There are no answers yet.