How to express the time window feature in SQL for Spark SQL

Time: 2018-07-23 22:57:12

Tags: apache-spark

I have a simple DataFrame with the schema:

word: string
process_time: timestamp
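
For illustration, a DataFrame with this schema can be built from sample data; a minimal sketch, assuming an existing SparkSession named spark (the sample rows are hypothetical):

import java.sql.Timestamp
import spark.implicits._  // assumes an existing SparkSession named spark

// hypothetical sample rows matching the schema above
val wordsDs = Seq(
  ("hello", Timestamp.valueOf("2018-07-23 22:57:01")),
  ("world", Timestamp.valueOf("2018-07-23 22:57:05")),
  ("hello", Timestamp.valueOf("2018-07-23 22:57:20"))
).toDF("word", "process_time")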

I group by a time window and count the grouped DataFrame:

import org.apache.spark.sql.functions.window

val windowedCount = wordsDs
  .groupBy(
    // 15-second tumbling window on the process_time column from the schema above
    window($"process_time", "15 seconds")
  ).count()

How can I port this code to SQL using Spark SQL syntax?

1 Answer:

Answer 0 (score: 2)

This is almost a one-to-one translation:

spark.sql("""SELECT window(process_time, "15 seconds"), count(*) 
             FROM wordDs 
             GROUP BY window(process_time, "15 seconds")""")
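
Note that the FROM clause refers to wordsDs by name, so the DataFrame must first be registered as a temporary view; a minimal sketch (the view name is chosen here to match the query):

wordsDs.createOrReplaceTempView("wordsDs")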

Or:

spark.sql("""WITH tmp AS(SELECT window(process_time, "15 seconds") w FROM wordDs)
             SELECT w, count(*) FROM tmp GROUP BY w""")
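
In either variant, window(...) produces a struct column with start and end fields, so the window bounds can be projected out after aggregating; a minimal sketch (the CTE name and aliases are assumptions):

spark.sql("""WITH counts AS (
               SELECT window(process_time, '15 seconds') w, count(*) cnt
               FROM wordsDs
               GROUP BY window(process_time, '15 seconds')
             )
             SELECT w.start, w.end, cnt FROM counts""")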