I have a simple DataFrame whose schema is:
word: string
process_time: timestamp
I group by a time window and count the grouped DataFrame:
val windowedCount = wordsDs
  .groupBy(
    window($"process_time", "15 seconds")
  ).count()
How can I port this code to Spark SQL syntax?
Answer 0 (score: 2)
This is almost a one-to-one translation:
spark.sql("""SELECT window(process_time, "15 seconds"), count(*)
FROM wordDs
GROUP BY window(process_time, "15 seconds")""")
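Note that spark.sql can only refer to wordsDs by name once the Dataset has been registered as a view. A minimal sketch, assuming wordsDs is the Dataset from the question (the view name is an assumption; any name works as long as the SQL queries use the same one):

// Register the Dataset as a temporary view so Spark SQL can query it by name
wordsDs.createOrReplaceTempView("wordsDs")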
Or:
spark.sql("""WITH tmp AS(SELECT window(process_time, "15 seconds") w FROM wordDs)
SELECT w, count(*) FROM tmp GROUP BY w""")
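The CTE variant computes the window expression once and groups on its alias w, so the window(...) call does not have to be repeated in both the SELECT list and the GROUP BY clause. Both queries produce the same result.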