I have the following table (Data):
color  status  freq
red    y       1
blue   y       1
green  y       2
Expected output:
red,blue  1
green     2
select color , freq from data where status = 'y' group by(freq)
That is, the result for red,blue should be freq = 1, and the result for green should be freq = 2.
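How can I get the list of colors grouped by frequency? Please correct the SQL query above so that it produces the expected output.
When I use first(colour), it returns only the first color, but I expect all colors for each frequency.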
Answer 0 (score: 0)
Try this:
import org.apache.spark.sql.functions._
import spark.implicits._
//import org.apache.spark.sql._
//import org.apache.spark.sql.types._
val df = Seq(
("green","y", 4),
("blue","n", 7),
("red","y", 7),
("yellow","y", 7),
("cyan","y", 7)
).toDF("colour", "status", "freq")
// Keep only rows with status 'y', then collect every colour that shares a freq
val df2 = df.where("status = 'y'")
  .select($"freq", $"colour")
  .groupBy("freq")
  .agg(collect_list($"colour"))
df2.show(false)
Returns:
+----+--------------------+
|freq|collect_list(colour)|
+----+--------------------+
|4   |[green]             |
|7   |[red, yellow, cyan] |
+----+--------------------+
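Since the question asks for a corrected SQL query, the same aggregation can also be sketched in Spark SQL. This is only a minimal sketch under some assumptions: it uses the question's sample rows and its `color` column, registers them as a temp view named `data`, and runs on a local SparkSession. The object name `GroupColoursByFreq` and the method `coloursByFreq` are made up for illustration. `concat_ws` joins the collected colors into one comma-separated string, as in the expected output.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only for this illustration
object GroupColoursByFreq {

  // Builds the question's sample table and groups colors by freq with Spark SQL
  def coloursByFreq(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Sample rows taken from the question
    val data = Seq(
      ("red", "y", 1),
      ("blue", "y", 1),
      ("green", "y", 2)
    ).toDF("color", "status", "freq")

    // Register the DataFrame so that it can be queried as the table "data"
    data.createOrReplaceTempView("data")

    // collect_list gathers all colors per freq; concat_ws joins them with commas
    spark.sql(
      """SELECT concat_ws(',', collect_list(color)) AS colors, freq
        |FROM data
        |WHERE status = 'y'
        |GROUP BY freq
        |ORDER BY freq""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation (an assumption, adjust for a real cluster)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-colours-by-freq")
      .getOrCreate()

    coloursByFreq(spark).show(false)
    spark.stop()
  }
}
```

Note that collect_list does not guarantee the order of the elements inside each group, so red and blue may come out in either order. If duplicate colors should be dropped, collect_set can be used in place of collect_list.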