I need to group by Id and Times and show the maximum date for each group.
Id Key Times date
20 40 1 20190323
20 41 1 20191201
31 33 3 20191209
My output should be:
Id Key Times date
20 41 1 20191201
31 33 3 20191209
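Answer 0 (score: 0)
You only need to apply groupBy to Id to get the maximum date per group, then join the grouped result back to the original dataset so that the Key column is added to the resulting dataframe. Try the following code: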
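//assumes a SparkSession named spark is in scope, as in spark-shell
import org.apache.spark.sql.functions.max
import spark.implicits._
//your original dataframe
val df = Seq((20,40,1,20190323),(20,41,1,20191201),(31,33,3,20191209))
  .toDF("Id","Key","Times","date")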
df.show()
//output
//+---+---+-----+--------+
//| Id|Key|Times|    date|
//+---+---+-----+--------+
//| 20| 40|    1|20190323|
//| 20| 41|    1|20191201|
//| 31| 33|    3|20191209|
//+---+---+-----+--------+
//group by Id column
val maxDate = df.groupBy("Id").agg(max("date").as("maxdate"))
//join with the original DF to get the rest of the columns
maxDate.join(df, Seq("Id"))
.where($"date" === $"maxdate")
.select("Id","Key","Times","date").show()
//output
//+---+---+-----+--------+
//| Id|Key|Times|    date|
//+---+---+-----+--------+
//| 31| 33|    3|20191209|
//| 20| 41|    1|20191201|
//+---+---+-----+--------+
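The question asks for grouping by both Id and Times, while the code above groups by Id only; the two give the same result on the sample data. As a rough sketch of the underlying logic, here is the same "latest row per group" idea on plain Scala collections, keyed on (Id, Times). The Record case class and the MaxDatePerGroup object are hypothetical names introduced only for this illustration, and no Spark is involved.

```scala
// Hypothetical row type mirroring the columns in the question.
final case class Record(id: Int, key: Int, times: Int, date: Int)

object MaxDatePerGroup {
  // Sample rows copied from the question.
  val rows: Seq[Record] = Seq(
    Record(20, 40, 1, 20190323),
    Record(20, 41, 1, 20191201),
    Record(31, 33, 3, 20191209)
  )

  // Group by (id, times) and keep the row with the largest date in each group.
  // The final sort only makes the output order deterministic.
  def latestPerGroup(rs: Seq[Record]): Seq[Record] =
    rs.groupBy(r => (r.id, r.times))
      .values
      .map(_.maxBy(_.date))
      .toSeq
      .sortBy(r => (r.id, r.times))
}
```

Calling MaxDatePerGroup.latestPerGroup(MaxDatePerGroup.rows) should leave the rows with Key 41 and Key 33. One difference worth noting: maxBy keeps a single row per group, whereas the join in the answer keeps every row that ties on the maximum date.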