Scala: group by Id and find the max date

Time: 2018-02-23 10:23:27

Tags: scala

I need to group by Id and Times and show the row with the maximum date.

Id  Key  Times  date
20  40    1     20190323
20  41    1     20191201
31  33    3     20191209

My output should be:

Id  Key  Times  date
20  41    1     20191201
31  33    3     20191209

1 answer:

Answer 0 (score: 0)

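You can simply apply groupBy on Id to get the maximum date per group, then join that result back to the original dataset to recover the Key column in the resulting dataframe. Try the following code: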

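//imports needed outside spark-shell (assumes a SparkSession named spark is in scope)
import org.apache.spark.sql.functions.max
import spark.implicits._

//your original dataframe
val df = Seq((20,40,1,20190323),(20,41,1,20191201),(31,33,3,20191209))
        .toDF("Id","Key","Times","date")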
df.show()

//output
//+---+---+-----+--------+
//| Id|Key|Times|    date|
//+---+---+-----+--------+
//| 20| 40|    1|20190323|
//| 20| 41|    1|20191201|
//| 31| 33|    3|20191209|
//+---+---+-----+--------+

//group by Id column
val maxDate = df.groupBy("Id").agg(max("date").as("maxdate"))

//join with original DF to get the rest of the columns, keeping only rows whose date equals the group's max
maxDate.join(df, Seq("Id"))
 .where($"date" === $"maxdate")
 .select("Id","Key","Times","date").show()

//output
//+---+---+-----+--------+
//| Id|Key|Times|    date|
//+---+---+-----+--------+
//| 31| 33|    3|20191209|
//| 20| 41|    1|20191201|
//+---+---+-----+--------+
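
As an alternative to the groupBy plus join above, the same result can probably be obtained with a window function, which avoids the second pass over the data. The sketch below is not part of the original answer. It assumes a local SparkSession, partitions by both Id and Times (as the question asks, whereas the answer above groups by Id only), and introduces a helper column named rn purely for illustration. It is written as a script, for example to paste into spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Assumption: a local session is enough for this example
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("max-date-per-group")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question
val df = Seq((20, 40, 1, 20190323), (20, 41, 1, 20191201), (31, 33, 3, 20191209))
  .toDF("Id", "Key", "Times", "date")

// One window per (Id, Times) group, newest date first
val byGroup = Window.partitionBy("Id", "Times").orderBy(col("date").desc)

// Number the rows inside each group and keep only the first one;
// "rn" is a hypothetical helper column, dropped before returning
val latest = df
  .withColumn("rn", row_number().over(byGroup))
  .where(col("rn") === 1)
  .drop("rn")

latest.show()
```

For the sample data this keeps the rows with Key 41 and Key 33. One difference in behaviour: if two rows in a group share the maximum date, row_number keeps exactly one of them, while the join version above returns both.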