Question

Data

first_name,id,age

abc,1,53

bcd,2,68

abc,3,68

I converted this data into a DataFrame named personDF.

personDF.groupBy("id").agg(when(lower($"first_name")==="abc",min($"age")).otherwise(max($"age")).alias("min_age")).show()

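I want to get the minimum age or the maximum age depending on a when condition, but this does not work.

Please let me know how I can do this.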

Answer 1

You need to group by the 'first_name' column for this to work:

df.groupBy("first_name").agg(when(lower($"first_name")==="abc",min($"age")).otherwise(max($"age")).alias("min_age")).show()

+----------+-------+
|first_name|min_age|
+----------+-------+
|       abc|     53|
|       bcd|     68|
+----------+-------+
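
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. It assumes a local Spark session; the app name, the master setting, and the way personDF is rebuilt from the sample rows are my own assumptions, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lower, max, min, when}

// Local session for experimentation; app name and master are arbitrary choices.
val spark = SparkSession.builder()
  .appName("conditional-agg")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Rebuild the sample data from the question.
val personDF = Seq(("abc", 1, 53), ("bcd", 2, 68), ("abc", 3, 68))
  .toDF("first_name", "id", "age")

// first_name is the grouping column, so it may be referenced inside when(...)
// next to the aggregate expressions min(age) and max(age).
val result = personDF
  .groupBy("first_name")
  .agg(
    when(lower($"first_name") === "abc", min($"age"))
      .otherwise(max($"age"))
      .alias("min_age")
  )

result.orderBy("first_name").show()
```

Note that the alias min_age is kept from the question even though the column holds the maximum age for every name other than "abc".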

Answer 2

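You cannot meaningfully apply an aggregate function to the same column you grouped the DataFrame by. It does not work because each group contains only one value of the grouped column.

Please see this link for a better understanding.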
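
As a rough illustration of how grouping affects the question's query, the sketch below tries it grouped by id and then runs a plain min/max per id. The session settings and the Try wrapper are assumptions added for the demonstration. As far as I know, Spark rejects the first query during analysis because first_name is neither the grouping column nor wrapped in an aggregate function.

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lower, max, min, when}

// Local session for experimentation; app name and master are arbitrary choices.
val spark = SparkSession.builder()
  .appName("group-by-id")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val personDF = Seq(("abc", 1, 53), ("bcd", 2, 68), ("abc", 3, 68))
  .toDF("first_name", "id", "age")

// The question's query: first_name is referenced outside any aggregate
// while grouping by id, so building the DataFrame should fail.
val attempt = Try(
  personDF.groupBy("id").agg(
    when(lower($"first_name") === "abc", min($"age"))
      .otherwise(max($"age"))
      .alias("min_age")
  )
)
println(s"query grouped by id failed: ${attempt.isFailure}")

// Every id occurs once in the sample data, so each group holds a single row
// and min(age) equals max(age) within it.
val perId = personDF
  .groupBy("id")
  .agg(min($"age").alias("min_age"), max($"age").alias("max_age"))

perId.orderBy("id").show()
```

With this sample data, grouping by id cannot give different minimum and maximum ages, which is one more reason to group by first_name as in Answer 1.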
