Question

Using Scala and Spark 1.6.3, I get this error message:

org.apache.spark.sql.AnalysisException: expression 'id' is neither present in the group by, nor is it an aggregate function. 
Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

The code that produces the error is:

returnDf.withColumn("colName", max(col("otherCol")))

The DataFrame returnDf looks like:

+---+--------------------+
| id|            otherCol|
+---+--------------------+
|1.0|[0.0, 0.217764172...|
|2.0|          [0.0, 0.0]|
|3.0|[0.0, 0.142646382...|
|4.0|[0.63245553203367...|

There are solutions for this when using SQL syntax. What is the equivalent solution using the syntax I used above (i.e. the withColumn() function)?

Answer 1

You need to perform a groupBy before using an aggregate function:

returnDf.groupBy(col("id")).agg(max("otherCol"))

Answer 2

The problem is that max is an aggregate function: it returns the maximum of a column, not the maximum of the array in each row of that column.

To get the maximum of each array, the correct solution is to use a UDF.
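A minimal sketch of the UDF approach follows. It assumes otherCol is an array<double> column, which Spark passes to a Scala UDF as a Seq[Double]; if the column actually holds MLlib Vector values, the UDF would need to accept a Vector and call toArray.max instead. The object name ArrayMaxExample, the helper arrayMax, and the sample rows are made up for illustration, and the local-mode setup is just one way to run it on Spark 1.6.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{col, udf}

object ArrayMaxExample {
  // Plain Scala function: the maximum of one row's array.
  // Assumption: arrays are non-empty (Seq.max throws on an empty sequence).
  val arrayMax: Seq[Double] => Double = _.max

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ArrayMaxExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-in for returnDf, with made-up values
    val returnDf = Seq(
      (1.0, Seq(0.0, 0.2)),
      (2.0, Seq(0.0, 0.0))
    ).toDF("id", "otherCol")

    // Wrap the function as a UDF so it is applied row by row,
    // unlike the aggregate max, which works across rows
    val arrayMaxUdf = udf(arrayMax)

    // One output row per input row; no groupBy is needed
    returnDf.withColumn("colName", arrayMaxUdf(col("otherCol"))).show()

    sc.stop()
  }
}
```

Because the UDF is evaluated per row, withColumn keeps every id and simply adds the per-row maximum, which is the behavior the question asks for.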
