Spark Scala DataFrame: multiple aggregations on a single groupBy. For example:
val groupped = df.groupBy("firstName", "lastName").sum("Amount").toDF()
But what if I need count, sum, max, etc.?
/* The below does not work, but this is the intent:
val groupped = df.groupBy("firstName", "lastName").sum("Amount").count().toDF()
*/
Desired output:
groupped.show()
----------------------------------------------------
| firstName | lastName | Amount | count | Max | Min |
----------------------------------------------------
Answer (score: 2)
import org.apache.spark.sql.functions._

case class soExample(firstName: String, lastName: String, Amount: Int)
// toDF on a Seq assumes spark.implicits._ is in scope
// (automatic in spark-shell and Databricks notebooks)
val df = Seq(soExample("me", "zack", 100)).toDF

// Pass all the aggregates you need to a single agg() call.
// agg() already returns a DataFrame, so no trailing toDF() is required.
val groupped = df.groupBy("firstName", "lastName").agg(
  sum("Amount"),
  mean("Amount"),
  stddev("Amount"),
  count(lit(1)).alias("numOfRecords")
)

groupped.show()  // or display(groupped) in a Databricks notebook
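For reference, the same agg() pattern produces exactly the columns sketched in the question. A minimal sketch, assuming the same df as above; the alias names (Amount, count, Max, Min) are illustrative, not required by the API:

val result = df.groupBy("firstName", "lastName").agg(
  sum("Amount").alias("Amount"),
  count(lit(1)).alias("count"),
  max("Amount").alias("Max"),
  min("Amount").alias("Min")
)
result.show()

agg() accepts any number of Column expressions, so every aggregate is declared in one pass over the grouped data; aliasing just keeps the output headers readable.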