
Time: 2018-06-28 19:27:10

Tags: apache-spark apache-spark-sql apache-spark-dataset apache-spark-2.0

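The Spark documentation describes how to create an untyped user defined aggregate function (code) (a.k.a. a UDAF, a subclass of org.apache.spark.sql.expressions.UserDefinedAggregateFunction) and a strongly-typed aggregator (code) (a subclass of org.apache.spark.sql.expressions.Aggregator).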

I know you can register a UDAF for use in SQL with spark.udf.register("udafName", udafInstance) and then call it by that name in a SQL query.

Is it also possible to use an Aggregator in SQL?

Answer 0 (score: 1)

The Aggregator API is designed specifically with "strongly" typed Datasets in mind. You will notice that it does not take Columns but always operates on whole record objects.
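To illustrate that an Aggregator consumes whole record objects rather than columns, here is a minimal sketch against the Spark 2.x Scala API. The `Record` case class, the `SumValues` aggregator and the `AggregatorExample.compute` helper are hypothetical names invented for this example; the aggregator is applied through the typed Dataset API (`toColumn` with `select` and `groupByKey(...).agg`), not through SQL.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical record type, used only for this illustration.
case class Record(key: String, value: Long)

// Typed aggregator: IN = a whole Record, BUF = Long running sum, OUT = Long.
// Note that reduce receives the complete Record object, not a Column.
object SumValues extends Aggregator[Record, Long, Long] {
  def zero: Long = 0L
  def reduce(acc: Long, r: Record): Long = acc + r.value
  def merge(a: Long, b: Long): Long = a + b
  def finish(acc: Long): Long = acc
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object AggregatorExample {
  // Hypothetical helper: returns the global sum and the per-key sums.
  def compute(spark: SparkSession): (Long, Map[String, Long]) = {
    import spark.implicits._

    val ds = Seq(Record("a", 1L), Record("a", 2L), Record("b", 5L)).toDS()

    // Global aggregate over the typed Dataset[Record].
    val total: Long = ds.select(SumValues.toColumn.name("total")).head()

    // Grouped aggregate through the typed groupByKey API.
    val perKey: Map[String, Long] =
      ds.groupByKey(_.key).agg(SumValues.toColumn.name("sum")).collect().toMap

    (total, perKey)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("aggregator-sketch").getOrCreate()
    val (total, perKey) = compute(spark)
    println(s"total=$total perKey=$perKey")
    spark.stop()
  }
}
```

Because the input type is the `Record` class itself, this aggregator is tied to a `Dataset[Record]` and is used from Scala code rather than by name in a SQL string.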

This does not really fit the SQL processing model:

To use it with the SQL API, you can create a UserDefinedAggregateFunction instead, which can be registered using the standard methods.
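A minimal sketch of this route, assuming the Spark 2.x Scala API: an untyped UserDefinedAggregateFunction that sums a single LongType column, registered with spark.udf.register and then called by name from a SQL query. The `LongSum` class, the `data` temporary view and the `UdafSqlExample.compute` helper are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical untyped UDAF: sums one LongType input column, skipping nulls.
class LongSum extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("v", LongType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
  def dataType: DataType = LongType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L

  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)

  def evaluate(buffer: Row): Any = buffer.getLong(0)
}

object UdafSqlExample {
  // Hypothetical helper: registers the UDAF and runs a grouped SQL query with it.
  def compute(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._

    Seq(("a", 1L), ("a", 2L), ("b", 5L)).toDF("k", "v").createOrReplaceTempView("data")

    // Standard registration: the UDAF becomes callable by name in SQL.
    spark.udf.register("longSum", new LongSum)

    spark.sql("SELECT k, longSum(v) AS total FROM data GROUP BY k")
      .collect()
      .map(r => r.getString(0) -> r.getLong(1))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udaf-sql-sketch").getOrCreate()
    println(compute(spark))
    spark.stop()
  }
}
```

Since the UDAF declares an input schema over columns instead of a record type, Spark can bind it to whatever LongType column the SQL query passes in, which is what makes it usable by name from SQL.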