Can I use a UDAF with a window function?

Asked: 2016-05-14 18:33:26

Tags: apache-spark apache-spark-sql

I have created a user-defined aggregate function (UDAF) that concatenates all accumulated values into a list (ArrayType). It is called EdgeHistory.

It works fine when I don't specify a window: it returns an array of all the lists. However, with the following example it fails:

case class ExampleRow(n: Int, list: List[(String, String, Float, Float)])

val x = Seq(
  ExampleRow(1, List(("a", "b", 1f, 2f), ("c", "d", 2f, 3f))),
  ExampleRow(2, List(("a", "b", 2f, 4f), ("c", "d", 4f, 6f))),
  ExampleRow(3, List(("a", "b", 4f, 8f), ("c", "d", 8f, 12f)))
)

val df = sc.parallelize(x).toDF()

val edgeHistory = new EdgeHistory()

val y = df.agg(edgeHistory('list).over(Window.orderBy("n").rangeBetween(-1, 0)))

It throws this error:

STDERR: Exception in thread "main" java.lang.UnsupportedOperationException: EdgeHistory('list) is not supported in a window operation.
    at org.apache.spark.sql.expressions.WindowSpec.withAggregate(WindowSpec.scala:177)
    at org.apache.spark.sql.Column.over(Column.scala:1052)
    at szdavid92.AnalyzeGraphStream$.main(AnalyzeGraphStream.scala:75)

The error message seems straightforward: apparently you cannot use a UDAF in a window operation. Am I understanding this correctly? Why does this limitation exist?
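For context, here is a minimal sketch of what a UDAF like EdgeHistory might look like on this Spark version (the actual implementation is not shown in the question, so the field names and buffer layout below are assumptions):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class EdgeHistory extends UserDefinedAggregateFunction {
  // Each input row carries a list of (String, String, Float, Float) edges
  private val edgeListType = ArrayType(StructType(Seq(
    StructField("_1", StringType), StructField("_2", StringType),
    StructField("_3", FloatType), StructField("_4", FloatType))))

  def inputSchema: StructType  = StructType(StructField("list", edgeListType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("history", ArrayType(edgeListType)) :: Nil)
  def dataType: DataType       = ArrayType(edgeListType)
  def deterministic: Boolean   = true

  def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer(0) = Seq.empty[Seq[Row]]

  // Append the incoming list to the accumulated history
  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    buffer(0) = buffer.getSeq[Seq[Row]](0) :+ input.getSeq[Row](0)

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getSeq[Seq[Row]](0) ++ buffer2.getSeq[Seq[Row]](0)

  def evaluate(buffer: Row): Any = buffer.getSeq[Seq[Row]](0)
}
```

Note that the failure above does not depend on the UDAF body: WindowSpec.withAggregate rejects any non-built-in aggregate before it is ever evaluated.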

Update

I tried the SQL syntax instead, and got a related error:

df.registerTempTable("data")
sqlContext.udf.register("edge_history", edgeHistory)

val y = sqlContext.sql(
  """
    |SELECT n, list, edge_history(list) OVER (ORDER BY n ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
    |FROM data
  """.stripMargin)

Exception in thread "main" org.apache.spark.sql.AnalysisException: Couldn't find window function edge_history;
    at org.apache.spark.sql.hive.ResolveHiveWindowFunction$$anonfun$apply$1$$anonfun$applyOrElse$1$$anonfun$3.apply(hiveUDFs.scala:288)
    at org.apache.spark.sql.hive.ResolveHiveWindowFunction$$anonfun$apply$1$$anonfun$applyOrElse$1$$anonfun$3.apply(hiveUDFs.scala:288)
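This restriction was lifted in later releases: since Spark 2.0 the window execution path was rewritten natively, and user-defined aggregates (as well as built-ins such as collect_list) are accepted in window expressions. On 2.0+ a sketch along these lines should work (a hedged example, not tested against the asker's setup; the `history` column name is an assumption):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.collect_list

// 1 PRECEDING to CURRENT ROW, matching the SQL frame in the question
val w = Window.orderBy("n").rowsBetween(-1, 0)

val edgeHistory = new EdgeHistory()
val y = df.select(df("n"), edgeHistory(df("list")).over(w).as("history"))

// Or, with a built-in aggregate instead of the UDAF:
val z = df.select(df("n"), collect_list(df("list")).over(w).as("history"))
```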

0 Answers:

There are no answers yet.