Question

Given that I have a DataFrame with some columns:

Why doesn't this work?

val output3b = input.withColumn("sum", columnsToConcat.foldLeft(0)((x,y)=>(x+y)))

notebook:16: error: overloaded method value + with alternatives:
 (x: Int)Int <and>
 (x: Char)Int <and>
 (x: Short)Int <and>
 (x: Byte)Int
cannot be applied to (org.apache.spark.sql.Column)
val output3b = input.withColumn("sum", columnsToConcat.foldLeft(0)((x,y)=>(x+y))) // does work
                                                                           ^
notebook:16: error: type mismatch;
found   : Int
required: org.apache.spark.sql.Column
val output3b = input.withColumn("sum", columnsToConcat.foldLeft(0)((x,y)=>(x+y)))

But this does?

val output3a = input.withColumn("concat", columnsToConcat.foldLeft(lit(0))((x,y)=>(x+y)))

Using the well-known lit function seems to smooth something over, but I'm not sure why.

+---+----+----+----+----+----+------+
| ID|var1|var2|var3|var4|var5|concat|
+---+----+----+----+----+----+------+
|  a|   5|   7|   9|  12|  13|  46.0|
+---+----+----+----+----+----+------+

Answer 1

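Prerequisites:

Based on the compiler message and the API usage, we can infer that columnsToConcat is a Seq[o.a.s.sql.Column] or an equivalent.

By contract, the foldLeft method requires a function that maps to the type of the accumulator (the initial value). This is the Seq.foldLeft signature:

def foldLeft[B](z: B)(op: (B, A) => B): B

In Scala, + is a method; x + y is syntactic sugar for the call x.+(y).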
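The two prerequisites about foldLeft and + can be checked in plain Scala, without Spark. The following is a minimal sketch; the object name FoldLeftSketch and the values in xs (taken from the var1..var5 row in the question) are only for illustration.

```scala
object FoldLeftSketch {
  // Signature: def foldLeft[B](z: B)(op: (B, A) => B): B
  // The accumulator type B is fixed by the initial value z.
  val xs: Seq[Int] = Seq(5, 7, 9, 12, 13)

  // B is inferred as Int from the initial value 0.
  val total: Int = xs.foldLeft(0)((acc, x) => acc + x)

  // Same sequence, different initial value: B is String here.
  val joined: String = xs.foldLeft("")((acc, x) => acc + x.toString)

  // Infix + is sugar for an ordinary method call on the left operand.
  val infix: Int = 1 + 2
  val explicit: Int = (1).+(2)
}
```

In both folds the compiler looks up + on the accumulator's type, which is why the choice of initial value decides which + method is used.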

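This means that in the first case

columnsToConcat.foldLeft(0)((x,y)=>(x+y))

is

columnsToConcat.foldLeft(0)((x: Int, y: Column) => x + y)

and you are asking for the + method of Int (the inferred type of the accumulator, 0). Int has no + method of type (org.apache.spark.sql.Column) => Int (the error already lists the available overloads, so it is no surprise that such a method doesn't exist), and there is no implicit conversion in the current scope from Int to any type that provides such a method.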

In the second case

columnsToConcat.foldLeft(lit(0))((x,y)=>(x+y))

is

columnsToConcat.foldLeft(lit(0))((x: Column, y: Column) => x + y)

and + refers to Column.+ (as the type of lit(0) is Column). Such a method exists: it accepts Any and returns Column. Since Column <: Any, the type constraints are satisfied.
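To see the type inference at work without a Spark session, here is a minimal sketch that uses a toy model. Col and this lit are hypothetical stand-ins for org.apache.spark.sql.Column and functions.lit, not Spark's real implementation; they only record the expression being built.

```scala
object ColumnFoldSketch {
  // Hypothetical stand-in for org.apache.spark.sql.Column.
  final case class Col(expr: String) {
    // Like Column.+, this accepts Any and returns a Col.
    def +(other: Any): Col = other match {
      case Col(e) => Col(s"($expr + $e)")
      case v      => Col(s"($expr + $v)")
    }
  }

  // Hypothetical stand-in for functions.lit: wraps a literal in a Col.
  def lit(v: Any): Col = Col(v.toString)

  val columnsToConcat: Seq[Col] = Seq("var1", "var2", "var3").map(Col(_))

  // The accumulator type B is Col, so x + y resolves to Col.+ and yields a Col.
  val sum: Col = columnsToConcat.foldLeft(lit(0))((x, y) => x + y)

  // Does not compile: B is inferred as Int, and Int has no + that accepts a Col.
  // val broken = columnsToConcat.foldLeft(0)((x, y) => x + y)
}
```

The second compiler error in the question (found: Int, required: Column) follows from the same inference: with an initial value of 0 the fold would return an Int, while withColumn expects a Column. In real Spark code, columnsToConcat.reduce(_ + _) is another option that avoids the initial literal, assuming the sequence is non-empty.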
