I have a simple DataFrame that I want to manipulate:
+---+----+
| id|name|
+---+----+
|  1|   a|
|  2|   b|
|  3|   c|
|  4|   d|
|  5|   e|
+---+----+
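I am trying to add another column based on the "id" column and a value that I pass in when I call withColumn() (in this case, the string "hey").
According to another StackOverflow post (Adding a new column to a Dataframe by using the values of multiple other columns in the dataframe - spark/scala), I should be able to do this with a UDF (UserDefinedFunction), but IntelliJ reports a "not applicable" error on the UDF call in the code below: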
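import org.apache.spark.sql.functions.udf
import spark.implicits._ // assumes a SparkSession named `spark` is in scope

val table = Seq(("1", "a"), ("2", "b"), ("3", "c"), ("4", "d"), ("5", "e")).toDF("id", "name")

// Appends "-" and v to s, e.g. newID("1", "a") returns "1-a".
def newID(s: String, v: String): String = {
  s.concat("-" + v)
}

val newUDF = udf(newID _)
table.show()

val v = "hey"
// This is the call that IntelliJ flags when the second argument is the String v.
val newO = table.withColumn("someOp", newUDF($"id", v))
// It works if I pass the column "name" instead of the String v:
// newUDF($"id", $"name")
newO.show()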
So I can get:
+---+----+------+
| id|name|someOp|
+---+----+------+
|  1|   a|   1-a|
|  2|   b|   2-b|
|  3|   c|   3-c|
|  4|   d|   4-d|
|  5|   e|   5-e|
+---+----+------+
but not:
+---+----+--------+
| id|name|  someOp|
+---+----+--------+
|  1|   a|   1-hey|
|  2|   b|   2-hey|
|  3|   c|   3-hey|
|  4|   d|   4-hey|
|  5|   e|   5-hey|
+---+----+--------+
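
For reference, here is a minimal sketch of one way the second output might be obtained. A UserDefinedFunction is applied to Column arguments only, so a plain String such as v has to be turned into a Column first, for example with lit from org.apache.spark.sql.functions. The object name LitExample, the local SparkSession, and the app name are assumptions made purely for illustration; they are not part of the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, udf}

// Hypothetical wrapper object, used only so the sketch is self-contained.
object LitExample {
  // Same helper as in the question: appends "-" and v to s.
  def newID(s: String, v: String): String = s.concat("-" + v)

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession created just for this example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lit-example")
      .getOrCreate()
    import spark.implicits._

    val table = Seq(("1", "a"), ("2", "b"), ("3", "c"), ("4", "d"), ("5", "e"))
      .toDF("id", "name")

    val newUDF = udf(newID _)
    val v = "hey"

    // lit(v) wraps the constant String in a Column, so both
    // arguments passed to the UDF are Columns.
    val newO = table.withColumn("someOp", newUDF($"id", lit(v)))
    newO.show()

    spark.stop()
  }
}
```

With this change the someOp column should hold values such as 1-hey and 2-hey. If a UDF is not strictly required, the built-in concat function applied to $"id", lit("-"), and lit(v) is another option that avoids the UDF altogether.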