I have a problem using the instr()
function in Spark. The function's signature looks like this:
instr(Column str, String substring)
The problem is that I need to pass a Column value as the second argument. I created an example function that takes two Column arguments:
def test_func(val1: Column, val2: Column): Column = {
  val instr_val : Column = instr(val2, val1)
  instr_val
}
val df = sc.parallelize(Seq((123, "940932123"), (940, "123940932"), (932, "940123932"))).toDF("KOL1", "KOL2")
df.withColumn("KOL3", test_func($"KOL1", $"KOL2")).show
It gives an error like this:
<console>:322: error: type mismatch;
found : org.apache.spark.sql.Column
required: String
val instr_val : Column = instr(val2, val1)
I tried using the expr() function, but it gives an error too. Does anyone know how to fix this?
Answer 0 (score: 0)

instr cannot be used the way you want to use it, but you can always define a udf to do the job:
scala> val instr2_ : (String, String) => Int = (str, sub) => str.indexOfSlice(sub)
// instr2_: (String, String) => Int = <function2>
scala> val instr2 = udf(instr2_)
// instr2: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,IntegerType,Some(List(StringType, StringType)))
scala> df.withColumn("KOL3", instr2($"KOL2",$"KOL1")).show
// +----+---------+----+
// |KOL1| KOL2|KOL3|
// +----+---------+----+
// | 123|940932123| 6|
// | 940|123940932| 3|
// | 932|940123932| 6|
// +----+---------+----+
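Note that indexOfSlice is 0-based and returns -1 when the substring is absent, whereas Spark's built-in instr is 1-based and returns 0 when absent, so add 1 to the UDF's result if you need instr-compatible values. As an alternative sketch (my own suggestion, not from the answer above): the SQL form of instr accepts arbitrary expressions for both arguments, so routing through expr() with an explicit cast of the integer KOL1 should also work:

```scala
import org.apache.spark.sql.functions.expr

// SQL's instr(str, substr) takes column expressions for both arguments,
// unlike the Scala API's instr(Column, String). KOL1 is an Int, so cast it.
df.withColumn("KOL3", expr("instr(KOL2, CAST(KOL1 AS STRING))")).show
// Results would be 1-based here (e.g. 7 instead of 6 for the first row).

// Pure-Scala check of the UDF's semantics (no Spark needed):
val instr2_ : (String, String) => Int = (str, sub) => str.indexOfSlice(sub)
assert(instr2_("940932123", "123") == 6)  // 0-based, matches KOL3 above
assert(instr2_("940932123", "999") == -1) // absent: -1, not 0 like SQL instr
```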