Spark UDF throws NPE when a DataFrame is passed as an extra argument

Time: 2019-05-11 20:58:47

Tags: apache-spark

I am trying to pass a DataFrame as an extra argument to a UDF and call it through currying, but it throws a NullPointerException (NPE).

Below is the code that does not work. Here I pass inputDF to the UDF as an argument:

// outputDF -> DataFrame
// inputDF -> DataFrame (not shown in the code below; assume it exists)
// Add a new column ("New Column") to outputDF by looking up the value in inputDF

outputDF.withColumn("New Column", newCol(inputDF)(col("Existing Column")))

//udf
def newCol(df: DataFrame) = udf( (value: String) => df.filter(col("Existing Column") === value).first.get(0).toString)

Output: NullPointerException (df is null inside the UDF)

The code below works. Here the UDF accesses inputDF directly:

outputDF.withColumn("New Column", newCol(col("Existing Column")))
def newCol = udf( (value: String) => inputDF.filter(col("Existing Column") === value).first.get(0).toString)

I would like to understand why the first approach does not work and throws an NPE.
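For comparison, the same lookup can be written as a join, so that no DataFrame is referenced from inside a UDF at all. A DataFrame is a driver-side handle, and code inside a UDF runs on the executors, so referencing one there is generally not supported. The sketch below is only an illustration under stated assumptions: the sample rows, the "Lookup Value" column name, and the local SparkSession are hypothetical and do not come from the question. Only "Existing Column" and "New Column" are taken from the code above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object LookupJoinSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lookup-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; "Lookup Value" is an assumed column name.
    val inputDF = Seq(("a", "1"), ("b", "2")).toDF("Existing Column", "Lookup Value")
    val outputDF = Seq("a", "b", "c").toDF("Existing Column")

    // Keep one row per key, roughly mirroring the `.first` call in the UDF version.
    val lookup = inputDF
      .select(col("Existing Column"), col("Lookup Value").as("New Column"))
      .dropDuplicates("Existing Column")

    // A left join keeps every row of outputDF; keys with no match get null.
    val result = outputDF.join(lookup, Seq("Existing Column"), "left")
    result.show()

    spark.stop()
  }
}
```

Note one behavioral difference: where the UDF version would fail on a key with no match (calling `first` on an empty result), the left join simply leaves "New Column" null for that row.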

0 answers:

No answers