Question

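I have a Spark Scala UDF that takes a DataFrame column as one parameter and a List as another, but when I run the function it throws an error pointing at the list argument.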


I am calling the UDF with arguments like this:

udf_name($"column_name", List_name)

Please advise.

Answer 1

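You need to define a separate UDF instance for each list you want to pass. Because the list is a local Scala variable, you can do this before the call (Spark will ship the UDF to the executors). For example: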

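import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF and $; assumes a SparkSession named spark (imported automatically in spark-shell)

val df = List("A", "B").toDF
// Plain Scala function that takes the list as an ordinary argument
def to_be_udf(s: String, l: List[String]) = if (l.isEmpty) "" else "has values"
// One UDF per list: each closure captures its own list
val udf1 = udf((s: String) => to_be_udf(s, List("a")))
val udf2 = udf((s: String) => to_be_udf(s, List()))
df.select(udf1($"value"), udf2($"value")).show()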

+----------+----------+
|UDF(value)|UDF(value)|
+----------+----------+
|has values|          |
|has values|          |
+----------+----------+

Answer 2

You can use lit to pass constant values to a UDF, or define a method that returns a UDF (my preferred approach):

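import org.apache.spark.sql.functions.udf

// Returns a UDF whose closure captures List_name
def udf_name(List_name: List[String]) = {
  udf((name: String) => {
    // do something
    List_name.contains(name)
  })
}

// Placeholder: supply your own list here
val List_name: List[String] = ???

// df is the DataFrame from the question
df
  .withColumn("is_name_in_list", udf_name(List_name)($"column_name"))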
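The lit route mentioned in this answer is not shown above, so here is a minimal sketch of one way it might look. The original call udf_name($"column_name", List_name) fails because a UDF can only be applied to Column arguments, and a Scala List is not a Column. The sketch wraps each element with lit and combines them with array, so the UDF receives the list as a Seq[String]. The local SparkSession, the sample data, and the names containsUdf, listCol, and result are assumptions for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit, udf}

// Assumption: a local session for illustration; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("udf-list-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data and list
val df = List("A", "B").toDF("column_name")
val List_name: List[String] = List("A", "C")

// The second parameter arrives as an array column, which Spark hands to the UDF as a Seq
val containsUdf = udf((name: String, names: Seq[String]) => names.contains(name))

// Turn the Scala list into a single array-typed Column of literals
val listCol = array(List_name.map(lit): _*)

val result = df
  .withColumn("is_name_in_list", containsUdf($"column_name", listCol))
  .collect()
  .map(r => (r.getString(0), r.getBoolean(1)))
  .toList
```

Compared with the method-returning-a-UDF approach, this keeps a single UDF definition, at the cost of the list being materialized as a column value on every row.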
