Scala: Creating a UDF based on a Map

Time: 2018-05-21 09:05:26

Tags: scala apache-spark apache-spark-sql user-defined-functions

Schema of the DataFrame df10:

root
|-- ID: string (nullable = true)
|-- KEY: array (nullable = true)
|    |-- element: string (containsNull = true)

Code:

import org.apache.spark.sql.functions.{col, udf}

val gid1 = 505
val array1: Array[String] = Array("atm_P3", "fee_P6", "c_P8", "card_P4", "iss_P5", "vat_P7")
// Simplistic UDF: returns gid1 if every element of the input array is in array1, otherwise 0
val isSubsetArrayUDF = udf { a: Seq[String] => if (a.forall(array1.contains)) gid1 else 0 }
val df11 = df10.withColumn("is_subset_KEY", isSubsetArrayUDF(col("KEY")))

I need to assign a 'GID' to each 'KEY' in df10 using the given maps:

Map(KEY -> WrappedArray(atm_P3, fee_P6, c_P8, card_P4, iss_P5, vat_P7, cif_P1, cif_P2), GID -> 505)
Map(KEY -> WrappedArray(atm_P3, fee_P6, c_P8, card_P4, iss_P5, vat_P7, cif_P2), GID -> 423)
...

How can I achieve this using a UDF?
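
For illustration, here is a minimal plain-Scala sketch of the lookup such a UDF might perform. It is not a confirmed solution, and everything in it is an assumption: the names `gidRules` and `assignGid` are hypothetical, the rule table is hand-copied from the two maps shown above (the elided entries are left out), and the tie-breaking rule is a guess. Because the KEY list for GID 423 is a subset of the one for GID 505, the sketch assumes the most specific matching rule (the one with the fewest allowed keys) should win, and it falls back to 0 when nothing matches, mirroring the simplistic UDF above.

```scala
// Hypothetical rule table built from the two maps above: (allowed KEY values, GID).
val gidRules: Seq[(Set[String], Int)] = Seq(
  (Set("atm_P3", "fee_P6", "c_P8", "card_P4", "iss_P5", "vat_P7", "cif_P1", "cif_P2"), 505),
  (Set("atm_P3", "fee_P6", "c_P8", "card_P4", "iss_P5", "vat_P7", "cif_P2"), 423)
)

// Returns the GID of the matching rule with the fewest allowed keys, where a rule
// matches when every element of `keys` is in its allowed set.
// Returns 0 when `keys` is null or no rule matches (assumed fallback).
// Note: an empty `keys` matches every rule, so it gets the smallest rule's GID.
def assignGid(keys: Seq[String]): Int =
  if (keys == null) 0
  else {
    val matching = gidRules.filter { case (allowed, _) => keys.forall(allowed.contains) }
    if (matching.isEmpty) 0 else matching.minBy(_._1.size)._2
  }
```

In Spark, a function like this could presumably be wrapped with `udf(assignGid _)` from `org.apache.spark.sql.functions` and applied with `df10.withColumn("GID", assignGidUDF(col("KEY")))`, where `assignGidUDF` is again a hypothetical name. If the intended rule is "first match in list order" instead, replacing the `filter`/`minBy` pair with `collectFirst` would be the natural change.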

0 answers:

No answers