Schema of the DataFrame df10:
root
|-- ID: string (nullable = true)
|-- KEY: array (nullable = true)
| |-- element: string (containsNull = true)
Code:
import org.apache.spark.sql.functions.{col, udf}

val gid1 = 505
val array1: Array[String] = Array("atm_P3", "fee_P6", "c_P8", "card_P4", "iss_P5", "vat_P7")
// Simplistic UDF: returns gid1 if every element of the input array is in array1, otherwise 0
val isSubsetArrayUDF = udf { a: Seq[String] => if (a.forall(e => array1.contains(e))) gid1 else 0 }
// KEY is the array<string> column from the schema above
val df11 = df10.withColumn("is_subset_KEY", isSubsetArrayUDF(col("KEY")))
I need to assign a 'GID' to each 'KEY' in df10 using the given maps:
Map(KEY -> WrappedArray(atm_P3, fee_P6, c_P8, card_P4, iss_P5, vat_P7, cif_P1, cif_P2), GID -> 505)
Map(KEY -> WrappedArray(atm_P3, fee_P6, c_P8, card_P4, iss_P5, vat_P7, cif_P2), GID -> 423)
...
How can I achieve this with a UDF?
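
To make the intended lookup concrete, here is a minimal sketch of the matching logic in plain Scala, without Spark. The names `Rule`, `rules`, `GidLookup` and `gidFor` are hypothetical and only the two maps shown above are included (the elided ones are left out). Because the key set for 423 is contained in the one for 505, a KEY can match both; this sketch assumes the smallest matching key set should win and that 0 means "no match", which may not be the desired rule.

```scala
object GidLookup {
  // Hypothetical rule type: a set of allowed keys and the GID it maps to.
  final case class Rule(keys: Set[String], gid: Int)

  // Only the two maps shown in the question; further rules would be added here.
  val rules: Seq[Rule] = Seq(
    Rule(Set("atm_P3", "fee_P6", "c_P8", "card_P4", "iss_P5", "vat_P7", "cif_P1", "cif_P2"), 505),
    Rule(Set("atm_P3", "fee_P6", "c_P8", "card_P4", "iss_P5", "vat_P7", "cif_P2"), 423)
  )

  // Assumption: among the rules whose key set contains every element of `key`,
  // the one with the fewest keys wins. Returns 0 when nothing matches.
  def gidFor(key: Seq[String]): Int =
    Option(key) // KEY is nullable in the schema, so guard against null
      .map(_.toSet)
      .flatMap { ks =>
        val matches = rules.filter(r => ks.subsetOf(r.keys))
        if (matches.isEmpty) None else Some(matches.minBy(_.keys.size).gid)
      }
      .getOrElse(0)
}
```

In Spark this function could be wrapped as `udf((a: Seq[String]) => GidLookup.gidFor(a))` and applied to `col("KEY")` with `withColumn`, in the same way as `isSubsetArrayUDF` above.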