This is my DataFrame schema:
root
|-- customerid: string (nullable = true)
|-- event: string (nullable = true)
|-- groupe1: string (nullable = false)
|-- groupe2: string (nullable = false)
|-- groupe3: string (nullable = false)
Here is a sample of my DataFrame:
+-----------+-----+------------------+--------------+--------------+
|customerid |event|groupe1           |groupe2       |groupe3       |
+-----------+-----+------------------+--------------+--------------+
|4454545    |     |[aaa,0,0,0]       |[555,0,88,0,0]|[3190,0,0,0,0]|
|8878787787 |2019 |[bbb,0,fff,0,0]   |[420,0,0,0,0] |[9580,0,0,0,0]|
|12555888888|2019 |[cccc,0,fff,eee,0]|[385,0,0,0,0] |[4995,0,0,0,0]|
+-----------+-----+------------------+--------------+--------------+
I tried the following code:
val zip = udf((xs: Seq[String], ys: Seq[String], zs: Seq[String]) => (xs, ys, zs).zipped.toSeq)
df.printSchema
val df4 = df.withColumn("vars", explode(zip($"groupe1", $"groupe2", $"groupe3"))).select(
  $"customerid", $"event",
  $"vars._1".alias("groupe1"), $"vars._2".alias("groupe2"), $"vars._3".alias("groupe3"))
I get this error:
cannot resolve 'UDF(groupe1, groupe2, groupe3)' due to data type mismatch: argument 1 requires array<string> type, however, '`groupe1`' is of string type. argument 2 requires array<string> type, however, '`groupe2`' is of string type. argument 3 requires array<string> type, however, '`groupe3`' is of string type.;;
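Answer 0 (score: 0)
The columns groupe1, groupe2, and groupe3 are of type string, so they are incompatible with a UDF whose parameters are Seq[String]. Perhaps you should change the UDF's inputs to String and split each value into its elements inside the UDF.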
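A minimal sketch of what that could look like, assuming each column holds a bracketed, comma-separated string such as "[aaa,0,0,0]" and that no element contains a comma. The names GroupZip, parseList, and zipStrings are made up for illustration and are not part of Spark. The parsing logic is plain Scala, and the Spark wiring is shown as comments at the end.

```scala
// Sketch only: GroupZip, parseList and zipStrings are hypothetical names, not Spark APIs.
object GroupZip {

  // Turn a bracketed string such as "[aaa,0,0,0]" into List("aaa", "0", "0", "0").
  // Assumes elements never contain commas; null or empty input yields an empty list.
  def parseList(s: String): List[String] = {
    val inner = Option(s).map(_.trim.stripPrefix("[").stripSuffix("]").trim).getOrElse("")
    // limit = -1 keeps trailing empty elements instead of silently dropping them
    if (inner.isEmpty) Nil else inner.split(",", -1).map(_.trim).toList
  }

  // Zip the three parsed lists element-wise; the result is as long as the shortest list.
  def zipStrings(g1: String, g2: String, g3: String): List[(String, String, String)] =
    parseList(g1).zip(parseList(g2)).zip(parseList(g3)).map { case ((a, b), c) => (a, b, c) }
}

// Possible wiring in Spark, assuming org.apache.spark.sql.functions._ and
// spark.implicits._ are imported and df is the DataFrame from the question:
//
//   val zipUdf = udf(GroupZip.zipStrings _)
//   val df4 = df
//     .withColumn("vars", explode(zipUdf($"groupe1", $"groupe2", $"groupe3")))
//     .select($"customerid", $"event",
//       $"vars._1".alias("groupe1"), $"vars._2".alias("groupe2"), $"vars._3".alias("groupe3"))
```

Note that zipping stops at the shortest list, so the first sample row (four elements in groupe1, five in the other two columns) would produce four output rows, not five.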