DF1 is what I currently have, and I want to make DF1 look like DF2.
Desired output:
DF1                                        DF2
+---------+-------------------+            +---------+------------------------------+
| ID      | Category          |            | ID      | Category                     |
+---------+-------------------+            +---------+------------------------------+
| 31898   | Transfer          |            | 31898   | Transfer (e-Transfer)        |
| 31898   | e-Transfer        |   =====>   | 32614   | Transfer (e-Transfer + IMT)  |
| 32614   | Transfer          |   =====>   | 33987   | Transfer (IMT)               |
| 32614   | e-Transfer + IMT  |            +---------+------------------------------+
| 33987   | Transfer          |
| 33987   | IMT               |
+---------+-------------------+
Code:
val df = DF1.groupBy("ID").agg(collect_set("Category").as("CategorySet"))
val DF2 = df.withColumn("Category", $"CategorySet"(0) ($"CategorySet"(1)))
This code does not work. How can I fix it? Also, if there is a better way to do the same thing, I am open to it. Thanks in advance.
Answer 0 (score: 0)
You can try the following:
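import org.apache.spark.sql.functions.{collect_set, concat, lit, udf}
import spark.implicits._  // enables $"..."; assumes a SparkSession named `spark`
// Format the last `from` elements of the array as " (a,b)"
val sliceRight = udf((array: Seq[String], from: Int) => " (" + array.takeRight(from).mkString(",") + ")")
// `df` is the input DataFrame (DF1 in the question)
val df2 = df.groupBy("ID").agg(collect_set("Category").as("CategorySet"))
df2.withColumn("Category", concat($"CategorySet"(0), sliceRight($"CategorySet", lit(1))))
  .show(false)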
Output:
+-----+----------------------------+---------------------------+
|ID |CategorySet |Category |
+-----+----------------------------+---------------------------+
|33987|[Transfer, IMT] |Transfer (IMT) |
|32614|[Transfer, e-Transfer + IMT]|Transfer (e-Transfer + IMT)|
|31898|[Transfer, e-Transfer] |Transfer (e-Transfer) |
+-----+----------------------------+---------------------------+
Answer 1 (score: 0)
A slightly modified answer:
df.groupBy("ID").agg(collect_set(col("Category")).as("Category")).withColumn("Category", concat(col("Category")(0), lit(" ("), col("Category")(1), lit(")"))).show
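
One caveat when indexing into the collected array: the order of elements returned by collect_set is not guaranteed after a shuffle, so element (0) is not necessarily "Transfer". The question's original expression also fails for a separate reason: `$"CategorySet"(0) ($"CategorySet"(1))` is parsed as a second element access on the string, not as a concatenation. Below is a minimal sketch of an order-independent variant. It assumes every ID has one row whose Category equals a known base value ("Transfer" here); the helper name mergeCategories, the base parameter, and the local SparkSession setup are illustrative and not part of the original posts.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, concat_ws, format_string, max, sort_array, when}

// Local session for the sketch; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("category-merge").getOrCreate()
import spark.implicits._

// Sample data mirroring DF1 from the question.
val df1 = Seq(
  (31898, "Transfer"), (31898, "e-Transfer"),
  (32614, "Transfer"), (32614, "e-Transfer + IMT"),
  (33987, "Transfer"), (33987, "IMT")
).toDF("ID", "Category")

// Hypothetical helper: split each group into the base category and the
// remaining detail values by comparing against `base`, so the result does
// not depend on the order in which rows are collected.
def mergeCategories(df: DataFrame, base: String): DataFrame =
  df.groupBy("ID")
    .agg(
      // when(...) yields null for non-matching rows; max ignores nulls.
      max(when(col("Category") === base, col("Category"))).as("Base"),
      // collect_set skips nulls; sort_array makes the joined order stable.
      concat_ws(", ", sort_array(collect_set(when(col("Category") =!= base, col("Category"))))).as("Detail")
    )
    .select(col("ID"), format_string("%s (%s)", col("Base"), col("Detail")).as("Category"))

val merged = mergeCategories(df1, "Transfer")
merged.orderBy("ID").show(false)
```

If an ID can have several detail rows, they are joined with ", " inside the parentheses. If an ID has no base row at all, Base is null and the formatted text will show that, so such groups may need to be filtered or handled separately.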