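I have a DataFrame with columns like the ones shown below. I need to check the length of every column (the number of * delimiters it contains), compare the columns with each other and, where they differ, append delimiters so that within each row every column has as many delimiters as the column with the most delimiters in that row.
Note: the number of columns can vary; sometimes there are 3, sometimes 8.
Input: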
+--------+-------+----------------------------------+------------+-----------+
|ID      |Type   |column1                           |column2     |column3    |
+--------+-------+----------------------------------+------------+-----------+
|1       |ABC    | Adventure*Comedy                 |Adventure   |100        |
|2       |ABC    | Animation*Drama*War              |War* Drama  |300        |
|3       |ABC    | Adventure*Drama*Action*Thriller  |Drama       |           |
|4       |ABC    |                                  |Action**    |100*243*119|
|5       |ABC    |                                  |***         |           |
+--------+-------+----------------------------------+------------+-----------+
What I have tried so far:
import org.apache.spark.sql.functions.udf
val check = udf { (a: String, b: String) => if (a.length == b.length) 1 else 0 }
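Note that comparing a.length with b.length compares character counts, which is not the same as the number of * delimiters: "***" and "100" have the same length but hold 3 and 0 delimiters. A minimal sketch of counting delimiters in plain Scala follows; the names DelimiterCount and delimiterCount are hypothetical and not part of any library.

```scala
object DelimiterCount {
  // Hypothetical helper: number of '*' characters in a value.
  // A null value is treated like an empty string.
  def delimiterCount(value: String): Int =
    Option(value).map(_.count(_ == '*')).getOrElse(0)

  def main(args: Array[String]): Unit = {
    // Same string length (3), different delimiter counts.
    println(delimiterCount("***")) // 3
    println(delimiterCount("100")) // 0
    println(delimiterCount("Action**")) // 2
  }
}
```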
Expected output:
+--------+-------+----------------------------------+------------+-----------+
|ID      |Type   |column1                           |column2     |column3    |
+--------+-------+----------------------------------+------------+-----------+
|1       |ABC    | Adventure*Comedy                 |Adventure*  |100*       |
|2       |ABC    | Animation*Drama*War              |War*Drama*  |300**      |
|3       |ABC    | Adventure*Drama*Action*Thriller  |Drama***    |***        |
|4       |ABC    | **                               |Action**    |100*243*119|
|5       |ABC    | ***                              |***         |***        |
+--------+-------+----------------------------------+------------+-----------+
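One possible way to get from the input to a table like the one above is sketched below. It is only an illustration under stated assumptions: ID and Type are taken to be the only key columns, every other column is treated as *-delimited text, null is treated as an empty string, and whitespace inside the values is left untouched. The names DelimiterPadding, padRow, padDelimiters and the temporary column _padded are hypothetical. Packing the value columns into one array lets the same code handle 3 columns or 8.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, udf}

object DelimiterPadding {
  // Hypothetical helper: pads each value of one row with '*' so that all values
  // end up with as many '*' characters as the value that has the most.
  def padRow(values: Seq[String]): Seq[String] = {
    val clean = values.map(v => Option(v).getOrElse("")) // null counts as empty
    val counts = clean.map(_.count(_ == '*'))
    val target = if (counts.isEmpty) 0 else counts.max
    clean.zip(counts).map { case (v, n) => v + "*" * (target - n) }
  }

  private val padRowUdf = udf((values: Seq[String]) => padRow(values))

  // Assumption: every column not listed in keyCols holds '*'-delimited text.
  def padDelimiters(df: DataFrame, keyCols: Seq[String] = Seq("ID", "Type")): DataFrame = {
    val valueCols = df.columns.filterNot(c => keyCols.contains(c))
    if (valueCols.isEmpty) df
    else {
      // Pack the value columns into one array so the UDF sees the whole row,
      // however many columns there are.
      val packed = df.withColumn(
        "_padded",
        padRowUdf(array(valueCols.map(c => col(c).cast("string")): _*)))
      // Write each padded value back to its original column, then drop the helper.
      valueCols.zipWithIndex
        .foldLeft(packed) { case (acc, (name, i)) =>
          acc.withColumn(name, col("_padded").getItem(i))
        }
        .drop("_padded")
    }
  }
}
```

With a DataFrame df shaped like the input, this would be called as DelimiterPadding.padDelimiters(df). For row 4, padRow(Seq("", "Action**", "100*243*119")) gives Seq("**", "Action**", "100*243*119"). A UDF keeps the sketch short; built-in column functions could be swapped in if UDF overhead matters.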