How to compare the lengths of delimiter-separated columns and pad them with empty values when they differ

Time: 2019-07-15 11:10:13

Tags: scala apache-spark

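I have a DataFrame with columns like the ones shown below. I need to check the lengths of all the columns and compare them with each other; if they are not the same, I need to add delimiters, based on the maximum number of delimiters found in any column of that row.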

Note: the number of columns can vary; sometimes it is 3, sometimes 8.

Input:

+--------+-------+----------------------------------+------------+-----------+
|ID      |Type   |column1                           |column2     |column3    |
+--------+-------+----------------------------------+------------+-----------+
|1       |ABC    | Adventure*Comedy                 |Adventure   |100        |
|2       |ABC    | Animation*Drama*War              |War* Drama  |300        |
|3       |ABC    | Adventure*Drama*Action*Thriller  |Drama       |           |
|4       |ABC    |                                  |Action**    |100*243*119|
|5       |ABC    |                                  |***         |           |
+--------+-------+----------------------------------+------------+-----------+

import org.apache.spark.sql.functions.udf
val check = udf { (a: String, b: String) => if (a.length == b.length) 1 else 0 } // compares character lengths only, not delimiter counts

Expected output:

+--------+-------+----------------------------------+------------+-----------+
|ID      |Type   |column1                           |column2     |column3    |
+--------+-------+----------------------------------+------------+-----------+
|1       |ABC    | Adventure*Comedy                 |Adventure*  |100*       |
|2       |ABC    | Animation*Drama*War              |War*Drama   |300**      |
|3       |ABC    | Adventure*Drama*Action*Thriller  |Drama***    |***        |
|4       |ABC    | **                               |Action**    |100*243*119|
|5       |ABC    | ***                              |***         |***        |
+--------+-------+----------------------------------+------------+-----------+
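
The rule described above (count the delimiters in each column of a row, take the maximum, and append the missing delimiters to the shorter columns) can be sketched in plain Scala as follows. This is only a minimal sketch, not a tested Spark solution: the object `DelimiterPadding` and the function `padDelimiters` are hypothetical names introduced here, null cells are assumed to behave like empty strings, and whitespace inside a cell is left untouched. Under this rule the value `War* Drama` in row 2 would become `War* Drama*`.

```scala
object DelimiterPadding {
  // Hypothetical helper (not from the question): pads every value of one row with
  // trailing delimiters until all values contain the same number of delimiters.
  def padDelimiters(values: Seq[String], delim: Char = '*'): Seq[String] = {
    // Assumption: a null cell is treated like an empty string.
    val cells = values.map(v => Option(v).getOrElse(""))
    // Number of delimiters currently present in each cell.
    val counts = cells.map(_.count(_ == delim))
    // The largest delimiter count in the row (0 for a row with no cells).
    val maxCount = if (counts.isEmpty) 0 else counts.max
    // Append the missing delimiters; cells already at the maximum are unchanged.
    cells.zip(counts).map { case (cell, n) => cell + delim.toString * (maxCount - n) }
  }
}
```

Because the function takes a `Seq[String]`, it does not depend on the number of columns. In Spark, one possible way to apply it would be to wrap it with `udf`, pass it `array(...)` built from the delimited columns, and then read the padded values back out of the resulting array column by index.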

0 answers:

No answers yet