PySpark: combine rows if their list columns share a value

Time: 2017-12-06 12:04:10

Tags: python python-3.x python-2.7 apache-spark pyspark

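I want to combine DataFrame values (lists) from different rows that share the same key. Rows should be combined only when their lists have at least one value in common, so I can't get the result with df.groupBy('id') alone.

Here is an example: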

+---------+--------------------+
|id       |num_list            |
+---------+--------------------+
|apple    |[11, 12]            |
|apple    |[11, 13, 14]        |
|apple    |[10, 22, 25]        |
|banana   |[15, 26]            |
|banana   |[15, 29]            |
|banana   |[15, 27]            |
+---------+--------------------+

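For id=apple, two of the records have lists that share a value: [11, 12] and [11, 13, 14]. They should be merged into one new record with id=apple, num_list=[11, 12, 13, 14].

The record id=apple, num_list=[10, 22, 25] shares no value with the others, so it is not merged.

This is the result I want: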

+---------+--------------------+
|id       |num_list            |
+---------+--------------------+
|apple    |[11, 12, 13, 14]    |
|apple    |[10, 22, 25]        |
|banana   |[15, 26, 27, 29]    |
+---------+--------------------+

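Edit

I need to clarify the rules.

As @Usernamenotfound pointed out in the comments,

suppose apple has three lists, [11, 12, 14], [9, 13, 14] and [12, 13, 27]. The answer is then [9, 11, 12, 13, 14, 27], not [9, 11, 12, 13, 14] and [12, 13, 27]. In other words, the merge is transitive: lists that are linked through a chain of shared values end up in the same group.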
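The rule above amounts to finding connected components, where two values are linked whenever they appear in the same list. A minimal plain-Python sketch of that rule, using a union-find structure, is shown here; the function name merge_overlapping is hypothetical and not part of the original question.

```python
def merge_overlapping(lists):
    """Merge lists that share at least one value, transitively."""
    parent = {}  # union-find forest: value -> parent value

    def find(x):
        # Walk up to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for values in lists:
        for v in values:
            parent.setdefault(v, v)
        # Link every value in this list to the list's first value.
        for v in values[1:]:
            root_a, root_b = find(values[0]), find(v)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect the values under each root into one group.
    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values())


print(merge_overlapping([[11, 12, 14], [9, 13, 14], [12, 13, 27]]))
# [[9, 11, 12, 13, 14, 27]]
print(merge_overlapping([[11, 12], [11, 13, 14], [10, 22, 25]]))
# [[10, 22, 25], [11, 12, 13, 14]]
```

Note that duplicate lists such as [0] and [0] collapse into a single group under this rule.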

Here is a new example:

+--------------------+------------+
|                  id|num_list    |
+--------------------+------------+
|apple               |         [0]|
|apple               |         [0]|
|apple               |         [1]|
|apple               |         [1]|
|apple               |         [2]|
|apple               |         [3]|
|apple               |         [4]|
|apple               |         [5]|
|apple               |         [6]|
|apple               |         [6]|
|apple               |         [7]|
|apple               |      [7, 8]|
|apple               |         [9]|
|apple               |         [9]|
|apple               |         [9]|
|apple               |     [9, 10]|
|apple               | [9, 17, 18]|
|apple               |        [10]|
|apple               |        [10]|
|apple               |        [10]|
+--------------------+------------+

If I try @mayank's code, I get the wrong answer:

+--------------------------------+---------------------------------------------------------------------------------------+
|                              id|num_list                                                                               |
+--------------------------------+---------------------------------------------------------------------------------------+
|apple                           |[0]                                                                                    |
|apple                           |[0]                                                                                    |
|apple                           |[1]                                                                                    |
|apple                           |[2]                                                                                    |
|apple                           |[3]                                                                                    |
|apple                           |[4]                                                                                    |
|apple                           |[5]                                                                                    |
|apple                           |[6]                                                                                    |
|apple                           |[8, 7]                                                                                 |
|apple                           |[9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 36, 37, 38]|
|apple                           |[12]                                                                                   |
|apple                           |[14]                                                                                   |
|apple                           |[24]                                                                                   |
|apple                           |[31]                                                                                   |
|apple                           |[32]                                                                                   |
|apple                           |[33, 34]                                                                               |
|apple                           |[35]                                                                                   |
|apple                           |[39]                                                                                   |
+--------------------------------+---------------------------------------------------------------------------------------+

Any help is greatly appreciated.

1 Answer:

Answer 0: (score: 1)

This may not be the most efficient solution, but it should solve your problem.

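One possible approach, given as a sketch rather than a definitive implementation: collect all lists for each id, merge the overlapping ones in a Python UDF, and explode the result back into rows. The names merge_lists and merge_by_id are hypothetical. The sketch assumes num_list is an array of integers and that all lists for a single id fit in memory on one executor.

```python
def merge_lists(lists):
    """Merge lists that share a value; each new list absorbs every group it overlaps."""
    merged = []  # pairwise-disjoint sets built so far
    for values in lists:
        current = set(values)
        remaining = []
        for group in merged:
            if group & current:
                current |= group  # overlapping group is folded into the new one
            else:
                remaining.append(group)
        remaining.append(current)
        merged = remaining
    return sorted(sorted(g) for g in merged)


def merge_by_id(df):
    """Return one row per (id, merged num_list). Assumes columns 'id' and 'num_list'."""
    # Imported here so merge_lists stays usable without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, IntegerType

    # Assumption: list elements are integers; adjust the element type otherwise.
    merge_udf = F.udf(merge_lists, ArrayType(ArrayType(IntegerType())))
    return (
        df.groupBy("id")
        .agg(F.collect_list("num_list").alias("lists"))  # array of arrays per id
        .select("id", F.explode(merge_udf("lists")).alias("num_list"))
    )
```

Calling merge_by_id(df) on the first example table is intended to produce the three merged rows from the question, though the row order is not guaranteed. Because collect_list gathers every list for an id into a single row, this is likely to become a bottleneck for ids with very many rows.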