How to create value pairs with a lambda in pyspark?

Date: 2017-09-27 22:01:27

Tags: python apache-spark lambda pyspark

I am trying to transform a pyspark RDD like this:

Before:

[
    [('169', '5'), ('2471', '6'), ('48516', '10')], 
    [('58', '7'), ('163', '7')], 
    [('172', '5'), ('186', '4'), ('236', '6')]
]

After:

[ 
    [('169', '5'), ('2471', '6')],
    [('169', '5'), ('48516', '10')],
    [('2471', '6'), ('48516', '10')],
    [('58', '7'), ('163', '7')],
    [('172', '5'), ('186', '4')],
    [('172', '5'), ('236', '6')],
    [('186', '4'), ('236', '6')]
]

The idea is to iterate over each row and build new rows from every pair of its elements. I tried to work out a solution myself from lambda tutorials, but to no avail. May I ask for help? I apologize if this duplicates another question. Thanks!

1 Answer:

Answer 0 (score: 1)

I would use flatMap with itertools.combinations:

from itertools import combinations

# emit all 2-element combinations within each row; flatMap flattens
# the per-row results into a single RDD of pairs
rdd.flatMap(lambda xs: combinations(xs, 2))
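
For context, a minimal end-to-end sketch, assuming a fresh local SparkContext and the toy data from the question (the variable names and the "local" master are assumptions, not from the answer):

from itertools import combinations

from pyspark import SparkContext

sc = SparkContext("local", "pairs")  # assumption: no context is already running

data = [
    [('169', '5'), ('2471', '6'), ('48516', '10')],
    [('58', '7'), ('163', '7')],
    [('172', '5'), ('186', '4'), ('236', '6')],
]

# parallelize the nested list, then pair up elements within each row
pairs = sc.parallelize(data).flatMap(lambda xs: combinations(xs, 2))
print(pairs.collect())
# [(('169', '5'), ('2471', '6')), (('169', '5'), ('48516', '10')), ...]

Note that combinations yields tuples, so each element of the result is a tuple of two pairs rather than a list; wrap the call as list(combinations(xs, 2)) inside the lambda if lists are required. Using map instead of flatMap would leave one nested list of pairs per input row; flatMap flattens those per-row combinations into a single RDD of pairs, which matches the desired output.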