I'm trying to transform a pyspark RDD like this:
Before:
[
[('169', '5'), ('2471', '6'), ('48516', '10')],
[('58', '7'), ('163', '7')],
[('172', '5'), ('186', '4'), ('236', '6')]
]
After:
[
[('169', '5'), ('2471', '6')],
[('169', '5'),('48516', '10')],
[('2471', '6'), ('48516', '10')],
[('58', '7'), ('163', '7')],
[('172', '5'), ('186', '4')],
[('172', '5'), ('236', '6')],
[('186', '4'), ('236', '6')]
]
The idea is to iterate over each row and create new rows from its pairs. I tried to find a solution myself with lambda tutorials, but without success. Could I ask for help? I apologize if this duplicates another question. Thanks!
Answer 0 (score: 1)
I would use flatMap with itertools.combinations:
from itertools import combinations
rdd.flatMap(lambda xs: combinations(xs, 2))
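The flatMap call maps each row to all of its 2-element combinations and flattens the results into a single RDD. The same logic can be checked without a Spark session using plain Python: a minimal sketch where `chain.from_iterable` stands in for the flattening that `flatMap` performs.

```python
from itertools import chain, combinations

# The "Before" rows from the question
rows = [
    [('169', '5'), ('2471', '6'), ('48516', '10')],
    [('58', '7'), ('163', '7')],
    [('172', '5'), ('186', '4'), ('236', '6')],
]

# combinations(xs, 2) yields every unordered pair from a row;
# chain.from_iterable flattens the per-row results, like RDD.flatMap
pairs = list(chain.from_iterable(combinations(xs, 2) for xs in rows))

for p in pairs:
    print(p)
```

Note that `combinations` yields each pair as a tuple of two tuples, e.g. `(('169', '5'), ('2471', '6'))`, rather than a list; wrap with `list(...)` inside the lambda if lists are required.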