Assign the key to all values in a PySpark RDD

Date: 2018-08-16 09:46:07

Tags: apache-spark pyspark rdd

My RDD has the following format:

[(1111, [(0, 174, 12.44, 3.125, u'c29'), (0, 175, 12.48, 6.125, u'c59')]), (2222, [(0, 178, 19.41, 2.165, u'c79'), (0, 171, 18.41, 3.125, u'c41')])]

How can I flatten the intermediate lists and get an RDD of tuples, where each tuple contains the corresponding key followed by its values, e.g.:

[(1111, 0, 174, 12.44, 3.125, u'c29'), (1111, 0, 175, 12.48, 6.125, u'c59'), (2222, 0, 178, 19.41, 2.165, u'c79'), (2222, 0, 171, 18.41, 3.125, u'c41')]

1 answer:

Answer 0 (score: 1)

Use flatMap: for each (key, list) pair, prepend the key to every tuple in the list, and let flatMap concatenate the results:

rdd.flatMap(lambda x: [(x[0], ) + y for y in x[1]])
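For readers without a Spark cluster at hand, a minimal sketch of the same logic can be checked locally: flatMap applies the function to each element and concatenates the resulting lists, which a plain list comprehension reproduces. The data below is the example from the question; no actual SparkContext is assumed.

```python
# Input in the question's format: (key, list_of_value_tuples) pairs.
data = [
    (1111, [(0, 174, 12.44, 3.125, u'c29'), (0, 175, 12.48, 6.125, u'c59')]),
    (2222, [(0, 178, 19.41, 2.165, u'c79'), (0, 171, 18.41, 3.125, u'c41')]),
]

# The function passed to flatMap: prepend the key (x[0]) to each tuple y
# in the value list (x[1]). (x[0],) + y is tuple concatenation.
flatten = lambda x: [(x[0],) + y for y in x[1]]

# flatMap = map the function over every element, then concatenate the lists.
result = [t for pair in data for t in flatten(pair)]
# result == [(1111, 0, 174, 12.44, 3.125, u'c29'),
#            (1111, 0, 175, 12.48, 6.125, u'c59'),
#            (2222, 0, 178, 19.41, 2.165, u'c79'),
#            (2222, 0, 171, 18.41, 3.125, u'c41')]
```

On a real RDD the equivalent call is rdd.flatMap(flatten); flatMap differs from map in that each input element may produce zero or more output elements, which is exactly what un-nesting the inner lists requires.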