Joining the elements of lists inside an RDD in Python Spark

Time: 2017-04-05 21:27:12

Tags: apache-spark pyspark rdd

My RDD looks like this:

>>> rdd.collect()
[([u'steve'], [u'new', u'york'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]

How can I get a new RDD in which the strings inside each inner list are joined, like this:

[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]

I tried mapping to a new RDD using join, but it did not work.

1 answer:

Answer 0 (score: 0)

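I was able to solve this by joining the strings in each inner list and wrapping the result back in a one-element list: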

>>> rdd2 = rdd.map(lambda l: tuple([''.join(x)] for x in l))
>>> rdd2.collect()
[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
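
The per-record logic can be checked without a Spark cluster. Below is a minimal sketch in plain Python; `join_fields` and `records` are hypothetical names introduced here for illustration, and the sample data simply mirrors the RDD contents from the question.

```python
def join_fields(record):
    """Concatenate the strings in each inner list of a record.

    Each joined string is wrapped back in a one-element list so the
    record keeps the tuple-of-lists shape shown in the question.
    """
    return tuple(["".join(words)] for words in record)


# Sample data mirroring rdd.collect() from the question (hypothetical local copy).
records = [
    (["steve"], ["new", "york"], ["baseball"]),
    (["smith"], ["virginia"], ["football"]),
]

result = [join_fields(record) for record in records]
print(result)
```

With an actual RDD, the same function could presumably be passed straight to `map`, as in `rdd.map(join_fields).collect()`. Dropping the inner square brackets around `"".join(words)` would yield tuples of plain strings such as `('steve', 'newyork', 'baseball')` instead of tuples of lists.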