My RDD looks like this:
>>> rdd.collect()
[([u'steve'], [u'new', u'york'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
How can I get a new RDD in which the tokens of each field are joined together, like this:
[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
I tried mapping to a new RDD using join, but it did not work.
Answer 0: (score: 0)
I was able to solve this as follows:
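>>> # ''.join(x) returns a plain string, so wrap it in a list to keep each field a one-element list
>>> rdd2 = rdd.map(lambda l: [[''.join(x)] for x in l])
>>> rdd2.map(tuple).collect()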
[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
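The per-record transformation can also be checked without a Spark cluster. The following is a minimal sketch in plain Python; the helper name `join_fields` and the sample list `rows` are hypothetical and only mirror the data shown in the question.

```python
def join_fields(record):
    """Join the tokens of every field, keeping each field as a one-element list."""
    # ''.join(field) yields a string; the surrounding [] restores the list shape.
    return tuple([''.join(field)] for field in record)


# Sample data mirroring the RDD contents from the question (assumed, not read from Spark).
rows = [
    (['steve'], ['new', 'york'], ['baseball']),
    (['smith'], ['virginia'], ['football']),
]

result = [join_fields(r) for r in rows]
print(result)
```

Assuming `rdd` holds the same records, the same function should be usable as `rdd.map(join_fields).collect()`. A named function is somewhat easier to test locally than an inline lambda.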