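I have rdd1 with labels (0, 1, 4) and another RDD, rdd2, that holds text. I want to map rdd1 to rdd2 so that row1 of rdd1 maps to row1 of rdd2, and so on.
I tried:
rdd2.join(rdd1.map(lambda x: (x[0], x[0:])))
It gave me an error.
Can someone guide me? Sample output: rdd1 - labels & rdd2 - text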
Answer 0 (score: 0)
If you have rdd1 as
val rdd1 = sc.parallelize(List(0,0,4,1,4,1))
and rdd2 as
val rdd2 = sc.parallelize(List("i hate painting i have white paint all over my hands.",
"Bawww I need a haircut No1 could fit me in before work tonight. Sigh.",
"I had a great day",
"what is life.",
"He sings so good",
"i need to go to sleep ....goodnight"))
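"I want to map rdd1 with rdd2 such that row1 of rdd1 is mapped to row1 of rdd2, and so on."
Using the zip function
A simple zip should meet your requirement. Note that zip assumes both RDDs have the same number of partitions and the same number of elements in each partition.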
rdd1.zip(rdd2)
The output is:
(0,i hate painting i have white paint all over my hands.)
(0,Bawww I need a haircut No1 could fit me in before work tonight. Sigh.)
(4,I had a great day)
(1,what is life.)
(4,He sings so good)
(1,i need to go to sleep ....goodnight)
Using zipWithIndex and join
This approach gives you the same pairs as the zip approach above (note that it is also expensive):
rdd1.zipWithIndex().map(_.swap).join(rdd2.zipWithIndex().map(_.swap)).map(_._2)
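To make the one-liner above easier to follow, here is a minimal plain-Python sketch of the same idea, with lists standing in for RDDs so it runs without Spark. The helper names (zip_with_index, swap, join_by_key) are hypothetical and only model what zipWithIndex, swap and join do; this is not Spark code.

```python
# Plain-Python model of the zipWithIndex-and-join approach.
# Lists stand in for RDDs; all helper names here are hypothetical.

def zip_with_index(items):
    # Models rdd.zipWithIndex(): pair each element with its position.
    return [(item, idx) for idx, item in enumerate(items)]

def swap(pair):
    # Models Scala's _.swap: (value, index) -> (index, value).
    return (pair[1], pair[0])

def join_by_key(left, right):
    # Models an inner join on key: (k, v) and (k, w) -> (k, (v, w)).
    # Keys are row indices here, so each key is unique on both sides.
    right_by_key = dict(right)
    return [(k, (v, right_by_key[k])) for k, v in left if k in right_by_key]

labels = [0, 0, 4, 1, 4, 1]
texts = [
    "i hate painting i have white paint all over my hands.",
    "Bawww I need a haircut No1 could fit me in before work tonight. Sigh.",
    "I had a great day",
    "what is life.",
    "He sings so good",
    "i need to go to sleep ....goodnight",
]

# Key both sides by row index, join on that index, then drop the key.
left = [swap(p) for p in zip_with_index(labels)]
right = [swap(p) for p in zip_with_index(texts)]
joined = join_by_key(left, right)

# Sort by index so rows come back in their original order.
paired = [value for _, value in sorted(joined, key=lambda kv: kv[0])]

for label, text in paired:
    print((label, text))
```

The row index is the join key, which is why the pairing follows row position rather than label value. In Spark, a join does not promise any output order, so sort by the index before dropping it if the order matters.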
I hope the answer is helpful.