I have an RDD in which every element is a tuple of length 5. For example:
(Cricket,Game,Outdoor,India, yes)
(Cricket,Game,Outdoor,Australia, yes)
(Hockey,Game,Outdoor,India,yes)
I want to group all entries that share the same game name (the first field), like this:
(Cricket,[Game,Outdoor,India,yes],[Game,Outdoor,Australia,yes])
How can I do this in Scala?
Answer 0 (score: 0)
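val base = sc.parallelize(Seq(
  ("Cricket", "Game", "Outdoor", "India", "yes"),
  ("Cricket", "Game", "Outdoor", "Australia", "yes"),
  ("Hockey", "Game", "Outdoor", "India", "yes")))
// Key each record by its first field and group the records that share it.
val grouped = base.map(t => (t._1, t)).groupByKey()
// Drop the key from each grouped record, keeping only the remaining four fields.
grouped.map { case (name, rows) => (name, rows.map(r => (r._2, r._3, r._4, r._5))) }.foreach(println)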
Result:
(Cricket,List((Game,Outdoor,India,yes), (Game,Outdoor,Australia,yes)))
(Hockey,List((Game,Outdoor,India,yes)))
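If you want to check the grouping logic without a Spark context, the same idea can be sketched with plain Scala collections. This is only an illustration, not part of the Spark answer above: `records` and `grouped` are made-up names, and the collection `groupBy` stands in for `groupByKey` on the RDD.

```scala
// Illustrative only: a plain Scala Seq stands in for the RDD.
val records = Seq(
  ("Cricket", "Game", "Outdoor", "India", "yes"),
  ("Cricket", "Game", "Outdoor", "Australia", "yes"),
  ("Hockey", "Game", "Outdoor", "India", "yes"))

// groupBy keys on the first field; the inner map drops that field from each record.
val grouped: Map[String, Seq[(String, String, String, String)]] =
  records.groupBy(_._1).map { case (name, rows) =>
    name -> rows.map(r => (r._2, r._3, r._4, r._5))
  }

grouped.foreach(println)
```

The order in which the groups are printed is not guaranteed, since a `Map` has no defined iteration order; the records inside each group keep their original order.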