I have two RDDs, for example: firstmapRDD - (0-14,List(0,4,19,19079,42697,444,42748))
secondmapRdd-(0-14,List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19, 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44, 45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69, 70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94) )
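I want to find the intersection of the values.
I tried var interResult = firstmapRDD.intersection(secondmapRdd), but it shows no results in the output file.
I also tried grouping by key, mapRDD.cogroup(secondMapRDD).filter(x => ...), but I don't know how to find the intersection between the two values. Is it x => x._1.intersect(x._2)? Can someone help me with the syntax?
Even this throws a compile-time error: mapRDD.cogroup(secondMapRDD).filter(x => x._1.intersect(x._2))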
var mapRDD = sc.parallelize(map.toList)
var secondMapRDD = sc.parallelize(secondMap.toList)
var interResult = mapRDD.intersection(secondMapRDD)
Maybe the intersection does not work because the values are of the form ArrayBuffer[List[]]. Is there any hack to remove that wrapper?
I tried this:
var interResult = mapRDD.cogroup(secondMapRDD).filter{case (_, (l,r)) => l.nonEmpty && r.nonEmpty }. map{case (k,(l,r)) => (k, l.toList.intersect(r.toList))}
I still get an empty list!
Answer 0 (score: 3)
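Since you want to intersect the values, you need to join the two RDDs on their keys to get the matching pair of values for each key, and then intersect those values.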
Sample code:
val firstMap = Map(1 -> List(1,2,3,4,5))
val secondMap = Map(1 -> List(1,2,5))
val firstKeyRDD = sparkContext.parallelize(firstMap.toList, 2)
val secondKeyRDD = sparkContext.parallelize(secondMap.toList, 2)
val joinedRDD = firstKeyRDD.join(secondKeyRDD)
val finalResult = joinedRDD.map(tuple => {
  val matchedLists = tuple._2
  val intersectValues = matchedLists._1.intersect(matchedLists._2)
  (tuple._1, intersectValues)
})
finalResult.foreach(println)
Output:
(1,List(1, 2, 5))
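As for why the two attempts in the question come back empty, here is a rough sketch using plain Scala collections as a local stand-in for the RDD records (no Spark involved; the object and value names are made up for illustration). RDD.intersection compares whole elements, so a (key, list) pair only matches if both the key and the entire list are equal. Likewise, cogroup hands you an Iterable of values per side, so with List values each side is a collection of lists, and l.toList.intersect(r.toList) compares whole lists rather than the numbers inside them. Flattening each side first is one way around that, assuming every value under a key should be pooled together.

```scala
object CogroupIntersectSketch {
  // Hypothetical local data mirroring the example records in the question.
  val key = "0-14"
  val firstList: List[Int]  = List(0, 4, 19, 19079, 42697, 444, 42748)
  val secondList: List[Int] = (1 to 94).toList

  // Analogue of firstmapRDD.intersection(secondmapRdd):
  // whole (key, list) pairs are compared, and the lists differ, so nothing matches.
  val wholePairs: Seq[(String, List[Int])] =
    Seq((key, firstList)).intersect(Seq((key, secondList)))

  // Analogue of one cogroup record: each side is a collection of lists.
  val left: Iterable[List[Int]]  = Seq(firstList)
  val right: Iterable[List[Int]] = Seq(secondList)

  // Intersecting the outer collections compares whole lists, so this is empty.
  val wholeLists: List[List[Int]] = left.toList.intersect(right.toList)

  // Flattening first compares the individual numbers.
  val elements: List[Int] = left.flatten.toList.intersect(right.flatten.toList)

  def main(args: Array[String]): Unit = {
    println(wholePairs) // List()
    println(wholeLists) // List()
    println(elements)   // List(4, 19)
  }
}
```

Applied to the cogroup attempt from the question, the same idea would read .map { case (k, (l, r)) => (k, l.flatten.toList.intersect(r.flatten.toList)) }. The join version above is simpler when each key appears only once per RDD, since join already hands you the two lists directly.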