I have an RDD of LabeledPoints. Is it possible to select a subset of it based on a list of indices?
For example, with idx=[0,4,5,6,8], I'd like to be able to get a new RDD containing elements 0, 4, 5, 6 and 8.
Note that I'm not interested in the random sampling methods that are available.
Answer 0 (score: 2)
Yes, you can either:
Choose 1 if the list of values is large, else 2.
Edit to show a code sample for case 1.
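// Assumes `sc` is the active SparkContext and `parsedData` is your RDD of points.
val filteringValues = sc.parallelize(Seq(0L, 4L, 5L, 6L, 8L)) // or read the list of values the same way you read your points, just easier
  .keyBy(identity) // key each value by itself
val filtered = parsedData
  .keyBy(_.something) // placeholder: get the number from your inner structure (same type as the keys above)
  .rightOuterJoin(filteringValues) // keeps only the keys in your subset; the point side becomes an Option
  .flatMap(x => x._2._1) // drop unmatched keys and map back to the original type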
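
If the elements carry no index of their own, one possible way to select by position is sketched below. It is not part of the answer above: it assumes that "element i" means the position assigned by Spark's `zipWithIndex`, and that the index list is small enough to broadcast. The names `SelectByIndex` and `selectByIndex` are made up for this example.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object SelectByIndex {
  // Hypothetical helper: keep only the elements whose zipWithIndex position is in `indices`.
  def selectByIndex[T: ClassTag](data: RDD[T], indices: Set[Long]): RDD[T] = {
    // Broadcast the (assumed small) index set so every task can look it up locally.
    val wanted = data.sparkContext.broadcast(indices)
    data.zipWithIndex()                                  // RDD[(T, Long)]
      .filter { case (_, i) => wanted.value.contains(i) } // keep the requested positions
      .map { case (elem, _) => elem }                     // drop the index again
  }
}
```

With the question's example this would be called as `SelectByIndex.selectByIndex(parsedData, Set(0L, 4L, 5L, 6L, 8L))`. For a very large index list, a join such as the one shown above is likely the better fit, since the set no longer has to fit in memory on every executor.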