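Finding the permutations of each subset of data

Time: 2016-11-23 08:33:32

Tags: apache-spark

I am trying to find the permutations of each row of a DataFrame, or of each element of an RDD.

Either: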

val df = Seq((1,2,3),(4,5,6)).toDF("A", "B","C")

Or:

val rdd = sc.parallelize(List((1,2,3),(4,5,6)))

Expected output:

(1,2,3),(1,3,2),(2,1,3)...

I have tried the following variations, but have had no luck so far:

df.map(row=>row.toSeq.permutations)

val rdd = sc.parallelize(List((1,4),(2,5),(3,6)))
rdd.map(x=>x._1.toSeq.permutations)

1 answer:

Answer 0: (score: 2)

Try:

val rdd = sc.parallelize(List((1,2,3),(4,5,6)))
// Turn each tuple into a List, enumerate its permutations, and map each one back to a tuple.
rdd.flatMap(_.productIterator.toList.permutations.collect {
  case List(x: Int, y: Int, z: Int) => (x, y, z)
})
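
The per-element logic can also be pulled out into a small function and checked on a plain Scala collection before handing it to Spark. The following is only a sketch of that idea: the name `permutationsOf` and the local `data` list are illustrative assumptions, not part of the snippet above, and the local `List` merely stands in for the RDD contents.

```scala
// Hypothetical helper (name chosen for illustration): all orderings of one 3-tuple.
def permutationsOf(t: (Int, Int, Int)): List[(Int, Int, Int)] =
  t.productIterator.toList   // List[Any] holding the tuple's three fields
    .permutations            // Iterator over the distinct orderings of that list
    .collect { case List(x: Int, y: Int, z: Int) => (x, y, z) }
    .toList

// Local stand-in for the RDD contents (assumption: same sample data as the question).
val data = List((1, 2, 3), (4, 5, 6))

// flatMap flattens the per-tuple results into a single list of 12 tuples.
val result = data.flatMap(permutationsOf)

println(result.take(3)) // prints List((1,2,3), (1,3,2), (2,1,3))
```

With Spark the same function should drop in as `rdd.flatMap(permutationsOf)`. Note that Scala's `permutations` only yields distinct orderings, so a tuple with repeated values such as `(1,1,2)` produces three results rather than six.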