How to merge RDD tuples

Date: 2017-05-23 03:47:07

Tags: scala apache-spark

I want to use reduceByKey to merge multiple tuples that share the same key. Here is the code:

import breeze.linalg.DenseMatrix
import scala.util.Random

val data = Array(DenseMatrix((2.0, 1.0, 5.0), (4.0, 3.0, 6.0)),
  DenseMatrix((7.0, 8.0, 9.0), (10.0, 12.0, 11.0)))

val init = sc.parallelize(data, 2)

// Map every entry of a matrix to (linearIndex, (randomTag, value)),
// then group the pairs by their linear index.
def getColumn(v: DenseMatrix[Double]): Map[Int, IndexedSeq[(Int, Double)]] = {
  val r = Random
  val index = 0 until v.size
  def func(x: Int, y: DenseMatrix[Double]): (Int, (Int, Double)) =
    (x, (r.nextInt(10), y.valueAt(x)))
  index.map(x => func(x, v)).groupBy(_._1).mapValues(_.map(_._2))
}

val out = init.flatMap(v => getColumn(v))
val reduceOutput = out.reduceByKey(_ ++ _)  // was: tmp.reduceByKey(_ ++ _), but tmp is undefined

val out2 = out.map { case (k, v) => k }.collect()  // the keys here are not what I want

Here are two pictures: the first shows the [key, value] pairs I expect, the second shows the actual keys, which are not what I want, so the output is wrong.

What should I do?
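For reference, the merge the question is after can be sketched outside Spark: on plain Scala collections, groupBy plus a reduce with ++ behaves like reduceByKey(_ ++ _), concatenating the IndexedSeq values of pairs that share a key. The sample pairs below are made up for illustration, not taken from the matrices above.

// Emulate reduceByKey(_ ++ _) on plain Scala collections:
// pairs with the same key get their IndexedSeq values concatenated.
val pairs: Seq[(Int, IndexedSeq[(Int, Double)])] = Seq(
  (0, IndexedSeq((3, 2.0))),   // hypothetical pair from the first matrix
  (1, IndexedSeq((5, 4.0))),
  (0, IndexedSeq((7, 7.0))),   // hypothetical pair from the second matrix
  (1, IndexedSeq((2, 10.0)))
)

val merged: Map[Int, IndexedSeq[(Int, Double)]] =
  pairs.groupBy(_._1).mapValues(_.map(_._2).reduce(_ ++ _)).toMap

// merged(0) == IndexedSeq((3, 2.0), (7, 7.0))
// merged(1) == IndexedSeq((5, 4.0), (2, 10.0))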


0 Answers:

There are no answers.