Group by a field - Scala vectors and arrays

Time: 2014-06-16 09:55:09

Tags: scala vector mahout

I have a collection of vectors in Scala:

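import scala.collection.mutable.ArrayBuffer
import org.apache.mahout.math.{ VectorWritable, Vector, DenseVector }
import org.apache.mahout.clustering.dirichlet.UncommonDistributions

// num is assumed to be defined elsewhere
val data = new ArrayBuffer[Vector]()
for (i <- 100 to num) {
  // Each vector has three fields: a bucket id and two normally distributed samples
  data += new DenseVector(Array[Double](
    i % 30,
    UncommonDistributions.rNorm(100, 100),
    UncommonDistributions.rNorm(100, 100)
  ))
}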

Suppose I want to take the first and third fields and group them by the first.

What is a good way to do this?

2 answers:

Answer 0 (score: 1)

I suggest using the groupBy method from the collections library:

http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Vector@groupBy[K](f:A=>K):scala.collection.immutable.Map[K,Repr]

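This creates a Map, keyed by the discriminator you specify, whose values are the collections of Vectors sharing that key.

Edit: a code example: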

// I used plain Scala Vectors here because I don't have the Mahout dependencies,
// but the shape of the result is similar.
// A List of Vectors with 3 values each
val num = 100
val data = (0 to num).toList.map(n => Vector(n % 30, n / 100, n * 100))

// groupBy creates a Map whose key is the result of the function
// and whose value is the List of Vectors that produced that key.
// Here the function returns the first value of each Vector.
val group = data.groupBy(v => v(0))

// A subset of the result:
// group:
// scala.collection.immutable.Map[Int,List[scala.collection.immutable.Vector[Int]]] = Map(0 -> List(Vector(0, 0, 0), Vector(0, 0, 3000), Vector(0, 0, 6000), Vector(0, 0, 9000)), 5 -> List(Vector(5, 0, 500), Vector(5, 0, 3500), Vector(5, 0, 6500), Vector(5, 0, 9500)))
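
The question asks for only the first and third fields of each row. As a minimal sketch on the same stand-in data (plain Scala Vectors, not Mahout; the object name `GroupFirstAndThird` is made up for illustration), each group can be mapped down to (first, third) pairs:

```scala
object GroupFirstAndThird {
  // Same stand-in data as above: plain Scala Vectors, no Mahout dependency.
  val data: List[Vector[Int]] =
    (0 to 100).toList.map(n => Vector(n % 30, n / 100, n * 100))

  // Group by the first field, then keep only (first, third) from each row.
  val firstAndThird: Map[Int, List[(Int, Int)]] =
    data.groupBy(v => v(0)).map { case (key, rows) =>
      key -> rows.map(v => (v(0), v(2)))
    }

  def main(args: Array[String]): Unit =
    println(firstAndThird(5)) // List((5,500), (5,3500), (5,6500), (5,9500))
}
```

groupBy keeps the rows of each group in their original order, which is why the pairs for key 5 come out in ascending order of the third field.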

Answer 1 (score: 0)

Use the groupBy function on the list, then map over each group to sum its third field - just one line of code:

 data.groupBy(_(0)).map { case (k, v) => k -> v.map(_(2)).sum }
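
To make the one-liner concrete, here is a small self-contained sketch. The sample rows and the name `SumThirdByFirst` are hypothetical, and it uses plain Scala Vectors of Double rather than Mahout vectors. With Mahout's Vector, element access would presumably go through `get(i)` instead of `v(i)`; that variant is an assumption and is not tested here.

```scala
object SumThirdByFirst {
  // Hypothetical sample rows with three fields each, as in the question.
  val data: List[Vector[Double]] = List(
    Vector(1.0, 10.0, 100.0),
    Vector(2.0, 20.0, 200.0),
    Vector(1.0, 30.0, 300.0)
  )

  // Group rows by the first field, then sum the third field within each group.
  val sums: Map[Double, Double] =
    data.groupBy(_(0)).map { case (k, rows) => k -> rows.map(_(2)).sum }

  def main(args: Array[String]): Unit =
    println(sums(1.0)) // 400.0
}
```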