Question

I have a collection of vectors in Scala:

import scala.collection.mutable.ArrayBuffer
import org.apache.mahout.math.{ VectorWritable, Vector, DenseVector }
import org.apache.mahout.clustering.dirichlet.UncommonDistributions

// num is defined elsewhere
val data = new ArrayBuffer[Vector]()
for (i <- 100 to num) {
  data += new DenseVector(Array[Double](
    i % 30,
    UncommonDistributions.rNorm(100, 100),
    UncommonDistributions.rNorm(100, 100)
  ))
}

Suppose I want to group by the first field and sum the third field.

What is a good way to do this?

Answer 1

I suggest using the groupBy method from the collections library:

http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Vector@groupBy[K](f:A=>K):scala.collection.immutable.Map[K,Repr]

This creates a Map of Vectors, keyed by the discriminator function you specify.

Edit: some example code:

// I used a List of Scala Vectors instead, as I don't have the Mahout dependencies,
// but the output is similar.
// A List of Vectors with 3 values each
val num = 100
val data = (0 to num).toList.map(n => {
  Vector(n % 30, n / 100, n * 100)
})

// groupBy creates a Map of Lists of Vectors, keyed by the result of the function.
// Here, the function returns the first value of each Vector.
val group = data.groupBy(v => v(0))

// A subset of the result:
// group:
// scala.collection.immutable.Map[Int,List[scala.collection.immutable.Vector[Int]]] = Map(0 -> List(Vector(0, 0, 0), Vector(0, 0, 3000), Vector(0, 0, 6000), Vector(0, 0, 9000)), 5 -> List(Vector(5, 0, 500), Vector(5, 0, 3500), Vector(5, 0, 6500), Vector(5, 0, 9500)))
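
As a rough follow-up sketch, the grouped lists can then be projected down to only the field of interest. The block rebuilds the same stand-in data so that it is self-contained; the names `GroupProjectExample` and `thirdByFirst` are hypothetical and only used for illustration.

```scala
object GroupProjectExample {
  val num = 100

  // Same stand-in data as above: a List of Scala Vectors with 3 values each.
  val data: List[Vector[Int]] =
    (0 to num).toList.map(n => Vector(n % 30, n / 100, n * 100))

  // Group the vectors by their first field.
  val group: Map[Int, List[Vector[Int]]] = data.groupBy(v => v(0))

  // For each key, keep only the third field of every vector in that group.
  val thirdByFirst: Map[Int, List[Int]] =
    group.map { case (k, vs) => k -> vs.map(v => v(2)) }

  def main(args: Array[String]): Unit =
    println(thirdByFirst(0)) // prints List(0, 3000, 6000, 9000)
}
```

From here any aggregation (sum, average, count) can be applied to each list.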

Answer 2

Use the groupBy function on the list, then map over each group. It is a single line of code:

 data.groupBy(_(0)).map { case (k, v) => k -> v.map(_(2)).sum }
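
To make the one-liner concrete, here is a minimal sketch on a small hand-written data set, assuming plain Scala `Vector[Double]` values rather than Mahout vectors. The names `GroupSumExample`, `sums`, and `sums2` are hypothetical. With Mahout's `Vector`, elements are read with `get(i)` rather than `apply`, so the accessors would presumably need to be adjusted accordingly.

```scala
object GroupSumExample {
  // Small stand-in data set: each Vector holds (key, ignored, value).
  val data: List[Vector[Double]] = List(
    Vector(0.0, 1.0, 10.0),
    Vector(1.0, 2.0, 20.0),
    Vector(0.0, 3.0, 30.0),
    Vector(1.0, 4.0, 40.0),
    Vector(2.0, 5.0, 50.0)
  )

  // Group by the first field, then sum the third field of each group.
  val sums: Map[Double, Double] =
    data.groupBy(_(0)).map { case (k, vs) => k -> vs.map(_(2)).sum }

  // Same result in one pass; groupMapReduce requires Scala 2.13 or later.
  val sums2: Map[Double, Double] =
    data.groupMapReduce(_(0))(_(2))(_ + _)

  def main(args: Array[String]): Unit =
    println(sums.toList.sortBy(_._1)) // prints List((0.0,40.0), (1.0,60.0), (2.0,50.0))
}
```

The `groupMapReduce` variant avoids building the intermediate lists, which may matter for large inputs.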

Group by a field - Vector and Array in Scala
