我在scala中有一组向量:
import org.apache.mahout.math.{ VectorWritable, Vector, DenseVector }
import org.apache.mahout.clustering.dirichlet.UncommonDistributions
val data = new ArrayBuffer[Vector]()
for (i <- 100 to num) {
data += new DenseVector(Array[Double](
i % 30,
UncommonDistributions.rNorm(100, 100),
UncommonDistributions.rNorm(100, 100)
)
}
假设我想将第一和第三个字段按第一行分组。
有什么更好的方法呢?
答案 0 :(得分:1)
我建议使用集合中的 groupBy 方法:
这将根据您指定的鉴别器创建一个Vector of Map。
编辑:一些代码示例:
// I created a different Array of Vector as I don't have Mahout dependencies
// But the output is similar
// A List of Vectors with 3 values inside
val num = 100
val data = (0 to num).toList.map(n => {
Vector(n % 30, n / 100, n * 100)
})
// The groupBy will create a Map of Vectors where the Key is the result of the function
// And here, the function return the first value of the Vector
val group = data.groupBy(v => { v.apply(0) })
// Also a subset of the result:
// group:
// scala.collection.immutable.Map[Int,List[scala.collection.immutable.Vector[Int]]] = Map(0 -> List(Vector(0, 0, 0), Vector(0, 0, 3000), Vector(0, 0, 6000), Vector(0, 0, 9000)), 5 -> List(Vector(5, 0, 500), Vector(5, 0, 3500), Vector(5, 0, 6500), Vector(5, 0, 9500)))
答案 1 :(得分:0)
在列表中使用groupBy函数,然后映射每个组 - 只需一行代码:
data groupBy (_(0)) map { case (k,v) => k -> (v map (_(2)) sum) }