How do I sum each column of a Scala array?

Time: 2015-10-01 03:07:00

Tags: arrays scala

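If I have an array of arrays (similar to a matrix) in Scala, what is an efficient way to sum each column of the matrix? For example, if my array of arrays looks like this: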

val arr =  Array(Array(1, 100, ...), Array(2, 200, ...), Array(3, 300, ...))

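I would like to sum each column (that is, sum the first elements of all sub-arrays, sum the second elements of all sub-arrays, and so on) and get a new array like this: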

newArr = Array(6, 600, ...)

How can I do this efficiently in Spark Scala?

4 answers:

Answer 0 (score: 5)

List has a suitable .transpose method that can help here, although I can't say how efficient it is:

arr.toList.transpose.map(_.sum)

(Call .toArray if you specifically need the result as an array.)
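
The transpose approach can be sketched on the sample data from the question. This is only an illustration under the assumption that every row has the same length; it calls transpose on the Array directly, which skips the toList step, and the name colSums is made up for the example.

```scala
// Sample matrix from the question: 3 rows, 2 columns.
val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300))

// transpose turns the rows into columns; each column is then summed.
// Assumption: all rows have equal length.
val colSums: Array[Int] = arr.transpose.map(_.sum)
// colSums holds 6 and 600
```

Note that transpose materializes a second copy of the matrix, which is worth keeping in mind for large inputs.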

Answer 1 (score: 4)

Using a breeze Vector:

scala> val arr =  Array(Array(1, 100), Array(2, 200), Array(3, 300))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))

scala> arr.map(breeze.linalg.Vector(_)).reduce(_ + _)
res0: breeze.linalg.Vector[Int] = DenseVector(6, 600)

If your input is sparse, you may consider using breeze.linalg.SparseVector.

Answer 2 (score: 4)

In practice, a linear algebra vector library, as mentioned by @zero323, will often be the better choice.

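If you can't use a vector library, I suggest writing a function col2sum that sums two rows column by column, even when their lengths differ, and then using Array.reduce to extend that operation to N rows. Using reduce is valid here because the sum does not depend on the order of operations (for example, 1 + 2 + 3 == 3 + 2 + 1 == 3 + 1 + 2 == 6):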

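// Element-wise sum of two rows; the shorter row is padded with zeros.
def col2sum(x: Array[Int], y: Array[Int]): Array[Int] =
  x.zipAll(y, 0, 0).map { case (a, b) => a + b }

// Column sums of a (possibly ragged) array of rows.
// Note: reduce throws on an empty outer array.
def colsum(a: Array[Array[Int]]): Array[Int] =
  a.reduce(col2sum)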

val z = Array(Array(1, 2, 3, 4, 5), Array(2, 4, 6, 8, 10), Array(1, 9));

colsum(z)

--> Array[Int] = Array(4, 15, 9, 12, 15)
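
One detail worth noting: reduce throws an exception when the outer array is empty. A minimal sketch of a variant that tolerates empty input uses foldLeft with an empty starting row. The name colsumSafe is hypothetical and not part of the original answer; col2sum is repeated so the snippet runs on its own.

```scala
// Element-wise sum of two rows; the shorter row is padded with zeros.
def col2sum(x: Array[Int], y: Array[Int]): Array[Int] =
  x.zipAll(y, 0, 0).map { case (a, b) => a + b }

// Hypothetical variant of colsum: foldLeft starts from an empty row,
// so an empty outer array yields an empty result instead of throwing.
def colsumSafe(a: Array[Array[Int]]): Array[Int] =
  a.foldLeft(Array.empty[Int])(col2sum)

val z = Array(Array(1, 2, 3, 4, 5), Array(2, 4, 6, 8, 10), Array(1, 9))
val sums = colsumSafe(z)                       // same column sums as colsum(z)
val none = colsumSafe(Array.empty[Array[Int]]) // empty array, no exception
```

The empty starting row works as a neutral element because zipAll pads it with zeros up to the length of the other row.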

Answer 3 (score: 0)

scala> val arr =  Array(Array(1, 100), Array(2, 200), Array(3, 300 ))
arr: Array[Array[Int]] = Array(Array(1, 100), Array(2, 200), Array(3, 300))

scala> arr.flatten.zipWithIndex.groupBy(c => (c._2 + 1) % 2)
       .map(a => a._1 -> a._2.foldLeft(0)((sum, i) => sum + i._1))

res40: scala.collection.immutable.Map[Int,Int] = Map(1 -> 6, 0 -> 600)

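Flatten the array, use zipWithIndex to attach an index to each element, use groupBy to group the elements by column, and use foldLeft to sum each column group.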
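
The snippet above hard-codes the column count (2), and its map keys come from (index + 1) % 2 rather than from the column index itself. A possible generalization, assuming a non-empty rectangular matrix, derives the column count from the first row and keys each group by its real column index. The names nCols, sums and newArr are illustrative only.

```scala
val arr = Array(Array(1, 100), Array(2, 200), Array(3, 300))

// Assumption: arr is non-empty and every row has the same length.
val nCols = arr.head.length

// After flattening, the element at flat index i belongs to column i % nCols.
val sums: Map[Int, Int] = arr.flatten.zipWithIndex
  .groupBy { case (_, i) => i % nCols }
  .map { case (col, cells) => col -> cells.map(_._1).sum }

// Read the sums back in column order to get an array again.
val newArr: Array[Int] = Array.tabulate(nCols)(i => sums(i))
```

Going through the map and then Array.tabulate restores column order, which a Map does not guarantee by itself.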