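Can someone help me understand how to use the foreachActive function that was introduced for Vectors?
I am trying to understand how it is used in the MultivariateOnlineSummarizer class to compute summary statistics.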
sample.foreachActive { (index, value) =>
  if (value != 0.0) {
    if (currMax(index) < value) {
      currMax(index) = value
    }
    if (currMin(index) > value) {
      currMin(index) = value
    }
    val prevMean = currMean(index)
    val diff = value - prevMean
    currMean(index) = prevMean + diff / (nnz(index) + 1.0)
    currM2n(index) += (value - currMean(index)) * diff
    currM2(index) += value * value
    currL1(index) += math.abs(value)
    nnz(index) += 1.0
  }
}
Answer 0 (score: 0)
There are two kinds of vectors in Spark: DenseVector and SparseVector.
For a DenseVector, all elements are active, so foreachActive effectively becomes foreach:
private[spark] override def foreachActive(f: (Int, Double) => Unit) = {
  var i = 0
  val localValuesSize = values.size
  val localValues = values
  while (i < localValuesSize) {
    f(i, localValues(i))
    i += 1
  }
}
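A SparseVector can have inactive elements (positions that are not stored). With a plain foreach you would have to skip them manually; foreachActive does that for you by iterating only over the stored indices and values under the hood: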
private[spark] override def foreachActive(f: (Int, Double) => Unit) = {
  var i = 0
  val localValuesSize = values.size
  val localIndices = indices
  val localValues = values
  while (i < localValuesSize) {
    f(localIndices(i), localValues(i))
    i += 1
  }
}
So foreachActive is effectively a foreach for Vectors that visits only the active elements, regardless of the Vector implementation.
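To make the difference concrete, here is a minimal sketch that runs without Spark. Vec, DenseVec, SparseVec and ForeachActiveDemo are hypothetical stand-ins I made up for illustration, not Spark's actual classes; they only copy the two loops shown above. The nonZeroCounts helper mimics just the nnz bookkeeping from the summarizer snippet in the question, not the full statistics update.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-ins for Spark's vector types (illustration only).
sealed trait Vec {
  def size: Int
  def foreachActive(f: (Int, Double) => Unit): Unit
}

// Dense: every position is stored, so every position is "active".
final case class DenseVec(values: Array[Double]) extends Vec {
  def size: Int = values.length
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      f(i, values(i))
      i += 1
    }
  }
}

// Sparse: only the positions listed in `indices` are stored and visited.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      f(indices(i), values(i))
      i += 1
    }
  }
}

object ForeachActiveDemo {
  // Collect the (index, value) pairs that foreachActive visits, in order.
  def active(v: Vec): List[(Int, Double)] = {
    val buf = ListBuffer.empty[(Int, Double)]
    v.foreachActive { (i, x) =>
      buf += ((i, x))
      ()
    }
    buf.toList
  }

  // Count non-zero values per index across samples, in the spirit of
  // the nnz array in the question's snippet. n is the vector length.
  def nonZeroCounts(samples: Seq[Vec], n: Int): Array[Double] = {
    val nnz = Array.fill(n)(0.0)
    for (sample <- samples) {
      sample.foreachActive { (index, value) =>
        if (value != 0.0) nnz(index) += 1.0
      }
    }
    nnz
  }

  def main(args: Array[String]): Unit = {
    val dense = DenseVec(Array(1.0, 0.0, 3.0))
    val sparse = SparseVec(3, Array(0, 2), Array(1.0, 3.0))
    println(active(dense))   // visits indices 0, 1 and 2, including the 0.0
    println(active(sparse))  // visits only the stored indices 0 and 2
    println(nonZeroCounts(Seq(dense, sparse), 3).toList)
  }
}
```

Note that the dense vector still hands the 0.0 at index 1 to the callback, and a sparse vector can also store an explicit zero. That is presumably why the summarizer code in the question still checks value != 0.0 inside foreachActive.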