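I am trying to use Flink to learn the weights of an online linear classifier. I start with a weight vector initialized to zeros, and I want to update these weights for each new instance. I have been reading the code of Flink's GradientDescent and MultipleLinearRegression to find the best way to do this.
What I have tried so far does not work: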
val dimensionsDS = input.map(_.vector.size).reduce((_, b) => b)
val initialWeights = dimensionsDS.map {
  dimension =>
    val values = Array.fill(dimension)(0.0)
    WeightVector(DenseVector(values), .0)
}
val finalWeights = initialWeights.iterate(1) { weightVectorDS =>
  input.mapWithBcVariable(weightVectorDS) { (data, wv) =>
    import Breeze._
    val vector = data.vector.asBreeze
    val w = wv.weights.asBreeze
    val pred = vector dot w
    if (pred * data.label <= 1) {
      vector :*= eta * data.label
      val wt = w + vector
      wt :*= math.min(1.0, 1 / (math.sqrt(lambda) * norm(wt)))
      // Truncate
      if (wt.toArray.count(_ != 0) > nFeatures) {
        val topN = wt.toArray.zipWithIndex.sortBy(-_._1)
        for (i <- nFeatures + 1 until wt.size)
          wt(topN(i)._2) = 0
        WeightVector(DenseVector(wt.toArray), 0.0)
      } else wv
    } else {
      wv
    }
  }
}
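The problem with this code is that the initial weights wv are always the same: they are not updated for each new instance processed by the map function. This makes sense, because mapWithBcVariable only broadcasts the variable to all nodes and does not update it.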
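To make the difference concrete, here is a minimal sketch in plain Scala (no Flink involved). It contrasts mapping every instance against one fixed, broadcast-like value with folding over the instances so that each one sees the weights left by the previous one. The names Instance, update, perInstance and folded, the value of eta, and the simplified update rule are all assumptions made for illustration; they are not part of the Flink API or of the code above.

```scala
// Hypothetical stand-in for a labeled instance: plain Scala vectors instead of Flink ML types.
final case class Instance(features: Vector[Double], label: Double)

object BroadcastVsFold {
  val eta = 0.5 // assumed learning rate, for illustration only

  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // One simplified online step: move the weights toward the instance
  // when the margin condition pred * label <= 1 holds.
  def update(w: Vector[Double], d: Instance): Vector[Double] =
    if (dot(w, d.features) * d.label <= 1)
      w.zip(d.features).map { case (wi, xi) => wi + eta * d.label * xi }
    else w

  val data: Seq[Instance] = Seq(
    Instance(Vector(1.0, 0.0), 1.0),
    Instance(Vector(0.0, 1.0), -1.0),
    Instance(Vector(1.0, 1.0), 1.0)
  )
  val w0: Vector[Double] = Vector(0.0, 0.0)

  // Broadcast-like behaviour: every instance is combined with the same initial weights.
  val perInstance: Seq[Vector[Double]] = data.map(d => update(w0, d))

  // Sequential behaviour: each instance sees the weights produced by the previous one.
  val folded: Vector[Double] = data.foldLeft(w0)(update)
}
```

With this toy data, perInstance holds three independent one-step updates that all start from zero, while folded is the single weight vector Vector(1.0, 0.0) obtained by threading the state through all three instances. The second behaviour is the one the question is after.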
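The simple approach would be to make the initial weights mutable, but I don't think that is a good idea.
Does anyone know how to update the weights for each input instance?
Here is the mutable version: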
val values = Array.fill(dimensionsDS)(0.0)
var w0 = WeightVector(DenseVector(values), .0).weights.asBreeze
val finalWeights = input.map { data =>
  val vector = data.vector.asBreeze
  val pred = vector dot w0
  if (pred * data.label <= 1) {
    vector :*= eta * data.label
    w0 = w0 + vector
    w0 :*= math.min(1.0, 1 / (math.sqrt(lambda) * norm(w0)))
    // Truncate
    if (w0.toArray.count(_ != 0) > nFeatures) {
      val topN = w0.toArray.zipWithIndex.sortBy(-_._1)
      for (i <- nFeatures + 1 until w0.size)
        w0(topN(i)._2) = 0
    }
  }
  w0
}
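For comparison, the same update step can be sketched as a pure function in plain Scala, with no var and no Flink or Breeze dependency. This is only an illustration under stated assumptions. The object PureUpdate, its helpers, and the values of eta, lambda and nFeatures are hypothetical. It also deliberately differs from the code above in two ways: it ranks entries by absolute value, and it keeps exactly nFeatures entries. One reason to prefer a pure step is that a var captured by a Flink closure is serialized with the function and shipped to each parallel task, so each task would update its own copy rather than one shared vector.

```scala
object PureUpdate {
  // Assumed hyperparameters, chosen only so the arithmetic is easy to follow.
  val eta = 0.5
  val lambda = 0.25
  val nFeatures = 1

  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def norm(a: Vector[Double]): Double = math.sqrt(dot(a, a))

  // Keep the n entries with the largest absolute value and zero out the rest.
  def truncate(w: Vector[Double], n: Int): Vector[Double] = {
    val keep = w.zipWithIndex
      .sortBy { case (v, _) => -math.abs(v) }
      .take(n)
      .map(_._2)
      .toSet
    w.zipWithIndex.map { case (v, i) => if (keep(i)) v else 0.0 }
  }

  // One step: gradient move, rescale so the norm is at most 1 / sqrt(lambda), then truncate.
  // Returns a new vector and never mutates its arguments.
  def step(w: Vector[Double], x: Vector[Double], label: Double): Vector[Double] =
    if (dot(x, w) * label <= 1) {
      val moved = w.zip(x).map { case (wi, xi) => wi + eta * label * xi }
      val n = norm(moved)
      // Guard against division by zero when the moved vector is all zeros.
      val scale = if (n == 0.0) 1.0 else math.min(1.0, 1 / (math.sqrt(lambda) * n))
      truncate(moved.map(_ * scale), nFeatures)
    } else w
}
```

A sequence of instances could then be processed with something like instances.foldLeft(w0) { (w, d) => PureUpdate.step(w, d.features, d.label) }, where instances and w0 are again placeholder names. How to express that sequential dependency efficiently on a distributed DataSet is exactly the open part of the question.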