Gradient of the SVM loss function

Date: 2016-07-26 04:52:37

Tags: python computer-vision svm linear-regression gradient-descent

I am working through this course on convolutional neural networks. I have been trying to implement the gradient of the loss function for an SVM, and (although I have a copy of a solution) I am struggling to understand why the solution is correct.

On this page, the gradient of the loss function is defined as follows: [image: class course notes of cs231n]. In my code, my analytic gradient matches the numeric gradient when implemented as in the snippet below.
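For reference, the formulas in that image should be the standard CS231n expressions for the multiclass SVM gradient (my reconstruction, since the image did not survive; Δ is the margin and 𝟙 the indicator function):

    \nabla_{w_{y_i}} L_i = -\left( \sum_{j \neq y_i} \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) \right) x_i

    \nabla_{w_j} L_i = \mathbb{1}\left( w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \right) x_i \qquad (j \neq y_i)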

    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                continue  # skip the correct class
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                dW[:, y[i]] += -X[i]  # pull the correct class weights toward x_i
                dW[:, j] += X[i]      # gradient update for incorrect rows
                loss += margin
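One way to verify this is the centered-difference numerical check that the assignment uses. Here is a minimal sketch, assuming a hypothetical wrapper svm_loss(W, X, y) that returns (loss, dW) via the loop above:

    import numpy as np

    def numerical_gradient(f, W, h=1e-5):
        # centered finite differences: (f(W+h) - f(W-h)) / (2h), entry by entry
        grad = np.zeros_like(W)
        it = np.nditer(W, flags=['multi_index'])
        while not it.finished:
            ix = it.multi_index
            old = W[ix]
            W[ix] = old + h
            fxph = f(W)
            W[ix] = old - h
            fxmh = f(W)
            W[ix] = old  # restore the original entry
            grad[ix] = (fxph - fxmh) / (2.0 * h)
            it.iternext()
        return grad

    # hypothetical usage:
    # loss, dW = svm_loss(W, X, y)
    # num_grad = numerical_gradient(lambda w: svm_loss(w, X, y)[0], W)
    # print(np.max(np.abs(num_grad - dW)))  # should be tiny, e.g. < 1e-6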

However, from the notes it seems to me that dW[:, y[i]] should be changed each time j == y[i], since that is the term where we subtract for the correct class. I am confused why the code isn't:

    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                if margin > 0:
                    dW[:, y[i]] += -X[i]
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                dW[:, j] += X[i]  # gradient update for incorrect rows
                loss += margin

so that dW[:, y[i]] would only change when j == y[i], while the loss changes when j != y[i]. Why are both updates computed when j != y[i]?

1 Answer:

Answer 0: (score: 5)

I do not have enough reputation to comment, so I am answering here. Whenever you compute the loss for the i-th training example x[i] and get some nonzero loss, it means you should move the weight vector of the incorrect class (j != y[i]) away from x[i], and at the same time move the weights, i.e. the hyperplane, of the correct class (j == y[i]) closer to x[i]. By the parallelogram law, w + x lies between w and x, so w[y[i]] moves closer to x[i] every time a loss > 0 is found.
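A tiny numeric sketch of that parallelogram intuition (my own toy example, not part of the original answer): adding x to w rotates w toward x, which shows up in the cosine similarity:

    import numpy as np

    def cosine(a, b):
        return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

    w = np.array([1.0, 0.0])  # current weight vector of the correct class
    x = np.array([0.0, 2.0])  # a training example it scores poorly on

    print(cosine(w, x))       # 0.0   -> w is orthogonal to x
    print(cosine(w + x, x))   # ~0.89 -> w + x points much more toward x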

Thus, dW[:, y[i]] += -X[i] and dW[:, j] += X[i] are accumulated inside the loop, but at update time we step in the direction of decreasing gradient, so we are effectively adding X[i] to the correct class's weights and moving away by X[i] from the weights that misclassify.
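To make the sign bookkeeping concrete, here is a minimal sketch of that descent step (learning_rate is a hypothetical value picked for illustration; W and dW are the arrays from the question):

    learning_rate = 1e-3     # hypothetical step size
    W -= learning_rate * dW  # step against the gradient
    # dW[:, y[i]] accumulated -X[i], so this step ADDS learning_rate * X[i]
    # to the correct class column; columns j with violated margins
    # accumulated +X[i], so the step moves them AWAY from X[i].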