SVM loss gradient update

Date: 2018-08-19 09:13:06

Tags: numpy machine-learning deep-learning svm

I'm having some trouble understanding part of the cs231n HW1 code for the SVM loss. The per-class loss term is scores[j] - correct_class_score + 1. When we take the gradient with respect to the correct class score, do we only update the column for the correct class? And why is that gradient

dW[:, y[i]] = dW[:, y[i]] - X[i, :]*num_classes_greater_margin  # and not
dW[:, y[i]] = dW[:, y[i]] - X[i, :]

The same question applies to the incorrect scores. Why does he multiply by num_classes_greater_margin?
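
If I try to write out the math myself (this is my own working, not something stated in the assignment, so I may be reading it wrong), I get:

L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + 1\right), \qquad s = W^\top x_i

\frac{\partial L_i}{\partial w_j} = \mathbf{1}\!\left[s_j - s_{y_i} + 1 > 0\right]\, x_i \quad (j \neq y_i), \qquad \frac{\partial L_i}{\partial w_{y_i}} = -\left(\sum_{j \neq y_i} \mathbf{1}\!\left[s_j - s_{y_i} + 1 > 0\right]\right) x_i

which would mean the correct-class column picks up one -x_i term per class that violates the margin, i.e. the sum is what num_classes_greater_margin counts in the code below. Is that the right way to read it?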

import numpy as np

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).
  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.
  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength
  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """

  # Initialize loss and the gradient of W to zero.
  dW = np.zeros(W.shape)
  loss = 0.0
  num_classes = W.shape[1]
  num_train = X.shape[0]

  # Compute the data loss and the gradient.
  for i in range(num_train):  # For each image in training.
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    num_classes_greater_margin = 0

    for j in range(num_classes):  # For each calculated class score for this image.

      # Skip the image's target class; no loss is computed for that case.
      if j == y[i]:
        continue

      # Calculate our margin, delta = 1
      margin = scores[j] - correct_class_score + 1

      # Only calculate loss and gradient if margin condition is violated.
      if margin > 0:
        num_classes_greater_margin += 1
        # Gradient contribution for the incorrect class's weight column.
        dW[:, j] = dW[:, j] + X[i, :]
        loss += margin

    # Gradient for correct class weight.
    dW[:, y[i]] = dW[:, y[i]] - X[i, :]*num_classes_greater_margin

  # Average our data loss across the batch.
  loss /= num_train

  # Add regularization loss to the data loss.
  loss += reg * np.sum(W * W)

  # Average our gradient across the batch and add gradient of regularization term.
  dW = dW / num_train + 2 * reg * W
  return loss, dW
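
In case it helps, this is the small numerical gradient check I have been running to compare the analytic dW against finite differences (the sizes, seed, and random data here are just placeholders I picked, not part of the assignment):

import numpy as np

# Tiny synthetic problem: D features, C classes, N examples (made-up sizes).
np.random.seed(0)
D, C, N = 5, 3, 4
W = np.random.randn(D, C) * 0.01
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
reg = 0.1

loss, dW = svm_loss_naive(W, X, y, reg)

# Finite-difference check on a handful of randomly chosen entries of W.
h = 1e-5
for _ in range(5):
    i, j = np.random.randint(D), np.random.randint(C)
    old = W[i, j]
    W[i, j] = old + h
    loss_plus, _ = svm_loss_naive(W, X, y, reg)
    W[i, j] = old - h
    loss_minus, _ = svm_loss_naive(W, X, y, reg)
    W[i, j] = old
    numeric = (loss_plus - loss_minus) / (2 * h)
    print((i, j), 'analytic:', dW[i, j], 'numeric:', numeric)

The analytic and numeric values agree for me, so I assume the code is correct; I just don't follow the reasoning behind the multiplication.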

0 Answers:

There are no answers yet.