Gradient computation

Posted: 2020-02-08 12:05:20

Tags: python deep-learning gradient

This is from assignment 1 of the CS231n course.

After computing the loss, I am asked to implement the gradient on scores, where scores is a matrix with N rows (the number of examples) and C columns (the number of classes).

Here is the loss computation:

import numpy as np

# forward pass: affine -> ReLU -> affine
z1 = X.dot(W1) + b1
a1 = np.maximum(0, z1)       # pass through ReLU activation function
scores = a1.dot(W2) + b2     # raw class scores, shape [N x C]

# compute the class probabilities with a softmax
exp_scores = np.exp(scores)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x C]

# average cross-entropy loss and L2 regularization
correct_logprobs = -np.log(probs[range(N), y])   # log-probabilities of the true classes
data_loss = np.sum(correct_logprobs) / N
reg_loss = 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)
loss = data_loss + reg_loss
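
For anyone who wants to run the snippet in isolation, here is a minimal toy setup that replays the same forward pass on concrete data. The shapes, the hidden size H = 10, the regularization strength, and the random inputs are my own assumptions for illustration, not part of the assignment:

import numpy as np

# hypothetical toy dimensions and data, just to make the snippet executable
N, D, H, C = 5, 4, 10, 3          # examples, input dim, hidden units, classes
reg = 1e-3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)    # integer class labels in [0, C)
W1 = 0.01 * rng.standard_normal((D, H)); b1 = np.zeros(H)
W2 = 0.01 * rng.standard_normal((H, C)); b2 = np.zeros(C)

# same forward pass and loss as in the question
z1 = X.dot(W1) + b1
a1 = np.maximum(0, z1)
scores = a1.dot(W2) + b2
exp_scores = np.exp(scores)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
loss = -np.log(probs[range(N), y]).sum() / N \
       + 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
print(loss)   # roughly log(C) ≈ 1.10, since near-zero scores give almost uniform probabilities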

Here is the gradient computation (not mine, but it looks similar everywhere):

#############################################################################
# TODO: Compute the backward pass, computing the derivatives of the weights #
# and biases. Store the results in the grads dictionary. For example,       #
# grads['W1'] should store the gradient on W1, and be a matrix of same size #
#############################################################################
# compute the gradient on scores
dscores = probs
dscores[range(N),y] -= 1      # The line I don't understand
dscores /= N

# W2 and b2
grads = {}   # the assignment skeleton defines this dict earlier; added here so the snippet runs on its own
grads['W2'] = np.dot(a1.T, dscores)
grads['b2'] = np.sum(dscores, axis=0)
# next backprop into hidden layer
dhidden = np.dot(dscores, W2.T)
# backprop the ReLU non-linearity
dhidden[a1 <= 0] = 0
# finally into W,b
grads['W1'] = np.dot(X.T, dhidden)
grads['b1'] = np.sum(dhidden, axis=0)
# add regularization gradient contribution
grads['W2'] += reg * W2
grads['W1'] += reg * W1
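
Not part of the assignment, but a quick way to convince yourself that the dscores lines really are the derivative is to compare them with a numerical gradient of the data loss with respect to scores. The sketch below reuses the toy scores, y, and N from the example above; softmax_data_loss and the loop are my own hypothetical helpers, and probs is recomputed from scores because the snippet above modifies probs in place (dscores is the same array):

def softmax_data_loss(s, y):
    # average cross-entropy of a softmax over raw scores s, without regularization
    shifted = s - s.max(axis=1, keepdims=True)      # shift scores for numerical stability
    p = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return -np.log(p[range(len(y)), y]).mean()

# analytic gradient, rebuilt from scores: probs with 1 subtracted at the true class, averaged over N
dscores_analytic = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
dscores_analytic[range(N), y] -= 1
dscores_analytic /= N

# numerical gradient: nudge each score entry and watch how the data loss responds
h = 1e-5
dscores_numeric = np.zeros_like(scores)
for i in range(N):
    for j in range(scores.shape[1]):
        up, down = scores.copy(), scores.copy()
        up[i, j] += h
        down[i, j] -= h
        dscores_numeric[i, j] = (softmax_data_loss(up, y) - softmax_data_loss(down, y)) / (2 * h)

print(np.max(np.abs(dscores_analytic - dscores_numeric)))   # should be tiny, on the order of 1e-8 or less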

My question is: why should I decrement dscores this way? Why is that the derivative?

1 Answer:

Answer 0 (score: 0)

I am by no means an expert.

But I think that line is basically decrementing the gradient only at the correct class, making it more negative there, so that when you update the weights it pushes the classifier toward predicting the correct class.
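
A quick sketch of the standard softmax / cross-entropy derivation makes this precise, using the same probs and loss defined in the question. For a single example $i$ with scores $s_{i,\cdot}$ and true class $y_i$:

\[
L_i = -\log p_{i,y_i},
\qquad
p_{i,j} = \frac{e^{s_{i,j}}}{\sum_k e^{s_{i,k}}},
\]
\[
\frac{\partial L_i}{\partial s_{i,j}}
= -\frac{1}{p_{i,y_i}} \cdot \frac{\partial p_{i,y_i}}{\partial s_{i,j}}
= -\frac{1}{p_{i,y_i}} \cdot p_{i,y_i}\left(\mathbf{1}[j = y_i] - p_{i,j}\right)
= p_{i,j} - \mathbf{1}[j = y_i].
\]

So the gradient of the data loss with respect to the scores is exactly probs with 1 subtracted at the correct class of each row, and the later division by N accounts for averaging the loss over the N examples; that is what dscores[range(N), y] -= 1 and dscores /= N implement.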

I might be wrong.