Linear regression with gradient descent: two questions

Time: 2020-06-15 14:37:44

Tags: python machine-learning scikit-learn linear-regression gradient-descent

I'm trying to understand linear regression with gradient descent, and there is a part of the loss_gradients function below that I don't understand.

import numpy as np

def forward_linear_regression(X, y, weights):

    # dot product weights * inputs
    N = np.dot(X, weights['W'])

    # add bias
    P = N + weights['B']

    # compute loss with MSE
    loss = np.mean(np.power(y - P, 2))

    forward_info = {}
    forward_info['X'] = X
    forward_info['N'] = N
    forward_info['P'] = P
    forward_info['y'] = y

    return loss, forward_info

Here is where I get lost; I have written my questions as comments in the code:

def loss_gradients(forward_info, weights):

    # to update weights, we need: dLdW = dLdP * dPdN * dNdW
    dLdP = -2 * (forward_info['y'] - forward_info['P'])
    dPdN = np.ones_like(forward_info['N'])
    dNdW = np.transpose(forward_info['X'], (1, 0))

    dLdW = np.dot(dNdW, dLdP * dPdN)
    # why do we mix matrix multiplication and dot product like this?
    # Why not dLdP * dPdN * dNdW instead?

    # to update biases, we need: dLdB = dLdP * dPdB
    dPdB = np.ones_like(weights['B'])
    dLdB = np.sum(dLdP * dPdB, axis=0)
    # why do we sum those values along axis 0?
    # why not just dLdP * dPdB ?

    return {'W': dLdW, 'B': dLdB}

1 Answer:

Answer 0 (score: 1)

It looks to me like this code is expecting "batched" data. By that I mean that when you compute forward_info and loss_gradients, you are actually passing in a bunch of (X, y) pairs at once. Say you pass in B such pairs. The first dimension of all of your forward info will then have size B.
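For concreteness, here is a minimal sketch of the shapes this implies (the batch size, feature count, and the column-vector layout of weights['W'] below are assumptions on my part, not taken from your code):

import numpy as np

B, n_features = 4, 3                      # assumed batch size and feature count
X = np.random.randn(B, n_features)        # one row per (x, y) pair
y = np.random.randn(B, 1)

weights = {
    'W': np.random.randn(n_features, 1),  # assumed column vector of weights
    'B': np.random.randn(1, 1),           # bias, broadcast over the batch
}

loss, forward_info = forward_linear_regression(X, y, weights)
print(forward_info['N'].shape)  # (4, 1) -- first dimension is the batch size B
print(forward_info['P'].shape)  # (4, 1)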

Now, the answer to both of your questions is the same: essentially, these lines compute the gradients (using the formulas you expected) for each of the B items, and then sum all of those gradients together so that you end up with a single gradient update. I encourage you to work out the logic behind the dot product yourself, because this is a very common pattern in ML, but it is a little tricky to get the hang of at first.
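If it helps, here is a quick numerical check (a sketch under the same batched-shape assumptions as above) that the dot product in dLdW and the axis-0 sum in dLdB are each just adding up the B per-example gradients:

import numpy as np

B, n_features = 4, 3
X = np.random.randn(B, n_features)        # assumed shape (B, n_features)
dLdP = np.random.randn(B, 1)              # one loss gradient per example

# vectorized versions, as in loss_gradients
dLdW_vectorized = np.dot(X.T, dLdP)       # shape (n_features, 1)
dLdB_vectorized = np.sum(dLdP, axis=0)    # dPdB is all ones, so it drops out

# explicit per-example sums they are equivalent to
dLdW_loop = sum(X[i].reshape(-1, 1) * dLdP[i] for i in range(B))
dLdB_loop = sum(dLdP[i] for i in range(B))

print(np.allclose(dLdW_vectorized, dLdW_loop))  # True
print(np.allclose(dLdB_vectorized, dLdB_loop))  # True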
