Gradient Descent: AssertionError: matrix shapes do not match

Asked: 2017-09-02 18:54:43

Tags: python neural-network deep-learning gradient-descent

I'm stuck on an assignment from Andrew Ng's deep learning / neural networks course. The code contains an assertion that `dw`, the gradient of the loss with respect to `w`, must have the same shape as `w` (here `w.shape == (2, 1)`). But in my computation of `dw` I multiply by `X`, so broadcasting gives `dw` the shape `(2, 2)`, which never matches the `(2, 1)` shape of `w`.

Can anyone help?

The function:

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b    
    """
    m = X.shape[1] 

    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))                           
    cost = -(1/m) *(np.dot(Y, np.log(A).T)) + (1 - Y) * np.log(1-A)             
    dz = A - Y
    dw = (1/m)*X*((dz.T))
    db = (1/m)*np.sum(dz)

    #print(X.shape)
    #print(X)
    #print(A.shape)
    #print(Y.shape)
    print(dw.shape)
    #print(dw)
    #print(w.shape)

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}
    return grads, cost

The call that triggers the error:

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
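For reference, the shape mismatch described above can be reproduced in isolation. This is a minimal sketch (the `dz` values are stand-ins, not the actual activations) contrasting element-wise `*` with `np.dot` on the same shapes:

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])   # shape (2, 2), as in the question
dz = np.array([[0.5, -0.5]])     # shape (1, 2), a stand-in for A - Y

elementwise = X * dz.T           # (2, 2) * (2, 1) broadcasts to (2, 2)
matmul = X.dot(dz.T)             # (2, 2) @ (2, 1) -> (2, 1), same shape as w

print(elementwise.shape)         # (2, 2) -- fails the assertion
print(matmul.shape)              # (2, 1) -- matches w.shape
```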

1 answer:

Answer 0 (score: 0):

First, the cost is wrong; it should be something along these lines:

cost = -(1./m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1-A)  )

Second, `*` is not matrix multiplication but element-wise multiplication, so `dw` should be:

dw = (1/m)*X.dot(dz.T)

I'm not checking the math for correctness here, only the shapes of the objects.
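Putting both fixes together, a minimal corrected sketch of `propagate` (trimmed to the essentials, using the toy inputs from the question) might look like this:

```python
import numpy as np

def propagate(w, b, X, Y):
    # Corrected version: np.sum over the whole log-likelihood in the cost,
    # and X.dot(dz.T) instead of element-wise X * dz.T for the gradient.
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))      # sigmoid activations, shape (1, m)
    cost = -(1. / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dz = A - Y                                        # shape (1, m)
    dw = (1. / m) * X.dot(dz.T)                       # shape (n, 1), same as w
    db = (1. / m) * np.sum(dz)                        # scalar

    assert dw.shape == w.shape                        # now passes
    cost = np.squeeze(cost)
    return {"dw": dw, "db": db}, cost

w, b = np.array([[1.], [2.]]), 2
X, Y = np.array([[1., 2.], [3., 4.]]), np.array([[1., 0.]])
grads, cost = propagate(w, b, X, Y)
print(grads["dw"].shape)  # (2, 1) -- matches w.shape
```

With these two changes, `dw` comes out as `(2, 1)` and the shape assertion passes.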