I'm stuck on an assignment from Andrew Ng's Deep Learning / Neural Networks course. The gradient of the loss with respect to w (dw) must have the same shape as w (.shape == (2, 1)), and the code contains an assertion that enforces this. But my computation of dw involves X, whose shape is (2, 2), so through broadcasting dw always comes out with shape (2, 2) and can never match the (2, 1) shape of w. Can somebody help?

Function:
import numpy as np

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    """
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))
    cost = -(1/m) * (np.dot(Y, np.log(A).T)) + (1 - Y) * np.log(1-A)
    dz = A - Y
    dw = (1/m) * X * (dz.T)
    db = (1/m) * np.sum(dz)
    #print(X.shape)
    #print(X)
    #print(A.shape)
    #print(Y.shape)
    print(dw.shape)
    #print(dw)
    #print(w.shape)
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    grads = {"dw": dw,
             "db": db}
    return grads, cost
Calling code that raises the assertion error:
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
Answer (score: 0)
First, the cost is wrong; it should be something along these lines:

cost = -(1./m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1-A))
Second, "*" is not matrix multiplication but element-wise multiplication, so dw should be:

dw = (1/m)*X.dot(dz.T)
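To see why the assertion fails, compare the shapes directly. This is a minimal sketch using a (2, 2) X like the one in the question and an arbitrary (1, 2) dz standing in for A - Y:

```python
import numpy as np

X = np.array([[1., 2.], [3., 4.]])   # shape (2, 2), as in the question
dz = np.array([[0.5, -0.5]])         # shape (1, 2), the same shape as A - Y

elementwise = X * dz.T               # broadcasting: (2, 2) * (2, 1) -> (2, 2)
matmul = X.dot(dz.T)                 # matrix product: (2, 2) @ (2, 1) -> (2, 1)

print(elementwise.shape)  # (2, 2) -- this is why the assertion fails
print(matmul.shape)       # (2, 1) -- same shape as w, as required
```

With "*", NumPy broadcasts the (2, 1) column against every column of X, producing a (2, 2) result; .dot performs the actual matrix product and collapses the example dimension, giving a (2, 1) gradient.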
I'm not checking the mathematical correctness here, only the shapes of the objects.
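Putting both fixes together, a corrected propagate might look like the sketch below (the docstring is trimmed for brevity, the assignment's assertions are kept, and the call uses the same test values as the question):

```python
import numpy as np

def propagate(w, b, X, Y):
    """Compute logistic-regression cost and gradients; dw has the same shape as w."""
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))   # sigmoid activation, shape (1, m)
    # Fix 1: sum the per-example losses into a scalar cost
    cost = -(1./m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dz = A - Y                                    # shape (1, m)
    # Fix 2: use a matrix product, not element-wise "*", so dw.shape == w.shape
    dw = (1./m) * X.dot(dz.T)                     # shape (n, 1), same as w
    db = (1./m) * np.sum(dz)
    assert dw.shape == w.shape
    cost = np.squeeze(cost)
    assert cost.shape == ()
    return {"dw": dw, "db": db}, cost

w, b = np.array([[1.], [2.]]), 2
X, Y = np.array([[1., 2.], [3., 4.]]), np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print("dw =", grads["dw"])   # shape (2, 1), matching w
print("db =", grads["db"])
print("cost =", cost)
```

With these two changes both assertions pass: dw comes out (2, 1) like w, and cost squeezes to a scalar.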