I have implemented a deep-learning network: Conv -> ReLU -> MaxPool -> Flatten -> Dense -> Softmax. The network has 6178 parameters. I am trying to run gradient checking on it. When I perform the gradient check on a single data point it passes, and the difference I get is 1.1969471336112197e-08. However, when I run it on 2 data points, it gives me a huge difference of 0.3254100270774182. Here is the code for my gradient check:
import copy

import numpy as np
from tqdm import tqdm

def grad_check():
    train_set_x, train_set_y, test_set_x, test_set_y, n_class = load_data()
    # Keep only the first two examples.
    train_set_x = train_set_x[0:2]
    train_set_y = train_set_y[:, 0:2]

    cnn = make_model(train_set_x, n_class)
    print(cnn.layers)

    # Analytic gradients from one forward/backward pass.
    A = cnn.forward(train_set_x)
    loss, dA = SoftmaxLoss(A, train_set_y)
    assert A.shape == dA.shape
    grads = cnn.backward(dA)
    grads_values = grads_to_vector(grads)

    initial_params = cnn.params
    parameters_values = params_to_vector(initial_params)  # initial parameters
    num_parameters = parameters_values.shape[0]
    print(num_parameters)

    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))
    epsilon = 1e-7
    assert len(grads_values) == len(parameters_values)

    for i in tqdm(range(num_parameters)):
        # J(theta + epsilon) for parameter i.
        thetaplus = copy.deepcopy(parameters_values)
        thetaplus[i][0] = thetaplus[i][0] + epsilon
        new_param = vector_to_param(thetaplus, initial_params)
        difference = compare(new_param, initial_params)
        assert difference == 1  # make sure only one parameter is changed
        cnn.params = new_param
        A = cnn.forward(train_set_x)
        J_plus[i], _ = SoftmaxLoss(A, train_set_y)

        # J(theta - epsilon) for parameter i.
        thetaminus = copy.deepcopy(parameters_values)
        thetaminus[i][0] = thetaminus[i][0] - epsilon
        new_param = vector_to_param(thetaminus, initial_params)
        difference = compare(new_param, initial_params)
        assert difference == 1  # make sure only one parameter is changed
        cnn.params = new_param
        A = cnn.forward(train_set_x)
        J_minus[i], _ = SoftmaxLoss(A, train_set_y)

        # Central-difference approximation of the gradient.
        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)

    cnn.params = initial_params  # restore the unperturbed parameters

    # Relative error between analytic and numerical gradients.
    numerator = np.linalg.norm(gradapprox - grads_values)
    denominator = np.linalg.norm(grads_values) + np.linalg.norm(gradapprox)
    difference = numerator / denominator

    if difference > 2e-7:
        print("\033[93m" + "There is a mistake in the backward propagation! difference = "
              + str(difference) + "\033[0m")
    else:
        print("\033[92m" + "Your backward propagation works perfectly fine! difference = "
              + str(difference) + "\033[0m")
    return difference
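For context, the loop above computes the standard two-sided (central-difference) estimate and the usual relative error:

$$\mathrm{gradapprox}_i = \frac{J(\theta + \varepsilon e_i) - J(\theta - \varepsilon e_i)}{2\varepsilon}, \qquad \mathrm{difference} = \frac{\lVert \mathrm{gradapprox} - \mathrm{grad} \rVert_2}{\lVert \mathrm{grad} \rVert_2 + \lVert \mathrm{gradapprox} \rVert_2}$$

With epsilon = 1e-7, a correct backward pass should keep the difference around 1e-7 or smaller, which is what the 2e-7 threshold in the code encodes.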
I cannot understand why there is such a huge jump when I go from one data point to two. I even checked with 3 data points, and the difference was 0.47068460998434125. I would be glad to hear any suggestions on this.
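One idea I had for narrowing it down is to compare the analytic and numerical gradients element-wise instead of as a single norm, to see whether the disagreement is concentrated in one layer's parameters. Here is a minimal sketch (top_k_discrepancies is a hypothetical helper; it assumes access to the gradapprox and grads_values vectors built inside grad_check):

import numpy as np

# Hypothetical helper: gradapprox and grads_values are the
# (num_parameters, 1) vectors computed inside grad_check above.
def top_k_discrepancies(gradapprox, grads_values, k=10):
    # Element-wise relative error, guarded against division by zero.
    abs_err = np.abs(gradapprox - grads_values)
    rel_err = abs_err / (np.abs(gradapprox) + np.abs(grads_values) + 1e-12)
    # Indices of the k parameters with the worst relative error.
    worst = np.argsort(rel_err, axis=0)[::-1][:k].ravel()
    for i in worst:
        print(f"param {i}: analytic={grads_values[i, 0]:.6e} "
              f"numeric={gradapprox[i, 0]:.6e} rel_err={rel_err[i, 0]:.3e}")

Mapping the worst indices back to their layers (via the same ordering that params_to_vector uses) should show whether one specific layer's backward pass is at fault or whether everything is off together.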