Question

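I implemented a simple neural network following Andrew Ng's tutorial on Coursera. I ran gradient checking to verify the gradients computed by my backprop algorithm, and they match the numerically estimated ones, so I am quite confident my implementation is correct. However, I get very poor results (45% accuracy) at recognizing digits.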
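Gradient checking compares the backprop gradient with a centered finite-difference estimate, (J(θ + ε) − J(θ − ε)) / 2ε, computed one parameter at a time. Below is a minimal sketch, assuming a cost function `f` that takes a NumPy array of parameters and returns a scalar. `numerical_gradient` and `relative_error` are hypothetical helpers for illustration, not the question's `gradient_checking` method. The comparison uses absolute differences, because a signed maximum such as `np.amax(g - e)` can look small even when some entries disagree in the negative direction.

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-4):
    """Centered finite-difference estimate of the gradient of a scalar cost f at theta."""
    theta = np.array(theta, dtype=float)  # work on a float copy of the parameters
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        original = theta[idx]
        theta[idx] = original + eps
        f_plus = f(theta)
        theta[idx] = original - eps
        f_minus = f(theta)
        theta[idx] = original  # restore before moving to the next parameter
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

def relative_error(analytic, numeric):
    """Largest absolute difference, scaled by the magnitudes of both gradients."""
    diff = np.max(np.abs(analytic - numeric))
    scale = np.max(np.abs(analytic)) + np.max(np.abs(numeric))
    return diff / max(scale, 1e-12)

# Usage: for f(w) = sum(w**2) the exact gradient is 2*w, so the error should be tiny.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 2))
print(relative_error(2 * w, numerical_gradient(lambda v: np.sum(v ** 2), w)))
```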

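Here is what makes me suspicious: if I remove the sigmoid derivative when computing the inner deltas, I get 90% accuracy. I don't understand why I got such poor results originally, or why this change improves them so much. Also, when I remove the sigmoid derivative, the computed gradients differ greatly from the gradient-checking output (obviously, since I am no longer computing the derivative).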
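For reference, this is the standard form of the recurrences involved, written in the course's column-vector notation (assumed here: $a^{(l)}$ are activations, $\Theta^{(l)}$ weights, $\odot$ the elementwise product, $m$ the number of samples, with sigmoid outputs and a cross-entropy cost so that the output error is $a^{(L)} - y$). The sigmoid derivative can be written in terms of the activation itself, which is where the factor `A[l] * (1 - A[l])` in the code comes from. Dropping that factor means $\delta^{(l)}$ is no longer $\partial J / \partial z^{(l)}$, which is consistent with the gradient-check mismatch described above. Because $\sigma'(z) \le 1/4$, removing it also makes each hidden-layer delta at least four times larger in magnitude.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) \le \tfrac{1}{4}

\delta^{(L)} = a^{(L)} - y, \qquad
\delta^{(l)} = \bigl(\Theta^{(l)}\bigr)^{\top} \delta^{(l+1)} \odot a^{(l)} \odot \bigl(1 - a^{(l)}\bigr)

\frac{\partial J}{\partial \Theta^{(l)}}
  = \frac{1}{m} \sum_{i=1}^{m} \delta^{(l+1)} \bigl(a^{(l)}\bigr)^{\top}
\quad \text{(per-sample terms, plus regularization)}
```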

The relevant part of the tutorial I followed is this one:

My backprop implementation is:

import numpy as np

def backpropagation(self, X, y):
    n_elements = X.shape[0]

    # One gradient accumulator per weight matrix and per bias vector
    DELTA_theta = [np.zeros(t.shape) for t in self.theta]
    DELTA_bias = [np.zeros(b.shape) for b in self.bias]

    for i in range(0, n_elements):
        # A[0] is the input sample, A[-1] the output-layer activations
        A = self.forwardpropagation(X[i])
        # Output-layer error a_out - y, where y[i] is the index of the correct class
        delta = np.copy(A[-1])
        delta[y[i]] -= 1

        for l in reversed(range(0, self.n_layers - 1)):
            DELTA_theta[l] += np.outer(A[l], delta)
            DELTA_bias[l] += delta

            if l != 0:
                # Propagate the error back to layer l; A[l] * (1 - A[l]) is the sigmoid derivative
                delta = np.dot(self.theta[l], delta) * A[l] * (1 - A[l])
                # delta = np.dot(self.theta[l], delta) THIS GIVES MUCH BETTER RESULTS

    # Average over the samples; only the weights (not the biases) are regularized
    gradient_theta = [d + self.regularization * self.theta[i] for i, d in enumerate(DELTA_theta)]
    gradient_theta = [g / n_elements for g in gradient_theta]
    gradient_bias = [d / n_elements for d in DELTA_bias]

    # Compare with the numerical estimate; np.amax of the signed difference ignores
    # negative discrepancies (np.amax(np.abs(g - e)) would catch both signs)
    estimated_gradient_theta, estimated_gradient_bias = self.gradient_checking(X, y)
    diff_theta = [np.amax(g-e) for g, e in zip(gradient_theta, estimated_gradient_theta)]
    diff_bias = [np.amax(g-e) for g, e in zip(gradient_bias, estimated_gradient_bias)]

    print(max(diff_theta))  # Around 1.0e-07
    print(max(diff_bias))   # Around 1.0e-10

    return gradient_theta + gradient_bias

(Note that my weights are stored in self.theta, and the matrix dimensions differ from those in the tutorial because my convention is transposed.)
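To make the transposed convention concrete, here is a self-contained sketch of the one-sample forward and backward passes. Since the question does not show `forwardpropagation`, it assumes that each layer computes `a[l+1] = sigmoid(a[l] @ theta[l] + bias[l])` with `theta[l]` of shape `(n_l, n_{l+1})`, that the output layer is also sigmoid, and that the cost is cross-entropy against a one-hot target. `forward`, `cost` and `backprop` are hypothetical names, not the asker's methods.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, theta, bias):
    """Row-vector convention (assumed): a[l+1] = sigmoid(a[l] @ theta[l] + bias[l])."""
    A = [x]
    for t, b in zip(theta, bias):
        A.append(sigmoid(A[-1] @ t + b))
    return A

def cost(x, label, theta, bias):
    """Cross-entropy cost for one sample with sigmoid outputs and a one-hot target."""
    out = forward(x, theta, bias)[-1]
    y = np.zeros_like(out)
    y[label] = 1.0
    return -np.sum(y * np.log(out) + (1 - y) * np.log(1 - out))

def backprop(x, label, theta, bias):
    """Per-sample gradients, mirroring the inner loop of the question's code."""
    A = forward(x, theta, bias)
    delta = A[-1].copy()
    delta[label] -= 1.0  # dJ/dz at the output for sigmoid + cross-entropy
    g_theta = [None] * len(theta)
    g_bias = [None] * len(bias)
    for l in reversed(range(len(theta))):
        g_theta[l] = np.outer(A[l], delta)  # same shape as theta[l]: (n_l, n_{l+1})
        g_bias[l] = delta.copy()
        if l != 0:
            # theta[l] @ delta maps the error back to layer l; then apply sigmoid'
            delta = (theta[l] @ delta) * A[l] * (1 - A[l])
    return g_theta, g_bias
```

With the `A[l] * (1 - A[l])` factor in place, a centered finite difference of `cost` with respect to any single weight, e.g. `theta[0][2, 3]`, should agree closely with `g_theta[0][2, 3]`; removing the factor breaks that agreement for every layer except the last.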

Any idea why this happens? I have spent so much time on this... Thanks!

Sigmoid derivative in a backprop implementation for a neural network
