Weight layers:
(n_inputs+1, n_units_layer)-matrix
(n_units_layer+1, n_units_layer)-matrix
(n_units_layer+1, n_outputs)-matrix
Note:
inputs --first_layer-> network_unit --second_layer-> output
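For concreteness, here is a minimal numpy sketch of how these matrices could be set up (n_inputs=2, n_units_layer=2 and n_outputs=1 are assumed sizes for the XOR case; the +1 row holds the bias weights, and the middle (n_units_layer+1, n_units_layer) shape is only needed when there is more than one hidden layer):

import numpy as np

n_inputs, n_units_layer, n_outputs = 2, 2, 1             # assumed sizes for the XOR example
layer1 = np.random.randn(n_inputs + 1, n_units_layer)    # (n_inputs+1, n_units_layer)-matrix
layer2 = np.random.randn(n_units_layer + 1, n_outputs)   # (n_units_layer+1, n_outputs)-matrix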
weight_layers = [ layer1, layer2 ] # a list of layers as described above
input_values = [ [0,0], [0,1], [1,0], [1,1] ] # our test set (corresponds to XOR)
target_output = [ 0, 1, 1, 0 ] # what we want to train our net to output
output_layers = [] # output for the corresponding layers
for layer in weight_layers:
    output <-- calculate the output # calculate the output from the current layer
    output_layers <-- output # store the output from each layer
n_samples = input_values.shape[0]
n_outputs = target_output.shape[1]
error = ( output-target_output )/( n_samples*n_outputs )
""" calculate the gradient here """
Answer 0 (score: 2):
That is really easy with Python and numpy.
You have two options:

1. you can process all num_instances instances at once (batch), or
2. you can process one instance at a time.

I will now give some hints on how to implement option 1. I suggest that you create a new class called Layer. It should have two functions:
forward:
    inputs:
        X: shape = [num_instances, num_inputs]      inputs
        W: shape = [num_outputs, num_inputs]        weights
        b: shape = [num_outputs]                    biases
        g: function                                 activation function
    outputs:
        Y: shape = [num_instances, num_outputs]     outputs

backprop:
    inputs:
        dE/dY: shape = [num_instances, num_outputs] backpropagated gradient
        W: shape = [num_outputs, num_inputs]        weights
        b: shape = [num_outputs]                    biases
        gd: function                                calculates the derivative of g(A) = Y
                                                    based on Y, i.e. gd(Y) = g'(A)
        Y: shape = [num_instances, num_outputs]     outputs
        X: shape = [num_instances, num_inputs]      inputs
    outputs:
        dE/dX: shape = [num_instances, num_inputs]  will be backpropagated (dE/dY of lower layer)
        dE/dW: shape = [num_outputs, num_inputs]    accumulated derivative with respect to weights
        dE/db: shape = [num_outputs]                accumulated derivative with respect to biases
The implementation is easy:
def forward(X, W, b, g):
    A = X.dot(W.T) + b           # b will be broadcasted over the rows
    Y = g(A)                     # apply the activation function element-wise
    return Y

def backprop(dEdY, W, b, gd, Y, X):
    Deltas = gd(Y) * dEdY        # element-wise multiplication
    dEdX = Deltas.dot(W)         # gradient with respect to the inputs
    dEdW = Deltas.T.dot(X)       # gradient with respect to the weights
    dEdb = Deltas.sum(axis=0)    # gradient with respect to the biases
    return dEdX, dEdW, dEdb
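A minimal sketch of how the two functions above could be wrapped into the suggested Layer class (the attribute names and the caching of X and Y are assumptions for illustration, not part of the original answer):

class Layer:
    def __init__(self, W, b, g, gd):
        self.W = W    # weights, shape = [num_outputs, num_inputs]
        self.b = b    # biases, shape = [num_outputs]
        self.g = g    # activation function
        self.gd = gd  # derivative of g expressed in terms of Y, i.e. gd(Y) = g'(A)

    def forward(self, X):
        self.X = X                                   # remember inputs for the backward pass
        self.Y = forward(X, self.W, self.b, self.g)  # calls the free function defined above
        return self.Y

    def backprop(self, dEdY):
        dEdX, self.dEdW, self.dEdb = backprop(dEdY, self.W, self.b,
                                              self.gd, self.Y, self.X)
        return dEdX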
The X of the first layer is taken from your dataset, and in the forward pass you feed each layer's Y as the X of the next layer.

Compute dE/dY of the output layer (for a softmax activation function with the cross-entropy error, or for a linear activation function with the sum of squared errors) as Y-T, where Y is the output of the network (shape = [num_instances, num_outputs]) and T (shape = [num_instances, num_outputs]) is the desired output. Then you can backpropagate: dE/dX of each layer becomes dE/dY of the previous (lower) layer.
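Putting this together, one forward and backward pass over such layers could look like the following sketch (the XOR data, the logistic/linear activations and the layer sizes are assumed purely for illustration):

import numpy as np

# assumed setup: XOR data, one logistic hidden layer, one linear output layer (with SSE error)
X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=float)
T = np.array([[0],[1],[1],[0]], dtype=float)
g_sigmoid  = lambda A: 1.0 / (1.0 + np.exp(-A))
gd_sigmoid = lambda Y: Y * (1.0 - Y)      # derivative of the logistic function in terms of Y
g_linear   = lambda A: A
gd_linear  = lambda Y: np.ones_like(Y)
layers = [Layer(np.random.randn(2, 2), np.zeros(2), g_sigmoid, gd_sigmoid),
          Layer(np.random.randn(1, 2), np.zeros(1), g_linear, gd_linear)]

# forward pass: the output of each layer becomes the input of the next one
Y = X
for layer in layers:
    Y = layer.forward(Y)

# output error for a linear output with sum of squared errors: dE/dY = Y - T
dEdY = Y - T

# backward pass: dE/dX of each layer becomes dE/dY of the layer below it
for layer in reversed(layers):
    dEdY = layer.backprop(dEdY)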
Now you can use each layer's dE/dW and dE/db to update W and b.
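For example, with the gradients accumulated by the Layer sketch above, a plain gradient-descent update could be (learning_rate is an assumed hyperparameter):

learning_rate = 0.1              # assumed value
for layer in layers:
    layer.W -= learning_rate * layer.dEdW
    layer.b -= learning_rate * layer.dEdb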
Here is an example in C++: OpenANN.
By the way, you can compare the speed of instance-wise and batch forward propagation:
In [1]: import timeit
In [2]: setup = """import numpy
...: W = numpy.random.rand(10, 5000)
...: X = numpy.random.rand(1000, 5000)"""
In [3]: timeit.timeit('[W.dot(x) for x in X]', setup=setup, number=10)
Out[3]: 0.5420958995819092
In [4]: timeit.timeit('X.dot(W.T)', setup=setup, number=10)
Out[4]: 0.22001314163208008