MLP neural network: computing gradients (matrices)

Time: 2013-06-11 16:35:19

Tags: python numpy neural-network gradient

What is a good implementation for calculating the gradients in an n-layered neural network?

Weight layers (a small initialization sketch follows the pseudocode below):

  1. First layer weights: (n_inputs+1, n_units_layer)-matrix
  2. Hidden layer weights: (n_units_layer+1, n_units_layer)-matrix
  3. Last layer weights: (n_units_layer+1, n_outputs)-matrix
  4. Note:

    • If we only have one hidden layer, we represent the net using just two (weight) layers: inputs --first_layer-> network_unit --second_layer-> output
    • For an n-layered network with more than one hidden layer, we need to repeat step (2).

    Somewhat fuzzy pseudocode:

        weight_layers = [ layer1, layer2 ]             # a list of layers as described above
        input_values  = [ [0,0], [1,1], [1,0], [0,1] ] # our test set (corresponds to XOR)
        target_output = [ 0, 0, 1, 1 ]                 # what we want to train our net to output
        output_layers = []                             # output for the corresponding layers
    
        for layer in weight_layers:
            output <-- calculate the output     # calculate the output from the current layer
            output_layers <-- output            # store the output from each layer
    
        n_samples = input_values.shape[0]
        n_outputs = target_output.shape[1]
    
        error = ( output-target_output )/( n_samples*n_outputs )
    
        """ calculate the gradient here """
    

Final implementation

The final implementation is available on GitHub.

1 Answer:

Answer 0 (score: 2):

With Python and numpy that is easy.

You have two options:

  1. You can compute everything in parallel for all num_instances instances, or
  2. you can compute the gradient for a single instance (which is actually a special case of 1.).

I will now give some hints on how to implement option 1. I suggest you create a new class called Layer. It should have two functions:

    forward:
        inputs:
        X: shape = [num_instances, num_inputs]
            inputs
        W: shape = [num_outputs, num_inputs]
            weights
        b: shape = [num_outputs]
            biases
        g: function
            activation function
        outputs:
        Y: shape = [num_instances, num_outputs]
            outputs
    
    
    backprop:
        inputs:
        dE/dY: shape = [num_instances, num_outputs]
            backpropagated gradient
        W: shape = [num_outputs, num_inputs]
            weights
        b: shape = [num_outputs]
            biases
        gd: function
            calculates the derivative of g(A) = Y
            based on Y, i.e. gd(Y) = g'(A)
        Y: shape = [num_instances, num_outputs]
            outputs
        X: shape = [num_instances, num_inputs]
            inputs
        outputs:
        dE/dX: shape = [num_instances, num_inputs]
            will be backpropagated (dE/dY of lower layer)
        dE/dW: shape = [num_outputs, num_inputs]
            accumulated derivative with respect to weights
        dE/db: shape = [num_outputs]
            accumulated derivative with respect to biases
    

The implementation is straightforward:

    def forward(X, W, b):
        A = X.dot(W.T) + b    # pre-activations; b is broadcast over all instances
        Y = g(A)              # apply the activation function
        return Y

    def backprop(dEdY, W, b, gd, Y, X):
        Deltas = gd(Y) * dEdY      # element-wise multiplication
        dEdX = Deltas.dot(W)       # passed on to the layer below as its dE/dY
        dEdW = Deltas.T.dot(X)     # derivative w.r.t. weights, accumulated over instances
        dEdb = Deltas.sum(axis=0)  # derivative w.r.t. biases, accumulated over instances
        return dEdX, dEdW, dEdb
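The activation g and its derivative gd are not defined in the snippet above; one common choice (an assumption here, not prescribed by the answer) is the logistic sigmoid, whose derivative can be computed from Y alone, exactly as the gd signature requires:

    import numpy as np

    def g(A):
        # Logistic sigmoid activation (one possible choice of activation function)
        return 1.0 / (1.0 + np.exp(-A))

    def gd(Y):
        # Sigmoid derivative expressed through the layer output: g'(A) = Y * (1 - Y)
        return Y * (1.0 - Y)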
    
The X of the first layer is taken from your dataset, and then you pass each Y as the X of the next layer in the forward pass.
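As a rough illustration, that forward pass could look like the following, assuming weight_layers is a list of (W, b) pairs (this layout and the helper name forward_pass are illustrative assumptions, and differ from the question's bias-folded matrices):

    def forward_pass(X, weight_layers):
        # Propagate X through all layers and keep every intermediate result,
        # since backprop needs each layer's input and output.
        activations = [X]                       # activations[0] is the dataset X
        for W, b in weight_layers:
            activations.append(forward(activations[-1], W, b))  # Y becomes the next layer's X
        return activations                      # activations[-1] is the network output Y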

Compute the dE/dY of the output layer as Y - T (this holds for a softmax activation function with a cross-entropy error function, or for a linear activation function with the sum of squared errors), where Y is the output of the network (shape = [num_instances, num_outputs]) and T (shape = [num_instances, num_outputs]) is the desired output. Then you can backpropagate, i.e. the dE/dX of each layer becomes the dE/dY of the layer below it.

Now you can use the dE/dW and dE/db of each layer to update W and b.
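Putting the pieces together, here is a hedged sketch of one batch gradient-descent update built on the forward_pass helper above (the single shared gd, the fixed learning rate, and the helper name train_step are simplifying assumptions):

    def train_step(X, T, weight_layers, gd, learning_rate=0.1):
        # One update over the whole batch: forward pass, dE/dY = Y - T at the output,
        # then backpropagate and adjust W and b of every layer.
        activations = forward_pass(X, weight_layers)
        dEdY = activations[-1] - T    # output-layer gradient (e.g. linear activation + sum of squared errors)
        for i in reversed(range(len(weight_layers))):
            W, b = weight_layers[i]
            dEdY, dEdW, dEdb = backprop(dEdY, W, b, gd, activations[i + 1], activations[i])
            weight_layers[i] = (W - learning_rate * dEdW, b - learning_rate * dEdb)
        return weight_layers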

Here is an example in C++: OpenANN.

By the way, you can compare the speed of instance-wise and batch-wise forward propagation:

    In [1]: import timeit
    
    In [2]: setup = """import numpy
       ...: W = numpy.random.rand(10, 5000)
       ...: X = numpy.random.rand(1000, 5000)"""
    
    In [3]: timeit.timeit('[W.dot(x) for x in X]', setup=setup, number=10)
    Out[3]: 0.5420958995819092
    
    In [4]: timeit.timeit('X.dot(W.T)', setup=setup, number=10)
    Out[4]: 0.22001314163208008