Backpropagation with momentum

Date: 2017-11-09 21:03:04

Tags: python algorithm neural-network backpropagation gradient-descent

I am following this tutorial to implement the backpropagation algorithm. However, I am stuck on adding momentum to the algorithm.

Without momentum, here is the code for the weight-update method:

def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
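To see the update method above run end to end, here is a self-contained toy example. The two-layer network below and its `output`/`delta` values are made up purely for illustration; only the data layout (a list of layers, each neuron a dict whose last weight is the bias) follows the tutorial's format:

```python
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]  # drop the class label from the training row
        if i != 0:
            # deeper layers take the previous layer's outputs as inputs
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']  # bias term

# Hypothetical tiny network: one hidden neuron, one output neuron.
network = [
    [{'weights': [0.5, 0.5], 'output': 0.7, 'delta': 0.1}],
    [{'weights': [0.4, 0.4], 'output': 0.6, 'delta': -0.2}],
]
row = [1.0, 0]  # one input feature plus a class label
update_weights(network, row, l_rate=0.1)
print(network[0][0]['weights'])
print(network[1][0]['weights'])
```

Each weight moves by the learning rate times the neuron's delta times its input, and the bias moves by the learning rate times the delta alone.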

Here is my implementation:

def updateWeights(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i-1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                previous_weight = neuron['weights'][j]
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j] + momentum * previous_weight
            previous_weight = neuron['weights'][-1]
            neuron['weights'][-1] += l_rate * neuron['delta'] + momentum * previous_weight

This gives me a math overflow error, because the weights become exponentially large over multiple epochs. I believe my previous_weight logic for the update is wrong.

1 Answer:

Answer 0 (score: 8)

I'll give you a hint. In your implementation, you multiply momentum by previous_weight, which is another parameter of the network from the same step. That will obviously blow up very quickly.
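To see why this blows up, consider the `momentum * previous_weight` term in isolation: adding half of the current weight back onto itself every step is roughly `w <- (1 + momentum) * w`, i.e. geometric growth per update. A minimal sketch of that growth (ignoring the gradient term entirely):

```python
# Geometric growth of a single weight under w += momentum * w alone.
w = 1.0
momentum = 0.5
for _ in range(100):
    w += momentum * w  # the problematic term from the question's code
print(w)  # grows like 1.5 ** 100, astronomically large
```

After 100 updates the weight exceeds 10**17, which is why the training overflows within a few epochs.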

What you should do instead is remember the whole update vector, l_rate * neuron['delta'] * inputs[j], from the previous backpropagation step and add it in. It might look something like this:

velocity[j] = l_rate * neuron['delta'] * inputs[j] + momentum * velocity[j]
neuron['weights'][j] += velocity[j]

...where velocity is an array of the same length as network, defined in a scope wider than updateWeights, and initialized with zeros. See this post for details.