How/when to update the bias in an RPROP neural network?

Asked: 2015-09-25 19:04:53

Tags: neural-network backpropagation

I am implementing a neural network for some classification problems. I initially tried backpropagation, but it took too long to converge, so I switched to RPROP. In my test setup, RPROP works fine for the AND gate simulation but never converges for the OR and XOR gate simulations.

How and when should I update the bias in RPROP?

Here is my weight update logic:

    for(int l_index = 1; l_index < _total_layers; l_index++){
        Layer* curr_layer = get_layer_at(l_index);

        //iterate through each neuron
        for (unsigned int n_index = 0; n_index < curr_layer->get_number_of_neurons(); n_index++) {
            Neuron* jth_neuron = curr_layer->get_neuron_at(n_index);
    
            double change = jth_neuron->get_change();
    
            double curr_gradient = jth_neuron->get_gradient();
            double last_gradient = jth_neuron->get_last_gradient();
    
            int grad_sign = sign(curr_gradient * last_gradient);
    
            //iterate through each weight of the neuron
            for(int w_index = 0; w_index < jth_neuron->get_number_of_weights(); w_index++){
                double current_weight = jth_neuron->give_weight_at(w_index);
                double last_update_value = jth_neuron->give_update_value_at(w_index);
    
                double new_update_value = last_update_value;
                if(grad_sign > 0){
                    new_update_value = min(last_update_value*1.2, 50.0);
                    change = sign(curr_gradient) * new_update_value;
                }else if(grad_sign < 0){
                    new_update_value = max(last_update_value*0.5, 1e-6);
                    change = -change;
                    curr_gradient = 0.0;
                }else if(grad_sign == 0){
                    change = sign(curr_gradient) * new_update_value;
                }
    
                //Update neuron values
                jth_neuron->set_change(change);
                jth_neuron->update_weight_at((current_weight + change), w_index);
                jth_neuron->set_last_gradient(curr_gradient);
                jth_neuron->update_update_value_at(new_update_value, w_index);
    
                double current_bias = jth_neuron->get_bias();
                jth_neuron->set_bias(current_bias + _learning_rate * jth_neuron->get_delta());
            }
        }
    }
    

1 Answer:

Answer 0 (score: 0)

In principle, you should not treat the bias the way you do in plain backpropagation; your code appears to be doing learning_rate * delta. In RPROP, the bias is just another weight: give it its own update value and last gradient, and apply the same sign-based rule you use for the other weights.
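A minimal sketch of that idea in the question's own C++ style: the bias gets its own stored update value and last gradient and follows the same sign-based rule as the weights, applied once per neuron outside the per-weight loop. The accessors get_bias_gradient(), get_last_bias_gradient(), give_bias_update_value(), set_bias_update_value(), and set_last_bias_gradient() are hypothetical helpers that do not appear in the posted code:

    // Hypothetical sketch: treat the bias as one more RPROP weight.
    // All bias-specific accessors here are assumed, not from the original post.
    double bias_gradient      = jth_neuron->get_bias_gradient();
    double last_bias_gradient = jth_neuron->get_last_bias_gradient();
    double bias_update_value  = jth_neuron->give_bias_update_value();

    int bias_grad_sign = sign(bias_gradient * last_bias_gradient);
    double bias_change = 0.0;

    if(bias_grad_sign > 0){
        // same sign as last step: grow the step size
        bias_update_value = min(bias_update_value * 1.2, 50.0);
        bias_change = sign(bias_gradient) * bias_update_value;
    }else if(bias_grad_sign < 0){
        // sign flip: shrink the step; iRPROP- style, zero the stored
        // gradient so the next step takes the neutral branch
        bias_update_value = max(bias_update_value * 0.5, 1e-6);
        bias_gradient = 0.0;
    }else{
        bias_change = sign(bias_gradient) * bias_update_value;
    }

    jth_neuron->set_bias(jth_neuron->get_bias() + bias_change);
    jth_neuron->set_last_bias_gradient(bias_gradient);
    jth_neuron->set_bias_update_value(bias_update_value);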

One source of error could be that the sign of the weight change depends on how you compute the error. There are different conventions: using (t_i - y_i) instead of (y_i - t_i) means you return (new_update_value * sign(grad)) instead of -(new_update_value * sign(grad)), so try switching the sign. I am also not sure exactly how you implemented things, since a lot is not shown here. But here is a small part of my Java implementation, which may help:

// gradient didn't change sign: 
if(weight.previousErrorGradient * errorGradient > 0) 
    weight.lastUpdateValue = Math.min(weight.lastUpdateValue * step_pos, update_max);
// changed sign:
else if(weight.previousErrorGradient * errorGradient < 0) 
{
    weight.lastUpdateValue = Math.max(weight.lastUpdateValue * step_neg, update_min);
}
else
    weight.lastUpdateValue = weight.lastUpdateValue; // no change           

// Depending on language, you should check for NaN here.

// multiply this with -1 depending on your error signal's sign:
return ( weight.lastUpdateValue * Math.signum(errorGradient) ); 

Also, keep in mind that 50.0 and 1e-6, and especially 0.5 and 1.2, are empirically chosen values, so they may need tuning. You should print out the gradients and weight changes to see whether anything strange is happening (e.g. exploding gradients -> NaN, even though you are only testing AND/XOR). Your last_gradient value should also be initialized to 0 at the first time step.
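For what it's worth, a minimal initialization sketch using the question's own accessors (the 0.1 starting step is the delta_0 recommended in Riedmiller and Braun's original RPROP paper; treat it as an assumption for this codebase):

    // Before the first epoch: every weight starts with the same step size
    // and a zero last gradient, so the first update takes the
    // grad_sign == 0 branch.
    const double INITIAL_UPDATE_VALUE = 0.1; // delta_0 from the RPROP paper

    for(int w_index = 0; w_index < jth_neuron->get_number_of_weights(); w_index++){
        jth_neuron->update_update_value_at(INITIAL_UPDATE_VALUE, w_index);
    }
    jth_neuron->set_last_gradient(0.0);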