I am implementing this neural network for a classification problem. I initially tried backpropagation, but it took too long to converge, so I switched to RPROP. In my test setup, RPROP works fine for the AND gate simulation, but it never converges for the OR and XOR gate simulations.
for (int l_index = 1; l_index < _total_layers; l_index++) {
    Layer* curr_layer = get_layer_at(l_index);
    // iterate through each neuron
    for (unsigned int n_index = 0; n_index < curr_layer->get_number_of_neurons(); n_index++) {
        Neuron* jth_neuron = curr_layer->get_neuron_at(n_index);

        double change        = jth_neuron->get_change();
        double curr_gradient = jth_neuron->get_gradient();
        double last_gradient = jth_neuron->get_last_gradient();
        int grad_sign = sign(curr_gradient * last_gradient);

        // iterate through each weight of the neuron
        for (int w_index = 0; w_index < jth_neuron->get_number_of_weights(); w_index++) {
            double current_weight    = jth_neuron->give_weight_at(w_index);
            double last_update_value = jth_neuron->give_update_value_at(w_index);
            double new_update_value  = last_update_value;

            if (grad_sign > 0) {
                new_update_value = min(last_update_value * 1.2, 50.0);
                change = sign(curr_gradient) * new_update_value;
            } else if (grad_sign < 0) {
                new_update_value = max(last_update_value * 0.5, 1e-6);
                change = -change;
                curr_gradient = 0.0;
            } else if (grad_sign == 0) {
                change = sign(curr_gradient) * new_update_value;
            }

            // Update neuron values
            jth_neuron->set_change(change);
            jth_neuron->update_weight_at((current_weight + change), w_index);
            jth_neuron->set_last_gradient(curr_gradient);
            jth_neuron->update_update_value_at(new_update_value, w_index);

            double current_bias = jth_neuron->get_bias();
            jth_neuron->set_bias(current_bias + _learning_rate * jth_neuron->get_delta());
        }
    }
}
Answer (score: 0)
For one thing, you shouldn't treat the bias the way you would when doing plain backpropagation, yet your code seems to be doing learning_rate * delta for it. In RPROP the bias is just another weight and gets the same sign-based step-size update.
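As a rough illustration, here is a minimal sketch of what that could look like, per neuron, using the iRPROP- variant (no weight backtracking) for brevity. All bias-related accessors (get_bias_gradient and friends) are assumed names, not from your code:

// Hypothetical sketch: update the bias with the same RPROP rule as a weight.
double bias_gradient      = jth_neuron->get_bias_gradient();       // assumed accessor
double last_bias_gradient = jth_neuron->get_last_bias_gradient();  // assumed accessor
double bias_update_value  = jth_neuron->give_bias_update_value();  // assumed accessor

if (bias_gradient * last_bias_gradient > 0) {
    bias_update_value = min(bias_update_value * 1.2, 50.0);  // grow the step, same eta+ as the weights
} else if (bias_gradient * last_bias_gradient < 0) {
    bias_update_value = max(bias_update_value * 0.5, 1e-6);  // shrink the step, same eta- as the weights
    bias_gradient = 0.0;                                     // forget the gradient after a sign flip
}
// same sign convention as your weight update (see the note on signs below)
jth_neuron->set_bias(jth_neuron->get_bias() + sign(bias_gradient) * bias_update_value);
jth_neuron->set_last_bias_gradient(bias_gradient);        // assumed setter
jth_neuron->update_bias_update_value(bias_update_value);  // assumed setter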
One source of error could be that the sign of the weight change depends on how you compute the error. There are different conventions, and using (t_i - y_i) instead of (y_i - t_i) results in returning (new_update_value * sign(grad)) rather than -(new_update_value * sign(grad)), so try switching the sign. I'm also not sure how exactly you implemented everything, since a lot isn't shown here. But here is a small part of my Java implementation that might help:
// gradient didn't change sign:
if (weight.previousErrorGradient * errorGradient > 0)
    weight.lastUpdateValue = Math.min(weight.lastUpdateValue * step_pos, update_max);
// changed sign:
else if (weight.previousErrorGradient * errorGradient < 0)
{
    weight.lastUpdateValue = Math.max(weight.lastUpdateValue * step_neg, update_min);
}
else
    weight.lastUpdateValue = weight.lastUpdateValue; // no change

// Depending on language, you should check for NaN here.

// multiply this with -1 depending on your error signal's sign:
return ( weight.lastUpdateValue * Math.signum(errorGradient) );
Also keep in mind that 50.0 and 1e-6, and especially 0.5 and 1.2, are empirically gathered values, so they may need to be adjusted. You should print out the gradients and weight changes to see whether anything strange is going on (e.g. exploding gradients leading to NaN, even though you are only testing AND/XOR). Your last_gradient value should also be initialized to 0 at the first timestep.
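For the initialization, here is a minimal sketch, assuming the same Layer/Neuron accessors as in your snippet (Delta0 = 0.1 is the initial step size recommended in Riedmiller and Braun's RPROP paper):

// Run once before the first training epoch to reset the RPROP state.
const double initial_update_value = 0.1;  // Delta0; empirical, tune as needed

for (int l_index = 1; l_index < _total_layers; l_index++) {
    Layer* curr_layer = get_layer_at(l_index);
    for (unsigned int n_index = 0; n_index < curr_layer->get_number_of_neurons(); n_index++) {
        Neuron* neuron = curr_layer->get_neuron_at(n_index);
        neuron->set_last_gradient(0.0);  // first step then takes the grad_sign == 0 branch
        for (int w_index = 0; w_index < neuron->get_number_of_weights(); w_index++) {
            neuron->update_update_value_at(initial_update_value, w_index);
        }
    }
}

With last_gradient at 0, the very first update falls into the grad_sign == 0 branch and takes a plain sign(curr_gradient) * Delta0 step, which is what RPROP expects.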