Question

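I am experimenting with delta rule learning on the AND example, and I noticed that learning converges faster and better when I do NOT apply the derivative of the sigmoid activation in the weight correction.

I am using a bias neuron.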

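If I understand correctly, the delta rule should take the derivative of the activation function into account when adjusting the weights: $\Delta w_k(n) = \eta \cdot e(n) \cdot g'(h) \cdot x(n)$.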

where e(n) = desired_output - neuron_output.

This is the sigmoid I use to compute the output:

public double calc(double sum) {
    // logistic sigmoid: 1 / (1 + e^(-sum))
    return 1 / (1 + Math.exp(-sum));
}

According to step 4 of this delta rule, on page 33, the weight update should be:

double delta = learningRate * error * estimated * (1 - estimated) * input;

It works much better without the factor:

estimated * (1 - estimated)
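
For reference, the factor estimated * (1 - estimated) is the sigmoid derivative written in terms of the neuron output. The sketch below checks that identity numerically. The class and method names are hypothetical and are not part of the code in this question.

```java
// Illustrative check only; names are assumptions, not taken from the question's code.
final class SigmoidDerivativeCheck {

    // Same logistic sigmoid as calc(sum) in the question.
    static double sigmoid(double sum) {
        return 1.0 / (1.0 + Math.exp(-sum));
    }

    // Analytic derivative expressed through the output: y * (1 - y).
    static double sigmoidDerivative(double sum) {
        double y = sigmoid(sum);
        return y * (1.0 - y);
    }

    // Central finite difference, used only to cross-check the analytic form.
    static double numericDerivative(double sum) {
        double eps = 1e-6;
        return (sigmoid(sum + eps) - sigmoid(sum - eps)) / (2.0 * eps);
    }

    public static void main(String[] args) {
        for (double h : new double[] { -4.0, -2.0, 0.0, 2.0, 4.0 }) {
            System.out.printf("h=%5.1f  analytic=%.6f  numeric=%.6f%n",
                    h, sigmoidDerivative(h), numericDerivative(h));
        }
    }
}
```

The factor peaks at 0.25 for a weighted sum of 0 and shrinks as the sum moves away from 0, so including it always scales the weight update down.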

This is pretty much the code I use for training with the delta rule:

@Override
public void train(List<LearningSample> samples, double[] weights, Function<double[], Double> neuronOutput) {

    double[] weightDelta = new double[weights.length];
    for (int i = 0; i < 10000; i++) {
        // Collections.shuffle(samples);
        for (LearningSample sample : samples) {
            // sigmoid of dot product of weights and input vector, including bias
            double estimated = neuronOutput.apply(sample.getInput());
            double error = sample.getDesiredOutput() - estimated;
            // this commented-out version actually works better than the one below
            // double delta = learningRate * error;
            double delta = learningRate * error * estimated * (1 - estimated);
            // aggregate delta per weight for each sample in epoch
            deltaUpdate(delta, weightDelta, sample.getInput());
        }

        // batch update weights at the end of training epoch
        for (int weight = 0; weight < weights.length; weight++) {
            weights[weight] += weightDelta[weight];
        }

        weightDelta = new double[weights.length];
    }      
}


private void deltaUpdate(double delta, double[] weightsDelta, double[] input) {
    for (int feature = 0; feature < input.length; feature++) {
        weightsDelta[feature] = weightsDelta[feature] + delta * input[feature];
    }
}

The training samples for AND look like this:

List<LearningSample> samples = new ArrayList<>();
LearningSample sample1 = new LearningSample(new double[] { 0, 0 }, 0);
LearningSample sample2 = new LearningSample(new double[] { 0, 1 }, 0);
LearningSample sample3 = new LearningSample(new double[] { 1, 0 }, 0);
LearningSample sample4 = new LearningSample(new double[] { 1, 1 }, 1);

The bias input 1 is injected as the 0th component in the constructor.

The order in which the outputs are tested after learning:

System.out.println(neuron.output(new double[] { 1,   1, 1 }));
System.out.println(neuron.output(new double[] { 1,   0, 0 }));
System.out.println(neuron.output(new double[] { 1,   0, 1 }));
System.out.println(neuron.output(new double[] { 1,   1, 0 }));

When I omit the derivative of the sigmoid from the delta calculation, these are the results:

10000 iterations

0.9666565909058419
2.05087653022386E-5
0.023803593411627456
0.023803593411627456

35000 iterations

0.9903810162649429
4.6475933225663785E-7
0.006870001301253153
0.006870001301253153

These are the results with the derivative applied:

10000 iterations

0.8446651307271656
0.004030424878725242
0.129178264332045
0.129178264332045

35000 iterations

0.9218773156128204
4.169603485934177E-4
0.06555977437019253
0.06555977437019253

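The learning rate is 0.021, and the starting weight for the bias is -2.

In the first case, without the derivative, the error is smaller and the function is approximated better. Why is that?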

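Update

Following @Umberto's answer, I would like to verify a few things:

Does the accidental experiment delta = learningRate * error * input actually work because it minimizes the cross-entropy cost function? Yes.
Cross-entropy is apparently better suited to classification, so when should MSE be used as the cost function? For regression.

As a note, I run the output through a threshold function, not shown here, so this is binary classification.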

Answer 1

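The reason is simple: you are minimizing a different cost function. In your case (from the slides) you minimize the squared error. If you use the cost function (cross-entropy) in the form I describe in my derivation here (github link), you get a faster weight update. In classification problems (where you typically use a sigmoid neuron for binary classification), squared error is usually not a good cost function.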
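
To make the difference concrete, here is a short sketch of the two gradients for a single sigmoid neuron. The symbols y (neuron output) and d (desired output) are notation assumed for this sketch; e and \eta are the error and learning rate from the question.

```latex
\begin{aligned}
& y = \sigma(h), \qquad h = \sum_k w_k x_k, \qquad e = d - y, \qquad \sigma'(h) = y\,(1 - y) \\[6pt]
& \text{Squared error:} \quad E_{\mathrm{MSE}} = \tfrac{1}{2}\,(d - y)^2 \\
& \frac{\partial E_{\mathrm{MSE}}}{\partial w_k} = -(d - y)\,\sigma'(h)\,x_k
  \quad\Longrightarrow\quad
  \Delta w_k = -\eta\,\frac{\partial E_{\mathrm{MSE}}}{\partial w_k} = \eta\, e\, y\,(1 - y)\, x_k \\[6pt]
& \text{Cross-entropy:} \quad E_{\mathrm{CE}} = -\bigl[\, d \ln y + (1 - d) \ln (1 - y) \,\bigr] \\
& \frac{\partial E_{\mathrm{CE}}}{\partial w_k}
  = \frac{y - d}{y\,(1 - y)} \cdot y\,(1 - y) \cdot x_k
  = -(d - y)\, x_k
  \quad\Longrightarrow\quad
  \Delta w_k = \eta\, e\, x_k
\end{aligned}
```

With cross-entropy the factor y(1 - y) cancels, so the update does not shrink when the neuron output is close to 0 or 1.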

If you use cross-entropy, you need to use learningRate * error * input; (with the correct sign, depending on how you define the error).
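
A minimal, self-contained sketch for comparing the two updates on the AND samples from the question. The class name, the boolean flag, and the initial weights {-2, 0, 0} are assumptions made for illustration: the question gives only the bias starting weight, so the other two are guesses.

```java
// Illustrative sketch only; not the asker's actual classes.
final class DeltaRuleComparison {

    // AND samples with the bias input 1 as component 0.
    static final double[][] INPUTS = { { 1, 0, 0 }, { 1, 0, 1 }, { 1, 1, 0 }, { 1, 1, 1 } };
    static final double[] TARGETS = { 0, 0, 0, 1 };

    static double sigmoid(double sum) {
        return 1.0 / (1.0 + Math.exp(-sum));
    }

    // Sigmoid of the dot product of weights and input (bias included).
    static double output(double[] weights, double[] input) {
        double sum = 0.0;
        for (int k = 0; k < weights.length; k++) {
            sum += weights[k] * input[k];
        }
        return sigmoid(sum);
    }

    // Batch training. useDerivative = true gives the squared-error update,
    // useDerivative = false gives the cross-entropy update.
    static double[] train(boolean useDerivative, int epochs, double learningRate) {
        double[] weights = { -2.0, 0.0, 0.0 }; // assumed start: bias -2, others 0
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] weightDelta = new double[weights.length];
            for (int s = 0; s < INPUTS.length; s++) {
                double estimated = output(weights, INPUTS[s]);
                double error = TARGETS[s] - estimated;
                double delta = learningRate * error;
                if (useDerivative) {
                    delta *= estimated * (1.0 - estimated);
                }
                // Accumulate the per-weight correction over the epoch.
                for (int k = 0; k < weights.length; k++) {
                    weightDelta[k] += delta * INPUTS[s][k];
                }
            }
            // Apply the accumulated correction once per epoch.
            for (int k = 0; k < weights.length; k++) {
                weights[k] += weightDelta[k];
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] crossEntropy = train(false, 10000, 0.021);
        double[] squaredError = train(true, 10000, 0.021);
        for (double[] input : INPUTS) {
            System.out.printf("x1=%.0f x2=%.0f  cross-entropy=%.4f  squared-error=%.4f%n",
                    input[1], input[2], output(crossEntropy, input), output(squaredError, input));
        }
    }
}
```

Running main prints the outputs of both variants side by side after the same number of epochs, which makes it easy to compare them with the numbers reported in the question.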

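As a side note, what you are actually doing is logistic regression...

Hope that helps. Let me know if you need more information. Check my link, where I give the complete derivation of the math behind it.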
