Question

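I have been working on a neural network that performs basic multiplication: 1 * 0, 1 * 1, 1 * 2, ... 1 * 9. I mapped these values to two inputs: 1, 0.0; 1, 0.1; 1, 0.2; ... 1, 0.9. My initial idea was to use 10 output nodes, so that for the input 1, 0.5 you would get the output 0,0,0,0,0,1,0,0,0,0. In other words, the output node whose position is indicated by the second input value should have the value 1.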
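The encoding described above can be sketched in a few lines. This is only an illustration in Python, not the code from the question (which is not shown). The name build_dataset and its n_classes parameter are hypothetical, and the sketch assumes zero-based positions, so that 0.5 maps to index 5 as in the example 0,0,0,0,0,1,0,0,0,0.

```python
# Hypothetical helper: builds the ten (inputs, target) pairs described above.
def build_dataset(n_classes=10):
    dataset = []
    for digit in range(n_classes):
        inputs = [1.0, digit / 10]      # 1 * 5 is encoded as [1.0, 0.5]
        target = [0.0] * n_classes      # one output node per possible result
        target[digit] = 1.0             # only the node at position `digit` is 1
        dataset.append((inputs, target))
    return dataset


dataset = build_dataset()
inputs, target = dataset[5]
print(inputs)  # [1.0, 0.5]
print(target)  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

For the single-output variant discussed further down, the target would simply be the second input value itself instead of a list of ten.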

The main problem is that the network does not learn this way:

Run: 0; input: 1.0 and 0.0; calc. output: 0 outputs: [0.94048536, 0.75189143, 0.90448654, 0.84964263, 0.92710346, 0.9205896, 0.9166514, 0.9185999, 0.795943, 0.81590974]
Run: 0; input: 1.0 and 0.1; calc. output: 0 outputs: [0.94168836, 0.6586949, 0.88201404, 0.7996992, 0.9125166, 0.9040279, 0.8969305, 0.9007438, 0.72104955, 0.74867773]
Run: 0; input: 1.0 and 0.2; calc. output: 1 outputs: [0.59459364, 0.626284, 0.5367963, 0.37009117, 0.5410054, 0.5135251, 0.45009533, 0.5312547, 0.3342563, 0.34696856]
Run: 0; input: 1.0 and 0.3; calc. output: 2 outputs: [0.30291027, 0.3520163, 0.69235265, 0.24563757, 0.3124528, 0.24798703, 0.2366348, 0.36106965, 0.24116497, 0.23960127]
Run: 0; input: 1.0 and 0.4; calc. output: 3 outputs: [0.14643292, 0.14488237, 0.20816484, 0.7138216, 0.14679325, 0.14442608, 0.14322245, 0.14553475, 0.14529803, 0.14485884]
Run: 0; input: 1.0 and 0.5; calc. output: 4 outputs: [0.111300476, 0.11054926, 0.12775657, 0.17022455, 0.517143, 0.11048156, 0.10996126, 0.11087219, 0.11077163, 0.11069376]
Run: 0; input: 1.0 and 0.6; calc. output: 5 outputs: [0.091701485, 0.091235556, 0.100025855, 0.113540664, 0.079323635, 0.40588042, 0.090913385, 0.09137165, 0.09132192, 0.09130932]
Run: 0; input: 1.0 and 0.7; calc. output: 6 outputs: [0.07876296, 0.07844582, 0.08373607, 0.090930566, 0.07063774, 0.075600885, 0.33777672, 0.07862692, 0.07864317, 0.078639545]
Run: 0; input: 1.0 and 0.8; calc. output: 7 outputs: [0.069422215, 0.069256335, 0.07245374, 0.07753391, 0.06380278, 0.06732645, 0.078282, 0.29474476, 0.06972643, 0.06970157]
Run: 0; input: 1.0 and 0.9; calc. output: 8 outputs: [0.062300637, 0.062199928, 0.06432947, 0.06806077, 0.05821499, 0.0608177, 0.06845653, 0.07952615, 0.26235205, 0.06267065]
MSE avg. slope: 29.733704


Run: 71656; input: 1.0 and 0.0; calc. output: 9 outputs: [0.010279003, 0.02896637, 0.043276917, 0.053958114, 0.0630066, 0.071741395, 0.08170587, 0.095196836, 0.1174955, 0.2159805]
Run: 71656; input: 1.0 and 0.1; calc. output: 9 outputs: [0.0103677455, 0.028743964, 0.042845763, 0.053532414, 0.06259656, 0.07137894, 0.08135815, 0.09476117, 0.11689092, 0.21750201]
Run: 71656; input: 1.0 and 0.2; calc. output: 9 outputs: [0.010355595, 0.031080922, 0.042447396, 0.052969296, 0.06184091, 0.07039538, 0.08002176, 0.092763215, 0.113420375, 0.20233525]
Run: 71656; input: 1.0 and 0.3; calc. output: 9 outputs: [0.010345513, 0.030882018, 0.051248915, 0.052218866, 0.060734913, 0.06885447, 0.07785396, 0.08951517, 0.10782931, 0.1783656]
Run: 71656; input: 1.0 and 0.4; calc. output: 9 outputs: [0.010335368, 0.030656299, 0.050334506, 0.07143845, 0.059316203, 0.06685822, 0.07505537, 0.085399605, 0.100998335, 0.15348937]
Run: 71656; input: 1.0 and 0.5; calc. output: 9 outputs: [0.0103242155, 0.030395139, 0.049267467, 0.06860315, 0.09226974, 0.06452598, 0.071837306, 0.08079469, 0.09371151, 0.13176109]
Run: 71656; input: 1.0 and 0.6; calc. output: 9 outputs: [0.010311599, 0.030096399, 0.048070885, 0.0655594, 0.0853547, 0.114023075, 0.06839305, 0.076017134, 0.08652923, 0.114047475]
Run: 71656; input: 1.0 and 0.7; calc. output: 6 outputs: [0.010297287, 0.02976092, 0.046773087, 0.062431496, 0.0788379, 0.09977333, 0.13687406, 0.07129724, 0.079778984, 0.09988401]
Run: 71656; input: 1.0 and 0.8; calc. output: 7 outputs: [0.010281161, 0.029391168, 0.045403145, 0.05931676, 0.07286063, 0.088408194, 0.11082662, 0.16097555, 0.07361035, 0.088530056]
Run: 71656; input: 1.0 and 0.9; calc. output: 8 outputs: [0.010263171, 0.028990407, 0.04398821, 0.05628488, 0.067463, 0.07921742, 0.09398328, 0.117479034, 0.18689476, 0.07932637]
MSE avg. slope: 20.656487

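MSE avg. slope is a measure of how far we are from the expected result. "calc. output" is a function that returns the position of the largest value among the 10 output nodes. "outputs" is the actual output of the 10 nodes.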
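As a rough sketch of these two quantities, assuming "calc. output" is a plain argmax over the output nodes and using the textbook mean squared error (the exact formula behind "MSE avg. slope" is not given in the question, so the second function may differ from it). The function names are hypothetical; the first list is copied from the log line for Run 71656, input 1.0 and 0.7.

```python
def calc_output(outputs):
    """Return the index of the largest activation (assumed meaning of 'calc. output')."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


def mean_squared_error(outputs, target):
    """Standard MSE between the network outputs and the target vector."""
    return sum((o - t) ** 2 for o, t in zip(outputs, target)) / len(outputs)


# Outputs logged for Run 71656, input 1.0 and 0.7.
logged = [0.010297287, 0.02976092, 0.046773087, 0.062431496, 0.0788379,
          0.09977333, 0.13687406, 0.07129724, 0.079778984, 0.09988401]
print(calc_output(logged))  # 6, the same value the log reports

# Toy two-node example, only to show the arithmetic of the error.
print(mean_squared_error([0.5, 0.5], [0.0, 1.0]))  # 0.25
```

Note that the winning node in that log line has an activation of only about 0.14, so an argmax alone can hide how weak the prediction actually is.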

Interestingly, if I convert my network to one with a single output neuron, so that 1 * 0.3 simply gives 0.3, everything works fine:

Run: 0; input: 1.0 and 0.0; calc. output: 0.7 outputs: [0.73059195]
Run: 0; input: 1.0 and 0.1; calc. output: 0.7 outputs: [0.6943731]
Run: 0; input: 1.0 and 0.2; calc. output: 0.6 outputs: [0.59459066]
Run: 0; input: 1.0 and 0.3; calc. output: 0.5 outputs: [0.47402766]
Run: 0; input: 1.0 and 0.4; calc. output: 0.4 outputs: [0.40803587]
Run: 0; input: 1.0 and 0.5; calc. output: 0.4 outputs: [0.40256947]
Run: 0; input: 1.0 and 0.6; calc. output: 0.4 outputs: [0.4390238]
Run: 0; input: 1.0 and 0.7; calc. output: 0.5 outputs: [0.51111925]
Run: 0; input: 1.0 and 0.8; calc. output: 0.6 outputs: [0.61495394]
Run: 0; input: 1.0 and 0.9; calc. output: 0.7 outputs: [0.7283057]
MSE avg. slope: 6.411294


Run: 1059; input: 1.0 and 0.0; calc. output: 0.0 outputs: [0.07036249]
Run: 1059; input: 1.0 and 0.1; calc. output: 0.1 outputs: [0.12572113]
Run: 1059; input: 1.0 and 0.2; calc. output: 0.2 outputs: [0.20584798]
Run: 1059; input: 1.0 and 0.3; calc. output: 0.3 outputs: [0.30965355]
Run: 1059; input: 1.0 and 0.4; calc. output: 0.4 outputs: [0.42281863]
Run: 1059; input: 1.0 and 0.5; calc. output: 0.5 outputs: [0.52661973]
Run: 1059; input: 1.0 and 0.6; calc. output: 0.6 outputs: [0.61548316]
Run: 1059; input: 1.0 and 0.7; calc. output: 0.7 outputs: [0.6934447]
Run: 1059; input: 1.0 and 0.8; calc. output: 0.8 outputs: [0.76261413]
Run: 1059; input: 1.0 and 0.9; calc. output: 0.8 outputs: [0.82081395]
MSE avg. slope: 1.1993817

Does anyone know what the reason might be? I am looking forward to any suggestions. Have a nice day, everyone!

Answer 1

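Eventually I found the problem: in the code I was not resetting the value of the average slope after each epoch, but I was still using it as the learning rate.

This caused drastic weight changes at every step. It eventually led to convergence toward 0, which meant that the average error was very low. The average slope was lower than its initial value, but still large.

Once I reset it properly, it accurately reflected when the network had fit (anywhere < 0.711).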
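The effect of a missing reset can be illustrated with a small sketch. The original code is not shown, so this assumes the average slope was kept as a running total divided by the number of samples in the epoch. The function name, the reset_each_epoch flag, and the slope values are all made up for illustration.

```python
def epoch_averages(slopes_per_epoch, reset_each_epoch):
    """Return the average slope reported after each epoch."""
    total = 0.0
    averages = []
    for slopes in slopes_per_epoch:
        if reset_each_epoch:
            total = 0.0              # the fix: start every epoch from zero
        for slope in slopes:
            total += slope           # accumulate per-sample slopes
        averages.append(total / len(slopes))
    return averages


# Made-up per-sample slopes that shrink from the first epoch to the second.
slopes = [[4.0, 2.0], [1.0, 1.0]]
print(epoch_averages(slopes, reset_each_epoch=True))   # [3.0, 1.0]
print(epoch_averages(slopes, reset_each_epoch=False))  # [3.0, 4.0]
```

Without the reset, the reported value keeps growing even though the per-sample slopes shrink. If that number is also used to scale the weight updates, the steps get larger over time instead of smaller, which is consistent with the drastic weight changes described above.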

I got a lot of help from:
Strange convergence in simple Neural Network
https://www.youtube.com/watch?v=Gvq9sUHPgrc (giant_neural_network, video on the slope of the cost function)
https://www.youtube.com/watch?v=tIeHLnjs5U8 (3Blue1Brown, Backpropagation video)

Neural network only learns when using a single output node
