Question

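I'm playing around with neural networks and want to build a clean class implementation that can handle networks of any size. Currently, I'm debugging my learning function for a 2-layer network.

Current state with logistic activation:

It cannot learn values below 0.5
It cannot handle a matrix of input vectors (only a single input vector); this can be implemented later
If the initial weights and bias produce an output below 0.5, it will likely learn toward 0
Assumption: under "ideal" conditions, it will learn any value between 0.5 and 1 for any combination of binary inputs
- This has been tested with 2 and 3 network inputs
Forward propagation works properly regardless of the number of layers

The relevant code follows: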

import numpy as np

def logistic(x, deriv = False):
  '''
  If using the derivative, input must be result of logistic
  '''
  if deriv:
    return x*(1-x)
    
  return 1/(1+np.exp(-x))

def feed_forw(input, weights):
  '''
  ***Wrapper for input.dot(weights)
  Input should be a np.array the same length as number of input nodes
    - A row of input represents the vector of input nodes
    - Different Rows are different input cases
  Weights is a 2D np.array of weights for each input node to each output node
    - dimensions of weights will determine length of output vector
    - top row is weights going from first input to node to all output nodes
    - first col is weights going from all input nodes to first output node
  '''

  return input.dot(weights)

class ANN:
  '''
  Artificial Neural Network of Perceptron Design
  Member Attributes:
    Weights: tuple of np.array
    - # of elements define number of layers
    - shapes of each element define nodes of each connecting pair of connecting layers
    Bias: tuple of np.array
    - added to each node after the first layer on a per layer basis
    - must have same dimensions as output from each corresponding element in Weights
    Target: np.array
    - array representing desired output.
  '''
  
  def __init__(self, weights, bias = 0, target = None):
    self._weights = weights
    self._bias = bias
    self._target = target

  def __str__(self):
    data = ''
    for w,b in zip(self._weights, self._bias):
      data += f'Weight\n{w}\nbias\n{b}\n'

    return f'{data}Seeking\n{self._target}\n'

  def _forwardProp(self, v, activation):
    '''
    Helper function to Learn
    '''
    out = []
    out.append(v.copy())
    for w,b in zip(self._weights, self._bias):
      out.append(feed_forw(out[-1], w) + b)
      out.append(activation(out[-1]))
    return out

  def setTarget(self, target):
    self._target = target

  def learn(self, input, activation, epoch = 10, eta = 1, debug = False):
    '''
    ***Currently only functions with 2-Layer perceptrons***
    input: np.array
    - a matrix representing each of case of input vectors
    - rows are input vectors for a single case
    activation: function object
    - An activation function used to normalize output
    epoch: int
    - test cycles
    eta: int
    - learning parameter
    '''
    for e in range(epoch):
      layers = self._forwardProp(input, activation)
      #layers is a list for keeping track of changes between layers
      #Pattern follows:
      #[input, layer 0 - weighted sum, layer 1 - activation, layer 1(output) - 
      #   weighted sum, layer 2 - activation, layer 2 ... 
      #   weighted sum, output layer - activation, output layer]
      
      #Final element is always network output
      error = layers[-1] - self._target

      delta_out = error * activation(layers[-1], deriv = True)
      #derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
      #derivError_out = delta_out * layers[-3].T*self._weights[-1]
      #EDIT
      derivError_out = delta_out * layers[-3].T
      derivError_bias = delta_out * self._bias[-1].T
      self._weights += -eta*derivError_out
      self._bias += -eta*derivError_bias

      if debug:
        print(f'Epoch {e+1}:\nOutput:\n{layers[-1]}\nError is\n{error}\nDelta Out Node:\n{delta_out}')
        print(f'Weight Increment:\n{derivError_out}\nBias Increment:\n{derivError_bias}')
        print(f'State after training rotation:\n{self}')

      #i = 1
      #while i < len(layers) + 1:
        #This loop will count from the last element of layers, will go back by 2
        #...
        #i += 2

Code used for testing, and its output:

w2 = np.array([[0.03],
               [-0.1]])
b2 = np.array([[0.7]])
nn1 = ANN((w2,), (b2,))
x = np.array([[1,1]])
t = np.array([[0.7]])
nn1.setTarget(t)
nn1.learn(x, logistic, 100, debug = True)
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[-0.04751054]]
Delta Out Node:
[[-0.01077287]]
Weight Increment:
[[-0.00032319]
 [ 0.00107729]]
Bias Increment:
[[-0.00754101]]
State after training rotation:
Weight
[[ 0.03032319]
 [-0.10107729]]
bias
[[0.70754101]]
Seeking
[[0.7]]

Epoch 2:
Output:
[[0.65402678]]
Error is
[[-0.04597322]]
Delta Out Node:
[[-0.01040263]]
Weight Increment:
[[-0.00031544]
 [ 0.00105147]]
Bias Increment:
[[-0.00736028]]
State after training rotation:
Weight
[[ 0.03063863]
 [-0.10212876]]
bias
[[0.71490129]]
Seeking
[[0.7]]
...
Epoch 99:
Output:
[[0.69871509]]
Error is
[[-0.00128491]]
Delta Out Node:
[[-0.00027049]]
Weight Increment:
[[-1.08348447e-05]
 [ 3.61161491e-05]]
Bias Increment:
[[-0.00025281]]
State after training rotation:
Weight
[[ 0.04006734]
 [-0.13355782]]
bias
[[0.93490471]]
Seeking
[[0.7]]

Epoch 100:
Output:
[[0.69876299]]
Error is
[[-0.00123701]]
Delta Out Node:
[[-0.00026038]]
Weight Increment:
[[-1.04328444e-05]
 [ 3.47761479e-05]]
Bias Increment:
[[-0.00024343]]
State after training rotation:
Weight
[[ 0.04007778]
 [-0.13359259]]
bias
[[0.93514815]]
Seeking
[[0.7]]
'''
#This cell is rerun with
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[0.25248946]]
Delta Out Node:
[[0.05725122]]
Weight Increment:
[[ 0.00171754]
 [-0.00572512]]
Bias Increment:
[[0.04007585]]
State after training rotation:
Weight
[[ 0.02828246]
 [-0.09427488]]
bias
[[0.65992415]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.64426676]]
Error is
[[0.24426676]]
Delta Out Node:
[[0.05598279]]
Weight Increment:
[[ 0.00158333]
 [-0.00527777]]
Bias Increment:
[[0.0369444]]
State after training rotation:
Weight
[[ 0.02669913]
 [-0.08899711]]
bias
[[0.62297975]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.50544009]]
Error is
[[0.10544009]]
Delta Out Node:
[[0.0263569]]
Weight Increment:
[[ 2.73123106e-05]
 [-9.10410354e-05]]
Bias Increment:
[[0.00063729]]
State after training rotation:
Weight
[[ 0.00100894]
 [-0.00336312]]
bias
[[0.02354185]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.50529672]]
Error is
[[0.10529672]]
Delta Out Node:
[[0.02632123]]
Weight Increment:
[[ 2.65564469e-05]
 [-8.85214898e-05]]
Bias Increment:
[[0.00061965]]
State after training rotation:
Weight
[[ 0.00098238]
 [-0.0032746 ]]
bias
[[0.0229222]]
Seeking
[[0.4]]
'''
#Cell is rerun again with
b2 = np.array([[-0.7]])
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.31647911]]
Error is
[[-0.08352089]]
Delta Out Node:
[[-0.01806725]]
Weight Increment:
[[-0.00054202]
 [ 0.00180672]]
Bias Increment:
[[0.01264707]]
State after training rotation:
Weight
[[ 0.03054202]
 [-0.10180672]]
bias
[[-0.71264707]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.31347742]]
Error is
[[-0.08652258]]
Delta Out Node:
[[-0.01862047]]
Weight Increment:
[[-0.00056871]
 [ 0.00189569]]
Bias Increment:
[[0.01326982]]
State after training rotation:
Weight
[[ 0.03111072]
 [-0.10370241]]
bias
[[-0.72591689]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.01206264]]
Error is
[[-0.38793736]]
Delta Out Node:
[[-0.0046231]]
Weight Increment:
[[-0.00079352]
 [ 0.00264508]]
Bias Increment:
[[0.01851554]]
State after training rotation:
Weight
[[ 0.17243664]
 [-0.57478879]]
bias
[[-4.02352151]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.01182232]]
Error is
[[-0.38817768]]
Delta Out Node:
[[-0.0045349]]
Weight Increment:
[[-0.00078198]
 [ 0.00260661]]
Bias Increment:
[[0.01824629]]
State after training rotation:
Weight
[[ 0.17321862]
 [-0.5773954 ]]
bias
[[-4.04176779]]
Seeking
[[0.4]]
'''

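I can see that when the output is below 0.5, for some reason training drives the output lower no matter what. If the starting output is below 0.5, it will only learn a value lower than the starting output. If the starting output is 0.5 or greater, it will only learn a value that is also 0.5 or greater. However, I still can't come up with a solution to this problem (at least not an elegant one).

These are two distinct cases, so I could brute-force a fix. But then I would never find out what mistake I made.

I know there are multiple ways to implement this network, including the ridiculously simple variant seen on this blog, whose math I still can't follow. But after working on this for weeks, I can only assume it is some small mistake I'm failing to see.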

EDIT: this change is a step toward the solution.

The following lines were changed in the class definition.

#derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
#derivError_out = delta_out * layers[-3].T*self._weights[-1]
derivError_out = delta_out * layers[-3].T

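Changes

When the initial output is 0.5 or greater, the network can learn any value between 0 and 1!
When the initial output is below 0.5, the network can only learn values below a limit that is roughly no greater than the starting output
- This behavior depends on the weights, and there appears to be an upper limit on the values the network can learn. When it tries to learn a value above that limit, it converges to 0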

Answer 1

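One issue that stands out immediately is your sigmoid derivative. The derivative of sigmoid(x) with respect to x is not x*(1-x) but sigmoid(x)*(1-sigmoid(x)); the x*(1-x) form is only valid when x is already the sigmoid output.

You can change your implementation accordingly: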

def logistic(x, deriv=False):
    """
    With deriv=True, x must be the pre-activation (weighted sum);
    the logistic is applied internally before forming phi * (1 - phi)
    """
    phi = 1 / (1 + np.exp(-x))

    if deriv:
        return phi * (1 - phi)

    return phi
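
The two ways of writing the derivative can be checked against each other numerically. The snippet below is a minimal sketch; the helper names `dsigmoid_from_z` and `dsigmoid_from_output` are made up for this illustration and do not appear in the question. It compares both forms with a finite-difference estimate, to make explicit which quantity each form expects as its argument.

```python
import numpy as np

def sigmoid(z):
    """Logistic function applied to a pre-activation z."""
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid_from_z(z):
    """Derivative written in terms of the pre-activation z (hypothetical helper)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def dsigmoid_from_output(y):
    """Derivative written in terms of y = sigmoid(z) (hypothetical helper)."""
    return y * (1.0 - y)

z = np.array([-2.0, -0.77, 0.0, 0.63, 2.0])  # arbitrary test points
y = sigmoid(z)

# Central finite difference as an independent reference for d(sigmoid)/dz
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(dsigmoid_from_z(z), numeric))       # form that takes z
print(np.allclose(dsigmoid_from_output(y), numeric))  # form that takes sigmoid(z)
```

Either form is usable as long as the caller passes the matching quantity: the weighted sum for the first, the activated output for the second.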

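There are two main things that I strongly suspect keep this from doing what you want:

The error you are optimizing is not an ideal cost function.
Gradient descent minimizes the error term you formulate.
With the current formulation, error = prediction - target, the best a correctly working gradient optimizer can do is make the prediction as small as possible (even negative, if the activation allowed it).

Suggestion: use some L-norm as the error function, e.g. the L1 norm
error = | prediction - target |

Maybe I am just not reading it correctly, which is fine, but the weight update
derivError_out = delta_out * layers[-3].T*self._weights[-1]
looks suspicious. Once you have built the error term delta_out, the contribution you backpropagate to each weight depends only on its respective activation, layers[-3].

I suggest the following (derivation, if you are interested: https://imgur.com/gallery/noS4pe4):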

def learn(self, input, activation, epoch=10, eta=1, debug=False):
    """ .... optimizing L1-norm .... """
    for e in range(epoch):
        layers = self._forwardProp(input, activation)

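        # Signed difference between the network output and the target
        diff = layers[-1] - self._target
        # L1 error, kept for inspection; the update uses only its gradient
        error = np.abs(diff)

        # d|diff|/d(weighted sum) = sign(diff) * logistic'(weighted sum);
        # layers[-2] is the pre-activation of the output layer
        delta_out = np.sign(diff) * activation(layers[-2], deriv=True)
        # Weight gradient: delta scaled by the incoming activations
        derivError_out = delta_out * layers[-3].T
        self._weights -= eta * derivError_out
        # Bias gradient: delta itself
        self._bias -= eta * delta_out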
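
For reference, the update in this snippet can be written out for a single logistic output node. This is a sketch rather than a copy of the linked derivation; the symbols $a_i$ (incoming activations, layers[-3]), $z$ (weighted sum, layers[-2]), $y$ (output, layers[-1]) and $t$ (target) are notation introduced here, and the expressions hold for $y \neq t$.

```latex
\begin{aligned}
z &= \sum_i w_i a_i + b, \qquad y = \sigma(z), \qquad E = \lvert y - t \rvert \\
\delta &\equiv \frac{\partial E}{\partial z}
        = \operatorname{sign}(y - t)\,\sigma(z)\bigl(1 - \sigma(z)\bigr) \\
\frac{\partial E}{\partial w_i} &= \delta\, a_i, \qquad
\frac{\partial E}{\partial b} = \delta \\
w_i &\leftarrow w_i - \eta\,\delta\,a_i, \qquad b \leftarrow b - \eta\,\delta
\end{aligned}
```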

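Also: your learning rate eta seems rather high; this can easily diverge and yield a model with exploding weights that ends up producing 0 or 1, depending on the initialization.

Turn it down for a more stable gradient descent.

For example: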

nn1.learn(x, logistic, 100, eta=1e-2, debug=True)
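
The effect of eta on a sign-based (L1) update can be sketched on a single logistic node. This is an illustration only: `train_l1` is a hypothetical helper, not part of the ANN class, and it reuses the starting weights, bias and target from the question's third run. Because the L1 gradient does not shrink as the output nears the target, each epoch moves the weighted sum by roughly eta times a constant, so the step size is set almost entirely by eta.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_l1(eta, epochs=200):
    """Train one logistic node with the sign-based (L1) update; return the output history."""
    w = np.array([[0.03], [-0.1]])   # starting weights from the question
    b = np.array([[-0.7]])           # starting bias from the question's third run
    x = np.array([[1.0, 1.0]])
    t = np.array([[0.4]])
    history = []
    for _ in range(epochs):
        y = sigmoid(x.dot(w) + b)
        history.append(y.item())
        delta = np.sign(y - t) * y * (1.0 - y)   # d|y - t|/dz
        w = w - eta * x.T * delta                # x.T is (2, 1), delta is (1, 1)
        b = b - eta * delta
    return history

for eta in (1.0, 1e-2):
    tail = train_l1(eta)[-4:]
    print(f"eta={eta}: last outputs {np.round(tail, 4)}")
```

With the smaller eta the last outputs should sit close to the target, while with eta=1 consecutive outputs keep jumping back and forth across it.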

Answer 2

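The correct derivation for updating a neural network's parameters can be found on the internet. For a perceptron, each bias is incremented by a "delta" value determined by the output node it is connected to. Instead of implementing just that, my bias nodes were incremented by the product of the bias itself and this "delta".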
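
Written out for one logistic output node, the derivation is short. The sketch below assumes the squared-error cost $E = \tfrac{1}{2}(y - t)^2$, which is the cost whose gradient matches delta_out = error * activation(layers[-1], deriv = True); the symbols $a_i$, $z$, $y$ and $t$ are introduced here for illustration.

```latex
\begin{aligned}
z &= \sum_i w_i a_i + b, \qquad y = \sigma(z), \qquad E = \tfrac{1}{2}(y - t)^2 \\
\delta &\equiv \frac{\partial E}{\partial z} = (y - t)\,y\,(1 - y) \\
\frac{\partial E}{\partial w_i} &= \delta\, a_i, \qquad
\frac{\partial E}{\partial b} = \delta \cdot \frac{\partial z}{\partial b} = \delta \cdot 1
\end{aligned}
```

The bias enters $z$ with coefficient 1, so its gradient is $\delta$ itself; the current value of $b$ never multiplies it.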

Due to my own incompetence, I wrote this expression as

delta_out = error * activation(layers[-1], deriv = True)
derivError_bias = delta_out * self._bias[-1].T
self._bias -= eta * derivError_bias

instead of self._bias -= eta * delta_out

The network can now learn any randomly assigned target with any randomly assigned weights and biases.
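
The corrected rule can be tried in isolation. The following is a minimal standalone sketch, not the ANN class itself: `train_single_layer` is a hypothetical helper, and it assumes a single logistic layer with the weights, input and target from the question's test cell. It runs the two starting biases that previously failed for a target of 0.4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_single_layer(w, b, x, t, eta=1.0, epochs=100):
    """Delta-rule training of one logistic layer; returns (w, b, final output)."""
    w, b = w.astype(float).copy(), b.astype(float).copy()
    for _ in range(epochs):
        y = sigmoid(x.dot(w) + b)
        delta = (y - t) * y * (1.0 - y)   # error times logistic derivative
        w -= eta * x.T.dot(delta)         # weight gradient: input times delta
        b -= eta * delta                  # bias gradient: delta alone, not delta * b
    return w, b, sigmoid(x.dot(w) + b)

w0 = np.array([[0.03], [-0.1]])
x = np.array([[1.0, 1.0]])
t = np.array([[0.4]])

# The two starting biases from the question that previously failed for t = 0.4
for b0 in (np.array([[0.7]]), np.array([[-0.7]])):
    _, _, y = train_single_layer(w0, b0, x, t)
    print(f"start bias {b0.item():+.1f} -> output {y.item():.4f}")
```

Both runs should end near 0.4, whether the starting output is above or below 0.5.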

Perceptron neural network will not learn values in a specific range
