Problem with neural network backpropagation

Asked: 2019-05-02 03:09:36

Tags: neural-network backpropagation

My feed-forward guess function works fine, but when I train, backpropagation fails to adjust the weight values correctly, and all outputs converge to ~0.5.

I have gone through several different coding tutorial videos, plus plenty of reading on the theory and the math, but none of it seems to have helped my code.

    import random
    import numpy as np

    class NeuralNetwork:

        def __init__(self, IN, HL, HN, ON, LR):
            """
            Creates a neural network with IN number of input nodes,
            HL number of hidden layers, HN number of hidden nodes,
            ON number of output nodes, and a learning rate of LR
            """
            self.weights = []
            self.bias = []
            self.layers = HL + 2
            self.sets = HL + 1
            self.learningRate = LR
            if HL == 0:
                inputWeights = np.random.rand(ON, IN)
                inputWeights = np.asmatrix(inputWeights)
                self.weights.append(inputWeights)
            else:
                inputWeights = np.random.rand(HN, IN)
                inputWeights = np.asmatrix(inputWeights)
                self.weights.append(inputWeights)
                for x in range(self.layers - 3):
                    self.bias.append(random.random())
                    hiddenWeights = np.random.rand(HN, HN)
                    hiddenWeights = np.asmatrix(hiddenWeights)
                    self.weights.append(hiddenWeights)
                hiddenWeights = np.random.rand(ON, HN)
                hiddenWeights = np.asmatrix(hiddenWeights)
                self.weights.append(hiddenWeights)

        @staticmethod
        def sigmoid(x):

            y = (1/(1+(np.exp(-x))))

            return y

        @staticmethod
        def dsigmoid(x):
            # computes x * (1 - x), so x must already be an activation, i.e. sigmoid(z)
            y = x
            y_1 = (1-x)
            z = np.multiply(y,y_1)

            return z

        def train(self,inputs,targets):

            a = []
            inputs = np.mat(inputs).T
            targets = np.mat(targets).T
            a.append(inputs)
            z = []
            for x in range(self.layers-1):
                z.append(np.dot(self.weights[x],a[x]))
                a.append(NeuralNetwork.sigmoid(z[x]))

            cost = np.square(np.subtract(a[-1],targets))

            error = []
            deltaW = []
            deltaB = []

            """
            This is the start of back propagation.
            """

            error.append(np.subtract(a[-1],targets))
            err = np.multiply(NeuralNetwork.dsigmoid(a[-1]),a[self.layers-2])
            deltaW.append(np.asmatrix(np.multiply(error[-1].T,err.T)))
            deltaB.append(error[-1].T)

            for x in reversed(range(1,self.sets)):
                error_temp = np.multiply(NeuralNetwork.dsigmoid(a[x+1]),self.weights[x])
                error.insert(0,np.multiply(error_temp,error[-1]))
                err = np.multiply(NeuralNetwork.dsigmoid(a[x]),a[x-1])
                deltaW.insert(0,np.multiply(err,error[0]).T)
                deltaB.insert(0,error[0])

            """
            Tweak the weights and biases of the neural network
            """

            for x in range(self.sets):
                self.weights[x] -= self.learningRate*deltaW[x]

    brain = NeuralNetwork(2,1,2,1,.5)

    inputs = [[1,0],[0,1],[0,0],[1,1]]
    target = [[1],[1],[0],[0]]

    for x in range(50000):
        for x in range(5):
            r = random.randint(0,3)
        brain.train(inputs[r],target[r])

    for x in inputs:
        brain.guess(x)

1 Answer:

Answer 0 (score: 0):

Who on earth said that weight matrices in the [NextLayer x CurrentLayer] form are easier? It's horrible! Drop it!

Keep your input matrix the way it is now, and generate your weight matrices and biases as:

w0 = np.random.normal(0., 1., size=(IN, HN))
w1 = np.random.normal(0., 1., size=(HN, ON))
b0 = np.zeros(HN)
b1 = np.zeros(ON)
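
For the sizes used in the question's network, NeuralNetwork(2,1,2,1,.5), i.e. IN=2, HN=2, ON=1, a quick sanity check of this layout might look like the sketch below; the names simply mirror the snippet above, and treating each input as a row vector (or a batch of rows) is an assumption of this layout, not something from the poster's original code:

import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

IN, HN, ON = 2, 2, 1

w0 = np.random.normal(0., 1., size=(IN, HN))
w1 = np.random.normal(0., 1., size=(HN, ON))
b0 = np.zeros(HN)
b1 = np.zeros(ON)

x = np.array([[1., 0.]])           # shape (1, IN): one sample per row
hidden = sigmoid(x @ w0 + b0)      # shape (1, HN)
guess = sigmoid(hidden @ w1 + b1)  # shape (1, ON)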

Like this, your forward propagation becomes much simpler:

z, a = [], [inputs]
for i in range(len(w)):
    z.append(a[-1] @ w[i] + b[i])
    a.append(self.sigmoid(z[-1]))

Input times weights plus bias, then apply the activation function. No operations with pointless transposes anywhere. Beyond that, if you ever swap the sigmoid for any other activation, you will end up right back where you are now while compensating for the incorrect sigmoid derivative.

@staticmethod
def dsigmoid(x):
    return NeuralNetwork.sigmoid(x) * (1. - NeuralNetwork.sigmoid(x))

is the correct derivative. In backpropagation, when you compute da[i]/dz[i] (which in your case turns out to be dsigmoid), the argument is not the activation.

z[i] = a[i] @ w[i] + b[i]
a[i+1] = f(z[i])

The partial derivative of a[i+1] with respect to z[i] is definitely not f'(a[i+1]), but f'(z[i]). That will save you a lot of headaches later on. To be honest, I don't quite follow your backpropagation algorithm.
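
As a small, self-contained sketch of that distinction (the value of `z_i` here is arbitrary): the derivative is evaluated at the pre-activation z[i]. For the sigmoid specifically, sigmoid'(z) happens to equal a*(1-a) where a = sigmoid(z), which is why the question's dsigmoid only works as long as it is always fed activations, and only for the sigmoid.

import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

def dsigmoid(x):
    return sigmoid(x) * (1. - sigmoid(x))

z_i = np.array([[0.3, -1.2]])     # some pre-activation of layer i
a_next = sigmoid(z_i)             # the activation a[i+1]

print(dsigmoid(z_i))              # derivative evaluated at the pre-activation
print(a_next * (1. - a_next))     # identical values, but only for the sigmoid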

When you get the network's prediction, call it `guess`. You compute the error as:

cost = .5 * (guess - target) ** 2

at least for a single training example, though it looks like you are doing SGD. Now, the first step is to compute the delta as:

delta = (guess - target)

and then walk backwards through the layers; in each iteration you compute dw[i], db[i], and the new delta for the upcoming (previous) layer:

delta = delta * dsigmoid(z[i])   # dcost/dz[i]
dw[i] = a[i].T @ delta           # since z[i] = a[i] @ w[i] + b[i]
db[i] = np.sum(delta, axis=0)
delta = delta @ w[i].T           # dcost/da[i], handed to the previous layer

That's basically it. Beyond that, consider reading up on activation functions. The sigmoid is an activation function that suffers particularly from the vanishing gradient problem: its derivative can get so small that your weights barely change at all. Take a look at ReLU or Leaky ReLU (there are many variants), and also Google's newer Swish activation function. With conv layers I got fairly good results on CIFAR-10 using Swish.
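
For reference, minimal NumPy versions of the activations mentioned here and their derivatives might look like the sketch below; the 0.01 leak factor and the x * sigmoid(x) form of Swish are common choices, not something specified in the original answer:

import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

def relu(x):
    return np.maximum(0., x)

def drelu(x):
    return (x > 0).astype(x.dtype)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def dleaky_relu(x, alpha=0.01):
    return np.where(x > 0, 1., alpha)

def swish(x):
    return x * sigmoid(x)

def dswish(x):
    s = sigmoid(x)
    return s + x * s * (1. - s)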