how to adjust the weights in gradient descent

Date: 2017-10-23 20:48:05

Tags: neural-network gradient-descent

I am currently trying to teach myself something about neural networks, so I bought the book Applied Artificial Intelligence by Wolfgang Beer, and I am now stuck on understanding a part of his code. Actually, I understand the code; I just do not understand one mathematical step behind it... The part looks like this:

    for i in range(iterations):
        guessed = sig(inputs * weights)
        error = output - guessed
        adjustment = error * sig_d(output)
        # Why is there no learning rate?
        # Why is the adjustment relative to the error
        # multiplied by the derivative of your main function?
        weights += adjustment

I tried to look up how the gradient descent method works, but I never got the part about adjusting the weights. How does the math behind it work, and why do you use the derivative for it? Also, when I started looking on the internet for other solutions, I always saw them using a learning rate. I understand the concept of it, but why is this method not used in this book? It would really help me if someone could answer these questions...

And thanks for all these rapid responses in the past.

2 Answers:

Answer 0 (score: 1)

[figure: the error J(θ0, θ1) plotted as a surface over the weights θ0 and θ1, with a red arrow marking the minimum]

To train a regression model, we start with arbitrary weights and adjust them so that the error becomes minimal. If we plot the error as a function of the weights, we get a curve like the one in the figure above, where the error J(θ0, θ1) is a function of the weights θ0 and θ1. The error is minimal at the very bottom of the graph; the red arrow marks that minimum point. To reach it, we take the derivative of the error function (the tangent to the function): the slope of the tangent at a point is the derivative there, and it gives us the direction in which to move. We step down the cost function in the direction of steepest descent, and the size of each step is determined by the parameter α, called the learning rate.

The gradient descent algorithm is:

repeat until convergence:

θj := θj − α · ∂J(θ0, θ1)/∂θj

where j = 0, 1 is the index of the weight and α is the learning rate.
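
As a minimal sketch of this update rule in the style of the question's Python (assuming a linear hypothesis h(x) = θ0 + θ1·x and the course's squared-error cost; the toy data and the value of α are made up for illustration):

    import numpy as np

    # Toy data, roughly following y = 1 + 2x (made up for illustration)
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 2.9, 5.2, 6.8])

    theta0, theta1 = 0.0, 0.0   # start from arbitrary weights
    alpha = 0.05                # learning rate
    m = len(x)

    for _ in range(2000):       # "repeat until convergence"
        h = theta0 + theta1 * x                 # hypothesis h(x)
        # Partial derivatives of J(θ0, θ1) = (1/2m) · Σ (h(x) − y)²
        grad0 = (h - y).sum() / m
        grad1 = ((h - y) * x).sum() / m
        # Simultaneous update: θj := θj − α · ∂J/∂θj
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1

    print(theta0, theta1)       # ends up close to (1, 2)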

[figure: the error J(θ1) plotted as a curve over a single weight θ1, with tangent lines showing the slope at different points]

In the figure above, we plot the error J(θ1) as a function of a single weight θ1. We start from an arbitrary value of θ1 and take the derivative of J(θ1) (the slope of the tangent) to adjust the weight θ1 so that we reach the bottom, where the error is minimal. If the slope is positive, we have to move left, i.e. decrease θ1; if the slope is negative, we have to move right, i.e. increase θ1. We repeat this process until we converge, i.e. reach the minimum point.
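
For a concrete (made-up) example, take J(θ1) = (θ1 − 3)², whose minimum sits at θ1 = 3. At θ1 = 5 the slope is 2·(5 − 3) = 4 > 0, so the update θ1 := θ1 − α·4 moves θ1 to the left, toward the minimum; at θ1 = 1 the slope is 2·(1 − 3) = −4 < 0, and subtracting a negative number moves θ1 to the right.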

[figure: gradient descent steps with a learning rate α that is too small (slow convergence) versus too large (overshooting)]

If the learning rate α is too small, gradient descent converges too slowly. If α is too large, gradient descent overshoots and does not converge.
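
Both failure modes are easy to see on the same made-up one-dimensional cost J(θ1) = (θ1 − 3)² from above (the α values are arbitrary, chosen only to illustrate the behavior):

    def gradient_descent(alpha, theta=10.0, steps=20):
        # J(θ) = (θ − 3)², so dJ/dθ = 2·(θ − 3)
        for _ in range(steps):
            theta -= alpha * 2 * (theta - 3)
        return theta

    print(gradient_descent(alpha=0.01))   # too small: after 20 steps still far from 3
    print(gradient_descent(alpha=0.5))    # reasonable: lands on 3
    print(gradient_descent(alpha=1.1))    # too large: overshoots further each step, diverges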

All of this material is from Andrew Ng's Machine Learning course on coursera.org: https://www.coursera.org/learn/machine-learning/home/welcome

Answer 1 (score: 0)

Why is there no learning rate?

  • There are many different flavors of neural networks; some will use a learning rate, and some may just keep it constant.

Why is the adjustment relative to the error?

  • What else would it be relative to? If there is a lot of error, then you probably need to make a big adjustment; if there is only a little error, then you only want to adjust your weights by a small amount (see the sketch below).
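
A tiny sketch of that proportionality, reusing the shape of the question's snippet (sig and sig_d here are my own hypothetical stand-ins, since the book's definitions are not shown):

    import numpy as np

    def sig(x):
        return 1 / (1 + np.exp(-x))

    def sig_d(x):
        return sig(x) * (1 - sig(x))   # derivative of the sigmoid

    output = 0.8
    for error in (0.5, 0.05):
        print(error, error * sig_d(output))   # a 10× smaller error gives a 10× smaller adjustment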

Why multiplied by the derivative of your main function?

  • There is no real answer to this one.