I am currently trying to teach myself something about neural networks, so I bought the book Applied Artificial Intelligence by Wolfgang Beer, and I am now stuck on understanding a part of his code. Actually, I understand the code; I just do not understand one mathematical step behind it. The part looks like this:
for i in range(iterations):
    guessed = sig(inputs * weights)
    error = output - guessed
    adjustment = error * sig_d(guessed)
    # Why is there no learning rate?
    # Why is the adjustment relative to the error
    # multiplied by the derivative of your main function?
    weights += adjustment
I tried to look up how the gradient descent method works, but I never understood the part about adjusting the weights. How does the math behind it work, and why do you use the derivative for it? Also, when I looked on the internet for other solutions, I always saw them using a learning rate. I understand the concept of it, but why is this method not used in this book? It would really help me if someone could answer these questions.
And thanks for all these rapid responses in the past.
Answer 0 (score: 1)
To train a regression model, we start with arbitrary weights and then adjust them so that the error becomes minimal. If we plot the error as a function of the weights, we get a curve like the one in the figure above, where the error J(θ0, θ1) is a function of the weights θ0 and θ1. The error is smallest at the very bottom of the plot, where J reaches its minimum value; the red arrow marks that minimum point. To reach the minimum, we take the derivative of the error function (the tangent to the curve). The slope of the tangent is the derivative at that point, and it gives us the direction to move in: we step down the cost function in the direction of steepest descent. The size of each step is determined by the parameter α, called the learning rate.
The gradient descent algorithm is:
repeat until convergence:
    θj := θj − α · ∂J(θ0, θ1)/∂θj
where j = 0, 1 is the index of the weight and α is the learning rate.
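As a rough Python sketch of this update rule (not from the book or the course; the data, variable names, and iteration count below are invented for illustration), fitting a two-parameter linear model y = θ0 + θ1·x by gradient descent could look like this:

    import numpy as np

    # Toy data (made up): points lying on y = 1 + 2*x
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])

    theta0, theta1 = 0.0, 0.0   # start from arbitrary weights
    alpha = 0.05                # learning rate

    for _ in range(1000):       # "repeat until convergence" (fixed count for simplicity)
        guessed = theta0 + theta1 * x
        error = guessed - y
        # Partial derivatives of the squared-error cost J(theta0, theta1)
        grad0 = error.mean()
        grad1 = (error * x).mean()
        # Simultaneous update: step against the gradient, scaled by alpha
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1

    print(theta0, theta1)       # should end up close to 1.0 and 2.0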
In the figure above, we plot the error J(θ1) as a function of a single weight θ1. We start from an arbitrary value of θ1 and take the derivative of the error J(θ1) (the slope of the tangent) to adjust the weight θ1 so that we reach the bottom of the curve, where the error is minimal. If the slope is positive, we have to move to the left, i.e. decrease θ1. If the slope is negative, we have to move to the right, i.e. increase θ1. We repeat this process until we converge, i.e. reach the minimum point.
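For example (a made-up one-dimensional case), take J(θ1) = (θ1 − 3)², whose minimum is at θ1 = 3. At θ1 = 5 the slope is 2·(5 − 3) = +4, so the update θ1 := θ1 − α·4 moves θ1 to the left, toward 3; at θ1 = 1 the slope is 2·(1 − 3) = −4, so the same update moves θ1 to the right, again toward 3.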
If the learning rate α is too small, gradient descent converges too slowly. If α is too large, gradient descent overshoots the minimum and does not converge.
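Continuing the same made-up example with J(θ1) = (θ1 − 3)², the update θ1 := θ1 − α·2(θ1 − 3) shrinks the distance to the minimum by a factor of (1 − 2α) each step: with α = 0.1 the factor is 0.8 (slow but steady progress), with α = 0.9 it is −0.8 (overshooting past the minimum but still converging), and with α = 1.5 it is −2, so θ1 moves further away on every step and gradient descent diverges.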
All of the material above is from Andrew Ng's Machine Learning course on coursera.org: https://www.coursera.org/learn/machine-learning/home/welcome
Answer 1 (score: 0)