The problem is very simple: there are only 5 samples. But gradient descent converges extremely slowly, needing something like millions of iterations. Why? Is there a bug in the algorithm?
P.S. The Julia code is below:
X = [
    1.0 34.6237 78.0247;
    1.0 30.2867 43.895;
    1.0 35.8474 72.9022;
    1.0 60.1826 86.3086;
    1.0 79.0327 75.3444
]
Y = [0 0 0 1 1]'
sigmoid(z) = 1 / (1 + exp(-z))
# Cost function (logistic regression cross-entropy).
function costJ(Theta, X, Y)
    m = length(Y)
    H = sigmoid.(X * Theta)  # predicted probabilities, m×1
    sum((-Y)' * log.(H) - (1 .- Y)' * log.(1 .- H)) / m
end
# Gradient of the cost with respect to Theta.
function gradient(Theta, X, Y)
    m = length(Y)
    H = sigmoid.(X * Theta)  # predicted probabilities, m×1
    X' * (H - Y) / m         # the usual (1/m) * X'(h - y)
end
# Gradient descent with a fixed step size.
function gradientDescent(X, Y, Theta, alpha, nIterations)
    jHistory = Vector{Float64}(undef, nIterations)  # cost at each iteration
    for i = 1:nIterations
        jHistory[i] = costJ(Theta, X, Y)
        Theta = Theta - alpha * gradient(Theta, X, Y)
    end
    Theta, jHistory
end
gradientDescent(X, Y, [0 0 0]', 0.0001, 1000)
Answer (score: 4)
I think @colinefang's comment is probably the right diagnosis. Try plotting jHistory: does it decrease on every iteration?
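For example, here is a minimal sketch of that check; the choice of Plots.jl is my assumption, not part of the original answer:
using Plots  # assumes Plots.jl is installed
# Run the descent with the question's settings and look at the cost curve;
# it should decrease steadily across iterations.
Theta, jHistory = gradientDescent(X, Y, [0 0 0]', 0.0001, 1000)
plot(1:length(jHistory), jHistory, xlabel="iteration", ylabel="cost J", legend=false)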
Another thing you can do is add a simple line search on each iteration to make sure the cost always decreases, for example:
# Backtracking line search: shrink alpha until the step decreases the cost.
function linesearch(g, X, Y, Theta; alpha=1.0)
    init_cost = costJ(Theta, X, Y)
    while costJ(Theta - alpha * g, X, Y) > init_cost
        alpha = alpha / 2.0  # or divide by some other constant > 1
    end
    return alpha
end
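This is a crude backtracking scheme: starting from alpha = 1.0, it halves the step until the candidate step no longer increases the cost, so every update is guaranteed not to make things worse, at the price of extra cost evaluations per iteration.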
Then modify the gradient descent function slightly to search for alpha on each iteration:
for i = 1:nIterations
    g = gradient(Theta, X, Y)
    alpha = linesearch(g, X, Y, Theta)
    Theta = Theta - alpha * g
end
There are all sorts of performance enhancements you could make to the code above; I just wanted to show you the flavor.
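Putting the pieces together, here is a minimal sketch of the full modified descent; the function name gradientDescentLS and this particular assembly are mine, not part of the original answer:
# Gradient descent with a per-iteration backtracking line search.
function gradientDescentLS(X, Y, Theta, nIterations)
    jHistory = Vector{Float64}(undef, nIterations)
    for i = 1:nIterations
        jHistory[i] = costJ(Theta, X, Y)
        g = gradient(Theta, X, Y)
        alpha = linesearch(g, X, Y, Theta)  # pick a step that does not increase the cost
        Theta = Theta - alpha * g
    end
    Theta, jHistory
end

Theta, jHistory = gradientDescentLS(X, Y, [0 0 0]', 1000)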