In stochastic gradient descent, we usually treat the objective function as a sum of a finite number of functions:

f(x) = ∑ f_i(x), i = 1, …, n

In each iteration, rather than computing the full gradient ∇f(x), stochastic gradient descent samples an index i uniformly at random and computes ∇f_i(x). The insight is that stochastic gradient descent uses ∇f_i(x) as an unbiased estimator of ∇f(x). We then update x as

x := x − η ∇f_i(x)

where η is the learning rate (step size).
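In other words, each step only touches one randomly chosen component of the sum. A minimal R sketch of this update rule (the names sgd_step, grads, and eta are mine, assuming the component gradients are supplied as a list of functions):

# One stochastic gradient step: pick a component uniformly at random
# and move against its gradient.
sgd_step <- function(x, grads, eta) {
  i <- sample(length(grads), 1)   # sample i uniformly from 1..n
  x - eta * grads[[i]](x)         # x := x - eta * grad f_i(x)
}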
For my optimization problem, I am finding this difficult to implement in R; here is my attempt:
stoc_grad <- function(){
  # set up a step size
  alpha = 0.1
  # set up the number of iterations
  iter = 30
  # define the objective function f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
  objFun = function(x) return(sqrt(2+x)+sqrt(1+x)+sqrt(3+x))
  # define the gradient of f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
  gradient_1 = function(x) return(1/2*sqrt(2+x))
  gradient_2 = function(x) return(1/2*sqrt(3+x))
  gradient_3 = function(x) return(1/2*sqrt(1+x))
  x = 1
  # create a vector to contain all xs for all steps
  x.All = numeric(iter)
  # gradient descent method to find the minimum
  for(i in seq_len(iter)){
    x = x - alpha*gradient_1(x)
    x = x - alpha*gradient_2(x)
    x = x - alpha*gradient_3(x)
    x.All[i] = x
    print(x)
  }
  # print result and plot all xs for every iteration
  print(paste("The minimum of f(x) is ", objFun(x), " at position x = ", x, sep = ""))
  plot(x.All, type = "l")
}
Algorithm pseudo-code: Find pseudo-code here
Eventually, I would like to test this algorithm on standard optimization test functions, for example the three-hump camel function (sketched after the link below):
https://en.wikipedia.org/wiki/Test_functions_for_optimization
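For reference, a minimal R sketch of that test function and its gradient (using the standard formula f(x, y) = 2x^2 - 1.05x^4 + x^6/6 + x*y + y^2 from the page above; the function names are mine):

# Three-hump camel function of a length-2 vector v = c(x, y)
camel3 <- function(v) {
  x <- v[1]; y <- v[2]
  2*x^2 - 1.05*x^4 + x^6/6 + x*y + y^2
}
# Its gradient, derived by hand from the formula above
camel3_grad <- function(v) {
  x <- v[1]; y <- v[2]
  c(4*x - 4.2*x^3 + x^5 + y,   # partial derivative w.r.t. x
    x + 2*y)                   # partial derivative w.r.t. y
}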
Other examples:
Answer (score 2):
There seems to be quite a lot tripping you up here. In order of importance, these are the two errors I have found so far:

1. Your gradients are wrong: the derivative of sqrt(a + x) is 1/(2*sqrt(a + x)), not (1/2)*sqrt(a + x).
2. You have to guard against the propagation of NaN (a step can push x below -1, where the square roots are undefined), otherwise you will run into problems.

Here is a gradient descent implementation that solves your problem (I have added code comments at the important changes):
# Having the number of iterations, step size, and start value be parameters the
# user can alter (with sane default values) I think is a better approach than
# hard coding them in the body of the function
grad <- function(iter = 30, alpha = 0.1, x_init = 1){
  # define the objective function f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
  objFun = function(x) return(sqrt(2+x)+sqrt(1+x)+sqrt(3+x))
  # define the gradient of f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
  # Note we don't split up the gradient here
  gradient <- function(x) {
    result <- 1 / (2 * sqrt(2 + x))
    result <- result + 1 / (2 * sqrt(1 + x))
    result <- result + 1 / (2 * sqrt(3 + x))
    return(result)
  }
  x <- x_init
  # create a vector to contain all xs for all steps
  x.All = numeric(iter)
  # gradient descent method to find the minimum
  for(i in seq_len(iter)){
    # Guard against NaNs
    tmp <- x - alpha * gradient(x)
    if ( !is.nan(suppressWarnings(objFun(tmp))) ) {
      x <- tmp
    }
    x.All[i] = x
    print(x)
  }
  # print result and plot all xs for every iteration
  print(paste("The minimum of f(x) is ", objFun(x), " at position x = ", x, sep = ""))
  plot(x.All, type = "l")
}
As I said before, we know the analytical solution of this minimization problem is x = -1: each square-root term is increasing in x, so the minimum sits at the left edge of the domain. So let's see how it does:
grad()
[1] 0.9107771
[1] 0.8200156
[1] 0.7275966
...
[1] -0.9424109
[1] -0.9424109
[1] "The minimum of f(x) is 2.70279857718352 at position x = -0.942410938107257"