How to implement stochastic gradient descent

Time: 2019-04-27 12:11:37

Tags: r

In stochastic gradient descent, we typically treat the objective function as a sum of a finite number of functions:

             f(x) = ∑ f_i(x),   i = 1, …, n

At each iteration, instead of computing the full gradient ∇f(x), stochastic gradient descent samples an index i uniformly at random and computes only ∇f_i(x).

The insight is that stochastic gradient descent uses ∇f_i(x) as an unbiased estimator of ∇f(x) (strictly, with uniform sampling E[∇f_i(x)] = ∇f(x)/n; the constant factor n is usually absorbed into the step size).

We then update x as x := x − η∇f_i(x), where η is the step size (learning rate).
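For my objective f(x) = sqrt(2+x) + sqrt(1+x) + sqrt(3+x), the components would be f_1(x) = sqrt(2+x), f_2(x) = sqrt(1+x), and f_3(x) = sqrt(3+x), with n = 3. In the abstract I understand the loop; a minimal sketch (grad_list, eta, and n_iter are names I made up for illustration):

sgd <- function(grad_list, x_init, eta = 0.1, n_iter = 100) {
  # grad_list: a list of functions, one gradient per component f_i
  x <- x_init
  for (k in seq_len(n_iter)) {
    i <- sample(length(grad_list), 1)   # sample i uniformly at random
    x <- x - eta * grad_list[[i]](x)    # x := x - eta * grad f_i(x)
  }
  return(x)
}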

I am finding this difficult to implement in R for my optimization problem. Here is my attempt:

stoc_grad<-function(){
  # set up a stepsize
  alpha = 0.1

  # set up a number of iteration
  iter = 30

  # define the objective function f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
  objFun = function(x) return(sqrt(2+x)+sqrt(1+x)+sqrt(3+x))

  # define the gradient of f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
  gradient_1 = function(x) return(1/2*sqrt(2+x))
  gradient_2 = function(x) return(1/2*sqrt(3+x))
  gradient_3 = function(x) return(1/2*sqrt(1+x))

  x = 1

  # create a vector to contain all xs for all steps
  x.All = numeric(iter)

  # gradient descent method to find the minimum
  for(i in seq_len(iter)){
    x = x - alpha*gradient_1(x)
    x = x - alpha*gradient_2(x)
    x = x - alpha*gradient_3(x)
    x.All[i] = x
    print(x)
  }

  # print result and plot all xs for every iteration
  print(paste("The minimum of f(x) is ", objFun(x), " at position x = ", x, sep = ""))
  plot(x.All, type = "l")  

}

Pseudo-code for the algorithm: Find pseudo-code here

Ultimately, I want to test this algorithm on standard optimization test functions, for example the three-hump camel function:

https://en.wikipedia.org/wiki/Test_functions_for_optimization
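For reference, the three-hump camel function is f(x, y) = 2x^2 − 1.05x^4 + x^6/6 + xy + y^2, with global minimum f(0, 0) = 0. A sketch of plain two-dimensional gradient descent on it (the start value and step size are arbitrary choices of mine):

# Three-hump camel function and its gradient
camel <- function(p) return(2*p[1]^2 - 1.05*p[1]^4 + p[1]^6/6 + p[1]*p[2] + p[2]^2)
camel_grad <- function(p) return(c(4*p[1] - 4.2*p[1]^3 + p[1]^5 + p[2],
                                   p[1] + 2*p[2]))

p <- c(0.5, -0.5)                 # arbitrary start value
for (k in seq_len(200)) {
  p <- p - 0.05 * camel_grad(p)   # fixed step size
}
print(p)                          # should end up near c(0, 0)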


1 Answer:

Answer 0 (score: 2):

There seem to be quite a few things tripping you up here. In order of importance, these are the two errors I have found so far:

  1. Stochastic gradient descent is used when you have a huge amount of data, because for such data evaluating the objective function over every training observation at each iteration is computationally expensive. That is not the kind of problem you are solving, although a genuinely stochastic variant is sketched after this list for comparison. Watch a short introduction here.
  2. When your parameter has restricted support, e.g. x ≥ −1 here (the domain of sqrt(1+x)), you will run into problems unless you guard against the propagation of NaNs.
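For comparison, a genuinely stochastic variant for your objective would sample one of the three component gradients uniformly at each step, keeping the same NaN guard as in the full implementation below (a sketch, not something I would recommend for a problem this small):

sgd_demo <- function(iter = 100, alpha = 0.1, x_init = 1) {
    objFun <- function(x) return(sqrt(2+x) + sqrt(1+x) + sqrt(3+x))
    # One gradient function per component f_i
    grads <- list(function(x) 1 / (2 * sqrt(2 + x)),
                  function(x) 1 / (2 * sqrt(1 + x)),
                  function(x) 1 / (2 * sqrt(3 + x)))
    x <- x_init
    for (i in seq_len(iter)) {
        g <- grads[[sample(3, 1)]]   # pick a component uniformly at random
        tmp <- x - alpha * g(x)
        # Guard against NaNs, as in the implementation below
        if ( !is.nan(suppressWarnings(objFun(tmp))) ) {
            x <- tmp
        }
    }
    return(x)
}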

Here is a gradient descent implementation that works for your problem (I have added code comments at the important changes):

# Having the number of iterations, step size, and start value be parameters the
# user can alter (with sane default values) I think is a better approach than
# hard coding them in the body of the function
grad <- function(iter = 30, alpha = 0.1, x_init = 1){

    # define the objective function f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
    objFun <- function(x) return(sqrt(2+x)+sqrt(1+x)+sqrt(3+x))

    # define the gradient of f(x) = sqrt(2+x)+sqrt(1+x)+sqrt(3+x)
    # Note we don't split up the gradient here
    gradient <- function(x) {
        result <- 1 / (2 * sqrt(2 + x))
        result <- result + 1 / (2 * sqrt(1 + x))
        result <- result + 1 / (2 * sqrt(3 + x))
        return(result)
    }

    x <- x_init

    # create a vector to contain all xs for all steps
    x.All <- numeric(iter)

    # gradient descent method to find the minimum
    for(i in seq_len(iter)){
        # Guard against NaNs
        tmp <- x - alpha * gradient(x)
        if ( !is.nan(suppressWarnings(objFun(tmp))) ) {
            x <- tmp
        }
        x.All[i] <- x
        print(x)
    }

    # print result and plot all xs for every iteration
    print(paste("The minimum of f(x) is ", objFun(x), " at position x = ", x, sep = ""))
    plot(x.All, type = "l")  

}

We know the analytic solution of this minimization problem: x = −1, since f is increasing on its domain x ≥ −1, so the minimum sits at the boundary. Let's see how this works:

grad()

[1] 0.9107771
[1] 0.8200156
[1] 0.7275966
...
[1] -0.9424109
[1] -0.9424109
[1] "The minimum of f(x) is 2.70279857718352 at position x = -0.942410938107257"

[line plot of x.All over the 30 iterations]

Note that the iterates stall at x ≈ −0.942 rather than reaching the boundary x = −1: the gradient term 1/(2*sqrt(1+x)) blows up near the boundary, so with a fixed step size any further step overshoots past −1, produces a NaN in the objective, and is rejected by the guard.