I cannot estimate the constant (intercept) of a multiple linear regression model with stochastic gradient descent, i.e. mini-batch gradient descent with a batch size of 1. The R function I use is:
StochasticGradientDescent <- function(data, alpha, iteration, epsilon) {
  data <- matrix(unlist(data), ncol = ncol(data), byrow = FALSE)
  # all columns except the last are predictors (note the parentheses:
  # 1:ncol(data)-1 would evaluate as (1:ncol(data)) - 1)
  independent.variable <- data[, 1:(ncol(data) - 1), drop = FALSE]
  dependent.variable <- data[, ncol(data)]
  # add a column of 1s for the constant
  independent.variable <- cbind(theta0 = 1, independent.variable)
  theta_new <- matrix(0, ncol = ncol(independent.variable))
  theta_old <- matrix(1, ncol = ncol(independent.variable))
  # cost function: half the mean squared error over the full sample
  CostFunction <- function(independent.variable, dependent.variable, theta) {
    1 / (2 * NROW(dependent.variable)) *
      sum(((independent.variable %*% t(theta)) - dependent.variable)^2)
  }
  thetas <- vector(mode = "list", length = iteration)
  thetas[[1]] <- theta_new
  J <- numeric(length = iteration)
  J[1] <- CostFunction(independent.variable, dependent.variable, theta_new)
  # stochastic gradient: gradient of the squared error at one random observation
  derivative <- function(independent.variable, dependent.variable, theta) {
    idx <- sample.int(NROW(independent.variable), 1)
    descent <- t(independent.variable[idx, , drop = FALSE]) %*%
      ((independent.variable[idx, , drop = FALSE] %*% t(theta)) - dependent.variable[idx])
    return(t(descent))
  }
  # stopping criterion: parameter change below epsilon or iteration budget reached
  step <- 1
  while (any(abs(theta_new - theta_old) > epsilon) && step <= iteration) {
    step <- step + 1
    # gradient descent update
    theta_old <- theta_new
    theta_new <- theta_old - alpha *
      derivative(independent.variable, dependent.variable, theta_old)
    # record keeping
    thetas[[step]] <- theta_new
    J[step] <- CostFunction(independent.variable, dependent.variable, theta_new)
  }
  costs <- data.frame(costs = J)
  theta <- data.frame(do.call(rbind, thetas), row.names = NULL)
  return(list(costs = costs, theta = theta))
}
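For reference, the update that derivative is meant to implement is, as far as I understand it, the standard single-sample SGD step for the half mean-squared-error cost used in CostFunction:

$$J(\theta) = \frac{1}{2n}\sum_{j=1}^{n}\bigl(x_j^{\top}\theta - y_j\bigr)^2, \qquad \theta^{(t+1)} = \theta^{(t)} - \alpha\,\bigl(x_i^{\top}\theta^{(t)} - y_i\bigr)\,x_i,$$

where $i$ is drawn uniformly at random at each step and $x_i$ is the $i$-th row of the design matrix, including the leading 1 for the intercept.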
I simulate artificial data:
x1 <- runif(1000000,1,100);
x2 <- runif(1000000,1,200);
y <- 5+4*x1+3*x2;
The QR decomposition used by lm gives this result (y is generated without noise, so the fit is essentially exact):
fit <- lm(y ~ x1+x2);
summary(fit)
#
#Call:
# lm(formula = y ~ x1 + x2)
#
#Residuals:
# Min 1Q Median 3Q Max
#-7.386e-09 0.000e+00 0.000e+00 0.000e+00 9.484e-10
#
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 5.000e+00 2.162e-14 2.313e+14 <2e-16 ***
# x1 4.000e+00 2.821e-16 1.418e+16 <2e-16 ***
# x2 3.000e+00 1.403e-16 2.138e+16 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 8.062e-12 on 999997 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: 1
#F-statistic: 3.292e+32 on 2 and 999997 DF, p-value: < 2.2e-16
My initial values are 0. The learning rate is set to 0.00005, the number of iterations to 5000, and the stopping criterion epsilon (a small user-defined value) to 0.000001: the algorithm stops if the difference in the trained parameters between two iterations is smaller than this value. The results I get are:
data <- data.frame(cbind(x1, x2, y))
results <- StochasticGradientDescent( data = data, alpha = 0.00005, iteration = 5000, epsilon = .000001)
results$theta[ nrow(results$theta), ]
# theta0 V2 V3
#5001 0.2219142 4.04408 2.999861
As you can see, the slope estimates are very close to the true values. However, the estimate for theta0 (the intercept/constant) is not even close. Moreover, I only obtain these values at the end of the iteration budget, which is bad: the algorithm does not converge efficiently. I have tried, but I really cannot figure out why this happens. Could somebody help me?
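In case it helps with diagnosing this, the traces recorded by the function can be inspected as follows (a minimal sketch using only the returned data frames and base R plotting; the first column of results$theta holds the intercept):

# cost trace over the recorded iterations
plot(results$costs$costs, type = "l", xlab = "iteration", ylab = "cost J")
# path of the intercept estimate (first column of results$theta)
plot(results$theta[, 1], type = "l", xlab = "iteration", ylab = "theta0")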