I cannot estimate the constant (intercept) of a multiple linear regression model with stochastic gradient descent, i.e. mini-batch gradient descent with a batch size of 1. The R function I use is:
StochasticGradientDescent <- function(data, alpha, iteration, epsilon) {
  data <- matrix(unlist(data), ncol = ncol(data), byrow = FALSE)
  # all columns except the last are predictors (note the parentheses:
  # 1:ncol(data)-1 would evaluate as (1:ncol(data)) - 1)
  independent.variable <- data[, 1:(ncol(data) - 1), drop = FALSE]
  dependent.variable <- data[, ncol(data)]
  # add a column of 1s for the constant
  independent.variable <- cbind(theta0 = 1, independent.variable)
  theta_new <- matrix(0, ncol = ncol(independent.variable))
  theta_old <- matrix(1, ncol = ncol(independent.variable))
  # cost function: half the mean squared error over the full sample
  CostFunction <- function(independent.variable, dependent.variable, theta) {
    1 / (2 * NROW(dependent.variable)) *
      sum(((independent.variable %*% t(theta)) - dependent.variable)^2)
  }
  thetas <- vector(mode = "list", length = iteration)
  thetas[[1]] <- theta_new
  J <- numeric(length = iteration)
  J[1] <- CostFunction(independent.variable, dependent.variable, theta_new)
  # stochastic gradient: gradient of the squared error at one random observation
  derivative <- function(independent.variable, dependent.variable, theta) {
    idx <- sample.int(NROW(independent.variable), 1)
    descent <- t(independent.variable[idx, , drop = FALSE]) %*%
      ((independent.variable[idx, , drop = FALSE] %*% t(theta)) - dependent.variable[idx])
    return(t(descent))
  }
  # stopping criterion: parameter change below epsilon or iteration budget reached
  step <- 1
  while (any(abs(theta_new - theta_old) > epsilon) && step <= iteration) {
    step <- step + 1
    # gradient descent update
    theta_old <- theta_new
    theta_new <- theta_old - alpha *
      derivative(independent.variable, dependent.variable, theta_old)
    # record keeping
    thetas[[step]] <- theta_new
    J[step] <- CostFunction(independent.variable, dependent.variable, theta_new)
  }
  costs <- data.frame(costs = J)
  theta <- data.frame(do.call(rbind, thetas), row.names = NULL)
  return(list(costs = costs, theta = theta))
}
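For reference, the update that derivative is meant to implement is, as far as I understand it, the standard single-sample SGD step for the half mean-squared-error cost used in CostFunction:

$$J(\theta) = \frac{1}{2n}\sum_{j=1}^{n}\bigl(x_j^{\top}\theta - y_j\bigr)^2, \qquad \theta^{(t+1)} = \theta^{(t)} - \alpha\,\bigl(x_i^{\top}\theta^{(t)} - y_i\bigr)\,x_i,$$

where $i$ is drawn uniformly at random at each step and $x_i$ is the $i$-th row of the design matrix, including the leading 1 for the intercept.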
I simulate artificial data:
x1 <- runif(1000000,1,100);
x2 <- runif(1000000,1,200);
y <- 5+4*x1+3*x2;
The QR decomposition used by lm gives this result (y is generated without noise, so the fit is essentially exact):
fit <- lm(y ~ x1+x2);
summary(fit)
#
#Call:
# lm(formula = y ~ x1 + x2)
#
#Residuals:
# Min 1Q Median 3Q Max
#-7.386e-09 0.000e+00 0.000e+00 0.000e+00 9.484e-10
#
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 5.000e+00 2.162e-14 2.313e+14 <2e-16 ***
# x1 4.000e+00 2.821e-16 1.418e+16 <2e-16 ***
# x2 3.000e+00 1.403e-16 2.138e+16 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 8.062e-12 on 999997 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: 1
#F-statistic: 3.292e+32 on 2 and 999997 DF, p-value: < 2.2e-16
My initial values are 0. The learning rate is set to 0.00005, the number of iterations to 5000, and the stopping criterion epsilon (a small user-defined value) to 0.000001: the algorithm stops if the difference in the trained parameters between two iterations is smaller than this value. The results I get are:
data <- data.frame(cbind(x1, x2, y))
results <- StochasticGradientDescent( data = data, alpha = 0.00005, iteration = 5000, epsilon = .000001)
results$theta[ nrow(results$theta), ]
# theta0 V2 V3
#5001 0.2219142 4.04408 2.999861
As you can see, the slope estimates are very close to the true values. However, the estimate for theta0 (the intercept/constant) is not even close. Moreover, I only obtain these values at the end of the iteration budget, which is bad: the algorithm does not converge efficiently. I have tried, but I really cannot figure out why this happens. Could somebody help me?
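In case it helps with diagnosing this, the traces recorded by the function can be inspected as follows (a minimal sketch using only the returned data frames and base R plotting; the first column of results$theta holds the intercept):

# cost trace over the recorded iterations
plot(results$costs$costs, type = "l", xlab = "iteration", ylab = "cost J")
# path of the intercept estimate (first column of results$theta)
plot(results$theta[, 1], type = "l", xlab = "iteration", ylab = "theta0")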