I am trying to implement linear regression with gradient descent (batch updates) in R, using the Bike-Sharing-Dataset from the UCI Machine Learning Repository.
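For context, the batch update I am aiming for is the standard gradient step for mean squared error. A minimal sketch of a single update (one_step is just a hypothetical helper name; it assumes X already carries an intercept column):

# Sketch of one batch gradient-descent update for least squares
one_step <- function(theta, X, y, eta) {
  N <- nrow(X)                                   # number of training rows
  grad <- (2 / N) * t(X) %*% (X %*% theta - y)   # gradient of the mean squared error
  theta - eta * grad                             # step against the gradient
}

My full code is below: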
data <- read.csv("Bike-Sharing-Dataset/hour.csv")
# Select the usable features
data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")]
# Examine the data structure
str(data1)
summary(data1)
# Linear regression
# Set seed
set.seed(100)
# Split the data
trainingObs <- sample(nrow(data1), 0.70 * nrow(data1), replace = FALSE)
# Create the training dataset
trainingDS <- data1[trainingObs, ]
# Create the test dataset
testDS <- data1[-trainingObs, ]
# Create the variables
y <- trainingDS$cnt
X <- as.matrix(trainingDS[-ncol(trainingDS)])
int <- rep(1, length(y))
# Add intercept column to X
X <- cbind(int, X)
# Solve for beta
betas <- solve(t(X) %*% X) %*% t(X) %*% y
# Round the beta values
betas <- round(betas, 2)
print(betas)
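As a quick sanity check on the closed-form betas above (assuming the lm() that ships with base R is acceptable for verification only, since it is the gradient-descent part that I want to write without packages):

# Compare the normal-equation solution with R's built-in OLS fit
fit <- lm(cnt ~ ., data = trainingDS)
round(coef(fit), 2)  # should agree with the betas printed above, up to rounding

The gradient-descent implementation follows: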
gradientR <- function(y, X, epsilon, eta, iters){
  epsilon = 0.0001
  X = as.matrix(data.frame(rep(1, length(y)), X))
  N = dim(X)[1]
  print("Initialize parameters...")
  theta.init = as.matrix(rnorm(n = dim(X)[2], mean = 0, sd = 1)) # Initialize theta
  theta.init = t(theta.init)
  e = t(y) - theta.init %*% t(X)
  grad.init = -(2/N) %*% e %*% X
  theta = theta.init - eta * (1/N) * grad.init
  l2loss = c()
  for(i in 1:iters){
    l2loss = c(l2loss, sqrt(sum((t(y) - theta %*% t(X))^2)))
    e = t(y) - theta %*% t(X)
    grad = -(2/N) %*% e %*% X
    theta = theta - eta * (2/N) * grad
    if(sqrt(sum(grad^2)) <= epsilon){
      break
    }
  }
  print("Algorithm converged")
  print(paste("Final gradient norm is", sqrt(sum(grad^2))))
  values <- list("coef" = t(theta), "l2loss" = l2loss)
  return(values)
}
gradientR(y, X, eta = 100, iters = 1000)
However, when I try to run this algorithm, I get the following error:
[1] "Initialize parameters..."
Error in if (sqrt(sum(grad^2)) <= epsilon) { : missing value where TRUE/FALSE needed
I need help understanding this error and how to resolve it. Also, is there a more efficient way to implement the algorithm without using any of R's standard packages and libraries?
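For what it is worth, my current guess is that the gradient overflows to Inf and then NaN (eta = 100 seems very large for this data, and X already contains an intercept column before gradientR prepends another one), so sqrt(sum(grad^2)) becomes NaN and the comparison inside if() returns NA. A minimal guard I am considering dropping into the loop, just before the convergence test, to confirm this (purely a debugging sketch):

# Hypothetical guard: stop and report if the gradient has become NaN or Inf
if (any(!is.finite(grad))) {
  print(paste("Non-finite gradient at iteration", i, "- eta is probably too large"))
  break
}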