Linear regression gradient descent (batch update) algorithm error

Date: 2017-09-06 15:03:53

Tags: r linear-regression gradient-descent

I am trying to perform linear regression using gradient descent (batch update) in R. I have written the following code using the Bike-Sharing-Dataset from the UCI Machine Learning Repository:

data <- read.csv("Bike-Sharing-Dataset/hour.csv")

# Select the usable features
data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")]

# Examine the data structure
str(data1)

summary(data1)

# Linear regression
# Set seed
set.seed(100)

# Split the data (70% training, 30% test)
trainingObs <- sample(nrow(data1), 0.70 * nrow(data1), replace = FALSE)

# Create the training dataset
trainingDS <- data1[trainingObs, ]

# Create the test dataset
testDS <- data1[-trainingObs, ]

# Create the response vector and the predictor matrix (drop the cnt column)
y <- trainingDS$cnt
X <- as.matrix(trainingDS[-ncol(trainingDS)])

int <- rep(1, length(y))

# Add intercept column to X
X <- cbind(int, X)

# Solve for beta via the normal equation
betas <- solve(t(X) %*% X) %*% t(X) %*% y

# Round the beta values
betas <- round(betas, 2)

print(betas)
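
As a quick sanity check, the closed-form betas above can be compared against the coefficients reported by base R's lm() (used here purely for verification; this aside is optional and assumes base R functions are acceptable for checking):

# Optional check: lm() should reproduce the normal-equation betas
fit_lm <- lm(cnt ~ ., data = trainingDS)
round(coef(fit_lm), 2)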

gradientR <- function(y, X, epsilon, eta, iters){
  epsilon = 0.0001                                # tolerance on the gradient norm
  X = as.matrix(data.frame(rep(1,length(y)),X))   # prepend an intercept column of 1s
  N = dim(X)[1]                                   # number of observations
  print("Initialize parameters...")
  theta.init = as.matrix(rnorm(n=dim(X)[2], mean=0,sd = 1)) # Initialize theta
  theta.init = t(theta.init)                      # 1 x p row vector of coefficients
  e = t(y) - theta.init%*%t(X)                    # residuals (1 x N)
  grad.init = -(2/N)%*%(e)%*%X                    # gradient of the squared-error loss
  theta = theta.init - eta*(1/N)*grad.init        # first update step
  l2loss = c()
  for(i in 1:iters){
    l2loss = c(l2loss,sqrt(sum((t(y) - theta%*%t(X))^2)))  # track the L2 loss
    e = t(y) - theta%*%t(X)
    grad = -(2/N)%*%e%*%X
    theta = theta - eta*(2/N)*grad                # batch gradient update
    if(sqrt(sum(grad^2)) <= epsilon){             # stop when the gradient is small
      break
    }
  }
  print("Algorithm converged")
  print(paste("Final gradient norm is",sqrt(sum(grad^2))))
  values<-list("coef" = t(theta), "l2loss" = l2loss)
  return(values)
}
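
As a reference point, here is a small smoke test on simulated data with known coefficients (the data, step size, and iteration count below are illustrative assumptions, not taken from the dataset above); with standardized predictors and a moderate eta the function should approximately recover the true values:

# Hypothetical smoke test: simulated data with known coefficients 2, 1, -3, 0.5
set.seed(1)
Xs  <- matrix(rnorm(1000 * 3), ncol = 3)          # three standardized predictors
ys  <- 2 + Xs %*% c(1, -3, 0.5) + rnorm(1000)     # response with known betas
res <- gradientR(y = as.vector(ys), X = Xs, epsilon = 1e-4, eta = 100, iters = 1000)
t(res$coef)                                       # should be close to c(2, 1, -3, 0.5)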

gradientR(y, X, eta = 100, iters = 1000)

However, when I try to run this algorithm, I get the following error:

  

[1] "Initialize parameters..."
Error in if (sqrt(sum(grad^2)) <= epsilon) { : missing value where TRUE/FALSE needed
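
For context on the message itself: if() in R raises exactly this error when its condition evaluates to NA, and sqrt(sum(grad^2)) <= epsilon is NA precisely when grad contains NA or NaN values (typically because theta has already overflowed). A minimal illustration:

NaN <= 1e-4            # NA, neither TRUE nor FALSE
if (NaN <= 1e-4) "ok"  # Error in if (...) : missing value where TRUE/FALSE needed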

I need help understanding this error and how to resolve it. Also, is there a more efficient way to implement this algorithm without using any of R's standard packages and libraries?
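
One plausible cause (an assumption, not a verified diagnosis) is that eta = 100 on the unscaled predictors makes theta diverge until it overflows into NaN, compounded by the fact that X already contains an intercept column and gradientR() prepends a second one. A sketch of an adjusted call under those assumptions, with illustrative parameter values:

# Sketch: standardize the predictors and let gradientR() add the intercept itself
X_raw  <- scale(as.matrix(trainingDS[, -ncol(trainingDS)]))
fit_gd <- gradientR(y = y, X = X_raw, epsilon = 1e-4, eta = 100, iters = 1000)
plot(fit_gd$l2loss, type = "l")   # check whether the loss is actually decreasing

Note that coefficients fitted this way are on the standardized scale, so they will not match the normal-equation betas directly.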

0 Answers:

No answers yet.