R programming - not removing the right column

Time: 2018-02-22 09:10:05

Tags: r variables linear-regression

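I am posting my code to ask for help. I am following an online course on R and trying to automate backward elimination for a multiple regression. I tried to trace what happens: at first it works, but when it gets to the last two variables it goes into a loop and does not eliminate them, even though it enters the if. In the end I get this error: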

Error in if (maxVar > sl) { : missing value where TRUE/FALSE needed

Here is the code:

backwardElimination <-function(training,sl) {
  numVar=length(training)
  funzRegressor = lm(formula = profit ~.,
               data = training)
  p = summary(funzRegressor)$coefficients[,4]
  maxVar = max(p)
  if (maxVar > sl){
    for (j in c(1:numVar)){
      if (maxVar == p[j]) {
        training = training[, -j]
        backwardElimination(training,sl)
      }
    }
  }
  return(summary(funzRegressor))
}

Thanks in advance.

Edit: here is the rest of my code

#importing dataset
dataset = read.csv('50_Startups.csv')


# Encoding categorical data
dataset$State = factor(dataset$State,
                         levels = c('New York', 'California', 'Florida'),
                         labels = c(1, 2, 3))

#splitting in train / test set 
library(caTools)
set.seed(123)
split = sample.split(dataset$Profit, SplitRatio = 4/5)
trainingSet = subset(dataset, split == TRUE)
testSet = subset(dataset, split == FALSE)
#Transforming state in dummy variables
trainingSet$State = factor(trainingSet$State)
dummies = model.matrix(~trainingSet$State)
trainingSet = cbind(trainingSet,dummies)
profit = trainingSet$Profit
trainingSet = trainingSet[, -4]
trainingSet = trainingSet[, -4]
trainingSet = cbind(trainingSet,profit)
#calling the function
SL = 0.05
backwardElimination(trainingSet, SL)

3 answers:

Answer 0 (score: 0)

This error means the condition of your if statement is NA rather than a boolean value.

if (NA) {}
## Error in if (NA) { : missing value where TRUE/FALSE needed

Either your p contains NA, or sl is NA.
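As a minimal sketch of how a missing value can reach the if, using a made-up p-value vector (the numbers below are hypothetical and not taken from the asker's data): max() propagates missing values, so the comparison with sl yields NA. The guarded variant at the end is one possible way to avoid the error, not necessarily the right fix for the model itself.

```r
# Hypothetical p-values; the NaN stands in for a p-value that could not be computed
p <- c(x1 = 0.30, x2 = NaN, x3 = 0.01)
sl <- 0.05

maxVar <- max(p)      # missing values propagate through max()
cmp <- maxVar > sl    # NA, which if() cannot use as a condition

# Guarded version: ignore missing p-values and test the result explicitly
maxVarSafe <- max(p, na.rm = TRUE)
shouldDrop <- !is.na(maxVarSafe) && maxVarSafe > sl
```

With the guard, shouldDrop is always TRUE or FALSE, so it is safe to pass to if().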

Answer 1 (score: 0)

Your intercept is also being fed back into the model at the next step; you need to remove it before moving on to the next iteration.
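This answer can be read in two ways, and the sketch below illustrates both under stated assumptions. The built-in mtcars dataset stands in for the asker's data, and the small state factor is made up. First, the p-value vector taken from summary() starts with "(Intercept)", so position j in p does not correspond to column j of the data frame. Second, model.matrix() adds an "(Intercept)" column of ones, which the question's code binds back into the training set.

```r
# Built-in mtcars stands in for the asker's data (an assumption for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)
p <- summary(fit)$coefficients[, 4]
names(p)   # "(Intercept)" comes first, ahead of the predictors

# Drop the intercept so the remaining entries describe predictors only
pPredictors <- p[names(p) != "(Intercept)"]

# model.matrix() also adds an "(Intercept)" column of ones; remove it before cbind()
state <- factor(c("New York", "California", "Florida", "New York"))
dummies <- model.matrix(~ state)
dummiesNoIntercept <- dummies[, colnames(dummies) != "(Intercept)", drop = FALSE]
```

Selecting by name rather than by position avoids the off-by-one shift between the coefficient table and the data frame columns.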

Answer 2 (score: 0)

I can reproduce your error using the R built-in dataset state.x77:
dataset <- as.data.frame(state.x77)
dataset$State <- rownames(dataset) 
dataset$profit <- rnorm(nrow(dataset))

backwardElimination <-function(training,sl) {
    if (!"profit" %in% names(training)) return(NULL)

    numVar=length(training)
    funzRegressor = lm(formula = profit ~.,
        data = training)
    p = summary(funzRegressor)$coefficients[,4]
    maxVar = max(p)
    #print(funzRegressor)

    if (maxVar > sl){
        for (j in c(1:numVar)){
            if (maxVar == p[j]) {
                training = training[, -j]
                backwardElimination(training,sl)
            }
        }
    }
    return(summary(funzRegressor))
}
backwardElimination(dataset, 0.05)

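Some of your betas are NA, and all the p-values become NaN. Do you need to regress on State? If not, you can drop the State column to get rid of the error.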
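A short check of this diagnosis on the same reproduction, as a sketch: with State holding one distinct value per row, the model has at least as many parameters as observations, so no residual degrees of freedom remain and the p-values cannot be computed. The set.seed() call is an addition here so that the random profit column is repeatable.

```r
dataset <- as.data.frame(state.x77)
dataset$State <- rownames(dataset)    # one distinct level per row
set.seed(1)                           # added for repeatability
dataset$profit <- rnorm(nrow(dataset))

fit <- lm(profit ~ ., data = dataset)
dfResidual <- fit$df.residual         # no residual degrees of freedom left
# summary() may warn about NaNs here; the warning is silenced for the demo
p <- suppressWarnings(summary(fit)$coefficients[, 4])
allMissing <- all(is.na(p))           # is.na() is TRUE for NaN as well

# Dropping State leaves 8 numeric predictors for 50 rows
dataset$State <- NULL
fitNoState <- lm(profit ~ ., data = dataset)
pNoState <- summary(fitNoState)$coefficients[, 4]
```

After State is removed, pNoState contains ordinary p-values, so max(pNoState) can be compared with the significance level.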

Another error shows up when you reach the boundary case in the recursion, which you can fix :)
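For reference, one possible way to handle the boundary case is sketched below. This is not the asker's or the answerer's code: it uses a loop instead of recursion, removes predictors by name instead of by position, and stops when no predictors are left. The function name backwardEliminationSketch and the response argument are made up for this example, mtcars stands in for the real data, and the sketch assumes all predictors are numeric so that coefficient names match column names.

```r
# Sketch: backward elimination by p-value, assuming numeric predictors only.
# `response` is a hypothetical argument naming the outcome column.
backwardEliminationSketch <- function(training, sl, response = "profit") {
  repeat {
    predictors <- setdiff(names(training), response)
    # Boundary case: nothing left to eliminate
    if (length(predictors) == 0) return(NULL)

    fit <- lm(as.formula(paste(response, "~ .")), data = training)
    coefs <- summary(fit)$coefficients
    p <- setNames(coefs[, 4], rownames(coefs))
    p <- p[names(p) != "(Intercept)"]   # never try to drop the intercept
    p <- p[!is.na(p)]                   # skip p-values that could not be computed

    # Stop once every remaining predictor is significant at level sl
    if (length(p) == 0 || max(p) <= sl) return(summary(fit))

    worst <- names(p)[which.max(p)]
    # Guard against coefficient names that are not column names (e.g. factor levels)
    if (!worst %in% predictors) stop("cannot map coefficient to a column: ", worst)
    training <- training[, names(training) != worst, drop = FALSE]
  }
}

result <- backwardEliminationSketch(
  mtcars[, c("mpg", "wt", "hp", "qsec", "drat")], 0.05, response = "mpg"
)
```

Using drop = FALSE keeps training a data frame even when only one column remains, which is where subsetting by negative index in the original function turns the data into a plain vector.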