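How to dynamically regress and predict multiple items with R?

Time: 2011-08-06 16:23:54

Tags: r

I am trying to write a function that regresses on multiple predictors and then predicts new data from the fitted model: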

"tnt" <- function(train_dep, train_indep, test_dep, test_indep) 
{
    y <- train_dep
    x <- train_indep
    mod <- lm (y ~ x)
    estimate <- predict(mod, data.frame(x=test_indep))
    rmse <- sqrt(sum((test_dep-estimate)^2)/length(test_dep)) 
    print(summary(mod))
    print(paste("RMSE: ", rmse))        
}

It fails when I pass it the following:

train_dep = vector1
train_indep <- cbind(vector2, vector3)
test_dep = vector4
test_indep <- cbind(vector5, vector6)
tnt(train_dep, train_indep, test_dep, test_indep)
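One plausible explanation for the failure, sketched with made-up data: when `x` is a matrix, `data.frame(x = ...)` expands it into columns such as `x.a` and `x.b`, so the new data contains no variable called `x`, and `predict()` falls back to the training matrix. The column names `a` and `b` and the simulated values are assumptions for illustration only.

```r
set.seed(1)
x <- cbind(a = rnorm(10), b = rnorm(10))    # training predictors as a matrix
y <- 1 + x[, "a"] - x[, "b"] + rnorm(10, sd = 0.1)
mod <- lm(y ~ x)

new_x <- cbind(a = rnorm(4), b = rnorm(4))  # four hypothetical test rows
nd <- data.frame(x = new_x)
names(nd)  # the matrix is split into columns x.a and x.b; none is named x

# With no column named x in nd, predict() looks x up in the formula's
# environment and finds the training matrix, so the predictions cover the
# 10 training rows instead of the 4 test rows (R also emits a warning).
est <- suppressWarnings(predict(mod, nd))
length(est)
```

If the training and test sets happen to have the same number of rows, the mismatch goes unnoticed and the RMSE is computed from the wrong predictions.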

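Changing the above to something like the following works, but I would like this to be done dynamically so that any number of columns can be passed in: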

x1 = x[,1]
x2 = x[,2]
mod <- lm(y ~ x1+x2)
estimate <- predict(mod, data.frame(x1=test_indep[,1], x2=test_indep[,2]))

It looks like this might help, but I am still confused about the rest of the process: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/70843.html

2 answers:

Answer 0: (score: 2)

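Edited to use the as.formula suggestion from the comments. Roman's comment above, about passing everything as a single data.frame and using the . notation in the formula, is probably the best solution, but I implemented it with paste because you should know how to use paste and as.formula :-).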
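A minimal sketch of the paste and as.formula idea described above. This is not the answerer's original code: the function name `tnt_formula`, the generic predictor names `x1`, `x2`, ... and the simulated data are all assumptions made for illustration.

```r
# Hypothetical helper: builds "y ~ x1 + x2 + ..." for any number of columns.
tnt_formula <- function(train_dep, train_indep, test_dep, test_indep) {
  train_x <- as.data.frame(train_indep)
  test_x  <- as.data.frame(test_indep)

  # Give the training and test predictors identical generic names
  preds <- paste0("x", seq_len(ncol(train_x)))
  names(train_x) <- preds
  names(test_x)  <- preds

  # Paste the right-hand side together, then turn the string into a formula
  fml <- as.formula(paste("y ~", paste(preds, collapse = " + ")))

  mod <- lm(fml, data = data.frame(y = train_dep, train_x))
  estimate <- predict(mod, newdata = test_x)
  rmse <- sqrt(mean((test_dep - estimate)^2))
  list(model = mod, formula = fml, rmse = rmse)
}

# Usage with simulated data (values are made up)
set.seed(1)
train_x <- cbind(rnorm(20), rnorm(20))
train_y <- 1 + 2 * train_x[, 1] - train_x[, 2] + rnorm(20, sd = 0.1)
test_x  <- cbind(rnorm(5), rnorm(5))
test_y  <- 1 + 2 * test_x[, 1] - test_x[, 2] + rnorm(5, sd = 0.1)

res <- tnt_formula(train_y, train_x, test_y, test_x)
res$formula
res$rmse
```

Returning a list instead of printing is a design choice of this sketch; it makes the fitted model and the RMSE available to the caller.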

Answer 1: (score: 2)

Try this instead:

tnt <- function(train_dep, train_indep, test_dep, test_indep)
{
    # Combine the response and all predictors into one data frame
    dat <- as.data.frame(cbind(y = train_dep, train_indep))
    # "." stands for every column of dat other than y
    mod <- lm(y ~ ., data = dat)
    # The test predictors must carry the same column names as the training ones
    newdat <- as.data.frame(test_indep)
    names(newdat) <- names(dat)[-1]

    estimate <- predict(mod, newdata = newdat)
    rmse <- sqrt(sum((test_dep - estimate)^2) / length(test_dep))
    print(summary(mod))
    print(paste("RMSE: ", rmse))
}


Call:
lm(formula = y ~ ., data = dat)

Residuals:
1 2 3 
0 0 0 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0          0      NA       NA    
V2                 1          0     Inf   <2e-16 ***
V3                NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0 on 1 degrees of freedom
Multiple R-squared:     1,  Adjusted R-squared:     1 
F-statistic:   Inf on 1 and 1 DF,  p-value: < 2.2e-16 

[1] "RMSE:  0"
Warning message:
In predict.lm(mod, newdata = newdat) :
  prediction from a rank-deficient fit may be misleading
> 

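The warning arises because the data supplied give an exact fit (note the NA coefficient for V3, which lm reports as "not defined because of singularities").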
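To illustrate where such a warning can come from, here is a small sketch using hypothetical data in which the second predictor is an exact multiple of the first. The numbers are invented for illustration and are not claimed to be the data behind the output above.

```r
# Hypothetical collinear predictors: column 2 is exactly 2 * column 1
x_collinear <- cbind(1:3, 2 * (1:3))
dat <- as.data.frame(cbind(y = 1:3, x_collinear))  # columns y, V2, V3
fit_collinear <- lm(y ~ ., data = dat)
coef(fit_collinear)   # the coefficient for V3 is NA
fit_collinear$rank    # rank 2, although the model has 3 coefficients

# With predictors that are not linearly dependent, every coefficient is estimated
set.seed(42)
x_ok <- cbind(rnorm(10), rnorm(10))
y_ok <- 1 + x_ok[, 1] + rnorm(10)
fit_ok <- lm(y ~ ., data = as.data.frame(cbind(y = y_ok, x_ok)))
coef(fit_ok)
```

Calling `predict()` on a fit like `fit_collinear` with new data is what triggers the "rank-deficient fit" warning; a fit like `fit_ok` does not.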