I'm trying to write a function that regresses on multiple terms and then predicts data from the fitted model:
"tnt" <- function(train_dep, train_indep, test_dep, test_indep)
{
y <- train_dep
x <- train_indep
mod <- lm (y ~ x)
estimate <- predict(mod, data.frame(x=test_indep))
rmse <- sqrt(sum((test_dep-estimate)^2)/length(test_dep))
print(summary(mod))
print(paste("RMSE: ", rmse))
}
If I pass it the following, it fails:
train_dep = vector1
train_indep <- cbind(vector2, vector3)
test_dep = vector4
test_indep <- cbind(vector5, vector6)
tnt(train_dep, train_indep, test_dep, test_indep)
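Changing the function body to something like the following works, but I want this done dynamically so that I can pass it any number of columns: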
x1 = x[,1]
x2 = x[,2]
mod <- lm(y ~ x1+x2)
estimate <- predict(mod, data.frame(x1=test_indep[,1], x2=test_indep[,2]))
It looks like this might help, but I'm still confused about the rest of the process: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/70843.html
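A likely reason the matrix version misbehaves can be sketched as follows. The variable names and data here are made up for illustration. When a matrix is passed to data.frame(x = ...), it is split into separate columns (x.1, x.2, ...), so the new data has no column called x. predict() then falls back to the x it finds in the environment where the model was fitted, which is the training matrix.

```r
# Made-up data, only to illustrate the column-naming behaviour
set.seed(1)
x <- cbind(rnorm(5), rnorm(5))        # training predictors as a matrix
y <- rnorm(5)
x_new <- cbind(rnorm(5), rnorm(5))    # "test" predictors, same number of rows

mod <- lm(y ~ x)

# The matrix is split into two columns, none of which is named "x"
newdat <- data.frame(x = x_new)
print(names(newdat))

# Because "x" is missing from newdat, predict() picks up the training x,
# so these "predictions" are just the fitted values of the training data
pred <- predict(mod, newdat)
print(isTRUE(all.equal(unname(pred), unname(fitted(mod)))))
```

With a different number of test rows, the same lookup would produce a warning about mismatched row counts rather than usable predictions.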
Answer 0 (score: 2)
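Edited to use as.formula, as suggested in the comments. Roman's comment above, about passing everything as a single data.frame and using the . notation in the formula, is probably the best solution, but I implemented it with paste because you should know how to use paste and as.formula :-).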
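The code for this answer is not preserved here. What follows is only a minimal sketch of how paste and as.formula could be combined for this task, not the answerer's original code. The function name tnt_formula, the generated column names x1, x2, ..., and the sample data are all hypothetical.

```r
# Sketch only: build "y ~ x1 + x2 + ..." dynamically with paste() and as.formula().
# tnt_formula and the x1, x2, ... names are hypothetical, chosen for illustration.
tnt_formula <- function(train_dep, train_indep, test_dep, test_indep) {
  # Give the predictor columns predictable names: x1, x2, ..., xk
  train <- as.data.frame(train_indep)
  names(train) <- paste0("x", seq_len(ncol(train)))
  test <- as.data.frame(test_indep)
  names(test) <- names(train)

  # Paste the column names together and convert the string to a formula
  f <- as.formula(paste("y ~", paste(names(train), collapse = " + ")))

  train$y <- train_dep
  mod <- lm(f, data = train)
  estimate <- predict(mod, newdata = test)
  rmse <- sqrt(mean((test_dep - estimate)^2))
  list(formula = f, model = mod, rmse = rmse)
}

set.seed(1)  # made-up data, only to exercise the function
x_train <- cbind(rnorm(20), rnorm(20))
y_train <- 1 + 2 * x_train[, 1] - x_train[, 2] + rnorm(20, sd = 0.1)
x_test <- cbind(rnorm(10), rnorm(10))
y_test <- 1 + 2 * x_test[, 1] - x_test[, 2] + rnorm(10, sd = 0.1)

res <- tnt_formula(y_train, x_train, y_test, x_test)
print(res$formula)
print(res$rmse)
```

Because the formula is built from however many columns train_indep has, the same function should handle any number of predictors.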
Answer 1 (score: 2)
Try this instead:
tnt <- function(train_dep, train_indep, test_dep, test_indep)
{
  # Combine response and predictors into one data frame so that "." works
  dat <- as.data.frame(cbind(y=train_dep, train_indep))
  mod <- lm(y ~ ., data=dat)
  # The test predictors must carry the same column names as the training ones
  newdat <- as.data.frame(test_indep)
  names(newdat) <- names(dat)[2:length(dat)]
  estimate <- predict(mod, newdata=newdat)
  rmse <- sqrt(sum((test_dep-estimate)^2)/length(test_dep))
  print(summary(mod))
  print(paste("RMSE: ", rmse))
}
Call:
lm(formula = y ~ ., data = dat)
Residuals:
1 2 3
0 0 0
Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        0          0      NA       NA
V2                 1          0     Inf   <2e-16 ***
V3                NA         NA      NA       NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0 on 1 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: Inf on 1 and 1 DF, p-value: < 2.2e-16
[1] "RMSE: 0"
Warning message:
In predict.lm(mod, newdata = newdat) :
prediction from a rank-deficient fit may be misleading
The warning appears because the data you supplied gives an exact fit.
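To see the difference on other data, here is a small sketch that uses the same approach but returns its results instead of printing them. The name tnt_dot and all of the data are hypothetical. As far as I can tell, the NA coefficient and the rank-deficiency warning in the output above come from predictor columns that are linearly dependent in a tiny example, so the sketch compares such a case against made-up noisy data with independent columns.

```r
# Same idea as the tnt() answer above, but returning values instead of printing.
# tnt_dot is a hypothetical name used only for this sketch.
tnt_dot <- function(train_dep, train_indep, test_dep, test_indep) {
  dat <- data.frame(y = train_dep, as.data.frame(train_indep))
  mod <- lm(y ~ ., data = dat)
  newdat <- as.data.frame(test_indep)
  names(newdat) <- names(dat)[-1]   # predictor names must match the training data
  estimate <- predict(mod, newdata = newdat)
  list(model = mod, rmse = sqrt(mean((test_dep - estimate)^2)))
}

# Collinear predictors: the second column is the first plus one,
# so lm() cannot estimate both and reports NA for one coefficient.
# suppressWarnings() hides any rank-deficiency warning from predict().
collinear <- suppressWarnings(
  tnt_dot(1:3, cbind(1:3, 2:4), 2:4, cbind(2:4, 3:5))
)
print(coef(collinear$model))

# Made-up, non-collinear, noisy data with three predictors
set.seed(42)
x_train <- matrix(rnorm(90), ncol = 3)
y_train <- drop(x_train %*% c(1, -2, 0.5)) + rnorm(30, sd = 0.2)
x_test <- matrix(rnorm(30), ncol = 3)
y_test <- drop(x_test %*% c(1, -2, 0.5)) + rnorm(10, sd = 0.2)

full <- tnt_dot(y_train, x_train, y_test, x_test)
print(coef(full$model))
print(full$rmse)
```

In the second call every coefficient is estimated, and the function works unchanged with three predictor columns instead of two.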