How do I predict outcomes for a new dataset using a model fitted on a different dataset in R?

Asked: 2014-04-17 19:07:40

Tags: r linear-regression predict

I may be missing something about predict, but my multiple linear regression seems to work as expected:

> bigmodel <- lm(score ~ lean + gender + age, data = mydata)
> summary(bigmodel)

Call:
lm(formula = score ~ lean + gender + age, data = mydata)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.891  -4.354   0.892   6.240  18.537 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 70.96455    3.85275  18.419   <2e-16 ***
lean         0.62463    0.05938  10.518   <2e-16 ***
genderM     -2.24025    1.40362  -1.596   0.1121    
age          0.10783    0.06052   1.782   0.0764 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9 on 195 degrees of freedom
Multiple R-squared:  0.4188,    Adjusted R-squared:  0.4098 
F-statistic: 46.83 on 3 and 195 DF,  p-value: < 2.2e-16

> head(predict(bigmodel),20)
       1        2        3        4        5        6        7        8        9       10 
75.36711 74.43743 77.02533 78.76903 79.95515 79.09251 80.38647 81.65807 80.14846 78.96234 
      11       12       13       14       15       16       17       18       19       20 
82.39052 82.04468 81.05187 81.26753 84.50240 81.80667 80.92169 82.40895 81.76197 82.94809

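But after reading ?predict.lm, I still can't get my head around prediction. This output looks fine for my original dataset, but what if I want to run the prediction against a different dataset than the one I used to create bigmodel?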

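For example, if I import a .csv file into R as a data frame called newmodel, with 200 people who each have lean, gender, and age filled in, how can I use the regression formula from bigmodel to generate predictions for newmodel?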

Thanks!

1 answer:

Answer 0: (score: 3)

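If you read the documentation for predict.lm, you will see the following. So pass the newmodel data you imported through the newdata argument to get predictions for it.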

predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
        interval = c("none", "confidence", "prediction"),
        level = 0.95, type = c("response", "terms"),
        terms = NULL, na.action = na.pass,
        pred.var = res.var/weights, weights = 1, ...)
Arguments

object  
Object of class inheriting from "lm"

newdata 
An optional data frame in which to look for variables with which to predict. 
If omitted, the fitted values are used.
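In practice, newdata needs columns with the same names as the predictors in the model formula; the response column is not required. Here is a minimal sketch of that idea. The data below are simulated stand-ins (I don't have your mydata), so the values and the needed / missing_cols helper names are assumptions for illustration only.

```r
# Simulated stand-in for the asker's data (hypothetical values, not the real mydata)
set.seed(1)
mydata <- data.frame(
  lean   = runif(50, min = 0, max = 30),
  gender = factor(rep(c("F", "M"), 25)),
  age    = sample(18:80, 50, replace = TRUE)
)
mydata$score <- 70 + 0.6 * mydata$lean + rnorm(50, sd = 9)

bigmodel <- lm(score ~ lean + gender + age, data = mydata)

# New data: same predictor names as in the formula, no score column needed
newmodel <- data.frame(
  lean   = c(5, 12, 20),
  gender = factor(c("M", "F", "M"), levels = levels(mydata$gender)),
  age    = c(25, 40, 63)
)

# Predictor names the model expects (the response is dropped from the terms)
needed <- all.vars(delete.response(terms(bigmodel)))
missing_cols <- setdiff(needed, names(newmodel))
stopifnot(length(missing_cols) == 0)

# One prediction per row of newmodel
pred <- predict(bigmodel, newdata = newmodel)
pred
```

If missing_cols is not empty, predict will fail with an "object not found" error for the first missing variable, so checking up front gives a clearer message.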

Update. Regarding your question about exporting the data together with the predictions, here is how you can do it.

# Append the predictions as a new column, then write everything to disk
predictions <- cbind(newmodel, pred = predict(bigmodel, newdata = newmodel))
write.csv(predictions, 'predictions.csv', row.names = FALSE)
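Since you mentioned importing a .csv, here is a rough sketch of the whole round trip: read the file, predict, write the result. It uses mtcars as a stand-in model and temporary files in place of your real paths, so the model, the column names (wt, hp), and the file locations are all assumptions.

```r
# Stand-in model; the asker's bigmodel and data are not available here
bigmodel <- lm(mpg ~ wt + hp, data = mtcars)

# Create a small CSV to play the role of the imported file (hypothetical path)
in_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(wt = c(2.5, 3.1, 4.0), hp = c(95, 120, 200)),
  in_path,
  row.names = FALSE
)

# Import the new data and add one prediction per row
newmodel <- read.csv(in_path)
newmodel$pred <- predict(bigmodel, newdata = newmodel)

# Export the data together with the predictions, then read it back to check
out_path <- tempfile(fileext = ".csv")
write.csv(newmodel, out_path, row.names = FALSE)
result <- read.csv(out_path)
result
```

With your own data you would replace in_path and out_path with real file names and keep your existing bigmodel.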

Update 2. A complete, fully reproducible solution

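set.seed(42)  # fix the random draws so the output is the same on every run
bigmodel <- lm(mpg ~ wt, data = mtcars)
newdata <- data.frame(wt = runif(20, min = 1.5, max = 6))

# Show the new predictor values next to their predicted mpg
cbind(
  newdata,
  mpg = predict(bigmodel, newdata = newdata)
)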
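If you also want a measure of uncertainty around each prediction, the interval argument from the signature quoted above can be combined with newdata. A small sketch on the same mtcars model follows; the wt values and the pred_int name are just illustrative choices.

```r
bigmodel <- lm(mpg ~ wt, data = mtcars)
newdata <- data.frame(wt = c(2, 3, 4))

# Returns a matrix with columns fit, lwr, upr: the point prediction
# and the bounds of a 95% prediction interval for each row of newdata
pred_int <- predict(bigmodel, newdata = newdata,
                    interval = "prediction", level = 0.95)

cbind(newdata, pred_int)
```

Using interval = "confidence" instead gives the narrower interval for the mean response rather than for an individual new observation.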