Question

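I've been working on a textbook problem that asks me to determine the 95% confidence interval for a given x. The book comes with an R manual handout, but it tells me to attach() the data frame. I understand that you're not supposed to use attach() (see: http://www.r-bloggers.com/to-attach-or-not-attach-that-is-the-question/). So I've been writing variable names directly as DataFrame$Variable, and that worked fine until I started using predict(). If I follow the textbook's R instruction manual, this is what happens: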

> attach(TextPrices)
> new.data <- data.frame(Pages=450)
> TextPrices.lm1 <- lm(Price ~ Pages)
> predict(TextPrices.lm1, new.data, int="confidence")
       fit      lwr      upr
1 62.87549 51.73074 74.02024
> predict(TextPrices.lm1, new.data, int="prediction")
       fit       lwr      upr
1 62.87549 0.9035981 124.8474

Which is perfect. It matches the same kind of problem I found on Google (http://www.r-tutor.com/elementary-statistics/simple-linear-regression/confidence-interval-linear-regression). However, doing it with DataFrame$Variable messes everything up, and I don't know why.

> TextPrices.lm1 <- lm(TextPrices$Price ~ TextPrices$Pages)
> new.data <- data.frame(TextPrices$Pages = 450)
Error: unexpected '=' in "new.data <- data.frame(TextPrices$Pages ="
> new.data <- data.frame(Pages = 450)
> predict(TextPrices.lm1, new.data, interval="confidence")

The code above gives me 30 rows of fit, lwr, and upr, along with this warning message:

Warning message:
'newdata' had 1 row but variables found have 30 rows

I'm fairly sure the problem is in how I'm entering the code, but I don't know what the warning means.

Answer 1

Since your data frame is apparently confidential, we can start by constructing one as follows:

text_prices <- data.frame(pages = round(runif(30, 100, 600), 0), 
                          price = round(runif(30, 10, 120), 2))

Next, we try to build the model the way you did:

text_prices.lm1 <- lm(text_prices$price ~ text_prices$pages)
new_data <- data.frame(pages = 450)
predict(text_prices.lm1, new_data, interval = "confidence")
#         fit      lwr       upr
# 1  81.56752 58.11610 105.01894
# 2  75.35715 61.54237  89.17193
# 3  72.56597 58.21001  86.92194
# .
# .
# .
# 29 79.96259 59.83313 100.09205
# 30 74.76402 61.16544  88.36261
# Warning message:
# 'newdata' had 1 row but variables found have 30 rows

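Same warning. Given that it worked when the data was attached but doesn't now, the problem probably comes from how we are passing the data to lm. Let's try it a different way: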

text_prices.lm1 <- lm(price ~ pages, data = text_prices)
new_data <- data.frame(pages = 450)
predict(text_prices.lm1, new_data, interval = "confidence")
#        fit      lwr      upr
# 1 78.46233 61.06646 95.85821

I'm not entirely sure why this fixes it, but this is how you do it without attach()ing the data.
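
A likely explanation, offered as a sketch rather than a definitive account: predict() looks up the model's predictor terms by name in newdata. When the formula is written with $, the term is literally named text_prices$pages, no column of new_data carries that name, and R falls back to evaluating the expression against the original 30-row data frame. The seed value and the names fit_dollar and fit_data below are illustrative assumptions, not part of the original post.

```r
# Illustrative sketch; the seed and the fit_* names are assumptions.
set.seed(42)
text_prices <- data.frame(pages = round(runif(30, 100, 600), 0),
                          price = round(runif(30, 10, 120), 2))

# Fitted with $ in the formula: the predictor term is "text_prices$pages".
fit_dollar <- lm(text_prices$price ~ text_prices$pages)
# Fitted with bare column names plus data =: the predictor term is "pages".
fit_data <- lm(price ~ pages, data = text_prices)

labels_dollar <- attr(terms(fit_dollar), "term.labels")
labels_data <- attr(terms(fit_data), "term.labels")
print(labels_dollar)
print(labels_data)

new_data <- data.frame(pages = 450)

# new_data has no column matching "text_prices$pages", so the expression is
# evaluated against the original data frame and new_data goes unused.
pred_dollar <- suppressWarnings(predict(fit_dollar, new_data))
same_as_fitted <- isTRUE(all.equal(unname(pred_dollar),
                                   unname(fitted(fit_dollar))))
print(same_as_fitted)

# Here "pages" is found in new_data, so a single interval comes back.
pred_data <- predict(fit_data, new_data, interval = "confidence")
print(nrow(pred_data))
```

If this reading is right, the 30 rows returned by the $ version are just the fitted values for the original data, which is why the warning complains that newdata had 1 row but the variables found have 30.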

Why does predict() require me to attach()?
