Question

Can anyone tell me the difference between these two lm() calls and their output?

lm(cars$dist ~ cars$speed)
lm(dist ~ speed, data = cars)

Answer 1

There is no real difference between the fitted models:

> data(cars)
> m1 <- lm(cars$dist ~ cars$speed)
> m2 <- lm(dist ~ speed, data = cars)
> all.equal(m1, m2)
[1] "Component “coefficients”: Names: 1 string mismatch"                                                  
[2] "Component “effects”: Names: 1 string mismatch"                                                       
[3] "Component “qr”: Component “qr”: Attributes: < Component “dimnames”: Component 2: 1 string mismatch >"
[4] "Component “call”: target, current do not match when deparsed"                                        
[5] "Component “terms”: formulas differ in contents"                                                      
[6] "Component “model”: Names: 2 string mismatches"                                                       
[7] "Component “model”: Attributes: < Component “terms”: formulas differ in contents >"

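These differences all come down to the names derived for the variables in the model (cars$speed in the first fit versus speed in the second).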
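To see that only the labels differ, one can compare the coefficient names and then the values with the names stripped. This is a small sketch, not part of the original answer; it uses only the built-in cars data and base R functions.

```r
data(cars)
m1 <- lm(cars$dist ~ cars$speed)
m2 <- lm(dist ~ speed, data = cars)

# The term labels are taken verbatim from the formula
names(coef(m1))  # "(Intercept)" "cars$speed"
names(coef(m2))  # "(Intercept)" "speed"

# With the names dropped, the estimates themselves agree
all.equal(unname(coef(m1)), unname(coef(m2)))  # TRUE
```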

However, the second form is far more useful. For example, predicting from the first model is a real pain:

df <- with(cars, data.frame(speed = c(30, 40)))
predict(m1, newdata = df)
predict(m2, newdata = df)

> predict(m1, newdata = df)
        1         2         3         4         5         6         7         8 
-1.849460 -1.849460  9.947766  9.947766 13.880175 17.812584 21.744993 21.744993 
        9        10        11        12        13        14        15        16 
21.744993 25.677401 25.677401 29.609810 29.609810 29.609810 29.609810 33.542219 
       17        18        19        20        21        22        23        24 
33.542219 33.542219 33.542219 37.474628 37.474628 37.474628 37.474628 41.407036 
       25        26        27        28        29        30        31        32 
41.407036 41.407036 45.339445 45.339445 49.271854 49.271854 49.271854 53.204263 
       33        34        35        36        37        38        39        40 
53.204263 53.204263 53.204263 57.136672 57.136672 57.136672 61.069080 61.069080 
       41        42        43        44        45        46        47        48 
61.069080 61.069080 61.069080 68.933898 72.866307 76.798715 76.798715 76.798715 
       49        50 
76.798715 80.731124 
Warning message:
'newdata' had 2 rows but variables found have 50 rows 
> predict(m2, newdata = df)
       1        2 
100.3932 139.7173

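The second version is the correct one: m2 returns predictions for the two new speeds, whereas m1 ignores newdata, warns, and returns the 50 fitted values. Building a data frame that predict() can actually use with m1 is not straightforward.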
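If you are stuck with a model fitted the first way, one possible workaround (not from the original answer) is to skip predict() and compute the predictions directly from the coefficients. The sketch below assumes a simple one-predictor linear model like this one; new_speeds is a hypothetical name chosen for illustration.

```r
data(cars)
m1 <- lm(cars$dist ~ cars$speed)
m2 <- lm(dist ~ speed, data = cars)

new_speeds <- c(30, 40)  # hypothetical new values to predict for

# Intercept plus slope times speed, using m1's coefficients directly
b <- unname(coef(m1))
manual <- b[1] + b[2] * new_speeds

# Same predictions from m2 through the usual predict() route
via_m2 <- unname(predict(m2, newdata = data.frame(speed = new_speeds)))

all.equal(manual, via_m2)  # TRUE
```

This only scales to trivial models, which is another reason to refit with the data argument when you can.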

Do yourself a favour and use the second form, with the data argument.
