Question

My data looks like this:

#>   group.name x y
#> 1          a 1 2
#> 2          a 2 4
#> 3          a 3 6
#> 4          b 1 4
#> 5          b 2 3
#> 6          b 3 2
#> 7          c 1 2
#> 8          c 2 5
#> 9          c 3 8

df <- data.frame(stringsAsFactors=FALSE,
   group.name = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
            x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
            y = c(2, 4, 6, 4, 3, 2, 2, 5, 8)
)

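I am trying to fit a linear model for each value of group.name, so I tried the following steps:

group_by() on group.name
nest() to create a nested data frame
map() the lm function over the nested data frame

But I get an error. Can anyone explain what I am doing wrong? Thanks.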

library(tidyverse) 
models <- df %>%
  group_by(group.name) %>%
  nest() %>% 
  map(~ lm(y ~ x, data = .))


#> Error in eval(predvars, data, env): invalid 'envir' argument of type 'character'
models
#> Error in eval(expr, envir, enclos): object 'models' not found

Answer 1

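The data passed to map() is not in the format it expects: map() iterates over the columns of the nested data frame, not over the groups. Try group_split() instead: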

library(dplyr)
library(purrr)

df %>%
  group_split(group.name, .keep = FALSE) %>%
  map(~lm(y ~ x, data = .))


#[[1]]

#Call:
#lm(formula = y ~ x, data = .)

#Coefficients:
#(Intercept)            x  
#          0            2  


#[[2]]

#Call:
#lm(formula = y ~ x, data = .)

#Coefficients:
#(Intercept)            x  
#          5           -1  


#[[3]]

#Call:
#lm(formula = y ~ x, data = .)

#Coefficients:
#(Intercept)            x  
#         -1            3
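
The reason the original pipeline fails can be checked directly. A data frame is a list of columns, so map() applied to the nested data frame visits group.name first and the list-column second, rather than one group at a time. The following is a minimal sketch, assuming current versions of dplyr, tidyr and purrr; the names nested and first_try are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(
  group.name = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 4, 3, 2, 2, 5, 8),
  stringsAsFactors = FALSE
)

nested <- df %>%
  group_by(group.name) %>%
  nest()

# map() walks over the columns of `nested`, not over its rows.
names(nested)

# The first column is the character vector of group names, which is what
# lm() receives as `data` in the failing pipeline.
first_try <- tryCatch(
  lm(y ~ x, data = nested[[1]]),
  error = function(e) e
)
inherits(first_try, "error")

# Mapping over the list-column itself gives one model per group.
models <- map(nested$data, ~ lm(y ~ x, data = .x))
length(models)
```

Mapping over nested$data is what the mutate() based answers in this thread do inside the data frame.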

Answer 2

Try the following:

df %>% group_by(group.name) %>% summarise(mod=list(lm(y~x))) ->df1
df1$mod[[1]]

#Call:
#lm(formula = y ~ x)

#Coefficients:
#(Intercept)            x  
#          0            2

Answer 3

I find it more intuitive to keep the models inside the data frame until you want to extract them.

models_df <- df %>%
  nest(data = -group.name) %>% 
  mutate(models = map(data, ~lm(y ~ x, data = .)))

This looks like:

# A tibble: 3 x 3
  group.name data             models
  <chr>      <list>           <list>
1 a          <tibble [3 × 2]> <lm>  
2 b          <tibble [3 × 2]> <lm>  
3 c          <tibble [3 × 2]> <lm>

Then, when you want to extract the models, do:

models_df %>% 
  pull(models)

which gives you a list of models:

[[1]]

Call:
lm(formula = y ~ x, data = .)

Coefficients:
(Intercept)            x  
          0            2  

[[2]]

Call:
lm(formula = y ~ x, data = .)

Coefficients:
(Intercept)            x  
          5           -1  

[[3]]

Call:
lm(formula = y ~ x, data = .)

Coefficients:
(Intercept)            x  
         -1            3
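
Keeping the models in a list-column also makes it possible to add per-group summaries next to them without leaving the data frame. The sketch below is one possible way to do that, not part of the original answer; coef_df, intercept and slope are illustrative names, and it assumes a tidyr version that accepts nest(data = ...).

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(
  group.name = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 4, 3, 2, 2, 5, 8),
  stringsAsFactors = FALSE
)

models_df <- df %>%
  nest(data = -group.name) %>%
  mutate(models = map(data, ~ lm(y ~ x, data = .x)))

# Pull one number per model into ordinary numeric columns,
# still with one row per group.
coef_df <- models_df %>%
  mutate(
    intercept = map_dbl(models, ~ coef(.x)[["(Intercept)"]]),
    slope     = map_dbl(models, ~ coef(.x)[["x"]])
  ) %>%
  select(group.name, intercept, slope)

coef_df
```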

Answer 4

Problems like this are the reason the tidyverse package broom exists.

library(dplyr)
library(broom)
df %>% 
    group_by(group.name) %>%
    do(tidy(lm(data = ., formula = y ~ x)))

df %>% 
    group_by(group.name) %>%
    do(glance(lm(data = ., formula = y ~ x)))

The first code block produces a data frame of the best-fit parameters:

# A tibble: 6 x 6
# Groups:   group.name [3]
  group.name term        estimate std.error  statistic    p.value
  <chr>      <chr>          <dbl>     <dbl>      <dbl>      <dbl>
1 a          (Intercept)       0   0.        NaN       NaN       
2 a          x                 2   0.        Inf         0.      
3 b          (Intercept)       5.  1.02e-15    4.91e15   1.30e-16
4 b          x                -1.  4.71e-16   -2.12e15   3.00e-16
5 c          (Intercept)      -1.  1.36e-15   -7.37e14   8.64e-16
6 c          x                 3.  6.28e-16    4.78e15   1.33e-16

The second code block produces a data frame of summary statistics from each fit.

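The point is that the results of every fit are formatted as a convenient data frame, not as a list, a named list, or an S3 or S4 object with some arbitrary structure. Once the model results are in data frame form, any downstream processing can use the familiar tidyverse tools. If you do a lot of this kind of thing, broom may be worth a try. (The downsides are that it adds another dependency, and that if you have already written a lot of code to parse the structure of lists of model fits, you will need to rework it.)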
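
The same tidy output can also be reached from a nested data frame instead of do(). This is a sketch under the assumption that dplyr, tidyr, purrr and broom are installed; the column names model and tidied and the object tidy_df are illustrative only. On this toy data the fits are exact, so R may warn about an essentially perfect fit.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

df <- data.frame(
  group.name = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 4, 3, 2, 2, 5, 8),
  stringsAsFactors = FALSE
)

tidy_df <- df %>%
  nest(data = -group.name) %>%
  mutate(
    model  = map(data, ~ lm(y ~ x, data = .x)),  # one lm per group
    tidied = map(model, tidy)                    # one coefficient table per model
  ) %>%
  select(group.name, tidied) %>%
  unnest(tidied)                                 # flatten to one row per term

tidy_df
```

Swapping tidy for glance in the map() call should give one row of fit statistics per group instead.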

Answer 5

In most cases, and especially in a simple case like this one, it is simpler to incorporate the grouping variable into the model.

md <- lm(y ~ x*group.name - 1, data = df) 
summary(md)

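Adding -1 removes the overall intercept, so the per-group intercepts are given by the coefficients group.namea, group.nameb, and so on. The summary output: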

lm(formula = y ~ x * group.name - 1, data = df)

Residuals:
         1          2          3          4          5          6          7          8          9 
 1.052e-15 -2.104e-15  1.052e-15  4.019e-16 -8.038e-16  4.019e-16 -2.313e-17  4.626e-17 -2.313e-17 

Coefficients:
                Estimate Std. Error    t value Pr(>|t|)    
x              2.000e+00  1.126e-15  1.776e+15   <2e-16 ***
group.namea   -1.820e-15  2.433e-15 -7.480e-01    0.509    
group.nameb    5.000e+00  2.433e-15  2.055e+15   <2e-16 ***
group.namec   -1.000e+00  2.433e-15 -4.110e+14   <2e-16 ***
x:group.nameb -3.000e+00  1.593e-15 -1.883e+15   <2e-16 ***
x:group.namec  1.000e+00  1.593e-15  6.278e+14   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.593e-15 on 3 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 1.169e+31 on 6 and 3 DF,  p-value: < 2.2e-16

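This gives all three models. The x coefficient is the slope for group a, and each x:group.name term is the difference from that slope. We have

Model for group a: y = -1.82*10^-15 + 2 * x (an intercept of effectively 0)
Model for group b: y = 5 + (2 - 3) * x = 5 - 1 * x
Model for group c: y = -1 + (2 + 1) * x = -1 + 3 * x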
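
The per-group lines can also be read off the fitted coefficients programmatically rather than by hand. A minimal sketch using only base R; the vectors intercepts and slopes are illustrative names, and the coefficient names are the ones shown in the summary output.

```r
df <- data.frame(
  group.name = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 4, 3, 2, 2, 5, 8),
  stringsAsFactors = FALSE
)

md <- lm(y ~ x * group.name - 1, data = df)
cf <- coef(md)

# With the overall intercept removed, each group has its own intercept term.
intercepts <- c(
  a = cf[["group.namea"]],
  b = cf[["group.nameb"]],
  c = cf[["group.namec"]]
)

# Group a is the reference level: its slope is the plain x coefficient,
# and the interaction terms are offsets from that slope.
slopes <- c(
  a = cf[["x"]],
  b = cf[["x"]] + cf[["x:group.nameb"]],
  c = cf[["x"]] + cf[["x:group.namec"]]
)

round(intercepts, 10)
round(slopes, 10)
```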

nest() followed by map() for linear models
