I have some data like this:
group_name | x | y
------------------
a | 1 | 2
a | 2 | 4
a | 3 | 6
b | 1 | 4
b | 2 | 3
b | 3 | 2
c | 1 | 2
c | 2 | 5
c | 3 | 8
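I want to group it by group_name and use dplyr's summarise function to create, for each group, a column containing the linear model lm(y ~ x). Is this possible? If not, what is an alternative way to fit a model for each group?
Thanks in advance.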
Answer 0 (score: 2)
Adapting the example from https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html:
library(tidyverse); library(broom)
df %>%
  nest(-group_name) %>%
  mutate(fit = map(data, ~lm(y ~ x, data = .x)),
         tidied = map(fit, tidy)) %>%
  unnest(tidied)
group_name term estimate std.error statistic p.value
1 a (Intercept) 0 0.000000e+00 NaN NaN
2 a x 2 0.000000e+00 Inf 0.000000e+00
3 b (Intercept) 5 1.017536e-15 4.913830e+15 1.295567e-16
4 b x -1 4.710277e-16 -2.123017e+15 2.998656e-16
5 c (Intercept) -1 1.356715e-15 -7.370745e+14 8.637116e-16
6 c x 3 6.280370e-16 4.776789e+15 1.332736e-16
Edit: one way to get predictions is to use augment from broom:
library(tidyverse); library(broom)
df %>%
  nest(-group_name) %>%
  mutate(fit = map(data, ~lm(y ~ x, data = .x)),
         predictions = map(fit, augment)) %>%
  unnest(predictions)
group_name y x .fitted .se.fit .resid .hat .sigma .rownames .cooksd .std.resid
1 a 2 1 2 0.000000e+00 0.000000e+00 0.8333333 NaN <NA> NA NA
2 a 4 2 4 0.000000e+00 0.000000e+00 0.3333333 NaN <NA> NA NA
3 a 6 3 6 0.000000e+00 0.000000e+00 0.8333333 NaN <NA> NA NA
4 b 4 1 4 6.080942e-16 2.719480e-16 0.8333333 NaN 4 2.50 1
5 b 3 2 3 3.845925e-16 -5.438960e-16 0.3333333 NaN 5 0.25 -1
6 b 2 3 2 6.080942e-16 2.719480e-16 0.8333333 Inf 6 2.50 1
7 c 2 1 2 8.107923e-16 -3.625973e-16 0.8333333 NaN 7 2.50 -1
8 c 5 2 5 5.127900e-16 7.251946e-16 0.3333333 NaN 8 0.25 1
9 c 8 3 8 8.107923e-16 -3.625973e-16 0.8333333 Inf 9 2.50 -1
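On the literal question of whether summarise can hold a model per group: a minimal sketch, assuming a reasonably recent dplyr and assuming that lm() picks up x and y from the current group when called inside summarise (I have not checked this across versions). Wrapping the model in list() is what allows it to be stored in a column; the names models and slopes are just illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  group_name = rep(c("a", "b", "c"), each = 3),
  x = rep(1:3, times = 3),
  y = c(2, 4, 6, 4, 3, 2, 2, 5, 8)
)

# One row per group; list() turns each fitted lm object into a list-column entry
models <- df %>%
  group_by(group_name) %>%
  summarise(fit = list(lm(y ~ x)))

# Pull a plain number back out of each stored model (here, the slope on x)
slopes <- models %>%
  mutate(slope = sapply(fit, function(m) coef(m)[["x"]]))

print(slopes)
```

The fit column can then be passed to tidy or augment in the same way as in the nest/map version above.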
Answer 1 (score: 0)
Here is one approach.
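I had to change your test data slightly: in the original, every group lies exactly on a straight line, so each model is a perfect fit and the standard errors and test statistics come out degenerate.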
# Modified test data (x values changed so the fits are no longer exact)
df <- data.frame(stringsAsFactors=FALSE,
  group.name = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  x = c(1, 2, 3.5, 1, 2.5, 3, 1, 2, 3.5),
  y = c(2, 4, 6, 4, 3, 2, 2, 5, 8)
)
library(dplyr)
# Distinct group labels to loop over
groups <- unique(df$group.name)
groups
for (i in groups){
  # Keep only the rows for the current group and fit a model to them
  df_subgroup <- filter(df, group.name==i)
  print(paste("group", i))
  model <- lm(y ~ x, data = df_subgroup)
  print(summary(model))
}
This is what you get. I used the stargazer package to make the output easier to read, but you can simply use summary(model).
#> [1] "group a"
#>
#> ===============================================
#> Dependent variable:
#> ---------------------------
#> y
#> -----------------------------------------------
#> x 1.579*
#> (0.182)
#>
#> Constant 0.579
#> (0.437)
#>
#> -----------------------------------------------
#> Observations 3
#> R2 0.987
#> Adjusted R2 0.974
#> Residual Std. Error 0.324 (df = 1)
#> F Statistic 75.000* (df = 1; 1)
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
#> [1] "group b"
#>
#> ===============================================
#> Dependent variable:
#> ---------------------------
#> y
#> -----------------------------------------------
#> x -0.923
#> (0.266)
#>
#> Constant 5.000*
#> (0.620)
#>
#> -----------------------------------------------
#> Observations 3
#> R2 0.923
#> Adjusted R2 0.846
#> Residual Std. Error 0.392 (df = 1)
#> F Statistic 12.000 (df = 1; 1)
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
#> [1] "group c"
#>
#> ===============================================
#> Dependent variable:
#> ---------------------------
#> y
#> -----------------------------------------------
#> x 2.368*
#> (0.273)
#>
#> Constant -0.132
#> (0.656)
#>
#> -----------------------------------------------
#> Observations 3
#> R2 0.987
#> Adjusted R2 0.974
#> Residual Std. Error 0.487 (df = 1)
#> F Statistic 75.000* (df = 1; 1)
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
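The loop above prints each summary but does not keep the fitted models. If you want to hold on to them, one option is a base R sketch with split and lapply, using the same modified data; the names models and coefs are only illustrative.

```r
# Modified data from this answer
df <- data.frame(
  group.name = rep(c("a", "b", "c"), each = 3),
  x = c(1, 2, 3.5, 1, 2.5, 3, 1, 2, 3.5),
  y = c(2, 4, 6, 4, 3, 2, 2, 5, 8),
  stringsAsFactors = FALSE
)

# split() gives one data frame per group; lapply() fits a model to each one
models <- lapply(split(df, df$group.name), function(d) lm(y ~ x, data = d))

# Collect the intercept and slope of every model into a matrix, one row per group
coefs <- t(sapply(models, coef))
print(round(coefs, 3))
```

The result is a named list, so models$a or models[["b"]] gives the fit for a single group, and the rounded coefficients should agree with the tables above.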