How do I apply a regression to a subgroup within a population?

Asked: 2017-11-04 16:34:14

Tags: r linear-regression

Suppose I have the following data frame:

weight <- c(100, 137, 158, 225, 149)
age <- c(15, 18, 21, 31, 65)
gender <- c("Female", "Male", "Male", "Male", "Female")
table <- data.frame(weight, age, gender)

If I want to run a linear regression to see how weight predicts age, and then inspect it, I would do this:

allData <- lm(age ~ weight, data = table)
summary(allData)

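What should I do if I want to check how weight predicts age for females only? That is, how do I use only the female subset of the data to see how weight predicts age? What I have in mind is something like: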

FemaleData <- lm(age ~ weight, data=table (gender="Female"))

1 answer:

Answer 0 (score: 0)

library(dplyr)
library(broom)

# example dataset
weight <- c(100, 137, 158, 225, 149, 148)
age <- c(15, 18, 21, 31, 65, 64)
gender <- c("Female", "Male", "Male", "Male", "Female", "Female")
table <- data.frame(weight, age, gender)

# build model for each gender value and store it in a column
table %>%
  group_by(gender) %>%                                  # for each gender value
  do(model = summary(lm(age ~ weight, data = .))) %>%   # build a model
  ungroup() -> tbl_models

# check what your new dataset looks like
tbl_models

# # A tibble: 2 x 2
#     gender            model
#   * <fctr>           <list>
#   1 Female <S3: summary.lm>
#   2   Male <S3: summary.lm>

# access / view model for Females
tbl_models %>% filter(gender == "Female") %>% pull(model)

# [[1]]
# 
# Call:
#   lm(formula = age ~ weight, data = .)
# 
# Residuals:
#   1          2          3 
# -0.0002125 -0.0101997  0.0104122 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)    
#   (Intercept) -8.706e+01  4.943e-02   -1761 0.000361 ***
#   weight       1.021e+00  3.681e-04    2773 0.000230 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.01458 on 1 degrees of freedom
# Multiple R-squared:      1,   Adjusted R-squared:      1 
# F-statistic: 7.69e+06 on 1 and 1 DF,  p-value: 0.0002296

# build model for each gender value and store it as a tidy dataset
table %>%
  group_by(gender) %>%
  do(tidy(lm(age ~ weight, data = .))) %>%
  ungroup()

# # A tibble: 4 x 6
#   gender        term    estimate    std.error   statistic      p.value
#   <fctr>       <chr>       <dbl>        <dbl>       <dbl>        <dbl>
# 1 Female (Intercept) -87.0609860 0.0494272875 -1761.39518 0.0003614292
# 2 Female      weight   1.0206120 0.0003680516  2773.01334 0.0002295769
# 3   Male (Intercept)  -2.3370680 0.2181313917   -10.71404 0.0592475719
# 4   Male      weight   0.1480985 0.0012299556   120.40961 0.0052869963
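
If you only need the single-group model the question asks about, base R can do it without extra packages. The following is a minimal sketch, not part of the original answer: it relies on the `subset` argument of `lm()` to restrict the fit to the female rows, and on `split()` plus `lapply()` to fit one model per gender. The names `people`, `female_fit`, and `fits` are assumptions introduced for illustration (`people` replaces `table` only to avoid masking the base function `table()`).

```r
# Same example data as in the answer above
weight <- c(100, 137, 158, 225, 149, 148)
age    <- c(15, 18, 21, 31, 65, 64)
gender <- c("Female", "Male", "Male", "Male", "Female", "Female")
people <- data.frame(weight, age, gender)   # hypothetical name, see lead-in

# Fit the regression on the female rows only.
# `subset` is evaluated inside `data`, so `gender` refers to people$gender.
female_fit <- lm(age ~ weight, data = people, subset = gender == "Female")
summary(female_fit)

# Fit one model per gender: split() returns a named list of data frames,
# and lapply() fits a separate lm() on each piece.
fits <- lapply(split(people, people$gender),
               function(d) lm(age ~ weight, data = d))

# Collect the coefficients side by side (one column per gender)
sapply(fits, coef)
```

The female coefficients from `female_fit` should agree with the `Female` rows of the tidy output above. With only three observations per group the fits are close to saturated, so the standard errors and p-values are of little practical use.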