Using prop_test for multiple summary statistics

Time: 2020-10-22 00:01:52

Tags: r tidyverse

I have a data frame like this:

education = c(1, 1, 0, 0, 0,0) 
college = c(1, 0, 1, 1,1,0) 
income = c(55, 55, 12, 15, 90, 230) 
age = c(1, 0, 1, 1,1,0) 
female = c(1, 1, 1, 0,1,0) 
group = c(0, 0, 0, 1,1,1)
df = data.frame(group, female, age, education, income, college)

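What I want is a table containing the mean of each variable (income, college, education, female, age) for group 1 and group 0. Then I want the p-value of a prop test of whether, for each of income, age, female and so on, group 1 and group 0 are equal. Example final output here

I think what I can do is: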

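library(dplyr)

# Per-group means; the column vector must be closed before the function argument
balance_stats <- df %>%
  group_by(as.factor(group)) %>% 
  summarise(across(c("income", "education", "age", "female"), ~mean(.x, na.rm = TRUE)))

# Overall means across both groups
total_stats <- df %>% 
  summarise(across(c("income", "education", "age", "female"), ~mean(.x, na.rm = TRUE)))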

Then take balance_stats$group0 and balance_stats$group1 and run prop.test(x = c(balance_stats$group0, balance_stats$group1), n = total_stats)

But it does not run as expected. Any help would be appreciated.

1 Answer:

Answer 0: (score: 0)

Try this?

library(dplyr)
library(tidyr)

df %>%
  arrange(group) %>%
  summarise(across(.cols = c("female", "age", "education", "income", "college"),
                   .fns = list(mean0 = ~mean(.x[seq(1, sum(group == 0))]),
                               mean1 = ~mean(.x[seq(1 + sum(group == 0), n())]),
                               p = ~t.test(x = .x[seq(1, sum(group == 0))],
                                           y = .x[seq(1 + sum(group == 0), n())])[["p.value"]]),
                   .names = "{col}_{fn}")) %>%
  pivot_longer(cols = everything(),
               names_to = c("variable", ".value"),
               names_pattern = "(.+)_(.+)")

# A tibble: 5 x 4
  variable   mean0   mean1     p
  <chr>      <dbl>   <dbl> <dbl>
1 female     1       0.333 0.184
2 age        0.667   0.667 1    
3 education  0.667   0     0.184
4 income    40.7   112.    0.377
5 college    0.667   0.667 1   
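
As a cross-check on a single row of the table, the same Welch t-test can be run through the formula interface of t.test. This is only a sketch using the question's data; the names p_formula and p_xy are introduced here for illustration.

```r
# Data from the question (only the columns needed here)
df <- data.frame(
  group  = c(0, 0, 0, 1, 1, 1),
  income = c(55, 55, 12, 15, 90, 230)
)

# Formula interface: compares income between the two levels of group
p_formula <- t.test(income ~ group, data = df)$p.value

# Equivalent call with the two groups passed explicitly as x and y
p_xy <- t.test(x = df$income[df$group == 0],
               y = df$income[df$group == 1])$p.value

# Both calls run the same two-sided Welch test, so the p-values should agree
print(c(formula = p_formula, xy = p_xy))
```

The formula form avoids the index arithmetic with seq(), but it handles one variable at a time, which is why the across() version is more convenient for building the whole table.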

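P.S. You mentioned prop.test in your question, but the accompanying example points to t.test. I used the latter here because one variable in the dataset (income) is not binary, so I am not sure how the proportion (probability of success) in prop.test would be interpreted in that case. If your actual use case is different, you can change the code accordingly.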
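
If the comparison is restricted to the binary variables, a prop.test version might look like the sketch below. It uses base R only and assumes that a value of 1 counts as a "success"; the names binary_vars and prop_results are made up for this example. Note that prop.test expects counts of successes (x) and group sizes (n), not means, and that with only three rows per group the chi-squared approximation is unreliable, so the warning is suppressed here purely to keep the output readable.

```r
# Data from the question, binary columns only
df <- data.frame(
  group     = c(0, 0, 0, 1, 1, 1),
  female    = c(1, 1, 1, 0, 1, 0),
  age       = c(1, 0, 1, 1, 1, 0),
  education = c(1, 1, 0, 0, 0, 0),
  college   = c(1, 0, 1, 1, 1, 0)
)

binary_vars <- c("female", "age", "education", "college")
in_g1 <- df$group == 1

# One row per variable: proportion in each group plus the prop.test p-value
prop_results <- do.call(rbind, lapply(binary_vars, function(v) {
  x <- c(sum(df[[v]][!in_g1]), sum(df[[v]][in_g1]))  # successes per group
  n <- c(sum(!in_g1), sum(in_g1))                    # group sizes
  data.frame(
    variable = v,
    prop0    = x[1] / n[1],
    prop1    = x[2] / n[2],
    p        = suppressWarnings(prop.test(x = x, n = n)$p.value)
  )
}))

print(prop_results)
```

For samples this small, fisher.test on the 2 x 2 table of counts may be a more defensible choice than prop.test.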