Question

I have a data frame like this:

education = c(1, 1, 0, 0, 0,0) 
college = c(1, 0, 1, 1,1,0) 
income = c(55, 55, 12, 15, 90, 230) 
age = c(1, 0, 1, 1,1,0) 
female = c(1, 1, 1, 0,1,0) 
group = c(0, 0, 0, 1,1,1)
df = data.frame(group, female, age, education, income, college)

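What I want is a table with the mean of each variable (income, college, education, female, age) for group 1 and for group 0. Then I want the p-value from a prop test of whether, for each of income, age, and female, group 1 and group 0 are equal.

I thought I could do this: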

balance_stats <- df %>%
  group_by(as.factor(group)) %>% 
  summarise(across(c("income", "education",  "age", "female",~mean(.x, na.rm = TRUE)))

total_stats <- df %>% 
  summarise(across(c("income", "education",  "age", "female",~mean(.x, na.rm = TRUE)))

and then pull out balance_stats$group0 and balance_stats$group1 and run prop.test(x = c(balance_stats$group0, balance_stats$group1), n = total_stats)

But it doesn't run as expected. Any help would be appreciated.

Answer 1

Try this?

library(dplyr)
library(tidyr)

df %>%
  arrange(group) %>%
  summarise(across(.cols = c("female", "age", "education", "income", "college"),
                   .fns = list(mean0 = ~mean(.x[seq(1, sum(group == 0))]),
                               mean1 = ~mean(.x[seq(1 + sum(group == 0), n())]),
                               p = ~t.test(x = .x[seq(1, sum(group == 0))],
                                           y = .x[seq(1 + sum(group == 0), n())])[["p.value"]]),
                   .names = "{col}_{fn}")) %>%
  pivot_longer(cols = everything(),
               names_to = c("variable", ".value"),
               names_pattern = "(.+)_(.+)")

# A tibble: 5 x 4
  variable   mean0   mean1     p
  <chr>      <dbl>   <dbl> <dbl>
1 female     1       0.333 0.184
2 age        0.667   0.667 1    
3 education  0.667   0     0.184
4 income    40.7   112.    0.377
5 college    0.667   0.667 1

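P.S. You mention prop.test in the question, but the accompanying example points to t.test. I used the latter here because one of the variables in the dataset (income) is not binary, so I'm not sure how to interpret prop.test's proportion (probability of success) in that case. If your actual use case is different, you can change the code accordingly.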
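
If you do want prop.test for the 0/1 variables, here is a rough sketch of one way it might look. It assumes that only female, age, education and college count as binary (income is left out), and prop_p_value is a helper name I made up for illustration, not something from dplyr or base R. prop.test takes the number of successes and the number of trials per group, not the group means.

```r
# Data from the question
df <- data.frame(
  group     = c(0, 0, 0, 1, 1, 1),
  female    = c(1, 1, 1, 0, 1, 0),
  age       = c(1, 0, 1, 1, 1, 0),
  education = c(1, 1, 0, 0, 0, 0),
  income    = c(55, 55, 12, 15, 90, 230),
  college   = c(1, 0, 1, 1, 1, 0)
)

# Assumption: only the 0/1 variables have a proportion to test
binary_vars <- c("female", "age", "education", "college")

# Hypothetical helper: p-value of a two-sample prop.test for one 0/1 variable
prop_p_value <- function(x, g) {
  successes <- c(sum(x[g == 0]), sum(x[g == 1]))  # number of 1s per group
  trials    <- c(sum(g == 0), sum(g == 1))        # group sizes
  # With groups this small prop.test warns that the chi-squared
  # approximation may be incorrect; the warning is silenced here
  suppressWarnings(prop.test(x = successes, n = trials)$p.value)
}

# One row per binary variable: proportion in each group and the p-value
prop_results <- data.frame(
  variable = binary_vars,
  prop0 = vapply(binary_vars, function(v) mean(df[[v]][df$group == 0]),
                 numeric(1), USE.NAMES = FALSE),
  prop1 = vapply(binary_vars, function(v) mean(df[[v]][df$group == 1]),
                 numeric(1), USE.NAMES = FALSE),
  p     = vapply(binary_vars, function(v) prop_p_value(df[[v]], df$group),
                 numeric(1), USE.NAMES = FALSE)
)

print(prop_results)
```

With only three observations per group, the p-values from this sketch should be read with caution; for a continuous variable such as income, something like t.test remains the more natural choice.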

Using prop_test for multiple summary statistics
