I have a data frame like this:
education = c(1, 1, 0, 0, 0,0)
college = c(1, 0, 1, 1,1,0)
income = c(55, 55, 12, 15, 90, 230)
age = c(1, 0, 1, 1,1,0)
female = c(1, 1, 1, 0,1,0)
group = c(0, 0, 0, 1,1,1)
df = data.frame(group, female, age, education, income, college)
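What I want is a table containing the mean of each variable (income, college, education, female, age) for group 1 and for group 0. Then I want the p-value from a prop test, i.e. a test of whether group 1 and group 0 are equal for each of income, age, and female.
I think what I can do is: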
balance_stats <- df %>%
  group_by(as.factor(group)) %>%
  summarise(across(c("income", "education", "age", "female"), ~mean(.x, na.rm = TRUE)))
total_stats <- df %>%
  summarise(across(c("income", "education", "age", "female"), ~mean(.x, na.rm = TRUE)))
Then extract balance_stats$group0 and balance_stats$group1 and run prop.test(x = c(balance_stats$group0, balance_stats$group1), n = total_stats).
But it does not run as expected. Any help would be appreciated.
Answer 0 (score: 0)
Try this?
library(dplyr)
library(tidyr)
df %>%
  arrange(group) %>%
  summarise(across(.cols = c("female", "age", "education", "income", "college"),
                   .fns = list(mean0 = ~mean(.x[seq(1, sum(group == 0))]),
                               mean1 = ~mean(.x[seq(1 + sum(group == 0), n())]),
                               p = ~t.test(x = .x[seq(1, sum(group == 0))],
                                           y = .x[seq(1 + sum(group == 0), n())])[["p.value"]]),
                   .names = "{col}_{fn}")) %>%
  pivot_longer(cols = everything(),
               names_to = c("variable", ".value"),
               names_pattern = "(.+)_(.+)")
# A tibble: 5 x 4
  variable  mean0   mean1     p
  <chr>     <dbl>   <dbl> <dbl>
1 female    1       0.333 0.184
2 age       0.667   0.667 1
3 education 0.667   0     0.184
4 income   40.7   112.    0.377
5 college   0.667   0.667 1
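The code above relies on sorting by group and then slicing rows by position. As a possible alternative, here is a minimal sketch that selects each group with a logical index instead, so the row order does not matter. It assumes the same toy data as the question and a dplyr version that provides across(); the result name balance is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question.
df <- data.frame(
  group     = c(0, 0, 0, 1, 1, 1),
  female    = c(1, 1, 1, 0, 1, 0),
  age       = c(1, 0, 1, 1, 1, 0),
  education = c(1, 1, 0, 0, 0, 0),
  income    = c(55, 55, 12, 15, 90, 230),
  college   = c(1, 0, 1, 1, 1, 0)
)

balance <- df %>%
  summarise(across(
    .cols = c(female, age, education, income, college),
    .fns = list(
      # `group` is looked up in the data frame, so these pick rows by group value.
      mean0 = ~ mean(.x[group == 0]),
      mean1 = ~ mean(.x[group == 1]),
      # Welch two-sample t-test comparing the two groups.
      p     = ~ t.test(.x[group == 0], .x[group == 1])[["p.value"]]
    ),
    .names = "{.col}_{.fn}"
  )) %>%
  # One row per variable, one column per statistic.
  pivot_longer(cols = everything(),
               names_to = c("variable", ".value"),
               names_pattern = "(.+)_(.+)")

balance
```

This should give the same table as the positional version, since both compare the same two sets of rows.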
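P.S. You mentioned prop.test in your question, but the accompanying example points to t.test. I used the latter here because one variable in the dataset (income) is not binary, so I am not sure how to interpret the proportions (probabilities of success) from prop.test in that case. If your actual use case is different, you can change the code accordingly.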
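If a proportion test is still wanted for the binary variables, note that prop.test expects counts of successes (x) and numbers of trials (n), not group means. The following is only a sketch under that assumption: it uses the question's toy data, skips income because it is not binary, and the names binary_vars, n_per_group, and prop_p are illustrative.

```r
# Same toy data as in the question.
df <- data.frame(
  group     = c(0, 0, 0, 1, 1, 1),
  female    = c(1, 1, 1, 0, 1, 0),
  age       = c(1, 0, 1, 1, 1, 0),
  education = c(1, 1, 0, 0, 0, 0),
  income    = c(55, 55, 12, 15, 90, 230),
  college   = c(1, 0, 1, 1, 1, 0)
)

# Only the 0/1 variables make sense for a test of proportions.
binary_vars <- c("female", "age", "education", "college")

# Number of rows in group 0 and group 1 (the "trials").
n_per_group <- as.vector(table(df$group))

prop_p <- sapply(binary_vars, function(v) {
  # Number of 1s in group 0 and group 1 (the "successes").
  successes <- as.vector(tapply(df[[v]], df$group, sum))
  # With 3 rows per group the chi-squared approximation is poor and
  # prop.test() warns about it; the warning is silenced for this toy example.
  suppressWarnings(prop.test(x = successes, n = n_per_group)$p.value)
})

prop_p
```

With groups this small the p-values are not very reliable; an exact test such as fisher.test may be a better fit for real data of this size.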