I have a data frame like this:
record_id group enzyme1 enzyme2 ... ...
<factor> <dbl> <dbl> ... ...
1 control 34.5 32.3 ... ...
2 control 32.1 34.1 ... ...
3 treatment 123.1 12.1 ... ...
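Basically, there is one grouping variable named group and several dependent variables such as enzyme1, enzyme2, and so on.
I can run a t-test on one of them and tidy the result like this: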
tidy(t.test(enzyme1 ~ group))
What I would like is to stack the output of all the t-tests into one table, looking something like this:
estimate statistic p.value parameter conf.low conf.high
enzyme 1 197.7424 0.3706244 0.7119 75.3982 -865.0291 1260.514
enzyme 2 XXX.XX X.xxx 0.XXXX XX.XXXX -XX.XXX XX.XXX
and so on.
Any ideas?
Answer 0 (score: 3)
We can use purrr::map_df(), which is loaded with library(tidyverse), like this:
library(broom)
library(tidyverse) # purrr is in here
data(mtcars)
#reproducible data to simulate your case
mtcars2 <- filter(mtcars, cyl %in% c(4, 6))
mtcars2$cyl <- as.factor(mtcars2$cyl)
# capture the columns you want to t.test
cols_not_cyl <- names(mtcars2)[-2]
# turn those column names into formulas
formulas <- paste(cols_not_cyl, "~ cyl") %>%
  map(as.formula) %>%      # needs to be class formula
  set_names(cols_not_cyl)  # useful for map_df()
# do the tests, then stack them all together
map_df(formulas, ~ tidy(t.test(formula = ., data = mtcars2)),
       .id = "column_id")
Answer 1 (score: 2)
Use map to run all the tests, then reduce with bind_rows to stack the results:
df <- data.frame(record_id = 1:50, group = sample(c("control", "treatment"), 50, replace = TRUE),
enzyme1 = rnorm(50),
enzyme2 = rnorm(50))
library(broom)
library(dplyr)
library(purrr)
map(paste0("enzyme", 1:2), ~tidy(t.test(as.formula(paste0(.x, "~ group")),
data = df))) %>%
reduce(bind_rows)
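Note that the stacked result above has no column telling you which enzyme each row belongs to. A minimal sketch of one way to keep that label, assuming the same simulated data: name the list with set_names() and let bind_rows() write the names into an id column. The names enzymes, labelled, and the "enzyme" id column are illustrative choices, and set.seed() is there only to make the simulated data reproducible.

```r
library(broom)
library(dplyr)
library(purrr)

set.seed(1)  # assumption: only so the simulated data is reproducible
df <- data.frame(record_id = 1:50,
                 group = sample(c("control", "treatment"), 50, replace = TRUE),
                 enzyme1 = rnorm(50),
                 enzyme2 = rnorm(50))

enzymes <- paste0("enzyme", 1:2)

labelled <- enzymes %>%
  set_names() %>%  # list names become "enzyme1", "enzyme2"
  map(~ tidy(t.test(as.formula(paste(.x, "~ group")), data = df))) %>%
  bind_rows(.id = "enzyme")  # list names go into the "enzyme" column
```

Each row of labelled then starts with the enzyme name, followed by the usual tidy() columns.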
Answer 2 (score: 2)
You can also try a tidyverse approach like this:
df %>%
summarise_at(vars(starts_with('enzyme')), funs(list(tidy(t.test(. ~ group))))) %>%
map(1) %>% bind_rows(.id='enzymes')
# enzymes estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
#1 enzyme1 -104.3 33.3 137.6 -7.168597 0.08610502 1.013697 -283.37000 74.77000 Welch Two Sample t-test two.sided
#2 enzyme2 19.6 33.2 13.6 11.204574 0.01532388 1.637394 10.22717 28.97283 Welch Two Sample t-test two.sided
Data:
df <- read.table(text = "record_id group enzyme1 enzyme2
1 control 34.5 32.3
2 control 32.1 34.1
3 treatment 123.1 12.1
4 treatment 152.1 15.1 ", header=T)
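funs() has been deprecated since dplyr 0.8.0, so the call above warns on recent versions. A rough sketch of the same idea with current verbs, assuming dplyr, tidyr and broom are installed: reshape to long format, then run one t-test per enzyme with group_modify(). The column names enzyme and value are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(broom)

df <- read.table(text = "record_id group enzyme1 enzyme2
1 control 34.5 32.3
2 control 32.1 34.1
3 treatment 123.1 12.1
4 treatment 152.1 15.1", header = TRUE)

results <- df %>%
  # one row per (record, enzyme) pair
  pivot_longer(starts_with("enzyme"), names_to = "enzyme", values_to = "value") %>%
  group_by(enzyme) %>%
  # .x is the subset of rows for the current enzyme
  group_modify(~ tidy(t.test(value ~ group, data = .x))) %>%
  ungroup()
```

This should give one row per enzyme with the same statistics as the table shown above.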
Answer 3 (score: 0)
You can create an empty data.frame and then use rbind() to append your results to it in a loop.
Here is an example using the iris dataset:
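library(broom)
df <- data.frame()
for (i in 1:(length(colnames(iris)) - 1)) { ## change your length to whatever colnames you have
  variableName <- colnames(iris)[i] ## loop through the desired colnames
  f <- as.formula(paste(variableName, "~ Species")) ## test the current column, not a fixed one
  ## rows 1:99 contain only two species, as t.test() requires
  df <- rbind(df, cbind(variableName, tidy(t.test(f, data = iris[1:99, ]))))
}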
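Growing a data frame with rbind() inside a loop works but copies the whole frame on every iteration. A base-R sketch of the same idea without broom: build one row per variable with lapply() and bind them once at the end. The column names are chosen by hand to mirror what tidy() returns, and vars, two_species, rows and stacked are illustrative names.

```r
# Every column of iris except the grouping variable
vars <- setdiff(names(iris), "Species")
# Rows 1:99 contain only setosa and versicolor, so there are exactly two groups
two_species <- iris[1:99, ]

rows <- lapply(vars, function(v) {
  res <- t.test(as.formula(paste(v, "~ Species")), data = two_species)
  data.frame(
    variable  = v,
    statistic = unname(res$statistic),
    p.value   = res$p.value,
    parameter = unname(res$parameter),  # degrees of freedom
    conf.low  = res$conf.int[1],
    conf.high = res$conf.int[2]
  )
})

# Stack the one-row data frames in a single call
stacked <- do.call(rbind, rows)
```

Printing stacked shows one row per measured variable.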