performing t-test on dplyr group_by columns

Time: 2015-06-30 13:23:04

Tags: r dataframe dplyr

In extension to this question, how can I run a t-test on each column (con1, con2) between the two resulting groups (A, B) of the new data frame, i.e. t.test(df$con1[df$cat1=='A'], df$con1[df$cat1=='B']) and t.test(df$con2[df$cat1=='A'], df$con2[df$cat1=='B'])?

library(dplyr)

# Random generation of values for categorical data
set.seed(33)
df <- data.frame(cat1 = sample( LETTERS[1:2], 100, replace=TRUE ), 
                cat2 = sample( LETTERS[3:5], 100, replace=TRUE ),
                cat3 = sample( LETTERS[2:4], 100, replace=TRUE ),
                con1 = runif(100,0,100),
                con2 = runif(100,23,45))

# Introducing missing values (NA)
df$con1[c(23,53,92)] <- NA
df$con2[c(33,46)] <- NA

# List of functions 
df %>% group_by(cat1) %>% 
 summarise_each(funs(mean(., na.rm = TRUE),
                     sd(., na.rm = TRUE)), 
                starts_with("con"))

And likewise in this case, comparing A and B within each level of cat2, i.e. t.test(df$con1[df$cat1=='A' & df$cat2=='C'], df$con1[df$cat1=='B' & df$cat2=='C']), ... t.test(df$con2[df$cat1=='A' & df$cat2=='E'], df$con2[df$cat1=='B' & df$cat2=='E'])

df %>% group_by(cat1, cat2) %>% 
     summarise_each(funs(mean(., na.rm = TRUE),
                         sd(., na.rm = TRUE)), 
                    starts_with("con"))

1 answer:

Answer 0 (score: 1)

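I ran into the same problem. The best solution I came up with was to assign the variable used to differentiate the 2 samples of the t-test to a list:

groups <- c(a, b)

Next, you can use lapply together with dplyr: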

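t_test_summary <- lapply(groups, function(x) {
  # col_a, col_b, col_wanted, con1 and con2 are placeholders from my own data.
  # pull() extracts the column as a vector; select() would return a data
  # frame, which t.test() cannot use as a sample.
  t.test(filter(df, col_a == con1 & col_b == x) %>% pull(col_wanted),
         filter(df, col_a == con2 & col_b == x) %>% pull(col_wanted))
})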

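I haven't fully adapted this to your example, but it should get you very close. In my problem, I needed to run t-tests on 2 different samples (distinguished by col_a) across multiple time periods (loaded into a list and fed into lapply to filter col_b). That sounds almost exactly like your problem.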
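For reference, here is a rough sketch of how the lapply idea might look on the data from the question. It is only one possible adaptation, not the answerer's exact code: the names con_cols, t_by_col, t_by_group and p_values are made up for illustration, and t.test() is left at its defaults (Welch two-sample test, with NA values dropped).

```r
library(dplyr)

# Same simulated data as in the question
set.seed(33)
df <- data.frame(cat1 = sample(LETTERS[1:2], 100, replace = TRUE),
                 cat2 = sample(LETTERS[3:5], 100, replace = TRUE),
                 cat3 = sample(LETTERS[2:4], 100, replace = TRUE),
                 con1 = runif(100, 0, 100),
                 con2 = runif(100, 23, 45))
df$con1[c(23, 53, 92)] <- NA
df$con2[c(33, 46)] <- NA

# Hypothetical helper vector: the continuous columns to test
con_cols <- c("con1", "con2")

# Case 1: A vs B for each con* column, ignoring cat2.
# reformulate() builds the formula con1 ~ cat1 (then con2 ~ cat1).
t_by_col <- lapply(setNames(con_cols, con_cols), function(col) {
  t.test(reformulate("cat1", response = col), data = df)
})

# Case 2: A vs B for each con* column, separately within each level of cat2
groups <- sort(unique(as.character(df$cat2)))

t_by_group <- lapply(setNames(groups, groups), function(g) {
  sub <- filter(df, cat2 == g)
  lapply(setNames(con_cols, con_cols), function(col) {
    t.test(sub[[col]][sub$cat1 == "A"], sub[[col]][sub$cat1 == "B"])
  })
})

# Collect the p-values into a matrix: rows are columns tested, columns are cat2 levels
p_values <- sapply(t_by_group, function(res) {
  sapply(res, function(tt) tt$p.value)
})
p_values
```

Each element of t_by_col and t_by_group is an ordinary htest object, so t_by_col$con1 prints the full test, and t_by_group$C$con2$p.value pulls out a single p-value. If many tests are run this way, some multiple-comparison correction (for example p.adjust()) is probably worth considering.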