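Extracting values from a loop of t-tests

Time: 2016-05-19 00:27:28

Tags: r

I am trying to iterate over ~60 columns, with the goal of running a t-test split by case/control status on each one and capturing the output as a list. Here is my attempt so far. Note that my data frame is called biomarkers, and columns 3-59 hold the variables I am interested in, grouped by column 2 (called case):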

tests <- list()
column_biomarkers <- colnames(biomarkers[3:59])

for (i in column_biomarkers){
  tests[[i]] <-  t.test(biomarkers$i[case == 1],biomarkers$i[case == 0],pool.sd=FALSE,na.rm=TRUE)
  }

sapply(tests, function(x) {
  c(x$estimate[1],
    x$estimate[2],
    ci.lower = x$conf.int[1],
    ci.upper = x$conf.int[2],
    p.value = x$p.value)
})

However, this attempt results in the following error:

Error in var(x) : 'x' is NULL

Any suggestions would be greatly appreciated! I am new to R.

Sample data:

structure(list(subject = 1:10, case = c(1L, 0L, 0L, 1L, 1L, 0L, 
0L, 0L, 0L, 1L), biomarker_1 = c(308.29999, 2533.3, 2723.3, 3125.3, 
853, 6442.2998, 1472.5, 170.5, 64.5, 2624.8), biomarker_2 = c(4930.7998, 
2401, 5158.5, 6526, 3774.2, 5753, 1955.2, 1332.2, 1296.8, 5859.2998
), biomarker_3 = c(4810, 3279.5, 7929.5, 8353, 4074.2, 7940.5, 
1545.7, 2189.2, 1488.7, 6352.5)), .Names = c("subject", "case", 
"biomarker_1", "biomarker_2", "biomarker_3"), row.names = c(NA, 
10L), class = "data.frame")

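2 answers:

Answer 0 (score: 2)

Consider splitting the data frame into two groups and using mapply() (the multivariate apply function, which runs an operation element-wise across objects) to run the t-tests column by column.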

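# df is the question's data frame (called biomarkers there).
# Note: case == 1 rows are the cases, case == 0 rows are the controls.
casedf <- df[df$case == 1, 3:ncol(df)]
controldf <- df[df$case == 0, 3:ncol(df)]

# t.test() has no pool.sd or na.rm arguments; var.equal = FALSE (the default)
# requests the Welch test, and missing values are dropped automatically.
tfct <- function(v1, v2) {
  t.test(v1, v2, var.equal = FALSE)
}

# Apply the test to matching columns of the two data frames
ttests <- mapply(tfct, casedf, controldf)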
ttests

#             biomarker_1               biomarker_2              
# statistic   -0.4310577                2.287416                 
# parameter   7.943542                  7.987304                 
# p.value     0.677885                  0.05152236               
# conf.int    Numeric,2                 Numeric,2                
# estimate    Numeric,2                 Numeric,2                
# null.value  0                         0                        
# alternative "two.sided"               "two.sided"              
# method      "Welch Two Sample t-test" "Welch Two Sample t-test"
# data.name   "v1 and v2"               "v1 and v2"    
#      
#             biomarker_3              
# statistic   1.169058                 
# parameter   7.995322                 
# p.value     0.2760513                
# conf.int    Numeric,2                
# estimate    Numeric,2                
# null.value  0                        
# alternative "two.sided"              
# method      "Welch Two Sample t-test"
# data.name   "v1 and v2"     

You can even move the results into a data frame:

# Transposed data frame output of results
testdf <- data.frame(t(ttests))
head(testdf)

#              statistic parameter    p.value              conf.int
# biomarker_1 -0.4310577  7.943542   0.677885   -3219.767, 2206.667
# biomarker_2   2.287416  7.987304 0.05152236 -19.24659, 4598.82973
# biomarker_3   1.169058  7.995322  0.2760513   -1785.201, 5455.684
#                       estimate null.value alternative
# biomarker_1   1727.85, 2234.40          0   two.sided
# biomarker_2 5272.575, 2982.783          0   two.sided
# biomarker_3 5897.425, 4062.183          0   two.sided
#                              method data.name
# biomarker_1 Welch Two Sample t-test v1 and v2
# biomarker_2 Welch Two Sample t-test v1 and v2
# biomarker_3 Welch Two Sample t-test v1 and v2    
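
In the data frame above, conf.int and estimate are list columns holding two numbers each, which makes them awkward to filter or export. As a rough sketch (not part of the original answer), the same tests can be kept as a list of htest objects and flattened with the sapply() extraction from the question. The names is_case, marker_cols and summary_df are made up for illustration, and the data frame is rebuilt from the question's sample data:

```r
# Sample data from the question, rebuilt as a plain data.frame
biomarkers <- data.frame(
  subject = 1:10,
  case = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L),
  biomarker_1 = c(308.29999, 2533.3, 2723.3, 3125.3, 853,
                  6442.2998, 1472.5, 170.5, 64.5, 2624.8),
  biomarker_2 = c(4930.7998, 2401, 5158.5, 6526, 3774.2,
                  5753, 1955.2, 1332.2, 1296.8, 5859.2998),
  biomarker_3 = c(4810, 3279.5, 7929.5, 8353, 4074.2,
                  7940.5, 1545.7, 2189.2, 1488.7, 6352.5)
)

# Hypothetical helper names, not from the original post
is_case <- biomarkers$case == 1
marker_cols <- names(biomarkers)[3:ncol(biomarkers)]

# One Welch t-test (the t.test() default) per biomarker, kept as a named list
tests <- lapply(biomarkers[marker_cols], function(v) {
  t.test(v[is_case], v[!is_case])
})

# Pull the numeric pieces out of each htest object; one row per biomarker
summary_df <- as.data.frame(t(sapply(tests, function(x) {
  c(mean_case    = unname(x$estimate[1]),
    mean_control = unname(x$estimate[2]),
    ci_lower     = x$conf.int[1],
    ci_upper     = x$conf.int[2],
    p_value      = x$p.value)
})))
summary_df
```

Every column of summary_df is a plain numeric vector, so the usual tools apply, for example summary_df[summary_df$p_value < 0.05, ].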

Answer 1 (score: 0)

Here is another possible solution.

tests <- list()
column_biomarkers <- colnames(biomarkers[3:5])
for (i in column_biomarkers) {
  # [[i]] looks up the column whose name is stored in i
  tests[[i]] <- t.test(biomarkers[[i]][biomarkers$case == 1],
                       biomarkers[[i]][biomarkers$case == 0],
                       var.equal = FALSE)
}

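R does not like biomarkers$i[biomarkers$case == 1]. The $ operator does not evaluate i; it looks for a column literally named "i", finds none, and returns NULL, which is what triggers the var(x) error. The [[ ]] notation evaluates i first, so it works with a column name stored in a variable.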
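
The difference between the two operators can be checked on a tiny example. This is only a sketch; the data frame d and its values are made up for illustration:

```r
# Small stand-in data frame (hypothetical values, for illustration only)
d <- data.frame(case = c(1L, 0L, 1L, 0L),
                biomarker_1 = c(10, 20, 30, 40))

i <- "biomarker_1"

via_dollar  <- d$i     # looks for a column literally named "i"
via_bracket <- d[[i]]  # evaluates i, then fetches the biomarker_1 column

is.null(via_dollar)    # TRUE: there is no column called "i"
via_bracket[d$case == 1]
```

Passing a NULL like via_dollar to t.test() is what produces the "'x' is NULL" message from var().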