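I have two data frames and I want to run t.test on their matching columns. Both are subsets of one larger data frame, so the column names are identical and line up (ncol = ~20000), with nrow(df1) = 25 and nrow(df2) = 23.
Example: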
treatment<-matrix(rnorm(50), ncol=10)
control<-matrix(rnorm(50), ncol=10)
treatment
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.23442246 1.02256703 1.0499998 0.2913643 -1.2083822 0.3778403
[2,] -0.68888047 -0.03961717 -0.9978793 -0.9792061 -0.1831634 0.6140542
[3,] -1.88273887 -0.49701513 0.1845197 0.4385338 1.2249121 0.5444027
[4,] 1.21359446 0.87333933 0.5615304 0.3803339 1.1294489 -0.8777454
[5,] -0.02908159 -1.50296138 0.4624656 0.1335046 1.1665818 -0.4475185
[,7] [,8] [,9] [,10]
[1,] 0.5987723 0.5910937 0.4334874 -1.4198250
[2,] 0.2027346 0.8078187 -1.0573069 1.0727554
[3,] 0.5490159 0.5109912 1.7247428 1.7745333
[4,] 0.3044544 0.6476548 1.1959365 -0.1220841
[5,] 1.8681375 0.8451147 0.4283893 0.1044125
control
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.6712834 -0.3775649 0.7741285 0.51224345 0.24128336 1.02580198
[2,] 0.3894112 -0.1835289 0.4982122 1.73512459 0.08991013 -0.04406897
[3,] 1.7068503 0.7909355 -0.3341426 0.08780239 -1.11563321 2.09984105
[4,] -0.7634818 -1.3672888 0.2161816 -0.65170516 0.81247509 1.68008404
[5,] 0.5787616 0.1704100 -0.3166737 0.90167409 -2.34854292 0.31571255
[,7] [,8] [,9] [,10]
[1,] -1.6111883 0.1019497 -0.1975491 -0.3776000
[2,] 0.7533329 1.1540590 1.0050663 2.0137347
[3,] 1.2224161 1.4411853 -0.4801494 -0.3891034
[4,] 0.1905461 0.9767801 -0.1442578 -0.9946735
[5,] -1.9581454 -0.2874181 -1.0421440 -0.6177782
I did some searching on SO and came across mapply():
mapply(t.test,treatment,control)
Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
not enough 'x' observations
But when I test a single column, it works:
t.test(treatment[,1],control[,1])
Welch Two Sample t-test
data: treatment[, 1] and control[, 1]
t = -1.1541, df = 7.492, p-value = 0.284
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.2577187 0.7635152
sample estimates:
mean of x mean of y
-0.2305368 0.5165649
What is going wrong here?
Answer 0 (score: 2)
treatment and control, being matrix objects, are essentially vectors (like c(1,2,3)) with a dim attribute, so mapply tries to run t.test on each individual pair of numbers. E.g.:
treatment[1]
#[1] 0.7545039
control[1]
#[1] -0.3926361
t.test(treatment[1],control[1])
#Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
# not enough 'x' observations
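To see why mapply ends up with single numbers, it can help to check how R views a matrix. The following is only a small sketch: the seed is arbitrary, and the helper function passed to mapply is made up for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
treatment <- matrix(rnorm(50), ncol = 10)

# A matrix is an atomic vector that carries a "dim" attribute.
length(treatment)  # number of cells (5 * 10), not number of columns
dim(treatment)     # rows and columns

# mapply() iterates over the cells, so FUN is handed one number at a time.
# The anonymous function is illustrative: it reports the length of each piece.
cell_lengths <- mapply(function(x, y) length(x), treatment, treatment)
unique(cell_lengths)
```

Every piece has length 1, which is exactly the situation in which t.test complains about not having enough 'x' observations.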
If you convert the matrices to data.frame objects, each column is treated as a single element, and mapply works as expected:
mapply(t.test,as.data.frame(treatment),as.data.frame(control))
# V1
#statistic -0.7829546
#parameter 7.698139
#p.value 0.4570611
#etc etc
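With ~20000 columns you probably only want a few numbers per test. As a rough sketch (the seed and the variable names res, pvals and tstat are my own choices), the simplified mapply result can be indexed by row name, since each row holds one component of the t.test result:

```r
set.seed(1)  # arbitrary seed for reproducibility
treatment <- matrix(rnorm(50), ncol = 10)
control   <- matrix(rnorm(50), ncol = 10)

res <- mapply(t.test, as.data.frame(treatment), as.data.frame(control))

# res is a list-matrix: one row per component of the test result,
# one column per variable (V1, V2, ...).
pvals <- unlist(res["p.value", ])    # one p-value per column
tstat <- unlist(res["statistic", ])  # one t statistic per column
```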
In this case I would almost certainly use Map instead, since it returns a plain list of test results and is easier to read:
Map(t.test,as.data.frame(treatment),as.data.frame(control))
#$V1
#
# Welch Two Sample t-test
#
#data: dots[[1L]][[1L]] and dots[[2L]][[1L]]
#t = -0.783, df = 7.698, p-value = 0.4571
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -1.525349 0.756036
#sample estimates:
# mean of x mean of y
#-0.31246928 0.07218723
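Since Map returns a list of test objects, the p-values can be pulled out afterwards with vapply. The sketch below also shows an index-based variant that leaves the matrices as they are; the seed and the names tests, pvals and pvals_idx are assumptions of mine, not anything from the question.

```r
set.seed(1)  # arbitrary seed for reproducibility
treatment <- matrix(rnorm(50), ncol = 10)
control   <- matrix(rnorm(50), ncol = 10)

# One htest object per column, named V1, V2, ...
tests <- Map(t.test, as.data.frame(treatment), as.data.frame(control))
pvals <- vapply(tests, function(tt) tt$p.value, numeric(1))

# Alternative: loop over column indices and skip the data.frame conversion.
pvals_idx <- vapply(
  seq_len(ncol(treatment)),
  function(j) t.test(treatment[, j], control[, j])$p.value,
  numeric(1)
)
```

Both variants should give the same p-values. Because t.test defaults to the Welch test, the two groups do not need the same number of rows, so this should also work with 25 and 23 rows.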