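Apologies for the inelegance of this question and its example. I am a medic whose in-depth coding in R is not great, but I want to get better.
I need to perform multiple Wilcoxon tests on a dataset in R. (I am aware of the dangers of multiple comparisons; in fact, this follows on from a set of LME analyses, so that Hodges-Lehmann estimates can be obtained.)
My data consist of multiple variables, measured at multiple time points in multiple subjects. I would like a way to compare the different time points, creating a new 'htest' object for each comparison.
Here is a MWE approximation of my data frame structure: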
example.data <- data.frame(
matrix(data=c(
'A',0,0,24,0,
'A',1,1,20,-1,
'A',2,2,18,-1.4,
'A',3,0.5,21,-0.6,
'B',0,0,22,0,
'B',1,1.2,19,-2.2,
'B',2,1.8,20,-3,
'B',3,0.3,21,-1,
'C',0,0,24,0,
'C',1,0.8,22,0.1,
'C',2,2.2,16,-0.6,
'C',3,1,23,-0.2,
'D',0,0,33,0,
'D',1,6,31,-0.4,
'D',2,6.3,27,-0.3,
'D',3,2.2,31,-0.1),
nrow=16,byrow=T))
colnames(example.data) <- c('Subject','Timepoint','Variable1','Variable2','Variable3')
example.data$Timepoint = factor(example.data$Timepoint,levels=c(0,1,2,3))
example.data[,3:5] = sapply(example.data[,3:5],as.numeric)
The best I can come up with is a very ugly for loop, which looks like this:
## Step 2 - Multiple Wilcoxons
variablenames <- names(example.data)[-c(1,2)]
for (obj in variablenames){
  # build the name of each result object, e.g. "Variable1.wilcoxon.Timepoint1"
  obj.wilcoxon.Timepoint1 <- paste(obj,'.wilcoxon.Timepoint1',sep='')
  obj.wilcoxon.Timepoint2 <- paste(obj,'.wilcoxon.Timepoint2',sep='')
  obj.wilcoxon.Timepoint3 <- paste(obj,'.wilcoxon.Timepoint3',sep='')
  # compare baseline (Timepoint 0) with each later time point; columns are selected by name
  assign(obj.wilcoxon.Timepoint1,wilcox.test(example.data[example.data$Timepoint==0,obj],example.data[example.data$Timepoint==1,obj],conf.int=T,paired=T))
  assign(obj.wilcoxon.Timepoint2,wilcox.test(example.data[example.data$Timepoint==0,obj],example.data[example.data$Timepoint==2,obj],conf.int=T,paired=T))
  assign(obj.wilcoxon.Timepoint3,wilcox.test(example.data[example.data$Timepoint==0,obj],example.data[example.data$Timepoint==3,obj],conf.int=T,paired=T))
}
I am sure there is an elegant, vectorised way of doing this, but how should I go about it?
Answer 0 (score: 1)
First:
example.data[,3:5] = sapply(example.data[,3:5],as.numeric)
should be
example.data[,3:5] = apply(example.data[,3:5],2,as.numeric)
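A likely reason for this correction, assuming an R version older than 4.0 (where data.frame() turned character columns into factors by default): as.numeric() applied to a factor returns its internal level codes rather than the printed values, whereas apply() first coerces the columns to a character matrix. The sketch below illustrates the difference on a made-up factor `f`; the names `f` and `df` are hypothetical and not part of the original answer.

```r
# A factor column, as data.frame() produced from character data before R 4.0
f <- factor(c("24", "20", "18", "21"))

# Level codes, not the measured values (levels sort as "18", "20", "21", "24")
as.numeric(f)

# Going through character recovers the values themselves
as.numeric(as.character(f))

# apply() converts the data frame to a character matrix first,
# so as.numeric() sees the strings "24", "20", ... and the values survive
df <- data.frame(Variable2 = f)
apply(df, 2, as.numeric)
```

On R 4.0 or later the columns are plain character vectors, so both forms should give the same numbers.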
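The following should give you a more compact solution.
First, load these two libraries. As Roland suggested, reshape2 converts the data to long format; dplyr is a faster version of plyr.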
library(reshape2)
library(dplyr)
Convert the data into the required format:
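# baseline: one row per subject and variable, holding the Timepoint 0 value in "base"
baseline = melt(example.data %>% filter(Timepoint==0) %>% select(-Timepoint),
                "Subject", value.name = "base")
# comparison: all later time points in long format
comparison = melt(example.data %>% filter(Timepoint!=0), c("Subject", "Timepoint"))
# attach each subject's baseline value to the matching follow-up rows
join.data = left_join(comparison, baseline, by = c("Subject", "variable"))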
You can see what join.data looks like:
> join.data
   Subject Timepoint  variable value base
 1       A         1 Variable1   1.0    0
 2       A         2 Variable1   2.0    0
 3       A         3 Variable1   0.5    0
 4       B         1 Variable1   1.2    0
 5       B         2 Variable1   1.8    0
 6       B         3 Variable1   0.3    0
 7       C         1 Variable1   0.8    0
 8       C         2 Variable1   2.2    0
 9       C         3 Variable1   1.0    0
10       D         1 Variable1   6.0    0
11       D         2 Variable1   6.3    0
12       D         3 Variable1   2.2    0
13       A         1 Variable2  20.0   24
14       A         2 Variable2  18.0   24
15       A         3 Variable2  21.0   24
16       B         1 Variable2  19.0   22
17       B         2 Variable2  20.0   22
18       B         3 Variable2  21.0   22
19       C         1 Variable2  22.0   24
20       C         2 Variable2  16.0   24
21       C         3 Variable2  23.0   24
22       D         1 Variable2  31.0   33
23       D         2 Variable2  27.0   33
24       D         3 Variable2  31.0   33
25       A         1 Variable3  -1.0    0
26       A         2 Variable3  -1.4    0
27       A         3 Variable3  -0.6    0
28       B         1 Variable3  -2.2    0
29       B         2 Variable3  -3.0    0
30       B         3 Variable3  -1.0    0
31       C         1 Variable3   0.1    0
32       C         2 Variable3  -0.6    0
33       C         3 Variable3  -0.2    0
34       D         1 Variable3  -0.4    0
35       D         2 Variable3  -0.3    0
36       D         3 Variable3  -0.1    0
Finally, the main course:
res = join.data %>% group_by(variable, Timepoint) %>% do(
  test = wilcox.test(.$base, .$value, conf.int=TRUE, paired=TRUE)
)
res is a data frame with one row per combination of variable and time point; its list column test holds the corresponding htest object. For example, res$test[[2]] is the result for Variable1 at time point 2.
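Alternatively, you can do a traditional split, which gives a list of lists where res[[i]][[t]] is the result for variable i at time point t: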
res = lapply(split(join.data, join.data$variable),
  function(df){
    # drop=TRUE skips the empty "0" level that Timepoint still carries as a factor
    lapply(split(df, df$Timepoint, drop=TRUE), function(d){
      wilcox.test(d$base, d$value, conf.int=TRUE, paired=TRUE)
    })
  })
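Since the question is ultimately after Hodges-Lehmann estimates, here is a minimal sketch of how such a nested list could be flattened into one table. It is self-contained for illustration: `join.data` is a reduced stand-in containing only the Variable1 rows shown above, and `summary.table` with its column names is hypothetical rather than part of the original answer. With only four subjects, expect wilcox.test to warn that the requested confidence level is not achievable.

```r
# Reduced stand-in for join.data: Variable1 only, values taken from the example
join.data <- data.frame(
  Subject   = rep(c("A", "B", "C", "D"), each = 3),
  Timepoint = factor(rep(1:3, times = 4)),
  variable  = "Variable1",
  value     = c(1, 2, 0.5, 1.2, 1.8, 0.3, 0.8, 2.2, 1, 6, 6.3, 2.2),
  base      = 0
)

# Nested list of htest objects: res[[variable]][[timepoint]]
res <- lapply(split(join.data, join.data$variable), function(df) {
  lapply(split(df, df$Timepoint, drop = TRUE), function(d) {
    wilcox.test(d$base, d$value, conf.int = TRUE, paired = TRUE)
  })
})

# Flatten to one row per variable and time point
summary.table <- do.call(rbind, lapply(names(res), function(v) {
  do.call(rbind, lapply(names(res[[v]]), function(tp) {
    test <- res[[v]][[tp]]
    data.frame(
      variable  = v,
      Timepoint = tp,
      estimate  = unname(test$estimate),  # (pseudo)median of the paired differences
      conf.low  = test$conf.int[1],
      conf.high = test$conf.int[2],
      p.value   = test$p.value
    )
  }))
}))

summary.table
```

The estimate refers to base minus value, so a negative number means the follow-up measurement was higher than baseline.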
Answer 1 (score: 0)
Since wilcox.test is not vectorized, you cannot do this without a loop. However, you can still do better than using assign and eval. This is more R-ish:
library(reshape2)
#long format is better:
example.data <- melt(example.data, id.vars=c("Subject", "Timepoint"))
library(plyr)
#split-apply-combine
res <- dlply(example.data, .(Subject),
function(df) lapply(unique(df[df$Timepoint!="0", "Timepoint"]),
function(i, DF) {
wilcox.test(DF[DF$Timepoint=="0", "value"],
DF[DF$Timepoint==i, "value"],
conf.int=FALSE, paired=TRUE)
}, DF=df))
Note that I set conf.int=FALSE to avoid errors from wilcox.test, which are probably due to the limited data.
You can access the second test for subject B with:
res[["B"]][[2]]
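Note that the call above splits on Subject, so each test pairs one subject's three variables at baseline with the same variables at a later time point. If the intended comparison is per variable across subjects, as the question describes, the grouping could be changed along the following lines. This is a base-R sketch under two assumptions: the data frame is rebuilt here in wide format with typed columns, and the rows are ordered identically by Subject within every time point so that the pairing is correct. The names `variables` and `followups` are hypothetical.

```r
# Wide-format example data with typed columns (values from the question)
example.data <- data.frame(
  Subject   = rep(c("A", "B", "C", "D"), each = 4),
  Timepoint = factor(rep(0:3, times = 4)),
  Variable1 = c(0, 1, 2, 0.5,  0, 1.2, 1.8, 0.3,  0, 0.8, 2.2, 1,  0, 6, 6.3, 2.2),
  Variable2 = c(24, 20, 18, 21,  22, 19, 20, 21,  24, 22, 16, 23,  33, 31, 27, 31),
  Variable3 = c(0, -1, -1.4, -0.6,  0, -2.2, -3, -1,  0, 0.1, -0.6, -0.2,  0, -0.4, -0.3, -0.1)
)

variables <- c("Variable1", "Variable2", "Variable3")
followups <- c("1", "2", "3")

# One htest per variable and follow-up time point, each against baseline
res <- lapply(setNames(variables, variables), function(v) {
  baseline <- example.data[example.data$Timepoint == "0", v]
  lapply(setNames(followups, followups), function(tp) {
    wilcox.test(baseline, example.data[example.data$Timepoint == tp, v],
                conf.int = FALSE, paired = TRUE)
  })
})

# Variable2, baseline versus time point 2
res[["Variable2"]][["2"]]
```

Variable2 contains tied differences, so wilcox.test falls back to a normal approximation there and issues a warning.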