Automating pairwise comparisons between subsets of a data frame in R

Time: 2014-02-14 04:00:09

Tags: r dataframe subset plyr

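I have a data.frame with several variables X1, X2, ... and a grouping variable "site". I want to find the proportion of rows in which X1 at site == 1 is greater than X1 at site == 2. I can do this for a fixed number of site levels and one variable at a time, but I would like to generalize it to an arbitrary number of levels and several variables. Here is an example: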

# Generate data
set.seed(20130226)

n <- 100
x1 <- matrix(c(rnorm(n, mean = 2),rnorm(n, mean = 5)),ncol=2)
x2 <- matrix(c(rnorm(n, mean = 1), rnorm(n, mean = 4)),ncol=2)
x3 <- matrix(c(rnorm(n, mean = 3), rnorm(n, mean = 3)),ncol=2)
xx <- data.frame(x1,site=1)
xx <- rbind(xx, data.frame(x2,site=2))
xx <- rbind(xx, data.frame(x3,site=3))

# comparisons

s <- unique(xx$site)
me1 <- with(xx,xx[site==s[1],])
me2<- with(xx,xx[site==s[2],])
me3<- with(xx,xx[site==s[3],])

Pg1.gt.g2 <- sum(me1[,c("X1")]>me2[,c("X1")])/nrow(me1)
Pg1.gt.g3 <- sum(me1[,c("X1")]>me3[,c("X1")])/nrow(me1)
Pg2.gt.g3 <- sum(me2[,c("X1")]>me3[,c("X1")])/nrow(me2)

# build table
comp1 <- data.frame(Group=c(paste(s[1],">",s[2]),paste(s[1],">",s[3]),paste(s[2],">",s[3])),  P=c(Pg1.gt.g2, Pg1.gt.g3,Pg2.gt.g3))

print(comp1)

I don't know how to do this for a varying number of groups and several variables, perhaps using plyr.

Thanks!

1 answer:

Answer 0: (score: 1)

I would reshape the data into a matrix in which each column represents one group:

# Unique sites
s <- unique(xx$site)

# Columns are each group, data are X1 values
mat <- do.call(cbind, lapply(split(xx, xx$site), function(x) x$X1))

# Compare all pairs of sites
do.call(rbind, apply(combn(seq_along(s), 2), 2,
                     function(x) data.frame(g1=s[x[1]], g2=s[x[2]],
                                            prop=sum(mat[,x[1]] > mat[,x[2]])/nrow(mat))))

#   g1 g2 prop
# 1  1  2 0.83
# 2  1  3 0.20
# 3  2  3 0.09
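
The code above handles only X1, while the question also asks about several variables. One way to extend the same idea is sketched below in base R. The helper name `pairwise_prop`, its arguments, and the `toy` data frame are hypothetical and introduced only for illustration. It assumes, as the original example does, that every group has the same number of rows and that rows are compared in order within each group.

```r
# Hypothetical helper: for every pair of groups and every variable in `vars`,
# compute the proportion of rows where the first group exceeds the second.
# Assumption: all groups have the same number of rows, compared row by row.
pairwise_prop <- function(df, vars, group = "site") {
  # One data frame per group, keeping only the variables of interest
  groups <- split(df[, vars, drop = FALSE], df[[group]])
  # Stop early if the groups cannot be compared row by row
  stopifnot(length(unique(vapply(groups, nrow, integer(1)))) == 1)
  # All unordered pairs of group labels
  pairs <- combn(names(groups), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    a <- groups[[p[1]]]
    b <- groups[[p[2]]]
    # mean() of a logical vector is the proportion of TRUE values
    props <- vapply(vars, function(v) mean(a[[v]] > b[[v]]), numeric(1))
    data.frame(g1 = p[1], g2 = p[2], variable = vars, prop = unname(props),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Small made-up data set: three sites, three rows each
toy <- data.frame(
  X1 = c(1, 5, 3,  2, 4, 6,  0, 9, 1),
  X2 = c(9, 8, 7,  1, 2, 3,  5, 5, 5),
  site = rep(1:3, each = 3)
)
res <- pairwise_prop(toy, c("X1", "X2"))
print(res)
```

With the question's data, the call would presumably be `pairwise_prop(xx, c("X1", "X2"))`. The result is in long format, one row per pair of sites and variable, so it works for any number of sites without changing the code.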