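I have a data frame (delisle) and the code below, which takes days to run on my large data matrix. What is an alternative to ddply? Could someone offer guidance or help?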
TYPE SAMPLE probeA probeB probeC
CatA 52 1.2 3.2 3.4
CatA 52 2.2 4.2 3.4
CatA 58 1.5 6.5 7.8
CatA 58 8.3 6.5 9.5
CatA 94 1.5 4.3 6.4
CatB 52 2.2 2.2 3.4
CatB 58 2.5 4.5 6.8
CatB 58 6.2 6.0 5.3
CatB 94 2.5 5.3 6.4
For each probe, within each SAMPLE, I use ddply to compute the fold change between CatA and CatB.
The output should be:
SAMPLE probe FC
52 probeA mean(CatA)/mean(CatB)
52 probeB mean(CatA)/mean(CatB)
58 probeA mean(CatA)/mean(CatB)
58 probeB mean(CatA)/mean(CatB)
For large data (20K rows and 5K columns), my code is EXTREMELY SLOW:
probenames <- as.vector(colnames(delisle))
for (i in 3:ncol(delisle))
{
  probe = probenames[i]
  Stats <- function(gs) {
    typeA.sub <- gs[which(gs$TYPE=="CatA"),]
    typeB.sub <- gs[which(gs$TYPE=="CatB"),]
    fc.AB = mean(typeA.sub[,i])/mean(typeB.sub[,i])
    fc.AC =
    fc.BC =
    data.frame(probe,fc.AB, fc.AC, fc.BC)
  }
  output <- ddply(.data=delisle, .variables="SAMPLE", .progress=progress_text(style=3), Stats)
  write.table(output,"SAMPLETYPE.txt",quote=F,sep="\t",append=T,col.names=F)
}
Answer 0 (score: 0)
Does this give you the expected result quickly?
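library(tidyverse)
# d is the example data frame from the question (called delisle there)
d %>%
  select(-probeC) %>%                       # drop probeC to match the expected output
  gather(key, value, -TYPE, -SAMPLE) %>%    # long format: one row per probe value
  group_by(SAMPLE, key, TYPE) %>%
  summarise(a = mean(value)) %>%            # mean per SAMPLE / probe / TYPE
  spread(TYPE, a) %>%                       # one column per TYPE
  mutate(res = CatA/CatB)                   # fold change CatA vs CatB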
Source: local data frame [6 x 5]
Groups: SAMPLE, key [6]
SAMPLE key CatA CatB res
<int> <chr> <dbl> <dbl> <dbl>
1 52 probeA 1.7 2.20 0.7727273
2 52 probeB 3.7 2.20 1.6818182
3 58 probeA 4.9 4.35 1.1264368
4 58 probeB 6.5 5.25 1.2380952
5 94 probeA 1.5 2.50 0.6000000
6 94 probeB 4.3 5.30 0.8113208
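If installing the tidyverse is not an option, or the long format becomes too large with 5K probe columns, the same calculation can be sketched in base R. This is only a rough sketch, not a benchmarked solution: it assumes the data frame is called `delisle` as in the question, that the first two columns are TYPE and SAMPLE, and that every other column is a numeric probe. The helper `group_means()` is a hypothetical name introduced here for illustration. It uses `rowsum()` to get per-SAMPLE sums for all probe columns at once, so there is no loop over columns.

```r
# Example data from the question.
delisle <- read.table(header = TRUE, text = "
TYPE SAMPLE probeA probeB probeC
CatA 52 1.2 3.2 3.4
CatA 52 2.2 4.2 3.4
CatA 58 1.5 6.5 7.8
CatA 58 8.3 6.5 9.5
CatA 94 1.5 4.3 6.4
CatB 52 2.2 2.2 3.4
CatB 58 2.5 4.5 6.8
CatB 58 6.2 6.0 5.3
CatB 94 2.5 5.3 6.4
")

# Assumption: columns 1-2 are TYPE and SAMPLE, the rest are numeric probes.
probes <- as.matrix(delisle[, -(1:2)])

# Hypothetical helper: mean of every probe column within each SAMPLE,
# restricted to the rows of one TYPE.
group_means <- function(type) {
  keep   <- delisle$TYPE == type
  sums   <- rowsum(probes[keep, , drop = FALSE], delisle$SAMPLE[keep])
  counts <- as.vector(table(delisle$SAMPLE[keep])[rownames(sums)])
  sums / counts   # row i of the sums is divided by the row count of sample i
}

meanA <- group_means("CatA")
meanB <- group_means("CatB")

# Keep only samples present in both types, then divide element-wise.
common <- intersect(rownames(meanA), rownames(meanB))
fc <- meanA[common, , drop = FALSE] / meanB[common, , drop = FALSE]

# Long format: one row per SAMPLE / probe pair, as in the desired output.
result <- data.frame(
  SAMPLE = rep(common, times = ncol(fc)),
  probe  = rep(colnames(fc), each = nrow(fc)),
  FC     = as.vector(fc)
)
```

On the example data, `fc["52", "probeA"]` is 1.7/2.2, which matches the first row of the output above. Unlike the tidyverse version, this sketch keeps probeC; drop that column first if it is not wanted. Other pairs such as fc.AC would presumably follow the same pattern with a third call to `group_means()`, but the question does not show how those are defined.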